QuickStart Guide for Visualization

About this QuickStart Guide

If you have a large amount of data and are scratching your head about what to do with it, you have come to the right place. This quickstart guide is a high-level overview of your options if you find yourself in the following situation:

  1. You have a bunch of data (from a simulation, a sensor, a database, etc.)
  2. You need to understand that data better
  3. You don't know what to do next...

Visualization is the process of making images out of data. Because our visual system is excellent at extracting patterns and relationships from raw data, visualization is a powerful tool for understanding complex data. Most researchers in the Natural Sciences think of visualization as being synonymous with Scientific Visualization, the visualization of data with a physical representation. Researchers in the Humanities are more likely to have abstract data (e.g. raw text, social networks) in which neither the data nor its relationships have a physical representation. This is the domain of Information Visualization.

It is worth noting that when visualization is being discussed on these pages, we mean much more than simply creating static images of your data. Visualization is an interactive and exploratory process. Our goal, with these pages, is to provide you with the tools to help you extract information and derive knowledge from your data.

There are basically three things you need to determine to effectively visualize your data. You need to consider:

  1. What kind of data do you have (the type of data, how much data, where the data is located)?
  2. What software is the most appropriate for the type of data you have?
  3. What hardware is going to be most effective for your visualization?

We try to answer those questions here.

It is all about the data!

The most important thing that drives the tools you use to visualize your data is the data itself. The type of data you have will drive the type of software that you use. We explore this in more detail in the software section below.

The size of your data will determine what type of graphics/visualization hardware you will need. Large datasets may require high end graphics cards or computers with large memory.

The size of your data will also impact how easy it is to move the data. If your data is generated through a computer simulation on a given system (e.g. bugaboo @ SFU) you have several choices:

  1. Visualize the data in-situ on the machine where the data is stored (e.g. bugaboo.westgrid.ca).
  2. Move the data to a special purpose visualization machine (e.g. the WestGrid GPU cluster parallel.westgrid.ca).
  3. Move the data to a local machine for visualization (e.g. your office workstation, your laptop).

The same holds true if your data is generated by a sensor, is gathered from a database, or is mined from the Web. In each case the questions are the same: How should I visualize this data? Should I visualize locally on my desktop or should I move the data to a WestGrid machine? If the latter, which WestGrid machine is the appropriate machine for me to use?

We explore your options in the hardware section below.
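To get a rough feel for how data size affects the cost of moving data, the sketch below estimates a transfer time from a dataset size and a network speed. It is a back-of-the-envelope calculation, not a measurement: the function names, the 50 GB dataset, the link speeds, and the 0.7 efficiency factor (a guess at protocol overhead and shared bandwidth) are all assumptions chosen for illustration.

```python
def transfer_time_seconds(size_bytes, link_mbps, efficiency=0.7):
    """Estimate the seconds needed to copy size_bytes over a link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed you actually get.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return size_bytes * 8 / usable_bits_per_second


def format_duration(seconds):
    """Format a number of seconds as hours, minutes, and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    dataset_bytes = 50 * 10**9  # hypothetical 50 GB simulation output
    # Hypothetical link speeds in megabits per second.
    for label, mbps in [("home connection", 25), ("campus network", 1000)]:
        estimate = transfer_time_seconds(dataset_bytes, mbps)
        print(f"{label:>16}: {format_duration(estimate)}")
```

If the estimate for your own data and network runs to hours, that is a hint that visualizing the data where it already resides, or on a WestGrid machine close to it, may be more practical than copying it to your desktop.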

What software do I use?

The type of data you have drives the type of software you use. If you are doing simple graphing or plotting, simple tools such as Gnuplot and Grace might suffice. Many application domains have software that is customized for visualizing their data. For example, there is a wide range of molecular visualization packages (such as VMD) designed for visualizing molecular structures. For application domains that do not have custom visualization tools, you will probably need to use a general purpose visualization package (such as ParaView or VisIt). These are powerful, flexible, interactive data exploration tools that can be used in a wide set of circumstances. Last, but not least, if you are coming from the humanities, your visualization task might involve abstract data (such as text or social networks), which would be most appropriate for an Information Visualization tool rather than a Scientific Visualization tool. General visualization tools (such as ParaView and VisIt) can be used for Information Visualization, but there are other tools that might be better suited to your needs. In general, WestGrid has more Scientific Visualization tools installed than Information Visualization tools.

As with anything regarding WestGrid, if you have questions about the right tool to use, need some software installed, or need some help, please email support@westgrid.ca. In addition, the WestGrid software page is an excellent resource for discovering what software is available (search for "Graphics" in the software category).

What hardware do I use?

As mentioned above, you have three fundamental choices when deciding which hardware to use for your visualization. You can visualize the data in-situ (on the machine where the data resides e.g. bugaboo.westgrid.ca), you can move the data to a specialized visualization machine (e.g. the WestGrid GPU cluster parallel.westgrid.ca), or you can move the data to a local machine for visualization. The choice is primarily driven by the data you have and how interactive you want the visualization to be. The pluses and minuses of each of these approaches are explored below:

Visualizing data In-Situ on a WestGrid machine

In-situ visualization essentially means you are visualizing the data on the machine where the data resides and displaying the results on your desktop. On WestGrid machines, this would typically mean running an X server on your local computer, connecting via SSH to a WestGrid machine (e.g. bugaboo) with X forwarding turned on (see the X Windows connection documentation for details), and running a visualization package on the WestGrid machine (e.g. gnuplot). In this case, the application windows from the remote WestGrid machine would appear on your local desktop. The pluses and minuses of this approach are listed below.

+ Don't have to move the data (e.g. generate the data on bugaboo, visualize from bugaboo)
+ Utilize software already installed on the WestGrid machine (e.g. you don't have to install any visualization software)
+ Move pixels (images) to the desktop, not data
- Initial setup requires software install on some platforms
- Quite poor performance, not interactive (e.g. good for looking at static images, not good for data exploration)
- No hardware acceleration, poor performance on all but the smallest data sets.

Although this technique gives you a "quick and dirty" view of your data, because of the inefficiencies of the X protocol over networks with latency it is very difficult to use X windows with any but the simplest of applications. I would only recommend this solution for static visualizations of small data sets.
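As a minimal sketch of the connection step described above, the Python snippet below assembles an SSH command with X forwarding turned on. The helper name and the username are hypothetical placeholders, and the X Windows connection documentation remains the authoritative description of the setup. The -X and -Y flags are the standard OpenSSH options for X11 forwarding and trusted X11 forwarding.

```python
import shlex
from typing import List


def x_forwarding_command(user: str, host: str, trusted: bool = False) -> List[str]:
    """Build an ssh command line that enables X11 forwarding.

    -X requests X11 forwarding; -Y requests trusted X11 forwarding.
    """
    flag = "-Y" if trusted else "-X"
    return ["ssh", flag, f"{user}@{host}"]


if __name__ == "__main__":
    # "your_username" is a placeholder for your own WestGrid account name.
    command = x_forwarding_command("your_username", "bugaboo.westgrid.ca")
    # Print the command so it can be pasted into a terminal.
    print(shlex.join(command))
```

With an X server running on your local computer, running the printed command and then starting a visualization package such as gnuplot on the WestGrid machine should make its windows appear on your local desktop.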

Visualizing data using specialized WestGrid hardware

WestGrid provides a number of hardware visualization resources for its users. These include the GPU cluster parallel.westgrid.ca with its interactive, remote visualization capabilities (see the remote visualization page for details) and a number of specialized visualization machines installed at some of the WestGrid sites (see the visualization facilities page for details). To use either of these capabilities, your data must be transferred to these machines.

Using remote visualization on parallel.westgrid.ca requires a similar process to using the in-situ visualization described above. Using remote visualization means that you are running the visualization software remotely on the WestGrid machine (parallel.westgrid.ca) and displaying the results on your desktop computer. The benefits of this are that you can take advantage of the high-end graphics cards, the extensive visualization software stack, the large number of CPUs, and the large amount of storage available on parallel.westgrid.ca. Like in-situ visualization, you have to run a client (in this case a VNC client) on your workstation, make a tunneled connection to parallel.westgrid.ca, start the VNC server on a GPU node, and run the visualization application. This is a very effective and powerful mechanism for visualizing large data sets (see the remote visualization page for details). The pluses and minuses are summarized below:

+ High performance, interactive visualization even on large data sets
+ Utilize advanced GPUs (don't need your own hardware)
+ Utilize advanced visualization software stack (no software installs for you!)
+ Utilize multiple cores if required
+ Utilize storage
+ Move pixels to the desktop, not data
+ Can be used anywhere, on any computer
- Have to move the data to parallel.westgrid.ca
- Initial set up requires software installs
- Connection process is not rocket science, but not trivial either...
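The tunneled connection mentioned above can be sketched as an SSH port forward. The Python snippet below only builds the command line; it is an illustration under stated assumptions, not the documented WestGrid procedure (see the remote visualization page for that). The node name gpu-node-01, the username, and the display number are hypothetical. The sketch relies on two general facts: ssh -L forwards a local port to a host and port reachable from the remote side, and a VNC server for display N conventionally listens on TCP port 5900 + N.

```python
import shlex
from typing import List, Optional

VNC_BASE_PORT = 5900  # conventional VNC port for display :0


def vnc_port(display: int) -> int:
    """Return the conventional TCP port for VNC display number `display`."""
    if display < 0:
        raise ValueError("display must not be negative")
    return VNC_BASE_PORT + display


def vnc_tunnel_command(user: str, login_host: str, gpu_node: str,
                       display: int, local_port: Optional[int] = None) -> List[str]:
    """Build an ssh command that forwards a local port to a VNC server.

    The VNC server is assumed to run on gpu_node, which is assumed to be
    reachable from login_host.
    """
    remote_port = vnc_port(display)
    if local_port is None:
        local_port = remote_port  # reuse the same port number locally
    forward = f"{local_port}:{gpu_node}:{remote_port}"
    return ["ssh", "-L", forward, f"{user}@{login_host}"]


if __name__ == "__main__":
    # All values except the cluster host name are hypothetical placeholders.
    command = vnc_tunnel_command("your_username", "parallel.westgrid.ca",
                                 "gpu-node-01", display=1)
    print(shlex.join(command))
```

Once such a tunnel is open, a VNC client on your workstation would connect to the forwarded port on localhost rather than to the GPU node directly.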

Using a visualization system at your local WestGrid institution is an option at some, but not all, WestGrid institutions. WestGrid has established visualization labs at some of its partner institutions (see the visualization facilities page for details). The benefit of using such a system is that you can take advantage of the high-end hardware, advanced software, and local support. The drawbacks are that you need to copy your data to the local machine in the lab and you need to physically go to the lab at your institution to use the hardware/software. This is in many ways similar to copying the data to your own local workstation, except that you don't have to buy and maintain the visualization system but you do have to go to the lab to use it. These pluses and minuses are summarized below:

+ High performance, interactive visualization even on large data sets
+ Utilize advanced GPUs (don't need your own hardware)
+ Utilize advanced visualization software stack (no software installs for you!)
- Have to move the data to the local visualization machine.
- Have to physically go to the visualization lab to use the resource.

Using parallel.westgrid.ca as a remote visualization resource is a very effective mechanism for visualizing large data sets without the hassle of having to purchase, manage, and maintain an advanced visualization workstation in your lab/office (see below). In my experience, remote visualization is approximately equivalent to running the application on your own local computer in terms of interactive performance. Certainly this is the case when you have a good network connection, as you should if you are doing remote visualization between WestGrid institutions. If you are doing this from home, my experience is again that remote visualization can be quite functional, and in many cases more effective than downloading the data and rendering it on my laptop's graphics card! Your mileage may vary of course, and your performance will depend heavily on the quality of your home network connection.

Visualizing data on your local machine

Sometimes it is easiest to copy your data to your local office workstation or your laptop for visualization and analysis. The benefit of this approach is that your data and your visualization software are on your local machine. In this regard, performance will only be limited by the power of your computer and the size of your data. If you have large data sets and do not have a high-end graphics card in your computer, then this option is not particularly feasible. At the same time, if your data sets are small and/or if you have a decent graphics card, this may work very well. A further benefit is that you control all aspects of the visualization, giving you the most flexibility (assuming it is practical for you to copy the data to your computer). The downside is that you have to manage all aspects of the visualization, including purchasing, setting up, and maintaining the visualization system. Sorry, there is no ideal solution! The pluses and minuses of a local workstation for visualization are summarized below:

+ Ease of use, no remote connection mumbo-jumbo (yes, that is a technical term) required
+ High performance if small data and/or high-end graphics card in your computer
- Have to move the data to your local computer
- Have to purchase/install/administer computer and graphics card
- Have to install/maintain software
- Have to be in the office/lab where the computer is located (unless you are using your laptop)

Need Help? Contact Us

We hope that this quickstart guide was useful to you in helping you choose a way to visualize your data. If you have any questions about visualization, please send an email to support@westgrid.ca and provide a description of:

  • the type of data 
  • the size of data
  • any other information that would help us understand what you would like to achieve with your visualization