Following are some hints for using Data Explorer more effectively, debugging visual programs, and using memory efficiently.
One of the most useful tools for debugging visual programs is the Print module. For example, if a module reports that a particular field is inappropriate for processing, you can print the object to see whether it is what you expect it to be. Print can show the structure of, and the data values in, any object. The options parameter sets the level of detail printed. The default, "o", prints just the top-level object, for example "Field with 4 (four) components." If options is set to "r", more information about each component is printed, such as the number of items in each component and the data type. You can also print some or all of the values in the components.
The output of Print appears in the Message window.
There are two ways to stop execution of a visual program:
In addition, modifying a visual program (for example by disconnecting an arc or adding a new tool) will cause execution to stop after the currently executing module.
If you find yourself "lost" in the Image window (for example, you have a black picture and don't know where your data object is), you can always "reset the camera" by using the Reset Camera option in the View Control dialog box of the Image window. This zooms out so that you can see all of your object from a "front and center" view.
It is also often helpful to use ShowBox to display the bounding box of your entire data set. Collect this with the rest of your visualization, and then you will be able to see how the part you are looking at relates to the entire data set.
Image, Display, and Render all render an object (i.e. create an image).
Render, given an object and a camera, creates as output an image. This image can be sent directly to Display for display to the screen, sent to WriteImage to be written to a file, or collected with other images into a single window using Arrange.
Display, given an object and a camera, both renders the object (using the camera) and displays it to the screen.
Display, given only an image, simply displays it to the screen.
Image, given an object, renders it and displays it to the screen. The camera information is provided via direct interactors (rotate, zoom, etc.) or through the camera mode option in the View Control dialog box. Image has two outputs: the object to be rendered (including any AutoAxes that may have been added via menu choices) and the camera used.
You would use Render if you needed the image itself, for example, for the Arrange or Filter modules, or if you wanted to use WriteImage. (For the Image tool, the WriteImage function is available through the Save Image and Print Image commands in the Image window, or through the hidden recordEnable, recordFormat, and recordFile parameters of the Image tool.)
You would use Display without a camera if your object is already an image, and you simply want to display it. You do not need (or want) to render it. You would also use Display without a camera to display a set of Arranged images.
You would use Display with a camera if you wanted to directly control the camera, for example, for a computed fly-through path. You would also use Display if you wanted to define your own direct interaction modes (see SuperviseWindow and SuperviseState in IBM Visualization Data Explorer User's Reference), rather than using the predefined direct interaction modes of the Image tool.
Data Explorer uses an object cache to store intermediate results of modules. Caching systems are intended to fill up and then reclaim memory by throwing things out of the cache. The size of the cache defaults to a large percentage of the physical memory on the machine. You can control the size of the cache with the -memory command line option to the dx command. The minimum cache size needed is on the order of the maximum amount of memory required for a program execution.
The Data Explorer "executive" schedules module execution. It does detailed graph analysis, implements distributed processing of the modules, and implements the Switch and Route modules. It also provides optimization by caching the intermediate outputs of modules. For example, if you run Import twice in a row with the same inputs, Import will not actually run the second time, and instead the executive will use the cached output from the previous execution. The Image and Display tools also cache their images internally.
To implement the caching scheme, Data Explorer will allocate memory up to some fixed size. This memory is referred to as the arena. When the arena fills up and more memory is required, Data Explorer looks for objects to discard from the cache. When it does this it may mean that subsequent executions will have to execute larger portions of the program.
The arena is of fixed size for any one instance of Data Explorer. The size of this arena is chosen by default based on the size of the physical memory in the system.
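The interaction between cached module outputs and a fixed-size arena can be illustrated with a small toy model. This is only a sketch: the class and function names are hypothetical, and the least-recently-used eviction order is an assumption made for illustration, since the text above does not say how Data Explorer chooses which objects to discard.

```python
from collections import OrderedDict


class ToyArenaCache:
    """Toy fixed-size cache. LRU eviction is an assumption, not DX's documented policy."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.evictions = 0
        self._entries = OrderedDict()  # key -> (value, size_in_bytes)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key, value, size):
        if size > self.capacity:
            raise MemoryError("object is larger than the whole arena")
        if key in self._entries:
            self.used -= self._entries.pop(key)[1]
        # Discard the oldest entries until the new object fits.
        while self.used + size > self.capacity:
            _, (_, old_size) = self._entries.popitem(last=False)
            self.used -= old_size
            self.evictions += 1
        self._entries[key] = (value, size)
        self.used += size


def run_module(cache, name, inputs, compute, size):
    """Return (result, was_cached); execute compute() only on a cache miss."""
    key = (name, inputs)
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    result = compute()
    cache.put(key, result, size)
    return result, False


cache = ToyArenaCache(capacity_bytes=100)
_, hit1 = run_module(cache, "Import", ("data.dx",), lambda: "field", 60)
_, hit2 = run_module(cache, "Import", ("data.dx",), lambda: "field", 60)
# A second 60-byte result does not fit alongside the first, so Import's output is discarded.
run_module(cache, "Isosurface", ("field", 0.5), lambda: "surface", 60)
_, hit3 = run_module(cache, "Import", ("data.dx",), lambda: "field", 60)
print(hit1, hit2, hit3, cache.evictions)
```

In this model the second Import call is served from the cache, but once the arena is too small to hold both results, the third call has to execute again. That is the effect described above: a full arena can force larger portions of the program to re-execute.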
For some data sets, the default arena size will not be sufficient. In those cases, you can use the -memory option to increase the size of the arena, with the limitation that you cannot make the arena larger than the amount of real plus virtual memory (page or swap space) on your machine. Talk to your system administrator if you think you need to increase the amount of swap space on your system.
If, after using the -memory option as described above, you find you still lack sufficient memory to perform your visualization, there are a number of strategies that can be used to reduce the amount of memory that is required by your program.
A common mistake is to render image data (i.e., 2-dimensional grids) using Render, Image, or Display with a camera input. This causes Data Explorer to interpret the image as a very large number of quads, which consumes a great deal of memory and CPU time.
Instead, one can AutoColor or Color the image and pass it directly to Display (without a camera input), or for even more memory savings, convert the data to unsigned bytes (see below) and AutoColor or Color the data with delayed colors (see below).
If you are coloring your objects (using AutoColor, AutoGrayScale, or Colormap/Color), you might want to use "delayed" colors.
To do this, convert the data component to unsigned bytes and set the "delayed" parameter of the coloring module to 1. Using delayed colors means that rather than a 3-vector being used for each data point, a single scalar byte is used to index into a color table with 256 entries.
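As a rough estimate of the saving, the following sketch compares the two color representations. It assumes each color channel is stored as a 32-bit float, which the text above does not state explicitly; the function names are hypothetical.

```python
def color_bytes_direct(n_points, bytes_per_channel=4):
    """One RGB 3-vector per data point (assumes 32-bit float channels)."""
    return n_points * 3 * bytes_per_channel


def color_bytes_delayed(n_points, bytes_per_channel=4):
    """One byte index per data point plus one shared 256-entry RGB table."""
    table_bytes = 256 * 3 * bytes_per_channel
    return n_points + table_bytes


n = 1000 * 1000  # a one-million-point field
direct = color_bytes_direct(n)
delayed = color_bytes_delayed(n)
print(f"direct:  {direct} bytes")
print(f"delayed: {delayed} bytes")
```

Under these assumptions the color data for a million points shrinks from about 12 MB to about 1 MB, because the cost of the color table is fixed regardless of the number of points.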
If you are using ReadImage, you may want to set the DXDELAYEDCOLORS environment variable. See ReadImage in IBM Visualization Data Explorer User's Reference.
In many cases it may be acceptable to convert your data components to smaller sized types using Compute. For example, you might change your floating point data to bytes. This has the advantage that all downstream modules will require less memory.
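The idea can be illustrated outside Data Explorer with a short sketch. The linear rescaling to the range 0 to 255 used here is an assumption chosen for illustration, not the expression you would necessarily give to Compute, and the conversion is lossy.

```python
from array import array


def to_unsigned_bytes(values):
    """Linearly rescale values to 0..255 and store them as unsigned bytes (lossy)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return array("B", [0] * len(values))
    # int(x + 0.5) rounds the non-negative scaled value to the nearest integer.
    return array("B", (int((v - lo) / span * 255 + 0.5) for v in values))


data = array("f", [0.0, 0.25, 0.5, 1.0])  # single-precision floats
packed = to_unsigned_bytes(data)
print(list(packed), data.itemsize, packed.itemsize)
```

Each value now occupies one byte instead of the size of a float, at the cost of keeping only 256 distinct levels.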
When working with series data, if you are importing the entire series and then selecting members out of the series, it may be that your program can be changed so that you only import one member at a time. Do this using the start and end parameters to Import.
This reduces memory requirements by not having the whole series in memory at once.
If you are using glyphs (AutoGlyph or Glyph), you may want to use less "spiffy" glyphs. A less spiffy glyph is one that has fewer positions and connections (facets), and therefore consumes less memory. To use less spiffy glyphs, use the type parameter of either AutoGlyph or Glyph, and set it to "speedy" or to a small fraction of 1.
If you can sacrifice resolution in your data set, you may want to use the Reduce module (usually just after Import) to reduce the number of points in your data set. Reduce filters the data set before reducing the number of points. Remember that it is of little use to process 5000x5000 points if your final image is only 1000x1000 pixels.
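The arithmetic behind that last remark can be sketched as follows. The function names are hypothetical, and how the resulting factor maps onto the parameters of Reduce is not covered here; consult the User's Reference for that.

```python
import math


def points_per_pixel(data_w, data_h, image_w, image_h):
    """Average number of data points that land on each image pixel."""
    return (data_w * data_h) / (image_w * image_h)


def per_axis_factor(data_size, image_size):
    """Smallest whole-number reduction along one axis that still gives one point per pixel."""
    return max(1, math.ceil(data_size / image_size))


ratio = points_per_pixel(5000, 5000, 1000, 1000)
factor = per_axis_factor(5000, 1000)
print(ratio, factor)
```

For the 5000x5000 example, 25 data points compete for each pixel, so reducing by a factor of 5 along each axis loses nothing that the final image could have shown.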
You can create 24-bit images (instead of the default 96-bit images) by setting the environment variable DXPIXELTYPE to DXByte. See ReadImage and Render in IBM Visualization Data Explorer User's Reference.
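The difference is easy to quantify. This minimal sketch uses only the two pixel sizes named above; the function name is hypothetical.

```python
def image_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one uncompressed image."""
    return width * height * bits_per_pixel // 8


default_size = image_bytes(1000, 1000, 96)  # default 96-bit pixels
byte_size = image_bytes(1000, 1000, 24)     # 24-bit pixels
print(default_size, byte_size, default_size // byte_size)
```

A 1000x1000 image drops from 12,000,000 bytes to 3,000,000 bytes, a factor of four, which adds up quickly when many frames are held in the cache.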
In general, it is not necessary to change how the executive caches intermediate results. However, in a few cases, it may be advantageous to do so. For example, if you are reading a live data feed into your program, it is probably not necessary to cache the downstream outputs.
You can change how and if the executive caches intermediate output values by opening the Configuration dialog box of a module and changing the option menu to the right of each output. You can also choose Output Cacheability from the Edit menu of the VPE, and set the cacheability of a group of modules, show the cacheability of a group of modules, or ask Data Explorer to use a heuristic to automatically optimize the caching for the current visual program.
In general, it is most efficient to cache only the results of the last module in a single-file line of modules, for example, to cache the output of Isosurface but not Import. Note, however, that if you do this and then need to change the isosurface value, the data file will have to be reimported, which slows execution.
If you want to turn off caching altogether you can use the -cache off command-line option to Data Explorer.
Some modules use the caching system to cache their own data. The Display and Image tools are such tools. When using software rendering, they cache the images they display in the X windows. This is an optimization that can be seen when using the Sequencer. When this tool starts repeating itself (in loop or palindrome mode), the images are displayed much faster. That is, Display (or Image) is pulling them out of the cache instead of rerendering the input objects each time. You can observe this effect by running the example program MovingCamera.net with software rendering.
Most of the time this caching behavior is desirable, but in some cases it is better turned off. To do that, use the Options module to add a "cache" attribute with the integer value of 0 (zero), as follows:
    o = Options(o, "cache", 0);
    Display(o, camera);
The Image tool's Configuration dialog box has an option menu that lets you control its caching. This can be useful when one is running a batch job to generate an animation in which none of the frames will be displayed a second time.
Note that the -cache off command line option mentioned above has no effect on the internal caching that modules themselves perform.
Note: You can use the Data Explorer command line option -optimize memory, which automatically sets the DXDELAYEDCOLORS and DXPIXELTYPE environment variables to the values that consume the least memory. The alternative is -optimize precision.
Except where noted in the architecture-specific README (in /usr/local/dx), by default Data Explorer will be allowed to grow to use all but 8 megabytes of the physical memory when there is less than 64 megabytes of physical memory.
If there are more than 64 megabytes of physical memory, then Data Explorer will, by default, be allowed to grow to 7/8 of the amount of physical memory.
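The two rules above can be written as a single function. This is a sketch of the stated defaults only: it ignores any architecture-specific exceptions noted in the README, and the function name is hypothetical. The text does not say which rule applies at exactly 64 megabytes, but both give the same result there.

```python
MB = 1024 * 1024


def default_dx_memory_limit(physical_bytes):
    """Default memory ceiling, in bytes, per the two rules described above."""
    if physical_bytes < 64 * MB:
        # Smaller machines: all but 8 MB of physical memory.
        return max(physical_bytes - 8 * MB, 0)
    # Larger machines: 7/8 of physical memory.
    return physical_bytes * 7 // 8


for phys_mb in (32, 64, 256):
    limit_mb = default_dx_memory_limit(phys_mb * MB) // MB
    print(f"{phys_mb} MB physical -> {limit_mb} MB default limit")
```

A machine with 32 MB would therefore default to 24 MB, and one with 256 MB to 224 MB.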
Users may wish to alter this default amount of memory by using the -memory option to the dx command, or the "Memory" field of the Connect to Server Options dialog box.
Since it is possible for Data Explorer to use a large amount of virtual memory, users should configure systems with paging space at least two or three times the total physical memory in their system.
If you do not have enough paging space, the operating system may kill Data Explorer (or other processes), sometimes without warning, depending on the architecture. Your system administrator can increase your paging space.
Some systems may enforce per process limits on such things as data segment size, stack size and so forth. These may need to be adjusted to run Data Explorer with large amounts of memory to avoid paging. Your system administrator can adjust your per-process limits.