QuickStart Guide - Importing Data

Chapter 5. Importing Data

Partial Table-of-Contents

5.1 General Array Importer

Describing the Data

Creating a Header File

Some Notes on General Array Importer Format

5.2 Importing Data: Header File Examples

Record Style: Single-Variable Data

Record Style: Multivariable Data

Columnar Style

5.3 Header File Syntax: Keyword Statements

end

5.4 Data Prompter

Simplified Data Prompter

Full Data Prompter

5.5 Data Prompter Browser

5.6 Using the Header File to Import Data

Importing data into Data Explorer is the first step in creating a visualization of that data. Data Explorer can import data in a number of formats: General Array, Data Explorer native, CDF, netCDF, and HDF (see Appendix B. "Importing Data: File Formats" in IBM Visualization Data Explorer User's Guide). The General Array Importer is discussed here not only because it can import a variety of data types but also because its supporting interface makes it useful to the broadest range of users. This interface consists of the Data Prompter, for describing the data to be imported, and the Data Browser, for viewing the data.

This chapter covers importing data in the sections listed in the table of contents above.

An Important Note on Fields

Importing data into Data Explorer requires some knowledge of the Data Explorer data model and at least a working knowledge of a field.

Fields are the fundamental objects in the Data Explorer data model. A field represents a mapping from some domain to some data space. The domain of the mapping is specified by a set of positions and (generally) a set of connections that allow interpolation of data values for points between positions. Positions represent what can be thought of as (and often really are) locations in space; the data are the values associated with the space of the positions. The mapping at all points in a domain (not just those specified by the given positions) is represented implicitly by specifying that the data are dependent on (located at) the sample points or on the connections between points.

This simple abstraction is sufficient for representing a wide range of information. For example, you can describe 3-dimensional volumetric data whose domain is the region specified by positions and whose data space is the set of values associated with those positions. The domain of a 2-dimensional image on a monitor screen is a set of pixel locations, and the data space consists of the pixel colors. For 2-dimensional surfaces embedded in 3-dimensional space (e.g., traditional graphical models), the domain may be a set of positions on the surface, and the data space a set of data values on that surface.

In Data Explorer the positions and data are said to be components of a field, and every field must contain at least a "positions" component and a "data" component. Fields may also contain other components (e.g., "connections"). Thus a Data Explorer field consists of data and the additional components needed to describe that data so that Data Explorer can process it.

Components are represented as arrays of numbers with some auxiliary information specifying attributes (e.g., type of data dependency). The syntax for defining fields in the General Array format is described in 5.3, "Header File Syntax: Keyword Statements". The various components are described in IBM Visualization Data Explorer User's Guide.

5.1 General Array Importer

Describing the Data

To import data through the General Array Importer, you must be able to answer the following questions.

What are the independent and dependent variables? For example, if temperature and wind velocity are measured on a latitude-longitude grid, then latitude and longitude are the independent variables, temperature and wind velocity the dependent variables. In the case of resistance measurements versus the voltage applied to a semiconductor, voltage is the independent variable and resistance the dependent variable.

Components and Variables

In Data Explorer terminology, the values of the independent variable constitute the "positions" component of a data field. In the examples above, latitude and longitude are locations in space and voltage is not, yet both would be represented as "positions" in a data field. The independent variable is always represented by the "positions" component.

The values of the dependent variable constitute the "data" component.

What is the dimensionality of the positions and data components? In the first example above, latitude and longitude are represented by 2-dimensional positions, the temperature by scalar data, and the wind velocity by 2- or 3-dimensional vectors. In the second example, voltage is represented by 1-dimensional positions and the resistance by scalar data.
How is the independent variable (the set of positions) to be described? By a regular grid (which can be completely described by an origin and a set of deltas) or by an explicit list (which may or may not be part of the data file)? For example, data measurements might be on a grid of 1-degree increments in latitude and 5-degree increments in longitude; voltage levels might be a set of unrelated values stored with the resistances in the data file.
How are the positions connected to one another, if they are connected? For example, a regular grid of positions might be connected by a regular grid of connections (lines, quads, or cubes). The connections specify how data values should be interpolated between positions. Positions that are explicitly specified (i.e., not regular) can also be connected by a regular grid of connections (e.g., if the grid is deformed, or warped). See Figure 11.
Figure 11. Examples of Grid Types. The three grids in the top row represent surfaces; those in the bottom row, volumes. Reading from left to right, the three types of grid are: irregular (irregular positions, irregular connections), deformed regular (irregular positions, regular connections), and regular (regular positions, regular connections).

Figure 12. Examples of Data Dependency. In the visualization on the left, data correspond one-to-one with positions. Other data values (and colors) are interpolated linearly between positions. In the visualization on the right, the elements connecting positions are quads. Data (and colors) correspond one-to-one with, and are constant within, each quad.

Note: The General Array Importer supports only regular connections (lines, quads, and cubes) or scattered data. For irregular connections such as triangles or tetrahedra, you can use the Data Explorer native format to import your data. (See IBM Visualization Data Explorer User's Guide.)
What is the format of the stored data values, ASCII or binary? Are they floating point, integer, signed or unsigned byte, etc.?
Are the data dependent on "positions" or on "connections"? That is, are the data values associated one-to-one with positions or with the connections between positions? See Figure 12. (Data associated with connections are often referred to as "cell-centered.") With position-dependent data, values between positions are interpolated within the connection element. With connection-dependent data, values are assumed to be constant within the connection element.
Do these data values represent "series data" or do they constitute only a single frame of data? In the example of resistance levels versus voltage, data may exist for each of a number of different doping levels. Each doping level could be considered a single data field and the collection treated as a series.
Are the data in "record" (block) or "spreadsheet" (columnar) style? (See Figure 14.)
If the data are on a grid, what is the order of the data items with respect to the grid? Is it column majority (first index varies fastest) or row majority (last index varies fastest)? (See Figure 13.)
What kind of embedded text (comments, etc.) in the data file must be "skipped" when the data values are read?

With the answers to these questions, you can now use the General Array Importer to describe your data.

Figure 13. Row- versus Column-Majority Grids. The two grids shown here are generated from the same data file, consisting simply of the numbers 1, 2, 3, ..., 20. The associated header files differ only in the specification of the grids' majority.

Creating a Header File

The General Array Importer uses a "header file" to describe the structure and location of data to be imported. This file consists of keyword statements that identify important characteristics of that data (including grid structure, format, and data type, along with the path name of the file containing the data).

A header file can be created with a text editor or, more easily, with the Data Prompter, which prompts for the necessary information. (See 3.3, "Importing Data" for an example that uses the Data Prompter and 5.4, "Data Prompter" for a detailed description of how to use it.) The Data Prompter also checks for incorrect syntax, such as conflicting keywords (see 5.3, "Header File Syntax: Keyword Statements").

Once a header file has been created, the data it describes can be imported into Data Explorer by the Import module. To identify a header file to Data Explorer through the Import dialog box:

Enter the path name of the header file in the name parameter field.
Enter "general" in the format parameter field. (If the file has the extension ".general", it is not necessary to specify the format to Import. Header files created with the Data Prompter are automatically given this extension.)

Some Notes on General Array Importer Format

The General Array Importer imports ASCII or binary data that is organized in one of two general "styles": block or columnar. Block style requires that the data be organized in records, or blocks. Columnar style requires that the data be organized in vertical columns (see Figure 14).

Figure 14. Block and Columnar Styles of Data Organization. The three horizontal data blocks at left illustrate the block style; the three vertical columns at right, the columnar style. A, B, and C represent separate variables.

The following set of FORTRAN I/O statements generates a record-style data file:

write(15,20) (A(i),i=1,100)
write(15,20) (B(i),i=1,100)
write(15,20) (C(i),i=1,100)
 20  format(10(f10.3))

An equivalent example in C is:

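/* One block per variable: 100 values, 10 per line, as in format(10(f10.3)) */
for (i=0; i<100; i++) printf("%10.3f%s", A[i], (i%10 == 9) ? "\n" : "");
for (i=0; i<100; i++) printf("%10.3f%s", B[i], (i%10 == 9) ? "\n" : "");
for (i=0; i<100; i++) printf("%10.3f%s", C[i], (i%10 == 9) ? "\n" : "");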

The following FORTRAN I/O statement produces a columnar-style data file:

write(15,10) (A(i),B(i),C(i),i=1,100)
 10 format(3(2x,f10.3))

An equivalent example in C is:

for (i=0; i<100; i++)
  printf("  %10.3f  %10.3f  %10.3f\n",A[i],B[i],C[i]);

For both the block and columnar styles, the file can contain positions as well as data. The data can be:

scalar or vector
a time series
gridded or scattered (for gridded data the grid structure can be regular or warped, but the connection elements must be regular; i.e., lines, quads, or cubes)
position dependent (associated with the grid positions) or connection dependent (associated with the grid connections).
