User's Guide - Execution Model

Chapter 4. Data Explorer Execution Model

Partial Table-of-Contents

4.1 Data Flow

4.2 Iterative Execution and Caching of Intermediate Results

4.3 Conditional Execution: Route and Switch

4.4 Iteration using the Sequencer

4.5 Iteration using Looping

4.6 Preserving Explicit State

4.7 Advanced Looping Constructs

4.8 External Asynchronous Data Sources

4.9 Parallelism using Distributed Processing

4.10 Parallelism for Data Explorer SMP

Data Explorer's execution model is based on the data flow concept. However, features are provided that extend the data flow concept to allow you to create a visual program that could not be supported by simple data flow. For example, there are tools that allow you to explicitly save partial results of a visual program to be used in a subsequent execution. Data Explorer also provides you with various tools to control the flow of execution of your visual programs. Most of these tools are analogous to constructs found in commonly used programming languages. For example, tools are provided that perform the function of IF statements, CASE statements and FOR (or DO) loops. This chapter discusses flow-control tools (with several examples of their use) both individually and in combination.

Although it is not necessary to understand all the details of the Data Explorer execution model in order to build and run visual programs, you may find it helpful as you build visual programs. Other topics in this chapter include caching of intermediate results, conditional execution using the Route and Switch modules, iteration using the Sequencer, simple iteration using the looping modules, preservation of state using the pairs GetLocal/SetLocal or GetGlobal/SetGlobal, creating advanced looping constructs using combinations of tools, asynchronous data sources, and parallelism. The tools that control the flow of execution of a visual program are found in the Data Explorer category Flow Control.

4.1 Data Flow

In a true data-flow implementation, all modules are pure functions (i.e., their outputs are fully defined by their inputs). Hence, processes are stateless with no side effects. When a module's inputs are received, it runs, and when finished it distributes its results to modules waiting downstream. Note that in Data Explorer, results are communicated between modules by passing pointers to data objects, not by copying. Of course, when running in distributed mode or when using outboard modules, data must be sent by socket since the processing may occur on another host.

Consider the example illustrated in Figure 18.

Figure 18. Example 1

The Collect module waits for inputs from the Isosurface and MapToPlane modules. Import would send its results to the waiting Isosurface and MapToPlane modules. In effect, this execution model is entirely data-driven and top-down: the execution of modules is dependent solely on the passage of data through the system.

While this simple data-flow execution model seems a natural mechanism for the execution of visual programs, a closer examination reveals that real-world problems are more complex. In order to function efficiently, it is vital that the system avoid unnecessary work. In general, there are two reasons why modules present in a visual program may not need to be executed when their turn comes: 1) their results are not actually required by a result of the network and 2) their inputs are unchanged from the last time the module was executed (i.e. the result will be the same).

The outputs of a visualization network occur in modules that have side effects. They produce results outside of the visual program itself such as the display of images on a workstation or the creation of output files. Unless the result of a module ultimately affects a module that produces a side effect, that module does not need to be executed.

Eliminating modules that are not ancestors (i.e., not upstream) of modules with side effects is done in Data Explorer by preprocessing the network before the actual data-flow network evaluation commences. This is done by traversing the graph bottom-up, beginning at each module known to have side effects and flagging each module as it is encountered. Once this is complete, modules that have not been flagged do not have to be executed.

Note that the exact order in which modules will be executed cannot be controlled by the user; for example, modules in two parallel branches may execute in any order with respect to one another; it is only guaranteed that a module that depends on the results of one or more modules will wait for them to be complete before it is executed.

[ OpenDX Home at IBM | OpenDX.org ]