SAS Processing with InfoSphere DataStage - an Example Flow

This example parallelizes a single SAS DATA step so that it executes in parallel in InfoSphere DataStage.

The step takes a single SAS data set as input and writes its results to a single SAS data set as output. The DATA step recodes the salary field of the input data, replacing a dollar amount with a salary-scale value.
This DATA step requires little effort to parallelize because it processes each record without regard to record order or to any other record in the input. The step also performs the same operation on every input record and contains no BY clauses or RETAIN statements.
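For illustration, a DATA step of this kind might look like the sketch below. The library path, the data set names prod.sal and prod.sal_scl, and the salary thresholds are assumptions made for this example, not values taken from an actual job.

```sas
/* Illustrative sketch only: the library path, data set names, and
   salary thresholds are assumed for this example. */
libname prod "/prod";

data prod.sal_scl;            /* single output SAS data set */
  set prod.sal;               /* single input SAS data set  */

  /* Recode the dollar amount into a salary-scale value.
     Each record is handled on its own: no BY, no RETAIN. */
  if salary < 10000 then salary = 1;
  else if salary < 25000 then salary = 2;
  else if salary < 50000 then salary = 3;
  else if salary < 100000 then salary = 4;
  else salary = 5;
run;
```

Because the new value of salary depends only on the current record, the input can be partitioned in any way and each partition processed independently without changing the result.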
Executing this DATA step in parallel: 
 
The sas operator can then do one of three things with its results: use the sasout operator with its -schema option to output them as a standard InfoSphere DataStage data set, output them as a Parallel SAS data set, or pass them directly to another sas operator as a SAS data set. The default output format is a SAS data set. When the output goes to a Parallel SAS data set or to another sas operator (or, for example, to a standard InfoSphere DataStage data set), the liborch statement must be used.
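As a rough sketch of what the liborch requirement means in practice, the code run by the parallel sas operator might read from and write to liborch rather than a library on disk. The member names sal and p_sal are hypothetical, and the sketch assumes they are bound to the operator's input and output data sets in the operator's configuration. The thresholds are the same assumed values as before.

```sas
/* Sketch under stated assumptions: liborch is the library named in the
   text above; the member names sal and p_sal are hypothetical. */
data liborch.p_sal;           /* results handed to the operator's output   */
  set liborch.sal;            /* records arriving on the operator's input  */

  /* Same per-record recoding, now applied to each partition in parallel. */
  if salary < 10000 then salary = 1;
  else if salary < 25000 then salary = 2;
  else if salary < 50000 then salary = 3;
  else if salary < 100000 then salary = 4;
  else salary = 5;
run;
```

The DATA step logic itself is unchanged. Only the source and target of the data differ, which is what makes this kind of step straightforward to move into a parallel flow.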
 
-Ritesh 
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.