Parallelizing a SAS data step that executes a SAS DATA step in parallel.
The step takes a single SAS data set as input and writes its results to a single SAS data set as output. The DATA step recodes the salary field of the input data to replace a dollar amount with a salary-scale value.This DATA step requires little effort to parallelize because it processes records without regard to record order or relationship to any other record in the input. Also, the step performs the same operation on every input record and contains no BY clauses or RETAIN statements.
Executing this DATA step in parallel:
- Get the input from a SAS data set using a sequential sas operator;
- Execute the DATA step in a parallel sas operator;
- Output the results as a standard InfoSphere DataStage data set (you must provide a schema for this) or as a parallel SAS data set. You might also pass the output to another sas operator for further processing. The schema required might be generated by first outputting the data to a Parallel SAS data set, then referencing that data set. InfoSphere DataStage automatically generates the schema.