Tuesday, September 27, 2011

Using InfoSphere DataStage to run SAS code

InfoSphere DataStage lets SAS users exploit the performance potential of parallel relational database management systems (RDBMS) running on scalable hardware platforms. It extends SAS by coupling its own parallel data transport facilities with the rich data access, manipulation, and analysis functions of SAS.
InfoSphere DataStage lets you execute your SAS code in parallel, but sequential SAS code can also benefit from it. Performance improves in two ways: InfoSphere DataStage makes multiple connections to a parallel database for data reads and writes, and it applies pipeline parallelism.
The SAS system consists of a powerful programming language and a collection of ready-to-use programs called procedures (PROCs). This section introduces SAS application development and explains the differences in execution and performance between sequential and parallel SAS programs.
Writing SAS Programs
You develop SAS applications by writing SAS programs. In the SAS programming language, a group of statements is referred to as a SAS step. SAS steps fall into one of two categories: DATA steps and PROC steps. SAS DATA steps usually process data one row at a time and do not look at the preceding or following records in the data stream. This makes it possible for the sas operator to process DATA steps in parallel. SAS PROC steps, however, are precompiled routines that cannot always be parallelized.
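To make the distinction concrete, here is a minimal sketch in Python (not SAS or DataStage code) of why row-at-a-time logic parallelizes cleanly while a whole-data routine may not. All names here (data_step, run_partitioned, median_price, the price and qty columns) are hypothetical and chosen only for illustration; the threads simply stand in for parallel processing nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def data_step(row):
    """DATA-step-like logic: the result depends only on the current row."""
    return {**row, "total": row["price"] * row["qty"]}


def run_sequential(rows):
    """Apply the row logic to every row, one after another."""
    return [data_step(row) for row in rows]


def run_partitioned(rows, partitions=3):
    """Split the rows round-robin and process each partition independently.

    Because data_step never looks at neighbouring rows, each partition
    can be handled on its own and the results simply concatenated.
    """
    chunks = [rows[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(run_sequential, chunks))
    return [row for chunk in results for row in chunk]


def median_price(rows):
    """PROC-like logic: the result needs every row at once."""
    prices = sorted(row["price"] for row in rows)
    n = len(prices)
    mid = n // 2
    return prices[mid] if n % 2 else (prices[mid - 1] + prices[mid]) / 2


# Hypothetical sample data.
rows = [{"id": i, "price": float(i), "qty": 2} for i in range(1, 8)]

sequential = run_sequential(rows)
parallel = sorted(run_partitioned(rows), key=lambda row: row["id"])
```

The partitioned run returns the same rows as the sequential run, only in a different order, which is why the sketch sorts by id before comparing. A routine like median_price cannot be split the same way without extra work, because the median of each partition is not in general the median of the whole data set.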
I will discuss SAS to DS and DS to SAS in the next two posts of this series.
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions
