Saturday, October 29, 2011

DataStage Jobs on Single Processor and Multiple Processor Systems

The default behavior when compiling DataStage jobs is to run all adjacent active stages in a single process. This makes good sense when you are running the job on a single-processor system. On a multi-processor system it is better to run each active stage in a separate process, so that the processes can be distributed among the available processors and run in parallel. This can be achieved either by inserting IPC stages between connected active stages, or by turning on inter-process row buffering, either project-wide (using the DataStage Administrator) or for individual jobs (in the Job Properties dialog box).

The IPC facility can also be used to produce multiple processes where passive stages are directly connected. This means that an operation reading from one data source and writing to another could be divided into a reading process and a writing process, able to take advantage of multiprocessor systems.
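The idea of splitting a read-and-write operation into two processes joined by a row buffer can be illustrated outside DataStage. The following is only a rough analogy in Python, not DataStage code: the names `reader`, `writer`, `run_pipeline` and `buffer_size` are made up for this sketch, and the bounded queue merely stands in for what an IPC stage or inter-process row buffering provides.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical end-of-data marker for this sketch


def reader(rows, buffer):
    """Stand-in for the reading side: pushes rows into the buffer."""
    for row in rows:
        buffer.put(row)  # blocks while the bounded buffer is full
    buffer.put(SENTINEL)


def writer(buffer, results):
    """Stand-in for the writing side: pulls rows from the buffer."""
    while True:
        row = buffer.get()
        if row is SENTINEL:
            break
        results.put(row.upper())  # trivial "transform" before writing
    results.put(SENTINEL)


def run_pipeline(rows, buffer_size=4):
    """Run reader and writer as two separate OS processes."""
    buffer = mp.Queue(maxsize=buffer_size)  # plays the role of the row buffer
    results = mp.Queue()
    procs = [
        mp.Process(target=reader, args=(rows, buffer)),
        mp.Process(target=writer, args=(buffer, results)),
    ]
    for p in procs:
        p.start()

    # Drain the results before joining so the writer never blocks on exit.
    out = []
    while True:
        item = results.get()
        if item is SENTINEL:
            break
        out.append(item)

    for p in procs:
        p.join()
    return out


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c"]))  # prints ['A', 'B', 'C']
```

Because the reader and the writer are separate processes, the operating system is free to schedule them on different processors, which is the benefit described above. On a single-processor machine the same split would mostly add hand-off overhead, which is why running adjacent stages in one process is a sensible default there.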
Behavior of Passive Stages
Behavior of Active Stages
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions
