Conductor and Compute Nodes in DataStage - II


DataStage job development is platform independent, and job execution relies entirely on the parallel configuration file, which can be set via APT_CONFIG_FILE for each job or set of jobs. This configuration file maps logical processing nodes to the physical resources/infrastructure used to execute the DataStage job at run time.
The Information Server parallel framework uses a process-based architecture that can scale up by adding more resources and updating this configuration file to use the new infrastructure.
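A minimal sketch of what such a configuration file might look like, showing two of the four logical nodes (host name and directory paths here are hypothetical placeholders; real values depend on your environment):

```
{
  node "node1"
  {
    fastname "etlserver1"
    pools ""
    resource disk "/ds/data" {pools ""}
    resource scratchdisk "/ds/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etlserver1"
    pools ""
    resource disk "/ds/data" {pools ""}
    resource scratchdisk "/ds/scratch" {pools ""}
  }
}
```

For the four-node sample discussed below, two more node entries of the same shape would follow. Multiple logical nodes may share one fastname (one physical host) or be spread across hosts.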

Every DataStage job starts multiple processes (refer to APT_CONFIG_FILE); based on the sample four-node configuration file it will start:
1 conductor process (started on the conductor node)
4 section leaders (4 nodes * 1 section leader per node), which create and manage the player processes
8 player processes (2 stages * 4 nodes), which perform the real execution; the actual number of players can change based on runtime optimization
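The counts above follow a simple formula. As a quick sketch (the two-stage job is an assumption taken from this sample; real jobs vary, and the optimizer may combine or drop players):

```python
def datastage_process_count(nodes: int, stages: int) -> int:
    """Total OS processes for a parallel job: one conductor,
    one section leader per logical node, and one player
    per stage per node (before any runtime optimization)."""
    conductor = 1
    section_leaders = nodes       # 1 section leader per node
    players = stages * nodes      # each stage runs on every node
    return conductor + section_leaders + players

print(datastage_process_count(nodes=4, stages=2))  # → 13
```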

We should look at the dump score for details (setting the environment variable APT_DUMP_SCORE=True prints the score to the job log); based on the sample configuration file, the DataStage job creates 13 processes.
The conductor node handles job start-up: assigning resources, creating section leaders, coordinating among the various processes, collecting status, and even stopping all processes in the event of a failure.
The primary process triggered when a DataStage job starts is called the conductor. It reads the job design and the specified parallel configuration file, then starts a coordinating process called a "section leader" on each node. Each section leader, based on the score (the number of stages), triggers separate processes called "players".
Communication between the conductor, section leaders, and player processes in a parallel job is carried out over TCP.

-Ritesh 
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions