Sunday, September 1, 2013

Conductor and Compute Nodes in Datastage - II


DataStage job development is platform independent; job execution relies entirely on the parallel configuration file, which is set through APT_CONFIG_FILE for a single job or for a set of jobs. The configuration file maps logical processing nodes to the real resources and infrastructure used to execute the DataStage job at run time.
The Information Server parallel framework uses a process-based architecture. It scales up by adding resources and updating the configuration file so that jobs use the new infrastructure.
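
To make that mapping concrete, here is a minimal Python sketch that renders a four-node configuration file in the usual parallel configuration syntax. The host name `etl-host`, the directory paths, and the helper name `render_config` are hypothetical placeholders, not values taken from the sample file this post refers to.

```python
def render_config(node_count, fastname="etl-host",
                  disk="/data/datasets", scratch="/data/scratch"):
    """Return the text of a parallel configuration file.

    The host name and paths are placeholder assumptions.
    """
    lines = ["{"]
    for i in range(1, node_count + 1):
        lines += [
            f'  node "node{i}"',          # logical processing node
            "  {",
            f'    fastname "{fastname}"',  # physical host running this node
            '    pools ""',
            f'    resource disk "{disk}" {{pools ""}}',
            f'    resource scratchdisk "{scratch}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


config_text = render_config(4)
print(config_text)
```

Saving this text to a file and pointing APT_CONFIG_FILE at it would be the usual next step. Adding another node block is how the same job design picks up more partitions without being changed.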

Every DataStage job starts multiple processes (refer to APT_CONFIG_FILE). Based on the sample configuration file, it will start:
1 Conductor process (started on the conductor node)
4 Section leaders (4 nodes * 1 section leader per node), which create and manage the player processes
8 Player processes (2 stages * 4 nodes), which perform the real execution; the number of player processes can change based on optimization

Check the Dump Score for details. Based on the sample configuration file, the DataStage job creates 13 processes (1 conductor + 4 section leaders + 8 players).
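
The arithmetic behind that total can be sketched in a few lines of Python. The function name `count_processes` is made up for illustration, and the sketch assumes one player per stage per node with no optimization, so the real score may show fewer players.

```python
def count_processes(nodes, stages):
    """Estimate the processes started by a parallel job.

    Assumes one player per stage per node (no optimization applied).
    """
    conductor = 1                # single conductor on the conductor node
    section_leaders = nodes      # one section leader per node
    players = stages * nodes     # one player per stage on every node
    return {
        "conductor": conductor,
        "section_leaders": section_leaders,
        "players": players,
        "total": conductor + section_leaders + players,
    }


counts = count_processes(nodes=4, stages=2)
print(counts)
```

With 4 nodes and 2 stages this gives 1 + 4 + 8 = 13, matching the figure above.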
The conductor node is meant for job start-up: it assigns resources, creates the section leaders, coordinates the various processes and their status, and stops all processes in the event of a failure.
The primary process triggered when a DataStage job starts is called the conductor. It reads the job design and the specified parallel configuration file, then starts a coordinating process called a “section leader” for each node. Each section leader, based on the score (number of stages), starts separate processes called “players”.
Communication between the conductor, section leaders, and player processes in a parallel job takes place over TCP.
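
The parent/child relationship described above can be pictured with a small Python sketch. This only models who starts whom; it does not start real processes or TCP connections, and the node and stage names are hypothetical.

```python
def build_process_tree(node_names, stage_names):
    """Model the hierarchy: conductor -> section leaders -> players."""
    return {
        "conductor": {
            # one section leader per node, each owning one player per stage
            f"section_leader[{node}]": [
                f"player[{stage}@{node}]" for stage in stage_names
            ]
            for node in node_names
        }
    }


tree = build_process_tree(
    ["node1", "node2", "node3", "node4"],   # hypothetical node names
    ["stage_a", "stage_b"],                 # hypothetical stage names
)
for leader, players in tree["conductor"].items():
    print(leader, "->", ", ".join(players))
```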

-Ritesh 
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions

8 comments:

  2. Hi Ritesh,

    In what situations, if any, are the conductor and section leader processes not created? How many processes will there be in the example you provided above if the job is run on an SMP system or an MPP system? Does it matter?

    Thanks,
    Ambadas.
