What is a Node in InfoSphere DataStage?

In a grid environment, a node is the place where jobs are executed. Nodes are like processors: when a job runs on more nodes, its work can be split up and run in parallel, which can make the job more efficient. In simple terms, nodes are the concept InfoSphere DataStage uses to implement parallelism. However, more nodes do not automatically mean better performance; you need to find the optimal number for your requirements.

The degree of parallelism of a DataStage job is determined by the number of nodes defined in the configuration file, for example a four-node or eight-node configuration. A configuration file with more nodes generates more processes, which adds processing overhead compared with a configuration file with fewer nodes. Therefore, when choosing a configuration file, weigh the benefits of increased parallelism against the loss of processing efficiency (increased processing overhead and slower startup time). Ideally, use a configuration file with fewer nodes when the volume of data to be processed is small, and one with more nodes when the data volume is large.
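To make the link between node count and degree of parallelism concrete, here is a minimal Python sketch that writes out the text of an N-node parallel configuration file and then counts its node entries. It assumes the usual configuration file layout (one `node` block per node with `fastname`, `pools`, `resource disk` and `resource scratchdisk` entries, typically selected through the `APT_CONFIG_FILE` environment variable). The helper names `build_config` and `degree_of_parallelism`, the host name `etlserver` and the directory paths are hypothetical, chosen only for illustration; check a real installation's configuration files for the exact settings it uses.

```python
import re

def build_config(node_names, fastname, disk, scratch):
    """Return configuration file text with one node block per name (hypothetical helper)."""
    lines = ["{"]
    for name in node_names:
        lines += [
            f'  node "{name}"',
            "  {",
            f'    fastname "{fastname}"',              # host the node runs on (assumed value)
            '    pools ""',                            # default node pool
            f'    resource disk "{disk}" {{pools ""}}',           # persistent data set storage
            f'    resource scratchdisk "{scratch}" {{pools ""}}', # temporary storage for sorts, etc.
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)

def degree_of_parallelism(config_text):
    """Count the node blocks; each node runs its own partition of a parallel stage."""
    return len(re.findall(r'^\s*node\s+"[^"]+"', config_text, re.MULTILINE))

# Usage: a four-node configuration on a single (hypothetical) server.
four_node = build_config(
    [f"node{i}" for i in range(1, 5)],
    "etlserver",
    "/data/ds/datasets",
    "/data/ds/scratch",
)
print(degree_of_parallelism(four_node))  # 4
```

Producing a two-node variant of the same file for small data volumes only means passing fewer node names, which reflects the trade-off above: every extra node block means more processes to start and coordinate.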


-Ritesh

Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.