Friday, September 30, 2011

InfoSphere DataStage Job: What Is It?

In the ETL world, the terms flow, job, and process all refer to a combination of data processing modules in an integrated form. An IBM InfoSphere DataStage job is no different. It consists of stages linked together on the design canvas of the Designer client, and together they describe the flow of data from a data source to a data target. A stage usually has at least one data input, one data output, or both. Some stages can accept more than one data input and can output to more than one stage; in simple terms, these stages support multiple input or output links. Each stage has a set of predefined, editable properties, which might include the file name for the Sequential File stage, the columns to sort on, the transformations to perform, or the database table name for the database stages.
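To make the idea of stages, links, and properties a little more concrete, here is a minimal sketch in Python. It is not DataStage code and does not use any DataStage API; the `Stage` and `Job` classes, the stage names, and the property keys are all hypothetical and exist only to illustrate how a job can be pictured as a directed graph of stages.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stand-in for a stage: a name, a type, and editable properties."""
    name: str
    stage_type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Job:
    """Hypothetical job: a set of stages plus the directed links between them."""
    name: str
    stages: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source stage name, target stage name)

    def add_stage(self, stage):
        self.stages[stage.name] = stage
        return stage

    def link(self, source, target):
        # A link carries data from one stage's output to another stage's input.
        if source not in self.stages or target not in self.stages:
            raise KeyError("both stages must be added to the job before linking")
        self.links.append((source, target))

    def inputs_of(self, name):
        return [s for s, t in self.links if t == name]

    def outputs_of(self, name):
        return [t for s, t in self.links if s == name]


# Illustrative names and property keys only (assumptions, not real job metadata).
job = Job("load_customers")
job.add_stage(Stage("read_file", "Sequential File", {"file_name": "customers.txt"}))
job.add_stage(Stage("sort_rows", "Sort", {"sort_columns": ["customer_id"]}))
job.add_stage(Stage("split_rows", "Transformer", {"transformations": ["trim names"]}))
job.add_stage(Stage("write_table", "Database", {"table_name": "CUSTOMERS"}))
job.add_stage(Stage("write_rejects", "Sequential File", {"file_name": "rejects.txt"}))

job.link("read_file", "sort_rows")
job.link("sort_rows", "split_rows")
job.link("split_rows", "write_table")
job.link("split_rows", "write_rejects")  # one stage with two output links

print(job.inputs_of("split_rows"))
print(job.outputs_of("split_rows"))
```

In this picture the source stage has only an output link, the target stages have only input links, and the stage in the middle has one input and two outputs.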

Stages and links can be grouped into a shared container, and instances of the shared container can then be reused in different parallel jobs. A local container can also be defined by grouping stages and links into a single unit within a job, but it can be used only within the job where it is defined. Different business logic calls for different sets of stages. The stages available for the canvas fall into categories such as General, Database, Development, Processing, and so on. A set of these stages linked together defines a flow, and this flow is called a job.
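The difference between shared and local containers can be sketched the same way. Again, this is only an illustration in Python under stated assumptions: the `Container` and `Job` classes, the `use_container` method, and all names are hypothetical, and the model simplifies things by copying the grouped stages into each job rather than showing the container as a single unit on the canvas.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical group of stages and links treated as a single unit."""
    name: str
    stages: list
    links: list            # (source, target) pairs between stages in the group
    shared: bool = False   # True: reusable across jobs; False: local to one job
    owner_job: str = ""    # only meaningful for a local container


@dataclass
class Job:
    name: str
    stages: list = field(default_factory=list)
    links: list = field(default_factory=list)

    def use_container(self, container, instance_name):
        # A local container may be used only by the job that defines it.
        if not container.shared and container.owner_job != self.name:
            raise ValueError(f"{container.name} is local to {container.owner_job}")
        # Each instance gets its own prefixed copies of the stages and links.
        prefix = instance_name + "."
        self.stages += [prefix + s for s in container.stages]
        self.links += [(prefix + a, prefix + b) for a, b in container.links]


# Illustrative names only (assumptions).
cleanse = Container("cleanse", ["trim", "dedupe"], [("trim", "dedupe")], shared=True)
audit = Container("audit", ["count_rows"], [], shared=False, owner_job="job_a")

job_a = Job("job_a")
job_b = Job("job_b")

job_a.use_container(cleanse, "c1")   # shared container reused in job_a
job_b.use_container(cleanse, "c1")   # and again in job_b
job_a.use_container(audit, "a1")     # local container inside its own job

try:
    job_b.use_container(audit, "a1")  # not allowed: audit belongs to job_a
    local_reuse_blocked = False
except ValueError:
    local_reuse_blocked = True

print(job_a.stages)
print(job_b.stages)
print(local_reuse_blocked)
```

The shared container ends up in both jobs, while the attempt to use the local container outside its defining job is rejected.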

Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.
