Tuesday, September 27, 2011

Some insight into InfoSphere DataStage OSH

You can create InfoSphere DataStage jobs by logging into the DataStage Designer client. When a job is compiled, it is converted into parallel job flows and reusable components that execute on the parallel Information Server engine. These generated flows extract, cleanse, transform, integrate, and load data into target files, target systems, or packaged applications. For every parallel job, InfoSphere DataStage generates the OSH (Orchestrate Shell script), plus C++ code for any Transformer stages used.
To summarize, the Designer performs the following tasks:
  • Validates link requirements, mandatory stage options, Transformer logic, parameters used, etc.
  • Generates OSH representation of data flows and stages (representations of framework “operators”).
  • Generates transform code for each Transformer stage; this code is translated into C++ and then compiled into corresponding native operators.
  • Reusable BuildOp stages can be compiled using the Designer GUI or from the command line.
 More on OSH:
  • Comment blocks introduce each operator
  • OSH uses syntax familiar from the UNIX shell. Each operator entry gives the operator name, its schema, operator options (in “-name value” format), inputs (indicated by n<, where n is the input number), and outputs (indicated by n>, where n is the output number).
  • For every operator, input and/or output data sets are numbered sequentially starting from zero.
  • Virtual data sets (in-memory native representations of data links) are generated to connect operators.
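To make the numbered input/output designators concrete, here is a minimal Python sketch that reads designator lines and checks that each side is numbered from zero with no gaps. The line format it accepts (n< 'name' for inputs, n> 'name' for outputs) is a simplified, hypothetical rendering of the syntax described above, not the exact text DataStage emits. The data set names such as 'Xfm_Clean:lnk_in.v' are also made up; they only illustrate that link names end up inside data set names.

```python
import re
from dataclasses import dataclass, field

# HYPOTHETICAL, simplified designator syntax (an assumption, not the exact
# generated format): "<n>< 'dataset'" for inputs, "<n>> 'dataset'" for outputs.
DESIGNATOR = re.compile(r"^\s*(\d+)([<>])\s*'([^']+)'\s*$")


@dataclass
class OperatorPorts:
    inputs: dict = field(default_factory=dict)   # input number -> data set name
    outputs: dict = field(default_factory=dict)  # output number -> data set name


def parse_ports(lines):
    """Collect numbered input/output designators; skip comments and other lines."""
    ports = OperatorPorts()
    for line in lines:
        match = DESIGNATOR.match(line)
        if match is None:
            continue  # comment blocks, options, etc. are ignored here
        number, direction, dataset = int(match.group(1)), match.group(2), match.group(3)
        target = ports.inputs if direction == "<" else ports.outputs
        target[number] = dataset
    return ports


def numbered_from_zero(ports_by_number):
    """True if port numbers run 0, 1, 2, ... with no gaps."""
    return sorted(ports_by_number) == list(range(len(ports_by_number)))


# Hypothetical operator snippet: one input link, two output links.
snippet = [
    "## Inputs",
    "0< 'Xfm_Clean:lnk_in.v'",
    "## Outputs",
    "0> 'Xfm_Clean:lnk_good.v'",
    "1> 'Xfm_Clean:lnk_rejects.v'",
]
ports = parse_ports(snippet)
print(ports.inputs)   # {0: 'Xfm_Clean:lnk_in.v'}
print(ports.outputs)  # {0: 'Xfm_Clean:lnk_good.v', 1: 'Xfm_Clean:lnk_rejects.v'}
print(numbered_from_zero(ports.outputs))  # True
```

Inputs and outputs are numbered independently, which is why both sides start again at 0.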
How DataStage concepts are represented in OSH:
  • Schema corresponds to table definition
  • Property corresponds to format
  • Type corresponds to SQL type and length
  • Virtual data set corresponds to link
  • Record/field corresponds to row/column
  • Operator corresponds to stage
The actual execution order of operators is dictated by the input/output designators, not by their placement on the diagram. The data sets that connect the OSH operators are “virtual data sets”, that is, in-memory data flows. Link names are used in data set names, so it is good practice to give links meaningful names.
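The point that designators, not layout, determine ordering can be illustrated with a small dependency-graph model. The Python sketch below lists three hypothetical operators out of data-flow order, links them through the virtual data sets they read and write, and derives an order with a topological sort from the standard library's graphlib (Python 3.9+). Operator and data set names are invented for illustration. This is a simplified model that only shows dependency order; it does not reproduce how the parallel engine actually schedules or pipelines operators.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operators, listed in the order they might appear in a script,
# which is deliberately NOT the data-flow order.
# operator name -> (virtual data sets it reads, virtual data sets it writes)
operators = {
    "load_target": ({"lnk_clean.v"}, set()),
    "clean_rows": ({"lnk_raw.v"}, {"lnk_clean.v"}),
    "extract_src": (set(), {"lnk_raw.v"}),
}


def execution_order(ops):
    """Order operators by their data set dependencies, ignoring listing order."""
    # Map each virtual data set to the operator that writes it.
    producer = {}
    for name, (_, outputs) in ops.items():
        for dataset in outputs:
            producer[dataset] = name
    # An operator depends on the producer of every data set it reads
    # (raises KeyError if a data set has no producer in this model).
    graph = {name: {producer[ds] for ds in inputs} for name, (inputs, _) in ops.items()}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(operators))  # ['extract_src', 'clean_rows', 'load_target']
```

Reordering the entries in the operators dictionary does not change the result for this linear flow, because the order comes only from which data sets each operator reads and writes.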

-Ritesh 
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions
