Friday, December 16, 2011

What is the Modify Stage?

With a Modify stage you can perform:
  • null handling
  • date/time manipulation 
  • date/time interval calculation (limited) 
  • string trimming and substring 
  • data type conversion
One major limitation is that Modify stage functions are limited to a single argument, which immediately rules out arithmetic and concatenation.
As many have learned from experience, the Transformer stage can bite you by rejecting rows whose fields take part in derivations or stage variables and happen to contain NULLs. Solving the NULL problem with IF IsNull() around every single field being evaluated quickly becomes complex and messy. Instead, put a Modify stage before the Transformer, call the stage MhandleNull, and apply handle_null() to all fields being evaluated in the Transformer. This simplifies the already cumbersome IF THEN ELSE syntax in the Transformer derivations and stage variables.
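
The pattern above (clean the NULLs once, upstream, so the derivations stay simple) can be sketched outside DataStage. The following is a minimal Python analogy, not Modify or Transformer syntax; the column names, default values, and the helper names handle_nulls and derive are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Optional[Any]]

# Hypothetical replacement values, one per nullable column used downstream.
NULL_DEFAULTS: Row = {"qty": 0, "unit_price": 0.0, "region": "UNKNOWN"}


def handle_nulls(row: Row, defaults: Row = NULL_DEFAULTS) -> Row:
    """Stand-in for the upstream Modify step: replace None in listed columns."""
    cleaned = dict(row)  # copy so the incoming row is left untouched
    for column, default in defaults.items():
        if cleaned.get(column) is None:
            cleaned[column] = default
    return cleaned


def derive(row: Row) -> Row:
    """Stand-in for the Transformer: derivations written with no null checks."""
    return {
        "region": row["region"].upper(),
        "line_total": row["qty"] * row["unit_price"],
    }


raw = {"qty": 3, "unit_price": None, "region": None}
result = derive(handle_nulls(raw))
print(result)
```

Without the handle_nulls pass, derive would fail on the None values, which is roughly the situation the Transformer faces when it rejects rows; with it, the derivation code needs no per-field null test.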

Courtesy: Vincent McBurney

The Modify stage is a metadata panel beater. It does just one thing: it converts columns from one data type to another. It is quite restrictive: you cannot nest functions or use constants as values. Almost all the functions in the Modify stage are also available in the all-rounder Transformer stage. The Transformer is an automatic; the Modify stage is a manual.

Transformers are easy to use, which is one of the reasons DataStage has been successful; the Transformer is the most commonly used stage. When you add derivation, mapping, or transformation code in the text entry windows, a right-click menu offers a list of transformation functions.
Back when the parallel engine was a product called Orchestrate, there was a Transform function that worked much the same way as other parallel commands. It required a lot of manual coding. When Ascential turned Orchestrate into parallel jobs, they took the Transformer stage from Server Edition and sat it over the top of the parallel Transform function. This is when the trouble began. The Transformer stage and the Transform function didn't quite fit together, so each parallel Transformer needs a C++ compilation, a kind of automated custom stage build, even in the latest versions. The upshot is that this made it slower than other parallel stages.
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions
