Tuesday, December 13, 2011

How to detect Last Row in Group within DataStage

When Data  is grouped on a column, and also is sorted on this column, you can detect when you process the last row before the value in that column changes. The data must be sorted in the job, so that the Transformer stage can detect that it is sorted. If the data is sorted before being input to the job, you can include a dummy Sort stage and set the stage's Sort Key property to Don't Sort (previously sorted - as always) for every column on which the data was previously sorted.

To detect key breaks, & then process data accordingly use new function LastRowInGroup(InputColumn). It can be used when processing an input row, and the function returns TRUE if the value in the named column changes after this row (that is, this row is the last row in a group). The function also returns TRUE if this row is the last row in the input data. This function can be called even if input data is sorted by more than one column, or If the argument specified is the primary sorted key, then the behavior is as described for a single column.

If the argument specified is the secondary or other sorted key, then LastRowInGroup() returns TRUE if the value of the specified column is about to change, or if any of its higher level of key columns are about to change. For example, if the primary sorted column is Col1 and the secondary sorted column is Col2, then LastRowInGroup(Col2) returns true when either the value in Col2 is about to change, or the value in Col1 is about to change. Therefore it can return true when Col2 is not about to change, but Col1 is, because Col1 is a higher level sorted key than Col2

Please refer previous Blog on Caching with-in Transformer for example and usage.
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions

No comments:

Post a Comment