Friday, December 9, 2011

Caching 'Time Saver' in DataStage Transformer

This blog discusses simplifying even the most complicated data integration challenges.  When we can achieve that and also make data processing more efficient, we get the best of both worlds.  The new cache mechanism serves both of those goals.
The Transformer Cache is an in-memory storage mechanism that is available from within the Transformer stage and is used to help solve complex data integration scenarios. The cache is a first-in/first-out (i.e. FIFO) construct and is accessible to the developer via two new functions:
  • SaveInputRecord: stores an input row at the back of the cache
  • GetInputRecord: retrieves a saved row from the front of the cache 
These functions should be called from the stage variable or loop variable sections of the Transformer in most cases. Developers will find the cache most useful when a set of records needs to be analyzed as a single unit and a result computed from that set then has to be appended to each record in the group.
Here are a few scenarios, discussed by Tony in detail, where using a cache will prove VERY helpful:
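To make the first-in/first-out behaviour concrete, here is a small Python sketch of the idea. It is only a simulation, not DataStage code: the class `RowCache` and its methods `save_input_record` and `get_input_record` are hypothetical stand-ins named after the two Transformer functions, and returning the row count from the save call is just a convenience of this sketch.

```python
from collections import deque


class RowCache:
    """Toy FIFO cache: rows come out in the same order they were saved."""

    def __init__(self):
        self._rows = deque()

    def save_input_record(self, row):
        # Hypothetical stand-in for SaveInputRecord: add the row to the back.
        self._rows.append(row)
        return len(self._rows)  # number of rows currently held (sketch only)

    def get_input_record(self):
        # Hypothetical stand-in for GetInputRecord: take the row at the front.
        return self._rows.popleft()


cache = RowCache()
cache.save_input_record({"id": 1})
cache.save_input_record({"id": 2})

first = cache.get_input_record()   # the row saved first
second = cache.get_input_record()  # the row saved second
```

The point to notice is the ordering: the row saved first is the row retrieved first, which is what lets a group of rows be replayed in its original order once the group has been analyzed.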
  • The input data set is sorted by fund id and valuation date in ascending order. We have an unknown number of records for each fund.  The requirement is to output the five most recent valuations for any fund and if there are not at least five, do not output any.
  • There is a varying number of clients (N) related to each salesperson.  The requirement is to label each such client detail record with a label that reads "1 of N". 
  • An input file contains multiple bank accounts for each customer. The requirement is to show the percentage of the total balance for each individual account record.
    Perhaps one or more of these sounds familiar to you. You may also refer to the Information Server InfoCenter for more detail on this solution.
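As a rough illustration of the bank account scenario above, the following Python sketch holds each customer's rows until the group total is known and then emits every row with its share of that total. This is not Transformer code; the function name `add_balance_share`, the field names, and the sample data are all assumptions made up for the example, and the input is assumed to be sorted by customer already.

```python
from itertools import groupby
from operator import itemgetter


def add_balance_share(rows):
    """Yield each account row with its percentage of the customer's total.

    Assumes rows arrive sorted by the (hypothetical) "customer" field.
    """
    for _, group in groupby(rows, key=itemgetter("customer")):
        cached = list(group)  # hold the whole group, as the cache would
        total = sum(r["balance"] for r in cached)
        for r in cached:      # replay the cached rows in their original order
            pct = 100.0 * r["balance"] / total if total else 0.0
            yield {**r, "pct_of_total": round(pct, 2)}


# Made-up sample data for illustration only.
accounts = [
    {"customer": "A", "account": "A-1", "balance": 300.0},
    {"customer": "A", "account": "A-2", "balance": 100.0},
    {"customer": "B", "account": "B-1", "balance": 50.0},
]
result = list(add_balance_share(accounts))
```

The same hold-then-replay pattern applies to the other two scenarios: the group has to be seen in full (to count N, or to know whether there are at least five valuations) before any row of that group can be written out.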
 -Ritesh
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions


2 comments:

  1. I am William..
    I just browsing through some blogs and came across yours!
    Excellent blog, good to see someone actually uses for quality posts.
    Your site kept me on for a few minutes unlike the rest :)
    Keep up the good work!Thanks for sharing a important information on datastage
