Wednesday, December 14, 2011

Why I Need to Use Runtime Column Propagation (RCP)

While designing DataStage jobs, consider only the columns that are required, and make sure unnecessary columns are not propagated. Columns that are not needed in the job flow should not be passed from one stage to another, or from one job to the next. In other words, keep RCP away from the job, "BUT" there are areas where it comes in handy. When you need to handle undefined columns that you only encounter when the job runs, and propagate those columns through the rest of the job, RCP (Runtime Column Propagation) is the way forward. The check box in the DataStage Administrator enables the feature for the project; to actually use it, you also need to explicitly select the option on each stage.
If runtime column propagation is enabled in the DataStage Administrator, you can select the Runtime column propagation option on a stage to specify that columns encountered by that stage in a parallel job can be used even if they are not explicitly defined in the meta data. Always make sure runtime column propagation is turned on if you want to use schema files to define column meta data.
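To make the behavior concrete, here is a toy Python model of what a stage does with its output columns with and without RCP. This is only an illustration of the concept, not DataStage code or an API: `run_stage`, the row dictionaries and the column names are hypothetical. Columns listed in the stage's design-time meta data always flow through; with RCP switched on, any extra columns found at run time are carried along as well, and with it off they are dropped.

```python
from typing import Dict, List

# Hypothetical representation of a record: column name -> value.
Row = Dict[str, object]


def run_stage(rows: List[Row], declared: List[str], rcp_enabled: bool) -> List[Row]:
    """Toy model of a parallel stage's output (hypothetical, not a DataStage API)."""
    out: List[Row] = []
    for row in rows:
        # Columns defined in the stage's meta data are always passed on;
        # a missing declared column raises KeyError, like a meta data mismatch.
        new_row: Row = {col: row[col] for col in declared}
        if rcp_enabled:
            # With RCP on, columns not defined in the meta data are propagated too.
            for col, value in row.items():
                if col not in new_row:
                    new_row[col] = value
        out.append(new_row)
    return out


rows = [{"cust_id": 1, "name": "Ann", "region": "EU"}]
print(run_stage(rows, ["cust_id", "name"], rcp_enabled=False))
print(run_stage(rows, ["cust_id", "name"], rcp_enabled=True))
```

With RCP off, the undeclared `region` column is silently lost at this stage; with RCP on, it reaches the downstream stages even though nobody defined it at design time.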

RCP set in DataStage Administrator:
RCP set on the DataStage stage Output page:

Here are a few areas where RCP can make a difference:

Merge Stage & Lookup Stage: Ensure the required column meta data has been specified (this may be omitted altogether if you are relying on Runtime Column Propagation).

Shared Container: When inserted in a job, a shared container instance already has meta data defined for its various links. This meta data must match that on the link that the job uses to connect to the container exactly in all properties. The Inputs page enables you to map meta data as required. The only exception to this is where you are using runtime column propagation (RCP) with a parallel shared container. If RCP is enabled for the job, and specifically for the stage whose output connects to the shared container input, then meta data will be propagated at run time, so there is no need to map it at design time.

For parallel shared containers, you can take advantage of runtime column propagation to avoid the need to map the meta data. If you enable runtime column propagation, then, when the job runs, meta data will be automatically propagated across the boundary between the shared container and the stage(s) to which it connects in the job.
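The shared container rule can be summed up as a small decision: without RCP, the job's link meta data must match the container's link meta data exactly in all properties, or you have to map it on the Inputs page; with RCP enabled for the job and for the stage feeding the container input, no design-time mapping is needed. The Python sketch below only models that rule; `Column`, its four properties and `needs_design_time_mapping` are hypothetical names chosen for illustration and are not part of any DataStage API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    # Hypothetical subset of column properties; real meta data has more.
    name: str
    sql_type: str
    length: int
    nullable: bool


def needs_design_time_mapping(job_link: List[Column],
                              container_link: List[Column],
                              job_rcp: bool,
                              upstream_stage_rcp: bool) -> bool:
    """Toy check: must meta data be mapped at design time? (hypothetical helper)"""
    # RCP has to be on for the job AND for the stage whose output feeds the
    # container input; then meta data is propagated at run time instead.
    if job_rcp and upstream_stage_rcp:
        return False
    # Otherwise the two links must match exactly in every property.
    return job_link != container_link


job = [Column("cust_id", "Integer", 10, False)]
container = [Column("cust_id", "Integer", 10, True)]  # differs only in nullability
print(needs_design_time_mapping(job, container, job_rcp=False, upstream_stage_rcp=False))
print(needs_design_time_mapping(job, container, job_rcp=True, upstream_stage_rcp=True))
```

Note that enabling RCP for the job alone is not enough in this model; the upstream stage has to have it selected too, which mirrors the "specifically for the stage whose output connects to the shared container input" condition above.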

Again: "Do use RCP" when you need to propagate these "un-knowns"; otherwise, carrying additional columns through large data volumes spends precious resources and time on "un-wanted data".

-Ritesh
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions
