Use Java Classes from Within a DataStage Job

I came across a few questions about "real time" processing and ETL, so I thought I would share some basic information. Service Oriented Architecture (SOA) is key for enterprises today. It includes Web Services, messaging, and XML, all of which lead to "real time" processing. Integrating Java with ETL processing usually comes up when there is some functionality or an existing algorithm worth reusing, or a Java-managed system or message queue that provides valuable source data (or would be a valuable target). How can all of this be integrated into a data integration flow? InfoSphere DataStage can easily be extended to include existing Java functionality, or to integrate with flows that run outside ETL and were developed using Java and related technology.

Information Server provides two stages that used to be referred to as the JavaPack: the Java Client Stage (JCL) and the Java Transformer Stage (JTE). Both stages allow you to integrate the functionality of a Java class into the flow of a DataStage job. The Java Client is used for sources or targets (it has only an output link or only an input link), and the Java Transformer is used for row-by-row processing, where you have something you'd like to invoke for each row that passes through.

InfoSphere DataStage provides a simple API for including Java classes in your jobs. This API allows your class to interact directly with the DataStage engine at run time: to obtain metadata about the columns and links that exist in the currently executing job, and to read and write rows from and to those links when called upon to do so. You need to define special methods in your class, such as Process(), that the engine calls whenever it needs a row, or when it hands control to your class because it is ready to give you a row. Within that method you have various calls available, such as readRow (from an input link) and writeRow (to an output link). You can control how rows flow in and out, and you can also handle rejected rows. In simple terms, a class can read from JMS queues, invoke remote EJBs, or call a Web Service directly, pass the data through its processing, and write the result to a JMS queue for the next step.
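To make the read/process/write/reject pattern a bit more concrete, here is a rough sketch in plain Java. It does not use the real DataStage classes: MockStage is a hypothetical stand-in I made up so the example runs on its own, with in-memory lists playing the role of the input, output, and reject links. The method names readRow and writeRow come from the description above; rejectRow, feed, runToCompletion, the boolean return value, and the lowercase process() (following the usual Java naming convention) are assumptions for illustration only. Check the product documentation for the actual class and method signatures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

public class JavaStageSketch {

    /** Hypothetical stand-in for the engine-facing base class; NOT the real DataStage API. */
    public static abstract class MockStage {
        private final Deque<String[]> input = new ArrayDeque<>();   // plays the input link
        public final List<String[]> output = new ArrayList<>();     // plays the output link
        public final List<String[]> rejects = new ArrayList<>();    // plays the reject link

        /** Test helper: queue a row on the simulated input link. */
        public void feed(String[] row) { input.add(row); }

        /** Returns the next input row, or null when the input is exhausted. */
        protected String[] readRow() { return input.poll(); }

        protected void writeRow(String[] row) { output.add(row); }

        protected void rejectRow(String[] row) { rejects.add(row); }

        /** Called once per row by the (simulated) engine; false means "no more rows". */
        public abstract boolean process();

        /** Simulates the engine calling process() until the stage reports it is done. */
        public void runToCompletion() {
            while (process()) {
                // keep handing control to the stage, one row at a time
            }
        }
    }

    /** Row-by-row logic: upper-case the name column, reject rows with a blank name. */
    public static class UpperCaseNameStage extends MockStage {
        @Override
        public boolean process() {
            String[] row = readRow();
            if (row == null) {
                return false; // nothing left to read
            }
            if (row.length < 2 || row[1] == null || row[1].trim().isEmpty()) {
                rejectRow(row); // bad data goes to the reject link
            } else {
                writeRow(new String[] { row[0], row[1].trim().toUpperCase(Locale.ROOT) });
            }
            return true;
        }
    }

    /** Feeds three sample rows through the stage and returns it for inspection. */
    public static MockStage runDemo() {
        MockStage stage = new UpperCaseNameStage();
        stage.feed(new String[] { "1", "alice" });
        stage.feed(new String[] { "2", "  " });
        stage.feed(new String[] { "3", " bob " });
        stage.runToCompletion();
        return stage;
    }

    public static void main(String[] args) {
        MockStage stage = runDemo();
        for (String[] row : stage.output) {
            System.out.println("out: " + String.join(",", row));
        }
        for (String[] row : stage.rejects) {
            System.out.println("rej: " + String.join(",", row));
        }
    }
}
```

In a real job the engine, not your own loop, decides when the processing method is called, and the rows carry the column metadata defined on the links. The per-row logic in the middle is the part that would hold your JMS, EJB, or Web Service call.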


-Ritesh

Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions