Wednesday, April 27, 2011

Parameter Sets: a time-saving Feature for DataStage Developers

InfoSphere DataStage has a great feature called Parameter Sets, which lets you group job parameters and store default values for them in files. With the help of Parameter Sets,

developers and companies can group a set of related parameters together, add them in one click to any Sequence, Server or Parallel DataStage job, and maintain them from a single parameter definition. Parameter Sets also reduce the effort needed to create Sequence jobs and Shared Containers: when you add a Server or Parallel job to a Sequence job, fewer properties have to be set, provided both jobs synchronize their parameters via Parameter Sets, and when you add a Shared Container to a job you only need to link in the Parameter Set name and not each individual parameter.

You can take the storage of parameter values completely out of the hands of individual Sequence, Server or Parallel jobs and put it into centralised files or objects.
To elaborate: a typical job easily has 10 parameters once you count file, database, processing, date and source-system parameters. Drop that job onto a Sequence job canvas and you have to pass through every parameter value manually, so it takes around 10 clicks just to add and configure one job. That is time consuming when you only want to throw together a Sequence job for some testing. With a Parameter Set you supply one value per Parameter Set rather than one value per parameter: far fewer clicks, less effort over time, and easier to manage as well.
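As a rough sketch of how this plays out outside the Designer (the project, job, Parameter Set and value-file names below are made up, and the exact dsjob option syntax should be verified against your version's documentation), a Parameter Set stores its defaults as simple name=value files, and a run can pick one of those files with a single -param entry:

import subprocess

# Hypothetical example: a Parameter Set "ps_Source" with a stored value file
# per environment. A value file is just name=value pairs, one per line, e.g.:
#   DBName=SALES_DEV
#   DBUser=etl_dev
#   SrcDir=/data/dev/in
project = "MyProject"      # hypothetical project name
job = "SeqLoadSales"       # hypothetical sequence job name
value_file = "DEV"         # hypothetical stored value file to apply

# Selecting a value file for the whole set in one go -- one value per
# Parameter Set instead of one per parameter. Verify the exact
# "-param SetName=ValueFileName" form against your dsjob documentation.
cmd = ["dsjob", "-run", "-wait", "-jobstatus",
       "-param", f"ps_Source={value_file}",
       project, job]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.returncode, result.stdout)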

Here is detailed information on how to create and use Parameter Sets:
How to Create, Use and Maintain DataStage Parameter Sets

-Ritesh
Disclaimer: "The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions"

Optimize your DataStage Job Performance with relevant Environment Variables

DataStage has many environment variables that can be tweaked to optimize the performance of various DataStage jobs. Many more are available to collect extra information and traces in the event of a crash.
For any DataStage job, if you run into a problem or want more details, check the following variables first.

$APT_CONFIG_FILE: This lets you define which configuration file a job uses. You can keep several configuration files with different node counts and assign one to a job dynamically based on your own criteria or the time of the run.
$APT_DUMP_SCORE: This produces a job run report (the score) showing the partitioning used, degree of parallelism, data buffering and inserted operators. It is useful for finding out what your high-volume job is actually doing.
$APT_PM_PLAYER_TIMING: This option lets you see what each operator in a job is doing, especially how much data it is handling and how much CPU it is consuming. It helps in identifying bottlenecks.
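If these variables have been added to a job as environment-variable parameters, they can also be switched on per run from a script. The following is only a minimal sketch: the project, job and configuration-file path are invented, and the -param syntax for environment variables should be checked against your installation.

import subprocess

# Hypothetical diagnostic run: point the job at a 4-node configuration file
# and enable the score dump and per-operator timing for this run only.
# Assumes $APT_CONFIG_FILE, $APT_DUMP_SCORE and $APT_PM_PLAYER_TIMING were
# added to the job as environment-variable job parameters.
project = "MyProject"                                   # hypothetical
job = "PxLoadSales"                                     # hypothetical
config = "/opt/IBM/InformationServer/Server/Configurations/4node.apt"  # hypothetical path

cmd = ["dsjob", "-run", "-wait", "-jobstatus",
       "-param", f"$APT_CONFIG_FILE={config}",
       "-param", "$APT_DUMP_SCORE=True",        # accepted value form may vary (True/1)
       "-param", "$APT_PM_PLAYER_TIMING=True",
       project, job]
print(subprocess.run(cmd, capture_output=True, text=True).stdout)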
Please refer to the following link from IBM for a detailed list of environment variables that can help you in various areas. It covers buffering, building custom stages, compiler settings, debugging, disk I/O, general job administration and many other relevant areas.
Here is an extensive list of DataStage_Environment_Variables, with each variable categorized, explained and its usage discussed.

Here are more details on Environment Variables and Parameter Sets:
Using DataStage 8 Parameter Sets to Tame Environment Variables

-Ritesh
Disclaimer: "The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions"

Wednesday, April 13, 2011

Sending Notifications on specific actions via DataStage Jobs

A DataStage Job Sequence can use the Notification activity to send an email or SMS on job failure, on completion, or based on any condition you specify. To notify a list of recipients:
  • Create a Notification activity.
  • Create a Start Loop activity.
  • Create an End Loop activity.
  • Set the Start Loop activity to iterate through a comma-separated list of email addresses.
  • Set the email recipient field in the Notification activity to the counter variable of the Start Loop activity (a script-level sketch of this loop appears after the property list below).
The DataStage Notification activity page contains:
• SMTP Mail Server Name, where you provide the mail server's name, its IP, or any other valid address.
• Sender's email address, given in the form ops@mymaildomain.com.
• Recipient's email address, where the email is to be sent, given in the form admin@mymaildomain.com. This field can also be used to send an SMS by entering the address format supplied by your service provider.
• Email subject, the text to appear as the subject of the email.
• Attachments, the files to be sent with the email. Specify a path name or a comma-separated list of path names; a job parameter can be used as well.
• Email body, the message text you want to send.
• Include job status in email, to include job status information in the message.
• Do not checkpoint notification, which tells DataStage not to record checkpoint information for this particular notification operation. This means that if a job later in the sequence fails and the sequence is restarted, this notification is re-executed regardless of whether it completed successfully before. This option is only visible if the sequence as a whole is checkpointed.
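For reference, here is a minimal script-level sketch of what the activity and the loop above effectively do; the server name, addresses, subject and body are placeholders, and in a real sequence the Notification activity itself handles the SMTP part.

import smtplib
from email.message import EmailMessage

# Hypothetical values mirroring the Notification activity fields listed above.
smtp_server = "mail.mymaildomain.com"                         # SMTP Mail Server Name
sender = "ops@mymaildomain.com"                               # sender's email address
recipients = "admin@mymaildomain.com,dba@mymaildomain.com"    # comma-separated list
subject = "DataStage sequence SeqLoadSales failed"            # email subject
body = "Sequence SeqLoadSales failed. See the Director log for details."  # email body

# The Start Loop / End Loop pattern amounts to sending one notification per
# entry of the comma-separated recipient list.
with smtplib.SMTP(smtp_server) as smtp:
    for recipient in recipients.split(","):
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient.strip()
        msg["Subject"] = subject
        msg.set_content(body)
        smtp.send_message(msg)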
-Ritesh
Disclaimer: "The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions"

A Note on Error Handling in DataStage Job Design

There are multiple ways to handle errors in data or in DataStage jobs.

We can use the reject link option in the Transformer stage and also the reject link option within Connector stages. If we face an issue in a Job Sequence, we can use the Exception Handler activity. Here is how to use this activity from within a DataStage Job Sequence.

Check the box named "Automatically handle activities that fail" in the properties of the master sequence. If you also want a checkpoint, check the option to restart the job from the point of failure.

In the Job Sequence, add an Exception Handler activity. After the Exception Handler you can place an email Notification activity (the same works for SMS). On job failure, control passes to the Exception Handler activity and an email/SMS is sent notifying the user that the sequence has failed. It can also include information on the failure code, depending on what we select as part of the job design.
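The same failure-then-notify idea can also be scripted outside the Designer. This is only a sketch, not the Exception Handler itself: the project and job names are invented, and the meaning of the dsjob exit codes should be confirmed against your version's documentation.

import subprocess

project, job = "MyProject", "SeqLoadSales"   # hypothetical names

# With -jobstatus, dsjob waits for the job and reports the job's finishing
# status through its exit code; codes for "finished" and "finished with
# warnings" are commonly 1 and 2, but verify this for your release.
rc = subprocess.run(["dsjob", "-run", "-wait", "-jobstatus", project, job]).returncode
if rc not in (1, 2):
    print(f"{job} did not finish cleanly (status {rc}); sending alert ...")
    # e.g. reuse the smtplib sketch from the Notification activity post above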
More on "How to use Notification Activity" in next one.

-Ritesh
Disclaimer: "The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions"

A Detailed list of DataStage Error Codes

InfoSphere™ DataStage® throws various warnings and errors at run time, and field teams and customers often find it hard to correlate them with the actual problem and its possible causes. Here is a list of the various errors with a short description of each for reference. Please comment if you find other error codes so I can consolidate the list.

A few error codes:

InfoSphere DataStage API Error Codes
Code Error Token Description
0 DSJE_NOERROR No InfoSphere DataStage API error has occurred.
-1 DSJE_BADHANDLE Invalid JobHandle .
-2 DSJE_BADSTATE Job is not in the right state (compiled, not running).
-3 DSJE_BADPARAM ParamName is not a parameter name in the job.
-4 DSJE_BADVALUE Invalid MaxNumber value.
-5 DSJE_BADTYPE Information or event type was unrecognized.
-6 DSJE_WRONGJOB Job for this JobHandle was not started from a call to DSRunJob by the current process.
-7 DSJE_BADSTAGE StageName does not refer to a known stage in the job.
-8 DSJE_NOTINSTAGE Internal engine error.
-9 DSJE_BADLINK LinkName does not refer to a known link for the stage in question.
-10 DSJE_JOBLOCKED The job is locked by another process.
-11 DSJE_JOBDELETED The job has been deleted.
-12 DSJE_BADNAME Invalid project name.
-13 DSJE_BADTIME Invalid StartTime or EndTime value.
-14 DSJE_TIMEOUT The job appears not to have started after waiting for a reasonable length of time. (About 30 minutes.)
-15 DSJE_DECRYPTERR Failed to decrypt encrypted values.
-16 DSJE_NOACCESS Cannot get values, default values or design default values for any job except the current job.
-99 DSJE_REPERROR General engine error.
-100 DSJE_NOTADMINUSER User is not an administrator.
-101 DSJE_ISADMINFAILED Failed to determine whether user is an administrator.
-102 DSJE_READPROJPROPERTY Failed to read property.
-103 DSJE_WRITEPROJPROPERTY Property not supported.
-104 DSJE_BADPROPERTY Unknown property name.
-105 DSJE_PROPNOTSUPPORTED Unsupported property.
-106 DSJE_BADPROPVALUE Invalid value for this property.
-107 DSJE_OSHVISIBLEFLAG Failed to get value for OSHVisible.
-108 DSJE_BADENVVARNAME Invalid environment variable name.
-109 DSJE_BADENVVARTYPE Invalid environment variable type.
-110 DSJE_BADENVVARPROMPT No prompt supplied.
-111 DSJE_READENVVARDEFNS Failed to read environment variable definitions.
-112 DSJE_READENVVARVALUES Failed to read environment variable values.
-113 DSJE_WRITEENVVARDEFNS Failed to write environment variable definitions.
-114 DSJE_WRITEENVVARVALUES Failed to write environment variable values.
-115 DSJE_DUPENVVARNAME Environment variable being added already exists.
-116 DSJE_BADENVVAR Environment variable does not exist.
-117 DSJE_NOTUSERDEFINED Environment variable is not user-defined and therefore cannot be deleted.
-118 DSJE_BADBOOLEANVALUE Invalid value given for a boolean environment variable.
-119 DSJE_BADNUMERICVALUE Invalid value given for an integer environment variable.
-120 DSJE_BADLISTVALUE Invalid value given for a list environment variable.
-121 DSJE_PXNOTINSTALLED Environment variable is specific to parallel jobs which are not available.
-122 DSJE_ISPARALLELLICENCED Failed to determine if parallel jobs are available.
-123 DSJE_ENCODEFAILED Failed to encode an encrypted value.
-124 DSJE_DELPROJFAILED Failed to delete project definition.
-125 DSJE_DELPROJFILESFAILED Failed to delete project files.
-126 DSJE_LISTSCHEDULEFAILED Failed to get list of scheduled jobs for project.
-127 DSJE_CLEARSCHEDULEFAILED Failed to clear scheduled jobs for project.
-128 DSJE_BADPROJNAME Invalid project name supplied.
-129 DSJE_GETDEFAULTPATHFAILED Failed to determine default project directory.
-130 DSJE_BADPROJLOCATION Invalid path name supplied.
-131 DSJE_INVALIDPROJECTLOCATION Invalid path name supplied.
-132 DSJE_OPENFAILED Failed to open UV.ACCOUNT file.
-133 DSJE_READUFAILED Failed to lock project create lock record.
-134 DSJE_ADDPROJECTBLOCKED Another user is adding a project.
-135 DSJE_ADDPROJECTFAILED Failed to add project.
-136 DSJE_LICENSEPROJECTFAILED Failed to license project.
-137 DSJE_RELEASEFAILED Failed to release project create lock record.
-138 DSJE_DELETEPROJECTBLOCKED Project locked by another user.
-139 DSJE_NOTAPROJECT Failed to log to project.
-140 DSJE_ACCOUNTPATHFAILED Failed to get account path.
-141 DSJE_LOGTOFAILED Failed to log to UV account.
-201 DSJE_UNKNOWN_JOBNAME The supplied job name cannot be found in the project.
-1001 DSJE_NOMORE All events matching the filter criteria have been returned.
-1002 DSJE_BADPROJECT ProjectName is not a known InfoSphere DataStage project.
-1003 DSJE_NO_DATASTAGE InfoSphere DataStage is not installed on the system.
-1004 DSJE_OPENFAIL The attempt to open the job failed - perhaps it has not been compiled.
-1005 DSJE_NO_MEMORY Failed to allocate dynamic memory.
-1006 DSJE_SERVER_ERROR An unexpected or unknown error occurred in the engine.
-1007 DSJE_NOT_AVAILABLE The requested information was not found.
-1008 DSJE_BAD_VERSION The engine does not support this version of the InfoSphere DataStage API.
-1009 DSJE_INCOMPATIBLE_SERVER The engine version is incompatible with this version of the InfoSphere DataStage API.
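When a script or client program only gets the numeric code back, it helps to translate it to its token before logging. A minimal sketch covering just a few of the codes from the table above:

# Map a few of the numeric API error codes above back to their tokens,
# so a log line can show DSJE_BADSTATE instead of just -2.
DSJE_TOKENS = {
    0: "DSJE_NOERROR",
    -1: "DSJE_BADHANDLE",
    -2: "DSJE_BADSTATE",
    -3: "DSJE_BADPARAM",
    -14: "DSJE_TIMEOUT",
    -99: "DSJE_REPERROR",
    -1004: "DSJE_OPENFAIL",
    -1006: "DSJE_SERVER_ERROR",
}

def describe(code: int) -> str:
    return DSJE_TOKENS.get(code, f"Unknown DataStage API error code {code}")

print(describe(-2))   # -> DSJE_BADSTATE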

 
API Communication Layer Error Codes
Error Number Description
39121 The InfoSphere DataStage license has expired.
39134 The InfoSphere DataStage user limit has been reached.
80011 Incorrect system name or invalid user name or password provided.
80019 Password has expired.

-Ritesh
Disclaimer: "The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions"

A Big Bunch of DataStage Complaints and Solutions

Here is a detailed list of various problems field teams face while using InfoSphere DataStage, along with solutions to these problems as suggested by Vincent McBurney.
It references many other good blogs that highlight areas like DataStage reports, job parameters, the parallel Transformer and so on. Refer to those blogs to get a clear picture.
A Big Bunch of DataStage Complaints and Solutions

-Ritesh