Processing InfoSphere DataStage DataSet from Command Line (Orchadmin Utility)

I recently discussed the Dataset, which is at the core of any ETL design within InfoSphere DataStage. For automation, Datasets need to be maintained from the command line; we can't rely on the GUI for automation and validation.

We can use the "orchadmin" utility shipped with InfoSphere DataStage to delete, copy, describe and dump a Dataset (alias ORCHESTRATE file). Because it is a command line utility that can read its input from a file or from standard input, it is well suited to automation.

As usual, the environment needs to be set up as below before we use the orchadmin utility.

export DSHOME=$(cat /.dshome)
. $DSHOME/dsenv

export LD_LIBRARY_PATH=$APT_ORCHHOME/lib
export APT_CONFIG_FILE=$DSHOME/../Configurations/default.apt
export PATH=$DSHOME/bin:$APT_ORCHHOME/bin:$PATH
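
For unattended scripts it can help to fail early when this environment is incomplete. The following is only a minimal sketch and not part of the product: the function name check_orch_env is hypothetical, and it checks nothing more than that the variables used above are set, that the configuration file is readable, and that orchadmin resolves on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: verify the environment before calling orchadmin.
# Returns 0 when everything looks usable, 1 otherwise.
check_orch_env() {
    status=0
    # Variables the setup above is expected to define.
    for var in DSHOME APT_ORCHHOME APT_CONFIG_FILE; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $var" >&2
            status=1
        fi
    done
    # The configuration file must exist and be readable.
    if [ -n "${APT_CONFIG_FILE:-}" ] && [ ! -r "$APT_CONFIG_FILE" ]; then
        echo "cannot read config file: $APT_CONFIG_FILE" >&2
        status=1
    fi
    # orchadmin must be resolvable through PATH.
    if ! command -v orchadmin >/dev/null 2>&1; then
        echo "orchadmin not found on PATH" >&2
        status=1
    fi
    return $status
}
```

A script could then start with "check_orch_env || exit 1" so that later orchadmin calls do not fail halfway through a run.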

Let's see how to delete all Datasets in a directory (run this from within that directory):
orchadmin rm *.ds

A plain rm on the .ds file is not enough: it removes only the descriptor file and leaves behind
the Dataset's data files, which are spread across multiple nodes.
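
In an automated cleanup job, the same delete can be wrapped so that an empty directory or a failed removal is handled explicitly. This is a sketch under assumptions: the function name remove_datasets is hypothetical, and it relies only on the "orchadmin rm" form shown above, called once per Dataset.

```shell
#!/bin/sh
# Hypothetical helper: remove every Dataset (*.ds) in a directory via orchadmin.
# Usage: remove_datasets [DIRECTORY]   (defaults to the current directory)
remove_datasets() {
    dir=${1:-.}
    failed=0
    for ds in "$dir"/*.ds; do
        # With no matches the pattern stays literal; skip it.
        [ -e "$ds" ] || continue
        if ! orchadmin rm "$ds"; then
            echo "failed to remove $ds" >&2
            failed=1
        fi
    done
    # Non-zero when at least one removal failed.
    return $failed
}
```

Calling orchadmin once per file keeps a single failure from hiding which Dataset caused it; the function continues with the remaining files and reports the failure through its exit status.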

Remove data from a specific Dataset: orchadmin truncate -n 10 input.ds
Remove all data from input.ds: orchadmin truncate input.ds
Dump all records of all partitions: orchadmin dump -name input.ds
Dump the value of the name field for the first 17 records of partition 0 of input.ds: orchadmin dump -part 0 -n 17 -field name input.ds
List the partitioning info, data files and schema: orchadmin ll file1 file2
Describe disk pool pl1 in node pool ritnodes: orchadmin diskinfo -np ritnodes pl1
Check the configuration file for any problems: orchadmin check
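
For validation scripts, the dump command above can be wrapped so that a job samples one field from a Dataset. This is a minimal sketch, assuming the "-part", "-n" and "-field" options behave as in the example above; the function name sample_field and its default of 10 records are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first COUNT values of one field from partition 0.
# Usage: sample_field DATASET FIELD [COUNT]
sample_field() {
    ds=${1:-}
    field=${2:-}
    count=${3:-10}
    if [ -z "$ds" ] || [ -z "$field" ]; then
        echo "usage: sample_field DATASET FIELD [COUNT]" >&2
        return 2
    fi
    # Same invocation as the dump example above, with the values parameterised.
    orchadmin dump -part 0 -n "$count" -field "$field" "$ds"
}
```

For example, "sample_field input.ds name 17 > sample.txt" would write the sampled values to a file that a later step can compare against expected data.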

-Ritesh
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions