Data Cleansing - Consistency or Standardization Phase - QualityStage-IV


After confirming the goals and analyzing the source data, we can start creating the process that generates cleansed data, a phase known as "Design & Develop Jobs". Once these jobs are designed, they need to run in a specified order within a Sequencer. Designing the components required to build data quality jobs with InfoSphere® QualityStage™ involves one or more of the following steps.
Standardizing data: Standardizing data involves preparing and conditioning data using the stages and reports within InfoSphere DataStage & QualityStage that help to correctly parse and identify each element, or token, and place it in the appropriate column in the output file. The Standardize stage uses rule sets that are designed to comply with standards or conventions. The Standardize rule sets can assimilate the data and append additional information derived from the input data, such as gender.
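To picture what standardization does to a single field, here is a minimal Python sketch. It is not how QualityStage rule sets are written; the lookup tables, column names, and the `standardize_name` function are all hypothetical stand-ins, used only to show the parse, classify, place-in-column, and append-gender flow described above.

```python
import re

# Hypothetical lookup tables standing in for a rule set's classifications.
TITLES = {"MR", "MRS", "MS", "DR"}
GENDER_BY_FIRST_NAME = {"JOHN": "M", "MARY": "F"}

def standardize_name(raw):
    """Parse a free-form name into fixed columns and append a derived gender."""
    # Uppercase, replace punctuation with spaces, and split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", raw.upper()).split()
    record = {"title": "", "first_name": "", "middle_name": "",
              "last_name": "", "gender": ""}
    # A leading token found in the title table goes to the title column.
    if tokens and tokens[0] in TITLES:
        record["title"] = tokens.pop(0)
    # The next token is treated as the first name, the final one as the last name.
    if tokens:
        record["first_name"] = tokens.pop(0)
    if tokens:
        record["last_name"] = tokens.pop()
    # Whatever remains in between is kept as the middle name.
    record["middle_name"] = " ".join(tokens)
    # Append information that was not a column in the input: gender, by lookup.
    record["gender"] = GENDER_BY_FIRST_NAME.get(record["first_name"], "")
    return record

row = standardize_name("mr. John  Q. Smith")
```

A real rule set handles far more patterns (suffixes, compound surnames, business names); the point here is only that each token ends up in a predictable column.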
Matching data: After the data is standardized, matching identifies either duplicates or cross-references to other files. The data cleansing assignment determines the matching strategy: whether to match individuals, match companies, perform householding, or reconcile inventory transactions.
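The idea of finding duplicates in standardized data can be sketched in a few lines of Python. Note the assumptions: this swaps in a plain string-similarity ratio from the standard library's `difflib` in place of QualityStage's own matching, groups candidates by a hypothetical `postal_code` column so that only records in the same group are compared, and uses an arbitrary threshold of 0.85.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def find_duplicate_pairs(records, threshold=0.85):
    """Return (id_a, id_b, score) for same-block pairs whose names look alike."""
    # Group records by postal code so we only compare plausible candidates.
    blocks = {}
    for rec in records:
        blocks.setdefault(rec["postal_code"], []).append(rec)
    pairs = []
    for group in blocks.values():
        # Compare every pair inside the block and keep those above the threshold.
        for a, b in combinations(group, 2):
            score = similarity(a["name"], b["name"])
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs

records = [
    {"id": 1, "name": "JOHN SMITH", "postal_code": "10001"},
    {"id": 2, "name": "JON SMITH", "postal_code": "10001"},
    {"id": 3, "name": "JOHN SMITH", "postal_code": "90210"},
    {"id": 4, "name": "MARY JONES", "postal_code": "10001"},
]
pairs = find_duplicate_pairs(records)
```

With this sample, records 1 and 2 are paired, while record 3 is never compared with them because it sits in a different postal code group. Which columns to group and compare on depends on the strategy chosen (individuals, companies, households, and so on).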
Identifying surviving data: After data matching is complete, you need to identify which records (or which columns of a set of duplicate records) from the match data survive and become available for formatting, loading, or reporting. Survivorship ensures that the best available data survives and is correctly prepared for the target destination. Thus, survivorship consolidates duplicate records, creating a best-of-breed representation of the matched data and enabling organizations to cross-populate all data sources with the best available data. In this step, when you have duplicate records, you must make these decisions:
InfoSphere QualityStage provides survivorship to perform one or more of the following functions on your data:
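To make the survivorship idea concrete, here is a rough Python sketch of consolidating one group of matched duplicates into a single best-of-breed record. The rule used (for each column, take the non-empty value from the most recently updated record) is only an assumed example, and `build_survivor`, the `updated` column, and the sample data are hypothetical; this is not QualityStage's own rule syntax.

```python
def build_survivor(group, columns):
    """Consolidate one group of matched duplicates into a single best record."""
    # Newest records first, so their values win whenever they are present.
    # ISO-formatted date strings sort correctly as plain text.
    ordered = sorted(group, key=lambda rec: rec["updated"], reverse=True)
    survivor = {}
    for col in columns:
        # Take the first non-empty value; fall back to "" if every record is blank.
        survivor[col] = next((rec[col] for rec in ordered if rec.get(col)), "")
    return survivor

group = [
    {"name": "JOHN SMITH", "phone": "", "email": "js@example.com",
     "updated": "2020-01-01"},
    {"name": "JOHN Q SMITH", "phone": "555-0100", "email": "",
     "updated": "2021-06-01"},
]
best = build_survivor(group, ["name", "phone", "email"])
```

Here the newer record supplies the name and phone, and the older record fills in the email the newer one lacks, so the survivor carries the best available value for every column.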
  -Ritesh
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.