As data volumes grow and enterprises need to process them efficiently, filtering duplicate data is becoming a major concern. InfoSphere DataStage recently released the Bloom Filter stage, which is based on the algorithm developed by Burton Howard Bloom and provides high-performance, resource-efficient duplicate key filtering. This is particularly useful in telco data integration patterns that involve enormous volumes of call detail records. The stage can also be used to perform key lookups more efficiently. You might see false positives (a new key occasionally reported as a duplicate), but a Bloom filter never generates false negatives in your output dataset. The Bloom Filter stage takes a single input dataset and can generate multiple output datasets, depending on the operating mode.
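To make the false-positive versus false-negative behaviour concrete, here is a minimal Python sketch of the general Bloom filter idea. It is not the DataStage implementation, whose internals are not described here; the names `BloomFilter`, `split_duplicates`, `num_bits`, and `num_hashes` are made up for illustration, and the sizes are arbitrary.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus a few hash positions per key.

    Illustrative only; sizes and hashing scheme are assumptions,
    not what the DataStage stage uses.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive all positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # make the stride odd (never zero)
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        # Set every bit position that the key hashes to.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True if all bits are set: the key was added, or other keys
        # happened to set the same bits (a false positive).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def split_duplicates(keys, bloom):
    """Route each key to a 'unique' or 'suspected duplicate' list."""
    unique, suspected = [], []
    for key in keys:
        if bloom.might_contain(key):
            suspected.append(key)  # seen before, or a false positive
        else:
            bloom.add(key)
            unique.append(key)     # definitely not seen before
    return unique, suspected
```

Calling `split_duplicates(["cdr-001", "cdr-002", "cdr-001"], BloomFilter())` with these made-up record keys always routes the repeated `cdr-001` to the suspected-duplicate list, because every bit it set on first sight is still set. A key that was never added can only land there if other keys happened to set all of its bits, which is the false-positive case; a larger bit array makes that less likely at the cost of more memory.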
I will cover detailed job design and usage in a follow-up post.