As data volumes grow and need to be processed efficiently, filtering
duplicate data is becoming a major concern for enterprises.
InfoSphere DataStage recently released the Bloom Filter stage,
which is based on the algorithm developed by Burton Howard Bloom and
provides high-performance, resource-efficient duplicate key
filtering. This is particularly useful in telco data integration
patterns that involve enormous volumes of call detail records. It can
be used to perform key lookups more efficiently. You might find false
positives, but the Bloom Filter stage never generates false negatives
in your output dataset. The Bloom Filter stage takes a single input
dataset and can generate multiple output datasets, depending on the
operating mode.
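
To make the false positive / false negative behaviour concrete, here is a
minimal Python sketch of duplicate key filtering with a Bloom filter. It
illustrates the general algorithm only and is not how the DataStage stage
is implemented. The class BloomFilter, the function split_duplicates, the
default sizes, and the sample keys are all hypothetical names and values
chosen for this example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter (hypothetical; not the DataStage stage)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from one SHA-256 digest
        # using double hashing: h1 + i * h2.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key was definitely never added.
        # True means it was probably added (false positives are possible).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def split_duplicates(keys, bloom):
    """Split keys into (unique, flagged_as_duplicate) lists."""
    unique, flagged = [], []
    for key in keys:
        if bloom.might_contain(key):
            flagged.append(key)
        else:
            bloom.add(key)
            unique.append(key)
    return unique, flagged


keys = ["cdr-001", "cdr-002", "cdr-001", "cdr-003", "cdr-002"]
unique, flagged = split_duplicates(keys, BloomFilter())
```

In this sketch a real duplicate is always flagged, because every bit set
by its first occurrence is still set when it is seen again. A key that was
never seen can occasionally be flagged as well, when other keys happen to
have set all of its bits. That is the false positive case, and it becomes
less likely as the bit array grows relative to the number of keys.
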
Detailed job design and usage will be covered in a follow-up.