InfoSphere Information Server How To: BigData Discussion

I thought of taking the MapReduce vs ETL tools discussion to the next level and discussing the initiatives IBM is taking. IBM DataStage already provides integration for BigData processing within Hadoop: with Information Server 9.1, it can launch an Oozie workflow within Hadoop that processes unstructured data and returns the results into your ETL workflow. As mentioned in the previous blog, custom coding is not the way forward. Leveraging tools like DataStage enables developers to integrate BigData into their ETL workflow using the same tooling. Here is the link to Information Server 9.1

Using IBM's BigInsights Enterprise Hadoop platform, structured data from an external RDBMS can be brought into a BigInsights cluster to be processed next to your unstructured data. Post-processing results can be stored at the desired location. BigInsights 2.1 enables developers to leverage BigSQL, an ANSI SQL interface to Hive, HBase, HDFS, CSV (delimited, sequence files), and JSON. Existing reporting and BI tools can leverage this too, as can Data Analysts who don't know MapReduce or Pig. Data Analysts can also use the BigSheets functionality, which gives them a spreadsheet-like interface with a good number of spreadsheet functions. Here you can refer to the features of BigInsights 2.1
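To make the point about SQL versus hand-written MapReduce concrete, here is a minimal sketch in Python. It is not BigSQL or BigInsights code: it uses Python's built-in sqlite3 module as a stand-in for any ANSI SQL interface, and the sample data, table name, and function names are all hypothetical. It only illustrates that an aggregation written by hand as map, shuffle, and reduce steps can be expressed as one SQL statement that an analyst already knows how to write.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (customer, order amount).
ORDERS = [
    ("acme", 120.0),
    ("globex", 75.5),
    ("acme", 30.0),
    ("initech", 10.0),
    ("globex", 24.5),
]


def totals_mapreduce_style(records):
    """Total per customer, hand-coded as map, shuffle, and reduce steps."""
    # Map: emit a (key, value) pair for every record.
    mapped = [(customer, amount) for customer, amount in records]
    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: collapse each group into a single total.
    return {key: sum(values) for key, values in grouped.items()}


def totals_sql(records):
    """Same totals via one SQL statement (sqlite3 is only a stand-in engine)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(totals_mapreduce_style(ORDERS))
    print(totals_sql(ORDERS))
```

Both functions return the same totals per customer. The SQL version leaves grouping and aggregation to the engine, which is the kind of work a SQL interface over Hadoop data is meant to take off the analyst's hands.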

Tooling enables any new developer to focus on the actual requirements rather than on understanding multiple functions, scripts, and utilities that have little or no documentation and may or may not follow any standards. With these tools, a developer can start contributing immediately, and the time to ramp up is greatly diminished.

Based on my interactions and work with customers, the view across the MapReduce customer base is that Hadoop is primarily augmenting the existing data warehouse environment, with no exploration of replacing it. It has enabled them to process unstructured and structured data at scale. Of course, if existing ETL tooling vendors do not move towards innovative business propositions and create a seamless Hadoop and ETL infrastructure, Hadoop certainly has the potential to overcome many of the limitations of traditional ETL/ELT tools. Organizations are exploring ways to remove the ETL bottlenecks, and the only challenge they have is that Hadoop in its current form is not a complete ETL solution. While it offers powerful utilities and massive horizontal scalability, it does not provide all the functionality and capabilities required to deploy ETL/ELT. Existing vendors are already moving towards filling this gap, but it needs to happen soon.

-Ritesh

Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions

BigData Discussion - MapReduce vs ETL tools? Series 2