What Data Integration really means
My colleague, Ariel, recently wrote a post with tips for creating SAP BusinessObjects Data Integrator jobs. As the name suggests, this is a tool for Data Integration (DI), and his article inspired me to learn more about the subject. As a marketing person, I don’t have deep technical knowledge, but I know that Data Integration is an important part of business intelligence and data warehousing, and that it’s about aligning data from multiple sources. Beyond that, my understanding was flawed.
Data Integration is not ETL
I thought that Data Integration meant ETL and was therefore confined to data warehousing. As it turns out, the difference between ETL and DI is important. I started with two articles that I highly recommend:
- HowStuffWorks “How Data Integration Works” by Jonathan Strickland. Pages 1 to 3 are extremely basic, but it gets better. (Aside: this website also produces an amusing podcast.)
- TDWI “Data Integration In a Nutshell: Four Essential Guidelines” by Philip Russom
A Broader Definition Of DI
Both articles led me to expand my understanding of Data Integration beyond ETL. We’ll start with my explanation of ETL:
- ETL stands for Extract, Transform, Load. It’s the process of taking information from each data source, doing some calculations (like summing sales figures) and preparation (like rejecting bad data), and loading it into a data warehouse: a central location where it’s fast and easy for users to find answers.
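To make those three steps concrete, here is a minimal Python sketch. The two source feeds, their column names, and the `sales_by_region` table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse. A real ETL tool handles much more than this; the point is only to show extract, transform (reject bad rows, sum sales), and load as separate stages.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source extracts: two regional sales feeds in CSV form.
# In practice these would come from databases, files, or APIs.
SOURCE_A = """region,amount
East,100.0
West,250.5
East,not-a-number
"""

SOURCE_B = """region,amount
West,49.5
North,75.0
,20.0
"""


def extract(csv_text):
    """Extract: read the raw rows from one source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: reject bad rows, then sum the sales figures per region."""
    totals = defaultdict(float)
    rejected = []
    for row in rows:
        region = (row.get("region") or "").strip()
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            rejected.append(row)  # unparseable or missing amount
            continue
        if not region:
            rejected.append(row)  # no region to attribute the sale to
            continue
        totals[region] += amount
    return dict(totals), rejected


def load(conn, totals):
    """Load: write the summarized figures into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region (region, total) VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


def run_etl(conn, sources):
    """Run extract -> transform -> load over every source; return rejected rows."""
    rows = [row for src in sources for row in extract(src)]
    totals, rejected = transform(rows)
    load(conn, totals)
    return rejected


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    rejected = run_etl(warehouse, [SOURCE_A, SOURCE_B])
    print(warehouse.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall())  # [('East', 100.0), ('North', 75.0), ('West', 300.0)]
    print(len(rejected))  # 2
```

Once loaded, users query the small summary table instead of the raw feeds, which is what makes the warehouse fast and easy to use.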
Two other ways to use Data Integration:
- There are some situations where a data warehouse is not the best way to go. You can still give users the answers they need by pulling data together on the fly whenever someone needs information. Importantly, you have to define a schema describing how the data from each source relates to the others, and also define what the data means to the user. This is also Data Integration. Strickland refers to Networked Databases here, and there are a couple of different approaches.
- A theme underlying Russom’s article points to a third type of Data Integration: Operational DI. Where traditional DI is concerned with summarizing data to present to a user for analysis, Operational DI is not. If I understand it correctly, Operational DI is concerned with moving and exchanging data between operational systems: for instance, making sure that the customer data in the CRM matches the customer data in the billing system, or synchronizing the marketing databases in Canada and the USA.
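The on-the-fly approach can be sketched in the same way. In this hypothetical Python example, two independent in-memory SQLite databases stand in for a CRM and a billing system, each with its own schema. A small `MAPPING` plays the role of the integration schema: it records which columns identify the same customer and what each field means to a business user. The question is answered at request time by querying both live sources; nothing is copied into a warehouse first. This illustrates the idea only and is not how any particular networked or federated database product works.

```python
import sqlite3


def make_crm():
    """Hypothetical CRM with its own table and column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (cust_no INTEGER PRIMARY KEY, full_name TEXT)")
    conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                     [(1, "Acme Ltd"), (2, "Globex Inc")])
    return conn


def make_billing():
    """Hypothetical billing system with a different schema for the same customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (customer_ref INTEGER, amt_cents INTEGER)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                     [(1, 1500), (1, 2500), (2, 999)])
    return conn


# The integration "schema": how the sources relate (which columns identify
# the same customer) and what each field means to a user.
MAPPING = {
    "join_key": {"crm": "cust_no", "billing": "customer_ref"},
    "labels": {"full_name": "Customer", "amt_cents": "Amount billed (USD)"},
}


def customer_balance(crm, billing, customer_id):
    """Answer one question by querying both live sources at request time.

    Column names are taken from the fixed MAPPING above (never from user
    input); the customer id itself is passed as a bound parameter.
    """
    crm_key = MAPPING["join_key"]["crm"]
    billing_key = MAPPING["join_key"]["billing"]
    row = crm.execute(
        f"SELECT full_name FROM contacts WHERE {crm_key} = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None  # unknown customer
    (cents,) = billing.execute(
        f"SELECT COALESCE(SUM(amt_cents), 0) FROM invoices WHERE {billing_key} = ?",
        (customer_id,),
    ).fetchone()
    labels = MAPPING["labels"]
    # Present the combined answer using the business meaning of each field.
    return {labels["full_name"]: row[0], labels["amt_cents"]: cents / 100}


if __name__ == "__main__":
    crm, billing = make_crm(), make_billing()
    print(customer_balance(crm, billing, 1))
    # {'Customer': 'Acme Ltd', 'Amount billed (USD)': 40.0}
```

The trade-off compared with ETL is that every question touches the source systems directly, so answers are always current but each query does more work.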
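Operational DI looks different again: the goal is to keep two operational systems in agreement rather than to summarize anything. Below is a minimal one-way synchronization sketch in Python, assuming (hypothetically) that the CRM is the system of record and that both systems identify customers by the same `customer_id`. Real synchronization also has to handle conflicts, deletes, and changes arriving from both sides; this sketch only inserts and updates, and flags records that exist only in the target instead of deleting them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical shared record shape; real systems rarely agree this neatly.
    customer_id: int
    name: str
    email: str


def sync_customers(source, target):
    """One-way sync: make `target` agree with `source`, the assumed system of record.

    Both arguments are dicts mapping customer_id -> Customer, and `target` is
    updated in place. Returns (changes, orphans): the inserts/updates applied,
    and the ids that exist only in `target` (flagged for review, not deleted).
    """
    changes = []
    for cid, record in sorted(source.items()):
        current = target.get(cid)
        if current is None:
            target[cid] = record
            changes.append(("insert", cid))
        elif current != record:
            target[cid] = record
            changes.append(("update", cid))
    orphans = sorted(set(target) - set(source))
    return changes, orphans


if __name__ == "__main__":
    crm = {
        1: Customer(1, "Acme Ltd", "ap@acme.example"),
        2: Customer(2, "Globex Inc", "billing@globex.example"),
    }
    billing = {
        1: Customer(1, "Acme Ltd", "old-address@acme.example"),
        3: Customer(3, "Initech", "ar@initech.example"),
    }
    changes, orphans = sync_customers(crm, billing)
    print(changes)  # [('update', 1), ('insert', 2)]
    print(orphans)  # [3]
```

Returning a change log rather than silently overwriting makes it easier to audit what the synchronization actually did.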
Next week I will write about two related concepts: metadata and abstraction. While they are not required for a definition of Data Integration, they should accompany any real-world implementation of DI.
-IainR
Sr. Marketing Manager – The BI Builders
