ETL and Metadata

ETLGuru has posted a couple of articles on some of the same subjects we have been touching on.

I would recommend taking a look at:

(Ironically, I have a half-finished posting titled “What is ETL”; he beat me to the punch!)

The first article is a pretty good rundown of ETL for the data warehouse, although I would probably expand the scope of what exactly ETL covers. I don’t think that ETL is a data-warehouse-specific activity, although it is often focused around warehousing. Personally, I think a lot of people are doing ETL development, either on their own or with commercial tools, but call it something different, like “report writing”, “scripting”, or “database maintenance.” ETL is far more than populating a warehouse with a commercially produced tool like Ab Initio, Ascential, or Informatica.

One thing I take issue with is the assertion (in the latter article) that an ETL process never creates new data. I would argue that a well-architected process should create metadata that states unambiguously what was done, how it was accomplished, and what was affected. Not only is this critically important data in its own right, it also becomes disproportionately more valuable over time.
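To make that concrete, here is a rough sketch of my own (not something taken from either article) of an ETL step that emits a metadata record alongside the data it loads. The names run_step and to_cents, the sample rows, and the metadata field names are all hypothetical; a real process would likely write the record to an audit table rather than hand it back in memory.

```python
import hashlib
import json
from datetime import datetime, timezone


def run_step(step_name, rows, transform):
    """Apply `transform` to each row and return (output_rows, metadata)."""
    started = datetime.now(timezone.utc)
    output, rejected = [], 0
    for row in rows:
        try:
            output.append(transform(row))
        except (KeyError, ValueError):
            # Count rows that could not be transformed instead of failing silently.
            rejected += 1
    finished = datetime.now(timezone.utc)

    # The metadata record: what was done, how it was done, and what was affected.
    metadata = {
        "step": step_name,
        "how": transform.__name__,
        "started_at": started.isoformat(),
        "finished_at": finished.isoformat(),
        "rows_read": len(rows),
        "rows_written": len(output),
        "rows_rejected": rejected,
        "output_checksum": hashlib.sha256(
            json.dumps(output, sort_keys=True).encode("utf-8")
        ).hexdigest(),
    }
    return output, metadata


def to_cents(row):
    """Hypothetical transform: convert a decimal amount string to integer cents."""
    return {"id": row["id"], "amount_cents": int(round(float(row["amount"]) * 100))}


# Made-up sample input; the second row is deliberately bad.
source = [
    {"id": 1, "amount": "9.99"},
    {"id": 2, "amount": "n/a"},
    {"id": 3, "amount": "20"},
]
loaded, meta = run_step("load_orders", source, to_cents)
```

The point is that meta is new data the process created: it did not exist in the source, and it is what someone will reach for later when a number on a report looks wrong.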

Why? Because the true cost of operating any system is calculated over the time of operation, not just the time of development. With many systems of scale, it isn’t possible to see a problem and simply follow it back to where it came from (the ability to do so is sometimes called traceability). Instead, we have to use the clues and breadcrumbs that were left for us to follow. This is because the longer a process runs, the less an organization understands what it really does. The people who created it move on to other tasks and memories fade. Often (especially with legacy sources), metadata is the first and only clue we have on how to fix things.
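One way to leave those breadcrumbs, sketched here under my own assumptions, is to stamp every loaded row with the run that produced it and the position it came from in the source. The function names, the underscore-prefixed column names, and the file name orders.csv are hypothetical, and the in-memory run_log stands in for what would normally be a persistent audit table.

```python
import uuid


def load_with_lineage(source_name, rows, run_log):
    """Tag each row with a run id and its source line, and log the run itself."""
    run_id = str(uuid.uuid4())
    run_log[run_id] = {"source": source_name, "rows_read": len(rows)}
    return [
        {**row, "_run_id": run_id, "_source_line": line_no}
        for line_no, row in enumerate(rows, start=1)
    ]


def trace(row, run_log):
    """Follow a loaded row back to the source and line it came from."""
    run = run_log[row["_run_id"]]
    return f'{run["source"]}, line {row["_source_line"]}'


run_log = {}
warehouse = load_with_lineage("orders.csv", [{"id": 7}, {"id": 8}], run_log)
origin = trace(warehouse[1], run_log)  # "orders.csv, line 2"
```

Two extra columns per row is a small price compared with reverse-engineering, years later, which load a suspicious record came from.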

Without proper metadata, maintaining a “black box” system is inefficient and expensive, and it leaves customers frustrated. For example, if it takes two weeks for a problem to be resolved (say, from a customer noticing a problem on a report all the way to a developer fixing it and an operator re-running the process), it is expensive. We are talking real money: salaries (for all the people fixing the problem) and lost opportunities (from delayed or incorrect decisions caused by the problem).
