Focus on ETL — Introduction

As a consultant, most of my time is spent working in, on, and around ETL projects and systems. It is a growing niche that is very useful and makes a lot of data warehousing and analysis possible. I enjoy the work and it pays pretty well.
As time has gone on, I have been on the lookout for some type of “first principles” for ETL, some method behind the madness. At first, I figured I didn’t have the right website or book and just needed to dig further. However, I am at the point now where I think there just isn’t a consistent definition of exactly what ETL is.

Some of this is probably because ETL is dominated by consultants, and when you are paid by the hour there is no need to speed things up with total consistency. However, I think the deeper reason there is no common definition is that ETL is a unique discipline. So, as the first part of my “focus on ETL”, I want to try to pin down some things about the discipline and how I see it.

A Very Visible Definition

Wikipedia defines ETL as:

… a process in data warehousing that involves

  • extracting data from outside sources,
  • transforming it to fit business needs, and ultimately
  • loading it into the data warehouse.

This isn’t a terrible start, although I believe that it is too narrow and only reflects the current state of the industry from the point of view of tool vendors and consultants. This is really limiting, and doesn’t fully describe everything that ETL seems to cover.
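To make the three steps in that definition concrete, here is a minimal sketch in Python. Everything in it is my own illustration rather than part of any standard: the sample CSV, the `customer_totals` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export from some outside system.
SOURCE_CSV = """customer,amount
 alice ,10.50
BOB,3
alice,4.50
"""

def extract(text):
    """Extract: read rows from the outside source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize customer names and total the amounts per customer."""
    totals = {}
    for row in rows:
        name = row["customer"].strip().lower()
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return sorted(totals.items())

def load(records, conn):
    """Load: write the transformed records into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", records
    )
    conn.commit()

def run_etl(text, conn):
    """Run the three steps in order and return what ended up in the table."""
    load(transform(extract(text)), conn)
    query = "SELECT customer, total FROM customer_totals ORDER BY customer"
    return conn.execute(query).fetchall()
```

Calling `run_etl(SOURCE_CSV, sqlite3.connect(":memory:"))` runs the whole pipeline against a throwaway database. Even in a toy like this, most of the real decisions (how to clean names, how to aggregate, what the target table looks like) sit outside the three verbs themselves.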

My Definition

After a lot of thought, I have come up with a definition of my own:

ETL is the art, science, and magic of building coherent, useful information from disparate data sources. It encompasses everything from:

  • The understanding and use of source systems and formats.
  • The code and logic needed to manipulate and transform the data.
  • The medium of transformation.

In other words, ETL is data-oriented programming.

I don’t consider ETL to be an activity. Instead, I consider it to be a technical discipline that requires a lot of training, effort, experience, flexibility, and creativity. It spans multiple platforms, languages, skillsets, and disciplines, and it delivers something unique and not well understood. While ETL is at the nexus of several different ideas, it stands on its own as something that is very useful.
Well, enough for now. In my next few posts, I will discuss the implications of data-oriented programming and look at how ETL compares to other areas of computer science and information technology.
