September 19th, 2006 by morgan
As I was browsing this morning I found a couple of intertwined nuggets of wisdom from Chris Dowse (CEO of Neochange) and Dogbert (yes, Dilbert’s dog).
Chris wrote a nice article on Sandhill.com titled “Beware of the Missing Value Dialogue”, which discusses a subtle but important communication issue: connecting with your existing customers. All too often, we are focused on growth, which means appealing to new customers. However, if our existing customers aren’t happy then we have to grow faster just to keep even. We are forced to tread water faster and faster just to keep afloat. You might call this the AOL effect (although not the AIM effect).
Just after I read this article I caught the RSS feed of Dilbert’s latest missive on the Blackberry.
Although these might seem very different, both the CEO and the comic strip highlight how easy it is to let things get in between us and what is most important. As a consultant, I am forced to be a lot more focused on my customers than I was as a permanent employee. However, communication and customer service are still a struggle, every single day. The “now” distracts us from the “eternal”, and most of the time we don’t even notice it.
It might be form, function, ideas, or technology, but whatever it is it saps our energy and forces us to stay in place instead of heading for shore. Let’s keep focused on who we are and where we need to go, for the sake of ourselves and our customers!
Posted in People, Practices, Understanding | No Comments »
September 18th, 2006 by morgan
James Taylor (no, not that James Taylor, the other one) had an interesting article about SOAs, agility, and architecture. While the article is a riff on another article (which makes this a meta-riff, I suppose), it got me thinking about the development lifecycle.
I think it is very ironic that in ETL and data-oriented programming we run into the same contradictions all the time:
- Development time is the smallest cost in the entire process in terms of time, resources, and money.
- Software development is scrutinized to death.
- On-time delivery is significantly more important than long-term cost savings, even if it impacts long-term functionality.
Now, I don’t think this is done out of malice or spite for IT. A lot of it may simply be because development is the one part of the development lifecycle that can be influenced by the project sponsor. However, as practitioners we need to make sure that information architecture is focused on consistently delivering tangible value to our organization. This means effectively communicating the true overall cost for systems development and making sure that the organization as a whole understands what we are doing.
Posted in ETL, Information Architecture, People, Practices | No Comments »
September 15th, 2006 by morgan
Computerworld has a great article on dealing with projects in jeopardy. While this is more management than architecture, anyone who has been involved with a bad project will appreciate this practical advice.
Most interesting is that while there are 10 steps covering everything from risk assessment to management and reporting, there isn’t one for assigning blame. Focusing on assigning blame is like trying to recover the sunk cost in a bad investment. While some projects seem to have the ability to make time stop, in reality time only moves forward, and the best thing we can do is try to learn from our mistakes and make things better in the future.
Posted in People, Practices, Relationships | 1 Comment »
September 15th, 2006 by morgan
Classifying ETL
It will help to take a bit of time to discuss how software development is classified. Historically, classification of software development was done around methodology and/or representations. Some common ways to look at development are:
Looking at things through the lens of methodology is a more academic view of things, and more prevalent in the early days of computing.
Another Way to Look at Things
Practitioners often look at things a bit differently, often through the functionality of what is being created. Some ways to look at development this way are:
- Web Programming (like PHP, AJAX, DHTML, etc.)
- Glue Programming (Perl, Python, Tcl, and too many scripting languages to list)
- UI Development (Tk, XUL, UIML)
- Mathematics (MatLab, SAS, R, many others)
This is a more practical view of things, more prevalent today, especially in the IT world.
Where We Fall
ETL is a function, so it is most easily classified in a functional way. However, Data Oriented Programming is more of a methodology (although more of a hybrid than anything else). So, it is tough to encompass this in just one category. It probably makes most sense to say that ETL should be viewed from the functional point of view, while the things that are used to build ETL processes should be viewed from a methodological point of view.
Next in the “Focus on ETL” series we will be looking at what goes into an ETL process.
Posted in ETL, Information Architecture, Transformation | 1 Comment »
September 11th, 2006 by morgan
As a consultant, most of my time is spent working in, on, and around ETL projects and systems. It is a growing niche that is very useful and makes a lot of data warehousing and analysis possible. I enjoy the work and it pays pretty well.
As time has gone on, I have been on the lookout for some type of “first principles” for ETL, some method behind the madness. At first, I just figured I didn’t have the right website or book and just needed to dig further. However, I am at the point now where I think there just isn’t a consistent definition of exactly what ETL is.
Some of this is probably because ETL is dominated by consultants, and when you are paid by the hour there is no need to speed things up with total consistency. However, I think that there is no common definition for ETL because it is a unique discipline. So, as the first part of my “focus on ETL”, I want to try and pin down some things about the discipline and how I see it.
A Very Visible Definition
Wikipedia defines ETL as:
… a process in data warehousing that involves
- extracting data from outside sources,
- transforming it to fit business needs, and ultimately
- loading it into the data warehouse.
This isn’t a terrible start, although I believe that it is too narrow and only reflects the current state of the industry from the point of view of tool vendors and consultants. This is really limiting, and doesn’t fully describe everything that ETL seems to cover.
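For reference, the narrow three-step view in the Wikipedia definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular tool: the sample CSV data, the `customer_totals` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; stands in for an "outside source".
SOURCE_CSV = """customer,amount
alice,10.50
bob,3.25
alice,4.50
"""


def extract(text):
    """Extract: read raw rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: total the amounts per customer (an assumed business need)."""
    totals = {}
    for row in rows:
        name = row["customer"].strip().title()
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(records, conn):
    """Load: write the transformed records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", records
    )
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order and return what ended up in the table."""
    load(transform(extract(text)), conn)
    query = "SELECT customer, total FROM customer_totals ORDER BY customer"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # SQLite in memory is a stand-in for a real data warehouse.
    print(run_etl(SOURCE_CSV, sqlite3.connect(":memory:")))
```

Even in a toy like this, most of the real decisions (what the source format means, which business rule to apply, where the transformation runs) sit outside the three function names.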
My Definition
After a lot of thought, I have come up with a definition of my own:
ETL is the art, science, and magic of building coherent, useful information from disparate data sources. It encompasses everything from:
- The understanding and use of source systems and formats.
- The code and logic needed to manipulate and transform the data.
- The medium of transformation.
In other words, ETL is data-oriented programming.
I don’t consider ETL to be an activity. Instead, I consider it to be a technical discipline that requires a lot of training, effort, experience, flexibility, and creativity. It spans multiple platforms, languages, skillsets, and disciplines and delivers something unique and not well understood. While ETL is at the nexus of several different ideas, it stands on its own as something that is very useful.
Well, enough for now. In my next few posts, I will discuss the implications of data-oriented programming and look at how ETL compares to other areas of computer science and information technology.
Posted in ETL, Practices | No Comments »