May 14th, 2006 by morgan
As I mentioned earlier, in 2005 I was fortunate enough to design a data quality solution for a very large organization from the ground up. That experience is the frame of reference I am using for this series.
Imagine this scenario:
Due to forces outside their control (litigation, regulation, or just changes in the operating environment), the leadership team needs to personally vouch for the integrity of the data across all systems for your organization. They need to be able to accurately report the current state of information quality, where it is going, and where it has been. If a full accounting can’t be determined immediately, then they need to be able to provide as much quality information as possible now, and provide a realistic (but short) timeline for universal reporting. This needs to be accomplished with minimal head count and without affecting current operations.
Could your organization do this now? Does it want to? For a lot of organizations, this is a doomsday scenario, something that makes people clamp both hands over their ears, sing la-la-la-la-la, and think happy thoughts until it all goes away. However, every day we see things in the news where information quality problems cause big, thorny issues for otherwise well-meaning organizations. We have seen Congress get involved with financial information (through Sarbanes-Oxley), medical information (with HIPAA), and governmental oversight (through the Data Quality Act). This isn’t a crazy request.
I have been personally involved in dealing with issues like this, where sudden, critical problems around information quality and security caused our organization to completely change how it operates. I have seen it from both sides of the coin, working both in management and in technology. I can tell you that dealing with these issues isn’t pretty, or cheap. However, it is doable, and the more proactive you are, the easier it is to deal with unforeseen circumstances.
More to come …
technorati tags: information quality, data quality, SOX, compliance, HIPAA, data
Posted in Systems Integration, Information Quality, People | No Comments »
May 11th, 2006 by morgan
ETLGuru has posted a couple of articles on some of the same subjects we have been touching on.
I would recommend taking a look at:
(Ironically, I have a half-finished posting titled “What is ETL”; he beat me to the punch!)
The first article is a pretty good rundown of ETL for the data warehouse, although I would probably expand the scope of what exactly ETL covers. I don’t think that ETL is a data-warehouse-specific activity, although it is often focused around warehousing. Personally, I think a lot of people are doing ETL development, either on their own or with commercial tools, but call it something different, like “report writing”, “scripting”, or “database maintenance.” ETL is far more than populating a warehouse with a commercially produced tool like Ab Initio, Ascential, or Informatica.
One thing I take issue with is the assertion (in the latter article) that an ETL process never creates new data. I would argue that a well-architected process should create metadata, telling unambiguously what was done, how it was accomplished, and what was affected. Not only is it critically important data in its own right, but metadata also becomes disproportionately more valuable over time.
Why? Because the true cost of operating any system is calculated over the time of operation, not just the time of development. With many systems of scale, it isn’t possible to see a problem and follow it back to where it came from (sometimes called traceability). This is because the longer a process runs, the less an organization understands what it really does. The people who created it move on to other tasks and memories fade. Instead, we have to use the clues and breadcrumbs that were left for us to follow. Often (especially with legacy sources), metadata is the first and only clue we have on how to fix things.
Without proper metadata, maintaining a “black box” system is inefficient, expensive, and leaves customers frustrated. For example, if it takes two weeks for a problem to be resolved (say, from a customer noticing a problem on a report all the way to a developer fixing the problem and an operator re-running the process), it is expensive. We are talking real money, costing an organization salaries (for all the people fixing the problem) and lost opportunities (for delayed or incorrect decisions due to the problem).
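To make the idea of process metadata a little more concrete, here is a minimal sketch in Python of an ETL step that records what was done, how it was done, and what was affected. Everything named here (`StepRecord`, `run_step`, `clean_customer`, and the field names) is hypothetical and invented for illustration; it is not taken from the articles mentioned above or from any particular ETL tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class StepRecord:
    """Metadata for one run of one ETL step (hypothetical field names)."""
    step_name: str         # what was done
    description: str       # how it was accomplished
    rows_in: int           # what was affected: rows read
    rows_out: int          # rows passed along
    rows_rejected: int     # rows dropped by the step
    started_at: datetime
    finished_at: datetime


def run_step(name, description, transform, rows, log):
    """Apply `transform` to each row and append a StepRecord to `log`.

    Assumption for this sketch: a transform returns None to reject a row.
    """
    started = datetime.now(timezone.utc)
    rows = list(rows)
    output = []
    rejected = 0
    for row in rows:
        result = transform(row)
        if result is None:
            rejected += 1
        else:
            output.append(result)
    log.append(StepRecord(
        step_name=name,
        description=description,
        rows_in=len(rows),
        rows_out=len(output),
        rows_rejected=rejected,
        started_at=started,
        finished_at=datetime.now(timezone.utc),
    ))
    return output


def clean_customer(row):
    """Example transform: trim and title-case the name, reject blank names."""
    name = row.get("name", "").strip()
    if not name:
        return None
    return {**row, "name": name.title()}


audit_log = []
source = [{"name": " alice "}, {"name": ""}, {"name": "BOB"}]
cleaned = run_step(
    "clean_customer",
    "trim and title-case names; reject blank names",
    clean_customer,
    source,
    audit_log,
)
```

The point of the sketch is that the audit record is written by the same code path that does the work, so the breadcrumbs exist even after the people who built the process have moved on. A real process would persist these records somewhere durable rather than keeping them in a list.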
technorati tags: etl, metadata
Posted in ETL, Information Architecture, Practices, Understanding, Metadata | No Comments »
May 9th, 2006 by morgan
CIO Insight has published an interesting interview with Ralph Szygenda (please don’t ask me to pronounce it), the Chief Information Officer of General Motors.
One of his major points is that his team works as IT Brokers, not as technologists. I think that the term broker is right on for the knowledge worker, especially in today’s environment. My personal experience has been in corporate IT for the last 10 years or so, and I have seen a marked shift in how things work within the business world. At one time, the thought was to hire as many technical people as possible, throw Twinkies and computers at them, and start reaping the benefits. Unfortunately, it didn’t quite work out …
Working with data on a large scale is inherently a people problem, as data is only a representation of the organization that owns it. Within the realm of the data warehouse and information architecture, it is clear that the people who deliver the most value (and hence will be most successful) are not necessarily the most technical. Szygenda talks about “insourcing” several thousand employees because they provided a direct benefit to the operation of the business: not because they had great skills or worked with the coolest technology, but because they helped the organization run better.
The role of IT (and of the Data Warehouse) isn’t going away; it is becoming more focused on the things that pay the bills. The abilities to grok C code, write clever shell scripts, or build elegant CSS files are all important, but only within the context of the business that they serve. Technology (and technologists) come and go, but a useful person is valuable over the long term.
Incidentally, one of the reasons I love working in ETL and Data Warehousing is the people. IMHO, the people in data warehousing are normally generalists, and they often drift into the field by accident. When you sit down with them and ask about their background, you will hear something like, “After I moved back from Botswana I went back to college and got a job doing X. To do my job better I started playing around with spreadsheets, which led to ….” These types of people are the perfect brokers, because they are able to make things happen by creating, stealing, gluing together, and rearranging parts until the problem is solved. I think this is one reason that data warehousing is seen as mission critical, while other parts of IT are increasingly outsourced.
Posted in In the News, People, Practices, Relationships, Understanding | 1 Comment »
May 5th, 2006 by morgan
I just created a page for PSP-IQ on the wiki. If you are working in ETL or Information Quality, I would really recommend familiarizing yourself with the paper and what it has to say. Very interesting reading.
This page is a decent summary of this influential theory and how it works, but not a replacement for reading the paper yourself! Also, there is some additional information for practitioners based on my own experience in applying PSP-IQ in a corporate environment.
Posted in Information Quality | No Comments »
May 1st, 2006 by morgan
William McKnight brought up an interesting point about life in the information business: namely, that access needs to be balanced with accountability for a system to be secure. However, I think there is one thing that is missing in this analysis, and it is true for any information delivery system:
The most important part of any system is the people.
No amount of technology, policy, or security will keep a system clean unless the people running it are above board. Interestingly, living in Austin, I have heard some details about this case through the grapevine, as the BI community just isn’t that big. Evidently, this case was broken when an alert employee was looking at their P&L statement and questioned some resources. They were pooh-poohed by the data warehouse manager but wouldn’t let it go. This person continued to press things and chased after the answers until the whole thing began to unravel. I think that this just solidifies my view on the importance of people.
Posted in People, Understanding | No Comments »

Architected.info is a web site dedicated to information architecture, focusing on transformation and understanding. We focus on these categories through the lens of organizational dynamics, looking at people, practices, and relationships.
Morgan Goeller is the author and maintainer of this website. He has worked as an architect and engineer, specializing in software development, web applications, database engineering, ETL, and information quality.