The Cost of Information Quality (per record)

June 9th, 2006 by morgan

In “They Need It All”, Frank Dravis brings up an interesting point about information quality and scale.

A common debate in project management circles is how much data quality functionality is needed by “lower end” users. In this case, we define a low-end user based on data volumes. More specifically, a person who works for a firm with a million or fewer records that need to be processed.

Do smaller organizations need data quality less? Dravis doesn’t think so …

It is a misperception that just because a firm has a relatively small data set they have small processing requirements. Oh to the contrary. It is true the processing (volume) demands and the value derived through cleansing software is dramatically different between a Fortune Five global enterprise, and a small direct marketing firm, but the sophistication of what they need to do to the data is the same.

Exactamundo! If smaller organizations need the same level of data sophistication for less data, then they actually need greater overall information quality (at least when measured by volume). If your decision-making ability is impaired when 10% of your data is invalid, it is a lot easier to get there if you are using 10 records than if you are using 10 million.
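To put rough numbers on that last point, here is a minimal sketch. It assumes (my assumption, not Dravis's) that each record is invalid independently with the same probability, and it uses a made-up 5% per-record error rate. The function name and parameters are purely illustrative. Under those assumptions, a 10-record data set crosses the 10% line about 40% of the time, because a single bad record is enough, while larger data sets almost never do.

```python
import math


def prob_threshold_breached(n_records, p_invalid, threshold_percent=10):
    """Probability that at least threshold_percent of n_records are invalid.

    Assumes each record is invalid independently with probability p_invalid
    (a simple binomial model, used here only for illustration).
    """
    # Smallest count of bad records that reaches the threshold (integer ceiling).
    k_min = (n_records * threshold_percent + 99) // 100
    log_p = math.log(p_invalid)
    log_q = math.log(1.0 - p_invalid)
    total = 0.0
    for k in range(k_min, n_records + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_choose = (math.lgamma(n_records + 1)
                      - math.lgamma(k + 1)
                      - math.lgamma(n_records - k + 1))
        total += math.exp(log_choose + k * log_p + (n_records - k) * log_q)
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, prob_threshold_breached(n, 0.05))
```

The exact figures depend entirely on the assumed error rate, but the shape of the result does not: the smaller the data set, the more a handful of bad records can swing the whole picture.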

The question then becomes, how do you serve this market? Hmmmmmmmmmm …


The Challenges of Real-Time

June 9th, 2006 by morgan

A lot of my recent work has been in real-time (actually near real-time) data warehousing. There are some real challenges for ETL and information quality when moving towards a real-time environment. Everything seems to become more difficult, and at times the constraints become almost unbearable to work with. You really, really, really need a real-time system in order to justify building one, especially from a data-centric point of view.

What got me writing about this was reading an article reporting that “some Cingular subscribers endure 4-hour-plus outage” (and the fact that this isn’t the first time this has happened). I knew exactly what the Cingular representative was talking about when I read this quote …

“There’s a database that has all the customer numbers and somehow, we don’t know why at this point, about 10 percent (of customers in the area) were prohibited from making or receiving calls,” Merriman said.

The big issues around real-time systems are in dealing with emergence within the system. Things get into an unexpected state, and it is very difficult to figure out why, especially after the fact. This is because when you are running in real time:

  1. Resources are at a premium, and often this means that only enough data is kept in order to process what is available right now.
  2. Data handling is set up to ensure that the system doesn’t break, not to ensure optimal quality.
  3. Downtime usually means there is no information coming in. It is very hard to know what you don’t know.
  4. Breakage is normally catastrophic and the priority is on getting things going again, not performing detailed analysis on what happened.
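As a rough illustration of the first point, here is a minimal sketch of a fixed-size event buffer. The class name, the capacity, and the event values are all made up for this example, and it does not describe how any particular system works. Once the buffer is full, every new event silently pushes out the oldest one, so the history you would want for a post-mortem is already gone.

```python
from collections import deque


class RollingWindow:
    """Keeps only the most recent events (hypothetical example class)."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item when a new one arrives.
        self.events = deque(maxlen=capacity)
        self.dropped = 0

    def record(self, event):
        # Count what gets pushed out so the loss is at least visible.
        if len(self.events) == self.events.maxlen:
            self.dropped += 1
        self.events.append(event)


if __name__ == "__main__":
    window = RollingWindow(capacity=3)
    for event_id in range(10):
        window.record(event_id)
    print(list(window.events), window.dropped)
```

Counting the dropped events costs almost nothing, and it at least tells you how much you no longer know.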

Because you have a lot less information than in a batch-processing type system, it is a lot harder to figure out what is going on. Good luck to the Cingular engineers in preventing this type of thing in the future ;-)


Information Quality Saving Lives

June 8th, 2006 by morgan

A few days back, The Register wrote about some of the severe information quality problems occurring within police departments across England. In auditing crime reports, it found that more than 1 out of 3 departments did not meet statutory requirements and that individual crime reports had initial error rates of 15-86%, and error rates of 22% after being reviewed by a supervisor.

I credit the Brits for at least trying to address the problem, and doing it in a methodical and well-thought-out manner. Instead of sanctions, they are trying to ensure that “a reverence for data quality must become part of the culture of the police”. I have absolutely no information about this situation other than what I have read, but I would suggest that a system that allows error rates above 20% after review may need to be completely re-worked.

An interesting comparison can be made to the progress being made by William J Bratton and the Los Angeles Police Department with real-time analytics. In the last few years they were able to reduce property crimes by 26%, even with fewer police officers per capita than most other cities. An article in Baseline talks about how the LAPD had a similar problem with information quality in the past, but has really been able to turn things around, to the point where analytics have become a true crime-fighting tool.


On Shadows and Spreadsheets

June 6th, 2006 by morgan

I read a couple of interesting articles this week that really helped to crystallize some thoughts I was having …

1. Slashdot had an article about the $10 billion lost annually due to spreadsheet error and fraud.

2. Rick Sherman wrote an interesting article about “shadow systems” in this month’s DM Review.

He defines shadow systems as:

“… groups of spreadsheets and local, customized databases - often Microsoft Access and statistical databases - created by business groups to gather data for their users. While these systems provide exactly the information that business users are asking for, they are rarely part of [a] .. strategy.”

Shadow systems are an undeniably human response to very real, immediate needs. Something needs to get done, it needs to get done efficiently, and it needs to be done with the tools at hand. All too often, an organization will approach its IT group with a need, only to be rebuffed because it doesn’t fit easily into the existing architecture or there is a conflict around priorities. For example, a small department has a need that is critical to its operating function, but it cannot get the attention or the resources of IT, which is focused on meeting the needs of the overall business. In this case, a shadow system is the only logical way to meet the department’s needs in a timely manner.

A lot of IT people would say that shadow systems are evil, a waste of resources, and should be stamped out of any organization. I would take issue with this characterization. Shadow systems are a legitimate strategy in their own right. However, the organization as a whole has to understand exactly what it is getting into. Let’s look at some of the true characteristics of a shadow system:

Pros

  1. Low initial cost.
  2. Very fast to deploy.
  3. Extremely customized solutions.
  4. Empowers customers and users of data.
  5. Great way to prototype potential solutions to business needs.

Cons

  1. Extreme (almost fatal) issues around quality and consistency ($10 billion a year!).
  2. Duplicated effort across (or even within) organizations.
  3. Difficulty in oversight and accountability.
  4. Maintenance and upkeep costs are undefined and are often much larger than the implementation cost.
  5. True cost is hidden from the organization at large.

Notice that almost all the “pros” look great to business users, who don’t really consider the “cons”. At the same time, all the “cons” send IT people screaming and running for the hills, completely ignoring most of the “pros”. These are the seeds of conflict within an organization. Personally, I think the most interesting problems to solve lie at the intersection of two different domains. Shadow systems lie exactly at the border between IT and the organization-at-large. Fun stuff!

A truly effective information architecture has to consider shadow systems as a legitimate part of the overall data solution. The need for a shadow system should be used as an opportunity for IT to partner with an organization to build something really useful. At the same time, shadow systems should be developed deliberately and not allowed to evolve (or devolve) without direction. I have developed a checklist of the necessary ingredients for a shadow system to be successful:

Morgan’s Laws of Shadow Systems

To be viable over the long-term, a shadow system MUST have …

  1. Organizational sponsorship, including IT oversight and support.
  2. A limited amount of pre-defined resources for personnel, hardware, software, and licensing.
  3. A well-thought-out strategy for risk management, especially considering information quality and regulatory issues (SOX, HIPAA, etc.).
  4. A complete lifecycle, including firm dates for deployment, integration with the overall architecture, and retirement.

