I have been working on some more detailed articles for the wiki to help illustrate some ideas about information quality. While I don’t want to just duplicate those articles here, I thought I would post some of the material on the blog and get some feedback.
I am currently working on the Information Quality Pyramid, which discusses the various components that go into improving information quality across an organization:
The pyramid is made up of several parts, each of which is important in its own right. However, the base components (in blue and green) have the interesting combination of being very important, relatively inexpensive, and totally unglamorous.
Understanding Your Organization – The single most important thing that you can do to ensure success in any information quality effort. Without a solid understanding of how your organization works it is virtually guaranteed that you will not be able to deliver the solution your customers need. This (coupled with the need for extreme customization) is one of the reasons that it is very difficult to outsource this type of work.
Architecture and Design Practices – To put it bluntly, if you build your information architecture in an inconsistent manner, then you have to expect inconsistencies in its output. These inconsistencies become quality-related issues very quickly. If your architecture proactively addresses consistency (or at least mitigates the issues around it), you can dramatically improve the quality of the information that you produce.
Automation – The key to high-value, high-quality information architecture is automating everything possible. This is because:
- Moore’s Law (the number of transistors on a chip doubling roughly every two years) keeps making computerized processes faster. It is pretty tough for humans to keep up.
- Humans make mistakes.
- In order to automate a process, it has to be understood by more than just the designer or the developer.
Sanity Checks – The easiest and most cost-effective way to catch issues before they become problems.
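As an illustration, here is a minimal Python sketch of a few sanity checks run against a batch of records before it is loaded. It assumes the batch is a list of dictionaries; the function name `sanity_check`, the field names, and the specific checks (minimum row count, required fields populated, unique business key) are hypothetical choices for the example, not a prescribed set.

```python
def sanity_check(rows, min_rows, required_fields, key_field):
    """Return a list of human-readable problems; an empty list means the batch passed.

    Hypothetical example: the checks below are illustrative, not a standard set.
    """
    problems = []
    # Check 1: did we receive at least the volume we expect?
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    # Check 2: is every required field populated (not None or empty) on every row?
    for field in required_fields:
        missing = sum(1 for r in rows if r.get(field) in (None, ""))
        if missing:
            problems.append(f"{missing} row(s) missing '{field}'")
    # Check 3: is the business key unique across the batch?
    keys = [r.get(key_field) for r in rows]
    duplicates = len(keys) - len(set(keys))
    if duplicates:
        problems.append(f"{duplicates} duplicate value(s) in '{key_field}'")
    return problems


# Usage: a small batch with one blank customer and one repeated order_id.
batch = [
    {"order_id": 1, "customer": "A", "amount": 10.0},
    {"order_id": 2, "customer": "", "amount": 5.0},
    {"order_id": 2, "customer": "C", "amount": 7.5},
]
for problem in sanity_check(batch, min_rows=3,
                            required_fields=["customer", "amount"],
                            key_field="order_id"):
    print(problem)
```

Each check is cheap and generic enough to run on every load; anything that depends on what a particular process is supposed to do belongs in process testing instead.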
Data Profiling – The only way to understand your information is to know your data. Intimately. Regularly. Historically. Profiling takes a generic look at an arbitrary dataset and discovers important statistical information about it. Profiling is by far the cheapest and most reliable way of examining data (think of it as an expanded sanity check).
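A minimal Python sketch of generic profiling, assuming the dataset is a list of dictionaries with arbitrary columns. The `profile` function and the statistics it gathers (row count, nulls, distinct values, and min/max/mean for purely numeric columns) are illustrative choices; a real profiling tool would collect considerably more.

```python
import statistics


def profile(rows):
    """Compute simple per-column statistics for an arbitrary list of records.

    Hypothetical helper for illustration; nothing about the columns is assumed up front.
    """
    # Discover every column that appears in any record.
    columns = sorted({col for row in rows for col in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        stats = {
            "count": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Only add numeric statistics when every non-null value is a number (bools excluded).
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            stats["min"] = min(numeric)
            stats["max"] = max(numeric)
            stats["mean"] = statistics.mean(numeric)
        report[col] = stats
    return report


# Usage: profile a tiny dataset with one missing sales figure.
data = [
    {"region": "east", "sales": 120},
    {"region": "west", "sales": 80},
    {"region": "east", "sales": None},
]
for column, stats in profile(data).items():
    print(column, stats)
```

Because the function knows nothing about the data in advance, the same code works on any dataset. Saving each report with a timestamp is one simple way to cover the "regularly" and "historically" parts: comparing today's profile with last week's can surface drift that no single snapshot shows.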
Process Testing – Instead of looking at a dataset in a generic way, process testing looks at it in a very specific way. These should be customized, automated tests whose results are specific to the process being tested. Because of the level of customization and effort involved, process testing is significantly more expensive than profiling or sanity checks.
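By contrast with a generic profile, a process test encodes knowledge of one specific process. The Python sketch below assumes a hypothetical process, `daily_totals`, that rolls order lines up into one total per day; the checks in `check_daily_totals` (the grand total reconciles and no day goes missing) only make sense for that transformation. All names here are made up for the example.

```python
from collections import defaultdict


def daily_totals(order_lines):
    """Hypothetical process under test: roll order lines up to one total per day."""
    totals = defaultdict(float)
    for line in order_lines:
        totals[line["day"]] += line["amount"]
    return dict(totals)


def check_daily_totals(order_lines, output, tolerance=1e-9):
    """Process-specific checks that only make sense for daily_totals."""
    failures = []
    # Reconciliation: the rollup must not create or lose any amount.
    source_total = sum(line["amount"] for line in order_lines)
    output_total = sum(output.values())
    if abs(source_total - output_total) > tolerance:
        failures.append(f"totals differ: source {source_total}, output {output_total}")
    # Completeness: every day present in the source must appear in the output.
    missing_days = {line["day"] for line in order_lines} - set(output)
    if missing_days:
        failures.append(f"days missing from output: {sorted(missing_days)}")
    return failures


# Usage: run the process, then test its output against its input.
lines = [
    {"day": "2024-01-01", "amount": 10.0},
    {"day": "2024-01-01", "amount": 2.5},
    {"day": "2024-01-02", "amount": 4.0},
]
result = daily_totals(lines)
print(check_daily_totals(lines, result))  # prints []
```

Writing checks like these requires knowing what the process is supposed to do, which is why process testing costs more than profiling.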
Human Intervention – Anything that involves humans, from adjusting processes already in production, to performing manual analysis to resolve concerns, to writing new code. Think of it as if all information quality work were outsourced to a third-party company and every personnel cost came directly out of your budget. That is the true cost of IQ; people just tend to see it in a more abstract sense.
The one category that I suspect people will think is missing here is metadata. I think metadata is an incredibly important part of information quality, but I tend to value it in its most concrete form rather than in the abstract. I will go into this in more detail in the wiki article.
Any feedback would be most appreciated!