Information Quality in Black and White
One of the real challenges of information quality is that the field is still very abstract. In the academic world, the theories (like PSP/IQ) are still being written and discussed. In practice, this means that there isn’t a standard way of doing things. Or, to be more precise, everyone has a “standard” way of doing things; they are just all different. So let me add my $0.02. Perhaps it will help someone in some general way.
Previously, we talked about the semantic and statistical approaches to information quality: two distinctly different ways of trying to do the same thing. How can we reconcile these two ideas and actually accomplish something in the real world? The best way I know is to fall back on some well-established practices and adapt them to our needs. Although we are working with data instead of applications, I think these two approaches correspond directly to principles from software engineering. For most applications, there are two types of testing:
- White box testing uses an intimate knowledge of the internals of an application and tests to make sure everything works as expected.
- Black box testing uses an expectation of the behavior of an application and tests to make sure that it does what it should. A black box test will not test anything internal to the application, just the inputs and outputs.
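To make the distinction concrete, here is a minimal sketch built around a hypothetical `normalize_zip` function (not from any real library) that trims a US ZIP code to five digits. The black-box test uses only the specification ("valid input in, five-digit string out"), while the white-box test knows the function has an internal zero-padding branch and targets it directly.

```python
def normalize_zip(raw: str) -> str:
    """Hypothetical example: normalize a US ZIP code to five digits."""
    digits = raw.strip().split("-")[0]  # drop a ZIP+4 suffix such as "-1234"
    if len(digits) < 5:
        # Internal detail: spreadsheets often drop leading zeros ("2134" -> "02134").
        digits = digits.zfill(5)
    return digits


def black_box_test() -> None:
    # Uses only the specification: valid input in, five-digit string out.
    for raw in ["90210", " 10001 ", "30301-1234"]:
        result = normalize_zip(raw)
        assert len(result) == 5 and result.isdigit(), raw


def white_box_test() -> None:
    # Knowledge of the internals lets us aim at the zero-padding branch directly.
    assert normalize_zip("2134") == "02134"
    assert normalize_zip("02134-0001") == "02134"


black_box_test()
white_box_test()
print("both tests passed")
```

Note that none of the black-box inputs ever reach the padding branch; only a tester who knows the branch exists thinks to probe it.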
These two types of tests do different things and work in different ways. As one obscure but well-written and concise tutorial points out:
- White box testing does quality control, while black box testing does quality assurance.
- Black box testing finds sins of omission, while white box testing finds sins of commission.
- Black box testing can be started as soon as the specifications are available, while white box testing must wait until the code is written.
- Black box testing is a lot cheaper than white box testing.
- Both types of testing are needed in order to truly verify that things are working properly.
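In my opinion, a semantic approach to information quality is the equivalent of a white box test, since it depends on understanding what the data means, much as a white box test depends on knowing the code. Conversely, the statistical approach is the equivalent of a black box test: it looks only at the observable patterns in the data.
Next in our series, we will explore the semantic approach to information quality with practical examples.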
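As a rough illustration of this mapping, the sketch below runs both kinds of checks over a handful of hypothetical order records. The field names, the "shipped before ordered" rule, and the 1.5 z-score threshold are all illustrative assumptions, not part of any established method. The semantic check relies on knowing what the fields mean; the statistical check treats the `amount` column as opaque numbers and flags values far from the mean.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical order records; field names and rules are illustrative assumptions.
orders = [
    {"id": 1, "ordered": date(2006, 7, 1), "shipped": date(2006, 7, 3), "amount": 25.0},
    {"id": 2, "ordered": date(2006, 7, 2), "shipped": date(2006, 6, 30), "amount": 30.0},
    {"id": 3, "ordered": date(2006, 7, 2), "shipped": date(2006, 7, 5), "amount": 27.5},
    {"id": 4, "ordered": date(2006, 7, 4), "shipped": date(2006, 7, 6), "amount": 2900.0},
]


def semantic_check(rows):
    """White-box style: apply a rule that comes from knowing what the fields mean."""
    return [r["id"] for r in rows if r["shipped"] < r["ordered"]]


def statistical_check(rows, field, z_limit=1.5):
    """Black-box style: flag values far from the mean, without knowing what they mean."""
    values = [r[field] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    # The low threshold suits this tiny sample; larger data sets usually use more.
    return [r["id"] for r in rows if abs(r[field] - mu) / sigma > z_limit]


print(semantic_check(orders))               # [2]
print(statistical_check(orders, "amount"))  # [4]
```

Each check catches a problem the other misses: the impossible ship date in order 2 looks statistically ordinary, and the unusual amount in order 4 breaks no rule we wrote down. That is the same reason both kinds of testing are needed.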
August 4th, 2006 at 12:27 pm
[…] When an organization begins a concerted effort to improve its information quality, often it gets stuck in trying to figure out exactly where to start. Previously, we had discussed the semantic and statistical approaches to information quality and linked them to black box and white box testing (you may want to take a look at these if you aren’t familiar with the subjects, as these are the basis for this article). […]
August 19th, 2006 at 4:25 am
[…] With this in mind, is semantic definition the most efficient way to improve information quality? Is a statistical definition the most descriptive way to understand information quality? We will explore both of these methods in the next part of this series. […]