Two Methods for Defining Information Quality

July 31st, 2006 by morgan

In Information Science today there are two competing methods for indexing information: semantics and statistics. While this may not seem to have a lot to do with information quality, bear with me and I promise I will link them up (eventually). Both methods do approximately the same job: they allow information to be read and manipulated by machines on a grand scale. The difference is in how this is done.

  • A semantic approach would have the author define concepts and relationships ahead of time. You can see some examples in this tutorial, as they are long and would be difficult to reproduce here. The Semantic Web would be a good example of this methodology.
  • A statistical approach would simply look at the text that was available and try to determine what is there and how it relates to other things through textual analysis and aggregation. Google is a good example of the use of this approach.
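
To make the contrast a little more concrete, here is a minimal Python sketch of the two styles. Everything in it is an assumption for illustration: the triples, the sample documents, and the helper names `related` and `cooccurring` are hypothetical, and real semantic or statistical systems are far more involved than this.

```python
from collections import Counter
import re

# Semantic style: the author declares concepts and relationships ahead of time.
# These triples are hypothetical examples, not taken from any real ontology.
TRIPLES = [
    ("Google", "is_a", "SearchEngine"),
    ("SearchEngine", "indexes", "WebPage"),
]

def related(subject, triples):
    """Return the (predicate, object) pairs explicitly declared for a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

# Statistical style: nothing is declared; relationships are inferred from text.
def cooccurring(docs, term):
    """Count the words that appear in the same document as `term`."""
    counts = Counter()
    for doc in docs:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        if term in words:
            counts.update(words - {term})
    return counts

# Hypothetical sample documents.
docs = [
    "Google is a search engine",
    "A search engine indexes web pages",
    "Wikipedia is an encyclopedia",
]

print(related("Google", TRIPLES))
print(cooccurring(docs, "search").most_common(3))
```

The first function can only answer questions somebody anticipated when writing the triples. The second will answer anything, but only as well as the available text allows.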

The semantic way of looking at things is very abstract and much more rigorous. It says that there is a truth to be represented, it designs a way of doing it, and expects everyone to follow along. The statistical way of looking at things is much more flexible. It says that there are things to be gleaned regardless of form, and that we should accept this fact and try to make the best of things. Not surprisingly, the semantic approach is the favorite of academia and has been under development for many years, while the statistical approach is already in real-world use.

What got me thinking about this in the first place was the latest issue of Baseline. Specifically, it was an article from Paul A. Strassman titled, “How Clean Data Can Transform Your Business”. Normally Strassman’s stuff is pretty good, but it is helpful to note that Strassman is a senior consultant to the Department of Defense and has been in the business for a long, long, long time.

The crux of his argument was that:

The first step in business transformation: enterprisewide standardization of data. That calls for the declaration of a metadata directory as the template for defining data that can circulate within a firm’s information systems. The policy and implementation of an enforceable metadata directory likely will be resisted by bureaucrats, who see this as a threat to their indispensability. It will not be welcomed by systems developers, contractors and vendors, who prefer to concentrate on upgrading software as a technologically more interesting—and profitable—task.

A classic argument for a semantic model of truth. We just need to get everything defined and then it will be smooth sailing from there. For most vendors and consultants, the semantic view is the accepted one, probably because it is so structured and logical, although at least partially because all those hours spent defining concepts are billable. Even Strassman acknowledges this reality …

To reach agreement on the representation, semantics and taxonomy of data, you will likely go through a painful political process that must be adjudicated by line management. This can get messy because it will reveal that a large percentage of installed software perpetuates incompatible, unreliable, insufficiently secure and delayed information.

With this in mind, is semantic definition the most efficient way to improve information quality? Is a statistical definition the most descriptive way to understand information quality? We will explore the basis for both of these methods in the next part of this series.

Focus on Information Quality

July 31st, 2006 by morgan

For the near future I am going to be focusing on information quality and its role in information architecture. I think it will be useful to focus on one broad subject for a while and start trying to get into it with a little more depth. While I am currently planning this to be (about) a five-part series, it could go longer or shorter.

From Architectures to Ecosystems

July 29th, 2006 by morgan

Slow Leadership has an interesting article about competition within the workplace. It is worth a read on its own merits, but I think it has some interesting applications to the world of information architecture. In a dynamic, data-centric organization, it is very common to have two (or more) systems that have some kind of overlap. While some would consider this to be a virtual heresy, it is something that can easily occur when there are acquisitions, mergers, or if systems simply evolve (or devolve) over time.

The thing that strikes me is that right now virtually every expert pushes for an extremely competitive environment, at least according to SL’s criteria (although Rick Sherman’s concept of shadow systems is the notable exception). It is kill or be killed, only the fittest systems survive. Total integration. The problem is that these types of purely competitive systems rarely exist outside the textbooks.

In reality, the architecture we end up with is almost always an ecosystem filled with a heterogeneous population that can only thrive through cooperation and diversity. This is true with people, processes, systems, data, and reporting. Every architect wants a coherent architecture, and a huge part of our work is focused on integrating and unifying. At the same time, we also need to be OK with the fact that we will probably never achieve this lofty goal.

Think Locally, Act Globally

July 29th, 2006 by morgan

In Roads to SOA, Ronan Bradley wrote an interesting piece titled “Data: The Heart of SOA”. He hits the nail right on the head with this article. He writes …

There are two fundamental approaches to dealing with data integration within SOA: Build a global data model for your business which each connecting business unit must bridge into and out of in order to integrate with other units or build multiple data models.

There is a real difference of opinion between the folks who think there should be a single view of the customer and those who advocate a more organic methodology (like a market-based approach). After watching several different companies try to implement a single data repository and overall data model I have mixed feelings about the whole thing. For the most part, they seemed to be political exercises to try and force compliance in using a solution that did not meet the needs of business units. At the same time, there is very real value to be gained through integration and shared resources (especially with business intelligence).

SOA really offers an elegant solution for this conundrum. Individual groups can still run their business-critical applications AND they can be integrated into a single data repository with very little pain for data providers or consumers. Where it makes sense, operations can be pooled and where it doesn’t make sense, they can be separate but accessible.

For example, it often makes sense to have a global business intelligence solution to minimize costs for licensing and operations, but having your data on different systems can make things difficult. Say an organization has a business-critical system that runs on an hourly basis and a data warehouse that is updated nightly. An SOA interface allows you to have your cake and eat it too. This won’t eliminate ETL, but it certainly makes it easier to do.
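
Here is a rough Python sketch of what such an interface might look like for the hourly system and nightly warehouse described above. The class names (`DataWarehouse`, `OperationalSystem`, `OrderService`), the order records, and the dates are all hypothetical stand-ins, and a real SOA deployment would put this behind a network service rather than in-process objects.

```python
from datetime import datetime

class DataWarehouse:
    """Stand-in for the warehouse that is loaded nightly."""
    def __init__(self, rows, last_load):
        self.rows = rows
        self.last_load = last_load  # time of the most recent nightly load

    def orders(self):
        return list(self.rows)

class OperationalSystem:
    """Stand-in for the business-critical system that runs hourly."""
    def __init__(self, rows):
        self.rows = rows

    def orders_since(self, cutoff):
        return [r for r in self.rows if r["placed_at"] > cutoff]

class OrderService:
    """Single service interface that hides which system holds which rows."""
    def __init__(self, warehouse, operational):
        self.warehouse = warehouse
        self.operational = operational

    def all_orders(self):
        # History comes from the warehouse; anything placed after the last
        # nightly load comes straight from the operational system.
        recent = self.operational.orders_since(self.warehouse.last_load)
        return self.warehouse.orders() + recent

# Hypothetical sample data: order 1 made the nightly load, order 2 did not.
last_load = datetime(2006, 7, 29, 2, 0)
warehouse = DataWarehouse(
    [{"id": 1, "placed_at": datetime(2006, 7, 28, 15, 0)}], last_load
)
operational = OperationalSystem([
    {"id": 1, "placed_at": datetime(2006, 7, 28, 15, 0)},
    {"id": 2, "placed_at": datetime(2006, 7, 29, 9, 0)},
])

service = OrderService(warehouse, operational)
print([o["id"] for o in service.all_orders()])  # prints [1, 2]
```

The consumer only ever talks to `OrderService`. Each group keeps running its own system, and the nightly ETL still happens, but nobody downstream has to know where the seam is.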

Business Intelligence and the 1% Rule

July 28th, 2006 by morgan

The Customer Evangelists have an interesting article about citizen participation in social websites (like Yahoo Groups and Wikipedia). Basically, their thesis is that in a community-driven site, 1% of the people will actually create content. In addition, they figure that 10% will interact with content (aggregate, synthesize, etc.). However, the overwhelming majority of people will be passive consumers.
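
As a quick back-of-the-envelope illustration, the small Python sketch below applies those percentages to a community of a given size. The function name `participation_split` and the 5,000-person community are made up for the example, and it assumes the 10% who interact are counted separately from the 1% who create.

```python
def participation_split(community_size, creators=0.01, interactors=0.10):
    """Split a community into creators, interactors, and passive consumers."""
    creating = round(community_size * creators)
    interacting = round(community_size * interactors)
    # Whoever is left over is assumed to be a passive consumer.
    consuming = community_size - creating - interacting
    return {"create": creating, "interact": interacting, "consume": consuming}

# Hypothetical community of 5,000 users.
print(participation_split(5000))
# prints {'create': 50, 'interact': 500, 'consume': 4450}
```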

The 1% Rule is important to help set expectations for the appropriate role of user-generated content within your information architecture. As I read this, I thought of:

  1. How users often resist moving to a self-service model for business intelligence, even when it might benefit them.
  2. Why a wiki might be dominated by a few strong individuals.
  3. A good argument for limited customer participation in design.
