Victims of Information Quality

April 27th, 2006 by morgan

Although it isn’t exactly breaking news (from late 2005), this article really highlights the personal impact of information quality on individuals. It talks about Appriss, which provides “innovative technology solutions that help hundreds of local, state, and Federal government agencies serve and protect their citizens”. One of their flagship products provides automated victim notification, which will let the victim of a crime know when the offender is being released from prison. Sounds like a useful thing, providing people with the information they need in an efficient manner.

During a maintenance period (in the middle of the night), the State of Ohio sent a file containing 3,000 names to Appriss. The company did what it was supposed to and notified the victims, even going so far as to wait until the morning to start making calls so as not to disturb anyone. In essence, they did their job by the book and (I am guessing) met their SLA. The folks in government made a regrettable but honest mistake.

Appriss is not an evil corporation and the Ohio state government are not complete buffoons. These things really do happen, all the time and at all levels. The problem is that the cost of an error is incredibly high. Not a lost revenue opportunity, but true emotional anguish for the people who need it the least.

My suggestions for improving these processes (and hopefully avoiding future problems) are:

  1. Define what constitutes a valid transfer between systems. The systems did catch that the file was sent at a time when phone calls could not be made, which was a good catch. But was it normal for files to be delivered at that time? Was it normal for a file of that size to be transferred? Had those calls been made before? Was there a timestamp in the file? Was it accurate? There are a lot of subtle and different ways of profiling data to see if it is “off”. Given how visible an error like this is, it would be worth doing whatever it takes to find bad data.
  2. Make automation as transparent and visible as possible. Would an automated email confirmation have prevented this situation from occurring?
  3. Continually improve your processes. Use this (and every) situation as an opportunity to make things better.
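
To make the first suggestion a bit more concrete, here is a minimal sketch of what profiling an incoming transfer could look like. Everything in it is an assumption for illustration: the `Transfer` record, the `profile_transfer` function, the delivery window, the size threshold, and the sample numbers are all made up, and nothing here describes how Appriss or the State of Ohio actually work.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from statistics import median


@dataclass
class Transfer:
    received_at: datetime     # when the file arrived
    record_count: int         # number of names in the file
    file_timestamp: datetime  # timestamp carried inside the file


def profile_transfer(transfer, past_counts, window=(time(6, 0), time(22, 0)),
                     size_factor=5.0, max_age=timedelta(hours=24)):
    """Return a list of reasons the transfer looks unusual (empty if none)."""
    reasons = []
    start, end = window
    # Was it normal for files to be delivered at that time?
    if not (start <= transfer.received_at.time() <= end):
        reasons.append("arrived outside the usual delivery window")
    # Was it normal for a file of that size to be transferred?
    if past_counts and transfer.record_count > size_factor * median(past_counts):
        reasons.append("record count far above the historical median")
    # Was there a timestamp in the file, and is it plausible?
    age = transfer.received_at - transfer.file_timestamp
    if age < timedelta(0) or age > max_age:
        reasons.append("file timestamp is stale or in the future")
    return reasons


# Made-up values: a 3,000-name file arriving at 2:30 in the morning,
# compared against a handful of invented "typical" daily counts.
suspect = Transfer(datetime(2006, 1, 1, 2, 30), 3000, datetime(2006, 1, 1, 2, 0))
flags = profile_transfer(suspect, past_counts=[12, 8, 15, 10, 9])
if flags:
    print("Hold for human review:", "; ".join(flags))
```

The point is not the particular thresholds, which would need tuning against real history, but that a transfer which trips any of these checks gets held for a person to look at before any calls are made.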


NOTE:
I originally saw this mentioned in Baseline, props to them. If you are interested in IT and data, this is an invaluable resource.

Elephants, Opaqueness, and Security

April 26th, 2006 by morgan

We all know the story of the blind men and the elephant. But did you know that this story applies to ETL, databases, and security?

In a previous life at a major corporation I had to deal extensively with SOX compliance in ETL development. Basically, I needed to write code that accessed and manipulated data on “production” systems, where the data was considered too sensitive for me to actually see it. I could write code to manipulate it, but it had to be on a “development” system that might have a scaled down version of the data, with all sensitive information removed. Sounds like a reasonable balance of functionality and security, right? The only problem was that it was an incredible chore to get scrubbed data delivered to a development environment, as it involved dealing with the corporate security types. Also, once we got the data it might be missing the very pieces of information that our project needed to use. It was not uncommon for these types of issues to add weeks to the completion of a simple project.

So, stretching the analogy (perhaps a bit too far) … Consider me an elephant attendant whose job it was to get things ready for the big parade. But the owner doesn’t know me very well and is afraid that I will steal his animal. Things need to be secure, and yet I need to have access to the animal in order to get things done. He can’t hack off a foot and ask me to paint the toenails and return it. What are we to do?

Opaqueness vs. Security

The major issue here was that the good folks in security were confusing opaqueness with security. It is true that if you can’t see it you can’t compromise it. But this is really just one step past embracing security through obscurity. Also, one of the things I didn’t mention above was that it was not unusual for someone to deliver “scrubbed” data where there was just junk in the fields that needed to be secure (for example, the text “1111111” in every record). This was worse than bad: when you have to join multiple files (or tables) together and the keys have been replaced with junk, you won’t get a true representation of what is going on. Your data isn’t just opaque, it is distorted!
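
A small sketch shows how much damage the constant-junk scrub does to a join. The tables, keys, and function names below are hypothetical, and the "consistent" scrub is just one simple alternative (a surrogate key reused for every occurrence of the same original key), not something the security team in this story actually offered.

```python
from collections import Counter

# Hypothetical production rows: (customer key, value).
customers = [("C100", "Ann"), ("C200", "Bob"), ("C300", "Cy")]
orders = [("C100", 25), ("C100", 40), ("C300", 10)]


def join_count(left, right):
    """Number of rows an inner join on the first field would produce."""
    right_keys = Counter(key for key, _ in right)
    return sum(right_keys[key] for key, _ in left)


def scrub_constant(rows):
    """The problematic scrub: every key becomes the same junk value."""
    return [("1111111", value) for _, value in rows]


def scrub_consistent(rows, mapping):
    """Replace each key with a surrogate, reusing the same surrogate every time."""
    return [(mapping.setdefault(key, f"K{len(mapping) + 1}"), value)
            for key, value in rows]


shared = {}  # one mapping shared by both tables so the keys still line up
print(join_count(customers, orders))                                  # 3
print(join_count(scrub_constant(customers), scrub_constant(orders)))  # 9
print(join_count(scrub_consistent(customers, shared),
                 scrub_consistent(orders, shared)))                   # 3
```

With every key set to the same junk value, the join degenerates into a cross product (9 rows instead of 3), while a consistent surrogate hides the real keys and still gives the true row count.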

What is really needed is a way to use the defining characteristics of production data without actually being able to see the data itself. Let me see the elephant’s toenails, but keep his leg in a chain so I can’t get away with him. A measured, balanced approach to administering information.

How do we do this?

With a clever application of encryption this should be possible.

Imagine a database server where all data is stored in plaintext (as it is now). All manipulations done on the database end would be done with the plaintext (as it is now). However, by default, all data being sent out would be encrypted. If a given user had the correct permissions (for a database/table/column), the data would not be encrypted on the way out.
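
Here is a rough sketch of the idea, not a real database feature. The `GRANTS` table, `SERVER_KEY`, `mask`, and `send_row` are all invented for illustration. Note also that I am using a keyed hash (HMAC) from the Python standard library as a stand-in for encryption: it is one-way rather than decryptable, but it is enough to show the shape of "clear if permitted, opaque otherwise".

```python
import hashlib
import hmac

# Hypothetical server-side secret and grants table; neither comes from a real product.
SERVER_KEY = b"example-only-secret"
GRANTS = {"auditor": {"name", "ssn"}, "etl_dev": {"name"}}


def mask(value):
    """Stand-in for encryption: a keyed, deterministic token the reader cannot reverse."""
    digest = hmac.new(SERVER_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def send_row(row, user):
    """Return the row as `user` would receive it.

    Columns the user has been granted go out in the clear; everything
    else is masked on the way out. The stored row is never modified.
    """
    allowed = GRANTS.get(user, set())
    return {col: (val if col in allowed else mask(val)) for col, val in row.items()}


row = {"name": "Ann", "ssn": "123-45-6789"}  # made-up record
print(send_row(row, "auditor"))  # both columns in the clear
print(send_row(row, "etl_dev"))  # name in the clear, ssn replaced by a token
```

Because the token is deterministic, two rows with the same underlying value get the same token, so a developer without permission can still join and group on the column without ever seeing what is in it.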

Taking this approach would really change things on the database end, as we would have (out of the box):

  1. A fully secure environment.
  2. A flexible, permissions based encryption scheme.
  3. The ability to manipulate and combine data without being able to view it.

Imagine, no more “development” vs. “production” servers, no more replication, no more scrubbing of data, no more code that mysteriously fails upon deployment. A much better solution for everyone.

What is out there now?

There are encryption-based solutions for a number of commercial databases. Several vendors have solutions that involve some sort of encrypted data, either on a table or row/column level. However, most of these are clumsy and require key management, which is a nightmare. Oracle 10g has a feature called Transparent Data Encryption which is interesting, but it only does half the job because data is either fully opaque or fully visible.

BTW, I heard from a friend who is a Sales Engineer at Netezza that they were looking at a few things like this. However, I haven’t seen anything as of yet. Perhaps one of the open-source databases will see the light …

Stress and Information Quality

April 25th, 2006 by morgan

“Stress” is a good way to describe what happens when data is reused or repurposed, as the term really puts things into more human terms. Kind of like the effects of dealing with a corporate reorganization, except on a systems level. J. M. Juran defined data quality very precisely as “fitness for use”. The issue is that “use” seems like a concrete term, and in the short term (say, the life of the project that creates the data) it probably is. However, over the medium or long term (say, the life of management of the data) the meaning of “use” is going to change, perhaps radically. Not surprisingly, this disconnect translates directly into stress, both on people and systems. This isn’t because techies have built a bad system or because the users gave poor specifications. This is because life moves on, situations change, and organizations evolve. No blame, just reality.

The challenge then becomes to design processes and products that are flexible and forward-thinking enough (“fit for future uses”). An interesting analogy is with the total cost of ownership for a PC. Google engineers have found that, for their use, the electricity to run a machine costs more than the hardware itself. To reduce the long-term stress (in this case, cost) it is worth it to architect the right solution from the beginning, even if it costs a bit more.

about


Architected.info is a web site dedicated to information architecture, focusing on transformation and understanding. We focus on these categories through the lens of organizational dynamics, looking at people, practices, and relationships.

Morgan Goeller is the author and maintainer of this website. He has worked as an architect and engineer, specializing in software development, web applications, database engineering, ETL, and information quality.
