Elephants, Opaqueness, and Security
We all know the story of the blind men and the elephant. But did you know that this story applies to ETL, databases, and security?
In a previous life at a major corporation, I had to deal extensively with SOX compliance in ETL development. Basically, I needed to write code that accessed and manipulated data on “production” systems, where the data was considered too sensitive for me to actually see. I could write code to manipulate it, but only on a “development” system that might have a scaled-down version of the data, with all sensitive information removed. Sounds like a reasonable balance of functionality and security, right? The only problem was that it was an incredible chore to get scrubbed data delivered to a development environment, as it involved dealing with the corporate security types. Also, once we got the data, it might be missing the very pieces of information that our project needed to use. It was not uncommon for these types of issues to add weeks to the completion of a simple project.
So, stretching the analogy (perhaps a bit too far) … Consider me an elephant attendant whose job it was to get things ready for the big parade. But the owner doesn’t know me very well and is afraid that I will steal his animal. Things need to be secure, and yet I need access to the animal in order to get things done. He can’t hack off a foot and ask me to paint the toenails and return it. What are we to do?
Opaqueness vs. Security
The major issue here was that the good folks in security were confusing opaqueness with security. It is true that if you can’t see it you can’t compromise it. But this is really just one step past embracing security through obscurity. Also, one of the things I didn’t mention above was that it was not unusual for someone to deliver “scrubbed” data where there was just junk in the fields that needed to be secure (for example, just the text “1111111” in every record). This was worse than bad: when you have to join together multiple files (or tables) and the keys have all been replaced with the same junk value, you won’t get a true representation of what is going on. Your data isn’t just opaque, it is distorted!
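To make the distortion concrete, here is a minimal sketch in Python. The table contents, the helper names (join_count, scrub_constant) and the junk value are all made up for illustration. It simply counts the rows an inner join produces before and after every key has been overwritten with the same constant.

```python
# Hypothetical toy data: customers and orders keyed by a sensitive account number.
customers = [("1001", "north"), ("1002", "south"), ("1003", "south")]
orders = [("1001", 250), ("1001", 75), ("1003", 40)]


def join_count(left, right):
    """Count the rows an inner join on the first field of each tuple would produce."""
    return sum(1 for l in left for r in right if l[0] == r[0])


def scrub_constant(rows, junk="1111111"):
    """Overwrite the key of every row with the same junk value."""
    return [(junk,) + row[1:] for row in rows]


true_rows = join_count(customers, orders)
scrubbed_rows = join_count(scrub_constant(customers), scrub_constant(orders))

# On the real keys the join yields 3 rows; on the scrubbed keys every row
# matches every other row, so it yields 3 * 3 = 9.
print(true_rows, scrubbed_rows)
```

Code that looks correct against the scrubbed copy tells you nothing about row counts, duplicates, or orphaned records in production, which is exactly the kind of surprise that shows up at deployment.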
What is really needed is a way to use the defining characteristics of production data without actually being able to see the data itself. Let me see the elephant’s toenails, but keep his leg in a chain so I can’t get away with him. A measured, balanced approach to administering information.
How do we do this?
With a clever application of encryption this should be possible.
Imagine a database server where all data is stored in plaintext (as it is now). All manipulations done on the database end would be done with the plaintext (as it is now). However, by default, all data being sent out would be encrypted. If a given user had the correct permissions (for a database/table/column), the data would not be encrypted on the way out.
Taking this approach would really change things on the database end, as we would have (out of the box):
- A fully secure environment.
- A flexible, permissions-based encryption scheme.
- The ability to manipulate and combine data without being able to view it.
Imagine, no more “development” vs. “production” servers, no more replication, no more scrubbing of data, no more code that mysteriously fails upon deployment. A much better solution for everyone.
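The "encrypt on the way out unless permitted" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any real database does it. The permission table, the column names, the server key and the functions mask and send_rows are all hypothetical. Because the Python standard library has no symmetric cipher, a keyed HMAC token stands in for encryption here; it is deterministic, so equal values still match in joins, but unlike real encryption it cannot be reversed.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it never leaves the database server.
SERVER_KEY = b"demo-key-not-for-production"

# Hypothetical permission table: user -> columns that user may read in the clear.
PERMISSIONS = {
    "auditor": {"ssn", "region"},
    "etl_dev": {"region"},
}

# Data is held in plaintext on the server side, as in the scheme described above.
ROWS = [
    {"ssn": "123-45-6789", "region": "north"},
    {"ssn": "987-65-4321", "region": "south"},
    {"ssn": "123-45-6789", "region": "east"},
]


def mask(column, value):
    """Return a deterministic opaque token for a value (keyed HMAC stand-in for encryption)."""
    msg = f"{column}:{value}".encode("utf-8")
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()[:16]


def send_rows(user, rows):
    """Mask every column on the way out unless the user is cleared to see it."""
    allowed = PERMISSIONS.get(user, set())
    return [
        {col: (val if col in allowed else mask(col, val)) for col, val in row.items()}
        for row in rows
    ]


dev_view = send_rows("etl_dev", ROWS)
auditor_view = send_rows("auditor", ROWS)

# The developer never sees an SSN, but rows 0 and 2 still carry the same token,
# so joins and group-bys on that column behave as they would in production.
print(dev_view[0]["ssn"] == dev_view[2]["ssn"], dev_view[0]["ssn"] == dev_view[1]["ssn"])
```

One trade-off worth noting: deterministic tokens reveal which rows share a value, so this preserves the shape of the data at the cost of leaking equality. Whether that is acceptable would depend on the column.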
What is out there now?
There are encryption-based solutions for a number of commercial databases. Several vendors have solutions that involve some sort of encrypted data, either on a table or row/column level. However, most of these are clumsy and require key management, which is a nightmare. Oracle 10g has a feature called Transparent Data Encryption, which is interesting, but it only does half the job because data is either fully opaque or fully visible.
BTW, I heard from a friend who is a Sales Engineer at Netezza that they were looking at a few things like this. However, I haven’t seen anything as of yet. Perhaps one of the open-source databases will see the light …








