Making Metadata Pay

I am currently working on a process to add some instrumentation to an existing legacy system. Not physical instrumentation, but conceptually similar. Consider the relationship of the speedometer or the tachometer to the engine of an automobile; I am doing the same thing for an ETL process.

Basically, we are trying to track the progress of data provided by manufacturing systems through the entire information architecture, from delivery to publishing: a textbook example of metadata creation. To be honest, it isn’t the most exciting work in the world, but it is at least interesting to dissect an existing process and come up with something useful. Most importantly, it is very useful to our customers, and that is the measurement I really care about.

Anyway, one of the big stumbling blocks with tracking metadata is that it is expensive to make it useful. It is easy to build controls into a process that track every potential error that occurs. It is also really useful to have an overall view of a process (or of all processes across an organization) to see how it is doing, especially over time. Unfortunately, it is often very challenging to bridge the gap between the detailed error output and the overall view.

For this project, I think I found a way to do it fairly easily. There were three important steps …

  1. We decided that all potential errors (invalid records, bad assignments, data that does not join properly) would be written to individual error files, one error per line (separated by a ‘\n’).
  2. We decided to give our error files names that would describe what was inside at a glance. In our case we used a standard of <process name>.<program id>.<useful error description>.err.
  3. We wrote a simple process that would parse these files and put the results into a database table. The table had fields for:
  • process name
  • program id
  • useful error description
  • error count
  • parse date

For a file named xfer.999.invalid-file-names.err this would generate a record that looks like:

  • process name = xfer
  • program id = 999
  • useful error description = “invalid file names”
  • error count = a simple count of the number of lines in the file
  • parse date = the date the operation happened
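
The parsing step can be sketched in a few lines of Python. This is only an illustration, not the code we actually run: the names `ErrorRecord` and `parse_error_file` are made up for this example, and it assumes that the three name parts never contain a period and that hyphens in the description stand in for spaces.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class ErrorRecord:
    """One summary row per error file (hypothetical structure)."""
    process_name: str
    program_id: str  # kept as text so leading zeros would survive
    description: str
    error_count: int
    parse_date: date


def parse_error_file(path, today=None):
    """Build a summary record from a file named
    <process name>.<program id>.<useful error description>.err."""
    path = Path(path)
    parts = path.name.split(".")
    if len(parts) != 4 or parts[3] != "err":
        raise ValueError(f"unexpected error file name: {path.name}")
    process_name, program_id, description, _ = parts

    # One error per line, so the error count is simply the line count.
    with path.open() as handle:
        error_count = sum(1 for _ in handle)

    return ErrorRecord(
        process_name=process_name,
        program_id=program_id,
        # Assumption: hyphens in the file name stand in for spaces.
        description=description.replace("-", " "),
        error_count=error_count,
        parse_date=today or date.today(),
    )
```

Calling `parse_error_file("xfer.999.invalid-file-names.err")` on a real file would return a record with the fields listed above. A file name that does not follow the convention raises a `ValueError` rather than being loaded with guessed values.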

Now we can process any error file (from any process) into the same table, which gives us a generic method for capturing error data. On the development side, the only real cost is that of adhering to the file naming conventions, which is relatively low. On the operations side, we have the ability to track the results of our processes historically with simple SQL queries. A win-win at very low cost!
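
To give a feel for the "simple SQL queries" part, here is a rough sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `error_summary`, the column names, the helper `daily_totals`, and the sample rows are all invented for illustration; our real database and schema differ.

```python
import sqlite3

# In-memory database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE error_summary (
        process_name TEXT,
        program_id   TEXT,
        description  TEXT,
        error_count  INTEGER,
        parse_date   TEXT
    )
    """
)

# Made-up sample rows, one per parsed error file.
rows = [
    ("xfer", "999", "invalid file names", 12, "2024-01-01"),
    ("xfer", "999", "bad assignments", 3, "2024-01-01"),
    ("xfer", "999", "invalid file names", 4, "2024-01-02"),
    ("load", "101", "unjoined records", 7, "2024-01-01"),
]
conn.executemany("INSERT INTO error_summary VALUES (?, ?, ?, ?, ?)", rows)


def daily_totals(conn, process_name):
    """Return (parse_date, total errors) pairs for one process, oldest first."""
    cursor = conn.execute(
        """
        SELECT parse_date, SUM(error_count)
        FROM error_summary
        WHERE process_name = ?
        GROUP BY parse_date
        ORDER BY parse_date
        """,
        (process_name,),
    )
    return cursor.fetchall()


print(daily_totals(conn, "xfer"))
```

Because every process writes into the same table, the same query works for any process name, and adding a new error type requires no schema change.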

I am very pleased with this solution; hopefully it will help you as well.

One Response to “Making Metadata Pay”

  1. Architected Information » Process Meta-Usability Says:

    […] Human intervention – Trap and deal with every possible situation in the design phase. Also, when a human does have to be involved, make it easy for them to understand what is going on at a glance. This probably means documentation, standards, naming conventions, log files, metadata, the works. This isn’t as hard as it sounds, there are some easy steps that you can take to make things run more smoothly for your friendly neighborhood operator. […]
