On Metadata [part 1]

I was thinking about my previous article on metadata and would like to expand on some of those ideas. I think that for ETL we can generally break metadata down into two types:

  1. Referential Metadata is a maintained repository that describes the data or process that we are interested in.
  2. Inferential Metadata is derived from the environment from which the data or process was created and/or lives.

For example, imagine a dataset that has a full description of how it is created, along with its contents, formatting, use, and history. This information is stored in a central location (hopefully with the metadata for other files). This would be referential metadata.

Now, imagine the exact same dataset, but created by an undocumented shell script that writes to a certain directory on a certain server that only the operations staff knows about. There is no referential metadata, so we can only describe it with inferential metadata. Unfortunately, in the real world (and especially with legacy applications), all too often the only metadata available is inferential metadata.
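To make that a bit more concrete, here is a rough sketch of what collecting inferential metadata for such a file might look like. The function name `infer_metadata` and the particular fields it gathers are my own assumptions for illustration; a real system would pick whatever the environment can reliably tell it.

```python
import os
import socket
from datetime import datetime, timezone


def infer_metadata(path):
    """Describe a file using only what its environment can tell us.

    Hypothetical helper: the field names are illustrative, not a standard.
    """
    st = os.stat(path)
    # Peek at the first line, which often reveals a header or delimiter.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline().rstrip("\n")
    return {
        "host": socket.gethostname(),
        "directory": os.path.dirname(os.path.abspath(path)),
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified_utc": datetime.fromtimestamp(
            st.st_mtime, tz=timezone.utc
        ).isoformat(),
        "owner_uid": st.st_uid,  # always 0 on Windows
        "first_line": first_line,
    }
```

Nothing here was written down by a person ahead of time. Where the file lives, when it last changed, who owns it, and what its first line looks like are all read off the file and the server it sits on.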

Now, it may sound like referential metadata is the only way to go if you are building a system, but I would disagree. If this is what someone is telling you, then they are most likely a salesperson for an ETL tool company or a consultant who is paid by the hour ;-)

I would argue that any efforts around metadata should be evaluated on a cost/benefit basis, and that by that criterion you get the most bang for your buck with a combination of inferential metadata and standards-based programming practices.

More to come …
