May 29th, 2006 by morgan
There is a heartbreaking story that really demonstrates the sometimes all-too-high cost of poor information quality in the real world. The article IT Integration: The Army’s Pay Misstep discusses the problems that the US Army Reserve has had in paying its people properly and the impact that it has on real people, especially wounded soldiers and their families.
Like so many IQ-related stories, this one has a bit of everything:
- Organizations Outpacing Systems
- Legacy Applications
- Manual Data Entry
- Complex Business Logic
- Regulatory Compliance
- Technical and Process Wizards Keeping Everything Running
The sad thing is, often it is the individual service members who end up paying the price. Something worth considering around Memorial Day.
technorati tags: information quality, data quality, integration, case, studies, automation, people, practices, data
Posted in Systems Integration, Information Quality, Case Studies, In the News, Automation, People, Practices | 1 Comment »
May 25th, 2006 by morgan
I was thinking about my previous article on metadata and would like to expand on some of those ideas. I think that for ETL we can generally break metadata down into two types:
- Referential Metadata is a maintained repository that describes the data or process that we are interested in.
- Inferential Metadata is derived from the environment from which the data or process was created and/or lives.
For example, imagine a dataset that has a full description of how it is created, its contents, formatting, use, and history. This information is stored in a central location (hopefully with the metadata for other files). This would be referential metadata.
Now, imagine the exact same dataset, but created by an undocumented shell script that writes to a certain directory on a certain server that only the operations staff knows about. There is no referential metadata, so we can only describe it with inferential metadata. Unfortunately, in the real world (and especially with legacy applications) all too often the only metadata available is inferential metadata.
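To make the distinction concrete, here is a minimal Python sketch of collecting inferential metadata for a file like the one described above. Nothing in it comes from a real system: the function name `infer_metadata` and the choice of fields are assumptions for illustration. The point is only that everything it returns is read from the environment, not from a maintained repository.

```python
from datetime import datetime, timezone
from pathlib import Path


def infer_metadata(path):
    """Collect what the environment alone can tell us about a dataset.

    Hypothetical helper: the field names are illustrative, not a standard.
    """
    path = Path(path).resolve()
    info = path.stat()
    return {
        # Where the file lives often hints at which process wrote it.
        "directory": str(path.parent),
        "file_name": path.name,
        "extension": path.suffix,
        "size_bytes": info.st_size,
        # Last modification time, normalized to UTC.
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
```

If file names and directories follow a convention, even this small amount of inferred information can say a lot about a dataset.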
Now, it may sound like referential metadata is the only way to go if you are building a system, but I would disagree. If this is what someone is telling you, then they are most likely a salesperson for an ETL tool company or a consultant who is paid by the hour.
I would argue that any efforts around metadata should be evaluated on a cost/benefit basis, and that on that criterion you get the most bang for your buck with a combination of inferential metadata and standards-based programming practices.
More to come …
technorati tags: metadata, etl
Posted in ETL, Information Architecture, Systems Integration, Automation, Metadata | No Comments »
May 23rd, 2006 by morgan
I am currently working on a process to add some instrumentation to an existing legacy system. Not physical instrumentation, but conceptually similar. Consider the relationship of the speedometer or the tachometer to the engine of an automobile; I am doing the same thing for an ETL process.
Basically, we are trying to track the progress of data provided by manufacturing systems through the entire information architecture, from delivery to publishing. A textbook example of metadata creation. To be honest, it isn’t the most exciting work in the world, but it is at least interesting to dissect an existing process and come up with something useful. Most importantly, it is very useful to our customers, and this is the measurement I really care about.
Anyway, one of the big stumbling blocks with tracking metadata is that it is expensive to make it useful. It is easy to build controls into a process that tracks every potential error that occurs. It is really useful to have an overall view of a process (or of all processes across an organization) to see how it is doing, especially over time. Unfortunately, it is often very challenging to bridge the gap between these two.
For this project, I think I found a way to do it fairly easily. There were three important steps …
- We decided that all potential errors (invalid records, bad assignments, data that does not join properly) would be written to individual error files, one error per line (separated by a ‘\n’).
- We decided to give our error files names that would describe what was inside at a glance. In our case we used a standard of <process name>.<program id>.<useful error description>.err.
- We wrote a simple process that would parse these files and put the results into a table on a database. The table had fields for:
- process name
- program id
- useful error description
- error count
- parse date
For a file named xfer.999.invalid-file-names.err this would generate a record that looks like:
- process name = xfer
- program id = 999
- useful error description = “invalid file names”
- error count = a simple count of the number of lines in the file
- parse date = the date the operation happened
We can now process any error file (from any process) into the same table, which gives us a generic method for capturing error data. On the development side, the only real cost is that of adhering to the file naming conventions, which is relatively low. On the operations side, we have the ability to track the results of our processes historically with simple SQL queries. A win-win at very low cost!
I am very pleased with this solution, and hopefully it will help you as well.
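The parsing step can be sketched in a few lines of Python. This is not the code we actually run, just an illustration under some assumptions: the table name `error_log`, the column names, the use of SQLite, and the function names are all made up for the example, and the program id is kept as text. The hyphens in the description are turned into spaces to match the sample record above.

```python
import sqlite3
from datetime import date
from pathlib import Path


def parse_error_file(path, parse_date=None):
    """Turn one error file into a single summary record."""
    path = Path(path)
    # Expected name: <process name>.<program id>.<useful error description>.err
    parts = path.name.split(".")
    if len(parts) != 4 or parts[3] != "err":
        raise ValueError(f"unexpected error file name: {path.name}")
    process_name, program_id, description = parts[:3]
    # One error per line, so the error count is simply the line count.
    with path.open() as handle:
        error_count = sum(1 for _ in handle)
    return (
        process_name,
        program_id,
        description.replace("-", " "),
        error_count,
        (parse_date or date.today()).isoformat(),
    )


def load_error_files(conn, directory, parse_date=None):
    """Parse every *.err file in a directory into the (assumed) error_log table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS error_log (
               process_name TEXT,
               program_id TEXT,
               error_description TEXT,
               error_count INTEGER,
               parse_date TEXT)"""
    )
    rows = [
        parse_error_file(p, parse_date)
        for p in sorted(Path(directory).glob("*.err"))
    ]
    conn.executemany("INSERT INTO error_log VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

With the records in one table, a historical view is a single query, for example `SELECT parse_date, SUM(error_count) FROM error_log WHERE process_name = 'xfer' GROUP BY parse_date`.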
technorati tags: etl, metadata, data, quality, information, architecture
Posted in ETL, Information Architecture, Information Quality, Case Studies, Automation, Understanding, Metadata | 1 Comment »
May 23rd, 2006 by morgan
I was especially inspired today by the beautiful photo on the home page for the MUJI Awards. To me, the amazing thing about design is the way that information is communicated on a deep and almost unconscious level. It is something that I strive to create in my own designs around information, although not always successfully.
This is a design competition, called Sumi, that looks at the "extremities" of a room instead of the major pieces. From what I could gather, "sumi" is a Japanese word that means "corner/edge/end". Even with the time I spent living there, I am sure that whatever understanding I have is missing a great deal of cultural inflection, so please forgive me.
In the realm of architected.info we almost always deal with the extremities. The really interesting problems always come at the edges, at the places where two or more data sets collide and become something new. The collision causes conflict, and with it comes opportunity for greater understanding. Pushing bits from server to server is easy. Making something truly wonderful takes vision.
BTW, I first saw the link for this on Signal Vs. Noise.
technorati tags: sumi, design, architecture, ETL, data, information, integration
Posted in Information Architecture, Systems Integration, Transformation, Understanding | No Comments »
May 14th, 2006 by morgan
“There is no greater impediment to the advancement of knowledge than the ambiguity of words.”
– Thomas Reid
Posted in People, Understanding | No Comments »

Architected.info is a web site dedicated to information architecture, focusing on transformation and understanding. We focus on these categories through the lens of organizational dynamics, looking at people, practices, and relationships.
Morgan Goeller is the author and maintainer of this website. He has worked as an architect and engineer, specializing in software development, web applications, database engineering, ETL, and information quality.