Data Centrism

Almost all the work that I do is based around data, be it architecting, designing, analyzing, modeling, engineering, or any number of other operations. After enough years of doing this, it dawned on me how radically different it is to architect information instead of software.

This post was spurred on by a conversation that I had with a new acquaintance who works in software engineering. Now, he is a pretty sharp person, but he couldn’t really wrap his head around what I did, or why it provided any value. And yet, I made a lot more money than he did and had a lot more job stability, which drove him nuts. Some background: before I started working with data, I was a web programmer, and before that I worked on client applications, mostly in Java, C, TCL, Python, and Visual Basic. That work has its own complexities, and I don’t want to knock those languages or the people who use them. However, they are my frame of reference, so I will use them as my point of comparison.

I would almost consider data engineering to be a wholly different field from software engineering, with some considerable overlap around tools. It is almost like the relationship between mathematics and computer science. Mathematicians use (and create) software to do their work, and computer scientists use math to do theirs. However, they usually aren’t doing the same things.

The big differences I have found between data-centric programming and traditional engineering are:

  1. Logistics are for professionals – With data, I find that most problems are born out of information logistics, rather than straight logic.
  2. Productivity is nebulous – I can recall learning in college to measure output by lines of code written. However, when working with data I can spend all day staring at a pile of data, write one line of SQL, and still call it a very productive day.
  3. Integration is paramount – In dealing with data you will be using databases, ETL tools, web services, and other common components. However, it is very, very uncommon that you will be writing your own tools. So, when working with data you spend a lot more time working with configuration than with building the underlying systems.
  4. Resource limitations are extreme – When working with data, there is always a series of limitations, from money to people to time to hardware to software. Being effective is a matter of finding out what is stopping you from moving forward and dealing with that. Being successful is a matter of figuring out what will stop you three steps down the line and dealing with that ahead of time.
  5. Customization is the rule, not the exception – There is so much customization done in information architecture that two people in similar industries working on similar systems with the same underlying technologies would probably still do things very differently. There is very little directly transferable knowledge, which makes documentation both more and less valuable than you might think.
  6. Everything is back-loaded – When working with data it is not uncommon to put a ton of work into design and development and not know whether there are any issues until you are in production. True system testing is very difficult or even impossible, especially in large, complicated, or sensitive environments.
  7. It is all about the people – There is no such thing as an un-interpreted fact. Everything is seen through the lens of the human mind, which may be empirical or may not be. I would not recommend a data-centric career if you don’t like dealing with people.

While this isn’t an exhaustive list, hopefully it will help people who work with data explain to their more software-oriented peers exactly what they do and why they do it.
