July 20th, 2006 by morgan
Yesterday, Informatica and Salesforce.com announced an interesting deal that will allow the two tools to interact. From a general overview of the technology, it looks like something that will be relatively useful for users of these products and should help cement sales for both companies. I wouldn’t call this quite as appealing as the combination of Nike+iPod, but then again I am both a distance runner and an iPod owner.
I think that this is solid recognition that using data as a service in the enterprise is not only possible, but probable. It is also a nice move by Informatica to keep its products fresh and leaning towards the leading edge. At the same time, I wonder if we are trying to teach an old dog new tricks. After all, Cast Iron Systems sells EAI appliances that are also integrated with Salesforce.com. Probably not, though, as the most profitable customers of ETL tool vendors are likely not the same ones looking at appliances.
At least, not yet …
technorati tags: mashup, Informatica, ETL, EAI, information architecture
Posted in ETL, Information Architecture, Systems Integration, In the News
July 19th, 2006 by morgan
I found a really interesting tool called MyOwnDB. It is a web-based, multi-user database that does not store data locally. Instead, all information is stored on a remote server, based on Amazon’s Simple Storage Service (S3) infrastructure. You can access it from any web browser without installing any drivers (my guess is that access via web services won’t be out of the question in the future). While MyOwnDB is clearly not an enterprise-ready tool, I think it gives a great indication of where things may be heading in the medium to long term.
Most of the data tools (databases, ETL, etc.) on the market are very powerful and very mature. For many years, they had a value proposition that was very difficult to beat. Recently, however, we have seen the real emergence of open source tools, a truly disruptive innovation in that market. All of a sudden, software is free. How can you compete directly with that?
I say, you don’t. Instead, disrupt things yet again. A well-run web service (like Salesforce.com, DoubleClick or even Google Calendar) can be even cheaper than a free software license, as it radically reduces the cost and effort of supporting things internally. The emergence of S3 and utilities like it will only make web businesses cheaper and accelerate the acceptance of using web tools for core business functions. Couple this with the high cost of maintaining data-related systems (DB, ETL, BI, etc.) and I think we are on to something.
Posted in Databases, Information Architecture
July 19th, 2006 by morgan
This site (more accurately, comments related to it) got a nice mention on Jerri Ledford’s weblog over at Computerworld. Her post discussed the proliferation of Business Intelligence in the enterprise, and the growing awareness of its operational need at the CIO level. I think this is great news, but a more pertinent observation is that in a few years the need for information quality tools will be even more acute. It is usually only once a BI system is operational that an organization discovers how bad its data really is.
I have a lot of respect for Jerri, and her blog is easy to read, meaty, and concise. The fact that most of my inbound links have come from her doesn’t hurt my opinion of her either …
Posted in Information Quality, In the News
July 17th, 2006 by morgan
Have you ever come across a process that is so parameterized that it has become generic? A programmer takes everything that helps their code make logical decisions and pushes it out to the user. The process then becomes a shell built entirely around user-supplied information, such as command line arguments or a configuration file. At best, it is minimalism run amok.
For example, I came across a shell script that looked like this:
#!/bin/ksh
# Join the path ($1) and program name ($2), then run the result with the remaining arguments
dir=$1 prog=$2; shift 2
"$dir/$prog" "$@"
For those of you who aren’t UNIX people, this script takes a number of arguments, the first two being a path and a program name. The script pastes those two together into a single command and executes it along with any extra arguments that were provided.
You would run it like this:
./test /bin ls $HOME
Of course, you could just run the command
ls $HOME
And get the same result.
This example is extremely silly, as it provides no value to the person calling it. In fact, it is worse than useless, because it makes the caller do something they wouldn’t normally do (split a command into a path and a program name). All of the effort it takes to execute the program has simply been pushed out to the person calling it.
A parameterized process will allow certain parts of its execution to change based on well-defined, well-controlled input from the user. It provides value by allowing the user to do things faster or to accomplish things they couldn’t otherwise do easily. A generic process is one that has taken parameterization too far. It is merely a container for executing user logic that would be better placed elsewhere.
Often, people new to data-centric programming misguidedly try to apply the principles of object-oriented programming to their work. The problem is that they end up with programs that are generic instead of parameterized. Writing good code is a matter of building tools that make your users as productive and flexible as possible. Normally, this involves a combination of user parameters and internal logic to build something coherent and truly useful.
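To make the distinction concrete, here is a minimal sketch of the parameterized kind, written as a POSIX-style shell function that should also run under ksh. The name count_rows, its arguments, and the assumption that the input is a text file with some number of header lines are all hypothetical, made up for illustration. The caller decides only which file to read and how many header lines it has; the function supplies the logic (checking its input, skipping headers and blank lines) so the caller doesn’t have to.

```shell
# count_rows (hypothetical): print the number of data rows in a text file.
# The caller supplies only what varies (the file and, optionally, how many
# header lines to skip); skipping headers and blank lines is the function's job.
count_rows() {
    if [ $# -lt 1 ] || [ $# -gt 2 ]; then
        echo "usage: count_rows FILE [HEADER_LINES]" >&2
        return 2
    fi
    header=${2:-1}                    # default: one header line
    case $header in
        *[!0-9]*)                     # reject anything but a whole number
            echo "count_rows: HEADER_LINES must be a whole number" >&2
            return 2 ;;
    esac
    if [ ! -r "$1" ]; then
        echo "count_rows: cannot read $1" >&2
        return 1
    fi
    # Skip the header lines, ignore blank lines, and count what remains.
    awk -v skip="$header" 'NR > skip && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Example use (the file name is a placeholder):
#   count_rows orders.csv        # assumes one header line
#   count_rows orders.csv 0      # file has no header
```

The parameters are few and validated up front, and bad input produces a clear message and a non-zero status instead of being passed blindly to the next tool.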
Here are some rules of thumb for checking whether your processes are in the sweet spot.
A process is well parameterized if …
- It simplifies the use and understanding of another tool or combination of tools.
- It can easily be run in a loop from the command line (in whatever operating system you use).
- It works well with the environment-specific features of your operating system, such as pipes and redirection in UNIX.
- It is designed to run on a variety of machines (though not every possible one) without much effort.
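As an illustration of the rules above, here is a sketch of a small filter, again a hypothetical shell function rather than any existing tool. It assumes simple comma-separated input with no quoted or embedded commas, and it wraps awk and sort behind a single column parameter. Because it reads standard input when no files are named and keeps its errors on stderr, it can be dropped into a pipe or a for loop without special handling, and it sticks to POSIX utilities so it should move between UNIX machines without much effort.

```shell
# distinct_values (hypothetical): count how often each value appears in
# column COL of simple comma-separated input, most frequent first.
# Reads the named files, or standard input when none are given, and writes
# only results to stdout (errors go to stderr), so it fits in pipes and loops.
distinct_values() {
    if [ $# -lt 1 ]; then
        echo "usage: distinct_values COL [FILE...]" >&2
        return 2
    fi
    col=$1
    shift                             # whatever is left is the file list
    case $col in
        ''|0*|*[!0-9]*)               # require a positive integer, no leading zeros
            echo "distinct_values: COL must be a positive integer" >&2
            return 2 ;;
    esac
    # With no file operands left, awk reads standard input.
    awk -F, -v c="$col" '{ n[$c]++ } END { for (v in n) print n[v] "\t" v }' "$@" |
        sort -k1,1nr -k2
}

# Example use (file names and the column number are placeholders):
#   for f in *.csv; do echo "== $f"; distinct_values 3 "$f"; done
#   distinct_values 3 < export.csv | head -n 5
```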
A process is probably generic if …
- It absolutely requires a GUI in order to execute.
- There are more command line options than you can easily remember.
- The man page (for UNIX tools) is more than the user can comfortably read in one sitting.
- The primary logic of the process is contained outside the process itself.
- There is no reason to contact you if there is a problem with the process, as any failure can be attributed either to the user’s configuration or to the underlying system.
- It takes more effort to use the process than it did to develop it.
Remember, there is no free ride in the data life cycle: the logic and effort have to live somewhere. The only reason to develop tools is to make users or systems more productive in the long term.
Don’t put a white label and bar-code on your processes. Add value!
technorati tags: information architecture, parameterization, parameters
Posted in ETL, Systems Integration, Automation
July 17th, 2006 by morgan
Almost all the work that I do is based around data, be it architecting, designing, analyzing, modeling, engineering, or any number of other operations. After enough years of this, it dawned on me how radically different architecting information is from architecting software.
This post was spurred on by a conversation with a new acquaintance who works in software engineering. He is a pretty sharp person, but he couldn’t quite get his head around what I did, or why it provided any value. And yet, I made a lot more money than he did and had a lot more job stability, which drove him nuts. Some background: before I started working with data, I was a web programmer, and before that I built client applications, mostly in Java, C, TCL, Python, and Visual Basic. That work has its own complexities, and I don’t want to knock those languages or the people who use them. However, it is my frame of reference, so I will draw on it.
I would almost consider data engineering to be a wholly different field from software engineering, with some considerable overlap around tools. It is almost like the relationship between mathematics and computer science. Mathematicians use (and create) software to do their work, and computer scientists use math to do theirs. However, they usually aren’t doing the same things.
The big differences I have found between data-centric programming and traditional software engineering are:
- Logistics are for professionals – With data, I find that most problems are born out of information logistics rather than straight logic.
- Productivity is nebulous – I recall learning in college to measure output by lines of code written. When working with data, however, I can spend all day staring at a pile of data, write one line of SQL, and still have had a very productive day.
- Integration is paramount – In dealing with data you will be using databases, ETL tools, web services, and other common components. However, it is very, very uncommon that you will be writing your own tools. So, when working with data you spend a lot more time on configuration than on building the underlying systems.
- Resource limitations are extreme – When working with data, there are always a series of limitations, from money to people to time to hardware to software. Being effective is a matter of finding out what is stopping you from moving forward and dealing with that. Being successful is a matter of figuring out what is stopping you three steps down the line and dealing with that ahead of time.
- Customization is the rule, not the exception – There is so much customization done in information architecture that two people in similar industries working on similar systems with the same underlying technologies would probably still do things very differently. There is very little directly transferable knowledge, which makes documentation both more and less valuable than you might think.
- Everything is back-loaded – When working with data it is not uncommon to put a ton of work into design and development and not know if there are any issues until you are in production. True system testing is very difficult or even impossible, especially in large, complicated, or sensitive environments.
- It is all about the people – There is no such thing as an un-interpreted fact. Everything is seen through the lens of the human mind, which may be empirical or may not be. I would not recommend a data-centric career if you don’t like dealing with people.
While this isn’t an exhaustive list, hopefully it will help people who work with data explain to their more software-oriented peers exactly what they do and why they do it.
Posted in Information Architecture, People, Understanding

Architected.info is a web site dedicated to information architecture, focusing on transformation and understanding. We focus on these categories through the lens of organizational dynamics, looking at people, practices, and relationships.
Morgan Goeller is the author and maintainer of this website. He has worked as an architect and engineer, specializing in software development, web applications, database engineering, ETL, and information quality.