The Challenges of Real-Time
A lot of my recent work has been in real-time (actually near real-time) data warehousing. Moving towards a real-time environment creates some real challenges for ETL and information quality. Everything seems to become more difficult, and at times the constraints become almost unbearable to work with. You really, really, really need a real-time system in order to justify building one, especially from a data-centric point of view.
What got me writing about this was reading an article reporting that “some Cingular subscribers endure 4-hour-plus outage” (and the fact that this isn’t the first time this has happened). I knew exactly what the Cingular representative was talking about when I read this quote:
“There’s a database that has all the customer numbers and somehow, we don’t know why at this point, about 10 percent (of customers in the area) were prohibited from making or receiving calls,” Merriman said.
The big issues with real-time systems come from dealing with emergent behavior within the system. Things get into an unexpected state, and it is very difficult to figure out why, especially after the fact. This is because when you are running in real time:
- Resources are at a premium, and often this means that only enough data is kept in order to process what is available right now.
- Data handling is set up to ensure that the system doesn’t break, not to ensure optimal quality.
- Downtime usually means that no information is coming in at all, and it is very hard to know what you don’t know.
- Breakage is normally catastrophic, and the priority is on getting things going again, not on performing detailed analysis of what happened.
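The first point above is easier to see with an example. Below is a minimal Python sketch of a hypothetical stream processor that keeps only a fixed-size window of recent events. The `WindowedProcessor` class, its `window_size` parameter and the phone-number data are all made up for illustration and do not come from any real system. Once the window fills, the oldest events are silently discarded. The current state is still correct, but the history that would explain how a record reached that state is gone.

```python
from __future__ import annotations

from collections import deque


class WindowedProcessor:
    """Hypothetical real-time processor that retains only the last N events."""

    def __init__(self, window_size: int) -> None:
        # A deque with maxlen drops the oldest event automatically when full.
        self.window: deque[tuple[str, str]] = deque(maxlen=window_size)
        # Current status per customer number: all that is needed "right now".
        self.state: dict[str, str] = {}

    def process(self, event: tuple[str, str]) -> None:
        customer_id, status = event
        self.window.append(event)
        self.state[customer_id] = status

    def history_for(self, customer_id: str) -> list[str]:
        # Only events still inside the window can be inspected after the fact.
        return [status for cid, status in self.window if cid == customer_id]


# Made-up event stream for illustration.
proc = WindowedProcessor(window_size=3)
for ev in [("555-0101", "active"), ("555-0101", "blocked"),
           ("555-0202", "active"), ("555-0303", "active")]:
    proc.process(ev)

print(proc.state["555-0101"])        # blocked
print(proc.history_for("555-0101"))  # ['blocked']
```

After the fourth event arrives, the earlier "active" record for 555-0101 has been evicted. Someone investigating why that customer ended up blocked can see the current state but not the transition that led to it. A batch system that keeps its full input would not have this problem.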
Because you have a lot less information than in a batch-processing system, it is a lot harder to figure out what is going on. Good luck to the Cingular engineers in preventing this type of thing in the future.