June 2008
The Independent Spirit Behind Data Warehousing, Part 2

In Part 1 of this profile, we looked at data warehousing pioneer Bill Inmon's passion for both work and the people he meets. The profile continues in this issue with a look at his vision for the future of data warehousing: the incorporation of unstructured data.

What is Inmon's vision of the future of data warehousing? His latest book, Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence, relates to that very issue: the incorporation of unstructured data - that is, data that exists in text rather than within traditional data constructs.

"Most corporations have thousands of contracts, and they have no idea what's in them," Inmon says. "That's all textual data. There is no human that has the mental capacity to track a thousand contracts. So when executives talk about their contractual liability, the truth of the matter is, they're only making an educated guess."

There are industries, he believes, where the value of such a capability is clear. "Take the oil and gas industry. They have accident data, repair data, incident reports, all in textual format," he says. Once that information is in a data warehouse, they can query it about safety and reliability. "If you can improve safety, you've got something that's worth its weight in gold."

Enthusiastic as Inmon is, he understands that representing the data consistently is a challenge. "Ask a roomful of people to write down today's date, and you'll get a dozen representations of the same thing. But if you're going to analyze the text, you have to have a standard date format, because the engine that does the comparison and analysis expects dates in a singular format. That's an almost trivial example, but it creates a real challenge."
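To make the date example concrete, here is a minimal sketch in Python of what normalizing free-text dates into a single format might look like. It is an illustration only, not a description of Inmon's tooling: the function name normalize_date, the list of candidate formats, and the choice of ISO 8601 (YYYY-MM-DD) as the standard are all assumptions made for this example. It also assumes US month/day ordering and English month names.

```python
from datetime import datetime

# Hypothetical list of formats people might use for the same date.
# Assumption: numeric dates are month/day/year (US ordering).
CANDIDATE_FORMATS = (
    "%Y-%m-%d",   # 2008-06-05
    "%m/%d/%Y",   # 6/5/2008
    "%m/%d/%y",   # 06/05/08
    "%B %d, %Y",  # June 5, 2008
    "%b %d, %Y",  # Jun 5, 2008
    "%d %B %Y",   # 5 June 2008
)


def normalize_date(text):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not fit; try the next one
    return None


if __name__ == "__main__":
    for raw in ["June 5, 2008", "6/5/2008", "06/05/08", "5 June 2008"]:
        print(raw, "->", normalize_date(raw))
```

Even this small sketch shows why the problem is less trivial than it looks: a value such as 6/5/2008 means June 5 in one country and May 6 in another, so any real implementation has to decide how to resolve that ambiguity.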

Even in specialized situations, he notes, data representation becomes a challenge. "We were working with some medical researchers who had collected 30 years of notes on a certain kind of cancer. They wanted to take those notes and make sense of them." Not only had medical terminology changed over those 30 years, but different medical specialists, such as cardiologists and hematologists, also refer to blood differently. "You have to address the terminology issue."
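One common way to approach the terminology issue is to map variant terms onto a single canonical term before analysis. The Python sketch below is purely illustrative: the SYNONYMS table and the normalize_terms function are hypothetical, and the handful of entries shown are generic examples rather than the vocabulary the researchers actually had to reconcile.

```python
import re

# Hypothetical synonym table: variant term -> canonical term.
# A real table would be built with domain experts and be far larger.
SYNONYMS = {
    "erythrocyte": "red blood cell",
    "erythrocytes": "red blood cell",
    "rbc": "red blood cell",
    "rbcs": "red blood cell",
    "red blood cells": "red blood cell",
}

# Try longer variants first so multi-word terms win over shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def normalize_terms(note):
    """Replace each known variant in the note with its canonical term."""
    return _PATTERN.sub(lambda m: SYNONYMS[m.group(1).lower()], note)


if __name__ == "__main__":
    print(normalize_terms("RBC count low; erythrocytes abnormal"))
```

A lookup table like this handles only exact variants. Terms whose meaning shifted over the 30 years would also need some record of when each note was written.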

Inmon has reassuring words for those who feel trepidation at the idea of adding more to their data warehouse in the form of unstructured data. Simply put, a data warehouse should be flexible enough to accommodate such change. "A data warehouse is more like a city than a house," he says. "What goes into a house is predictable: a kitchen, bedrooms, bathrooms. A city is much less organized - there's a financial district, apartments, entertainment, an airport, docks. And if you think about it, a city never stops growing. So a data warehouse is much less predictable as to what will be there."

That's an important attitude to adopt when creating a data warehouse. "Companies usually start with one set of data, such as financial or sales," says Inmon. "But after two or three years, they discover other kinds of data that can be meaningful or useful to integrate into the data warehouse." The takeaway: building a data warehouse is not a one-time exercise. Once you start, you keep building it for as long as your company grows and changes.

The Independent Spirit Behind Data Warehousing, Part 1

