January 2008
The Whole Story on Your Data

How much of your company's most useful information lies outside the data warehouse?

Data warehouses are selective because they were designed to organize a particular type of business data. Information has to be strictly structured to fit the relational database management system that most warehouses use. According to a late 2006 survey of data management professionals conducted by TDWI Research for The Data Warehousing Institute (the association of business intelligence and data warehousing professionals), only about half of all corporate information meets this criterion. That leaves the other half inaccessible to your warehouse and BI applications.

In fact, the unstructured share may be even higher. The TDWI study asked about information on servers. When individual PCs are taken into account, some estimates put the ratio of structured to unstructured data closer to 20-80.

What is unstructured data?

The term "unstructured data" refers to information that is not in a form your data warehouse can recognize. This includes e-mail, word-processing documents, presentations, Web pages, and graphics and media files. These files may contain data crucial to your business, such as customers' buying habits, how much they like and use your products, or the work habits of your employees.

For example, your marketing or product team might create a competitive analysis and deliver the final version as a series of presentation slides. This information would be considered unstructured. Similarly, a series of critical discussions might take place via e-mail. Because these messages are free-form text, your data warehouse would not immediately recognize them either. Finally, with multimedia becoming more prevalent, an increasing amount of data is being created in audio or video formats, which are also unstructured.

Pressure to organize

Over the last few years, pressure has been building for companies to get a better handle on this so-called "unstructured" or "semi-structured" data. First, there are outside factors. New regulations such as Sarbanes-Oxley are requiring corporations to institute controls so that they know what data is on their networks and can track and account for it at all times. Failure to do this can lead to fines and, perhaps more importantly, a damaged reputation. Similarly, changes to rules regarding discovery in civil litigation now require any party to a lawsuit to make its electronic information available to other parties. That means this information must be in a form that is searchable.

Beyond such outside forces, the ability to harness and use this unstructured data can be an important competitive advantage. Imagine analyzing the unstructured information in call-center representatives' notes and in transcripts of recorded calls to identify patterns that lead to quicker, higher-quality service for customers. You could then use this knowledge to create best practices and improve the overall performance of your call centers, thus increasing customer satisfaction.

Get started with text mining

The Holy Grail, some believe, is the seamless integration of structured and unstructured data, but that may still be a long way off. In the meantime, new technologies are being applied that can help organizations start bringing unstructured data into the data warehouse. Intelligent search and text mining (see "Organizing and Mining Unstructured Text"), for example, are helping some organizations turn natural-language text into information that can be analyzed. This type of application can be useful for locating trends in call-center logs or organizing text-based service records.
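
To make the idea concrete, here is a minimal sketch of how free-text notes might be turned into countable rows. It is only an illustration under stated assumptions: the sample notes, the stop-word list, and the function names (tokenize, term_counts, top_terms) are all hypothetical, and commercial text-mining tools do far more than count words.

```python
import re
from collections import Counter

# Hypothetical call-center notes (made up for this sketch).
NOTES = [
    "Customer reports billing error on invoice, requested refund.",
    "Billing error again; customer upset about late refund.",
    "Password reset requested, resolved on first call.",
]

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "about", "again", "an", "and", "first", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def term_counts(notes):
    """Count how many notes mention each term at least once."""
    counts = Counter()
    for note in notes:
        # set() ensures a term is counted once per note.
        counts.update(set(tokenize(note)))
    return counts


def top_terms(notes, n=3):
    """Return the n most common terms as (term, note_count) rows."""
    counts = term_counts(notes)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    for term, count in top_terms(NOTES):
        print(f"{term}\t{count}")
```

Each (term, count) row is structured data that could, in principle, be loaded into a warehouse table and tracked over time, which is the kind of trend spotting described above.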




