January 2008
Organizing and Mining Unstructured Text

If unstructured data is to play a role in business intelligence and data warehousing, industry has to figure out how to structure it in a way that the data warehouse and its reporting tools can understand.

Perhaps the biggest and juiciest target is text - all those e-mails and Word documents - and this is an area where technology may be making some progress. According to Philip Russom, senior manager at TDWI Research, some companies are starting to use business intelligence search, text analytics, or both to turn natural-language text into nuggets for the warehouse. Search technology can index reports from business intelligence platforms so users can more easily find what they are looking for. Text analytics parses human-language text and converts it into structured data that the warehouse can store.

Companies in the healthcare, government, financial, insurance and automotive industries - which have lots of critical business information in text - are blazing some trails in this area, says Russom. Insurance companies, for example, can spot fraud by mining the textual information in all the forms associated with the claims process. By looking for patterns in locations, people's names and other identifiers, companies can pinpoint an abnormally high incidence of accidents or claims. The data can also be used to improve actuarial tables. "If they see that a certain street is prone to flooding, they can adjust the rates that they apply to that area," notes Russom. They could also share the information with auto companies if certain types of cars are involved in accidents more frequently.
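
To make the claims example concrete, here is a minimal Python sketch of the idea: pull a street name out of free-text claim notes and count how often each street appears. The sample notes, the address pattern and the threshold are all assumptions made up for illustration; they do not come from Russom or from any particular text analytics product, and real claim forms would need far more robust parsing.

```python
import re
from collections import Counter

# Hypothetical free-text claim notes; real claim forms would be far messier.
CLAIM_NOTES = [
    "Basement flooded after heavy rain at 12 Mill Street.",
    "Water damage reported, 48 Mill Street, ground floor.",
    "Rear-end collision near 7 Oak Avenue.",
    "Flooding in garage at 30 Mill Street following storm.",
]

# Assumed pattern: a house number, a capitalized name, then a street type.
STREET_PATTERN = re.compile(r"\b\d+\s+([A-Z][a-z]+\s+(?:Street|Avenue|Road))\b")


def extract_street(note):
    """Return the street name mentioned in a note, or None if none is found."""
    match = STREET_PATTERN.search(note)
    return match.group(1) if match else None


def flag_streets(notes, threshold):
    """Count claims per street and return streets at or above the threshold."""
    streets = (extract_street(note) for note in notes)
    counts = Counter(street for street in streets if street is not None)
    return {street: n for street, n in counts.items() if n >= threshold}


# Streets with three or more claims in the sample notes.
print(flag_streets(CLAIM_NOTES, threshold=3))
```

The output of a step like this is exactly the kind of structured record (street, claim count) that a warehouse and its reporting tools can work with.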

Automotive is another industry that is using text analytics. The Transportation Recall Enhancement, Accountability, and Documentation (TREAD) Act, which was passed in the wake of the failure of Firestone tires on Ford Explorers, requires car manufacturers to collect and retain auto service records and deduce information about which vehicles are prone to certain mechanical problems that may lead to accidents. Because most of those service records are in text, this legislation has driven an explosion of text mining and analytics in the automotive sector, says Russom.

But by far the hottest application, and this cuts across all industries, is in call centers, he notes. "Text information from call center applications gives you a lot of information about customer satisfaction, and from there you can link it to the probability of customer churn," he says. Text analytics can interpret customer satisfaction levels from these logs, alerting the company to dissatisfied customers it's in danger of losing. The company can then proactively reach out to those customers with an apology or special offers in an effort to retain them, he explains.
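
As a rough illustration of the call center idea, the Python sketch below scores agent notes against a small list of negative terms and flags customers whose score crosses a threshold. The term list, the weights, the log entries and the threshold are hypothetical; commercial text analytics tools use much richer language models, and nothing here reflects a specific vendor's method.

```python
import re

# Hypothetical lexicon of terms suggesting dissatisfaction, with weights.
NEGATIVE_TERMS = {"cancel": 3, "frustrated": 2, "unacceptable": 2, "complaint": 1}

# Hypothetical call center log entries: (customer ID, agent note).
CALL_LOG = [
    ("C001", "Customer frustrated with billing error, threatened to cancel."),
    ("C002", "Customer asked about upgrade options."),
    ("C003", "Filed a complaint about late delivery."),
]


def dissatisfaction_score(note):
    """Sum the weights of lexicon terms that appear as whole words in the note."""
    words = re.findall(r"[a-z]+", note.lower())
    return sum(NEGATIVE_TERMS.get(word, 0) for word in words)


def at_risk_customers(log, threshold):
    """Return IDs of customers whose note scores at or above the threshold."""
    return [cid for cid, note in log if dissatisfaction_score(note) >= threshold]


# Customers to contact proactively, based on the sample log.
print(at_risk_customers(CALL_LOG, threshold=3))
```

A score like this could then be stored alongside the customer record and fed into whatever churn reporting the company already runs.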

But while companies are making progress with text, other types of unstructured data remain a big challenge. Multimedia files like photos, audio and video "contain extremely valuable information, but it's not in a traditional numeric or natural language format," says Russom. "The industry is just starting to use tools that can find information in text and turn that into structured data. The idea that you'd somehow parse multimedia files - people just don't even entertain the notion."

