August 2008
Ask the Experts Video Blog from the BeyeNETWORK

Question: Can you please discuss the scalability and performance challenges with large quantities of data and the need for real-time data flows?


Answered by John Myers, founder, The Blue Buffalo Group

This is an interesting question because large data sets have unique bandwidth, transformation and analytics requirements no matter what the latency requirements are.

The big question with these large data sets is "Time is money; how fast do you want to go?"

The business challenge and operational risk in each scenario involving large data sets need to be assessed against a true latency requirement, not against an arbitrary decision about real time versus batch.

One example of this might be a business owner who comes in and says, "I went to a conference that talked all about real time and how real time is the competitive advantage we need to have for our organization, therefore we will be real time in our assessments or in terms of our data flows."

A technologist might make the same type of decision saying, "Other organizations are real time; our organization must be real time as well."

This is all well and good, but if real time doesn't contribute to the competitive advantage of a particular organization, or at least to the reduction of costs or risks, do you really get the value associated with moving to a real-time or near-real-time environment?

You need to make an assessment in terms of risk to operations or in terms of the business case if you want to go real-time or near-real-time as opposed to batch.

In terms of network optimization or fraud, that case can be made fairly easily. If you're running a route at a loss and someone starts moving large amounts of data down that pipe, that's information you need in near real time, or as soon as you can get it, because you can't recoup the losses associated with that particular network path from either the provider who supplies it to you or the customers who are using it.

In terms of fraud, if you have premium rate services where you're passing dollars outside the organization to pay for those services, you need to make a decision on that fraud as soon as it starts happening so that you can minimize the losses associated with it. From a risk perspective or a cost-benefit analysis perspective, you're looking at green dollars moving out of the organization. And that's where real-time or near-real-time comes into play.

If you have two large data sets, one of which gets created in real time and the other a week or month later, you may not need to move the first set and compare it as quickly as you might like. So you need to make an assessment: Is that comparison valid in a real-time environment, and can we make it a significant competitive advantage, so that it makes sense from a business case perspective?
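The assessment described above can be sketched as simple arithmetic: the money lost while a problem goes undetected, plus the cost of running the data flow, compared across latency options. The Python below is only an illustrative sketch. The names `LatencyOption`, `monthly_cost` and `cheaper_option` are hypothetical, and every dollar figure, delay and incident count is invented for the example rather than taken from any real organization.

```python
from dataclasses import dataclass


@dataclass
class LatencyOption:
    """One way of delivering the data, e.g. nightly batch or near-real-time."""
    name: str
    detection_delay_hours: float  # time from the first bad event to a decision
    monthly_platform_cost: float  # assumed cost of running this option, in dollars


def monthly_cost(option, loss_per_hour, incidents_per_month):
    """Expected monthly cost: losses while undetected plus platform cost."""
    exposure = option.detection_delay_hours * loss_per_hour * incidents_per_month
    return exposure + option.monthly_platform_cost


def cheaper_option(a, b, loss_per_hour, incidents_per_month):
    """Return whichever option has the lower expected monthly cost."""
    cost_a = monthly_cost(a, loss_per_hour, incidents_per_month)
    cost_b = monthly_cost(b, loss_per_hour, incidents_per_month)
    return a if cost_a <= cost_b else b


# All figures below are invented for illustration only.
batch = LatencyOption("nightly batch", detection_delay_hours=24.0,
                      monthly_platform_cost=10_000.0)
near_real_time = LatencyOption("near-real-time", detection_delay_hours=0.25,
                               monthly_platform_cost=60_000.0)

# Premium rate fraud: money leaves the organization quickly.
fraud_choice = cheaper_option(batch, near_real_time,
                              loss_per_hour=2_000.0, incidents_per_month=4)

# A comparison against data that arrives weeks later: delay costs very little.
report_choice = cheaper_option(batch, near_real_time,
                               loss_per_hour=5.0, incidents_per_month=4)

print(fraud_choice.name)   # near-real-time
print(report_choice.name)  # nightly batch
```

With these made-up numbers the fraud case justifies near-real-time and the slow comparison does not, which is the case-by-case outcome the assessment is meant to surface. A real business case would also need to account for factors this sketch ignores, such as customer impact and implementation risk.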

Other industries, such as retail, have similar data requirements. They're pulling information down from point-of-sale systems in grocery or consumer electronics stores. The question is: do you need to look at that data in real time to establish a competitive advantage?

To review today's question: "What has been your experience with the need for real-time data flows in the real world?"

I think real-time data latency requirements do exist in organizations, but is it a blanket requirement that applies to everything about the business? Or do you need to look at it on a case-by-case basis, so that you understand whether you are going to get a cost benefit from moving to real-time or near-real-time data flows for analytics, for processing and things of this nature?

To watch the video, post a reply or submit a new question, please visit http://www.b-eye-network.com/blogs/ask_the_experts/index.php.

About John Myers
John Myers is an "Expert" blogger for the BeyeNETWORK covering the worlds of telecommunications (telecom) and business activity monitoring (BAM). John is the founder of the Blue Buffalo Group, whose goal is to inform and educate on the subjects of business intelligence and telecommunications. Whether it be the technical aspects of data warehousing or the business aspects of the telecommunications industry, the Blue Buffalo Group is dedicated to bringing the latest information in an interesting way. www.bluebuffalogroup.com.

