Q&A with Arun Taneja on Data Warehousing Appliances - Next Generation Approach to the Challenges of Big Data Environments



This is Tamara Graves of Dataupia, and it is my pleasure to spend a few minutes talking with Arun Taneja about his research on Data Warehousing Appliances - Next Generation Approach to the Challenges of Big Data Environments.

Can you tell us:

Q. Why should data storage professionals care about data warehouse appliances?
A. There are a lot of reasons why storage professionals should care. Maybe the best way to understand why is to look at what is happening in the lives of data storage professionals. There are two sides to this. On one side, a whole bunch of drivers cause specific things to happen that impact storage professionals. On the other side, there are the solutions available to deal with those issues, and those solutions have limitations. Let me start with the drivers that create the issues.

Clearly, as we all know, data growth is so hard and heavy that nobody anticipated it. No one thought we would be growing information at practically 100% per year. So this tsunami that keeps coming has to be kept somewhere and managed. Storage professionals are dealing with that issue and taking one step forward and two steps back. To make matters worse, the compliance situation is very different than it was 10 or even 5 or 6 years ago. A new set of regulations has come in, like SOX, HIPAA and PCI, a whole series of new regulations designed to protect information or to make companies act more responsibly from a financial management and customer care perspective. All of these things amount to a couple of things: you are going to keep more data for longer periods of time, and you are going to secure that data. Those are common characteristics of many of these regulations. And if that isn't enough, the other pressure these guys are dealing with is competitive: the line-of-business people actually want to keep more data online, not for compliance reasons but because analysis on data kept over a longer period of time will yield better results. They want to keep more data online and they want answers to their queries faster than ever before because they have their own competitive pressures. That is the environment we are dealing with.

Then we look at the other side of the equation, which is the solution. We are going to bring data warehouse appliances into this picture. So how do we solve all of those problems? I am focusing on structured data for the moment because that is where data warehouse appliances come in. I look at the current solutions and how people manage large amounts of structured data. How do they do data warehousing? Well, you get one of three approaches. The first is the classic scale-up kind of system: a 16-way or 32-way SMP machine that is very strong, very powerful, with lots of memory. Yes, it has a lot of power and performance, but at a humongous price. Scalability is effectively nonexistent because once you buy it and it runs out of gas, the only way to get more performance is to get a bigger one. You have to do forklift upgrades. Those are horrendously expensive. That is one solution.

The other solution is to go to a clustered environment with software like Oracle RAC or something equivalent to that. You have a number of nodes that share common storage. Those things are hard to manage. They run into all kinds of issues as inter-node traffic becomes large. All kinds of synchronization issues come in. People have found some ways of dealing with that, maybe by partitioning the database, but the moment you do that you get much bigger storage management and database management issues. Scalability in that environment pretty much tops out at 4 nodes--maybe some installations have 8 nodes, but they are few and far between. It is very hard and expensive to manage.

The third scenario that you have today is the classic Teradata-type environment, which is MPP. In that situation you break the problem down into pieces, let multiple processors attack the pieces, and pull the solution back together. Last I looked, the prices of those solutions were several million dollars, and it is not unusual for us to find IT literally spending millions of dollars on operational costs in that scenario as well.
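
As a rough illustration of the MPP idea described above (break the problem into pieces, let multiple processors attack them, pull the solution back together), here is a minimal Python sketch. It is not how Teradata or any particular product works: the function names, the round-robin split and the (order_id, amount) rows are all assumptions made up for the example, and threads stand in for what would be separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Split rows into n_workers slices, round-robin.

    Assumption: a real MPP system would typically distribute rows by
    hashing a key column; round-robin keeps the sketch short.
    """
    return [rows[i::n_workers] for i in range(n_workers)]


def partial_sum(rows):
    """Work done independently on one 'node': sum the amounts in its slice."""
    return sum(amount for _, amount in rows)


def mpp_total(rows, n_workers=4):
    """Scatter the rows, aggregate each slice in parallel, combine the partials."""
    slices = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)


# Hypothetical (order_id, amount) rows, purely for illustration.
sales = [(i, i * 10) for i in range(1, 101)]
total = mpp_total(sales)
```

The point of the sketch is only the shape of the computation: each worker sees its own slice of the data, and the final answer is assembled from small partial results rather than from the raw rows.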

So, that's the environment we are dealing with. You've got massive issues on one side, and you have these solutions that have very distinct limitations either on the expense side or on the expense AND management side. This is the environment in which one has to view data warehouse appliances. In my view, data warehouse appliances are a way of eliminating or significantly reducing all of the negatives of the three solutions I just mentioned while keeping all of their positives, plus more, because by definition a data warehouse appliance is an appliance. This means that it is something that comes up quickly and is easily manageable, so you can deploy it very easily. These appliances are designed to deal with data warehouse and data mart kinds of workloads. They answer queries orders of magnitude faster than these other types of solutions. If done correctly (not all data warehouse appliances are alike), they can actually start small and grow very large--very, very large, into tens or hundreds of TB of data. That is why storage professionals would be doing themselves a great injustice if they didn't consider data warehouse appliances in this day and age. Otherwise, they are going to get crushed by all of these issues that just seem to get worse and worse.

Q. What surprised you the most when doing research for this paper?
A. The first surprise came from the fact that I had done earlier research on unstructured data, and what we discovered then is that this area is going through a growth spurt more than anywhere else in the world. Massive growth in unstructured data. I had gone into this data warehouse appliance area thinking that structured data was growing at much saner levels. What I discovered is that it is NOT growing at saner levels. It is growing at insane levels, particularly when it comes to the need for analyzing and getting answers: extracting information from mounds of data that are growing more rapidly than I thought. That pressure is very high, and the DBAs, the storage professionals and the line-of-business analysts are all under a serious amount of pressure. Structured data is growing faster than most people imagine.

The other thing that I discovered, and it was not a humongous surprise but nevertheless a surprise to me, was the spend on OPEX. I knew people were spending huge amounts of money on the current limited solutions like the Teradata MPP approach or, for that matter, the SMPs, but frankly I didn't realize how much money they were spending on managing these types of systems. So it is not just a CAPEX issue, it is an OPEX issue. That is where a data warehouse appliance, particularly one that is done well, has a distinct advantage. To me, a well-designed data warehouse appliance is, for all practical purposes, self-managing and self-healing: put it together and it will keep up as the data continues to grow. As the data grows, a DBA doesn't have to move it around into specific areas of storage so that performance stays intact. All of that intelligence is built into the data warehouse appliance. No single points of failure, scales indefinitely, so bring it in and put it together. Put in the policy that you want the data warehouse appliance to adhere to and let it go. Let the DBAs then go on to other stuff. The storage professionals are not constantly provisioning new storage capacity. As we all know, storage provisioning should be trivial, but ask a DBA who goes to a storage professional and asks for an extra 500 gigabytes to be attached to an existing volume, and they will tell you it takes a week. Little things like that are archaic ways of managing things from the past. A data warehouse appliance can make a big difference in many of those dimensions. So, the OPEX spend was actually a big surprise to me. There are still hundreds and thousands of installations where very expensive SMPs are being deployed. And SMPs, as we all know, are designed to be general-purpose computers. They are designed to do a lot of things reasonably well. They are not designed to do one thing extremely well.
Data warehouse appliances are designed to do one thing really well: manage large databases and warehouses, answer queries quickly, and be self-managing for all practical purposes.

Thank you so much for sharing just a few of your insights on next generation approaches to the challenges of big data environments. In order to get all of the details, please be sure to download the whitepaper in its entirety here.