Solving urban Hadoop infrastructure sprawl and inelastic clusters

2017-04-14T20:05:13-07:00

Apache Hadoop, and Big Data analytics in general, have become household names in small, medium, and large enterprises. Big Data analytics is now a business necessity rather than a "nice to have," because enterprises have realized the economic value of collecting and analyzing all the data around them. Hadoop has become the mascot of Big Data analytics because it can handle both structured and unstructured data sets, which lets it serve a wide variety of use cases.

Most Hadoop initiatives we see start as small, isolated clusters built around the specific application or analytics needs of different groups or departments within an organization. Often this leads to each department owning and managing its own applications and infrastructure, resulting in separate infrastructure silos within the same organization that grow without direction or control. We call this infrastructure sprawl.

Another real-world problem is that many organizations were not fully aware of their application needs when their Hadoop clusters were first deployed, and, based on current usage and runtime, their compute or storage requirements may have changed dramatically since then. Unfortunately, inelastic Big Data clusters cannot adapt to these dynamically changing needs.

At the same time, we see more and more Hadoop use cases requiring elastic clusters for experimentation with new applications on both structured and unstructured data. So the question becomes: how do you fix infrastructure sprawl and cluster inelasticity?

DriveScale has just such a solution.

The DriveScale System for next-generation Scale-Out architecture disaggregates the commodity servers within the clusters into pools of compute and storage resources. DriveScale provides hardware and software technology that allows the separate purchase and scaling of compute and storage in the form of commodity storage-light servers and JBODs full of commodity disks, along with software control for flexible binding of disks to compute elements in any ratio required by an application. As applications change, these bindings can be dissolved and new bindings applied on demand, all under software control. With DriveScale’s technology, a data center can start with a minimum Hadoop cluster and scale out by adding more compute or storage resources independently as needed.
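The composition workflow can be pictured with a short sketch. The Python below is purely illustrative and assumes nothing about DriveScale's actual software or APIs; it simply models a pool of compute nodes and JBOD disks, with bindings that can be created in any ratio and dissolved when an application's needs change.

```python
# Illustrative sketch only (hypothetical names, not DriveScale's API):
# disaggregated pools of compute and JBOD disks with software-controlled
# bindings that can be composed and dissolved on demand.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Disk:
    disk_id: str
    bound_to: Optional[str] = None      # compute node currently holding this disk

@dataclass
class ComputeNode:
    node_id: str
    disks: List[Disk] = field(default_factory=list)

class Composer:
    """Tracks free disks and binds/unbinds them to compute nodes in any ratio."""
    def __init__(self, nodes, disks):
        self.nodes = {n.node_id: n for n in nodes}
        self.free_disks = list(disks)

    def bind(self, node_id, count):
        """Attach `count` free disks from the pool to a compute node."""
        node = self.nodes[node_id]
        if count > len(self.free_disks):
            raise ValueError("not enough free disks in the pool")
        for _ in range(count):
            disk = self.free_disks.pop()
            disk.bound_to = node_id
            node.disks.append(disk)

    def unbind(self, node_id):
        """Dissolve a node's bindings and return its disks to the free pool."""
        node = self.nodes[node_id]
        for disk in node.disks:
            disk.bound_to = None
        self.free_disks.extend(node.disks)
        node.disks.clear()

# Example: start with a minimal cluster, then rebalance as workloads change.
nodes = [ComputeNode(f"node-{i}") for i in range(3)]
jbod = [Disk(f"disk-{i}") for i in range(24)]
composer = Composer(nodes, jbod)

composer.bind("node-0", 12)   # storage-heavy node for an HDFS-style workload
composer.bind("node-1", 4)    # compute-heavy node needs fewer spindles
composer.unbind("node-0")     # application changed: dissolve the binding...
composer.bind("node-2", 8)    # ...and re-bind the freed disks elsewhere
```

The point of the sketch is simply that compute and storage scale independently: adding disks grows the free pool, adding servers grows the node map, and the ratio between them is a per-application decision made in software rather than a fixed property of the hardware.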

DriveScale enables users to create composable server/storage combinations that eliminate infrastructure sprawl and deliver truly elastic clusters.

About the Author:

Alpika Singh has been a Technical Marketing Manager for the past 4 years. Prior to DriveScale, Alpika was a Technical Marketing Manager at Hewlett Packard Enterprise, working on pre- and post-launch technical collateral, sales and presales technical training, and customer engagements for the HyperConverged platform. She also participated actively in product development and roadmap activities. Before Hewlett Packard Enterprise, Alpika spent 5 years at Emulex Corporation (since acquired by Broadcom) as a Senior Technical Marketing Engineer, working on pre- and post-launch technical collateral for the 10/40GbE Converged Network Adapters. Alpika has a Master of Science degree from Syracuse University and a Bachelor of Engineering degree from Mumbai, India.
