Modern Workloads Require a New Approach

What are Modern Workloads?

They go by several names: Next Generation, Scale-Out, or Cloud Native applications (even though they aren’t exclusive to the cloud). We like to call them “modern workloads”. They are relatively new on the IT landscape, and their very nature has had a profound impact on data center architectures.

Prominent examples of modern workloads come from Big Data, IoT, and social and mobile applications. Unlike traditional enterprise applications such as email or accounting software, modern workloads are more dynamic and unpredictable, and their data sets often grow exponentially. Whereas traditional applications scale modestly and can be managed by increasing the number of virtual machines running on a server, modern workloads scale horizontally: performance or capacity is increased by adding servers. The dynamic and unpredictable nature of these applications means IT has to continuously provision and re-configure resources to handle fluctuations in demand.

Modern Workload Architectural Principles

The developers of the Big Data architectures at Google and Yahoo were looking to design a platform that could store and process a vast quantity of data at low cost. To achieve this, they established several key architectural principles that now underpin modern workloads such as Hadoop, Spark, and Cassandra.

Deploy Commodity Servers – Use the lowest-cost storage and compute servers – what were traditionally called “PC Servers”. Every computer manufacturer offers several versions of these systems, and the competition has turned them into low-margin, low-cost platforms. Building clusters of hundreds or thousands of these servers provides vast storage and compute capacity at the lowest possible cost.

Share Nothing, Parallel Architecture – Ensure no co-dependency between servers. If anything is shared, even a minor resource, the cluster will eventually reach a point where that shared component becomes a bottleneck and prevents further scaling. Each node needs to be independent, equal, and parallel to all other nodes in the cluster. This allows near-linear scaling from tens to tens of thousands of nodes.
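
As a rough illustration only (not part of the original text), the following Python sketch shows one common way a shared-nothing cluster can route data: hash partitioning. Each node owns its own slice of the key space, so adding nodes adds capacity without introducing any shared component. The node names are hypothetical.

```python
# Illustrative sketch: hash partitioning in a shared-nothing cluster.
import hashlib

NODES = ["node-%03d" % i for i in range(16)]  # hypothetical node names

def owner(key: str, nodes=NODES) -> str:
    """Route a key to exactly one node; no node depends on any other."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

# Every record lands on a single, independent node.
print(owner("sensor-42/2024-06-01"))
```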

Move the Application Processing to the Data – Keep the data close to the processor. Since the cheapest way to store data is in commodity servers with high-performance connectivity to their local drives, the fastest way to process huge data stores is to use each of these nodes as a parallel computing unit that processes the data stored on its own drives. This “data locality” is handled by a central control node that sends the applications out to the nodes, and each node then processes its own data.
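
To make the idea concrete, here is a minimal, hypothetical sketch (not drawn from the source) of locality-aware task placement: the control node prefers a worker that already holds a replica of the data block, and only falls back to moving data when no such worker is free. The block-to-node map and node names are assumptions for illustration.

```python
# Illustrative sketch: locality-aware task placement.
# Hypothetical map of data block -> nodes holding a replica of that block.
block_locations = {
    "block-0001": ["node-003", "node-011", "node-027"],
    "block-0002": ["node-005", "node-011", "node-042"],
}

def place_task(block_id: str, idle_nodes: set) -> str:
    """Prefer a node that already stores the block; fall back to any idle node."""
    for node in block_locations.get(block_id, []):
        if node in idle_nodes:
            return node            # process the data where it already lives
    return next(iter(idle_nodes))  # last resort: move the data to the work

print(place_task("block-0001", {"node-011", "node-099"}))  # -> node-011
```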

Communicate Just the Results – Move data infrequently. Both MapReduce and in-memory frameworks such as Spark keep the majority of data transfers local. Only the results from each level of processing are moved to other nodes for further processing. This keeps bandwidth requirements between nodes, and especially between racks, to a minimum.
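
A minimal sketch of this pattern, assuming a MapReduce-style word count (the shard data and function names are illustrative, not from the source): each node counts its own local shard and ships only the small dictionary of per-word totals, not the raw input, to the node that merges the results.

```python
# Illustrative sketch: map + local combine, so only results cross the network.
from collections import Counter

def map_and_combine(local_lines):
    """Runs on each node against its local data; returns only the results."""
    counts = Counter()
    for line in local_lines:
        counts.update(line.split())
    return dict(counts)          # kilobytes of results instead of gigabytes of input

def reduce_results(per_node_results):
    """Runs on the reducing node: merges the small result sets from every node."""
    total = Counter()
    for partial in per_node_results:
        total.update(partial)
    return total

shard_a = ["big data big results"]          # data local to node A
shard_b = ["big clusters small traffic"]    # data local to node B
print(reduce_results([map_and_combine(shard_a), map_and_combine(shard_b)]))
```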

Build Resiliency and Efficiency in Software – Move sophisticated resiliency and efficiency functions out of the hardware and “up” into the software layers. Big Data platforms handle data resiliency by keeping three copies of the data rather than by using RAID or other hardware recovery schemes. Management of the triple copies is handled by the file system itself, rather than by the storage subsystem. This may seem less efficient, but it enables the use of commodity drives rather than sophisticated storage arrays. Commodity drives cost orders of magnitude less than storage arrays, so the total system remains vastly less expensive. Furthermore, in very large data stores the chance of a double disk failure, which many RAID configurations cannot recover from, becomes a near certainty. Triplicated (and in some cases, quadruplicated) data is much more robust against failure than RAID arrays.
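
As an illustration of resiliency handled in software rather than hardware, here is a small Python sketch of HDFS-style triple replication (the rack and node names are hypothetical, and the policy is simplified): the first copy stays on the writing node, and the other two land on a different rack so a single rack failure cannot take out all replicas.

```python
# Illustrative sketch: simplified triple-replica placement in software.
# Hypothetical topology: rack name -> nodes in that rack.
racks = {
    "rack-1": ["n1", "n2", "n3"],
    "rack-2": ["n4", "n5", "n6"],
}

def place_replicas(writer_node: str, writer_rack: str):
    """First copy on the writing node; second and third on two
    different nodes in a remote rack."""
    remote_rack = next(r for r in racks if r != writer_rack)
    second, third = racks[remote_rack][:2]
    return [writer_node, second, third]

print(place_replicas("n2", "rack-1"))   # -> ['n2', 'n4', 'n5']
```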

Hadoop, Spark, Cassandra, and other Big Data modern workloads are all based on these principles. By following them, enterprises can reap the same benefits first achieved by the hyperscale organizations that developed them.

That’s where Software Composable Infrastructure comes in. It supports all of these Big Data principles, and allows IT organizations to realize even greater cost savings with far greater agility to respond to the ever-changing nature of these modern workloads.

See “What is Software Composable Infrastructure?”