At DriveScale, we say we are addressing the scale-out environment. But what do we mean by that? Today, I explain how applications are driving the definition of scale-up vs. scale-out architectures, ultimately resulting in the emergence of a new Application Platform. In the long run, the new wave of scale-out applications will inevitably win because they combine the best of both worlds and bring with them huge cost and availability advantages.
Scale-Up vs. Scale-Out Architecture Historical Perspective
Certainly, the term ‘scale-out’ has been around for many years but it is only recently that we are seeing the birth of applications that are truly scale-out. To me, a truly scale-out application must benefit from the multiplicity of nodes as well as be tolerant of the inevitable node failures and service partitioning that come with multiple nodes. The early days of scale-out were driven by applications that were either embarrassingly parallel or that, like most HPC applications, were not really tolerant of failure.
Life was so much simpler for the software developer when programs were single-node and single-thread. Back then, if the hardware failed you just waited for the hardware to be fixed. But even as far back as 1974, with the founding of Tandem Computers, it was recognized that there was a place for software services which were more reliable than any particular hardware system. Tandem represents the first truly clustered (aka scale-out) system—it forced the applications to not trust the underlying OS and hardware in order that the application itself would be highly available.
Historically, clustered applications have been less important than scale-up applications that can take advantage of multiple processors in the same system. Multi-processor systems such as the Burroughs B5000 date back to the early 1960s. For decades, software and developers happily rode Moore’s law as processors became faster, cheaper, and smaller in every generation. However, due to what is called the ‘frequency wall’, individual processor cores stopped enjoying the generational speedup back in around 2004. Since then, the most important way to achieve speedup with Intel processors is to leverage multiple processors in a system—which we now call scale-up. Language support for scale-up programming has blossomed since then – everything from POSIX threads to Java threads has become widely used. Yet threading as a paradigm can lead to hidden race conditions, potential deadlocks, and non-linear scaling—it is hardly universally hailed as a ‘good thing’. These days, one can obtain systems with zillions of processor cores, but it is rare to find an application program that can scale linearly beyond around 10 cores. Paradoxically, then, many scale-up systems are best utilized by multiple instances of scale-out programs. However, any production grade application today needs to leverage scale-up to some decent number of cores, and then go scale-out for true breadth.
Scale-Up vs. Scale-Out Architecture: We Have A Winner
It is the advent of cloud computing which has forced the emergence of truly scale-out applications. Cloud instances are often wimpy, so you need many of them, and they are subject to failure and unreachability in ways that traditional infrastructure have avoided. But outside of the cloud, back in the ‘traditional’ data center, scale-out brings huge advantages to the cost and availability of enterprise applications. Big multi-core systems have a super-linear cost, taking advantage of those customers who are stuck with scale-up and can’t scale-out. Scale-out software allows hardware choices which target the sweet spot between cost and cores.
But Creating Scale-Out Apps Isn’t Easy
But make no mistake – writing scale-out applications is hard. Really hard. Scale-out applications usually improve their availability by the fact that they run across multiple nodes. However, consistency of data across these nodes can be a challenge. Any multi-node application is also subject to partitioning, which can wreak havoc on availability and consistency. Eric Brewer’s famous CAP theorem makes it clear: of consistency, availability, and partitioning, only 2 of 3 can be achieved. No scale-out app is safe from partitioning, so this means that an app must sacrifice availability, consistency, or both! Fortunately, many scale-out applications squeak by with self-defined levels of consistency and availability which are “good enough.”
Fortunately, there is help for the scale-out application developer. The Erlang language system was one of the first to embrace a high concurrent and partition tolerant environment; today the Go programming language is taking some of these ideas into the mainstream. The Akka framework brings many of these capabilities to Java and other JVM languages. Microsoft’s Project Orleans has been a very successful approach to integrating concurrency and fault tolerance into the C# language.
Scale-out environments are being driven by a new wave of applications which are multi-node, highly available, with well-defined consistency, and make efficient use of multi-core processors and commodity hardware. In the scale-up vs. scale-out architecture battle, scale-out apps rule!