“Scale-out” exists in relation to “scale-up”, the latter being the traditional method of adding performance and capacity in the data center. Simply put, if you want more, make it bigger: more CPU power, memory, and storage in a single node. For an example of scale-up today, think of an Oracle database on an Oracle SPARC M8-8 Server: lots of cores in a single physical domain to run a single application. The Oracle database is an exquisitely engineered piece of enterprise software designed to exploit the many cores found on mainframe-class data center servers. Ubiquitous virtualization arose from the collision of two forces: the inertia against change of large, complex, single-system applications, and the CPU clock wall that was hit in the mid-2000s.
Initially, virtualization was used to host multiple individual applications securely on the now many-core platforms. But increasingly, applications were designed to span multiple low-end platforms, where the instances cooperated and presented as a single application to the user. This method of building large applications came to be called “scale-out”. For applications, “scale-out” simply means you add performance and capacity by creating more instances of the application.
I want to distinguish the emergence of scale-out applications from clustered applications. Clustered applications have been around since at least the 1990s, but they had limited scale, typically two to four nodes (rarely eight), and a restricted definition of fault tolerance.
Scale-out applications as we know them today emerged from the hyperscalers. Scale-out applications are designed as cooperating instances that run on a large set of commodity off-the-shelf (COTS) hardware. The first significant scale-out application to see widespread adoption was the map-reduce framework Hadoop, which grew out of Google's MapReduce work in the early 2000s. While map-reduce is the core function of Hadoop, what we call Apache Hadoop is actually a collection of technologies, including what I would term an application-specific storage layer, HDFS.
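To make the map-reduce idea concrete, here is a toy, single-process word-count sketch in Python. It is not Hadoop's actual API, and the function names and sample input are purely illustrative; in a real deployment the map tasks run on the nodes holding the HDFS blocks, and a shuffle step moves keys between nodes. But it shows the two phases that get scaled out.

```python
# A minimal, single-process sketch of the map-reduce pattern that Hadoop
# distributes across many nodes. Names and sample data are illustrative only.
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Combine all counts emitted for the same key.
    return (word, sum(counts))

def run_job(documents):
    # In a real cluster, map tasks run next to the data and the shuffle
    # groups keys across reducers; here we just do it all in memory.
    grouped = defaultdict(list)
    for doc in documents:
        for word, count in map_phase(doc):
            grouped[word].append(count)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

if __name__ == "__main__":
    docs = ["scale out not up", "scale out on cots nodes"]
    print(run_job(docs))  # e.g. {'scale': 2, 'out': 2, 'not': 1, ...}
```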
Today there are many production examples of scale-out applications. They run on COTS servers in the data center or a COTS platform like Amazon Web Services (AWS) in the cloud. The term can also be applied to storage solutions, such as Ceph, where the scale-out application is the storage service itself.
In the clustered applications and clustered platforms of the past, the fault-tolerance model was narrowly and strictly defined. In a sense, clustered applications were simpler to define, if fragile to deploy. Scale-out applications, by contrast, are characterized by failure and recovery semantics defined and implemented in the application itself, with each application adopting a slightly different model depending on its requirements. The failover and recovery model for scale-out is also much more relaxed. Failover times are often effectively non-existent: no compute node is distinguished, and data is typically triple-replicated, so a failed node simply results in rebuilding its data onto a new node, or redistributing it to existing nodes, until triple-replica protection is restored.
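To illustrate that relaxed recovery model, here is a toy sketch (not any particular system's code) of what "re-replicate back to three copies" looks like: when a node dies, its blocks are simply copied onto surviving nodes until each block has three replicas again.

```python
# Toy sketch of the relaxed recovery model described above: blocks are kept
# at 3 replicas, and a failed node just triggers re-replication onto the
# surviving nodes. Purely illustrative, not a real system's placement logic.
import random

REPLICATION_FACTOR = 3

def rebuild_after_failure(placement, failed_node, live_nodes):
    """placement maps block_id -> set of node names holding a replica."""
    for block, nodes in placement.items():
        nodes.discard(failed_node)
        # Copy the block to new nodes until we are back at full protection.
        while len(nodes) < REPLICATION_FACTOR:
            candidates = [n for n in live_nodes if n not in nodes]
            if not candidates:
                break  # not enough surviving nodes to restore 3 replicas
            nodes.add(random.choice(candidates))
    return placement

if __name__ == "__main__":
    placement = {"blk-1": {"n1", "n2", "n3"}, "blk-2": {"n2", "n3", "n4"}}
    live = ["n1", "n3", "n4", "n5"]
    print(rebuild_after_failure(placement, "n2", live))
```

Note that nothing here requires a distinguished "primary" node or a fixed failover partner, which is exactly why failover time largely disappears as a concept in scale-out designs.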
A subject for a future blog post is a discussion of the control plane versus the data plane in deploying scale-out applications today. I touched on this in my previous post on Kubernetes, but it bears a closer look.
The last blog in this series is D is for DriveScale, where I will describe how DriveScale technology fits into all of this.