Persistent Volumes: Separating Compute and Storage

2020-06-26T14:52:31-07:00

The Kubernetes documentation states that “managing storage is a distinct problem from managing compute instances.” This restates a fundamental tenet of modern data center design: compute and storage are best considered separately for planning and deployment. The logical separation of compute and storage has become increasingly formalized in Kubernetes through subsystems like the Container Storage Interface (CSI). In this article, I argue that the physical separation of compute and storage also leads to improved economics and more efficient operations, which makes it a powerful strategy to employ with Kubernetes.

Computer storage itself is simply a technology. Instead of thinking about storage, we should think about data and its uses. In a perfect world, one would like to store data such that it is:

  • Correct
  • Consistent
  • Available
  • Fast to access
  • Durable
  • Recoverable

Correctness is often assumed, but studies have shown that correctness should not be taken for granted in real-world systems. Consistency is critical in certain applications but is often relaxed for the majority of data center data usage. Availability and performance — the ability to access your data and to retrieve it fast enough — are often the focus of storage system requirements, followed by recoverability. Durability is often subsumed under recoverability, with the assumption that real-world systems eventually fail and a scheme must be implemented for recovering lost data.

Figure: Hard Drive Reliability (Annualized Hard Drive Failure Rates)

While enterprise storage has focused on engineering smaller, highly tuned platforms (appliances; think 10 controllers) that rely on data availability techniques transparent to the application, modern scale-out, data-intensive applications incorporate the storage stack directly into the application and commonly use a triple-replica configuration for availability. Once performance requirements are factored in, the economics of industry-standard (that is, commodity) storage media and enclosures favor the triple-replica approach over engineered storage systems that employ erasure coding. And the scale of these applications' storage deployments (think 1,000 controllers) led to the strategy of implementing rack-failure semantics and the ability to grow performance or capacity simply by adding more compute and storage.
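To make the capacity side of that trade-off concrete, here is a minimal sketch of the raw-storage arithmetic for triple replication versus erasure coding. The 8+4 shard layout and the 100 TB figure are illustrative assumptions, not numbers from any particular system, and the sketch deliberately ignores the performance and rebuild costs that the paragraph above says can tip the economics toward replicas.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping full copies."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return Fraction(replicas, 1)


def erasure_overhead(data_shards: int, parity_shards: int) -> Fraction:
    """Raw bytes stored per byte of user data for a k+m erasure code."""
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("invalid shard counts")
    return Fraction(data_shards + parity_shards, data_shards)


def raw_capacity_tb(usable_tb: int, overhead: Fraction) -> float:
    """Raw capacity needed to hold `usable_tb` of user data."""
    return float(usable_tb * overhead)


if __name__ == "__main__":
    usable_tb = 100  # assumed dataset size, for illustration only
    triple = replication_overhead(3)
    ec_8_4 = erasure_overhead(8, 4)  # assumed 8 data + 4 parity shards
    print("triple replica raw TB:", raw_capacity_tb(usable_tb, triple))
    print("8+4 erasure code raw TB:", raw_capacity_tb(usable_tb, ec_8_4))
```

Erasure coding needs less raw capacity, but reads and rebuilds touch more devices, which is where the performance requirements mentioned above enter the comparison.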

But in some sense this is beside the point, as the more common causes of data unavailability today in the enterprise are planned downtime for application upgrades, network and infrastructure upgrades, and technology refresh.

The biggest lever at the disposal of the data center architect today to improve availability is the separation of compute and storage into separately managed layers, connected via ubiquitous high-speed Ethernet networking. This approach addresses availability and also brings improved economics and potentially improved operability.

The benefits derived from separating compute and storage are several:

  1. Reduce downtime by eliminating rebuild/recovery in case of server “failures”: simply reconnect storage to a new node.
  2. Eliminate SKU proliferation by deploying a smaller number of optimized thin-client compute node configurations.
  3. Reduce per-slot media overhead costs by optimizing the storage layer in density, form factor, and population.
  4. Adopt new technology faster in the compute and storage layers by decoupling their lifecycles.

Kubernetes has evolved to support a flexible storage interface, the Container Storage Interface (CSI), which allows any storage provider to offer service through a well-defined interface. CSI replaces the former in-tree storage provider method, which is now deprecated, and opens up Kubernetes to a wide variety of storage deployment approaches.

CSI defines a dynamic storage model for Kubernetes. In many ways, it encourages vendors to build new interfaces to existing storage solutions so that persistent storage volumes can be allocated dynamically, on demand, under the control of Kubernetes. This has extended dynamic provisioning even to legacy enterprise storage solutions. The CSI abstraction provides a necessary and beneficial separation of compute and storage that supports the physical separation approach described above. CSI also allows newer networked storage approaches, which focus on the networked aspect of “networked storage,” to deliver robust solutions to Kubernetes that are better suited to storage provisioning for web-scale data applications.
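As a rough illustration of that dynamic model, the sketch below builds the two Kubernetes objects typically involved: a StorageClass that names a CSI driver as its provisioner, and a PersistentVolumeClaim that requests a volume from that class. The driver name `csi.example.com`, the object names, and the 100Gi size are hypothetical placeholders; a real driver documents its own provisioner string and parameters.

```python
import json


def storage_class(name: str, provisioner: str) -> dict:
    """StorageClass manifest that delegates provisioning to a CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,  # hypothetical CSI driver name
        "reclaimPolicy": "Delete",
        # Delay binding until a pod is scheduled, so the volume can be
        # created where the consuming pod actually lands.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def volume_claim(name: str, storage_class_name: str, size: str) -> dict:
    """PersistentVolumeClaim asking the class above for a volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


def render(items: list) -> str:
    """Bundle the manifests into a single JSON List document."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": items}, indent=2
    )


if __name__ == "__main__":
    sc = storage_class("fast-networked", "csi.example.com")
    pvc = volume_claim("analytics-data", sc["metadata"]["name"], "100Gi")
    print(render([sc, pvc]))
```

Because the claim refers only to the class name, a pod that mounts it can be rescheduled to another node and the same volume reattached, which is the reconnect-rather-than-rebuild behavior described earlier.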

Kubernetes deployment requirements for storage for web-scale data analytics are still evolving. Kubernetes provides the scale and orchestration capabilities to deploy high-performance data analytic applications. The lightweight container model is much better suited for high-performance scale-out data analytic applications than the more heavyweight traditional VM approach. However, many of these applications were written assuming direct access to local HDD or SSD storage, depending on the particular workload. Approaches exist, like composable infrastructure, to provide networked storage with the performance and behavior of local drives. Composable infrastructure also supports the storage-stack-in-the-application model that many of these applications assume and, when correctly configured and automated, enables rack-failure semantics. The CSI model fully supports such deployments, providing a high-performance solution for these applications on Kubernetes.
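One way to read "rack-failure semantics" is that no single rack holds more than one replica of a piece of data, so losing a rack never removes more than one copy. The sketch below illustrates that reading with a deliberately simple placement function. The rack and node names are made up, and real schedulers and storage stacks weigh many more factors such as capacity and load.

```python
from typing import Dict, List, Tuple

Placement = List[Tuple[str, str]]  # (rack, node) pairs


def place_replicas(nodes_by_rack: Dict[str, List[str]],
                   replicas: int = 3) -> Placement:
    """Pick one node from each of `replicas` distinct racks."""
    # Sort rack names so the choice is deterministic for a given inventory.
    usable = [rack for rack in sorted(nodes_by_rack) if nodes_by_rack[rack]]
    if len(usable) < replicas:
        raise ValueError("not enough racks for rack-level fault isolation")
    return [(rack, nodes_by_rack[rack][0]) for rack in usable[:replicas]]


def survivors(placement: Placement, failed_rack: str) -> Placement:
    """Replicas that remain reachable after an entire rack fails."""
    return [(rack, node) for rack, node in placement if rack != failed_rack]


if __name__ == "__main__":
    inventory = {  # hypothetical inventory
        "rack-a": ["a1", "a2"],
        "rack-b": ["b1"],
        "rack-c": ["c1", "c2"],
        "rack-d": ["d1"],
    }
    placement = place_replicas(inventory)
    for rack in inventory:
        print(rack, "fails ->", len(survivors(placement, rack)), "replicas left")
```

With three replicas on three distinct racks, any single rack failure leaves at least two copies available.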

CSI's support for persistent storage is a recent addition to the Kubernetes ecosystem. It is already leading to a number of solutions for networked storage for web-scale applications. This will be an interesting space to watch in the coming year.

Original content published on The New Stack Blog found here.

About the Author:

Brian Pawlowski has years of experience in building technologies and leading teams in high-growth environments at global technology companies. He is currently the CTO at DriveScale. Previously, Brian served as Vice President and Chief Architect at Pure Storage, focused on improving the user experience for the all-flash storage platform provider’s rapidly growing customer base. As CTO at storage pioneer NetApp, Brian led the first SAN product effort and founded NetApp labs to collaborate with universities and research centers on new directions in data center architectures.
