As I mentioned in the previous post, Seagate and DriveScale solutions for optimizing your data center architecture exist today.
Composable Infrastructure is a technique for implementing the optimized disaggregated data center. It is conceptually very simple: define a set of discrete optimized building blocks, starting with thin compute nodes, dense fast storage such as the Seagate® Exos® AP series, and a high-speed fabric to connect the pieces.
An orchestration platform, such as ours, is critical for scaling the disaggregated data center. In production deployments with tens of thousands of managed storage, compute, and GPU elements, policy-based automation brings the sprawl under control.
The orchestration and automation framework DriveScale built is designed to allow future technologies to be brought under a common composable infrastructure platform. That journey has already begun: the DriveScale platform was extended to include NVMe® over both RoCEv2 (RDMA) and TCP as they became available. In the DriveScale model, the engine that drives configuration management and change uses built-in rules and user-supplied templates to implement policies. One of the cool features of the DriveScale design is that if the endpoints for a server and storage pair that previously supported only iSCSI were upgraded, in a software refresh cycle, to support NVMe over TCP, the storage binding to the compute server would change transparently to the new protocol. The templates provide a higher-level abstraction of the assembly of disaggregated components, pushing details down into the automation engine, so software upgrades “simply do the right thing.” Of course, the administrator retains fine-grained control over the templates, especially when using the orchestration API directly, and can override, say, the automatic negotiation of the storage transport protocol.
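The transport-negotiation behavior described above can be pictured with a small sketch. This is purely illustrative: DriveScale's actual engine and APIs are proprietary, and the `Endpoint` type, `negotiate_transport` function, and preference order are hypothetical stand-ins for the idea, not the product's interface.

```python
from dataclasses import dataclass

# Preferred transports, best first; mirrors the iSCSI-to-NVMe/TCP upgrade path
# described in the text. Purely illustrative, not DriveScale's real rule set.
TRANSPORT_PREFERENCE = ["nvme-tcp", "iscsi"]

@dataclass
class Endpoint:
    name: str
    transports: set  # protocols this endpoint currently supports

def negotiate_transport(server, storage, override=None):
    """Pick the best transport both ends support, unless the template overrides it."""
    if override:
        return override
    for proto in TRANSPORT_PREFERENCE:
        if proto in server.transports and proto in storage.transports:
            return proto
    raise RuntimeError("no common transport between endpoints")

# Before the software refresh: both ends speak only iSCSI.
server = Endpoint("node-17", {"iscsi"})
jbod = Endpoint("exos-04", {"iscsi"})
assert negotiate_transport(server, jbod) == "iscsi"

# After the refresh adds NVMe/TCP on both ends, the binding
# upgrades to the preferred protocol with no template change.
server.transports.add("nvme-tcp")
jbod.transports.add("nvme-tcp")
assert negotiate_transport(server, jbod) == "nvme-tcp"

# An administrator override pins the transport regardless of preference.
assert negotiate_transport(server, jbod, override="iscsi") == "iscsi"
```

The point of the sketch is the shape of the policy: the template states intent (bind this server to this storage), while the engine re-derives the best binding whenever endpoint capabilities change.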
The DriveScale orchestration platform was built to evolve gracefully as new technologies become available, and it hides details of the underlying data center implementation to whatever degree the administrator would like. DriveScale’s goal is that the solution “simply works,” letting the administrator focus on application deployment rather than the details of the disaggregated infrastructure. Additionally, DriveScale adds strong security features to ensure that storage can be accessed only from the appropriate servers.
Today we have high-speed standard fabrics based on Ethernet with RoCEv2 and, increasingly, NVMe over TCP. NVMe over Fabrics adoption is accelerating and supplanting iSCSI, the previous open storage protocol. The partnership with Seagate provides the storage platforms that support these new fabrics. Looking toward a common fabric future, Seagate is pushing to deliver NVMe-based HDDs. While NVMe initially evolved to deliver the full performance potential of solid-state storage by providing parallel queues and deeper buffering, even HDDs gain a performance improvement from an end-to-end NVMe data transport.
But in large-scale production deployments, hardware and software refresh cycles happen rack by rack. No one replaces or updates 5,000 compute nodes in a single upgrade. The DriveScale Composer fully supports heterogeneous scale-out application clusters in order to implement the rolling, graceful upgrades that are a key part of application lifecycle management. This is critical to enable the rapid adoption of new technologies as they become available – all while supporting the unmodified part of your data center infrastructure to eliminate application disruption.
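The rack-by-rack pattern above can be sketched in a few lines. This is a generic illustration of a rolling upgrade, not DriveScale tooling; the `drain`, `upgrade`, and `healthy` hooks are placeholders for whatever lifecycle actions a real cluster performs.

```python
def rolling_upgrade(racks, drain, upgrade, healthy):
    """Upgrade one rack at a time so the rest of the cluster keeps serving traffic."""
    done = []
    for rack in racks:
        drain(rack)            # move workloads off this rack only
        upgrade(rack)          # refresh hardware/software for this rack
        if not healthy(rack):  # verify before touching the next rack
            raise RuntimeError(f"rack {rack} failed post-upgrade checks; halting rollout")
        done.append(rack)
    return done

# Toy run with stub actions: every rack drains, upgrades, and passes its check.
log = []
result = rolling_upgrade(
    racks=["rack-1", "rack-2", "rack-3"],
    drain=lambda r: log.append(("drain", r)),
    upgrade=lambda r: log.append(("upgrade", r)),
    healthy=lambda r: True,
)
assert result == ["rack-1", "rack-2", "rack-3"]
```

The key property is that at any point mid-rollout the cluster is heterogeneous, old and new racks serving side by side, which is exactly why the orchestration layer must support mixed-generation clusters rather than assuming a uniform fleet.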
It is not only transport and fabric technology evolution that is influencing the direction of the disaggregated data center. The container management platform Kubernetes is seeing increasingly widespread deployment. One way to view Kubernetes and DriveScale’s solution together: Kubernetes provides the logical platform for application deployment via containers (pods), while DriveScale manages the physical layer by providing dynamic persistent storage provisioning through the CSI provider interface. As part of the work to support Kubernetes, DriveScale’s Composer platform is now “pod aware”: storage bindings in a Kubernetes environment are shown as pod-to-persistent-storage relationships. Kubernetes simply sees a standard CSI dynamic persistent storage provider, while DriveScale automatically allocates storage on demand to pods, all while enforcing administrator-defined policies for any particular application. Of course, the Seagate Exos storage solutions drop right into a DriveScale-managed Kubernetes deployment and are ready to go – quickly and simply.
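From the Kubernetes user's side, CSI dynamic provisioning looks like any other StorageClass plus PersistentVolumeClaim pair. The sketch below shows the two objects as Python dicts so the relationship is easy to assert on; the driver name `csi.drivescale.example` and class name `drivescale-fast` are hypothetical placeholders, not the actual DriveScale driver strings.

```python
# A StorageClass names the CSI driver that will provision volumes on demand.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "drivescale-fast"},
    "provisioner": "csi.drivescale.example",  # hypothetical CSI driver name
}

# A PVC referencing that class triggers dynamic provisioning: the CSI driver
# allocates physical storage and binds it for the pod that mounts the claim.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "drivescale-fast",
        "resources": {"requests": {"storage": "100Gi"}},
    },
}

# Kubernetes matches the claim to the class by name; everything below that
# line (fabric, transport, physical disks) is the provider's concern.
assert pvc["spec"]["storageClassName"] == storage_class["metadata"]["name"]
```

This division of labor is the point of the CSI boundary: the application manifest never mentions fabrics or protocols, so the physical layer can evolve underneath it.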
This is important because new technologies and disaggregated elements are appearing in the data center. Managing disaggregated GPUs for critical applications such as machine learning and deep learning is very much ground under construction, with new deployment configurations becoming possible in the future. DriveScale is working on approaches to GPU disaggregation that tackle the problems of performance and GPU utilization in a shared infrastructure environment.
DriveScale is already partnering with NVIDIA (Mellanox) on enabling NVMe fabric SmartNICs (BlueField 1 and 2) in the disaggregated data center. SmartNICs offload storage functions such as compression and encryption, and they provide a stronger security model in a shared compute environment by cleanly isolating the fabric control plane from the application execution space.
As Mohamad Elbatal described in the joint webinar, longer-term memory disaggregation is coming. Standards are emerging, such as Compute Express Link (CXL) and Gen-Z, each with its own strengths and weaknesses. DriveScale’s view is that our Composer platform architecture can support CXL and Gen-Z with a modicum of effort, and memory disaggregation will leverage existing logic and policy capabilities to enable these new disaggregated elements in the future.
In a somewhat different category are InfiniBand and PCI Express networking. These physical interconnect architectures provide some performance benefits compared to RDMA over Ethernet, but at a cost. PCI Express networks do not offer high availability via redundant pathing, and current switches have very limited port and lane counts. The latter limitation is likely not an issue for many PCIe deployments, but high-end GPUs from NVIDIA are 16-lane capable and will likely remain in the compute node rather than on a PCI Express network if maximum performance is to be achieved.
Perhaps the merger of NVIDIA and Mellanox will lead to the development of “network-native” GPUs, which would be much friendlier toward composition. The raw bandwidth of Ethernet, now available up to 400Gb/s, is certainly able to support this, but many issues, particularly in software support, would need to be addressed to make it happen.
Seagate and DriveScale are partnering not only to deliver production solutions at scale today, but also to provide new and exciting capabilities in the future.