DriveScale and the NVIDIA BlueField-2 DPU

October 5, 2020

DriveScale provides Composable Infrastructure orchestration to enable the disaggregated data center. By dynamically configuring (and reconfiguring) servers with any storage layout on demand, composable infrastructure marries the flexibility and scale of networked storage with the performance and cost-efficiency of local drives, and eliminates the need to lock in a configuration at purchase time. The DriveScale approach lets you right-size CPU and storage, connected over readily available high-speed fabrics, for large-scale data-intensive applications, and manage their technology-refresh life cycles separately, leading to improved utilization and a significantly reduced total cost of ownership.

Configuring servers for the data center typically means buying a bare-metal x86 computer with some storage and attaching it to the network. As data analytics and machine learning have become primary data center applications, today’s network is increasingly your data fabric, and the clear protocol choice for that fabric is NVMe. Most people have heard of NVMe SSDs and their blazing speed, but NVMe itself is just a protocol standard layered on top of PCI Express. It lets any SSD work with any operating system as long as both support NVMe, and every major operating system now supports local NVMe access. NVMe-over-Fabrics (NVMe-oF) carries the NVMe protocol across Ethernet, taking its capabilities to a whole new level with extremely fast remote storage — but enabling NVMe-oF requires cooperation from the host OS.
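To get a sense of what that host-side cooperation involves, here is a rough sketch of attaching a stock Linux host to an NVMe/TCP target with the standard nvme-cli tool. The address, port, and NQN below are placeholders, and the exact steps vary by distribution:

```shell
# Load the NVMe/TCP transport module (nvme-rdma for RoCE fabrics)
modprobe nvme-tcp

# Discover the subsystems a target exports (address/port are placeholders)
nvme discover -t tcp -a 192.0.2.10 -s 4420

# Connect to a discovered subsystem by its NQN
nvme connect -t tcp -a 192.0.2.10 -s 4420 \
    -n nqn.2020-10.com.example:remote-flash

# The remote namespace now appears as a block device on this host
nvme list
```

Every one of these steps depends on the host OS shipping an NVMe-oF initiator — which is exactly the dependency the DPU approach described below removes.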

But data center architects were hampered in their adoption of NVMe-oF by uneven support across CPU platforms and operating systems. Until now.

Enter NVIDIA.

The NVIDIA BlueField-2 data processing unit (DPU) is the world’s most advanced DPU, combining the networking power of the ConnectX-6 Dx SmartNIC with programmable Arm cores plus offloads for traffic inspection, storage virtualization, and security isolation. The BlueField-2 DPU frees up CPU cycles for applications while enhancing security, efficiency, and manageability of every host. In the traditional model, legacy appliances or the CPU would run data center services. However, in the hardware-accelerated, BlueField-enabled server, these services are offloaded to the DPU, freeing the CPU to run applications. At the same time, the BlueField DPU provides critical management and security functions to every server. The DPU enables secure and performant accelerated computing at data center scale.

Enter DriveScale.

DriveScale turns commodity physical infrastructure into programmable components that are easy to deploy and adapt and gives IT the flexibility and scale of networked storage with the performance and cost-efficiencies of local drives. Previously this was limited to Linux server platforms. However, with the NVIDIA BlueField-2 DPU, DriveScale now supports Windows and VMware application hosts and other operating systems or hypervisors for data-intensive applications and machine learning.

Of course, the NVIDIA BlueField-2 DPU provides accelerated networking capabilities for the host and also has a feature known as SNAP™ (software-defined network accelerated processing), which allows the DPU to emulate storage devices, so that remotely connected storage can be presented to the CPU as local storage devices. With SNAP, the complexities of NVMe-oF are completely hidden from the host, which sees simpler NVMe (or Virtio) devices.
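By way of illustration, on a SNAP-enabled host the emulated device should simply enumerate as a local PCIe NVMe device, with no fabric configuration on the host at all. The device names below are illustrative:

```shell
# The host needs no NVMe-oF setup; the DPU-emulated device shows up
# alongside any physically local NVMe drives.
lspci | grep -i 'non-volatile'   # an NVMe controller on the PCIe bus
nvme list                        # e.g. /dev/nvme0n1, actually remote storage
lsblk /dev/nvme0n1               # usable like any local block device
```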

DriveScale’s software agent runs on the NVIDIA BlueField-2 DPU, fully managing the SNAP configuration to set up the desired NVMe/RDMA or NVMe/TCP data fabric. This enables the automated allocation of drives or slices of drives to the host application — allowing virtual resources to be used as easily as physical ones. Using DriveScale’s API or GUI, users can set parameters for the application’s infrastructure — and compute, storage, and network resources are automatically connected and ready for application deployment. Additions or changes to these resources are implemented on-the-fly.
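As a hypothetical sketch of what driving that API might look like, the following builds the kind of composition request a tool could send to the Composer. The function, endpoint-free payload, and field names are invented for illustration and do not reflect DriveScale’s actual API schema:

```python
# Hypothetical sketch: build a request asking an orchestrator (here, the
# DriveScale Composer) to attach networked storage to a DPU-equipped host.
# All field names are illustrative assumptions, not the real API.

def make_composition_request(host, capacity_gb, transport="nvme-tcp",
                             raid_level=None, encrypted=False):
    """Build a storage-composition request for a given application host."""
    if transport not in ("nvme-tcp", "nvme-rdma"):
        raise ValueError("unsupported transport: " + transport)
    request = {
        "host": host,                # the DPU-equipped application server
        "capacity_gb": capacity_gb,  # size of the volume or drive slice
        "transport": transport,      # fabric used between DPU and target
        "encrypted": encrypted,      # offloadable to BlueField-2 hardware
    }
    if raid_level is not None:
        request["raid_level"] = raid_level
    return request

req = make_composition_request("analytics-node-07", 4096,
                               transport="nvme-rdma", raid_level=10)
```

The point of the sketch is the shape of the workflow: the caller names an application host and a policy (capacity, transport, protection, encryption), and the orchestrator wires up the resources.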

Referring to the diagram above, the host applications speak NVMe to the PCI Express bus, and the BlueField-2 DPU translates these local accesses to NVMe-oF on the network. Without any host involvement, the NVIDIA BlueField-2 DPU makes NVMe fabric devices appear as local NVMe SSDs. DriveScale’s server agent runs on the BlueField-2 DPU to allow dynamic composition of networked storage to the DPU. Moving the NVMe fabric control plane from the compute server to the DPU allows a more transparent, universal solution for Composable Infrastructure that can support more use cases.

The DriveScale agent operates only on the control plane, binding networked NVMe storage and configuring the BlueField-2 DPU to translate it into local drive access. No software is inserted in the data path; it is pure NVMe end-to-end, ensuring the highest possible performance and platform compatibility.

While this post focuses on enabling NVMe fabrics (over TCP or RoCE), the DriveScale/NVIDIA solution also supports the iSCSI protocol, so legacy storage infrastructure can be integrated into the disaggregated data center.

The BlueField-2 DPU offloads protocol and network translation overhead from the compute host, allowing the compute server (the application server) to focus its CPU cycles on applications. But deploying the BlueField-2 DPU also addresses other critical problems in the data center.

The BlueField-2 DPU provides hardware offload of encryption and RAID capabilities, and DriveScale is fully capable of exploiting these capabilities in managing networked storage. The value DriveScale brings here is the ability to define policies for data management (such as RAID protection levels and encryption) on an application-by-application basis, push those policies into the fabric, and manage them at data center scale. The combination of the NVIDIA BlueField-2 DPU and DriveScale’s policy-driven orchestration solution is unparalleled in the industry.

The DriveScale agents manage the nitty-gritty details within the DPU, but it is the DriveScale Composer that keeps track of the big picture of the data center as a whole and decides which storage resources to connect to which DPUs. The Composer discovers all necessary information automatically (from the agents) and manages the constraints imposed by different types of storage, different types of servers, and “zones” of bandwidth and reliability. All communication with agents is cryptographically secure, and no connections between resource endpoints can be made without the Composer’s permission. The DriveScale Composer scales to tens of thousands of managed devices in a single deployment. And like any good orchestrator, the Composer is 100% API-driven, for easy integration into other data center systems.

Together, the DriveScale and NVIDIA solution enables the disaggregated data center by presenting networked storage over an NVMe fabric as a local NVMe drive that can be instantly sliced, expanded, and replaced, providing flash storage virtualization and elasticity for bare-metal clouds, big data analytics, and machine learning use cases — all with the highest levels of performance, security, and manageability.

About the Author:

Brian Pawlowski has years of experience in building technologies and leading teams in high-growth environments at global technology companies. He is currently the CTO at DriveScale. Previously, Brian served as Vice President and Chief Architect at Pure Storage, focused on improving the user experience for the all-flash storage platform provider’s rapidly growing customer base. As CTO at storage pioneer NetApp, Brian led the first SAN product effort and founded NetApp Labs to collaborate with universities and research centers on new directions in data center architecture.
