Part 7: The 2010s and into 2020
The 2010s saw the explosion of big data and other scale-out systems using vast amounts of Direct Attached Storage (DAS) instead of the more manageable, but more expensive, SAN and NAS approaches. In the mid-2010s we started DriveScale, where we provide Composable Infrastructure: creating servers with any storage configuration on demand, not at purchase time. Until now we’ve done it without SmartNICs, but a new type of SmartNIC lets us do our job without any host involvement, by emulating NVMe devices.
Most people have heard of NVMe SSDs and their blazing speed. But NVMe itself is just a protocol standard that layers on top of PCI Express. It allows any SSD to work with any operating system as long as they both support NVMe. And every operating system supports NVMe now.
NVMe-over-Fabrics (NVMe-oF) takes the NVMe protocol and puts it onto a network fabric such as Ethernet, allowing crazy fast remote storage. But NVMe-oF needs cooperation from the host OS. Linux has great support, while both VMware and Windows have been lagging. The new SmartNICs, however, can speak NVMe on the PCI Express bus while translating to NVMe-oF for the network. That allows a more universal solution that supports more use cases and offloads protocol overhead at the same time.
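To make the translation idea concrete, here is a toy Python sketch of the arrangement described above: the host talks to what looks like a local NVMe device, and the SmartNIC forwards each command to a remote target. Everything in it is an assumption made for illustration. The class names (RemoteTarget, EmulatedNvmeDevice), the dictionary "capsule" format, and the 512-byte block size are hypothetical, and nothing here reflects a real NVMe or NVMe-oF wire format or any vendor's implementation.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 512  # assumed block size, for illustration only


@dataclass
class RemoteTarget:
    """Hypothetical stand-in for an NVMe-oF target reachable over the network."""
    blocks: dict = field(default_factory=dict)

    def handle_capsule(self, capsule: dict) -> dict:
        # A real target parses binary command capsules; this toy uses a dict.
        if capsule["opcode"] == "write":
            self.blocks[capsule["lba"]] = capsule["data"]
            return {"status": "ok"}
        if capsule["opcode"] == "read":
            # Unwritten blocks read back as zeros in this sketch.
            data = self.blocks.get(capsule["lba"], b"\x00" * BLOCK_SIZE)
            return {"status": "ok", "data": data}
        return {"status": "invalid opcode"}


class EmulatedNvmeDevice:
    """Hypothetical SmartNIC front end: looks local to the host, forwards remotely."""

    def __init__(self, target: RemoteTarget):
        self.target = target
        self.forwarded = 0  # count of commands sent on to the remote target

    def submit(self, opcode: str, lba: int, data: bytes = None) -> dict:
        # The host submits an ordinary-looking command; the translation to a
        # network request happens here, with no host OS involvement.
        capsule = {"opcode": opcode, "lba": lba, "data": data}
        self.forwarded += 1
        return self.target.handle_capsule(capsule)


device = EmulatedNvmeDevice(RemoteTarget())
payload = b"A" * BLOCK_SIZE
write_reply = device.submit("write", lba=7, data=payload)
read_reply = device.submit("read", lba=7)
```

The point of the sketch is the division of labor: the host-facing submit() call knows nothing about the network, which is why an operating system with plain NVMe support needs no NVMe-oF driver in this model.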
Have I mentioned the cloud? Hyperscaler spending is what is driving all hardware architecture now. SmartNICs have been in use in the cloud for about 5 years, but each hyperscale operator tends to have its own idea of what the SmartNICs should do. Microsoft’s Azure uses FPGA-based SmartNICs to offload network functions and accelerate compute functions. Most SmartNICs today are only doing network offloads (those precious x86 cycles can be sold for real money).
Amazon AWS bought the chip firm Annapurna Labs about 5 years ago and has been really pushing the meaning of SmartNIC. The AWS Nitro chips now provide both network and storage connectivity to x86 machines. The storage uses the NVMe emulation approach. But a really compelling feature that many overlook is that the Nitro chips are providing the security services for the system as well. Only the Nitro chips, not the x86, can write any firmware state. The Nitro chips provide wire speed encryption for both network and storage, and are also involved with all secure boot operations.
In the age of CPU vulnerabilities like Spectre and Meltdown, it becomes troubling to rely on any CPU-based separation of user code and operator code. Letting the user have the whole x86 (bare metal or VMs) and keeping the operations code on a SmartNIC is a much better way to ease the operator’s worries about attacks.
But wait! What about my old dislike for slow network controllers? Well, it turns out that silicon CPUs have hit a frequency “wall” and cores are just not getting faster anymore. And the cores inside of a SmartNIC (typically ARM64) can be just as fast as the cores inside of an x86 processor, though with less access to big caches and memory. So offloads and front-end processing are much more viable, and true acceleration can come with more exotic hardware embedded into the SmartNICs. In my early testing of SmartNICs, I was using cheap x86 servers which were substantially slower than the SmartNIC!
DriveScale’s storage provisioning is working on SmartNICs today, scaling to thousands of servers, and we plan to do a lot more to help bare metal operators, offering public or private services. So check us out and stay in touch!