New AI/ML and Genomics Benchmarks Coming

Ken Cantrell and Nick Principe presented two workloads that the SPEC Storage Committee is developing and considering for the SPEC Storage 2020 Benchmark: Genomics and AI/ML. The proposal reflects how the landscape of data-intensive applications is changing. While the Genomics workload may appeal to a smaller audience, the AI/ML workload will interest a much larger number of data center architects and platform developers.

SPEC Storage Benchmarks go back almost to the beginning of the SPEC organization. Started in 1988 by a small number of vendors who saw a need for realistic, standardized performance tests, the Standard Performance Evaluation Corporation (SPEC) has grown to more than 60 member companies and has an established track record of developing insightful vendor-neutral benchmarks.

SPEC's storage benchmarks trace their lineage to the nhfsstone benchmark from Legato Systems Inc. (the nhfsstone man page is a blast from the past). nhfsstone evolved into the LADDIS benchmark, which I was involved with, and LADDIS became the basis for the first SPEC storage benchmark. These early benchmarks were NFS-only load generators that modeled a software development workload.

All these benchmarks are synthetic. Each consists of a load-generating engine that statistically generates I/O operations calibrated against real application workload traces, so that the generated load resembles the original workload. Calibration involves capturing detailed traces of real applications that represent the workload of interest and transforming those traces into a simulation of that workload. Much of the work in the current SPEC Storage benchmark has gone into simulating complex workloads more closely. The engine, known as netmist, was primarily developed by Don Capps. Not only does it model complex workloads more accurately, but the current benchmark is also no longer NFS-specific: it can compare any number of solutions head-to-head, regardless of their underlying storage architecture.
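To make the calibration idea concrete, here is a minimal Python sketch under heavily simplified assumptions. It reduces a trace to an operation mix (the fraction of each operation type) and then samples synthetic operations with those same frequencies. This is not how netmist is implemented; the trace contents, operation names, and the functions `calibrate_op_mix` and `generate_ops` are hypothetical, and a real workload model also has to capture file sizes, access patterns, and request rates.

```python
import random
from collections import Counter

# Hypothetical trace: operation names observed while tracing a real application.
TRACE = ["read", "read", "write", "getattr", "read",
         "lookup", "write", "read", "getattr", "read"]


def calibrate_op_mix(trace):
    """Return each operation's share of the trace (the 'op mix')."""
    counts = Counter(trace)
    total = len(trace)
    return {op: n / total for op, n in counts.items()}


def generate_ops(op_mix, n, seed=None):
    """Draw n synthetic operations whose frequencies follow op_mix."""
    rng = random.Random(seed)  # seeded for reproducible runs
    ops = list(op_mix)
    weights = [op_mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)


if __name__ == "__main__":
    mix = calibrate_op_mix(TRACE)
    synthetic = generate_ops(mix, 10_000, seed=42)
    observed = calibrate_op_mix(synthetic)
    # Compare the target mix from the trace with the generated mix.
    for op in sorted(mix):
        print(f"{op:8s} target={mix[op]:.2f} generated={observed.get(op, 0.0):.3f}")
```

Sampling from the calibrated distribution, rather than replaying the trace verbatim, lets a load generator scale the same workload shape to any number of operations and clients; the trade-off is that ordering and timing correlations in the original trace are lost unless they are modeled explicitly.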

The proposed SPEC Storage 2020 Benchmark release would add two new workloads, Genomics and AI/ML. AI/ML covers the rapidly growing field of machine learning, which businesses are using to build sophisticated analytic applications. Applications in this space demand both compute performance and I/O performance, and they represent a new class of data-intensive applications that DriveScale is crafting solutions for.

The slides and video from the SNIA SDC session previewing the proposed workloads are useful for anyone who wants to know more about this exciting development at SPEC. Please take the time to review them.

Now, I need to get back to editing the benchmark documentation.

About the Author:

Brian Pawlowski has years of experience building technologies and leading teams in high-growth environments at global technology companies. He is currently the CTO at DriveScale. Previously, Brian served as Vice President and Chief Architect at Pure Storage, focused on improving the user experience for the all-flash storage platform provider's rapidly growing customer base. As CTO at storage pioneer NetApp, Brian led the first SAN product effort and founded NetApp Labs to collaborate with universities and research centers on new directions in data center architectures.
