Deep Learning with DriveScale

August 17, 2018

I had the privilege of attending the Dell EMC HPC Community meeting prior to the Supercomputing ’17 conference. There was a lot of discussion of Deep Learning as a rapidly growing workload of concern to almost every enterprise. The meeting was topped off by the announcement of the Dell EMC PowerEdge C4140, a server designed specifically to support very dense GPU-accelerated computing.

Deep learning requires incredible amounts of compute horsepower, even for the “inference” stage that applications actually use. During the design, refinement, and training of the neural network models, however, huge amounts of data are needed in addition to enormous amounts of compute. Training on fewer than a million samples typically does not yield good accuracy, and developing a model may take thousands of training iterations before a good one is found. The result is huge amounts of data transfer across large clusters of GPU-accelerated servers.
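
To make the scale of that data movement concrete, here is a rough back-of-envelope sketch in Python. Every number in it (sample size, epoch count, number of experiments) is an illustrative assumption, not a measurement from any particular cluster.

```python
# Rough, illustrative estimate of the data read while developing a model.
# All numbers below are assumptions for the sake of the arithmetic.

samples = 1_000_000       # ~1M training samples, the rough floor mentioned above
sample_bytes = 100_000    # assume ~100 KB per sample (e.g., a preprocessed image)
epochs = 50               # passes over the dataset per training run
experiments = 1_000       # model/hyperparameter iterations during development

bytes_per_run = samples * sample_bytes * epochs
total_bytes = bytes_per_run * experiments

print(f"per training run: {bytes_per_run / 1e12:.1f} TB read")
print(f"across {experiments} experiments: {total_bytes / 1e15:.1f} PB read")
```

Even with modest assumptions, the cluster ends up re-reading the same dataset over and over, to the tune of petabytes, which is exactly an aggregate-bandwidth problem.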

This data workload is actually quite similar to that of large-scale data analytics, so most large deep learning shops use HDFS, the Hadoop Distributed File System, which can deliver the aggregate performance needed. HDFS runs on servers with Direct-Attached Storage (DAS), which offers far better cost and far more scalable performance than any traditional SAN or NAS storage solution.
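
As a sketch of what that access pattern looks like from a training job, the snippet below streams files out of HDFS with pyarrow (which requires libhdfs and a reachable NameNode). The host name, port, and path are placeholders, and the decode step is left as a comment.

```python
# Minimal sketch: stream training files out of HDFS from a (GPU) worker node.
# Requires pyarrow built with HDFS support and libhdfs available on the node.
# "namenode", 8020, and "/data/train" are placeholder values.
from pyarrow import fs

hdfs = fs.HadoopFileSystem(host="namenode", port=8020)

# List every file under a (hypothetical) training-data directory.
selector = fs.FileSelector("/data/train", recursive=True)
files = [f for f in hdfs.get_file_info(selector) if f.type == fs.FileType.File]

for info in files:
    with hdfs.open_input_stream(info.path) as stream:
        payload = stream.read()  # each read is served from a DataNode's DAS
        # ...decode payload into tensors and hand the batch to the GPU...
```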

But now the cluster architects face a conundrum: should they buy servers built for dense GPU computing, or servers with lots of DAS storage to support HDFS? Getting both in one package is really no longer possible. So many users are stuck buying *both* types of servers, even though the CPUs in those servers are little more than “babysitters” for the GPUs or the storage. And separating the compute nodes from the storage nodes results in network inefficiencies and scaling problems.

Fortunately, DriveScale’s Software Composable Infrastructure solves the conundrum. Using the C4140 or similar servers to host dense GPUs, logical servers can be composed with commodity JBOD-based storage, resulting in servers that are good at both GPU computing and HDFS. This reduces the server count, reduces the number of server types that must be maintained, and makes much better use of the CPUs in those servers.
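
The key property that makes this work is that composed drives appear to the operating system as ordinary local block devices, so Hadoop’s DataNode can be pointed at filesystems on them (via dfs.datanode.data.dir) just like physical DAS. The sketch below is purely an illustration of that idea: it lists the block devices Linux exposes under /sys/block, the way an administrator might sanity-check a logical server. It uses only generic Linux interfaces, not any DriveScale-specific API.

```python
# Illustration only: enumerate the block devices the OS sees on a composed
# logical server, before formatting them and pointing HDFS's
# dfs.datanode.data.dir at their mount points.
# Uses generic Linux sysfs interfaces, not any DriveScale API.
import os

SECTOR_BYTES = 512  # /sys/block/<dev>/size is reported in 512-byte sectors

for dev in sorted(os.listdir("/sys/block")):
    try:
        with open(f"/sys/block/{dev}/size") as f:
            sectors = int(f.read().strip())
    except OSError:
        continue
    print(f"{dev}: {sectors * SECTOR_BYTES / 1e9:.1f} GB")
```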

As network fabrics mature, we’ll start to see the GPUs themselves disaggregated from the servers, making both the servers and the GPUs denser and more efficient. DriveScale is already working with hardware partners to include GPUs in future composability solutions.

Deep Learning needs DriveScale!

About the Author:

Tom Lyon is a computing systems architect, a serial entrepreneur and a kernel hacker. Prior to founding DriveScale, Tom was founder and Chief Scientist of Nuova Systems, a start-up that led a new architectural approach to systems and networking. Nuova was acquired in 2008 by Cisco, whose highly successful UCS servers and Nexus switches are based on Nuova’s technology. He was also founder and CTO of two other technology companies. Netillion, Inc. was an early promoter of memory-over-network technology. At Ipsilon Networks, Tom invented IP Switching. Ipsilon was acquired by Nokia and provided the IP routing technology for many mobile network backbones. As employee #8 at Sun Microsystems, Tom was there from the beginning, where he contributed to the UNIX kernel, created the SunLink product family, and was one of the NFS and SPARC architects. He started his Silicon Valley career at Amdahl Corp., where he was a software architect responsible for creating Amdahl’s UNIX for mainframes technology. Tom holds numerous U.S. patents in system interconnects, memory systems, and storage. He received a B.S. in Electrical Engineering and Computer Science from Princeton University.
