A classic misdirection in almost any problematic, unforeseen situation is to focus attention on something unrelated and suggest that the issue exists somewhere else. That is to say, make it appear as if your particular area of responsibility is not where the fault lies, but that it rightfully belongs elsewhere.
In the traditional storage world, appliances that serve up data have been designed to withstand random failures in various parts of the system, such that no single issue results in catastrophic failure. This level of resiliency has been expected of these systems for a very long time, and many a 'mission critical' application relies on such hardened machines, where it is almost impossible to lose data or suffer inadequate performance.
However, the world of storage has changed in a significant manner. Today, many critical storage systems are being built as software appliances residing on commodity servers with internal disk drives. All of the so-called hyper-converged systems are designed in this manner. Resiliency is no longer bound to a single appliance but is spread across several servers, usually requiring a minimum of three such systems. Copies of data are spread across all three systems such that, in the event of the failure of a single server, the application still has access to the data on disk drives residing in the other servers.
But several questions arise from such an architecture. First and foremost is the issue of recovery. How does one get back to the highest level of resiliency, with the minimum number of copies of each block or file, if only two of these storage systems are alive? The answer, of course, is to purchase that redundancy up front in the form of spare systems that can be brought into service in the event of a failure. Such additional system investments are not required in the traditional storage world.
Secondly, once the redundant system is brought online, copies of the data must be transferred to it. This equates to network traffic. And that is where the 'misdirection' happens. The vendors of hyper-converged solutions will tell you that the network is the bottleneck and that their software is designed to be fully resilient. However, by shifting the area of responsibility away from themselves, they are doing customers a disservice, because this issue is in fact a result of their system design choices from the start.
Copying a single drive's 6TB worth of data across a 10Gb Ethernet network requires about 1.3 hours, and that assumes you can use the entire bandwidth of the network and are not sending or receiving any other traffic on it. One can extrapolate and calculate how much bandwidth and time will be consumed if you have to copy tens of TBs of data across the network. The vendors of these hyper-converged systems will never tell you about this.
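The arithmetic behind that 1.3-hour figure is simple enough to sketch. The snippet below is an illustration, not vendor tooling; the `efficiency` parameter is our own assumption for modeling a link that also carries other traffic:

```python
def transfer_time_hours(capacity_tb, link_gbps, efficiency=1.0):
    """Estimate the time to copy data across a network link.

    capacity_tb -- data to copy, in terabytes (decimal: 1 TB = 10**12 bytes)
    link_gbps   -- raw link speed in gigabits per second
    efficiency  -- fraction of raw bandwidth actually usable
                   (1.0 matches the idealized assumption in the text)
    """
    bits_to_copy = capacity_tb * 10**12 * 8            # bytes -> bits
    seconds = bits_to_copy / (link_gbps * 10**9 * efficiency)
    return seconds / 3600

# The 6TB drive over a dedicated 10Gb Ethernet link, as in the text:
print(f"{transfer_time_hours(6, 10):.1f} hours")       # -> 1.3 hours

# A 60TB rebuild over the same link, at a (hypothetical) 70% efficiency:
print(f"{transfer_time_hours(60, 10, efficiency=0.7):.1f} hours")
```

At ten times the data and anything less than full line rate, the rebuild stretches toward a full day, which is the extrapolation the text invites.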
Dedicated storage systems never have this problem, as they are designed with dual controllers at a minimum, ensuring significantly higher levels of resiliency. With DriveScale, you can achieve similar levels of high availability and resiliency as traditional storage systems, but at the cost points of commodity servers and drives. And you can build hyper-convergence on top of the DriveScale solution. Come and talk to us about it.