Future cloud storage demands more than NAND flash can deliver

This article is part of the Technology Insight series, made possible with funding from Intel.

As global data volumes continue to rise, the resulting increases in storage traffic create critical infrastructure bottlenecks. Mass-scale block storage providers — organizations from big enterprise data centers to cloud service providers (CSPs) and content delivery networks (CDNs) — must find solutions for handling this data inundation.

Conventional approaches to the problem use NAND solid state drives (SSDs) to buffer data and help keep network pipelines just within their bandwidth capacities. With modern changes in network and PCI Express bandwidth, though, NAND is failing to keep pace. That means applications and services struggle to meet end-user expectations and organizational ROI objectives.

What’s needed is a new approach and technology that does not rely on overprovisioning — a popular but expensive way to increase storage performance and endurance in modern “disaggregated” environments.

Here’s a brief look at traditional approaches, and what enterprises and providers must do to position for tomorrow’s cloud storage demands.

Key Points:

  • As data volumes continue to explode, datacenters must increase bandwidth across conduits such as Ethernet fabrics and PCI Express (PCIe) channels. Conventional NAND SSD buffers cannot cope with the resulting storage load, and network performance suffers.
  • Due to higher performance and endurance characteristics, Intel Optane SSDs can offer much greater efficiency and value in this buffering role than conventional NAND SSD approaches.
  • Cloud service providers, content delivery networks, and enterprises handling mass-scale block storage stand to benefit most from Optane-based buffering.

Datacenter bandwidth is squeezing storage performance

Until a few years ago, it wasn’t a problem if cloud providers placed storage next to compute in their servers. CPU and memory performance were plenty fast, and 1 GbE and 10 GbE network links were sufficient for the modest amounts of data flowing into systems. This data could be written and read by NAND SSDs quickly enough to keep up with workload demands without bottlenecking PCI Express conduits.

Today, NAND SSDs have grown incrementally faster. But these improvements pale alongside the doubling of per-lane bandwidth from PCI Express 3.0 to 4.0. An x16 connection now boasts a unidirectional bandwidth of 32 GB/s. Concurrently, datacenter networking pipelines have broadened into 25 GbE, 100 GbE, 200 GbE, and even 400 GbE (although this lofty speed remains rare).
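The bandwidth jump from PCIe 3.0 to 4.0 can be sketched with a little arithmetic. The figures below follow from the published per-generation transfer rates and 128b/130b line encoding; real-world throughput lands slightly lower after protocol overhead.

```python
def pcie_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Approximate unidirectional PCIe bandwidth in GB/s."""
    # Transfer rate (GT/s) and line-encoding efficiency per generation.
    rates = {3: (8.0, 128 / 130), 4: (16.0, 128 / 130)}
    gt_per_s, encoding = rates[gen]
    per_lane_gb_s = gt_per_s * encoding / 8  # GT/s -> GB/s for one lane
    return per_lane_gb_s * lanes

print(f"PCIe 3.0 x16: {pcie_bandwidth_gb_s(3, 16):.1f} GB/s")  # ~15.8 GB/s
print(f"PCIe 4.0 x16: {pcie_bandwidth_gb_s(4, 16):.1f} GB/s")  # ~31.5 GB/s
```

The doubling of the per-lane transfer rate is what yields the roughly 32 GB/s unidirectional figure for an x16 Gen 4 connection cited above.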

These bandwidth advances are sorely needed as data volumes continue to swell. According to Statista, the annual volume of real-time data created globally didn’t reach 1 zettabyte (ZB) until 2014; between 2021 and 2022 alone, it will grow by 5ZB. By 2025, total volume in the datasphere will surpass 50ZB. This ballooning will be reflected across most major datacenters, as providers seek to deliver ever more real-time analysis, transaction processing, and other top-performance I/O services.

In short, CSPs and CDNs have too much real-time data to keep it all next to CPUs, even though that would provide the best I/O performance. The data must be spread across multiple systems. This reality popularized the idea of disaggregating storage from compute, effectively creating large “data lakes.”

The approach also lets IT in enterprises and service providers scale storage without increasing compute and memory, enabling more cost-effective capacity expansion. The faster the networking pipes, the more feasible high-performance disaggregation becomes. Otherwise, I/O demand caused by higher data volumes and bigger real-time workloads will create a bottleneck in the network fabric.

“With the [PCIe] Gen 4 interface and these faster networks, the amount of data that can go to your storage is so big that you need dozens of SSDs to absorb the data coming from the pipe,” explains Intel senior manager of product planning Jacek Wysoczynski. “Within that, you want top-performance SSDs to serve as a buffer and de-stage to the data lake. Say each storage box has 24 slots. If you only need two of those to be buffer drives, that’s one thing, but when you need 12 of them to buffer, that’s something different. Now you’re risking overflowing the storage box all the time. If that happens, data can’t be written to storage, which will pause the networking traffic, which temporarily stops the datacenter. That’s a ‘sky-is-falling’ moment.”

The situation Wysoczynski describes involves SSD overprovisioning, typically done to improve storage performance and/or endurance. Imagine having an 800GB SSD, but only making 400GB visible to the host. The invisible space can be allocated to activities such as additional garbage collection, which will help improve write performance. It can also help keep usage below the 50% capacity threshold, above which drive speeds can start to decline. An Intel white paper details how SSD overprovisioning can also significantly improve drive endurance. The downside, of course, is the cost of potentially massive amounts of unused capacity. Without a better alternative, overprovisioning was the highest-performance (if costly) option for storage buffering. Fortunately, that’s now changing.
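The cost penalty of overprovisioning is easy to quantify. A minimal sketch of the 800GB/400GB example above follows; the drive price is a hypothetical placeholder, not a figure from the article.

```python
def overprovision(raw_gb: float, visible_gb: float, price_usd: float):
    """Return the overprovisioning ratio (%) and cost per host-usable GB."""
    # Common OP formula: hidden capacity as a fraction of visible capacity.
    op_pct = (raw_gb - visible_gb) / visible_gb * 100
    cost_per_usable_gb = price_usd / visible_gb
    return op_pct, cost_per_usable_gb

# Hypothetical price: an 800GB drive exposing only 400GB to the host.
op_pct, cost = overprovision(raw_gb=800, visible_gb=400, price_usd=480)
print(f"Overprovisioning: {op_pct:.0f}%")      # 100% OP
print(f"Cost per usable GB: ${cost:.2f}")      # double the raw-capacity cost/GB
```

Halving the visible capacity doubles the effective cost per usable gigabyte, which is exactly the trade-off the article describes: better performance and endurance, paid for in stranded capacity.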

The Optane alternative

Since their arrival in 2017, Intel Optane SSDs have provided a higher-performance alternative to even enterprise-class NAND SSDs on both write (especially random workloads) and endurance metrics. In write-intensive, real-time storage application settings such as CSPs or any sizable datacenter implementing elastic block storage at scale, Optane SSDs excel in buffering roles. However, the growing bandwidth of datacenter networks, now coupled with rising PCIe bandwidths, has changed the dynamics of how storage should be deployed.

Consider the following figures from Intel’s white paper “Distributed Storage Trends and Implications for Cloud Storage Planners.” Note the emphasis on achieving 90% network bandwidth, which is what datacenter admins often consider the “sweet spot” for maximizing bandwidth value.

Above: Then: With components and connectivity from circa 2018, Optane and NAND SSDs were roughly similar in their ability to fill a network pipe.

Given the prevalent technologies of the era, 90% saturation could be achieved on a 25 GbE connection by only two Optane P4800X drives on PCIe Gen 3. A then-high-performance SSD like the P4610 couldn’t supply as much I/O as the P4800X, but the two weren’t miles apart.

Above: Now: Updated to 2021 technologies, it becomes clear how difficult it is for NAND storage to make effective utilization of networking bandwidth.

With 100 GbE and PCIe Gen 4, the situation changes significantly. Keep in mind that the new 400GB P5800X offers impressive leaps in performance over the 375GB P4800X across several key metrics, including 100% sequential write bandwidth (6200 vs. 2200 MB/s, respectively), random write IOPS (1,500,000 vs. 550,000), and latency (5 vs. 10 µs), all of which let each drive absorb far more storage traffic. Thus, despite the quadrupling of network bandwidth, it only takes three second-gen P5800X Optane SSDs on a PCIe Gen 4 bus to nearly fill that Ethernet link. In contrast, up to 13 current-gen NAND SSDs are needed to supply the same network I/O, depending on the workload. Not surprisingly, the numbers roughly double when stepping up to 200 GbE.
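The drive counts above follow from straightforward bandwidth division. The sketch below works through it; the effective per-drive write throughputs are illustrative assumptions for mixed real-world buffering traffic (not datasheet sequential numbers), chosen to match the article’s three-vs-thirteen comparison.

```python
import math

def drives_needed(network_gb_per_s: float, saturation: float,
                  drive_gb_s: float) -> int:
    """Drives required to supply a fraction of a network link's bandwidth."""
    target_gb_s = network_gb_per_s / 8 * saturation  # Gb/s -> GB/s, then scale
    return math.ceil(target_gb_s / drive_gb_s)

# 100 GbE link, 90% saturation target (the "sweet spot" above).
# Per-drive throughputs are assumptions for mixed buffering workloads.
print(drives_needed(100, 0.90, drive_gb_s=4.0))  # ~3 Optane-class drives
print(drives_needed(100, 0.90, drive_gb_s=0.9))  # ~13 NAND-class drives
```

A 100 GbE link carries 12.5 GB/s, so a 90% target is 11.25 GB/s of buffer write bandwidth; the drive count is simply that target divided by what each drive can sustain, rounded up.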

The big point: Optane can use low-capacity drives and still achieve huge performance and endurance leads over NAND SSDs four or eight times larger (1.6TB and 3.2TB).

Another factor to consider is performance consistency. High-volume mixed read/write loads can be particularly grueling for SSDs. Numerous Intel studies have examined how Optane media retains consistent, very low I/O latency over time when stressed under complex, heavy workloads. In contrast, NAND SSD responsiveness tends to deteriorate over time with similar conditions, making application quality of service difficult to maintain.

Real-time implications

Intel’s “Distributed Storage Trends” paper discusses a common datacenter scenario with dense storage racks, 100 Gb/s Ethernet, and 90% I/O saturation. The bottom line is that three Optane P5800X SSDs can do the buffering work of 13 TLC NAND SSDs, leaving room for many more bulk storage drives per enclosure. Intel claims this leads to a “12.6% improvement in cost per GB of raw storage,” including both Capex and Opex savings over three years of power use.

This strategy of using Optane SSDs to provide sufficient buffering performance for current and coming data volumes flowing over expanded I/O conduits will interest CSPs, CDNs, and infrastructure-as-a-service (IaaS) providers offering storage. That said, the performance and cost advantages of Optane SSDs in this scenario could also apply to compute server clusters with considerable local-attached storage, provided the cluster was handling multiple large data sources at once for real-time processing. The Optane SSD can offer greater cost-efficiency, higher total performance, and fewer sources of admin frustration.

“It’s very common for people to try to optimize workloads and be kind to their SSDs,” says Andrew Ruffin, Intel strategy and business development manager. “You can try to only stream sequentially, do the deep queues, and so on. But when you have multiple nodes hitting the same data lake, it just becomes random traffic with zero sequentiality. No matter how much you over-provision or whatever, when you have those multi-tenant environments, it will be hard on the storage devices. This is why it’s essential to understand the need to optimize your device for the traffic.”
