Alluxio, a global leader in AI caching solutions, delivers the fastest cache for GPU-based AI workloads. Its scalable architecture supports tens of thousands of nodes, significantly reducing storage bandwidth consumption. The success of large language models (LLMs) around the world is largely due to Alluxio’s cutting-edge approach to solving AI storage challenges.
"Solidigm partnered with Alluxio to deliver a superior distributed AI caching solution. Using Solidigm's D5-P5336 as a read cache and D7-PS1010 for checkpoint writes with Alluxio's minimal overhead solution offers our customers the best combination of cost and performance for large-scale AI workloads. We optimized this solution to leverage full write bandwidth of a Solidigm D7-PS1010 Gen5 TLC SSD and the read bandwidth of Solidigm D5-P5336 Gen4 QLC while maintaining a write-amplification of 1.02 on TLC and QLC SSDs. Together, we hope to continue to deliver cost- and performance-optimized, low overhead solutions for our customers' AI needs,” says Greg Matson, Senior Vice President, Strategic Planning and Marketing at Solidigm.
DORA, short for Decentralized Object Repository Architecture, is the next-generation architecture of Alluxio. As a distributed caching storage system, DORA offers low latency, high throughput, and cost savings while aiming to provide a high-performance data access layer for AI workloads. DORA leverages decentralized storage and metadata management to provide higher performance and availability, as well as pluggable data security and governance, enabling more scalability and efficient management of large-scale data access.
The architecture consists of four key components: the service registry, scheduler, client, and worker. Together, these components manage tasks such as service discovery, distributed load scheduling, and data storage, all while maintaining optimal performance across the system.
DORA utilizes a battle-tested page store module for cache storage, enabling more granular caching of small to medium read requests on large files. This reliable page store technology has been proven in applications like Presto at Meta, Uber, and TikTok. DORA’s fine-grained caching has reduced read amplification by 150 times and increased file position read performance by up to 15X.
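To illustrate why page-granular caching helps small reads on large files, here is a minimal sketch of how a positioned read could be mapped to the pages that cover it. The 1 MiB page size and the function names are assumptions made for illustration; the article does not state the page size or the page store's API.

```python
PAGE_SIZE = 1 << 20  # hypothetical 1 MiB page size; not stated in the article


def pages_for_read(offset: int, length: int, page_size: int = PAGE_SIZE) -> range:
    """Return the indices of the pages covering bytes [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


def bytes_fetched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes brought into the cache to serve the read at page granularity."""
    return len(pages_for_read(offset, length, page_size)) * page_size


if __name__ == "__main__":
    # A 256 KiB positioned read that straddles a page boundary touches two pages,
    # so only 2 MiB is cached instead of the whole (possibly multi-GB) file.
    offset, length = (1 << 20) - 1024, 256 * 1024
    print(list(pages_for_read(offset, length)))
    print(bytes_fetched(offset, length))
```

Under these assumptions, the cost of a small read is bounded by the page size rather than by the file size, which is the effect the read-amplification figures describe.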
The Page Data Store leverages a journaling file system and organizes data into two levels of directories with fixed, large-sized chunk files. All writes are appended to these chunk files. Objects within a chunk file are only marked for deletion, and the file itself is removed once none of its objects is still needed. This design sustains performance even on PCIe 5.0 TLC SSDs while keeping the SSD Write Amplification Factor (WAF) close to 1, which maximizes SSD endurance.
For example, when using Alluxio’s read cache with QLC SSDs, the storage engine fully utilizes the endurance of QLC without causing any internal or garbage collection WAF, ensuring efficient operation on QLC NVMe SSDs.
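The append-only behavior described above can be sketched as follows. This is an in-memory stand-in, not Alluxio's implementation: the class names, the index layout, and the rule that a chunk is dropped once it is sealed and holds no live objects are all assumptions chosen to match the prose.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    data: bytearray = field(default_factory=bytearray)
    live: int = 0         # objects in this chunk not yet marked deleted
    sealed: bool = False  # True once the chunk is full and no longer appended to


class AppendOnlyStore:
    """In-memory stand-in for fixed-size, append-only chunk files (hypothetical)."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.chunks = {0: Chunk()}
        self.active = 0
        self.index = {}  # key -> (chunk id, offset, length)

    def put(self, key: str, value: bytes) -> None:
        if len(value) > self.chunk_size:
            raise ValueError("object larger than a chunk")
        if key in self.index:
            self.delete(key)  # overwrite = mark the old copy dead, append a new one
        chunk = self.chunks[self.active]
        if len(chunk.data) + len(value) > self.chunk_size:
            chunk.sealed = True
            self._reclaim(self.active)
            self.active += 1
            chunk = self.chunks[self.active] = Chunk()
        self.index[key] = (self.active, len(chunk.data), len(value))
        chunk.data += value  # sequential append; existing bytes are never rewritten
        chunk.live += 1

    def get(self, key: str) -> bytes:
        cid, off, length = self.index[key]
        return bytes(self.chunks[cid].data[off:off + length])

    def delete(self, key: str) -> None:
        cid, _, _ = self.index.pop(key)
        self.chunks[cid].live -= 1  # mark only; nothing is rewritten in place
        self._reclaim(cid)

    def _reclaim(self, cid: int) -> None:
        chunk = self.chunks.get(cid)
        if chunk is not None and chunk.sealed and chunk.live == 0:
            del self.chunks[cid]  # the whole chunk is dropped at once
```

Because data is only ever appended and space is released a whole chunk at a time, the drive sees large sequential writes and large deletes, which is the access pattern that keeps write amplification low.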
DORA distributes metadata across all workers to ensure that it is always accessible and available. To optimize metadata access, DORA uses a two-level caching system for metadata entries. The first level is the in-memory cache, which has a configurable maximum capacity and a time-to-live (TTL) setting that defines how long entries remain valid. The second level is the persistent cache, which stores metadata entries on disk using RocksDB. The persistent cache has no fixed capacity limit and is bounded only by available disk space. It also uses TTL-based eviction, which avoids any active sync or invalidation. As in the Page Store, stored metadata is keyed by a hash of the full UFS path.
The combination of in-memory and persistent caching helps ensure that metadata is readily available and accessible, while also allowing for efficient use of system resources. The decentralization of metadata avoids the bottleneck in the architecture where metadata is primarily managed by the master nodes. With the ability to store up to 30 to 50 million files per DORA worker, the system can support large-scale data-intensive applications with billions of files.
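The two-level lookup described above can be sketched as follows. This is a simplified illustration under several assumptions: a plain dict stands in for RocksDB, SHA-256 stands in for the unspecified path hash, the in-memory level evicts in LRU order when full, and both levels share one TTL. None of these details are confirmed by the article.

```python
import hashlib
import time
from collections import OrderedDict


class TwoLevelMetadataCache:
    """Hypothetical sketch: bounded in-memory level over an unbounded persistent level."""

    def __init__(self, mem_capacity, ttl_seconds, clock=time.monotonic):
        self.mem_capacity = mem_capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self.mem = OrderedDict()  # level 1: key -> (expires_at, metadata), LRU order
        self.disk = {}            # level 2: plain dict standing in for RocksDB

    @staticmethod
    def key_for(ufs_path):
        # Entries are keyed by a hash of the full UFS path (hash choice assumed).
        return hashlib.sha256(ufs_path.encode("utf-8")).hexdigest()

    def put(self, ufs_path, metadata):
        key = self.key_for(ufs_path)
        entry = (self.clock() + self.ttl, metadata)
        self.disk[key] = entry
        self._put_mem(key, entry)

    def get(self, ufs_path):
        key = self.key_for(ufs_path)
        now = self.clock()
        entry = self.mem.get(key)
        if entry is not None:
            if entry[0] > now:
                self.mem.move_to_end(key)
                return entry[1]
            del self.mem[key]  # expired in memory
        entry = self.disk.get(key)
        if entry is not None:
            if entry[0] > now:
                self._put_mem(key, entry)  # promote to the in-memory level
                return entry[1]
            del self.disk[key]  # expired on disk: TTL eviction, no active sync
        return None  # miss: the caller would fetch from the UFS and call put()

    def _put_mem(self, key, entry):
        self.mem[key] = entry
        self.mem.move_to_end(key)
        while len(self.mem) > self.mem_capacity:
            self.mem.popitem(last=False)  # evict the least recently used entry
```

Expired entries are simply dropped on access, so the cache never has to push invalidations to other nodes; stale metadata ages out after at most one TTL.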
Solidigm has verified that the combination of in-memory metadata design and RocksDB offers the optimal metadata storage solution. This design fully utilizes the read and write speeds of PCIe 4.0 QLC SSDs (7GB/s read, 3GB/s write) and PCIe 5.0 TLC SSDs (14.5GB/s read and write). Additionally, RocksDB consolidates many small writes into larger, sequential 2MB writes using a skiplist-based write buffer, which is highly efficient and helps minimize SSD WAF, further enhancing SSD endurance.
Storage Server― Intel Gen5 | |
---|---|
OS | Fedora Linux 40 (Server Edition) |
Kernel | 6.8.5-301.fc40.x86_64 |
CPU Model | Intel(R) Xeon(R) 6740E, 2 sockets @ 2.4GHz, 96 cores per socket |
NUMA Node(s) | 2 |
DRAM Installed | 256GB (16x16GB DDR4 3200MT/s) |
Huge Pages Size | 2048 kB |
Drive Summary | 2x Gen5 TLC Solidigm D7-PS1010 8TB, FW Revision: G70YG030, PCIe Gen5 x4; 2x Gen4 QLC Solidigm D5-P5336 60TB |
FIO | 3.37 or later |
Alluxio | AI version |
Alluxio load ingest cache SSD | ./bin/alluxio job load --path file:///mnt/qlc/alluxio/data --submit |
Alluxio fuse read fio (256K blocks, 16 jobs) | fio -engine=libaio -bs=256K --rw=read -group_reporting -directory=/mnt/fuse/fusedir/test1/multiple_files -name=read_test -direct=1 -numjobs=16 --nrfiles=1 -openfiles=1 -size=16G --alloc-size 1024000 |
Alluxio fuse read fio (1024K blocks, 128 jobs) | fio -engine=libaio -bs=1024K --rw=read -group_reporting -directory=/mnt/8Greadfuse/alluxiofuse/local -name=wayne_read_test -direct=1 -numjobs=128 --nrfiles=1 -openfiles=1 -size=4G --alloc-size 1024000 --readonly |
File system | XFS |
In a recent experiment using Intel's Gen5 BNC storage server with Solidigm D7-PS1010 and D5-P5336 SSDs, Alluxio demonstrated its capabilities for fast data ingestion and read performance, especially with GPU scaling. The test setup and notable results follow.
We set up a single-node test for quick deployment to showcase the power of the Alluxio storage engine. It's important to note that Alluxio's greatest strength lies in its ability to leverage a host-side distributed replicated cache, which scales with GPUs and significantly reduces north-south storage bandwidth overhead. Even in single-node setups, Alluxio demonstrates exceptional efficiency, especially when paired with high-performance NVMe SSDs. For this test, we configured the cache with either PCIe 5.0 TLC or PCIe 4.0 QLC, while the underlying file store (UFS) used PCIe 4.0 QLC.
Alluxio Load Test | Cache SSD write BW (MB/s) | UFS read BW (MB/s) | Cache SSD WAF |
---|---|---|---|
Solidigm D7-PS1010 | 6823 | 6923 | 1.02 |
Solidigm D5-P5336 | 3341 | 3613 | 1.02 |
1. The Alluxio cache load engine is highly efficient: it saturates the maximum read bandwidth of the UFS QLC SSD while ingesting data into the PCIe 5.0 TLC cache SSD. In the D7-PS1010 run above, the cache write rate (6823 MB/s) tracks the UFS read rate (6923 MB/s), so the UFS, not the cache SSD, is the limit. With a UFS that supports a read bandwidth of 10GB/s, Alluxio can saturate the 9.3GB/s write bandwidth of the Solidigm D7-PS1010.
2. Alluxio’s page cache storage engine, built on the XFS journaling file system, has been tested for longevity using the Solidigm Alluxio FIO emulator. The results show that Alluxio achieves a WAF of 1.02 on both TLC and QLC SSDs. A WAF this close to the ideal value of 1 maximizes SSD performance and endurance, delivering optimal results to end users.
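For readers unfamiliar with the metric, WAF is the ratio of bytes physically written to NAND to bytes the host asked the drive to write. The sketch below shows the calculation. The function name and the counter values are hypothetical and are not the measurements behind the table; in practice the two counters would come from the drive's own telemetry.

```python
def write_amplification_factor(nand_bytes_written: int, host_bytes_written: int) -> float:
    """WAF = bytes physically written to NAND / bytes written by the host."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


if __name__ == "__main__":
    # Hypothetical counter deltas over a test window (not measured values).
    host = 100 * 10**12  # 100 TB requested by the host
    nand = 102 * 10**12  # 102 TB actually programmed to NAND
    print(f"WAF = {write_amplification_factor(nand, host):.2f}")
```

A value of 1.0 means the drive wrote nothing beyond what the host requested; anything above 1.0 is extra wear from garbage collection and other internal data movement.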
Alluxio Fuse Test | Cache SSD Read BW (GB/s) | UFS read BW (GB/s) |
---|---|---|
Solidigm D7-PS1010 | 14.8 | 0 |
The FUSE read overhead is minimal. When performing a FUSE read with a 100% cache hit on the SSD cache and bypassing the DRAM page cache, the FUSE framework can nearly saturate the read bandwidth of a single PCIe 5.0 SSD, reaching 14.8GB/s.
Solidigm 61.44TB D5-P5336 seq write PBW | 5 years avg util | Write BW support |
---|---|---|
213 | 50% | 2900MB/s |
For customers seeking high-density cache solutions, Solidigm’s 61.44TB QLC SSD is an ideal option. Alluxio’s storage engine is highly WAF-friendly and, as a read cache, serves mostly reads with minimal writes, making QLC a perfect match for its read cache path. Because the engine keeps WAF close to 1, we can estimate the endurance of a QLC cache device from its sequential-write rating of 213 PBW, meaning the drive supports up to 213 petabytes of sequential writes over its life. Even at 50% average utilization over a 5-year period, which is a high threshold, this rating sustains a write bandwidth of 2900 MB/s, close to the drive’s maximum. Additionally, QLC provides up to 6GB/s random read bandwidth per SSD, enabling the creation of a highly cost-efficient Alluxio AI cache system.
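The endurance estimate above can be reproduced with a short calculation. The sketch below assumes that "utilization" means the fraction of the 5 years the drive spends writing, and that a year has 365 days; the function name is hypothetical. The article does not say whether its units are decimal or binary, so both are computed: decimal units give roughly 2,700 MB/s, while binary units (PiB and MiB/s) give roughly 2,900, which matches the table figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def sustained_write_rate(endurance_bytes: float, years: float, utilization: float) -> float:
    """Average bytes/s that uses up the rated endurance exactly at end of life,
    where utilization is the fraction of time the drive spends writing."""
    if years <= 0 or not 0 < utilization <= 1:
        raise ValueError("years must be > 0 and utilization must be in (0, 1]")
    return endurance_bytes / (years * SECONDS_PER_YEAR * utilization)


if __name__ == "__main__":
    # 213 PBW, 5 years, 50% utilization, as in the table above.
    decimal_mb_s = sustained_write_rate(213 * 10**15, 5, 0.5) / 10**6
    binary_mib_s = sustained_write_rate(213 * 2**50, 5, 0.5) / 2**20
    print(f"decimal units: {decimal_mb_s:.0f} MB/s")
    print(f"binary units:  {binary_mib_s:.0f} MiB/s")
```

Halving the utilization doubles the sustainable write rate, so lighter duty cycles leave even more endurance headroom.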
For read cache, the Solidigm D5-P5336 61.44TB QLC SSD offers exceptional performance and scalability. For checkpoint writes, the PCIe 5.0 Solidigm D7-PS1010 delivers world-class write performance.
Xuan Du, VP of Engineering at Alluxio says, “We collaborated closely with the Solidigm team to validate the performance benefits of running Alluxio’s distributed caching technology with Solidigm SSD and NVMe drives for AI model training workloads. Through our collaboration, we were able to further optimize Alluxio to maximize I/O throughput for large-scale AI workloads leveraging Solidigm drives.”
The Alluxio and Solidigm collaboration has yielded results showing that both Solidigm TLC and QLC SSDs significantly enhance Alluxio’s services while reducing operational costs. Solidigm also raises the bar in quality and reliability, supported by a dedicated customer care team that has provided excellent support for Alluxio.
Wayne Gao is a Principal Engineer and Solution Storage Architect at Solidigm. He has worked on Solidigm’s Cloud Storage Acceleration Layer (CSAL) from pathfinding to commercial release. Wayne has over 20 years of storage developer experience, has four U.S. patent filings/grants, and is a published EuroSys paper author.
Yi Wang is a Field Application Engineer at Solidigm. Before joining Solidigm, he held technical roles with Intel, Cloudera, and NCR. He holds "Cisco Certified Network Professional," "Microsoft Certified Solutions Expert," and "Cloudera Data Platform Administrator" certifications.
Jie Chen is a Technical Marketing Architect at Solidigm, responsible for ecosystem enabling for cloud customers, especially in data placement modes and storage for AI. Prior to joining Solidigm, Jie held various technical roles, including Application Engineer, Quality & Reliability Engineer, Product Development Engineer, and Program Manager, for various flash memory and persistent memory products.
All product plans, roadmaps, specifications, and product descriptions are subject to change without notice.
Nothing herein is intended to create any express or implied warranty, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, or any warranty arising from course of performance, course of dealing, or usage in trade.
The products described in this document may contain design defects or errors known as “errata,” which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your Solidigm representative or your distributor to obtain the latest specifications before placing your product order.
For copies of this document, documents that are referenced within, or other Solidigm literature, please contact your Solidigm representative.
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Solidigm may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Solidigm reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.
Performance results are based on testing as of dates shown in the configurations and may not reflect all publicly available updates. See configuration disclosure for details. No product or component can be absolutely secure.
Solidigm or Intel optimizations, for Solidigm or Intel compilers or other products, may not provide optimized performance to the same degree for non-Solidigm or Intel products. Solidigm or Intel technologies may require enabled hardware, software, or service activation.
Your costs and results may vary.
Solidigm does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Some results have been estimated or simulated using internal Solidigm analysis or architecture simulation or modeling, and provided to you for information purposes only. Any differences in your system hardware, software or configuration may affect your actual performance.
© Solidigm. “Solidigm” is a trademark of SK hynix NAND Product Solutions Corp (d/b/a Solidigm). “Intel” is a registered trademark of Intel Corporation. Other names and brands may be claimed as the property of others.