Zhengrui Technology and Solidigm SSDs

Zhengrui Technology uses Solidigm high-density storage to accelerate innovation in animal husbandry

DNA strand for modern livestock breeding using Solidigm high-cap SSDs for data lakes.

Modernized livestock farming meets the growing demand for animal products such as meat, eggs, and milk, and brings new opportunities for large-scale scientific breeding and genetic research. Through genomic selection, intelligent breeding and feeding, disease diagnosis, intelligent monitoring and management, and data mining and decision support, AI technology optimizes livestock genetic breeding, improves livestock production efficiency and health, and promotes the development of the livestock industry.

Artificial intelligence (AI) technology is optimizing livestock genetic breeding in numerous ways. Some examples include:

  1. Genomic selection: The genetic traits of individuals can be predicted through deep learning and analysis of a large amount of genomic data to optimize the breeding program, taking into account traits such as growth rate, meat production, milk production, and more. 
  2. Intelligent breeding: The breeding performance of female livestock and the genetic traits of their offspring can be predicted with the help of AI technology. Through the comprehensive analysis of genomic information, physiological information, and environmental information, AI can improve breeding success rates.
  3. Intelligent feeding: The amount of feeding and nutrient composition can be intelligently adjusted for livestock, taking the physiological needs, growth stages and environmental factors into consideration to improve growth rate and production efficiency.
  4. Disease diagnosis: AI technology can detect abnormalities through real-time monitoring of livestock behavior and physiological indicators. It can predict the occurrence and spread of diseases through the analysis of historical data, allowing early diagnosis and treatment.
  5. Intelligent monitoring and management: Real-time monitoring and automatic adjustment of environmental parameters such as temperature, humidity, and carbon dioxide concentration can boost the health of the livestock and the comfort of the farming environment.
  6. Data mining and decision support: Mining and analyzing breeding data provides decision support for farmers and managers. They can formulate optimal breeding plans, feeding programs, and disease prevention and control strategies.

With the help of AI technology, a livestock research institute and client of Zhengrui Technology collected and labeled a large amount of biological data, such as genomes and production-capacity phenotypes, then trained deep-learning models for tasks such as intelligent identification of livestock diseases and prediction of breeding value. The combination of AI and genetic technology enables the livestock industry to develop in a brand-new way and at a brand-new speed.

Researchers reviewing biological data from combination of AI and genetic technology using Solidigm high-capacity SSDs.

AI applications in animal husbandry

With the application of AI, big data, machine learning, multimodal modeling, and other technologies in animal husbandry, the concept of data warehousing has undergone significant changes. Before AI was able to help analyze data, the electronification of manual records in the livestock industry faced bottlenecks, lessening their value. Instead of providing useful insights, data was largely archived, becoming "cold data" that was not used to its full potential. AI technology has allowed farmers and livestock managers to re-explore this archived data, as well as further expand the scope and frequency of data collection, resulting in a massive influx of original, raw, unstructured data into a "data lake" that they can then use for making decisions in management and breeding.

The Institute of Animal Husbandry generates terabytes (TB) of data daily. Traditional data storage and management models are unable to cope with the rapid processing of large-scale and complex data, including structured data, semi-structured data, unstructured data, and binary data (images, audio, and video). AI allows the data in a data lake to be converted from raw data to targeted data for tasks such as reporting, visualization, analytics, and machine learning.  

Configuration details

The livestock research institute has constructed a computing infrastructure to process and utilize the collected data, with four big data computing servers, four inference servers, and eight storage servers.

  • Each of these computing servers is configured with four NVIDIA A100 GPUs, which are mainly used for data mining and training
  • Each of the inference servers is configured with eight NVIDIA RTX 3090 GPUs, which are used to perform visual neural network operations on a large amount of image and video data
  • The storage servers are configured with large-capacity mechanical hard disks to form distributed storage
  • The server network is linked with 200G InfiniBand switches

Data overview

With the import of historical data and the accumulation of data from daily R&D and operations, constraints on storage and compute power gradually appeared. The R&D and operations process involves a huge amount of data, but the data value density is decreasing daily, and a large amount of duplicate data results in sparse effective data. Under heavy read loads, the read efficiency of the original storage system decreases significantly, compute power cannot be fully utilized, and R&D efficiency drops. Improving data quality also requires a large amount of data preprocessing work, which introduces more read and write operations and occupies many resources.

Researcher reviews AI-trained data from data lake preprocessing to improve data quality for AI in animal husbandry.

Revisiting storage system requirements for AI data lakes

For smaller data lakes, HDDs provide good read throughput, with write throughput limited by network bandwidth. However, as data sizes grow, cross-node access to distributed storage consumes significant network resources, causing efficiency issues and increasing overhead even on high-performance 100G Ethernet or InfiniBand networks.

Even more significant than network costs, though, is idle compute power. With the rapid development of AI accelerators such as GPUs and TPUs, greater data throughput is required to meet computational demand, which places higher demands on storage system performance. Considering how expensive AI compute builds are, idle accelerators waiting for data ingest are a huge waste of resources.

Switching to SSDs as a storage medium is a logical step to cope with the high demand for storage performance. But this shift needs to combat several problems:

  1. All-flash solutions need to balance cost and capacity.
  2. Hybrid or tiered storage solutions are less than ideal for training massive AI data sets. Data must be accessed numerous times during the training process, which means that "cold" storage on HDDs will create bottlenecks.
  3. The cost of high-performance cloud storage is high. A typical model for big data applications is to host the computation and storage in the cloud, utilizing the technical strength of public cloud vendors to address the management costs of data security redundancy, tiering, and other aspects. However, the premium price of high-performance cloud storage is much higher than that of basic cloud services, creating cost pressure for long-term operations. Add to that the cost of daily data collection traffic, and you have an expense that is difficult to justify.

Traditional distributed storage pain points

  1. Performance constraints: The storage system faces hundred-terabyte-scale data growth, read efficiency decreases, and distributed computing times lengthen, seriously restricting the research process.
  2. Drop in cost-effectiveness: The storage cost of cloud computing centers is high and may become unsustainable, while commercial small-capacity SSDs may face reliability and management bottlenecks. Legacy storage servers have outdated architecture, a large footprint, and high heat and power consumption compared with today's high-capacity SSDs.

Solidigm Gen 4 QLC SSDs offer breakthroughs

One practical answer to these storage problems is Solidigm 192-layer 3D NAND. It delivers an industry-leading areal density of 18.6Gb/mm², substantially ahead of competing products. Solidigm delivers unrivaled high-density SSDs that address the industry's density, power efficiency, and throughput pain points.

Solidigm D5-P5336 is part of Solidigm's fourth generation of QLC SSDs for the data center, delivering an industry-leading combination of high capacity (up to 122.88TB*) and read-optimized performance with support for high-throughput read and data-intensive workloads. Its architecture is designed to efficiently accelerate and scale increasingly large data sets in widely deployed read-intensive workloads while increasing storage density, lowering total cost and enabling a more sustainable storage infrastructure than TLC SSDs and HDD-based solutions.

Table 1. Solidigm SSD form factors: U.2 and E1.L up to 122.88TB capacity, and E3.S up to 30.72TB capacity.

With the market entry of high-capacity, high-performance QLC SSDs from Solidigm, it is now technically and economically feasible to build all-flash servers based on single-tier storage. Single-tier storage designs significantly reduce the technical difficulty of developing and deploying storage servers and provide more consistent, predictable performance.

Storage servers deploying high-density QLC SSDs achieve petabyte-scale single-node storage capacities that are beyond the reach of mechanical HDDs. The single-tier storage media design also saves the capacity, space, and energy consumption otherwise occupied by cache disks, further consolidating each node's advantage in storage density.

High-density storage nodes also save rack space, energy consumption, and network port overhead. If the user's target storage capacity can be realized in a single node, deployment, operations, and maintenance become much simpler. The Solidigm D5-P5336 122.88TB in the U.2 form factor can already achieve a capacity of up to 4PB in a standard 1U server, which can help enterprises move their data completely into all-flash storage and exploit the value of dormant, or "cold," data.
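The 4PB-per-1U figure can be sanity-checked with simple arithmetic. The per-drive capacity comes from the text; the bay count used below is an assumption for illustration (the article does not state how many drives the 1U server holds):

```python
# Back-of-envelope raw-capacity check for the "up to 4PB in 1U" figure.
DRIVE_TB = 122.88   # Solidigm D5-P5336 per-drive capacity (TB), from the text
BAYS_PER_1U = 32    # assumed drive count for a dense 1U chassis (not stated in the article)

raw_tb = DRIVE_TB * BAYS_PER_1U
raw_pb = raw_tb / 1000
print(f"{raw_tb:.2f} TB raw per 1U node (~{raw_pb:.1f} PB)")
```

Under this assumption, 32 × 122.88TB ≈ 3.9PB raw, consistent with the "up to 4PB" claim.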

Table 2. Solidigm D5-P5336 Features1
Table 3. D5-P5336 TCO savings vs other storage configurations.1

Zhengrui Technology collaborates with Solidigm: Enabling AI reasoning, accelerating storage innovation

Zhengrui Technology used Solidigm high-capacity QLC NVMe SSDs to create a set of animal husbandry bio-genetic data storage solutions. The high-density, high-reliability, and scalable storage platform provided both efficiency and cost-effectiveness.


Figure 1. Zhengrui Tech server capable of holding 24 Solidigm SSDs for a maximum possible capacity of 700TB of data storage.

Features

  1. The all-flash design supports up to 24 hot-swappable NVMe SSDs running at full PCIe 4.0 speed with Solidigm D5-P5336 30.72TB SSDs. High IOPS is complemented by high capacity density: a single node can provide 1 million+ IOPS and more than 700TB of storage space.
  2. The 4th/5th generation Intel Xeon Scalable processors also provide core support for AI acceleration. Combined with the new storage tier, the system can filter out non-critical data as soon as it is written, reducing network load while providing excellent data reliability.
  3. 2U all-flash servers take full advantage of the storage density, consolidating the data of multiple cabinets into a single cabinet, which improves space utilization and effectively reduces the total cost of ownership for users.
Table 4. AI Data Lake Storage Configurations and Efficiency Values2
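As a quick sanity check, the single-node capacity in the feature list follows directly from the drive count and per-drive capacity given above (raw capacity only; the article does not specify filesystem or redundancy overhead):

```python
# Raw single-node capacity from the figures quoted in the feature list.
DRIVES_PER_NODE = 24   # hot-swappable NVMe bays per 2U node, from the text
DRIVE_TB = 30.72       # Solidigm D5-P5336 30.72TB SKU, from the text

raw_tb = DRIVES_PER_NODE * DRIVE_TB
print(f"Raw capacity per node: {raw_tb:.2f} TB")  # 737.28 TB, i.e. "700TB class"
```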

Solidigm, together with Zhengrui Technology, was recognized by customers for this tailor-made solution, which helps them solve multiple problems.

Meeting data volume and performance needs with high-capacity QLC drives

In this typical AI deployment, data storage nodes are prone to performance hotspots due to resource competition, stalling computation units as they wait for data. Solidigm QLC drives meet the needs of petabyte-scale data storage while maintaining low-latency, stable, and reliable performance.

More scalability

When switching from 18TB HDDs + TLC to 30.72TB Solidigm D5-P5336 SSDs, rack space was reduced by 79%.2 This reduces the footprint of storage cabinets and rack space and gives more flexibility to scale up compute nodes within a given space budget.

Lower energy consumption, reduced space occupation

This greatly reduced the number of storage devices deployed, lowering the power consumption and cooling energy needs of the overall deployment and positively impacting construction and operating costs. Power consumption with the hybrid solution (18TB HDDs + TLC) was 57,600W, compared to 12,000W with the all-flash solution (Solidigm D5-P5336 SSDs): a power reduction of 79%.
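The 79% figure reproduces directly from the two deployment power numbers given in the text:

```python
# Reproduces the ~79% power-reduction figure from the quoted deployment numbers.
hybrid_watts = 57_600     # 18TB HDDs + TLC hybrid deployment, from the text
all_flash_watts = 12_000  # Solidigm D5-P5336 all-flash deployment, from the text

reduction = 1 - all_flash_watts / hybrid_watts
print(f"Power reduction: {reduction:.0%}")  # Power reduction: 79%
```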

Figure 2. Zhengrui Tech & Solidigm array vs traditional mixed-flash storage

Even if a mixed-flash storage model is adopted, it still faces a series of accompanying problems, such as large space occupation and complex management and scheduling logic.

The Zhengrui Tech + Solidigm solution requires only 2U of space to achieve 700TB-class storage. To further increase storage density, options include adopting 61.44TB or 122.88TB drives, or introducing EDSFF form factor support.

Conclusion

Ultra-high-density all-flash storage provides stable and efficient IO support for computing centers, improves overall computing throughput, and ensures the output of results. With Solidigm QLC SSDs, customers simplify server room operations and reduce construction and management costs, cutting spend and increasing efficiency.


Disclaimers and notes

*Solidigm D5-P5336 122.88TB availability Q1’25.

  1. D5-P5336 product brief
  2. Solidigm does not control or audit third-party data. You should consult other sources to evaluate accuracy.

All product plans, roadmaps, specifications, and product descriptions are subject to change without notice.  
 
Nothing herein is intended to create any express or implied warranty, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, or any warranty arising from course of performance, course of dealing, or usage in trade. 
 
The products described in this document may contain design defects or errors known as “errata,” which may cause the product to deviate from published specifications. Current characterized errata are available on request. 
 
Contact your Solidigm representative or your distributor to obtain the latest specifications before placing your product order.  
 
For copies of this document, documents that are referenced within, or other Solidigm literature, please contact your Solidigm representative. 
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. 
© Solidigm. “Solidigm” is a trademark of SK hynix NAND Product Solutions Corp (d/b/a Solidigm). Other names and brands may be claimed as the property of others.