Revolutionizing RAID: Graid's Supreme Solution for Modern Data Challenges

TechArena Podcast hosted by Allyson Klein and Jeniece Wnorowski

Learn how Graid Technology is using flash storage to meet the demands of the most challenging high-performance compute environments. Where traditional hardware RAID solutions cannot keep up with the performance of today’s solid-state drives, and software RAID struggles with scalability, Graid steps in with a solution that takes RAID processing off the CPU. With this solution, data can flow back and forth to the applications directly from the drives, delivering maximum performance.

Traditional RAID struggles with high-performance computing demands. Data integrity is one of the key aspects of RAID, providing redundancy to protect against disk failures. The trouble is that with the advent of the massive amount of data generated by AI and machine learning, performance has suffered with these traditional solutions. 

So how do we protect data integrity while providing the performance customers demand in today’s data-intensive environment? Graid’s SupremeRAID solution addresses the data bottlenecks so companies can access data quickly and reliably. Listen to Kelley Osburn of Graid to learn more about how Graid is using Solidigm SSDs to innovate storage solutions that meet the size, speed, and complexity of today’s modern workloads.

Audio Transcript

This transcript has been edited for clarity and conciseness.

Narrator: Welcome to The TechArena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena.

Allyson Klein: Welcome to The TechArena Data Insight Series. It's time for another podcast about data insights, and that means Jeniece Wnorowski of Solidigm is back in the studio with me. Welcome back to the program, Jeniece.

Jeniece Wnorowski: Yay, thank you for having me back, Allyson. It's a pleasure.

Allyson: So, Jeniece, how have you been spending your time since the last podcast, and what have you been picking up on data insights?

Jeniece: Oh, my gosh, [I’m] spending so much time attending various shows. We just landed at Flash Memory Summit, which was a really good time meeting with a lot of our partners and customers. And I am super excited to introduce Kelley Osburn. Kelley Osburn is the Senior Director at Graid and, frankly, just one of my favorite people to talk to. So we're going to chat with him today.

Allyson: Welcome to the show, Kelley. How's it going?

Kelley Osburn: Thanks. Everything's going great. I went to FMS as well and got to hang out with Jeniece and the Solidigm folks. Very cool show and I made it home without catching any kind of bug.

Allyson: Oh, that's nice. That's wonderful. Let's just get started then. Graid is obviously known as a leader in storage solutions, and I've been reading up on your solutions. The SupremeRAID products have captured the attention of many enterprises. Can you provide some background on SupremeRAID and how Graid came to deliver it to the market?

Kelley: Yep. I think it's pretty obvious that the advent of flash storage has taken the market by storm and it's growing extremely rapidly. Compounding that, we're moving to NVMe, so away from SAS and SATA type interfaces to direct-attach NVMe flash storage. And then on top of that, we're seeing a large number of drives that can be put into a server. So, with the new EDSFF form factors, we can see as many as 24 SSDs in a single 1U server platform. And then when you take a look at what Solidigm is doing with these massive drives that hold up to 61TB today, and we know that they're going to be going even higher next year, being able to protect those and being able to protect them with RAID in a timely fashion becomes paramount. [You have] a huge amount of data when you stuff 24 drives into a server like that, and you have to make sure that the data is protected, especially in these large machine learning workflows and large databases, et cetera. We identified a problem in the market where traditional hardware RAID solutions simply cannot keep up with the performance of these drives. And software RAID, which is the alternative to a hardware RAID solution, also has problems with scalability because it consumes a very large percentage of your CPU to handle the data protection tasks. So what we came up with was an idea to write our own software RAID stack that we can deploy on an NVIDIA GPU. So we are NVIDIA Inception Partners, and we have written our software to be a CUDA-based driver that runs on an NVIDIA card. And this does two things. It takes all of the RAID processing off of the CPU. And the second thing we did is patent a creative approach to peer-to-peer DMA [Direct Memory Access] technology that allows us to communicate with the drives and get out of the data path so that the data can flow back and forth to the applications directly from the drives, thus delivering maximum performance. So that's in a nutshell what we're doing.
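
[Editor's note: To put the 24-drive figure in perspective, here is a rough capacity sketch in Python. The 24-drive count and the 61TB drive size come from the conversation above; the RAID levels and the helper name usable_tb are generic textbook illustrations chosen for this note, not a statement of how SupremeRAID is configured.]

```python
# Rough usable-capacity arithmetic for a single RAID group.
# Assumption: textbook layouts (RAID 0/5/6/10), equal-sized drives,
# no hot spares, and no filesystem or metadata overhead.

def usable_tb(drives: int, drive_tb: float, level: str) -> float:
    """Return usable capacity in TB for one RAID group."""
    if level == "raid10":
        # Mirrored pairs: half of the raw capacity is usable.
        return drives / 2 * drive_tb
    # Number of drives' worth of capacity given up to parity.
    parity_drives = {"raid0": 0, "raid5": 1, "raid6": 2}
    return (drives - parity_drives[level]) * drive_tb


if __name__ == "__main__":
    for level in ("raid0", "raid5", "raid6", "raid10"):
        print(level, usable_tb(24, 61, level), "TB usable")
```

Under these assumptions, a full 24-drive server holds 1,464TB raw, and even a double-parity layout leaves well over a petabyte that has to be protected and, after a drive failure, rebuilt.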

Jeniece: Super interesting stuff, Kelley, and thank you for the background just on SupremeRAID, how it's utilizing the high-density storage, how it's utilizing the GPU in an innovative manner. Can you share with us a little bit around what type of end customers might be using this type of technology? And I won't put you on the spot. You don't have to name names.

Kelley: The customers that really need this kind of technology are the ones that are buying very dense servers with a large amount of storage. And they've been struggling to get the performance that was promised to them from, let's say, 20 SSDs. And so what they're looking for is really low latency, high response. So we see FinTech environments where it's high-frequency trading, high-performance databases like Redis and Oracle and Couchbase and things like that. We're also involved in large super-compute environments and we partner with parallel file system companies like BeeGFS to be able to provide very high-performance data protection in the storage node that can then be expanded across a number of nodes into a larger file set into the petabytes of capacity. Also, things like machine learning. We have customers who are using us for really high-performance data collection, things like Splunk servers at a major credit card company. And also in the military we're involved in classified data acquisition technologies that need really high performance in a small footprint because they're in a mobile platform and so weight becomes a critical issue in something like an aircraft.

Allyson: Kelley, you just touched upon so many interesting use cases and innovative use cases, but I have to ask you about the elephant in the room, which is AI. And obviously, AI is placing a tremendous stress on the data pipeline and serving up data at every stage between training and inference. How do you see RAID solutions being tapped to fuel these workloads?

Kelley: The first part of that is the data sets that we're seeing for these models are very large. So they span multiple disks, obviously, and you need to make sure that you protect that data, so that if a drive fails, you can replace that drive and not lose your data. And [you’ll] be able to rebuild quickly because you want to make sure that in a rebuild situation, if there's a problem, you don't go into a degraded state on the host. The second part of it is, how do we deliver the maximum read performance for these types of environments? Really fast I/O can reduce the time spent reading input data, and model training times can be improved when you do this. So, I like to say that AI and machine learning isn't an event, it's actually a workflow. And really high-speed I/O can smooth out these workflows. And the most important part of this, I think, is making sure that the extremely expensive GPU assets and accelerators that these customers and companies are buying can get full utilization. If they're sitting there waiting for data, then you're not getting your money's worth out of these very expensive assets that you purchased. And so delivering data in a very, very timely way is how we help in these workflows.
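
[Editor's note: For readers new to RAID, the idea that a failed drive can be replaced without losing data can be sketched with single-parity XOR, the textbook mechanism behind RAID 5. This is a minimal Python illustration with hypothetical function names and toy 4-byte blocks; it is not Graid's implementation, which the conversation describes only as a CUDA-based driver running on an NVIDIA GPU.]

```python
from functools import reduce

# Textbook single-parity (RAID 5 style) sketch. Assumption: all blocks
# in a stripe are the same length; names and data are illustrative only.

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread across three data drives, plus one parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Simulate losing the second drive, then rebuild its block.
recovered = rebuild([stripe[0], stripe[2]], parity)
assert recovered == stripe[1]
```

A rebuild has to repeat this for every stripe on the replaced drive, which is why rebuild time grows with drive capacity and why the speed of the parity computation matters.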

Jeniece: So delivering that data in a timely way, I agree, is key. But tell us a little bit more, Kelley, about how your customers or new prospective customers should look at your technology. How do they deploy it? How do they look at it differently? Why should they use this Graid product?

Kelley: We actually very much simplify the way you can deploy storage in a server. Instead of having a physical card that has all the drives directly attached to it, which creates a cabling mess, we actually work with servers that have the drives directly attached to the motherboard. And when we put the NVIDIA card into that server and load our driver on it, it now looks like a RAID controller to the host operating system instead of a graphics card. We can then communicate with those drives across that PCIe root complex, if you will, and act like a traffic cop. And so the way it gets deployed gives you a huge amount of flexibility in the data protection levels you want and how many virtual disks you can create. We support things like NVMe over Fabrics [NVMe-oF] to provide access to data outside of the server. And the customers would then present those virtual disks to their applications instead of the physical disks. And that's how we create that huge amount of performance to feed those expensive GPUs.

Allyson: Now, I know that in all of this, you also need some fantastic storage media. And I know that you work with Solidigm. Can you talk a little bit about the collaboration between Solidigm and Graid and how that plays into the solutions that you've been talking about today?

Kelley: Absolutely. The Solidigm relationship that we have has been very strong. We've done extensive testing in their engineering labs, as well as ours. And we actually had an opportunity to put together a solution with Solidigm for the large NAB show [National Association of Broadcasters show]. So in the media and entertainment industry, we partnered together to create an environment that involved several other companies whereby we could have removable disk cartridges and a server full of the Solidigm 61TB drives, which are ideal for recording large amounts of digital video. The removable cartridges give you the ability to move those large files to another location for post-processing and computer graphics and things like that. And then we had an additional partner that provided a really high-speed file access layer called Tuxera. And between the four of us, we created a pretty interesting solution. The server manufacturer was called CheetahRAID.

[Read more about CheetahRAID and Solidigm here: Empowering AI at the Edge]

And we actually just won a Best in Show award at the FMS show two weeks ago in Santa Clara. So we were very, very pleased with that solution and are now working with a number of different media and entertainment organizations who are interested in this solution that we built.

Allyson: That's so cool.

Jeniece: Yeah, it is really cool. It was shocking to get that award alongside the great team and really appreciate all the interesting work you guys did there, Kelley. And speaking of that specific solution with media and entertainment, as you're going out and talking with other customers, what are you seeing in terms of changes in storage requirements? And how does that shape where you're taking your solutions into the future?

Kelley: So it's the combination of capacity and performance. And sometimes customers need one or the other, or sometimes they need both. These Solidigm drives are capable of delivering the performance with a huge amount of capacity. And so some existing opportunities that I'm involved in where these drives are being heavily considered are for MapR servers, where over time they're going to be collecting a large amount of data, and they need to have access to that for analytics. It has to be very fast, but they need that deep storage and have it really be available whenever they need it. And so these drives are capable of delivering that. The other side of it is where we see companies who want really high I/O per second. And in that kind of an environment, you might go with a larger number of smaller drives because it's not a capacity play, it's a straight-out performance play. But if you need significant capacity with maximum performance, we have found that these Solidigm QLC drives deliver.

Allyson: Now, Kelley, when you take a look out in the future, obviously we're heading into the end of 2024. We're heading into the 2025 timeframe. What do you see on the horizon for innovation? And is there anything that's queued up that Graid plans to take advantage of with your ecosystem?

Kelley: I think the biggest thing that we're starting to see is samples of PCIe Gen 6. So, Gen 4 is pretty prominent. Gen 5 is really coming out. We're starting to see Gen 5 drives. They're extremely fast, which exposes bottlenecks of traditional RAID technologies even more. The faster they get, the worse the performance is for customers who try to deploy them with these old technologies. So, that really plays into our hands. We're also starting to work with some companies who want to use our software embedded in systems that you might not see. So, backup appliances and other things where we're just a software layer providing that high-speed performance inside of a solution like that. And with the PCIe Gen 6, the performance is only going to continue to increase. And because of the new form factors like E3 short, E3 long, E1, etc., or the EDSFF form factors, more and more servers are going to come out with larger and larger numbers of drives. And so, part of our solution is increasing our ability to support that number of drives and still deliver the performance that our customers are expecting.

Jeniece: Yeah, I mean, software RAID is where it's at, Kelley, and Graid is delivering in such a big way. And we thank you so much for just touching on all of our questions here. But I know others might have more. So where can folks go to find more information about the solutions that we discussed today and engage with you and your team?

Kelley: Yep, so we have a website, of course, graidtech.com. Once again, we are Graid Technology, Inc. Our product is called SupremeRAID. You can also find us on LinkedIn and Twitter or X. And then obviously you can reach out to us and we can have a salesperson or technical support type customer engineer communicate with you to answer any questions.

Allyson: That's so awesome. Thank you so much, Kelley, for being on the show today. It was a lovely interview and I learned a ton about Graid and about where we are going with the data pipeline. So thank you. And Jeniece, yet another interview is in the books for us. It's always a pleasure to do these data insights podcasts.

Jeniece: Same here. Thank you so much, Allyson.

Narrator: Thanks for joining The TechArena. Subscribe and engage at our website, thetecharena.net. All content is copyrighted by The TechArena.

Used with permission.