Riding the AI Data Pipeline With VAST Data

TechArena Podcast hosted by Allyson Klein and Jeniece Wnorowski

TechArena host Allyson Klein is joined by Jeniece Wnorowski of Solidigm as they continue to explore the rapid data innovation fueling today's computing. In today's episode, they chat with VAST Data's Global VP of Systems Engineering, Subramanian Kartik, as he describes how his team has delivered a breakthrough data platform for the AI era.

Learn more about Solidigm storage for AI solutions here. 

Audio Transcript

This transcript has been edited for clarity and conciseness.

Narrator: Welcome to The TechArena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena. 

Allyson Klein: Welcome to The TechArena, my name's Allyson Klein, and this is a Data Insights podcast. And that means I have my co-host with me, Jeniece Wnorowski. Welcome to the program, Jeniece, how are you doing? 

Jeniece Wnorowski: Well, thank you, Allyson, I'm doing great. I'm so excited to be here today. 

Allyson: Jeniece, you have been traveling all over the world, and I know that you've been talking to a lot of folks about data and the data pipeline, and we are in for a fantastic episode today. Tell us who is coming on the program to talk to us. 

Jeniece: Yeah, I have been traveling a lot, mainly to events, and I've just been blown away by the work that our special guest has been doing with his team at VAST Data. So joining us today is Subramanian Kartik, who is the Global Vice President of Systems Engineering for VAST. Welcome to the show, Kartik. 

Subramanian Kartik: Well, thank you, Jeniece. Much appreciated, and fantastic to be with you on another podcast over here. This is great. I remember the last recording we did, and I thoroughly enjoyed it, so I'm looking forward to our conversation today.

Jeniece: Excellent. 

Allyson: So Kartik, I am so excited to talk to you about what you've been doing with the VAST team, but why don't we just get started? VAST has been on the program before, but you've been getting incredible traction in the market for driving the AI data pipeline to new scale and performance. Can you give us a sense of what is shaping this market and how is VAST making progress? 

Kartik: Yeah, so this market is explosive, as you know. Since the introduction of ChatGPT, there has been this, shall I say, irrational exuberance in the market around anything connected to generative AI. And yes, it clearly has enormous promise, but it's still in its first generation, in my opinion. A lot of the activity we are seeing is partly in the enterprise and partly in the cloud as well. There's a new breed of cloud service providers who specialize in running these kinds of workloads for generative AI, which usually people think of as large language models, though there are other kinds of generative AI workloads as well. The large language models require a large number of GPUs with very sophisticated networking and storage connected to them. There's a shortage of these things, even among the hyperscalers, so this market has exploded. We are heavy participants in that explosion. We are essentially feeding the frenzy, throwing gasoline on the fire over there. On the flip side is the enterprise. Enterprises are very tentative right now. They are still trying to define what they want to do, trying to understand what sort of impact this will have on their business. Are they going to make money? Are they going to save money? Or are they going to stay out of jail? Hopefully a combination of all three. And while those steps are tentative, we think this wave has just begun; we've only scratched the surface. In the next year or two, the expansion of the explosion is on the way. 

Jeniece: Got it. So with that expansion and explosion, can you tell us, Kartik, a little bit about how VAST does this differently from other storage models? 

Kartik: Sure. For starters, we are not a storage company, as you guys know; we're a data platform company. Of course, we're known for our highly scalable, highly performant, online-at-all-times, strongly secure platform, which we really love, and we expose data through file and object protocols. But we've gone quite a bit beyond that and asked, why should data only be looked at through one set of lenses? Why not structured data as a table? So we've introduced database functions within our system. This allows data lake technologies, typically things like Trino and Spark, to work natively off of us. We're continuing to move forward and introducing concepts like a global namespace, so any data is visible anywhere to any GPU farm. In fact, I just got off a call with a customer, showing them how to take a large, long-running training job and move it from one data center to another within minutes, without losing any data. That is a stunt which is very, very difficult to pull off. But probably the most important thing is that we look at data as a pipeline rather than something static. It's a constant stream of data, and there are different modalities for analyzing that data. Some of them are with GPUs, and some of them are with more traditional technologies like CPUs. We are the only ones who can do the entire pipeline, core to cloud to edge, across all the different types of data. This is what's been at the heart of our success in this market. We are now proudly the standard for a large number of the largest cloud service providers in the tier-2 space, for sure, and we are making a run at the hyperscalers as well.

Allyson: Now, Kartik, there have been a lot of parallels drawn between traditional high-performance computing platforms and AI training clusters. While you were talking, I could piece together parts of the answer to this question, but for the audience, how do these systems work similarly, and how do you see the differences between what enterprises and large cloud players are doing with their AI training and what traditional HPC technical computing clusters do? 

Kartik: So, both share similarities; both are forms of high-performance computing, obviously. More traditional HPC environments still rely on a large number of distributed compute elements, thousands of nodes, ten thousand nodes, or even more, which cooperatively work to solve certain types of problems. That market has been around for over 20 years and is pretty mature. The primary workloads there were what we often call HPC simulations: large amounts of data are ingested and crunched cooperatively between many, many CPUs, and that produces a useful output. Oil and gas, energy, computational fluid dynamics, all of these are very common workloads there. Those codes were optimized mainly for large data sets; large-block sequential reads and sequential writes are what really dominate that space. The other form of accelerated computing, the era we're in right now, uses coprocessors such as GPUs and FPGAs and other things like that. There, the workloads are very, very different. They're very read-intensive, yes, but random-read-intensive. So solid state is a necessary component of the media that needs to underlie that, because other media technologies are not able to provide the IO performance that's needed for stuff like this. The scaling and availability characteristics are also somewhat different. These systems have to be highly shared and very highly available; they cannot take an outage for anything. We are, by the way, extremely active in high-performance computing as well; some of the largest HPC clusters in the world are on us. But there, it's usually a relatively homogeneous set of workloads that is running. In AI, these are strongly multi-tenant environments that have to be secure environments, classified environments, etc. So they have a different level of requirements compared to what you would see in the HPC world. In a nutshell, AI needs heavy random-read, heavy IO characteristics, especially for things like checkpointing, and that mandates large-scale all-flash systems. 

Jeniece: So Kartik, with that sophistication, these workloads are pretty complex. Where, in your mind, are customers in terms of sophistication in implementing your systems? 

Kartik: Yeah, as I mentioned, right now the people who are doing the most active work here are the actual model builders themselves, okay? All of us have heard of models like GPT-3 and GPT-4, or Llama from Meta, or Mistral, for example. These require enormous amounts of data as well as training to get built, and these are all people who are on the leading edge of research here. Clearly, there are many private sector players, too, who are doing a lot of work here; large autonomous driving companies and drug discovery companies are doing this as well. Traditional brick-and-mortar enterprise is more tentative. Like I mentioned earlier, they are still identifying use cases. So they tend to start with pre-trained models, which they would get from somewhere like Hugging Face, then expose those to their internal data through things like retrieval-augmented generation, or RAG, and then do inference with something like that. This is going to grow, because the regulatory climate is changing extremely rapidly. The European Union has already passed the AI Act, which mandates that for certain business sectors and certain types of data, you have to preserve data for a long time. You need reproducibility many months after the fact. So you need to know what data went into training and what the outputs are, so companies can refute anything anyone alleges about them. In the US as well, we've all recently seen the new bill proposed by Adam Schiff in Congress, which would require everybody to declare any copyrighted information they may have used for training. This means it's no longer just a GPU game; it's a governance game. And we're going to have to have compliance archives and controls in place to be able to work with this. We think that over time, people will be training their own models, probably not huge ones, but smaller ones. We may see the emergence of more specialized AI models that dominate over highly general models like ChatGPT as things go on. So despite the fact that it's tentative, we're seeing spending and interest pick up quite a lot in the enterprise. The cloud guys, of course, are just going berserk at this point. They're buying GPUs like they're going out of style, literally tens to hundreds of thousands at a time. 

Allyson: Now, I've been following VAST on The TechArena for the last couple of years; in fact, you guys were one of my first guests on this platform. You've made some really exciting announcements lately around collaborations with NVIDIA and Supermicro. Can you help unpack those and talk a little bit about how these new collaborations with industry leaders in AI are helping deliver new capability to your customers? 

Kartik: Absolutely. So I've had the privilege of working with NVIDIA now for over four years. All the initial testing we did with GPUDirect Storage and high-performance RDMA networks, those were things I was deeply involved in all the way through. One of the interesting things about VAST that a lot of people don't realize is that even though we do storage and we're a full data platform company, we're not a hardware company; we're completely a software company. So the hardware stack under us can be very varied. And we've been fortunate to partner with Solidigm for so long; you guys are anchor suppliers for us for the dense NAND that we need to make our systems affordable and high-performing. There are other form factors as well which we are exploring. So the Supermicro partnership that we announced at GTC is one of those things we believe is super important. Prior to this, the shelves that actually held our dense NAND, which were made by our contract manufacturers, tended to be somewhat specialized and bespoke, in the sense that even though they were built out of widely available industry components, they required special assembly and care. With Supermicro, what we asked was: can we use a totally generic, off-the-shelf, industry-standard server instead as the foundation of VAST? And this is really what we did over the last few months. So we take a server with 12 drive slots, add some storage-class memory and some dense NAND, and voila, now you've got a building block for VAST. We did another thing which is very interesting. As you know, our architecture is very containerized: both our front-end nodes, which handle protocols, and our back-end nodes, which handle the media, are essentially Docker containers. So we decided to co-locate them on the same server that holds the storage. We essentially eliminated a whole layer of server architecture in this mix, and that gives us a very highly hyperconverged setup with extremely good scale properties. We think this is a fantastic offer for people in the cloud space. It's built for scale, it's built for high performance, it's built for ease. Probably most importantly, it's also built for a small form factor and low power, which are increasingly critical in this space.

Jeniece: Got it. And then, Kartik, you mentioned your work with Solidigm, having worked with Solidigm for a while now. Obviously, a big portion of the foundation of your architecture is the data and the media. Can you tell us a little bit more about what type of drives you're using from Solidigm and how those help you?

Kartik: Yeah, we go to Solidigm because you guys make solid, I guess, dense NAND systems. We started out with U.2 form factor QLC technology, which you had introduced, because one of the key design elements in our platform was the goal to forever kill disk drives and go to completely solid state media. But we knew that even the most whiz-bang technology would not be worth it if it cost three or four times as much, so we had to normalize the cost curve. Going with dense NAND was a major step forward for us. What really changed the game was that we figured out, along with Intel and you guys, how to create a flash translation layer that would allow us to extend the endurance of these drives well beyond what you would normally expect. We were able to extend the endurance to beyond 10 years. That suddenly catapults dense NAND into the arena of being viable for enterprise workloads. This was a huge, huge move for us. It let us bend the cost curve significantly, along with some of the other software features we have, such as large-scale data reduction. The combination of the two now makes us not just fast, scalable, and online in operations and performance, but also affordable, which is a key element of what we do here. We have continued that partnership, as you know, Jeniece. We moved on from U.2; now we're using the ruler form factors, and we're eagerly awaiting other things that we're going to be doing together. There's lots and lots of demand for even more density and larger drives. We started with 15 terabyte drives, and now we are about to introduce 60 terabyte drives. They're in heavy demand, though. I've got to tell you, everybody's buying them up like they're going out of style, which is good for you guys. So this is excellent, and that's what we're working towards.

Jeniece: Awesome. Yeah, we really appreciate the collaboration. I can speak on behalf of our team over the years, and like you said, we are also really excited for the future. We started out talking about how I've been traveling all over, and I'm seeing VAST everywhere. Your booths are always packed with people interested in your technology. But can you tell us, for this audience here, where can folks go to learn more about your solutions? 

Kartik: Fantastic. The first place to start is our website, www.vastdata.com. You will find a lot of very interesting material there on the industry sectors and solutions we serve, ranging all the way from high performance computing to life sciences to media and entertainment, and of course the ubiquitous AI, which is almost a horizontal. There are also some solutions which people are often surprised to find us in, like the backup and recovery space, where we act as an all-flash target for backup systems. One might ask why, but that's because our restore speeds are blindingly fast, and in this day of ransomware, full-environment recovery seems to be as much a concern as single-file or single-directory recovery. You can also learn about us from a data platform perspective. What's the buzz all about when we say we can expose data as a table? What kind of problems can we solve with that? And how do we plug into and refactor Hadoop environments or other kinds of data lake environments like Spark, Impala, and Hive, or the tools that are used over there? All of that is covered. For a deeper architectural understanding of what VAST is and how it operates, we have a fantastic white paper, easy to find at vastdata.com/whitepaper. It's a long but easy read, and it will give you a full, detailed exposition of what makes us really good. And do not forget to look up all the customer testimonials. We have marquee customers in every one of these sectors, and many of them have recorded great videos; the associated solution briefs and white papers are all public. The next step, of course, is to contact someone from VAST. If your appetite is whetted, trust me, we're just waiting to engage with you, and we'll be able to provide one-on-one assistance and anything you like: far deeper dives, drill-downs, design workshops, all of that is available. 

Allyson: Well, Kartik, thank you so much for taking time out of your day to talk with Jeniece and me and share your vision for the data pipeline. It was so cool. I've been following VAST and the incredible solutions that you've been delivering to market, so it's a real pleasure having you on the program. Thanks for being here. 

Kartik: As always.

Allyson: I'll catch you next time. 

Narrator: Thanks for joining The TechArena. Subscribe and engage at our website, thetecharena.net.

Copyright 2024 The TechArena. Used with permission.