Deploying AI Data Infrastructure in the Data Center With Ariel Pisetzky of Taboola

Utilizing Tech Podcast hosted by Stephen Foskett and Jeniece Wnorowski

Listen as we discuss on-prem versus cloud deployment of AI data applications at content discovery platform Taboola. This deep dive into practical deployment considers networking, hardware, and storage to get the most out of an investment in compute. Learn how Taboola has scaled its on-prem infrastructure to take advantage of SSD technology with Solidigm.

Audio Transcript

This transcript has been edited for clarity and conciseness.

Stephen Foskett: As practical applications of AI are rolled out, they are increasingly being deployed on-premises at scale. We are wrapping up this season of Utilizing Tech with Solidigm, focused on AI Data Infrastructure, by discussing practical deployment considerations with Ariel Pisetzky of Taboola. Listen in and get some practical tips on how to deploy AI infrastructure on-prem. Welcome to Utilizing Tech, the podcast about emerging technology from Tech Field Day, part of the Futurum Group. This season is presented by Solidigm and focuses on the question of AI data infrastructure. I'm your host, Stephen Foskett, organizer of the Tech Field Day event series. Joining me from Solidigm today is my co-host, Jeniece Wnorowski. Welcome to the show, Jeniece.

Jeniece Wnorowski: Hi, Stephen. Thanks for having me back.

Stephen: This is the end of this season of Utilizing Tech. I thought that it would be sensible, and I think you thought the same thing, to end with a practical application, end with somebody who is really building AI data infrastructure out there in the real world.

Jeniece: Yeah, I couldn't agree more. We've talked to lots of different ISVs, hardware vendors and the like, but it's really exciting to talk to an organization that hasn't just been looking at AI over the past couple of years, but really been looking at it for the past 10 years. And how are they doing things differently now? I'm excited to talk about [and] understand how they're looking at on-prem versus off-prem cloud solutions. And we have an actual end customer, so I’m excited about this.

Stephen: Cool. Well, let's waste no time and bring him in. So Ariel Pisetzky, VP of Information Technology and Cyber at Taboola, is our guest today. Ariel, welcome to the show. Tell us a little bit about yourself.

Ariel Pisetzky: Thank you very much for hosting me. Well, I've been in IT for many years now. I think my bragging rights go back to AS1680, which is one of the first 2,000 ISPs on the worldwide web, as we called it back then. We even said security, not cyber, when I started out. And now, many years later, with physical data centers, cloud, and so many things that one could not have imagined back in the day, I'm having a lot of fun providing services to many, many readers out there.

Jeniece: Yeah. So Ariel, let's just dive right in on this. So you're obviously a content delivery network provider, and we want to talk about AI. Can you just give us a little bit of insight about how you are utilizing AI today in your workloads and applications?

Ariel: Sure thing. So I'll give a few words about Taboola, because Taboola is not a brand name. We're not a consumer-facing product, so many people might use our products on a daily basis but not be aware of them. We are a content discovery platform, and that means that we reside on many of the publishers that you read on a daily basis and love and receive content from. And we are that place where advertisers go to turn users into paying customers. So we actually provide that matching service between advertisers and publishers and content. Now, bringing all that together, we eventually serve over 4 billion web pages a day, and that means that we recommend over 40 billion different articles and pieces of content for you to consume at any given moment. Now, we do this with high levels of personalization. What does that mean? It means that when you are on any given website and you are browsing, reading, immersed in the content, we will provide those content recommendations for the next thing that you can read: the next article, the next content that you may like. We bring that content to you without actually knowing who you are, without you logging in to our service, without you providing us any specifics about yourself: not your name, not your age, nothing about your gender. That is the essence of using AI, specifically training and inferencing at large scale and in real time. Over the past year, even a bit more than a year, with the emergence of generative AI, we have also been taking advantage of LLMs [Large Language Models] and different generative AI technologies to provide additional tools for editors and advertisers to curate the article name, some of the article itself, and the imagery that you might get on any given day within your beloved websites.

Stephen: It is interesting that when people think of AI, they think of gen AI, they think of everything that has happened in the last 12 months basically, but that is not the whole picture of AI. In fact, companies have been deploying systems based on artificial intelligence concepts for a long, long time. I am a particular proponent, for example, of expert systems, which have existed literally since I was born. The whole world of leveraging data to build productive applications, that is the story of the 21st century really. Everything that you have talked about in terms of what Taboola does, in terms of how it impacts its customers, how those of us out there in the real world interact with it, is basically the story of IT in modern times. I am glad to hear you say and recognize and claim, yeah, this is an AI application as well. Generative AI is a new direction, but it is not the only direction. Yet, it is an interesting way to go. So, I guess talk to us a little bit more about the ways that data feeds your businesses, because I think that your business is all about data.

Ariel: That's so true. So, data is actually the basis for training. What you get when you serve 4 billion webpages a day is massive amounts of data that are written somewhere, in our case to physical servers and physical drives, Solidigm drives that we utilize on-prem. We'll talk a bit more about that and the additional tech at the low level in a moment. Keeping at the AI level, if you don't do anything with this data, if you don't try to infer any knowledge from it or create knowledge from it, it's just inert data. It's just inert bytes, or bits even, that sit there with no value. So, the place that IT comes to create value for the business is taking data, even vast amounts of data, and turning it into business-relevant knowledge. So, creating that understanding of, "Oh wait, we've been providing recommendations for 15 years now." Somewhere seven, eight years ago, we said, "Wait, there is a better way to provide content recommendation by trying to understand what is actually interesting for people." And we said, "Oh, we're sitting on all this data. How do we train? How do we look? How do we put this data into different boxes?" So, that's the beginning of the training: what are the signals that we are getting from this data, and how can they affect our service moving forward? Then, of course, there is natural language understanding and natural language processing [NLP], which is a big part of AI. Today with LLMs, it's a totally new field. But even seven years ago, and from then until last year when LLMs came into our lives, that was a hard problem to solve. You needed to understand, as a service provider for publishers: how do we recognize the article that we reside on? How do we recognize the user coming into that specific article? And what is the relevant article where that user's browsing arc is going to end? That has been done with AI on different levels, and today more and more with LLMs. All of that happens on-prem, because data has a certain level of gravity to it: when you own the data and can process it, you have a lot of advantages over putting it somewhere in the cloud, where it might be very expensive to even run operations on it. Remember that when you run your own drives, you don't pay per 10,000 operations and you don't pay for deletes. You pay maybe in performance, there is a payment there, but it's not hard cash. Once you bought your drive, that is it, and the use of that drive over time is a bonus. Thinking about accounting, sorry to bring the pencil pushers into the room, but if you use the drive for over three years, it's not even on the books anymore and it's free. So free compute, who can't love that?
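
To make Ariel's accounting point concrete, here is a back-of-the-envelope sketch in Python. Every price and volume in it is a hypothetical placeholder rather than Taboola's or any cloud's actual rate; the point is only the shape of the comparison between per-operation cloud pricing and an owned, depreciating drive.

```python
# Back-of-the-envelope comparison: cloud per-operation pricing vs. an owned drive.
# Every number below is a hypothetical placeholder, not a real quote.

CLOUD_PRICE_PER_10K_OPS = 0.005   # dollars per 10,000 I/O operations (assumed)
OPS_PER_DAY = 2_000_000_000       # daily I/O operations for the workload (assumed)
DRIVE_COST = 1_200.00             # one-time cost of an enterprise NVMe SSD (assumed)
DEPRECIATION_YEARS = 3            # typical straight-line depreciation window

daily_cloud_cost = OPS_PER_DAY / 10_000 * CLOUD_PRICE_PER_10K_OPS
yearly_cloud_cost = daily_cloud_cost * 365

print(f"Cloud I/O cost:  ${yearly_cloud_cost:,.0f}/year, forever")
print(f"Owned drive:     ${DRIVE_COST / DEPRECIATION_YEARS:,.0f}/year for "
      f"{DEPRECIATION_YEARS} years, then $0 on the books")
```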

Jeniece: Who can't love that? And you talked a little bit earlier, Ariel, when we first started discussing this on-prem versus off-prem. And there's a lot of organizations out there who are having the same kind of debate. So can you tell us a little bit more about how you are working with AI on-prem and how that might be different from what others might be doing in this space?

Ariel: Yes, yes. Well, the story there is actually fun. I came to Taboola years ago to cloudify our operations. And somewhere in that cloudification story, we started noticing costs, how we as IT impact those costs, how we as IT can do better for the business, and what on-prem actually means. When you optimize and look at the use of data, the use of CPUs, and the merging of those two, and over the past few years the merging of CPUs, GPUs, and data, you suddenly find that working on-prem has extreme advantages. So when I talk about optimization, I'm talking about multiple layers of optimization. First, understanding that when you're looking at a data center, you're looking at compute, you're looking at data or storage, and you're looking at networking. Data itself can be data at rest, or it can be data within a database service. Now, at any one of those levels, you have multiple layers of optimization: using CPUs at a higher capacity, making sure that all of your GPUs and CPUs are fully utilized, understanding what layers of optimization exist there. One great story is that we looked at our CPUs and we found that we can better utilize them by updating our code in various ways, with different libraries coming from Intel, or from NVIDIA for the GPUs of course, or from other vendors. Then, when we better utilize our CPUs, we of course have the drives themselves. The drives today, the NVMe SSDs, provide really so much performance. And when you understand the geometry of the drive, you start to think: I'm not only buying space, but I'm buying specific types of space, space with different drive geometries that fits in different places. If you are able to optimize that as well for your read size, you suddenly get this boost of performance where you can do so much more with your on-prem hardware, your on-prem investment in CapEx [capital expenditure]. So we really love to see how, year over year, we optimize our use cases for storage, for CPUs, and for network, and bring them together to a place where our developers now have so much raw power at their fingertips that they just do not want to go to the cloud for many of the day-to-day operations. In the age of GPUs, where you need to push a whole lot more data into the GPUs, controlling your storage layer and your network layer also provides you with multiple opportunities for optimization. One great example for anyone out there listening: think of data compression. If you compress, the CPUs will work harder, the network will work lighter, and you'll have more storage space on your drives. The flip side: if your CPUs are the most expensive component in your data center, why spend CPU cycles on something that isn't business oriented? Why compress? Get the right type of storage that you need, get enough bandwidth. The bandwidth in the data center is literally free. When you have that level of free bandwidth, you can really utilize your drives and push a whole lot more data into the GPUs.
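
The compress-or-not tradeoff Ariel describes can be framed as a simple cost comparison. Here is a minimal sketch, with hypothetical cost constants standing in for real datacenter economics: it measures the CPU time zlib spends compressing a sample payload and weighs it against the network bytes saved.

```python
# Sketch of the compress-or-not tradeoff: burn CPU cycles to save network
# bytes, or spend nearly free in-datacenter bandwidth to spare the CPUs.
# The cost constants are hypothetical; plug in your own datacenter economics.
import time
import zlib

payload = b"some repetitive log-like data " * 100_000  # ~3 MB sample (assumed shape)

start = time.perf_counter()
compressed = zlib.compress(payload, level=6)
cpu_seconds = time.perf_counter() - start

bytes_saved = len(payload) - len(compressed)

# Hypothetical unit costs: CPU time is scarce, in-rack bandwidth is nearly free.
COST_PER_CPU_SECOND = 0.0005      # assumed dollars per CPU-second
COST_PER_GB_TRANSFERRED = 0.0001  # assumed dollars per GB inside the datacenter

cpu_cost = cpu_seconds * COST_PER_CPU_SECOND
network_savings = bytes_saved / 1e9 * COST_PER_GB_TRANSFERRED

print(f"Compression ratio: {len(payload) / len(compressed):.1f}x")
print(f"CPU cost ${cpu_cost:.6f} vs. network savings ${network_savings:.6f}")
print("Compress" if network_savings > cpu_cost else "Don't compress: ship it raw")
```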

Stephen: It really is an interesting use case because I think that it pushes back on a lot of the story that people have bought into over the years about cloud. I think that there is certainly a valuable use case for cloud; everybody would agree to that. But I am hearing more and more companies that are looking at deploying cloud technologies on-prem, and now AI technologies on-prem, once applications are rolled out, once they're fleshed out, once they have an understanding of the level of hardware that they are going to need to support those applications. Because as you say, you can buy equipment, you can deploy equipment internally. I think there is another aspect that you mentioned there too: once things are depreciated, you can continue to use them past their expected lifespan. Maybe they are not in production, maybe they are. Are you seeing that there is a much longer lifespan with modern hardware, and specifically modern storage, than you expected?

Ariel: Drives have become so much more reliable over the years. Again, when you work with a good vendor, and Solidigm is a great example of that, a vendor that provides you drives that do not fail beyond the expected MTBF [mean time between failure], you're getting a good bargain: drives that maintain their value beyond their three-year depreciation. So you get this really great deal out of using hardware on-prem. The cloud, of course, is relevant. We in IT have not done a good service to many of our business counterparts over the years. We have provided services too slowly. It took too long to install things. We did not embrace automation on time. The hyperscalers have taught us so much about how we, as the regular Joes of IT, should be providing services within our data centers. So the cloud is great for DR [disaster recovery]. The cloud is great for testing. The cloud is great for production loads that vary and have peaks that are maybe short-lived. But businesses that are always on should probably be always off the cloud. So if you have a workload that is always on, it should be always on-prem. That is where I would take it.

Jeniece: So, Ariel, given that notion and that thought, is there any benefit to efficiency by being always on-prem? And is Taboola doing anything specific to address some of the energy issues that are coming about with AI?

Ariel: Oh, yes. So when you're on-prem, of course, efficiency suddenly becomes, well, your problem. When you're in the cloud, you get the great carbon footprint of the clouds that run on renewables, you get wonderful e-waste management by the clouds, and so on and so forth. So when you are on-prem, you need to control your own destiny there: anything and everything on the e-waste side and anything and everything on the energy side, that is clear. But then when you look at AI specifically, which is a huge energy hog, because these GPUs and CPUs are running way hotter than they would for other activities, you want your storage at least to be as efficient as possible. So having SSDs instead of spinning drives is of course a no-brainer. But then look at the larger capacities out there: they still provide amazing performance in terms of the level of IOPS you can get, their thermal footprint doesn't warm up your data center, and they draw their peak energy only when you are in full write mode. And you have varying power levels that you can manage through the NVMe interface, should you so choose. All of those provide you with the ability to, A, optimize, and B, provide great service, while still keeping a very, very eco-friendly footprint.
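
For a flavor of managing drive power through the NVMe interface, here is a minimal sketch that drives the Linux nvme-cli tool from Python. It assumes nvme-cli is installed, /dev/nvme0 is a hypothetical device path, and the commands run as root; feature ID 0x02 is the standard NVMe Power Management feature. This is a sketch of the mechanism, not Taboola's tooling, and the set of valid power states varies per drive.

```python
# Minimal sketch: query and set an NVMe drive's power state with nvme-cli.
# Assumes nvme-cli is installed, /dev/nvme0 exists, and we run as root.
# Feature ID 0x02 is the standard NVMe Power Management feature.
import subprocess

DEVICE = "/dev/nvme0"  # hypothetical device path

def current_power_state() -> str:
    """Read the current power state via the Power Management feature."""
    result = subprocess.run(
        ["nvme", "get-feature", DEVICE, "-f", "0x02", "-H"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def set_power_state(state: int) -> None:
    """Request a different power state, e.g. a lower one for idle hours."""
    subprocess.run(
        ["nvme", "set-feature", DEVICE, "-f", "0x02", "-v", str(state)],
        check=True,
    )

print(current_power_state())
set_power_state(2)  # drop to a lower-power state; valid states are per-drive
```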

Stephen: One of the things I've heard from infrastructure managers that are considering buying equipment for on-prem is that it makes sense to buy the biggest, baddest hardware because then it will have the longest lifespan. In other words, you buy the bigger drives, not the smaller ones. You buy the faster CPUs, the faster GPUs. Even though they cost more today, they'll have a much longer lifespan and you'll be able to get much more out of it. Do you agree with that approach?

Ariel: No. That's the short answer, but here I'll give you the longer version. Thank you, that's a wonderful question. Thinking of what you need is really thinking about what the business needs. Are there big, bad, amazing servers that we buy? Yes. Do we always buy the biggest, baddest servers out there? Absolutely not. We try to balance a lot of our servers into something that is in, I'd like to call it, Lego block mode, where we buy big servers, but not very big servers. And then these servers have a life cycle. So a server can be driveless at the beginning, and CPU intensive, maybe it even has GPUs in it. Two years later, I now have a better CPU or a better GPU, and in terms of energy and optimization, this server is not a good AI server for me anymore. But lo and behold, it has drive bays. Now I can use it as a storage server, and I can get that CPU that isn't the latest and greatest to do great things around managing all the data in that server. So, Lego blocks. That is also why so many of us love Lego: the versatility, the fact that the lifespan is so long, and that you can use them in different models and different scenarios and build different things with them. That is what makes it so enticing for us to buy servers that are somewhere in the middle, not too big, not too small, and then, for specific use cases, go wild.

Jeniece: Ariel, one of the things I was going to ask was being on-prem and having to deploy across multiple environments, it's pretty taxing to have some of your IT guys go out and have to rip and replace drives, right? And on the notion of buying not just the biggest drives, or the fastest, or even that same case with your servers, how do you feel solid-state technology helps to improve how you do business at the edge or within your on-prem infrastructure? Is there a benefit of solid-state storage versus hard disk drives, in your opinion?

Ariel: Okay, of course, yes. So having solid-state drives at the edge is super important. I'll talk about that in a moment. And then, just to say a word about Taboola: we have our front-end edge data centers and we have a back-end data center where we run all the training. Yet the inferencing that I spoke of, the hyper-personalization of our service, happens at the edge. It cannot happen without data. It happens on data, on SSDs at the edge. So the closer you are to the edge, the faster you will be able to provide service, and the faster you need the drives, the servers, and the service to be. Putting all that in perspective in terms of IT operations, I'll go back to the Lego blocks I spoke of earlier. The idea is that you don't go biggest, baddest, but you go good and exactly what you need, good being the Solidigm 7TB drives as an example, where you're not going all the way to 64TB, 60TB, 30TB, you're going with a nice moderate size, and then you spread them in the servers. Maybe you don't use them at first, because they're Lego blocks and this server is now a CPU server. But once a drive is installed, because the installation cost might be your prohibitive cost, you can change roles for that server down the road, and the drive is already in there. So that is just one example. We also try to optimize by holding drives in our front-end operation center, and then we ship them out when we need them and have remote hands install them. So there are multiple options to be had here, if you have the automation stack to manage it, and that is a big part. We wrote our own automation stack for data center automation, not for compute automation; we use Kubernetes for that. So we have the agility to provide storage where we need it, when we need it. And again, the beautiful thing with Solidigm is the connection to the OS and the tooling that is provided: we can manage the drives remotely through the OS, getting all the serial numbers and asset management information that we need to do this in a responsible way.
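
As a taste of what managing drives remotely through the OS can look like, here is a minimal inventory sketch that reads NVMe identity data from Linux sysfs. The sysfs paths are standard on modern kernels, but the script is illustrative only and not Taboola's actual automation stack.

```python
# Sketch: collect NVMe serial numbers and models from Linux sysfs for asset
# management, the kind of remote, OS-level drive inventory Ariel describes.
# Illustrative only; a real stack would ship this to a CMDB over the network.
from pathlib import Path

def nvme_inventory() -> list[dict]:
    drives = []
    for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
        drives.append({
            "controller": ctrl.name,
            "model": (ctrl / "model").read_text().strip(),
            "serial": (ctrl / "serial").read_text().strip(),
            "firmware": (ctrl / "firmware_rev").read_text().strip(),
        })
    return drives

if __name__ == "__main__":
    for drive in nvme_inventory():
        print(f"{drive['controller']}: {drive['model']} "
              f"S/N {drive['serial']} FW {drive['firmware']}")
```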

Stephen: It's interesting that you bring that up because, now that you mention it, storage is one of the only data center components that's easily upgradeable and replaceable in the field. Certainly, you could take the server apart and swap out the memory, but then what are you going to do with the old memory? Same with the CPUs, same with the network cards. I guess network cards can be replaced, but storage is by far the easiest thing to replace and upgrade, because in many cases you can even hot swap. I don't know if you would want to do that, but you could probably do that in a server. And as you said, that would let you upgrade things on the fly and match the capability to the requirement. There's another aspect, too, that I want to emphasize here because I'm a storage nerd. One of the coolest things about storage is that capacity influences longevity to a great extent. If you have sufficient capacity, especially with flash, that can greatly extend the lifespan of the drive because of wear leveling. Essentially, the drives can only handle so many writes. But if the drive is big, then those writes spread out across a lot more cells, which means that the drive's lifespan can be a lot longer. And I think that we've seen that in production. Certainly I imagine a 60TB drive is going to have a very long lifespan, but even once drives get into the 7TB range or so, that's an awful lot of data in order to, quote, wear that drive out. And most people are never going to hit it quite that hard, because, of course, reads don't affect it. It's only writes. Are you seeing this as well, that bigger drives are more reliable?
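
To put rough numbers behind Stephen's capacity-and-longevity point, here is a small endurance calculation. The DWPD rating and write rate below are assumptions for illustration; real ratings vary by model, so check the vendor's TBW or DWPD specification.

```python
# Rough endurance math behind "bigger drives last longer": rated writes scale
# with capacity, so the same daily write volume wears a big drive more slowly.
# DWPD and workload figures here are hypothetical; check the vendor spec.

DWPD = 1.0                 # assumed rating: one full drive write per day
WARRANTY_YEARS = 5         # assumed warranty window behind the DWPD rating
DAILY_WRITES_TB = 2.0      # assumed workload: 2 TB written per day

for capacity_tb in (7.68, 30.72, 61.44):
    rated_tbw = capacity_tb * DWPD * 365 * WARRANTY_YEARS  # total TB writable
    years_to_wear_out = rated_tbw / (DAILY_WRITES_TB * 365)
    print(f"{capacity_tb:6.2f} TB drive: ~{rated_tbw:,.0f} TBW rated, "
          f"~{years_to_wear_out:.0f} years at {DAILY_WRITES_TB} TB/day")
```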

Ariel: So, amazing, amazing question. I have multiple things to say. I'll start by taking a leaf, non-politically, out of "yes, we can." Hot-swappable? Yes, we can. Yes, we do. It is amazing to see, when you control the drives, you have good reliable drives, and you're buying drives with the level of endurance that you need. As you mentioned, it doesn't have to be crazy levels of endurance, because at multi-terabyte sizes, even at an endurance of just one, not crazy-high endurance levels, you can write a lot of data. And if you have the software and the automation in place, you can monitor the media state and you can monitor where that drive is. We saw over the years that we've been using the drives that we are not wearing them out, and we should not be afraid. We can really use drives for a long time. They are sustainable, they are not prone to failures, and doing good IT also means that you have clustering and you have N-plus architectures. Today, that's almost a no-brainer. You don't have to pay exorbitant costs for fancy licenses or storage providers. You can do your own storage. Software-defined storage [SDS] today has redundancy built in that is beyond good enough, and it provides you with solutions that are cheap, effective, and deliver a ton of performance for the business.
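
Monitoring media state the way Ariel describes can be as simple as polling the NVMe SMART log. A minimal sketch, assuming nvme-cli and root access; percentage_used is the standard NVMe wear indicator, where 100 means the rated endurance has been consumed.

```python
# Sketch: watch drive wear via the standard NVMe SMART log field
# "percentage_used" (100 means the rated endurance is fully consumed).
# Assumes nvme-cli is installed and we can run it as root.
import re
import subprocess

def percentage_used(device: str) -> int:
    """Parse the wear indicator out of `nvme smart-log` text output."""
    out = subprocess.run(
        ["nvme", "smart-log", device],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"percentage_used\s*:\s*(\d+)%", out)
    return int(match.group(1)) if match else -1

wear = percentage_used("/dev/nvme0")  # hypothetical device path
status = "Plan replacement" if wear >= 90 else "Healthy"
print(f"{status}: {wear}% of rated endurance consumed")
```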

Stephen: So let's move up the stack a little bit from storage. So clearly, storage is something that we're all passionate about, but also something that gives a lot of flexibility. But talk a little bit more about the rest of the infrastructure that you're building there on-prem. What do these servers look like? What is the GPU, CPU network and all that?

Ariel: So when you're looking at GPU servers, just as an example, you spoke earlier about these big servers. Some of them are these nice, slim two- or four-GPU servers, which are, I'd say, specialized for data crunching, Spark specifically; we'll talk about them in a moment. And then you have these huge, humongous servers that host eight or ten really large NVIDIA GPUs, and these GPUs are data hungry. So looking at those two types of GPU servers, let's start with the really big ones. When you're looking at a huge GPU server hosting multiple GPUs, it really needs to pull in a lot of data. To pull in all of that data, it's going to come with huge network cards, 100-gig network cards with multiple ports, going into the top-of-rack switch, and it's going to be pulling a whole lot of data coming in from your HDFS [Hadoop Distributed File System] or other storage solutions, Ceph or whatever you're using. Then you need some scratch space locally to cache it, because bringing data off the network will get you to one level of performance, but if you really want to harness those GPUs to their fullest, use all of that capital that you spent on GPUs, and make sure those GPUs are fully utilized 100% of the time, then you would like to see a local cache, fully flash, with a lot of IOPS, that can serve NVMe to NVLink or NVMe to PCIe, depending on what type of NVIDIA hardware you're using, and really feed those GPUs and then bring that data back and feed it out. When we're talking about our data-processing GPUs, as Taboola, we've spoken about this quite vocally; you can find it on our engineering blog. We have switched, in many cases, from using CPUs to GPUs, the smaller GPUs actually, let's say the A30 and A40 family, to crunch data in Spark. So when you're crunching data in large quantities, instead of having 10 CPU servers running separately, each with maybe much less scratch space and doing their own thing, you suddenly have one server doing the job of 10 modern servers: pulling in a whole lot of data from data storage, crunching it, and preparing it for some type of report or some other type of Spark/SQL processing. And that we do on GPUs as well, with only 25-gig connections for those servers, but again with NVMe locally, so that while the GPUs are active, you can pre-read as much as possible from the central storage and then provide space that the GPUs can work with locally to optimize their utilization.
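
The pattern Ariel closes with, pre-reading from central storage onto local NVMe while the GPUs stay busy, is essentially double buffering. Here is a minimal sketch; the fetch and crunch functions are hypothetical stand-ins for the real HDFS reads and Spark-on-GPU work.

```python
# Double-buffering sketch: stage the next chunk from central storage onto
# local NVMe scratch while the GPUs crunch the current one, so the expensive
# GPUs never sit idle waiting on the network. fetch/crunch are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SCRATCH = Path("/nvme-scratch")  # hypothetical local NVMe scratch mount

def fetch_to_scratch(chunk_id: int) -> Path:
    """Stand-in for pulling a chunk from HDFS/Ceph onto local NVMe."""
    local = SCRATCH / f"chunk-{chunk_id}.parquet"
    # real code would copy the chunk from central storage here
    return local

def crunch_on_gpu(local_path: Path) -> None:
    """Stand-in for the Spark-on-GPU processing step."""
    print(f"crunching {local_path}")

def run(chunk_ids: list[int]) -> None:
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        pending = prefetcher.submit(fetch_to_scratch, chunk_ids[0])
        for next_id in chunk_ids[1:]:
            current = pending.result()                          # staged data ready
            pending = prefetcher.submit(fetch_to_scratch, next_id)  # pre-read next
            crunch_on_gpu(current)                              # GPU work overlaps I/O
        crunch_on_gpu(pending.result())                         # last chunk

run(list(range(4)))
```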

Stephen: And that reminds me of one of the discussions we had earlier this season talking about keeping these hungry GPUs fed. That's really the whole ball game because that's the most expensive component out there. And of course, as you mentioned as well, GPUs can be used in other areas of infrastructure. We've talked about that this season. You know, it's interesting here, Ariel, everything you've said, I think, really does kind of summarize what we've talked about all season long here on Utilizing Tech. And it's been so great to hear a practical, real-world use case for all the things we're talking about, especially an on-prem use case, because, you know, I think that increasingly people are going to be looking to that. They're looking to deploying their own AI data infrastructure on-prem. And the lessons that you've brought here, I think, are very valuable to them. So given that, thank you for joining us. Where can people connect with you? Where can they continue this conversation? Where can they learn more about the lessons that you've learned?

Ariel: You can find me on LinkedIn. I'm always happy to continue this conversation. And there's the Taboola Engineering Blog, where we provide additional information about our infrastructure and what we do; it goes much more into the technical details. So if you want to know about running, let's say, Volcano on GPUs and MIG [Multi-Instance GPU] on GPUs, virtualizing our GPUs to really optimize and squeeze more out of them, or if you want to read more about how we're optimizing MySQL and SQL servers in general with storage and CPUs, it's all there, and we are always happy to share.

Stephen: Well, thank you so much for that. And I really appreciate everything you've had to say here today. Thanks for joining us. Jeniece, I'm sad to say that we're wrapping up the season with this episode. Let me get your call to action. Where can people learn more about Solidigm? Where can they continue the conversation of how flash storage can be transformational for their IT infrastructure?

Jeniece: Thank you, Stephen. It's been such a pleasure being able to sponsor this series, and we're excited to continue sponsoring other activities throughout the year, things like AI Field Day and Storage Field Day and many others. You do just a great job tapping into our audience. But to continue this conversation further, anyone can go to solidigm.com/ai, where we have a multitude of stories, solutions, and details around the myriad of drives that we have to offer. So thank you so much for the opportunity.

Stephen: Excellent. And actually, I'll call out that Solidigm presented at our AI Field Day, brought in partners, just a tremendous thing. Use your favorite search engine, look for Field Day, look for Solidigm, and you'll see great presentations and a lot more information about that as well. And of course, we've had many other kinds of companies present at Field Day over the years too, so you can look for those as well. Thank you so much for listening to this episode of Utilizing Tech. You can find this podcast in your favorite podcast application, as well as on YouTube. If you enjoyed this discussion, please do leave us a rating, a nice review, maybe send us some feedback. We'd love to hear from you. This season was brought to you by Solidigm, and of course by Tech Field Day, part of the Futurum Group. For show notes and more episodes, head over to our dedicated website, utilizingtech.com, where you can find the entire season in order. You can also contact us on X (formerly Twitter) and Mastodon at Utilizing Tech. Thanks for listening to this season, and thank you, Ace and Jeniece, for co-hosting as well. We look forward to our AI Data Infrastructure Field Day event, along with our AI Field Day event coming soon. Check out techfieldday.com to learn more about those. And if you'd like to participate, reach out to me, Stephen Foskett. I'd love to hear from you. Thanks for listening, and we will catch you next season with another exciting episode of Utilizing Tech.

Copyright Utilizing Tech. Used with permission.