Hear from Laura Carriere, High Performance Computing Lead for the NASA Center for Climate Simulation, about how NASA models aerosols in the atmosphere. Laura talks with Jeniece about how the center uses modular compute infrastructure to keep its supercomputing resources on pace with scientific requirements on a limited budget. Massive amounts of data, collected by NASA satellites and atmospheric balloons, require analysis and visualization, which makes storage a key aspect of their considerations.
For more interviews with experts from Supercomputing 2024, read and listen to Data Fueled Insight and Discovery Shine at Supercomputing 2024.
Jeniece Wnorowski: Welcome, Laura, it's nice to see you again!
Laura Carriere: Thank you.
Jeniece: We're at Supercomputing 2024, and there's a lot going on, but it's a pleasure to be here with you. So can you tell us a little bit about what you do for NASA?
Laura: Sure. I am the High Performance Computing Lead for the NASA Center for Climate Simulation. It's one of the two supercomputing facilities within NASA, we're located in Greenbelt [Maryland], and we are, to be honest, the smaller of the two facilities. Earth Sciences is our main group, but we also support astrophysics, heliophysics, and planetary sciences. So we do all of that work on our supercomputer.
Jeniece: Amazing. And so all that work on your supercomputer, can you tell us a little bit about what it is exactly that you're studying or really passionate about right now?
Laura: So within the Earth Sciences, the main goal is climate research, and that can mean some climate change work, but it also means things like the GMAO group—the Global Modeling and Assimilation Office—does a lot of aerosol work. So they run a climate model, but they're particularly interested in how aerosols are transported through the atmosphere: air pollution or carbon or nitrates or sulfates. And they do beautiful visualizations, which you can see through NASA's Scientific Visualization Studio, that show off the tremendous work they do for us.
Jeniece: Amazing. So I'm fascinated by these aerosols. You can see how they're penetrating all the way through to, like, the atmosphere?
Laura: Right. Since it's a three-dimensional model, you can see it move across [the world]; you could see dust move from the Sahara Desert across the Atlantic and land in Florida. You can see it spin up in hurricanes. You can see sea salt along the bottom, which usually gets picked up off the ocean around the poles. So they visualize this and they do great work with it. You can also see emissions from power plants, and from fires as well, that then get integrated into our atmosphere.
Jeniece: Wow. Okay, so you're doing all this work you mentioned on a supercomputer. Can you tell us a little bit about it? Is there a special name for the supercomputer?
Laura: So Discover is the name of our supercomputer, and it was designed in 2006. The way it works, actually, is [that] we are power constrained. We only have so much power. And we get money every year, so we build out the supercomputer until we run out of power, and then when we get more money, the oldest part of the supercomputer has to be decommissioned, and then we bring in the new piece. So over all of those years we have continued to add what we call scalable compute units [SCUs]. We're now at the point where we've got a little bit of SCU 14, and SCUs 16, 17, and 18 on the floor right now that our scientists can use. It initially used Intel chips, and we have now moved to AMD. All of Discover is CPUs, with a small pocket of GPUs for testing, learning, and new development. We also have a much larger GPU facility called Prism where, again, a lot of development has gone, and a lot of really interesting science has come out of Prism as well.
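To make the rolling-upgrade pattern Laura describes concrete, here is a toy sketch of building out compute under a fixed power budget and retiring the oldest units when a new one would exceed it. The unit names and power figures below are invented for illustration; this is not NASA's actual capacity-planning tool.

```python
# Toy model of the rolling-upgrade pattern: add scalable compute units (SCUs)
# until the facility's power budget is exhausted; when a new unit arrives,
# decommission the oldest ones to free power. All numbers are made up.
from collections import deque

POWER_BUDGET_KW = 2000
floor = deque()  # SCUs in service, oldest on the left: (name, power_kw)

def used_kw() -> int:
    return sum(kw for _, kw in floor)

def install(name: str, power_kw: int) -> None:
    """Retire oldest SCUs until the new unit fits, then bring it in."""
    while used_kw() + power_kw > POWER_BUDGET_KW:
        retired_name, retired_kw = floor.popleft()
        print(f"decommissioning {retired_name} ({retired_kw} kW)")
    floor.append((name, power_kw))
    print(f"installed {name}; {used_kw()}/{POWER_BUDGET_KW} kW in use")

for scu, kw in [("SCU14", 600), ("SCU16", 700), ("SCU17", 700), ("SCU18", 800)]:
    install(scu, kw)
```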
Jeniece: Amazing. So what about the storage? Do you know much about the storage layer?
Laura: We have storage for all of our systems. So we have the traditional HPC, which is Discover, and it has 60-ish petabytes of storage. We have centralized storage, which is intended for curated NASA data products. So these are things like MODIS data, Landsat data, MERRA-2 data; things that are well understood as a NASA data product. Then that data is also available through our on-premises cloud environment, which includes Prism. So all of that data can be accessed through all of those systems. We like to think, and we hope we're right, that it reduces the amount of duplication of the data on the systems, which allows for more room for new science to be done.
Jeniece: Yes, and more and more science as you're getting tons and tons of data coming in from all these places to, you know, look at climate change. How are you able to, you know, deal with that data once it's coming in? What are you looking at on a daily basis? I was just talking to Bill [Thigpen, Assistant Division Chief for High End Computing, NASA] and he said they archive everything, so are you doing the same thing?
Laura: We do not archive as much; we actually use Ames, Bill's facility, to do some archiving. But what we are trying to do is keep this data on spinning disk [HDD] so that scientists can do science on it. We're not funded as an archive, so that is really just a drag on our resources that prevents us from doing more supercomputing. But we do have somebody who is really good at his job who does data management plans. So we work with our users to get them to tell us what their input data is, their intermediate data, which will one day be deleted, and their final data products. How big is it? How fast does it grow? And that allows us to plan our storage requirements and get the right storage in place at the right time.
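As a rough sketch of the planning exercise Laura describes, a data management plan might record input, intermediate, and final data volumes plus a growth rate, and project capacity from those. All field names and figures here are hypothetical, not NASA's actual plan format.

```python
# Hypothetical data management plan record; numbers are invented.
from dataclasses import dataclass

@dataclass
class DataManagementPlan:
    project: str
    input_tb: float          # data pulled onto the system
    intermediate_tb: float   # scratch output, deleted after the campaign
    final_tb: float          # products kept on disk for ongoing science
    growth_tb_per_year: float

    def projected_tb(self, years: float) -> float:
        """Steady-state storage to provision, assuming intermediates are cleaned up."""
        return self.input_tb + self.final_tb + self.growth_tb_per_year * years

plans = [
    DataManagementPlan("aerosol-reanalysis", 120.0, 300.0, 80.0, 25.0),
    DataManagementPlan("exoplanet-lightcurves", 15.0, 60.0, 10.0, 5.0),
]
print(f"3-year steady-state need: {sum(p.projected_tb(3) for p in plans):.0f} TB")
print(f"peak with intermediates: {sum(p.projected_tb(3) + p.intermediate_tb for p in plans):.0f} TB")
```

Separating intermediate data (deleted one day) from final products is what lets peak and steady-state capacity be planned as two different numbers.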
Jeniece: Okay, what if I told you… you mentioned spinning disks, and Bill said the same thing. We [Solidigm] just came out with a solid state drive that holds 122TB in this one little guy. Feel how light it is.
Laura: That's pretty cool.
Jeniece: It's pretty light. Just out of curiosity, I know you're not a storage-storage person, but with all those spinning disks being in that system…
Laura: Would we replace them? Yes. Yes, we would, if we could afford them. So the spinning disks do two things. One is, you know, they're reliable and they're not terribly expensive. So, reliable up to a point. So we're happy to have them, we're familiar with them, they work, etc., etc., but they do take a lot of power, and as I said earlier, we're power constrained, so anything I can do to reduce our power footprint would be great. SSD, or solid state, whatever it is, it's not like it's no power, but it's less power. So one of my goals has always been to find the right solutions that we can afford, in order to actually move to more solid state. One of our challenges is that the data that is used for climate research is NetCDF data, which is already very much compressed, so a lot of the cost models for solid state are, "Hey, we'll compress your data even more and it'll work out from a cost perspective," except my data won't compress, so it makes it really, really hard. But we're still moving in that direction because of performance. And we absolutely have some [solid state storage] that is critical to running the models with the faster chips that we have now. Without it [SSD technology] we would be struggling. So yes.
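Laura's point about NetCDF is worth a quick illustration. In a minimal sketch like the one below (assuming the netCDF4 and numpy packages; the data is synthetic, not NASA's), the variable's chunks are deflated as they are written, so a storage tier that tries to recompress the file afterwards recovers almost nothing.

```python
# Illustrative sketch: NetCDF climate output typically leaves the model with
# zlib-compressed chunks, so recompressing the file downstream gains little.
import gzip
import numpy as np
from netCDF4 import Dataset

shape = (24, 180, 360)  # time x lat x lon, a small synthetic field
field = np.sin(np.linspace(0, 50, np.prod(shape), dtype=np.float32)).reshape(shape)

with Dataset("sample.nc", "w") as nc:
    for dim, size in zip(("time", "lat", "lon"), shape):
        nc.createDimension(dim, size)
    var = nc.createVariable(
        "aerosol_od", "f4", ("time", "lat", "lon"),
        zlib=True, complevel=4,  # the data is compressed as it is written
        chunksizes=(1, 180, 360),
    )
    var[:] = field

raw = open("sample.nc", "rb").read()
recompressed = gzip.compress(raw)
print(f"on disk: {len(raw)/1e6:.2f} MB, "
      f"after a second compression pass: {len(recompressed)/1e6:.2f} MB")
```

Storage cost models that assume a further 2-3x reduction from drive-side or filesystem compression fall apart on data like this, which is the difficulty she describes.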
Jeniece: Let's take a step back and not talk about data, but what about AI? Are you guys really doing… I mean, I've talked to a lot of folks here and everyone's doing some kind of AI, but tell me a little about [what you are doing].
Laura: So what are we doing? As I said earlier, when we brought Prism online four years ago, it was just people being like, "I don't know, I'm going to give it a try." And so it was just daytime access, and they were really just working out how this might work for their systems. And over time they started to actually run things. So whether they're doing training runs or doing some inference, our utilization is now much higher than it was before. We are buying an expansion to Prism, which is actually going to be NVIDIA Grace Hopper nodes. We have about 60 of those, and we have two places where we're using the GPUs. One is in machine learning, where we're working on some foundation models for Earth science; the other is taking that climate code and converting it to work on the GPUs, and that's a different group that is taking care of that.

I do want to tell one kind of fun story. [It's] one of my favorite stories about combining HPC with machine learning. This was probably two years ago: one of our users came in and said, "I've got this test data from the Transiting Exoplanet Survey Satellite [TESS]." And they were looking for binary stars. What you do is you generate light curves to show how the light changes over time. With most of these stars it'll be a pattern, and if it's a binary star, you can take a look at that pattern and say, "Oh, that's a binary star!" So they used our supercomputer to calculate, I haven't looked at the numbers in a bit, maybe a million light curves or more; many, many hours of supercompute time. Then they took all those light curves, moved them over to Prism, did machine learning on them, and found outliers. They found a lot of binary stars. They also found unusual systems, including some stars that are binaries with binaries, and one what they call a sextuple system, which is six stars that are gravitationally bound. So there are three pairs of binary stars that all rotate around each other. And that was found only because we had the combination of HPC to generate the light curves and the machine learning to find the anomalies, which allowed them to key in on the ones that looked of particular interest.
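As a hedged illustration of that two-stage pattern, bulk light-curve generation on the HPC side followed by unsupervised anomaly detection on the ML side, here is a minimal Python sketch. The synthetic curves, the hand-picked features, and the IsolationForest detector are stand-ins for illustration, not the team's actual TESS pipeline.

```python
# Stage 1 ("HPC"): generate many toy light curves. Stage 2 ("ML"): flag
# statistical outliers, the rare multi-star systems. Purely illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
t = np.linspace(0, 27.4, 1000)  # one TESS sector is roughly 27.4 days

def light_curve(n_dip_trains: int) -> np.ndarray:
    """Toy eclipsing-system light curve: periodic brightness dips plus noise."""
    flux = np.ones_like(t) + rng.normal(0, 0.001, t.size)
    for k in range(n_dip_trains):
        period, depth = rng.uniform(1, 10), rng.uniform(0.005, 0.02)
        phase = (t + k) % period
        flux[phase < 0.2] -= depth  # transit-like dip
    return flux

# Most curves show one dip train; a few show three, a stand-in for unusual
# gravitationally bound multi-star systems.
curves = [light_curve(1) for _ in range(990)] + [light_curve(3) for _ in range(10)]
X = np.array([[c.std(), c.min(), np.percentile(c, 1)] for c in curves])

# An unsupervised detector surfaces the rare, complex systems for follow-up.
labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(X)
print("flagged as outliers:", np.flatnonzero(labels == -1))
```

The division of labor mirrors the story: the expensive, embarrassingly parallel curve generation suits the CPU supercomputer, while the anomaly search over the assembled set suits the GPU facility.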
Jeniece: Amazing. Is this a new, recent discovery?
Laura: No, this is about two years old, call it two and a half.
Jeniece: Yeah, yeah, still pretty new though in the grand scheme of things. Very, very cool. Well, I have to ask you about this cool image of the black hole here; you mentioned that this was also [from Discover].
Laura: Yeah, this was actually done by the same group, and while I do have a background in astronomy, this one is a little bit beyond [me]. I'm not going to be able to explain it technically, but it was done on Discover, and if you watch the video, which is really important, you can actually feel like you're being pulled into the event horizon of a black hole—which you don't want to do yourself, because you're not coming back out! It hit social media and it just spread like wildfire, because it is just a fascinating display. And all the calculations were done on Discover.
Jeniece: Well, thank you so much for your hard work, and the hard work of your organization, everything you've done. It's amazing. And it's a pleasure to meet you, Laura. Thank you so much.
Laura: Thank you.