Code & Cure
Decoding health in the age of AI
Hosted by an AI researcher and a medical doctor, this podcast unpacks how artificial intelligence and emerging technologies are transforming how we understand, measure, and care for our bodies and minds.
Each episode unpacks a real-world topic to ask not just what’s new, but what’s true—and what’s at stake as healthcare becomes increasingly data-driven.
If you're curious about how health tech really works—and what it means for your body, your choices, and your future—this podcast is for you.
We’re here to explore ideas—not to diagnose or treat. This podcast doesn’t provide medical advice.
Code & Cure
#16 - Water, Watts, and Wellness: What’s the Real Cost of Medical AI?
Artificial intelligence promises faster notes, smoother workflows, and smarter clinical decisions. But behind every seamless interaction lies an invisible cost—electricity, water, and carbon emissions that rarely enter the healthcare conversation.
In this episode, we trace what happens after you hit “enter” on a clinical prompt. From power-hungry GPUs to evaporative cooling systems in data centers, we uncover the hidden infrastructure fueling AI and how metrics like PUE translate convenience into environmental impact. A single prompt may only consume “a few drops,” but scaled across a hospital, it becomes a lake.
Blending insights from an AI researcher and an ER physician, we unpack the difference between training and serving costs, the overlooked impact of iterative prompting, and how everyday uses—charting, imaging, messaging—accumulate real-world carbon. Then we shift to what you can do now: swap out large models for leaner alternatives, trim excessive input context, and build smarter prompts that reduce compute without compromising care.
We also explore operational strategies: batch non-urgent tasks during off-peak hours, negotiate SLAs that trade slight latency for sustainability, and push vendors on the things that matter—like data center efficiency, water use, and renewables—not just performance scores.
Sustainable AI isn’t a dream—it’s a design choice. So where will you start: documentation, imaging, or patient messaging?
Reference:
Sustainably Advancing Health AI: A Decision Framework to Mitigate the Energy, Emissions, and Cost of AI Implementation
Anu Ramachandran et al.
NEJM Catalyst (2025)
Credits:
Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/
AI promises to make healthcare faster, smarter, and more accurate. But every algorithm runs on energy, and energy has a cost. Today we'll break down what those emissions look like, why they matter, and how the industry can balance innovation with sustainability.
SPEAKER_00:Hello and welcome to Code & Cure. My name is Vasant Sarathi. I'm an AI researcher, and today I'm with Laura Hagopian.
SPEAKER_01:I'm an emergency medicine physician, and we are here, as usual, to talk about the intersection of healthcare and artificial intelligence. But we're flipping it a little bit today, because we've talked about so many different applications of AI in the healthcare realm, from billing to operations to clinical decision support to even patient-facing tools. Today I want to talk about something a little different: how all of these AI use cases can actually impact the environment, what that toll could look like, and how we could govern it. There are so many possibilities here.
SPEAKER_00:Yeah, yeah. We want to use all these cool features that AI is offering, but at the same time, we want to make sure it's sustainable at some level. So that trade-off is very, very important to talk about. We're going to focus more on the healthcare use cases, but really, everything we're seeing applies across the board to other uses of AI as well.
SPEAKER_01:Yeah. And one of the words that comes to mind for me is efficiency, right? We talk about, oh, this AI could improve our efficiency, this LLM could improve our efficiency, we could get summaries faster, or we could get the radiology read faster, or whatever it is. But then what about energy efficiency, and how might that impact people? How might that impact our country? How might that impact the social determinants of health? There's a lot at stake here. I was actually kind of interested and surprised when I read some of the stats in the article. I don't know that I was fully aware that every time I get a single, relatively short response, that's using 45 mL of water. That's about three tablespoons, almost a quarter of a cup.
SPEAKER_00:Wow. Yeah.
SPEAKER_01:So if I do 10 a day, man, that's a lot of water.
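A rough back-of-the-envelope sketch of how that scales. The 45 mL per short response is the figure discussed above; the prompt counts and staffing numbers below are hypothetical assumptions for illustration, not figures from the paper.

```python
# Rough sketch: scaling per-prompt water use across a hospital.
# The 45 mL figure is the per-response estimate discussed above;
# prompt counts and staffing below are hypothetical assumptions.

ML_PER_PROMPT = 45                   # milliliters per short response
prompts_per_clinician_per_day = 10
clinicians = 500                     # hypothetical mid-sized hospital
days_per_year = 365

daily_liters = ML_PER_PROMPT * prompts_per_clinician_per_day * clinicians / 1000
yearly_liters = daily_liters * days_per_year

print(f"~{daily_liters:,.0f} L/day, ~{yearly_liters:,.0f} L/year")
# ~225 L/day, ~82,125 L/year: a few drops per prompt adds up fast.
```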
SPEAKER_00:Yeah. And I've also heard, not just from this paper but in general, people talking about energy use and carbon emissions from AI systems. One stat we heard last year was that the total carbon dioxide emissions from training all of AI were more than those of all global air traffic. That's an astounding statistic. There are lots of papers, and people have researched and looked into this quite a bit. We're going to talk about all of these little aspects of it in a second, but just to get the scale here: this is an amazing revolution, and it's doing a lot of great things. However, there is a trade-off. How deeply are we aware of that trade-off? That's really the point of today's episode. We want people to start to think, every time they use ChatGPT or whatever tool is being used, especially in these healthcare applications: what's the trade-off? What are we giving up?
SPEAKER_01:And how can we try to actually make it more efficient? We're not saying don't use AI, right? But how can we make the trade-off a little more palatable? One of the stats in this article was that data centers accounted for 4.4% of total US electricity consumption in 2023, and they're projecting that could grow to as much as 7 to 12% of total US electricity. That's so much.
SPEAKER_00:That's incredible. Wow. Yeah, and to support these, they're building larger and larger data centers. There are some coming in various places, I think in Louisiana and Oregon and Ohio, that are massive, something like half the size of Manhattan. They're building these so that you can get more and more compute power, so that you can get faster and more accurate AI responses and make the AI use case more palatable, maybe even have the pros outweigh the cons. But that's really the question: what are the cons? And it's not clear to everybody what those are.
SPEAKER_01:Yeah. It's interesting to me too, because you wonder: what would happen if we didn't have enough electricity? Or what would happen if we were in a drought and we didn't have the water to cool these systems? And how do you balance that against people needing water to drink? So I think this is a really interesting topic, but I want to start by asking you about how the energy is used. Can you walk me through what that looks like? Why do we get these carbon dioxide emissions? Why are we using this electricity? Why do we need the water? How does that actually work?
SPEAKER_00:Yeah, let's talk a little bit about that. So think about the moment when you open up your computer and you type in a prompt, a question to ChatGPT. What happens when you hit enter? Well, the moment you hit enter and submit the question, your device sends a request across the internet. There are routers and switches that use a little bit of power here and there, but it's tiny compared to what's coming up. The request travels across the network, and a data center wakes up. Your prompt typically lands on a front-end server of that data center. A server is just a fancy term for a machine, a computer, whose job is to serve up a website, to host it, to hold it open while you use it. So the prompt lands on the front-end server, and then there's an AI accelerator with a bunch of GPUs. These are the actual hardware units doing all of the difficult computation that happens inside an AI system. GPU stands for graphics processing unit. GPUs are the chips in graphics cards, traditionally used to render video games and such, but now they're used extensively for AI. That's one of the reasons we have the AI revolution we have: GPUs can be used for AI computations. But they consume power. They do all the math that the AI system is doing under the hood. So your prompt lands on the front-end server, and an AI accelerator does the math. Think of this math as thousands of tiny matrix multiplications, like a little orchestra of calculators all churning through numbers.
SPEAKER_01:I've got a good visual in my head right now. Calculators playing the violin.
SPEAKER_00:So at this point you have electricity flowing, right? Because that's what's happening. Both the accelerator and the host server draw electricity. And remember, the data center isn't just the servers drawing electricity; it also runs fans, pumps, lights, and other things. There's what they call non-compute overhead, which is fancy speak for all the other stuff, and it determines the effectiveness of the data center. There's a term for it: power usage effectiveness, or PUE. They give you a number, like 1.2, which means that for every one watt of power used by the chips, an extra 0.2 watts goes to running the rest of the building. So there's extra stuff happening as well. Now, when all of this power is being used and the chips are all running, these chips get hot. That's just how they work; thermodynamics works that way, right?
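A minimal sketch of the PUE arithmetic just described. The 1.2 figure comes from the conversation; the 3 Wh of chip energy per prompt is a hypothetical placeholder, not a measured value.

```python
# Minimal sketch: PUE scales chip-level energy up to whole-building energy.
# A PUE of 1.2 means 0.2 extra watt-hours of overhead (cooling, fans,
# lights) for every watt-hour the chips themselves use.

def facility_energy_wh(it_energy_wh: float, pue: float = 1.2) -> float:
    """Total data-center energy for a given amount of IT (chip) energy."""
    return it_energy_wh * pue

# Hypothetical example: if one prompt costs ~3 Wh of chip energy,
# the building as a whole spends ~3.6 Wh on it at a PUE of 1.2.
print(facility_energy_wh(3.0))  # 3.6
```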
SPEAKER_01:Yeah. I mean, I think about my phone. If I overuse it, it gets hot and it's like, oh, yep, shutting down, need to cool off, check back later. Same idea here, except it's not just ambient air; there's so much heat that you need water to cool things off.
SPEAKER_00:Yeah, and these data centers can't just shut down because you're in the middle of an important healthcare application that uses your AI tool. They can't just say, I'm too hot, see you later. So these chips get hot, and to keep them at a safe temperature, the data centers use various forms of cooling: air conditioning, chilled water loops, and evaporative cooling towers, which literally heat up water and let it evaporate to carry the heat away. On site, they have these evaporative systems, and sometimes they even spray water through the tower, and some of it evaporates and is lost to the air. So you end up losing water in this process. People might have heard the phrase "a few drops per prompt." That's just a simple way of saying that a little bit of water is lost for every prompt you submit. Then there's off-site water, because you might not have water physically running through evaporative towers; you might have cooling systems powered by an external power plant that also uses water for its own cooling. So whether it's the data center or the power plant, you're pulling in water. Now, what about the emissions? We talked about energy use and power; what about emissions? Well, there are indirect emissions: these things are all connected to the electricity grid at some level, and if the power came from fossil fuels, then carbon dioxide is emitted at the power plant. I know there's a lot of work right now on building sustainable data centers that might use solar and other sources; that's a different equation. But as far as indirect emissions are concerned, if the electricity comes from fossil fuels, which a lot of it does right now, then you're releasing carbon dioxide. Sometimes these facilities also have diesel generators on site, so there can be a little bit of direct emission, though that's not as common. And then you also want to think upstream, about the whole supply chain. It's not just the data center. Those machines were built at some point by various manufacturers along the line: the chips were fabricated somewhere, the servers were assembled, and all the other equipment has a carbon footprint that's embedded in your eventual use case. Those things already happened, but they happened, and they're going to keep happening if we build more data centers.
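A sketch of how grid carbon intensity turns per-prompt energy into indirect emissions. All values here are illustrative assumptions, not measurements from the episode or the article.

```python
# Sketch: converting per-prompt energy into indirect CO2 emissions using
# the carbon intensity of the local grid. All values are illustrative
# placeholders.

def prompt_co2_grams(it_energy_wh: float,
                     pue: float = 1.2,
                     grid_g_co2_per_kwh: float = 400.0) -> float:
    """Grams of CO2 attributable to one prompt: chip energy, scaled by
    data-center overhead (PUE), times the grid's carbon intensity."""
    facility_kwh = it_energy_wh * pue / 1000.0
    return facility_kwh * grid_g_co2_per_kwh

# The same prompt on a fossil-heavy grid (~700 gCO2/kWh) emits roughly
# twice as much as on a cleaner grid (~300 gCO2/kWh).
print(prompt_co2_grams(3.0, grid_g_co2_per_kwh=700.0))  # ~2.5 g
print(prompt_co2_grams(3.0, grid_g_co2_per_kwh=300.0))  # ~1.1 g
```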
SPEAKER_01:Exactly. If we find more use cases and we want to keep using AI, then we need the data centers in order to do that.
SPEAKER_00:Yeah, exactly. So I think that gives you a sense of the data center. Now, what we've talked about so far is really just the data center: as soon as we send our request, it goes off to the data center and does its thing. What we haven't talked about is the models themselves, how they're trained and how they're built at their very core. In past episodes, we talked a little bit about LLMs, and here we're focusing on LLMs because they're the ones driving the AI revolution right now. If you think about LLMs, you can start by saying, okay, at some level you have to train a neural network, and you have to give it trillions of pieces of internet data to train the original model. Once you've trained the original model, you have to do what's called fine-tuning to make it serve the purpose of being a suitable AI assistant. That's additional training, more data, more runs of these systems. Then there's further training to make sure it only provides safe output, that sort of thing. And once you have all of that, you have a model that is served up via an interface somebody can use. Now, when you type in a prompt, the prompt goes to the data center, yes, but the model itself might run multiple times behind the scenes for every question you ask. Because in order to provide more accurate answers, it might need to think a little. And what thinking means, every time you see "thinking" pop up in ChatGPT or whatever, is that the model might be running multiple times: it produces some output, looks at that output, feeds it back in, and uses it to produce better output, and so on.
SPEAKER_01:And so that's not a simple prompt using 45 mL of water. That's going to use a lot more.
SPEAKER_00:Right. And so a lot of electricity is consumed by the original training of the model. That's usually a one-off, but not really a one-off: they release new versions of these models constantly, so people are running these large training sessions constantly. I'm not sure of the exact numbers, but for training some of these models you want to think in terms of hundreds to low thousands of megawatt-hours of electricity and tens to hundreds of tons of carbon dioxide, depending on how they're trained and how green the power is. So there's training, and then there are the layers on top: the data center layer, the model layer, and then the application layer, the people actually using these things. And if you think about a healthcare system, it's not just one use case, right? It's not just one doctor using one prompt once.
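A back-of-the-envelope check on those orders of magnitude. Both inputs below are illustrative assumptions, not reported measurements.

```python
# Back-of-envelope check on the training-run orders of magnitude above.
# Both inputs are illustrative assumptions.

training_energy_mwh = 1_000      # "hundreds to low thousands" of MWh
grid_kg_co2_per_kwh = 0.4        # roughly a fossil-heavy grid mix

tons_co2 = training_energy_mwh * 1_000 * grid_kg_co2_per_kwh / 1_000
print(f"~{tons_co2:,.0f} metric tons of CO2")  # ~400 tons
# That lands in the "tens to hundreds of tons" range, and it repeats
# with every new model version that gets trained.
```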
SPEAKER_01:No, it's all of us using it, across multiple simultaneous use cases. Ambient documentation, radiology reads, writing back to a patient after they send a message in the portal. There are so many use cases for this. So if you multiply it by all the doctors and all the use cases, it could be huge.
SPEAKER_00:And I also want to take into account the time of year and time of day these things are happening. I have this happen all the time: we go to the pool or whatever, I leave my phone out in the sun, and the phone gets too hot and turns off. Nothing was running; the phone was just sitting there, but it was sitting in the hot sun, got too hot, and turned off. Well, the same thing happens here too. You have temperature differentials between daytime and nighttime, and between summer and winter. All of these can affect how much energy is needed.
SPEAKER_01:How much water is needed, right?
SPEAKER_00:How much carbon dioxide is released, all of that. It affects all of it. So there's that piece as well. Now, this is not to say that companies aren't working hard to optimize various pieces and make this more sustainable; everyone wants that. But I don't think consumers are as aware as they should be of what's happening behind the scenes every single time they use these tools. I hope we do get to a place where things are more and more sustainable, where data centers are using solar and other forms of energy, and where we are aware of what's happening. But what this paper also argues is that we as users must take this into account and really think about why we're using AI, whether we need to use AI at all, and what kind of AI we actually need. We don't need an LLM for everything.
SPEAKER_01:Right. And I think we can transition now, because there was a great section in this paper about how we as individuals can influence the energy and carbon costs of these tools. Data centers are not where we have the most influence, right? Their efficiency, their energy mix, whether they can use advanced cooling technologies, their design choices: that kind of stuff we don't have much influence on. But we do have influence on how we apply things, and what model we choose, and why. So I think it's worth walking through some of those things, because those everyday choices, when you multiply them by so many providers using these tools, or by so many different use cases, can add up to a big difference.
SPEAKER_00:Yes, exactly.
SPEAKER_01:So I think it's interesting. One of the things you started off with was: do we even need AI to do this? Is a large language model an appropriate choice? Could we use something else? How long does it take to, I don't know, write back a patient message or whatever? Is this actually an efficiency gain, and how much of one? I think that's an interesting question to ask up front. It's not just can you, but should you? Is it appropriate?
SPEAKER_00:Right. And I think the point we're making is also: what does efficiency mean in this context? It's not just about saving people time. It's about whether people are taking other factors into account, sustainability factors. And sustainability factors can be monetarily incentivized as well. It's not just about wanting to keep the environment safe; practically speaking, you want to reduce the amount of electricity used in a hospital or wherever else, spend less on electricity, and save money there. So it's not just a generous, philanthropic endeavor; it's practical.
SPEAKER_01:The other thing you started to bring up was: do you need this elaborate model that thinks and runs multiple times and reanalyzes itself, or could a simpler model work? Sure, we can maximize; we can say we want the most complex model, we want it to perform the best, we want it to reduce bias, we want it to give the highest quality output. But at the same time, can you use a simpler model to do pretty much the same thing? Do you need that really complex model? Because the more complex model obviously uses more energy, uses more water, and releases more carbon dioxide. So, based on the task you've chosen for it, would a simpler model work? Or do you even need an LLM at all; could you use a different form of AI, like rules-based AI, for it?
SPEAKER_00:It's interesting, because before the LLM revolution, AI meant simpler models for very specific, highly specialized use cases, and the goal for an AI researcher was to find a more generalized model. Then along comes an LLM that can do all of the simpler tasks, if fine-tuned, or if you just ask it. So why wouldn't we just use the one model and not have all these little mini specialized models working? Well, it turns out that the sum total of all the little mini models working is still vastly less energy consumption and power than one monolithic LLM. And I think that's the piece here: it might be easier to use a single large model, but it comes at a huge cost, and maybe it's beneficial to think about those specific use cases. We talked about diabetic retinopathy in episode one, and that was a use case in which there was no LLM. There are plenty of other examples throughout our podcast where we don't use an LLM; we use other machine learning models to help at various points in the healthcare system. LLMs come in when you have human interaction or some kind of natural language interface. That's where a lot of people put LLMs.
SPEAKER_01:Yeah, so ambient documentation, which we've talked about, or drafting a note back to a patient might be an application of this too, where it's like, hey, I don't have time to write an email, I'm getting 50 of these a day. Can an LLM help me do that?
SPEAKER_00:Yeah. And the question is, what is the job you're assigning the LLM? Is it, here's this document, write me the whole thing? Or is it, help me talk through this thing, and then we'll use other tools to compose a better solution? The question is whether the LLM should do all of it, or whether it should be a surface-level layer that makes sense of what you're saying, used for its natural language understanding capability rather than for all of the reasoning a particular task would need. I know researchers are coming up with what they call architectures, different ways of arranging LLMs plus other components together in a system. But again, the key point is that we need to take into account not just how accurate something is, or how efficient it is from the perspective of human labor, but also how energy efficient it is. We might prefer a slightly less accurate solution with significantly lower energy use, and therefore lower cost, because we have a human looking at this stuff anyway, so slight variations in accuracy are okay. Humans can handle that, right?
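A hypothetical sketch of the right-sized-model idea discussed above: send each task to the cheapest model that can handle it instead of defaulting to the largest LLM. The model names and the complexity heuristic are invented for illustration.

```python
# Hypothetical sketch of the "right-sized model" idea: route a task to
# the cheapest model that can handle it instead of defaulting to the
# largest LLM. Model names and the heuristic below are invented.

def estimate_complexity(task: str, context_tokens: int) -> str:
    """Crude heuristic: long context or open-ended summarization -> 'complex'."""
    if context_tokens > 4_000 or "summarize the full chart" in task.lower():
        return "complex"
    return "simple"

def route_model(task: str, context_tokens: int) -> str:
    if estimate_complexity(task, context_tokens) == "simple":
        return "small-specialized-model"   # cheaper, lower energy per call
    return "large-general-llm"             # reserved for tasks that need it

print(route_model("Draft a reply to a portal message", 300))
# -> small-specialized-model
print(route_model("Summarize the full chart before the visit", 12_000))
# -> large-general-llm
```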
SPEAKER_01:Yeah. It's very interesting. And another thing you started to touch on there was: how much information are we putting into the LLM? If we're putting 50 pages worth of clinical notes into the LLM for it to summarize, that's a much bigger task than a single short prompt.
SPEAKER_00:Yeah.
SPEAKER_01:So how can we be efficient about what information we give as input for the LLM to analyze, too?
SPEAKER_00:Yeah. I really encourage AI engineers, and the folks working with AI engineers, to bring up these issues, because they come up actively in designing and using these systems. And when you incorporate these systems into your healthcare system, you want to have asked these questions of whoever is selling you the tool. It's not just about accuracy; it's about accuracy at what cost. Accuracy divided by energy use, in a sense. By that measure, a large LLM would look much less desirable just because of its high energy use.
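A toy sketch of one way to trim input context before it reaches the model, as discussed just above: keep only recent, relevant notes instead of pasting the whole chart. The selection rule and the sample data are invented examples, not a recommended clinical filter.

```python
# Toy sketch of trimming input context: keep only recent, relevant notes
# instead of sending 50 pages of chart to the model. The selection rule
# and data below are invented examples.

from datetime import date

def select_context(notes, keywords, max_notes=5):
    """Keep notes whose text mentions any keyword, newest first, capped."""
    relevant = [n for n in notes
                if any(k in n["text"].lower() for k in keywords)]
    relevant.sort(key=lambda n: n["date"], reverse=True)
    return relevant[:max_notes]

notes = [
    {"date": date(2025, 6, 1), "text": "Follow-up for hypertension."},
    {"date": date(2021, 2, 9), "text": "Ankle sprain, resolved."},
    {"date": date(2025, 5, 3), "text": "Hypertension, adjusted lisinopril."},
]
trimmed = select_context(notes, {"hypertension"})
print(f"Sending {len(trimmed)} of {len(notes)} notes to the model")
# Fewer input tokens means less compute per request.
```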
SPEAKER_01:Yeah. The other thing you started to touch on earlier was this concept of time of day, when demand is high and so on. I'm so used to instant gratification from these LLMs, right? I type something in and I get a response right away. And if it says, we're busy right now, you're going to have to wait a minute, I'm like, ugh, I don't want to wait a minute. But it's interesting, because it depends. What if the model could run at night? What if it could run on a cooler day? Do I need it right now, or could I wait until there's a pause or a gap when it's more efficient to run? So it's like, can we increase the response time? Is that okay? Could we do things in batches? If something's not urgent, could we accept a longer turnaround time? And how could we bake that into the system itself?
SPEAKER_00:Yeah, absolutely. That's the scheduling piece, right? You want to run those heavy jobs at night, in cooler hours, and avoid the peak grid hours as much as possible. That's definitely part of it. Another part is asking the vendors who provide these tools to disclose all of this information: their carbon dioxide emissions, where their data centers are located, and so on. I think we should be demanding more, because energy sources are not infinite. Yes, in the future we might find new ways of optimizing; solar might turn out to be really powerful, and we can use it more. Great. If that's the reason we move toward renewables, that's wonderful. But while that's happening, I think it's very important to take the energy use into account.
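A sketch of the off-peak batching idea just mentioned: run non-urgent AI jobs in a cooler, lower-demand window and only urgent ones immediately. The time window and the job format are illustrative assumptions.

```python
# Sketch of off-peak batching: non-urgent AI jobs wait for a cooler,
# lower-demand window; urgent ones run immediately. The window and the
# job fields below are illustrative assumptions.

from datetime import datetime, time

OFF_PEAK_START = time(22, 0)   # 10 pm
OFF_PEAK_END = time(6, 0)      # 6 am

def is_off_peak(now: datetime) -> bool:
    t = now.time()
    return t >= OFF_PEAK_START or t < OFF_PEAK_END

def dispatch(job: dict, now: datetime, batch_queue: list) -> str:
    """Urgent jobs run immediately; everything else waits for the batch."""
    if job["urgent"] or is_off_peak(now):
        return "run now"
    batch_queue.append(job)
    return "queued for overnight batch"

queue = []
afternoon = datetime(2025, 7, 1, 14, 30)
print(dispatch({"name": "discharge summary draft", "urgent": False},
               afternoon, queue))   # queued for overnight batch
print(dispatch({"name": "critical lab alert triage", "urgent": True},
               afternoon, queue))   # run now
```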
SPEAKER_01:Yeah, and I think there are so many possibilities with AI, but we do have to balance that with governance. We have to think of the trade-offs. And even at the individual level right now, there are things we can do. We can decide whether or not AI is actually the appropriate solution, and then which type of AI: does it have to be an LLM? We could choose a simpler model, knowing that it's going to be more energy efficient. And we could say, hey, a longer turnaround time is okay, or I'm going to have this run overnight, in order to improve the energy efficiency. I know I'm going to have this visual in my mind of 45 mL of water getting poured out for every prompt I write, which puts a reminder in your head that there is a trade-off here. And we have to take that into consideration when we consider all of these AI use cases.
SPEAKER_00:And just to recap: an AI prompt turns electricity into heat on a chip. Cooling removes that heat, sometimes by evaporating water. And the power plant that supplied that electricity may emit carbon dioxide. Per prompt, it's tiny: a few seconds of a light bulb, or just a few drops of water. But across millions of prompts and big training runs, the total depends on where, when, and how efficiently we run these systems.
SPEAKER_01:Yeah. And when you think about it at scale, it's not surprising to me that there are predictions it could account for 10 percent or more of US electricity. But there are things we can do to try to offset that. So I think we can end here. We will see you next time on Code & Cure. Thank you for joining us.