
Code & Cure
Decoding health in the age of AI
Hosted by an AI researcher and a medical doctor, this podcast unpacks how artificial intelligence and emerging technologies are transforming how we understand, measure, and care for our bodies and minds.
Each episode unpacks a real-world topic to ask not just what’s new, but what’s true—and what’s at stake as healthcare becomes increasingly data-driven.
If you're curious about how health tech really works—and what it means for your body, your choices, and your future—this podcast is for you.
We’re here to explore ideas—not to diagnose or treat. This podcast doesn’t provide medical advice.
#4 - From Florence Nightingale to AI: Revolutionizing Outbreak Surveillance
What if a 19th-century nurse laid the foundation for 21st-century disease surveillance?
Florence Nightingale, widely known for her compassion, was also a pioneering statistician who used data to reveal a hidden crisis: more soldiers in the Crimean War were dying from infections than from battle wounds. Her insights led to life-saving reforms—and sparked a revolution in how we understand public health.
Today, that same spirit of data-driven action lives on through artificial intelligence. In this episode, we explore how modern AI systems are transforming outbreak detection by scanning signals across the digital world—social media, search trends, news in multiple languages, even environmental data—to identify early signs of emerging health threats.
From tools like HealthMap to natural language processing engines that monitor disease mentions across continents, AI has already proven its value by detecting outbreaks like H1N1 and COVID-19 before official systems sounded the alarm. But history reminds us that data can be misleading: Google Flu Trends famously overestimated flu cases by mistaking media buzz for actual spread.
That’s why the most powerful systems today pair AI with human epidemiologists, combining rapid pattern recognition with expert judgment. It’s a modern-day continuation of Nightingale’s legacy—a partnership where algorithms spot weak signals, and people decide how to act.
This episode uncovers how statistical thinking has evolved into intelligent surveillance, offering public health leaders a critical advantage: time. Time to act, time to intervene, and time to prevent the next outbreak before it becomes a crisis.
References:
Artificial intelligence in public health: the potential of epidemic early warning systems
Chandini Raina MacIntyre, Xin Chen, Mohana Kunasekaran, Ashley Quigley, Samsung Lim, Haley Stone, Hye-young Paik, Lina Yao, David Heslop, Wenzhao Wei, Ines Sarmiento, Deepti Gurdasani
Journal of International Medical Research, March 2023
Digital Disease Detection — Harnessing the Web for Public Health Surveillance
John S. Brownstein, Clark C. Freifeld, Lawrence C. Madoff
The New England Journal of Medicine, May 2009
HealthMap: Global Infectious Disease Monitoring through Automated Classification and Visualization of Internet Media Reports
Clark C. Freifeld, Kenneth D. Mandl, Ben Y. Reis, John S. Brownstein
Journal of the American Medical Informatics Association (JAMIA), 2008
Surveillance Sans Frontières: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project
John S. Brownstein, Clark C. Freifeld, Ben Y. Reis, Kenneth D. Mandl
PLoS Medicine, July 2008
AI systems aim to sniff out coronavirus outbreaks
Adrian Cho
Science, May 2020
Real-time alerting system for COVID-19 and other stress events using wearable data
Arash Alavi, Gireesh K. Bogu, Meng Wang, Ekanath S. Rangan, Andrew W. Brooks, Qiwen Wang, Emily Higgs, Alessandra Celli, Tejaswini Mishra, Ahmed A. Metwally, and many others
Nature Medicine, January 2022
Real-Time Digital Surveillance of Vaping-Induced Pulmonary Disease
Yulin Hswen, John S. Brownstein
The New England Journal of Medicine, October 2019
Advances in Artificial Intelligence for Infectious-Disease Surveillance
John S. Bro
What do Florence Nightingale and 21st century AI have in common? More than you might think.
Speaker 2:Welcome. My name is Vasant Sarathy and I'm here with Laura.
Speaker 1:Hagopian.
Speaker 2:Hi Laura.
Speaker 1:Yeah, I'm excited about today's topic. We're here to talk about early detection of disease outbreaks with AI. Oh, that's cool, yeah, but first I think we should talk a little bit about Florence Nightingale.
Speaker 2:I guess that was the hook.
Speaker 1:When I say her name, what do you think about?
Speaker 2:I think that she was the first nurse and she did some amazing things during the Crimean War, I think. And yeah, I just think of her as a nurse.
Speaker 1:Yeah, I mean she was an amazing nurse, but did you also know that she was an amazing statistician?
Speaker 2:I mean, no, but I kind of do because we're doing this podcast.
Speaker 1:I know, just, like, come on, roll with me here. Okay, no, I didn't know. Thank you, thank you. Okay. Well, one of the amazing things that she did was she started to notice trends in the deaths that she saw, and she started to categorize them and map them out. So, basically, during the war, she noticed that there were a lot more people dying of potentially preventable disease than of battle wounds. And so, month by month, she mapped out: how many deaths are there due to battle wounds, how many deaths are there due to disease, and how many deaths are from other reasons? And she categorized those in different colors, you know, red, black and blue. And she found that the blue was the largest. Guess what the blue was? Not battle wounds. It was disease. Wow. And, like, by a large order of magnitude.
Speaker 2:Which must have blown people's minds back then, because obviously the natural assumption is you got hurt and injured and you died from that right, and so that's what everything must have pointed towards, and so this must have been completely counterintuitive.
Speaker 1:Right, and so that was one of the first ways that she was able to use statistics to advocate for, oh gosh, we need a sanitary commission to come in and intervene. Right, we need better sanitation, better hand washing, better cleaning, better nutrition, all of this stuff. And when that happened, guess what happened to the blue wedges?
Speaker 2:Oh, became smaller.
Speaker 1:Yeah, exactly. Like, down to almost zero, right? And so once she tracked this and brought in this Royal Sanitation Commission to fix the problem, after synthesizing her data, she was able to basically move forward, create an action plan and create goals, and then you could see, with the way that she tracked it, gosh, this worked.
Speaker 2:That's incredible, right? I mean, on so many levels. She figured out that nutrition, ventilation, shelter and sanitation matter more, and she was able to then translate that knowledge that she acquired from gathering all the data into an action plan that she could act on quickly, and it actually produced incredible results. I think that's amazing.
Speaker 1:Yeah, so what does this have to do with AI? Well, I think this concept of taking lots of data and synthesizing it in order to create an action plan is something that AI can help with. Not as much the action plan necessarily. That's where you might need public health officials to come and step in. She noticed this and then she manually aggregated the data and plotted it out and charted it, but I do think AI could step in here. So today we're going to talk about disease outbreaks: how can AI help us figure out early warning signs that a disease outbreak might be coming? And that's traditionally, you know, not been done with AI.
Speaker 1:As a physician, I remember reporting on certain diseases, or my hospital would, or my laboratory would, you know, notify the public health authorities. Hey, I saw a case of gonorrhea, or here are a couple of cases of salmonella. And that's how you end up ultimately reading about it in the news: physicians or providers or public health officials have noticed a trend. Lyme disease is another one where, you know, that's something that I would get a notice of: the lab is going to report on this automatically for you. Flu is something we report on too, although it's usually a number of cases and not individual people. And so traditionally we've relied on sources like me, which, you know, might be a little bit slow, right?
Speaker 2:Yeah, I mean, I think there's that. But I want to take a quick step back right now and talk about these sorts of early warning systems. You mentioned disease outbreaks and that's great, but there's also this notion of potentially new diseases that come about, and again you want the early warning signs there too. Those are going to be even harder, I would imagine, because then you don't know what you're looking for, but you just start to see this sort of common set of symptoms or something like that. And so those are also, I would argue, part of this bucket of early warning systems that I believe AI can in fact help with. And, like you mentioned, the more formal methods of reporting and such are potentially slow in their recognition of these outbreaks, but they also are labor intensive, oh for sure, and they're somewhat passive in some sense, and the labor-intensive part requires some people to actively do something and report these things.
Speaker 2:But also, you know, keeping track of all of these different pieces is really hard, and they're potentially not as good because they're not using as many data sources that could possibly give you signs and hints. And all of this is speaking to the same thing, which is that we now live in a world where there are all forms of data, and new forms of data are constantly being developed, right? I mean, you're collecting so much information about people's conversations, people's behavior, people's movements, physiological statuses of people, and all of these things globally, which, then, is very powerful. But, of course, if it was labor-intensive before, it's really labor-intensive now to actually sift through all of that, which is where, you know, computational methods and AI can really jump in and be helpful.
Speaker 1:Yeah, it's like all of this data is at your fingertips, but it's almost too much data, it is too much data, for a human to sort through. And you think about doing this kind of at scale at one level, but also you really want to understand what's happening in your location.
Speaker 2:Yeah, we talked about this before and I really thought that was an interesting insight, which is you both want sort of the global level things that are happening, but practically speaking for a healthcare provider or for a human being living in a particular location, they care about what's happening around them, and so you need both the local and sort of the really hyper focused local and the overall global at the same time, and I thought that was very interesting. We talked about this a bit earlier and that's a hard problem too. But again, potentially the quote unquote data has all the answers right.
Speaker 1:Right, if you can, you know, sift through it all, synthesize it all and translate it all. Right, if you're talking at the global level, now you're talking about things in so many different languages and, again, that's hard for a human to parse through.
Speaker 2:Yeah, so can we kind of talk through potentially what this world of data looks like. So it's one thing to say we have formal, the old methods where you see somebody coming in, you have some set of symptoms or the CDC tells you about something that's happening.
Speaker 1:Right, I might get an email from the Department of Public Health: hey, we've seen a number of cases of this, please report if you see any. We do track things like wastewater too. I mean, people probably remember COVID wastewater numbers coming out and seeing peaks there, and that would sometimes come ahead of the actual cases that were reported too. So there's definitely a bunch of different ways that we traditionally go about doing it, but it doesn't always involve things like, oh, let's synthesize news articles, let's look at what people are saying on social media, let's look at what people are searching for. What are their search queries about? You know, what does their wearable data say? Or what's happening from a meteorological standpoint? You know, what's the temperature, what's the rainfall, what's the humidity, and how could that impact early disease spread? So there are so many different sources of data out there that go beyond what we think of traditionally as what we might use for surveillance.
Speaker 2:Yeah, no, that's incredible, and what we've seen from our reading of this is that there are systems out there that have been out there, I would say, for quite a bit now, like some of them have been out there for 15, 20 years now that have been tracking using data and using whatever the state of the art computational methods are to do this right. Should we talk about some of those?
Speaker 1:Yeah, I mean there's a bunch of them, but HealthMap and ProMED-mail are just a couple of examples. There are a bunch of these out there, BlueDot, et cetera. And so I started playing around with HealthMap just as an example to kind of figure out, how does this work? How user-friendly is this? I mean, you can go on and look at it too. It's not behind any sort of wall. And I was curious to see, we're in Massachusetts right now, what does it say about where we live?
Speaker 1:And there were a couple of red dots on Massachusetts. I clicked on all of them and it said, oh, you know, watch out, there's West Nile. There are all these news reports of West Nile being detected in mosquitoes in a variety of towns in Massachusetts and being found in potentially some animals as well. And so, well, I'm going to make sure that I put on bug spray when I go outside. But it was interesting to see, what is it synthesizing, what is it looking at, and how does it do that at the local level? But if you zoom out, there were dots all over the map, all over the world, that show a similar thing in a different location. So I thought that was really cool to see.
Speaker 2:Was the red dot an indication of the number of cases, or what was it representing? What was the redness? Do you remember that? Or do you know if HealthMap also gives you more details? I know that it aggregates a lot of information, but I guess the question I'm really wondering is whether it's actionable to you. You see the red dot and, you know, it's like a pollen count, right, so you know right away that the pollen count is high. Is it similar to that?
Speaker 1:I have no idea. There was a dot and I clicked on the dot to see what it said.
Speaker 2:Yeah, no, but I mean it's still useful, right? Because it's aggregating all this information, and West Nile is a thing right now in Massachusetts, so that makes sense, right? It's timely. It linked the news articles that were in my area with dates on them and it said this is what you're looking out for. Right, right, yeah. So I think that is an example of an AI, maybe not AI necessarily, but a technology use case of aggregating lots of data, and maybe we can talk a little bit about that, right, about how exactly it's doing that.
Speaker 1:That's what I'm curious about. It's like how does something like this work? It's so cool.
Speaker 2:Yeah, I mean one piece of technology that's been around for a while, that actually led to the AI revolution that we have now with large language models, was natural language processing. Natural language processing has been around forever. It's just using various algorithms and computer techniques to read text and figure out patterns from those pieces of text. And our current generation of large language models is, in some sense, a natural language processor, because it takes in text and it has learned patterns from text in order to predict the next token. So it's a form of natural language processing.
Speaker 2:But NLP, as it's called, is a pretty broad field and a very actively researched field as well. HealthMap, for instance, their tool goes back to 2007, 2008, and there were lots of tools like that at the time that did a lot of NLP work. What they did was they would take a piece of text and they would start breaking it down. And one task that machine learning folks like to do, in fact, if you take a machine learning course now, it's often the first task, is something called classification, where you're given a body of text and you're told to classify it as one category versus another. For example, you get an email and it's either classified as spam or not spam. So that's an example, and the idea is that the NLP system is able to look at that piece of text and figure it out.
Speaker 1:Wait, is that how my spam filter works?
Speaker 2:I mean, that's a very old-school spam filter. There are more sophisticated systems now, but that's essentially spam or not spam, and the way these NLP systems are trained is that you give them lots of examples of text that looks like spam and text that doesn't. Now, obviously, that's a lot harder if the text starts to look really good and you're not sure if it's spam or not, and these things get pushed and tested and expanded in that manner. But that's just one example. That might not be enough. Like it might not be enough to look at a social media post and decide, you know, like spam or not spam.
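As a rough illustration of the spam-or-not-spam classification described above, here is a minimal sketch of a word-count (naive Bayes) classifier trained on labelled examples. The training sentences, labels and function names are all made up for illustration; this is not how any particular spam filter or HealthMap itself is implemented.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a real system would do much more."""
    return text.lower().split()

def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score (add-one smoothing)."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data: lots of examples of each category.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting notes for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
    ("notes from the team meeting", "not spam"),
]

word_counts, label_counts = train(TRAINING)
print(classify("free prize now", word_counts, label_counts))
print(classify("team meeting on monday", word_counts, label_counts))
```

The same shape of model could in principle be trained on "outbreak-related" versus "not outbreak-related" text, which is exactly the coarse-grained decision the conversation goes on to question.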
Speaker 2:You can't just decide if it's disease outbreak or not disease outbreak, right? I mean, that's too coarse-grained at some level. So what you might need in these cases is something a little bit more narrow, a little bit more focused. And again, within the NLP space, there is a classification that you can do, not just on the entire body of text but potentially on words in the text. Looking at a piece of text, you can identify: what location are they talking about? What disease are they talking about? Do they mention the number of cases? So that's an example where it's not the whole text you're after, but you're really trying to figure out, okay, they're talking about this location and this number of cases and this disease, and that information is useful because we can then use that later for something else to put together in our system.
Speaker 1:Okay, so I'm going to play this back to you, so like if there were a news article that said hey, there were 17 new cases of pertussis that were detected in the greater Boston area. Now you're telling me it's going to parse out things like the number of cases and the fact that it was pertussis and the location of being Greater Boston.
Speaker 2:All those things are sort of parsed out from that text. Exactly. And it's a task, I would say, in the NLP world, and it's often known as named entity recognition, which is where you're identifying pieces of text that have a certain type associated with them. So pertussis is a disease type and Boston, for example, is a location type. So it would identify all the things in the text that are location type, and Boston is one of them, and that way be able to extract useful data from it. In general, entity classification, named entity recognition, all of these talk about entities, the things that are being talked about in the text. Boston, pertussis, those things are entities.
Speaker 2:And then there are more sophisticated things you can do. You can do what's called entity and span classification, where you're not looking at the whole piece of text but you're looking at a span of words, that is, a contiguous sequence of words, or a phrase, if you want to think about it that way, and that could mean something, right? So maybe you're looking at just a phrase and it's not the whole thing you care about. There are lots of other things being talked about, but you only care about a certain piece, so you scan that. Then there are other relationships that you can extract. You can extract causal relationships that might be suggested by the text, right? So maybe a person flew from one country to another and they talked about it on social media, about how they traveled from here to here. Then you might be able to draw connections, that the person was there before and now the person is here, and then build that up.
Speaker 2:And, you know, we talked about knowledge graphs before. You could imagine looking at these pieces of text and extracting out graphs that represent the whole world and all the movements of these people, right? Which is, again, potentially very useful.
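To make the pertussis example concrete, here is a minimal sketch that pulls a case count, a disease and a location out of a sentence. It uses a regular expression and tiny hand-written lookup lists, which is a deliberate simplification: production named entity recognition typically relies on trained models, and the lists and function name here are hypothetical.

```python
import re

# Hypothetical, tiny lookup lists; a real system would use far larger
# vocabularies or a trained model.
DISEASES = ["pertussis", "west nile", "influenza", "salmonella"]
LOCATIONS = ["greater boston", "boston", "massachusetts"]  # longer names first

def extract_entities(text):
    """Return the first case count, disease and location mentioned in the text."""
    lowered = text.lower()
    entities = {"case_count": None, "disease": None, "location": None}

    # Number type: a figure followed by "cases" or "new cases".
    match = re.search(r"(\d+)\s+(?:new\s+)?cases", lowered)
    if match:
        entities["case_count"] = int(match.group(1))

    # Disease type: first known disease name found in the text.
    for disease in DISEASES:
        if disease in lowered:
            entities["disease"] = disease
            break

    # Location type: first known place name found in the text.
    for location in LOCATIONS:
        if location in lowered:
            entities["location"] = location
            break

    return entities

sentence = "There were 17 new cases of pertussis detected in the Greater Boston area."
entities = extract_entities(sentence)
print(entities)
# {'case_count': 17, 'disease': 'pertussis', 'location': 'greater boston'}
```

Each extracted field carries a type (number, disease, location), which is what lets a later stage count cases per place rather than just labelling the whole article.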
Speaker 2:One really cool thing that HealthMap did, in addition to looking at all of these social media posts and extracting useful things, is then overlay it on a map. And that's super interesting too, because social media posts not only contain text but they contain metadata, and this metadata has all kinds of other information, GPS location, timing, all these other things that you don't necessarily see when you read the text, but it's in there and it's very useful if you want to know where in the world this was posted from. And if you know where in the world it was posted from and the timing and such, that can help you correlate timelines for the disease progression, and that can help you figure out geospatially where in the world this was happening. Those are the kinds of things that HealthMap is doing behind the scenes to potentially do that overlay that we saw before.
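The map overlay idea can be sketched as grouping posts by where and when they were posted. The example below is an assumption-laden toy: the `Post` fields, the one-degree grid and the sample posts are invented for illustration and do not reflect HealthMap's actual data model.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class Post:
    text: str
    latitude: float   # from the post's metadata, not its text
    longitude: float
    posted_on: date

def bucket_posts(posts, cell_size=1.0):
    """Group posts into grid cells of `cell_size` degrees and summarize each cell."""
    cells = defaultdict(list)
    for post in posts:
        key = (math.floor(post.latitude / cell_size),
               math.floor(post.longitude / cell_size))
        cells[key].append(post)

    summary = {}
    for key, group in cells.items():
        dates = [p.posted_on for p in group]
        summary[key] = {
            "count": len(group),        # how big the dot on the map could be
            "first_seen": min(dates),   # start of the local timeline
            "last_seen": max(dates),    # most recent mention
        }
    return summary

# Hypothetical posts: two near Boston, one in London.
POSTS = [
    Post("West Nile found in mosquitoes", 42.36, -71.06, date(2024, 8, 1)),
    Post("More West Nile reports nearby", 42.37, -71.11, date(2024, 8, 5)),
    Post("Flu season starting early?", 51.51, -0.13, date(2024, 8, 3)),
]

summary = bucket_posts(POSTS)
for cell, info in summary.items():
    print(cell, info["count"], info["first_seen"], info["last_seen"])
```

Shrinking `cell_size` gives the hyperlocal view and growing it gives the global one, which is the local-versus-global tension raised earlier in the conversation.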
Speaker 1:Yeah, it's so interesting because it's like there's so much information out there so it is able to figure out like basically like what's important and where is it happening. And I saw also that it's able to use Google Translate too. So, you can get information in different languages and it's still able to utilize and aggregate that.
Speaker 2:Yeah, and in fact, one thing.
Speaker 2:So HealthMap is from the era of NLP.
Speaker 2:That was before a lot of the modern-day LLM work that we have seen in the AI world, and so you could imagine applying these modern techniques and enhancing systems like HealthMap quite a bit. When you extract out pieces of text and you're doing this business of entity classification and classification of text and such, it's often very contextual, so something being said is often only understood in the context in which it was said. Maybe there were other posts that were related, and so you understand it from that context. And modern-day LLMs are very good at understanding context, so they would potentially be better at doing these kinds of tasks, because they would be able to figure out that, you know, bank here means financial institution, not the side of a river. And so I think you can easily imagine LLMs being used at various points in this early warning system to extract out interesting features from pieces of text, to identify what's being said there.
Speaker 2:And as we develop new factors, maybe we care about not just position and disease, but maybe we care about other things. We can start to introduce those as well and start to get more and more useful pieces of data.
Speaker 1:The challenge always is separating the noise from the signal. That's literally what I was just going to ask. How do you know what's really the signal, the right stuff, the stuff that HealthMap or these other applications want to pull in, versus stuff that's irrelevant or wrong or whatever? How do you differentiate the signal from the noise?
Speaker 2:I mean, you don't, to some degree. Even some of the best tools make mistakes, and there's a certain degree of accuracy, and, you know, there's this notion of false positives and false negatives and things like that.
Speaker 1:But do they try Like do these tools try? They must try.
Speaker 2:Try. What do you mean by try?
Speaker 1:Try to separate the signal from the noise.
Speaker 2:Yeah, yeah, yeah.
Speaker 1:The right stuff that they want to synthesize and aggregate from. You know stuff that may be wrong, especially when you talk about. I mean news articles, hopefully, are well researched, but other sources of information like social media, it may just be one person saying one thing and you don't know how accurate that could, or couldn't be.
Speaker 2:So that is a great question, and it's not clear to me what the tools exactly are doing. But it seems like you could have a tool that is purely passive, in the sense that it sees all the data, it does all the classification that you ask it to do, figures out all the locations and the disease and all that stuff, and then it does some processing and then makes a conclusion as to whether, you know, it should be a red dot or a green dot, right? That's kind of roughly the pipeline. But of course some of this may need to be active. If a human were in that loop, they might say, hey, wait a minute, you're basing your red dot conclusion on this social media post, but that's not what it's really saying. That post is really talking about this other thing and we need to go check that other thing. So there's more active back and forth. You need to go back and look at things and come back and verify all the pieces of data.
Speaker 2:I'm not 100% sure to what degree these systems are doing that. But then again, with modern AI systems, especially AI agents that can go out and gather more information or answer a specific, targeted question that's come up, it's possible that you could have those kinds of systems that are better at actually working through why something's an epidemic risk versus not. But again, you don't want these systems to be end-to-end. You don't want one bunch of data going in, and then all the processing happening behind the scenes, and then one output. The problem with that is, if all the processing is happening in one system, then you have more room for error. There's a possibility that the system doesn't fully understand the complexity of the task involved, whereas if you have humans designing these things and extracting out useful pieces of information, or breaking up that big system into smaller submodules, then you have potentially, I would argue, a better, more accurate system.
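One way to picture the "smaller submodules plus a human in the loop" design is a pipeline of separate stages whose final step only prioritizes signals for a person to check. This is a hypothetical sketch, not a description of how HealthMap or BlueDot work; the stage names, thresholds and sample reports are all assumptions.

```python
from collections import defaultdict

def extract(report):
    """Stage 1: pull out the fields later stages need (already structured in this toy)."""
    return report["disease"], report["location"], report["source_type"]

def aggregate(reports):
    """Stage 2: count reports and distinct source types per (disease, location)."""
    signals = defaultdict(lambda: {"reports": 0, "source_types": set()})
    for report in reports:
        disease, location, source_type = extract(report)
        signal = signals[(disease, location)]
        signal["reports"] += 1
        signal["source_types"].add(source_type)
    return dict(signals)

def triage(signals, min_reports=3, min_source_types=2):
    """Stage 3: every signal goes to a person; thresholds only set the priority."""
    queue = {}
    for key, signal in signals.items():
        corroborated = (signal["reports"] >= min_reports
                        and len(signal["source_types"]) >= min_source_types)
        queue[key] = "priority human review" if corroborated else "routine human review"
    return queue

# Hypothetical extracted reports.
REPORTS = [
    {"disease": "west nile", "location": "massachusetts", "source_type": "news"},
    {"disease": "west nile", "location": "massachusetts", "source_type": "news"},
    {"disease": "west nile", "location": "massachusetts", "source_type": "health department"},
    {"disease": "pertussis", "location": "greater boston", "source_type": "social media"},
]

queue = triage(aggregate(REPORTS))
for key, decision in queue.items():
    print(key, "->", decision)
```

Because each stage is a separate function, a reviewer can inspect the intermediate counts and sources behind any dot instead of trusting a single opaque output.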
Speaker 1:Well, yeah, and then, of course, when you have this like alert oh, this epidemic might be happening, or this is an early warning sign of a disease you really do need the public health officials not only to step in and verify, like you were talking about, but to actually intervene. It's like, if this is happening, what do we do about it and how do we go about it?
Speaker 2:Yeah.
Speaker 1:Have we seen any good successes with HealthMap? I'm trying to remember. Yeah, some of these applications we have seen successes with. I mean, there was, interestingly, an outbreak of a vaping-related lung disease. So this is not something that, you know, is like a bacteria or virus or something, but they were able to identify that. In 2009, the influenza H1N1 outbreak, it was involved in that, and some of these actually were able to detect early signs of COVID, like HealthMap and BlueDot. So it is really interesting to see that they do work, they could work. But there have also been examples of AI misfiring too. One example of that is Google Flu Trends.
Speaker 1:I don't know if you have heard of that. Yeah, yeah. It's been shut down.
Speaker 1:Probably like 10 years ago it got shut down, but it was live for a while and at first it was pretty accurate. There were some papers citing like 97% accuracy in terms of detecting flu trends, but later on it ran into a couple of missteps. One was that it really did not detect this 2009 influenza outbreak, and the second was that it started to really overestimate how much flu was out there. That was sort of on the tail end, and that's kind of when it ended up closing down, around 2011 to 2013. It really started to overestimate, and part of the reason for this was, I'm not going to do this justice, but the way that it was programmed: people were searching for flu and cough and flu-like symptoms in their searches, and there were a lot of news articles about the flu, right? But that did not necessarily mean that there were rising flu cases.
Speaker 2:Right, so that is super fascinating.
Speaker 1:Right, and so in my mind it's like, well, just because someone's searching for it or because there's a news article about it doesn't mean that there is necessarily an outbreak or there are necessarily a ton of cases happening at the moment. But I can imagine how, you know, an NLP model could think that, I'm using the word think, it can't really think, but you need to retrain that sort of model in that case.
Speaker 2:Yeah, I think that, you know, it points to a really important piece, which is that there's good data and bad data. While these social media posts give us a lot of interesting insights and potential directions and even early warnings, they're also, you know, susceptible to this. And it's not like this was a bad performance of the model itself. It was a situation where there was new data that should not have been considered, right, that should have been ignored. And knowing what to ignore and what to take into account, and why some data points are more important than others, is an open challenge. I think that's a really difficult problem to solve and I'm sure there are lots of people working on it, but that's a very difficult technical AI problem to work on.
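The overestimation problem can be shown with a deliberately simple toy: fit a single ratio between search volume and case counts on calm weeks, then apply it to a week where news coverage inflates searches. All numbers are invented, and this is not Google Flu Trends' actual model, only an illustration of why search interest and real cases can come apart.

```python
def fit_ratio(searches, cases):
    """Least-squares slope through the origin: cases is roughly ratio * searches."""
    numerator = sum(s * c for s, c in zip(searches, cases))
    denominator = sum(s * s for s in searches)
    return numerator / denominator

# Hypothetical calibration weeks where searches track real cases closely.
calibration_searches = [100, 200, 300, 400]
calibration_cases = [10, 20, 30, 40]
ratio = fit_ratio(calibration_searches, calibration_cases)

# Hypothetical later week: heavy news coverage drives searches up,
# but the number of real cases barely moves.
media_week_searches = 900
media_week_actual_cases = 35

estimate = ratio * media_week_searches
print(f"estimated cases: {estimate:.0f}, actual cases: {media_week_actual_cases}")
```

A model calibrated this way has no notion of why people are searching, so it keeps being wrong until it is retrained or a person steps in, which is the point made in the conversation.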
Speaker 1:Yeah, and even though it was great at first, it was clear that as the news changed and the searches changed, we needed to keep learning, have the model, keep iterating and maybe bring the human back into the loop to do that, right, yeah, absolutely.
Speaker 1:But I do think, from some of these examples and from what we've talked about, AI is excellent at identifying these early signals rapidly from vast amounts of noisy data, and that's when you need to bring in the human and say, okay, let's verify that this data is correct and let's figure out how we are going to intervene on this vaping-induced lung injury, or on this, you know, flu outbreak, or on this potentially new disease. Right, absolutely.
Speaker 1:So I want to leave us with a couple of take-homes before we say goodbye. One is that AI can generate early warning signs for disease outbreaks, and it can do this using large quantities of unfiltered data, and it can do this both at the global level and at the hyperlocal level, and it allows us to be more and more proactive about our public health approach here. And then my second take-home is that it's not enough, right? We also need humans to be involved in this process in order to verify: is this actually happening? Do we see this pattern? And then, secondarily, to intervene if it is, right? So we see the early warning signal and we need to do something about it to stop the outbreak, and this is where a tool like this could be just amazingly helpful.
Speaker 2:Great. On that note, wear your bug spray. Yeah, that's right. Thank you for joining us.