
Code & Cure
Decoding health in the age of AI
Hosted by an AI researcher and a medical doctor, this podcast unpacks how artificial intelligence and emerging technologies are transforming how we understand, measure, and care for our bodies and minds.
Each episode unpacks a real-world topic to ask not just what’s new, but what’s true—and what’s at stake as healthcare becomes increasingly data-driven.
If you're curious about how health tech really works—and what it means for your body, your choices, and your future—this podcast is for you.
We’re here to explore ideas—not to diagnose or treat. This podcast doesn’t provide medical advice.
Code & Cure
#10 - Skill Erosion in the Age of Medical AI
Could AI be making doctors worse at their jobs?
As artificial intelligence becomes a trusted tool in modern medicine, a surprising question emerges: could relying on these systems actually erode human expertise? We explore a compelling study from The Lancet that found a 6% drop in detection rates for endoscopists who initially used AI to identify precancerous polyps—then lost that edge once the AI was removed.
This episode unpacks how AI isn’t just a helpful assistant—it may be reshaping how physicians think, reason, and make decisions. Unlike a stethoscope or scalpel, which extends physical capabilities, AI intervenes in cognitive processes. What happens when that crutch is suddenly gone?
We delve into the subtle but important distinctions between tools that amplify skill and those that risk replacing it. From seasoned practitioners to medical trainees raised on AI support, we ask: what kind of clinician emerges when core diagnostic thinking is offloaded to machines?
Through the lens of interaction design, we explore different models for integrating AI—whether as a second reader, background assistant, or tightly scoped tool—and how each impacts long-term expertise. The right design, we argue, could support true human-AI partnerships without compromising clinical judgment.
Tune in for a provocative conversation that challenges simplistic narratives about technology in healthcare—and rethinks what it means to be an expert in the age of artificial intelligence.
References
Are A.I. Tools Making Doctors Worse at Their Jobs?
Teddy Rosenbluth
The New York Times, August 28, 2025
Endoscopist deskilling risk after exposure to artificial intelligence in colonoscopy: a multicentre, observational study
Krzysztof Budzyń et al.
The Lancet Gastroenterology & Hepatology, 2025
Relying on AI in Colonoscopies May Erode Clinicians' Skills
Joedy McCreary
MedPage Today, August 12, 2025
Expert reaction to observational study looking at detection rate of precancerous growths in colonoscopies by health professionals who perform them before and after the routine introduction of AI
Science Media Centre, August 12, 2025
Upskilling or deskilling? Measurable role of an AI-supported training for radiology residents: a lesson from the pandemic
Mattia Savardi et al.
Insights into Imaging, European Society of Radiology, 2025
AI-induced Deskilling in Medicine: A Mixed-Method Review and Research Agenda for Healthcare and Beyond
Chiara Natali et al.
Artificial Intelligence Review, 2025
Credits:
Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/
Is AI making doctors stupider?
Speaker 2: Hello and welcome back to Code and Cure. My name is Vasant Sarathy, I'm an AI researcher, and I'm here with Laura Hagopian.
Speaker 1: I'm an emergency room doctor. Hey, how are you?
Speaker 2: Good. Wait, is stupider even a word?
Speaker 1: It's a good question. It sounds a little weird coming out of my mouth, but let me look that up real quick, because it feels like it should not be one. It actually is a word. According to Merriam-Webster, stupider and stupidest are both words in good standing. They are the comparative and superlative forms, and they have a long history of usage, even though they sound kind of strange.
Speaker 2: Oh, there we go, and we're done with the podcast. No, I'm kidding. Everyone's had their vocabulary lesson for today, but maybe we should actually try to answer the question and talk a little bit about the topic at hand itself, which is de-skilling.
Speaker 1: Yeah, and we got interested in this topic based on a recent paper published in The Lancet that looked at whether or not doctors got de-skilled while performing endoscopies.
Speaker 2: Yeah, this came out of a New York Times article too, right? There was a New York Times article recently about the idea of AI being used in the medical profession in different use cases, but doctors losing their expertise at some level in performing procedures or in analyzing things.
Speaker 1: Exactly, exactly. In this particular study, what they did was they had endoscopists, surgeons who were doing scopes, use an AI tool, and the tool was supposed to help them find adenomas, which are little growths that are precancerous lesions. You want to remove them. If you're looking at someone's colon and you find an adenomatous polyp, you want to get rid of it, and that can actually help prevent cancer. So they had this AI tool that they used to help find these adenomas, these precancerous lesions, and what they did was they said, okay, let's measure your adenoma detection rate before you use the AI tool.
Speaker 2: What's adenoma detection rate?
Speaker 1: How often you find these adenomatous polyps, right? It's usually around 25% or so. And then: let's give you the AI tool, which will kind of circle them or box them off and show them to you. And then let's take the tool away and see what happens.
Speaker 2: After some period of using the tool?
Speaker 1: Yeah, they got to use the tool for about three months. So now three months go by, they stop using the tool, and they see: okay, are these trained clinicians, these endoscopists, these surgeons, still able to detect these precancerous lesions? Because the whole point is you want to find them and you want to remove them to prevent cancer.
Speaker 2: And, drum roll please... what they found was that the adenoma detection rate decreased by about 6%. Wow. So by that metric, the doctors at the end of the study, after having used the AI's assistance throughout, performed worse at adenoma detection once they did not have the AI anymore.
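To make the metric concrete, here is a minimal sketch of how an adenoma detection rate is computed and how a drop of a few percentage points shows up. The numbers below are invented for illustration; they are not the study's actual data.

```python
# Illustrative only: the counts below are made up, not data from the Lancet study.
# ADR = fraction of colonoscopies in which at least one adenoma is found.

def adenoma_detection_rate(total_colonoscopies: int, with_adenoma: int) -> float:
    """Return the adenoma detection rate as a fraction between 0 and 1."""
    return with_adenoma / total_colonoscopies

# Hypothetical "before AI exposure" period: 400 colonoscopies, adenomas found in 112.
adr_before = adenoma_detection_rate(400, 112)   # 0.28 -> 28%

# Hypothetical "after the AI is removed" period: 400 colonoscopies, adenomas found in 88.
adr_after = adenoma_detection_rate(400, 88)     # 0.22 -> 22%

drop = (adr_before - adr_after) * 100
print(f"ADR before: {adr_before:.0%}, after: {adr_after:.0%}, drop: {drop:.0f} points")
```

With these made-up numbers, a drop "of 6%" in this sense means six percentage points of absolute ADR, which is a noticeably larger relative decline in detections.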
Speaker 1: Yeah, exactly, and so that's the concern: did they de-skill? And I guess, kind of in parallel with that in my head, is this question of how bad is it to depend on AI tools like this, because other papers I've looked at have actually shown that AI helps you find more adenomatous polyps.
Speaker 2: Yeah, and I think that was the big motivation for this paper: prior work, both in endoscopy and in other applications, found that AI assistance can actually improve detection. We talked about that with diabetic retinopathy in our first episode, right, when we talked about the AI being really good at finding certain things in images and really being able to help out. So the fact that the AI assistance actually improved detection might have suggested that it's a good thing. But what we're seeing now from this study is that, yes, maybe that's a good thing, but it seems like the doctors have lost some of their own ability to do it effectively. That's what this study is suggesting.
Speaker 1: Yeah, and it's an observational study. It's not a huge study, but the question it's raising is: in just three months, did these providers lose some of their skills? And if so, does that matter?
Speaker 2: Right, right. So let's take a pause here for a second and think about what that provider skill might look like. Can you walk us through what it might be like to do one of these colonoscopies? What is the doctor actually doing? And then we can talk about the AI use in connection with that.
Speaker 1: Yeah, so they've got a scope in their hands, they're putting it in, they're taking a look at your colon and visualizing the inside, right? They might have to suction, they might have to press on areas where the colon curves, they're going to have to kind of turn around and look backwards, and they're trying to see if there are any abnormal growths there.
Speaker 2: And this is all through an endoscope, like a little camera-and-light thing they put in, and they're observing it on a screen or something?
Speaker 1: Exactly right. They're controlling that scope and deciding where to look, how to look, and what else they can do along with it to get a better look at something. The AI is not doing that piece of things. It's looking at what was actually displayed on the screen and flagging whether or not something might be an adenomatous polyp.
Speaker 2: Right, and just to be clear, we're not sure exactly how the AI was used in this particular observational study, and that's something I want to get into at some point as well. But this is good to understand from a contextual standpoint: the task here is not just looking at an image. It's a very active task, unlike other imaging-type tasks where you take one picture and can then let the AI do whatever it needs to do. Here there's no one picture.
Speaker 1: It's sort of an exploration, in some sense, and reasoning as you explore: deciding where to go next, what this means and what this could mean, and then doing some tests, pushing on things. Like, I want to pause here, or I need to suction because I don't have a good enough view, or I want to turn back and take a look at this from a different angle. So your endoscopist is actually actively involved in that whole process, and it's active video that's happening. You're right, it's not a single image that can be sent off for detection.
Speaker 2: Right, right. So that's very interesting. And of course, our metric here is the ADR, the adenoma detection rate, which is sort of an end metric. It's measured at the end of it all. It doesn't account for all the specific actions the doctor is taking, so we don't know which skills specifically are being de-skilled, so to speak. Is it that the doctor is not looking enough because they're trusting what the AI is saying, or is the doctor looking enough but not checking what the AI is saying? It's unclear, I think, from the study, what specifically the modality of that AI use was. But if it's okay with you, I'd like to jump in and talk about that a little bit more.
Speaker 1: About human-AI interaction, because I think you're right, that's what this problem is. It's not just an AI issue.
Speaker 2: Right, right. And we have the overarching "so what" issue: so what if the doctor is getting de-skilled? Maybe we'll return to that towards the end of the episode. But the question I have here is: what exactly are the doctor and the AI doing together? There's a whole field of study, human-AI interaction, human-robot interaction. There are thousands of researchers and entire conferences devoted just to thinking about what that interaction could look like for different use cases, be it AI assisting in tutoring, AI assisting in medicine, AI helping pilots fly planes. It could be a whole range of different things, and every one of them has slightly different priorities and therefore different interaction design aspects.
Speaker 2: And when you think about human-AI interaction design, you kind of want to think about four different things. You want to think about the human involved and what they bring to the table. You want to think about the AI system involved and what it can do for the human. You want to think about the actual interaction modality: how exactly is the human interacting? Is the human asking questions? Is the human just reading off of the AI? Is the AI requesting information from the human, and so on?
Speaker 2: But then you also have other contextual factors, the environment and so on that the interaction is taking place in. All of this matters when it comes to designing a good human-AI interaction. And all of this is to say that you might get results like this, where it seems like the doctors are performing worse, and the confounding variable could be that it's not that the doctor is necessarily getting worse; it's that the specific interaction design wasn't optimal for the purpose. You could have an AI system working in real time with the endoscopist, trying to identify these polyps.
Speaker 1: Or you could have an AI system kind of in the background, that's passive or more of a second reader, where you don't necessarily get the results right away. Or you could have it act totally autonomously. But you're right, now that I think about it, it's not necessarily what the AI is doing or what the physician is doing, but actually what the interaction between the two looks like that is so important.
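As a rough sketch of that distinction, you could think of the interaction mode as an explicit design parameter that controls when and how much of the AI's output is revealed to the clinician. Everything below is an illustrative assumption, not a description of the study's system or any real product.

```python
from enum import Enum, auto

class InteractionMode(Enum):
    CONCURRENT = auto()     # AI flags findings live, while the clinician works
    SECOND_READER = auto()  # AI output withheld until the clinician commits a read
    SCOPED = auto()         # AI limited to a narrow subtask, e.g. detection only

def findings_to_show(mode: InteractionMode,
                     clinician_done: bool,
                     ai_findings: list[str]) -> list[str]:
    """Decide which AI findings to surface right now, given the interaction design.

    Purely illustrative; a real clinical system involves far more than this.
    """
    if mode is InteractionMode.CONCURRENT:
        return ai_findings                                 # show everything immediately
    if mode is InteractionMode.SECOND_READER:
        return ai_findings if clinician_done else []       # only after the human's own read
    if mode is InteractionMode.SCOPED:
        # show detections but strip characterization labels like "adenoma"
        return [f for f in ai_findings if f == "possible polyp"]
    return []
```

The point is simply that "second reader" versus "concurrent assistant" versus "tightly scoped tool" is a design decision, and each one shapes how much of the thinking the human keeps doing.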
Speaker 2: Yeah, and you have to remember that the physician has more information than the AI system, because the physician is not only looking at the images; they're actually feeling around inside, deciding where to go next, making plans and reasoning about that. Those are not things the AI system is doing. For the most part, it's looking at images. And if you imagine an interaction in which the AI and the human are looking at it together, and things pop up on the screen, and a little square box is drawn around certain things and given names, then that's going to offload that thinking from the human, assuming it's correct. But you can see the issue here: that boxing and labeling from the AI system is likely to be incorrect at times, because it doesn't have all the information the human has, and now the human has to work with that. It's interesting to explore this more deeply, and I think you hit the nail on the head. The AI system could be concurrent with the human; that's one scenario. Or it could be a second reader, where the human is encouraged to come up with their own analysis first before looking at the AI system's analysis. In that case it's assisting the human not for efficiency but for accuracy, and maybe the human can discard what the AI is saying after the fact, because if the human has considered things that the AI hasn't, that might be useful. And there are other ways to do this too. It could be the case that the AI's role is limited.
Speaker 2: Maybe the AI is only used for polyp detection versus polyp characterization. Maybe the AI is only supposed to do certain things, and therefore you control the degree to which the human begins to rely on the AI system. And this gets into a really hazy area, because again, we had a similar sort of thing in episode one with diabetic retinopathy, but the difference there was that the accuracy of those systems was very high. The diabetic retinopathy AI systems were up in the 99.9% accuracy range, so you could have the AI system just look at a picture and tell you whether or not there was something bad. It seems to me here, based on the research, that the accuracy of these systems is much lower for polyp detection and, more importantly, the false positive rates are much higher, which means they're going to flag things that are not really polyps as polyps, or flag things that are not the specific precancerous kind as adenomatous.
Speaker 2: Right, and so then it becomes a question of: how do you build trust? How does the human build trust with the AI system in that scenario, if it's producing false positives all the time?
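One way to see why false positives matter so much for trust is to compute what fraction of the AI's flags correspond to real findings. The numbers below are invented purely for illustration, not taken from the colonoscopy literature: even a sensitive detector can produce mostly spurious boxes when true findings are rare.

```python
# Illustrative arithmetic only; every number here is made up.
# Suppose a stretch of video contains 5 regions with a real polyp and 995 without,
# and the detector behaves like this:
sensitivity = 0.95          # chance a real polyp gets flagged
false_positive_rate = 0.10  # chance an unremarkable region gets flagged anyway

true_polyps = 5
non_polyps = 995

true_flags = sensitivity * true_polyps           # ~4.75 genuine catches
false_flags = false_positive_rate * non_polyps   # ~99.5 spurious boxes

precision = true_flags / (true_flags + false_flags)
print(f"Fraction of AI flags that are real: {precision:.1%}")  # roughly 4.6%
```

With numbers like these, most of the boxes the clinician sees would be false alarms, so the design question becomes whether the human learns to scrutinize each flag or simply learns to accept whatever gets drawn.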
Speaker 1: Okay, I have a confession to make, though. I feel like, if the AI put a box around something and said this is an adenoma, I might trust it more than I trust myself. And that's probably not a good thing, right? That's a problem in this situation, and I don't really know what to make of that. Sometimes I do feel like, oh, it's objective: it put a box around it and it says it's an adenoma, so we've got to take it out.
Speaker 1: Instead, and maybe this is part of the human-AI interaction that you design, maybe it just puts a box around it and says, look at this one, what do you think? Instead of flagging exactly what type it is. Because the more information it gives me, the more I'm like, oh, I believe exactly that. But if it's simply flagging it, you're creating a human-AI interaction where you're saying, okay, I've got to look at this one, but it's not telling me exactly what it is. Because I personally just assume it's correct if it flags it as X, and you're telling me, and the research shows, that it's definitely not always correct.
Speaker 2: Yeah, and in support of the study, one point that could be made is that it's actually working, right? Whatever interaction design they have going works, because the ADR, the adenoma detection rate, went up.
Speaker 1: So, in a sense, with the AI... yeah, the rates went down when they took the AI away, but they went up with human plus AI.
Speaker 2:So you know, I would imagine that there might be other follow-on studies, um, in which they compare different interactional design patterns between the human and the AI and see which ones produce the best improvements in performance for the human AI team. Right, yeah, that's what you have here. You have here a human AI team and that's not the worst thing in the world, right? And so that's kind of what motivated the study in the first place. Now the question is they went back and they said well, the humans don't have that skill anymore. After three months, seems to have lost, seems their performance declined. Now it's unclear again whether the performance declined temporarily because they had just come off of using the AI system and they were used to that new paradigm, or it was there some actual degradation and memory loss. Memory loss is a strong word, but there's, you know, there's a bit of if you, you know you lose it if you don't use it, kind of you know kind of thing. Is that happening as?
Speaker 1:well, it's not like riding a bike. As it turns out, autonomic detection is not like riding a bike.
Speaker 2: That's right, it's not as internalized. Maybe that's the case; we don't know that from the study itself. But the study does show that, yes, the humans do better with the AI system when they've been exposed to the AI system.
Speaker 1: But okay, this is interesting to me, because we have humans here who are trained, who have been doing this for a long time. Now imagine new trainees coming into the system, new doctors being trained for the first time in how to do endoscopies. That's a whole different ballgame, because now it's kind of like the digital-native version: they're AI natives, coming into a system where AI is basically table stakes, and they're used to having these detection tools from the beginning. And this is interesting to me because we talk a lot about having a human in the loop, and obviously there are humans involved here, because there is human-AI interaction. But if you have humans who have only developed the skill with AI, how do you make sure that adenoma detection remains good? What if the lighting were different, or the camera changed, or something else happened to the AI system? Would these humans be able to detect that?
Speaker 2: That's a great question, and again I want to go back to the human-AI interaction paradigm, this task of interaction design, where you're designing what the interaction should look like for different tasks and different purposes. The task in a real clinical setting, for a doctor trying to do this, might be very different from a training session. In a training session, that interaction design might be entirely different. You could actually use an AI tool like this to help trainees by not giving them the answer right away, by making them work for it, and then slowly building up the use case.
Speaker 2: So you train them, you teach them: okay, this is the thing you missed, or that is the thing you missed. It's a totally different problem to solve, but totally doable, right? And that's a different interaction design. You're not just giving them all the tools immediately; you're helping them understand first principles before they work their way up to using the tools.
Speaker 2: It's really funny, because the example I keep thinking about is chess. I play a lot of chess, and computer systems are way better than any human right now. People use computer engines, as they're called, to help them with their chess game. The way they do that is they play the game, and then afterwards they analyze their own game to look for mistakes. You can just turn on the engine at that point, and the engine will tell you this move you made was really bad. It might not tell you why or how; it'll just tell you this move was really bad, and it's correct. Now it's up to the human to decide what to do with that information. As a trainee, you're encouraged not to use the engine, to think for yourself and figure out why that move was bad, and the way you do that is literally to work through it and work hard. You rack your brain and think about what was difficult and what was wrong about that move. But by doing that once or twice, you are now better equipped not to make that mistake again in the future. And so you could imagine a similar training paradigm here as well, where trainees work without the AI system, training and learning the actual detection task, and then use the tool as almost a superpower that lets them do things faster and better. But it comes right back down to the specific interaction design.
Speaker 2: But with all that said, I do want to return to the overarching question we started off with, which is: so what? Who cares? So what if we lose some of these skills? I mean, we all use calculators now. We all use computers for things. We don't use typewriters anymore. Pilots use autopilot all the time; they rely on it. So humans already rely heavily on technology, and yes, of course, if you take all that away, we're not going to do things as well, as fast, or as accurately. Is that such a bad thing?
Speaker 1: Yeah, that's a really good question. We're developing new tools all the time, whether they're AI or not. In medicine, you think about how people used to practice without stethoscopes; right now, I couldn't imagine walking around without one. I don't have the answer to that question, but I think in some situations it could be okay to use AI as a tool, to depend on it in a way, when you've designed an interaction that's appropriate. But you still need to have oversight of what that interaction looks like and how that tool functions, and make sure it's functioning properly.
Speaker 2: Yeah, and also, I think there are different kinds of tools. I think about stethoscopes sometimes: before the stethoscope, how did people listen to heartbeats? Did they literally put their ear on someone's chest and listen to the heart and the lungs? That was clearly not efficient. The stethoscope was way better than doing that, and it just amplified a signal that already existed, right?
Speaker 2: That's not what this AI is doing here, though. It's not taking your image and giving you a clearer, better look. That would be a tool that improves things, that gives the human more information or more relevant information. What it's doing here is doing some of the human's job. It's doing the reasoning job of figuring out whether something is a bad type of adenoma or whatever, and that's a different type of task. A lot of AI right now has been about automating human reasoning, and I think we should be a little careful with that, because it might seem like the AI systems are doing reasoning, but they're not. And to some degree, that's where the over-reliance comes in, because as a doctor you might feel like, oh, it's giving me a nice little box and a very medical-sounding name for the thing.
Speaker 1: Exactly, it must know better. And it's like taking the mental load off me, right? Now I can concentrate on just getting the best picture or whatever it is, because that mental load has been removed. You basically take that weight off and concentrate on other things, which is not necessarily the best thing to do.
Speaker 2: Right, and it's a decision-making step, a step of characterizing and detecting. It's not a step of receiving a signal and improving the quality of that signal. That's how I think about the difference between them. Autopilot is another example: airplanes fly themselves for the most part, and that's great, but there's always a human sitting there watching everything who can jump into manual mode as needed. You can imagine a system like that here, but I don't know that we're at that level of robustness. In those autopilot situations there's a lot of redundancy and a lot of guarantees about the performance of the systems. Here we don't have all of that, so I think it's a little different. In that scenario you have to really trust the system to do the right thing for the most part, and I don't think we're quite there yet with these sorts of computer-aided diagnoses.
Speaker 1: Yeah, it sounds like we're not quite there yet, but there's potential promise for some of these systems, especially when there is appropriate human-AI interaction.
Speaker 1: Yeah, so I guess it comes back to the question I asked at the beginning, which is grammatically correct, I'm just pointing out: is AI making doctors stupider? And I don't think we really have an answer, because what we unpacked in this episode is that it's not so simple. It's not just the human and the AI; it also has to include the environment you're in, the interaction between the human and the AI and its design, and the task at hand, training versus doing the actual...
Speaker 2: Thing.
Speaker 1: Right, the design of that interaction, yes, is key here.
Speaker 2: Yes, it can make all the difference. I think that's the big takeaway. So when people read these stories, I would want them to think about AI not as this monolithic thing, but specifically about how it's being used. Because without that clarity, we won't know exactly what a result like this means, as we've seen here.
Speaker 1: Yeah, well, this was a great conversation. Thanks for joining us, and we'll see you next week on Code and Cure.
Speaker 2: Thank you.