Code & Cure

#9 - Ambient Documentation Tech: Reducing Burnout or Creating New Problems?

Vasanth Sarathy & Laura Hagopian

AI is writing medical notes, but can doctors trust what it creates?

Burnout is quietly eroding the medical workforce—and documentation overload is a major culprit. Physicians now spend nearly half their workday writing notes instead of treating patients, pushing many to the brink of exhaustion. Could artificial intelligence offer a lifeline?

In this episode, we explore ambient documentation technology (ADT)—AI tools that automatically generate clinical notes by listening to patient-doctor conversations. On paper, the promise is bold: let physicians focus on care, not charting. But reality is more complicated.

Laura shares her firsthand experience with late-night charting and the emotional toll of juggling empathy and efficiency. We unpack the deeper roots of burnout—beyond paperwork—including overwhelming patient loads, chronic understaffing, and a culture that often punishes vulnerability.

AI-generated notes surface an intriguing paradox: human communication is effortless for doctors, but incredibly complex for machines. What a physician instantly grasps from a patient’s gesture or tone can easily confuse an AI system. The result? Notes that sometimes omit critical context, add irrelevant details, or introduce factual errors.

Early research reveals mixed outcomes—some clinicians spend extra hours editing AI notes, defeating the intended time savings. Yet there’s potential. With advances in multimodal input and smarter evaluation tools, ADT could still become a powerful support tool—not to replace doctors, but to restore their time.

Tune in to discover why turning conversation into clinical documentation is one of AI’s most challenging—and potentially transformative—tasks in modern healthcare.

References:

Evaluation of an Ambient Artificial Intelligence Documentation Platform for Clinicians
Stults CD, McDonald KM, Niehaus KE, et al.
JAMA Network Open, 2025

Ambient Artificial Intelligence Scribes to Alleviate the Burden of Clinical Documentation
Tierney AA, Gayre G, Hoberman B, et al.
NEJM Catalyst Innovations in Care Delivery, 2024

Credits: 

Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/

Speaker 1:

Imagine spending half your day not with patients but with paperwork. For many clinicians, that's the unfortunate reality. But what if the note could write itself while they focus solely on care? Could that be the future of medicine, or is it too good to be true?

Speaker 2:

Hello and welcome back to Code and Cure. My name is Vasanth Sarathy and I'm here with Laura Hagopian. Hey, Laura!

Speaker 1:

Hi!

Speaker 2:

How's it going?

Speaker 1:

I'm doing well. How are you?

Speaker 2:

Good, thank you. Today's topic is all about how we can use what's called ambient documentation technology, or ADT, which is jargon for the idea that you can record patient-doctor conversations and then use AI to turn them into a doctor's note. And of course, the argument is that it can maybe help with the documentation burden that doctors experience. But before we really jump into the nitty-gritty of that, Laura, I want to talk to you more about the causes of doctor burnout, because it seems like documentation is one of them, but I would love to hear a broader perspective from you on that.

Speaker 1:

Yeah, absolutely. I mean, providers have sky-high burnout levels, and it definitely depends on the type of practice you're doing, your specialty, et cetera. But there are a lot of factors that contribute to it. There are organizational factors, things like inefficient processes, and I would put the electronic health record in there as one of the top things. But there are other organizational problems like high workloads and insufficient staffing. If you don't have enough people working and you have too many patients coming in, it's stressful. There's also a lack of autonomy, not being able to necessarily choose what you're going to do next.

Speaker 1:

There's definitely leadership, or a lack of leadership support, and the culture that exists within the organization. For instance, if there's conflict between departments, that can lead to burnout. And, near and dear to my heart because I worked in the emergency department, frequent interruptions, which happened all the time there, can also lead to burnout. They're necessary, right, if a new emergency comes in, but they can really have an impact over time. And, like I mentioned before, it's definitely worse in high-stress environments, in places like the emergency department.

Speaker 1:

There are also individual characteristics that come into play here, things like empathy and altruism, but also perfectionism. I don't know, I might have a little bit of that myself. And there's the culture of medicine, which stigmatizes self-care in a way. So there are a lot of things that come into play here, and the electronic health record is certainly one of them. And I can tell you, I'm old enough that I have in fact documented on paper before, not just electronically. The electronic health record actually increases the amount of time that is needed to document. There are statistics out there showing that people spend a quarter to half of their time documenting in clinic.

Speaker 2:

That is incredible to me. It seems like such a high number. The benefit of the electronic health record is that there's some kind of unification across different hospitals, across different medical centers and such. It's meant to make life easier for people, and it seems like it's really not, in some ways.

Speaker 1:

I mean, it's interesting, right? It's like, okay, all the information is in one place. I can see my x-rays, I can see the labs, I can see what the specialist said. But there's also, and we've talked about medical summarization before, so much information there. So unless you have infinite time, it's impossible to go through all of it. And these notes can get pretty bulky over time too, because there's a lot of copy and paste that happens. So sure, you have all the information at your fingertips, but then you have to sift through it all, and you have to, of course, write it all. And if you want to be able to bill at an appropriate level, you have to make sure all that documentation is present to be able to do so.

Speaker 1:

And, yeah, I talked about how much time people spend when they're in clinic, but I would bring this stuff home. I would be doing notes in the evenings, because I was taking care of patients during the day, or night, as it was; I did work as a nocturnist. But then you come home and you're like, geez, I have to finish my notes because I wasn't able to do that earlier. And sometimes you have to finish them right away, especially if you're transferring the care of a patient to somebody else in the emergency department. So there's a lot of pressure, there's a lot of extra time spent on things like that, and it's often not accounted for in terms of reimbursement.

Speaker 2:

So I think that's kind of where we're at right now with this challenge of burnout. But in today's episode we're going to focus specifically on the burden of actual documentation, like you just talked about, and the question, which we brought up at the beginning of the episode, is: can AI help with this? Because of the way AI systems are set up, they're meant to be language-based and chatty. So in theory, one would think they have the capability to listen in on a conversation and then quickly translate it into something more structured, maybe into a format that's beneficial to doctors, or a format that's amenable for the notes themselves.

Speaker 1:

Yeah, exactly. And actually my doctor does this now. If I have a visit with her, whether it's telehealth or in person, she turns on this, what is it called? Ambient documentation technology?

Speaker 2:

So it will help write a note in the background.

Speaker 1:

Yeah, and she actually sent me the text once, the full transcript and the note that it created.

Speaker 1:

And it was interesting.

Speaker 1:

When I saw it I was like, okay, some of this is true, some of this is not quite true. The full transcript didn't necessarily all make it into the note itself. And what she showed me was what's called a SOAP note, which is one of the traditional ways that we write notes in medicine. There are lots of ways you could write notes in medicine, and not everyone writes a SOAP note, but I think for the purposes of discussion it's a great example of what we might write.

Speaker 1:

And so SOAP stands for Subjective, Objective, Assessment, Plan. Subjective is the stuff that comes right out of the patient's mouth. It's their own words, their experiences, their symptoms, their concerns, what they're telling you. Objective is what you observe and what you measure, whether that's the physical exam, the x-ray finding, the lab test, whatever it is. When you put those together, you can then make an assessment, where you interpret the subjective and objective pieces together and think through, okay, here's what the diagnosis could be, or here's a list of what the diagnoses could be. It's kind of like a summary. And then the last part, P, the plan, is: here are the next steps, here's what we're going to do for testing, for treatment, for follow-up appointments, et cetera. And so that's a pretty generic structured note that I would want to see out of this kind of ambient documentation technology.
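
The SOAP structure described here maps naturally onto a simple data shape. A minimal sketch in Python (the field names and example content are illustrative only, not any EHR vendor's schema):

```python
from dataclasses import dataclass, field

@dataclass
class SoapNote:
    """Minimal sketch of a SOAP note: Subjective, Objective, Assessment, Plan."""
    subjective: str = ""   # the patient's own words, symptoms, concerns
    objective: str = ""    # what is observed or measured: exam, vitals, labs, imaging
    assessment: str = ""   # interpretation: likely diagnosis or differential
    plan: list[str] = field(default_factory=list)  # next steps: tests, treatment, follow-up

# A hypothetical short-visit note, like the sore-throat example above.
note = SoapNote(
    subjective="Sore throat for two days, painful swallowing.",
    objective="Temp 38.1 C; tonsillar exudate, tender cervical nodes.",
    assessment="Likely streptococcal pharyngitis.",
    plan=["Rapid strep test", "Supportive care", "Follow up if worse in 48 hours"],
)
```

The point of the shape is that the four fields carry different kinds of content, so an ADT system has to do more than transcribe: it has to route each utterance to the right field.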

Speaker 2:

How long are these notes?

Speaker 1:

Well, it depends, right? I mean, if you have a five-minute visit for a sore throat, that could be a very short note, and you bill in line with that. You're not necessarily doing a full head-to-toe physical exam on someone who's got a sore throat. Whereas if someone comes in with, I don't know, abdominal pain, you're taking a longer history, you're doing a more in-depth physical exam, you're doing more testing. Those notes can get significantly longer.

Speaker 2:

Right, right. And I would imagine that these conversations with the patient, and I don't just imagine, I've also had these experiences, are not just about the one thing you came in complaining about. Oftentimes that opens the door to, maybe that's a sign that something else is wrong, and you shift gears and talk about that something else, because that's the root cause. So it's almost investigative in a sense. Sometimes these conversations can take turns that you didn't plan for at the beginning.

Speaker 1:

Yeah, sure, there's definitely a lot of back and forth, trying to figure out what's going on, trying to observe what's happening in the room. And honestly, being able to actually fully interact with the patient rather than documenting during the visit is something that sounds really great, right? I think a lot of times providers have trouble with that. If you're supposed to be doing your documentation at the same time as you're in the room, you're not making eye contact, you're not always getting the nonverbal cues, potentially. And so having this ambient documentation technology could actually open that up, give you a little bit more freedom to truly interact with a patient.

Speaker 2:

Yeah, and I want to take a step back before we dive into more about the specific technology itself. There are other ways that people are dealing with this problem, right? People have scribes now, and interns and such, to help write things down. So can you talk a little bit about what's out there outside of ADT?

Speaker 1:

Yeah, absolutely. When I charted, we could obviously type on our own.

Speaker 1:

There were macros, so there were templates: you could say, okay, here's what a normal physical exam looks like, and you could import that into your notes so it'd be quicker to document. There was also voice documentation, so you could dictate your note, which was much faster. I did work in the days where you could actually call a phone number, dictate your note, and someone would type it up for you. But then we progressed to voice recognition technology. And then, of course, and I never actually got to experience this where I worked, a lot of places do have scribes. A scribe can come along with you to the appointment and do all the documentation. They can observe what's happening in the room, not just what is being said but what's occurring, right? What piece of a physical exam someone might be doing, or maybe some of the nonverbal cues. And they can work on documenting and structuring that. So it's something that requires another human being to be present, but it definitely decreased the workload on providers as well.

Speaker 2:

A question on that. What is the training that a scribe gets? Are they medical students, or is it just a person writing things down? And even outside of what their training is, I'm curious what the interaction is between you as a doctor and a scribe. Does the doctor take the notes the scribe produces and file them right away, or what happens with those notes?

Speaker 1:

I mean, I personally have never used a medical scribe, so it's hard for me to answer that question. They definitely have to have some training, right? They need to know medical terminology, they need to know anatomy, they need to understand HIPAA regulations. So they often have training through online programs, community colleges, scribe companies, et cetera. But I never got to experience a note written by a scribe, so I'm not sure how much editing it required after the fact.

Speaker 2:

Yeah, I was just curious more than anything else. So into this context, this world of ways to reduce documentation burden, now enters AI: ambient documentation technology.

Speaker 1:

Dun dun dun. Or maybe I should have been more positive.

Speaker 2:

That's right. It's funny because it seems like such a natural use for conversational AI. We already have these machines that can chat, that can talk. This seems like a great use case that would potentially help reduce the burden. Now, you and I, over the past week, have been reading some papers where people studied this technology in real medical centers in various parts of the United States, and we were kind of surprised, because, yes, it seemed like it helped some people, but it didn't help everybody, right? Can you speak to that a little bit? I remember reading, and we'll share these papers in the show notes, snippets of free, open comments that people were allowed to make, in addition to answering the survey questions, about their experiences with using the ADT system. Do you remember what some of them said? It was very interesting, I thought.

Speaker 1:

Yeah, well, actually, before I answer that question, I want to back up, because a lot of the providers didn't actually participate. They invited people, and many didn't fill out the surveys, and the surveys were really long, so I don't blame them.

Speaker 1:

If I'm going to get a survey with 30 or 50 or 70 questions, I forget how many it was, I don't think I would fill that out either. So I think you have a skewed population to begin with, of people who maybe are already using it or invested in it and want to see it move forward. But even those people would say things like, hey, this note is really bulky. Well, that's going to cause problems moving forward, because I'm going to have to read that note when the person comes into the emergency department, and I don't have time to read that bulky note in addition to the 700 others that are located in the chart.

Speaker 1:

People also noted it did a bad job with certain things.

Speaker 1:

If you had a longer visit, like the 90- to 120-minute visit you might get with a psychiatrist, it just distilled it into a really short note, no matter what. It didn't do a great job with spiritual topics. It didn't do a great job with the pediatric physical exam. So there were a lot of things that got called out: okay, it could decrease our documentation burden, but it also has these problems. And the one that I saw in the free text, which they weren't really studying in this paper, because they were studying well-being improvement and decreased burnout, was that somebody noted, hey, there are frequent errors in here. That's the thing that scares me the most, because you're relying on it as your memory, especially if you're not doing the note right away. If you're doing the note a day or two days or five days later, you're relying on it to help you remember what actually happened. So if there are frequent errors, that's not great.

Speaker 2:

Yes, and that is a big topic of today's podcast: the accuracy of these models. I also noted that in that study, I think, they gave people a survey, then had them voluntarily choose to use or not use this ADT tool for 42 days or so, then gave a mid-term survey, then had them use it for another 42 days, and then an end survey. And they got some interesting results from all of that.

Speaker 2:

And I thought one comment was particularly relevant, related, in my view, to accuracy: somebody said it added an hour to two hours to their work, to fix the notes. Which, to me, is ridiculous. The whole point of this is so that you don't have to do that. Now you have to look at the thing and make sure it's correct, and that, to me, is harder than actually writing it. Because if you're tired, you're not looking closely, you don't care much, and you think it's mostly correct, you're going to let it pass. And that, to me, is more problematic.

Speaker 1:

So why does that even happen? Shouldn't it be able to do this? I guess if you're talking about a word-for-word transcription, then fine, it could do a good job of that. But now you're asking it to interpret that into a structured format, for example a SOAP note, and it's taking that information and changing it and adjusting it. I'm just curious: why do these inaccuracies occur?

Speaker 2:

So I think that's a great question, and I'm very excited about this topic for this reason, because this is something I've been researching for several years now: human natural language understanding and the layers of understanding. It's not just the words; it's often what you don't say, it's often how you say it. All of that matters. So if you look at the accuracy here, we can start thinking about it in terms of what types of errors are being made by these systems. There are omissions, for example, where they leave out something that was potentially important, because the system doesn't realize that one thing is more relevant than another. There can be additions, where they add things in, not necessarily incorrect things, but random things that may not be relevant. And then there are just incorrect facts, things that were said but summarized incorrectly, as in somebody—

Speaker 1:

Like translated wrong.

Speaker 2:

Translated wrong, right, exactly. And there are other modalities of errors as well that can come up, but that's assuming everything else is correct. So let me just talk for a second about the process by which a voice conversation gets turned into a SOAP note. Assume we have this AI model available to us. The first thing that happens in a lot of these systems is they take the voice conversation, which is in an audio format, and transcribe it into a text version.

Speaker 2:

So just stop there for a second and think: I might say a lot of things, and it would track all my words. What am I missing? Well, I'm missing timing information. I'm missing pauses. I'm missing what's called backchanneling, which means things like "uh-huh," the signals in conversation that you understand something or don't. I'm missing any kind of body language, like when you make a weird expression and I'm like, wait, did I say something wrong? And I pause, right? So you're missing that. You're missing prosodic information, which is the stressing of certain words versus others.
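
To make this concrete, here is a toy sketch (hypothetical structures, not any real ADT pipeline) of why a plain transcript is lossy: the audio turn carries timing, emphasis, and backchannel signals, but the text a note-writing model sees keeps only the words:

```python
from dataclasses import dataclass

@dataclass
class AudioTurn:
    """One speaker turn as captured from audio (toy representation)."""
    speaker: str
    words: list[str]
    pause_before_s: float     # hesitation before answering
    stressed_words: set[str]  # prosodic emphasis
    backchannels: list[str]   # "uh-huh"-style listener signals

def transcribe(turn: AudioTurn) -> str:
    """A naive transcription step: keeps the words, silently drops everything else."""
    return f"{turn.speaker}: {' '.join(turn.words)}"

turn = AudioTurn(
    speaker="Patient",
    words=["The", "pain", "is", "here"],
    pause_before_s=2.5,       # long hesitation -- lost in the transcript
    stressed_words={"here"},  # emphasis -- lost
    backchannels=["uh-huh"],  # listener signals -- lost
)
print(transcribe(turn))  # prints "Patient: The pain is here" -- but where is "here"?
```

The surviving word "here" is exactly the kind of referring expression discussed below: without the pointing gesture, the transcript alone cannot resolve it.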

Speaker 2:

All of this is something that a human doctor, and I would argue a human scribe, is seeing and capturing.

Speaker 1:

Yeah, I think your scribe is definitely going to see that. And your scribe is also going to see, to piggyback off what you said, things that are shown instead of said. Like if I say, where's your pain, and someone points to the right lower quadrant of their abdomen, that's what I want to know, and I want that information in my note. Patient presents with right lower quadrant abdominal pain, not: patient presents with pain, I asked where it was, it's here.

Speaker 2:

Where's here? They might just be pointing and saying, it's here, right? And so there's a whole host of what are called referring expressions, where you use words like "that" and "this" and "it" and so on. These are compact words we use in conversation to refer to things in the real world without having to fully describe what we're referring to. You can say "that mug" or "this cup." And sometimes there are no words at all, because you're just pointing, and "here" is the word that's recorded, but it doesn't mean anything by itself. It's a referring expression, and a pronominal, shortened form of one at that: the word "here" or "it" or "that" inherently has no meaning outside of the context in which it's used. So just the audio transcription to text already loses potentially very critical information. Then you have the text piece, which is what they're evaluating accuracy on.

Speaker 2:

It contains what the patient said, completely true, I agree. It contains all the words the patient said. Does it take into account the patient's history while interpreting what the patient said? Probably not. Maybe it could: you could submit all that information to the AI model and have it take that into account, to make sense of what the words actually mean in the context of that specific patient.

Speaker 2:

People use words differently, right? My nine-out-of-10 pain might be different from your nine-out-of-10 pain, that sort of thing. Broadly, the idea is that my words might mean something different coming from me. It's something, again, that a doctor who has interacted with this patient repeatedly can suss out, can identify. And then we talked about how the patient said it. What we didn't talk about is all the things the patient didn't say but implied, which the doctor knew, because the doctor understands the patient, because the doctor is human.

Speaker 2:

There's a lot in human conversation, beyond "here" and "that," that we simply assume and take into account. And this is all separate from figuring out what is relevant versus irrelevant to include in the notes. So there are layers here, just in the subjective part of your SOAP note, that need to be recorded, that could easily be missed, and that are not captured by the accuracy measures. And actually, what I found particularly troublesome is that in all the papers that I read, granted, I didn't do an intensive survey, but just from my reading, there was very little focus on the actual accuracy of these models.

Speaker 2:

There were certainly concerns about burden and burden reduction. And there's a host of datasets out there to train these models, and if you look at the datasets, they'll give you conversations, be it audio or whatever, and they'll give you SOAP notes to train on. Right, so human-written SOAP notes.

Speaker 2:

Maybe sometimes there'll be multiple human-written SOAP notes, and so we have example data points. However, how do you know that a newly produced note is correct or incorrect? It's not just matching the words against somebody who has written these notes before. It's much more involved than that. And so I think you're still going to miss a lot of information, and it's hugely problematic from that standpoint how even an accuracy measure should be computed.

Speaker 2:

I think more research needs to be done on developing better metrics for identifying whether or not a SOAP note is correct. And when I say correct, I don't just mean accurate; I mean it captures the relevant information, leaves out the irrelevant stuff, and then, in the case of a SOAP note, produces an assessment over and above that. So there's a whole bunch of other things that need to happen: here's what the patient said, here's what the vital signs say, here's what the history is, and here's an assessment.
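
One way to see why word matching falls short is to sketch the crudest possible evaluation: a fact-level set comparison between a clinician-written note and an AI-generated one (a hypothetical illustration, not a published metric). It can name omissions and additions, but it treats a contradiction as just one omission plus one addition, and says nothing about relevance or clinical reasoning:

```python
def compare_facts(reference: set[str], generated: set[str]) -> dict[str, set[str]]:
    """Crude fact-level diff between a reference note and a generated note."""
    return {
        "omissions": reference - generated,  # facts the AI note left out
        "additions": generated - reference,  # facts the AI note introduced
        "matched": reference & generated,    # facts both notes contain
    }

# Hypothetical extracted facts from the two notes.
ref = {"RLQ abdominal pain", "fever 38.1 C", "no vomiting"}
gen = {"RLQ abdominal pain", "fever 38.1 C", "patient reports vomiting"}

diff = compare_facts(ref, gen)
# "no vomiting" shows up as an omission and "patient reports vomiting" as an
# addition -- but set overlap alone cannot flag that the pair is a direct
# contradiction, i.e. an incorrect fact.
```

Real evaluation would need to extract those facts reliably in the first place and then judge relevance and consistency, which is exactly the open research problem being described.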

Speaker 1:

Right, that requires clinical reasoning. And that part is interesting to me, because sometimes I have an assessment and plan ready to go, and sometimes I have to sit for a moment and think about what it is. Or I might say to a patient, hey, we're going to run some lab tests, we're going to get you started drinking this for a CAT scan that we need to do. I'm worried about X, Y, or Z. I'm worried that it could be your appendix, it could be this, it could be the other thing. And so I'll give them sort of a plan. But the plan that I might document in the note is not "we're going to run some lab tests"; it's "we're going to get a CBC, a CMP." You know what I mean?

Speaker 1:

There are more details about which lab tests I'm going to order. It's not just, yeah, I'm going to get a CAT scan; it's a CAT scan of the abdomen and pelvis. And we're going to get you some pain medication; well, I'll probably tell the patient exactly what pain medicine I'm going to give them. But you see what I'm getting at: there's definitely some nuance between what is communicated to the patient and what exactly goes in the chart. Or I might need to step out of the room and think, hmm, this is an interesting case, I want to figure out the best next step here. The assessment might just be happening in my head. It's not necessarily always fully communicated to the patient, especially when pieces of it might be happening outside of the room or over time. Maybe we get the CAT scan results back and that changes the assessment and plan.

Speaker 2:

Yeah, and maybe the assessment is not something that is extracted from the dialogue at all; maybe it comes from the AI system doing its own reasoning, and that has its own set of problems.

Speaker 1:

I'm sure you could just like go down a rabbit hole here.

Speaker 2:

And we won't go down that rabbit hole. But I do want to stress again for our listeners that this is not just a conversation you have with ChatGPT, asking it what the weather is or how far Paris is from, you know, Marseille. It's more than that. It's what NLP people, AI people, call situated dialogue. You have a dialogue that's happening, it's often goal-oriented, as it is in this case, and it's situated, which means there's a host of contextual factors that come in when interpreting the language itself.

Speaker 2:

And I've done some research in this space, not quite in the medical space itself, but in a different setting, in which we had LLMs listen in on, or look at, transcripts of conversation and extract the mental states of the people, to see whether I believe something that you believe. A human reading a transcript would immediately know: if you tell me there's a chair in this room, then I know that you know there's a chair in the room, right? And LLMs are not very good at figuring out the mental states of the people talking, and that's important in this setting.

Speaker 1:

I also think, when you're talking about mental states in particular, there's the question of, okay, how do I feel in this room? How is this interaction making me feel? Or, like you were saying before, if I asked you a question and you paused and took a long time to answer, and your voice sounded kind of down, that's very different than if you answered really quickly and were excited about it, right? And none of that stuff gets captured.

Speaker 2:

Yes, yes. And, in defense of the AI folks, there are language models that take in audio input directly; they don't transcribe. Maybe they'll do better. Maybe, if they're trained on human conversation, they can actually do a little bit better. We don't know that yet. But independent of the audio transcription issues we talked about, even if you had a good transcript, how much of the text actually conveys what's happening is still an open question, and I think a lot more research needs to be done on that front. And in some of the research that was done, even looking at the text, they found a host of inaccuracies. The omissions and the additions were substantial; the percentages are pretty high. We'll share the papers, but they're pretty high. What that means to me is you're going to have notes that are produced with errors in them, multiple errors, and it's going to take physician time to fix them.

Speaker 1:

Or if they don't get fixed. Now that's like a permanent part of the chart that gets carried forward and could impact that patient's care.

Speaker 2:

Yeah, yeah, yeah.

Speaker 1:

So I think there's definitely an opportunity here, right? I don't want to poo-poo this, because I think there is an opportunity to improve documentation burden and to lower burnout. And I personally hated documenting, and I definitely hated documenting when I was taking it home after work. But there are significant concerns here, and now I understand a little bit more about why. There are significant concerns about the accuracy of these notes. So, like you said, this should be an area of active research, but I am concerned about where it's at right now. I guess I'll leave it at that.

Speaker 2:

Yeah, on that note, thank you for joining us.

Speaker 1:

We'll see you next time on Code & Cure.
