Code & Cure
Decoding health in the age of AI
Hosted by an AI researcher and a medical doctor, this podcast unpacks how artificial intelligence and emerging technologies are transforming how we understand, measure, and care for our bodies and minds.
Each episode unpacks a real-world topic to ask not just what’s new, but what’s true—and what’s at stake as healthcare becomes increasingly data-driven.
If you're curious about how health tech really works—and what it means for your body, your choices, and your future—this podcast is for you.
We’re here to explore ideas—not to diagnose or treat. This podcast doesn’t provide medical advice.
Code & Cure
#20 - Google Translate Walked Into An ER And Got A Reality Check
What if your discharge instructions were written in a language you couldn’t read? For millions of patients, that’s not a hypothetical, but a safety risk. And at 2 a.m. in a busy hospital, translation isn’t just a convenience; it’s clinical care.
In this episode, we explore how AI can bridge the language gap in discharge instructions: what it does well, where it stumbles, and how to build workflows that support clinicians without slowing them down. We unpack what these instructions really include: condition education, medication details, warning signs, and follow-up steps, all of which need to be clear, accurate, and culturally appropriate.
We trace the evolution of translation tools, from early rule-based systems to today’s large language models (LLMs), unpacking the transformer breakthrough that made flexible, context-aware translation possible. While small, domain-specific models offer speed and predictability, LLMs excel at simplifying jargon and adjusting tone. But they bring risks like hallucinations and slower response times.
A recent study adds a real-world perspective by comparing human and AI translations across Spanish, Chinese, Somali, and Vietnamese. The takeaway? Quality tracks with data availability: strongest for high-resource languages like Spanish, and weaker where training data is sparse. We also explore critical nuances that AI may miss: cultural context, politeness norms, and the role of family in decision-making.
So what’s working now? A hybrid approach. Think pre-approved multilingual instruction libraries, AI models tuned for clinical language, and human oversight to ensure clarity, completeness, and cultural fit. For rare languages or off-hours, AI support with clear thresholds for interpreter review can extend access while maintaining safety.
If this topic hits home, follow the show, share with a colleague, and leave a review with your biggest question about AI and clinical communication. Your insights help shape safer, smarter care for everyone.
Reference:
Accuracy of Artificial Intelligence vs Professionally Translated Discharge Instructions
Melissa Martos, et al.
JAMA Network Open (2025)
Credits:
Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/
Your father is being discharged after surgery. He doesn't speak any English. Would you trust an AI system to translate his care instructions?
SPEAKER_00:Hello and welcome to Code & Cure, where we decode health in the age of AI. My name is Vasant Sarathi, and I'm an AI researcher and cognitive scientist. And I'm here with Laura Hagopian.
SPEAKER_02:I'm an emergency medicine physician and I worked in digital health. Awesome. So we're going to talk about machine translation today. Yeah, and specifically of discharge instructions, which is something I used all the time when I worked in the emergency department.
SPEAKER_00:Yeah. So, not being in the medical field myself — what are discharge instructions, and where do translations fit in?
SPEAKER_02:Sure. Maybe it's easiest to run through an example. Say you come to the emergency department with an asthma flare. We've treated it, you're doing better, and it's time for you to go home. Now we have to say, hey, here's what's next: some basic education about asthma flares, but also personalized information — here's a new medication you're going to take for five days, here's what to do if your symptoms get worse, here's when you need to see your primary care provider again. So there's generic stuff, and then there's stuff that's very specific to your case. And if someone doesn't speak English, you either have translated instructions, which we had in some languages — a printed copy you hand to someone and go over with them verbally too. Because I can tell you, it's impossible to remember everything someone says to you, so you want to be able to go back and review it. But you can only do that if it's in a language you understand. So it's great when those discharge instructions are translated. The generic information — say, general information about asthma — might be available in Spanish and other languages. But the specific information about your medication change, your follow-up, et cetera, doesn't necessarily get translated. So you often get an interpreter, either in person or over the phone or video, to go over those instructions with the patient. And then you hope they remember them, because what you've handed them is in English.
So not all of it ends up getting translated.
SPEAKER_00:And they're not necessarily annotating it as you're giving them this personalized stuff, right?
SPEAKER_02:Yeah, they're trying to absorb it and take it all in. This is why written discharge instructions can be so useful — but of course, very difficult if they're not in a language you can review them in.
SPEAKER_01:Yeah.
SPEAKER_02:And by the way, I worked in inner-city emergency departments. I used interpreters all the time.
SPEAKER_00:Right.
SPEAKER_02:There were even times, especially on the night shift, when you had a rare language and couldn't find an interpreter even over the phone. But most of the time we had them, either in person or, if not, over video or phone.
SPEAKER_00:Okay. So that's the world of discharge instructions. The ideal is something very personalized, in a language the patient can understand and follow. And that's challenging, because you need interpreters and all that. It also needs to be accurate, right? The translation needs to be actually clinically accurate.
SPEAKER_02:Mm-hmm. And like I said before, generic information can be pre-translated. But if it's 2 a.m. and you're discharging someone who speaks Khmer, and you're trying to give them very specific instructions about their follow-up or their medications, that's really hard to get translated into text in the middle of the night.
SPEAKER_00:Right. So that's where AI and machine translation techniques come into play. The hope is that these machines can do what you're asking them to do here.
SPEAKER_02:It's even more timely access, right at your fingertips. But you have to make sure it's accurate. That's the thing: if it's not accurate, you're giving people the wrong information.
SPEAKER_00:And you're not in a position to know that it's accurate, because I can't vet the translation myself — not in every language.
SPEAKER_02:I mean, in the paper we looked at, they studied simplified Chinese, Somali, Spanish, and Vietnamese. And I might be okay in Spanish, but not in those other languages.
SPEAKER_00:Yeah. And that's the whole point, right? If you were okay with those other languages, you could just communicate directly with the patient. But there are a lot more languages than any one person knows.
SPEAKER_02:And ideally, honestly, you're getting an interpreter — one who has the medical translation background to do this.
SPEAKER_00:Yeah. And honestly, machine translation has been a task in the technology space for a while now, and people have tried many different techniques to get it to be really good. It's still ongoing research, but it's also been around for decades.
SPEAKER_02:So this was interesting to me, because what they did in this paper was say, hey, we're going to compare a professional translation to an AI translation. And when I started to look under the hood, I wondered: what kind of AI translation are they using? They considered two options. One was large language models, LLMs, which we talk about all the time. The second was neural machine translation, and that's what they chose. And I wanted to ask you about that. Why is that different from a large language model, and why would you choose one over the other?
SPEAKER_00:Okay. To answer that, I'm going to do a little history lesson — step back to machine translation in general, and then work my way back to this. So bear with me.
SPEAKER_02:I will bear with you — because I want to understand this.
SPEAKER_00:So let's talk about the problem of translation in general. There are a lot of issues. It's not as if a five-word sentence like "bring me a cup of water" translates word for word into five words in another language. Different languages have different sentence structures; the way something is said is different. Maybe there are politeness norms in place, so you can't just say "bring me water" — you might have to say please in a particular way. Maybe there are gender terms in the language. Maybe you're addressing an elder, and then you say it completely differently from how you would otherwise. There are all these little nuances. So decades ago, people built rule-based systems to do translation. And those rule-based systems didn't work, because there were too many rules, too many exceptions, and they were too brittle — you couldn't get a free-flowing, natural translation of a whole sentence in context, accurate for a particular subculture. That's when machine learning came into play. Instead of humans figuring out all the rules involved in translating a language, let's give the system data and have it figure out the rules for itself. But of course, if it's going to figure out the rules for itself, it needs data to understand what the patterns in the translations are.
SPEAKER_02:So you're not telling it what the pattern is. You're letting it figure out those patterns for itself. I don't know if that's fully reasoning, but it's able to recognize them by repetition, essentially.
SPEAKER_00:That's it. And because there are so many patterns, so many rules, that matters. They vary by language, and like I said, they carry all kinds of nuances. So you're assuming your data set has those nuances captured. And when I say data set, I mean a set of, for example, English sentences paired with Spanish sentences that have the same meaning.
SPEAKER_02:You're giving it correct translations. So you're not training it on all the data in the world — you're training it on: here is the correct translation from English to Spanish, or English to Somali, or whatever.
SPEAKER_00:Yeah. And remember, I'm still back in history a little, talking about machine translation as a field. This is the problem they were trying to solve, and they figured machine learning systems would do better at it. There are many ways to do machine learning: linear models, all kinds of other models we've talked about on this podcast before. Neural networks are another type, and they're really good at figuring out really complex patterns. If the patterns inherent in translation are complex and non-linear in many ways, neural networks are well suited to handle that. And there was a resurgence of neural nets in roughly the 2000-to-2010 range. Neural networks had been around forever, but they're very computationally heavy, and now the hardware had caught up — Nvidia graphics cards and GPUs made it feasible to use them again. So people brought neural networks back, and suddenly you could tackle all these difficult tasks: neural machine translation, but also things like image classification — is there a cat or a dog in this picture?

Now, when you think about a neural network, you're giving it some input. With an image, the input is always the same size — so many pixels by so many pixels, a fixed number. And you're asking it for an answer that is also fixed: is it a cat, a dog, maybe twelve other things it could be. In those settings, the network you build is very fixed, and once those parameters are set, you can just give it pictures with correct answers and train it. Language is a little different. Language is sequential data: you're giving it words one after the other, like a time series, as we speak to each other. I'm not saying all twelve sentences I'm about to say in one go — though maybe you are processing them that way a little bit.
SPEAKER_02:Right. But what I say next may depend on what I said before, or what you said before in the conversation. There are big contextual elements to this.
SPEAKER_00:And also, you don't know how long a meaningful chunk — the thing that needs to be analyzed — is going to be.
SPEAKER_02:I mean, I could talk forever, right?
SPEAKER_00:There you go. So it's unlike an image: it's not a fixed length, it's ongoing, and there's a time relationship between the words. In an image, there's no time relationship between pixels — there's locality, where pixels close to each other may be similar, but not this time-series notion. So people were trying to solve this sequential problem, and machine translation has exactly that: you send in a sequence of words, and you want a sequence of words out, no matter how long the sentence is.
SPEAKER_02:Yeah, and you want what comes out at the end to have knowledge of what was said at the beginning.
SPEAKER_00:Right, exactly. So people started developing new architectures — new ways of organizing the neural networks — to enable that, some of which involved keeping track of memory. One of those techniques was called recurrent neural networks. It was very powerful: you send it one word at a time, it remembers something of what came before, and it outputs one word at a time. It could do quite a bit, and it was actually quite amazing at the time. The issue was that with a really long sequence, it would forget the thing you said first, and as a result the translations were not very good — at least not by the end. So people said, wait a minute, we need another architecture, which just means another way of arranging the neural network to produce better results, because one of the issues recurrent neural networks had was remembering the context of the whole incoming sentence. And they came up with various other techniques. I'm going to use jargony words without expanding them, only because it doesn't matter as much — but it makes me sound smart: BiLSTMs and LSTMs and encoder-decoder networks and such.
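[A minimal sketch of the recurrent idea just described — one word in, a running "memory" updated, repeat. This assumes numpy; the weights are random stand-ins for illustration, not a trained translation model.]

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, embed = 8, 4  # sizes chosen arbitrarily for the sketch

# Random, untrained weights: hidden-to-hidden and input-to-hidden.
W_h = rng.normal(size=(hidden, hidden)) * 0.1
W_x = rng.normal(size=(hidden, embed)) * 0.1

def rnn_step(h_prev, x_t):
    """One time step: blend the previous memory with the new word vector."""
    return np.tanh(W_h @ h_prev + W_x @ x_t)

# A "sentence" of three word vectors, processed one at a time.
h = np.zeros(hidden)
for x_t in rng.normal(size=(3, embed)):
    h = rnn_step(h, x_t)  # h now summarizes everything seen so far
```

The single vector `h` is the network's entire memory of the sentence — which is exactly why long inputs get squeezed until the earliest words fade, the forgetting problem described above.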
SPEAKER_02:Yep, you do. You sound smart.
SPEAKER_00:I appreciate that. But all of these different architectures had the same idea: you send in a bunch of words, you capture the meaning of those words in some kind of numerical representation inside the machine, and then you use that whole captured meaning to produce the output. Notice how that's different from a recurrent neural network, where one word goes in and one word comes out at a time. This captures more of the meaning first, and then outputs what you need — so it does a better job of remembering things across longer inputs.
SPEAKER_02:Which makes a lot of sense if sentence structure varies from language to language. I'm thinking of an example in Spanish: in English you say "I like coffee," but in Spanish you say "me gusta el café," which is more like "coffee is pleasing to me." The sentence structure is different, the way you say the words is different — it's not exactly parallel. You can't do "I like" the same way in Spanish as in English.
SPEAKER_00:Yes. And one quick sidebar: all neural networks compress information. That's a core idea of intelligence here — all the data the model is trained on gets compressed. And when I say compressed, think of a literal file on your computer. Imagine you had a bunch of data, say 10 megabytes. The learned machine learning model, which is also a file on your computer, would be much smaller — say, one megabyte.
SPEAKER_02:So can I make a comparison? I know we talk about large language models all the time, but this sounds to me like a small language model. You're training on small amounts of data, the data going into it is compressed, and it's only about translation — not about all the other things you could find in the world. Instead of an LLM, it's an SLM.
SPEAKER_00:Which is a separate thing, by the way — you're opening a Pandora's box of a different set of problems. For all the people who know what small language models are: that's actually a thing, and it's not what we're talking about here. But the concept is there — a more tightly focused neural model that does one thing. I wouldn't call it a language model, though. Just a small, focused model.
SPEAKER_02:A small model focused on one goal. You don't go in and ask it anything — you ask it to translate. It could translate to multiple languages, but that's its goal, and it doesn't do other stuff.
SPEAKER_00:That's right. So people worked on this problem — this compression idea. All neural networks do compression: you compress the data so that you get those rules of translation in a compact representation that nobody can read, because it's all just numbers in a machine. You throw out irrelevant details and use that compressed knowledge to drive the output. Compression is a really important part of any machine learning model — that's where the value is. So now we're in a world where we can do that, but there were still issues with these architectures: they couldn't capture all the context, and they struggled with something specifically related to this notion called attention. The idea is that the meaning of a particular word in a sentence is connected to all the other words in the sentence in a particular way; depending on how you use a word, it means different things in different contexts. That was really hard for these systems to do correctly. So in 2017, a landmark paper was released called "Attention Is All You Need." It was by folks at Google, and it was the first paper introducing the transformer architecture — which is what all LLMs today are based on.
SPEAKER_02:So everything came from trying to translate?
SPEAKER_00:They were trying to translate, yeah. Interesting, right? In 2017 they came out with this paper and said: look, instead of sending these words in one at a time, and hoping they get compressed correctly, let's send everything in parallel and have the system figure out what's important — which word relates to which word — by giving it a bunch of data. They changed the architecture quite a bit, and it did really well. Transformers did really well for neural machine translation, which was a very big step for the field.
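[The "which word relates to which word" computation at the heart of that 2017 paper is small enough to sketch. Below is a minimal numpy version of scaled dot-product attention — random vectors stand in for learned word representations, purely for illustration.]

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    # Score every word against every other word, all in parallel.
    scores = Q @ K.T / np.sqrt(d)
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output word is a weighted mix of all the words it attends to.
    return weights @ V, weights

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 16))   # a "sentence" of 5 words, 16-dim vectors
out, w = attention(x, x, x)    # self-attention: the sentence attends to itself
```

Every word gets a full view of every other word at once — no one-at-a-time memory to overflow — which is the key difference from the recurrent approach discussed earlier.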
SPEAKER_02:And that's what they use in this paper, right?
SPEAKER_00:They chose to use that over the large language models. And it is a language model in the sense that you're sending in language and identifying patterns in how that language is used — and by language I just mean words. You send in words, in whatever language, and the model finds the inherent patterns in the data set of how those words are used. If your data set had other languages, you'd get those patterns as well. And the remarkable thing about this whole experience of machine translation and the transformer architecture is that it led to the dawn of LLMs and modern AI. That only happened because somebody thought to say: what if we just gave it all of the internet? What would happen then? If you throw enough money, enough resources, and all of the internet at this problem, you get large language models — and suddenly they can do not just machine translation but all kinds of other things. It turns out our language use demonstrates not just how we understand and talk to each other, but also, potentially, some of the reasoning mechanisms in our brains. That's why LLMs are more generalized: they're not just doing translation, they're doing other things.
SPEAKER_02:Oh, so interesting. And so — I understand these neural machine translation models are trained on multiple languages. Are LLMs trained on lots of languages, or are they mostly trained in English? I'm just curious.
SPEAKER_00:So again, it's all of the internet, and the internet contains all the languages that are on it. There's a disproportionately high amount of English on the internet, so LLMs understand English best. Languages like Spanish and German also have good representation — Hindi, Mandarin, and so on — so those are captured too. But there are lots of languages that scientists call low-resource languages, which don't have as much representation, and LLMs do a poorer job with those. Now, I want to say one thing, and this paper talks about it: people use this term neural machine translation all the time, and there's a sense that it's different from LLMs — and it is. Like we talked about before: take Google Translate. Not Google Translate versus Gemini — just Google Translate by itself. You go to Google.com, type in "translate," and a little box pops up: you select the languages, type an input, hit a button, and get an output. Now ask yourself: can you ask Google Translate to make the output more informal? No, you can't. Can you hand it a whole body of text and ask for an accurate translation of everything in it? You don't know the answer for sure, but it probably won't do that accurately, because it doesn't capture all of the context. Can you say, "I like what you did there, but can we also add some other detail?" If you told that to Google Translate, would it be able to do it?
SPEAKER_02:No. But hold on — on the other hand, LLMs can hallucinate, and I don't think I've ever seen Google Translate do anything weird like that.
SPEAKER_00:That's it, and that's the distinction, right? You have to think about the use cases. If the job of the system is to do one thing — translate — you can't suddenly ask it all kinds of other questions. What it expects as input is a sentence, and what it's expected to produce is another sentence, in a different language, that captures the same meaning.
SPEAKER_02:But if you want it to do more — if you want to tweak it to be more conversational, or improve the readability, or say, "I need to make sure this is simplified; I can't have the discharge instructions be complex" — now you're talking about going into LLM territory.
SPEAKER_00:Yes. You could, in theory, train a separate neural machine translation model to only produce fifth-grade-level readable text. But then you'd have to give it input data, plus output data that is all at a fifth-grade reading level, and train it — a whole training cycle you'd have to go through. With LLMs you don't have to do that, because they've absorbed the internet, they've picked up some aspects of human reasoning, and — a separate thing, by the way — they've been trained to follow your instructions. So they can understand you when you say, "hey, make this a fifth-grade reading level instead."
SPEAKER_02:Okay, that makes sense. So in the paper we read, they ended up choosing neural machine translation for a number of reasons. One was that they were concerned about accuracy and hallucination. Because when you're translating in real time, and handing that to someone as discharge instructions, and you don't have anyone to QA it, the accuracy ends up being really important. They looked at other things too, right?
SPEAKER_01:Yeah.
SPEAKER_02:They looked at fluency — clarity and flow, vocabulary, grammar. They looked at meaning: does it actually say the same thing as the original text? They looked at how accurate, complete, and appropriate it was. And when there were errors, they rated how severe each error was and how it could impact someone clinically. But I think there's a very big difference between relying on an AI translation in real time, with no one able to QA it, versus having someone QA the translation that comes out of one of these systems.
SPEAKER_00:Yeah, that's fair. But I have to say, state-of-the-art machine translation still uses transformer models — that's what everyone uses. And many of the LLMs out there right now are quite competitive with the best neural machine translation techniques. So even on pure translation tasks, put head-to-head against the best neural machine translation systems, LLMs are still quite competitive on accuracy. Now, LLMs use a lot more compute, because they're meant to do a whole host of things, not just translation. So they're bigger and potentially slower. The dedicated models are smaller, like we talked about before, so they're faster, with lower latency — they produce answers much quicker. That matters in real time: think of a system that translates as you talk. It sounds futuristic, but there are already products out there that do that. My guess is they don't use LLMs, because they don't need to — that's not what they're for.
SPEAKER_02:That's not the goal. And the goal needs to be: let's do this as fast as we can, in as accurate a way as possible. Right. Okay.
SPEAKER_00:So you have to really think about what the goal of the task is. If your task requires some kind of nuance in the way it's written, or there are styling issues, and there's potentially context involved, potentially rewrites and such, LLMs are going to be what you need. But if your goal is very quick, accurate translation of incoming text, then you don't need an LLM. It's potentially problematic to use an LLM.
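The routing logic described here can be sketched in a few lines. This is a hypothetical illustration, not any real product's code: the `TranslationTask` fields and engine labels are assumptions made up for the example.

```python
# Hypothetical sketch of the routing idea discussed above: use a small,
# fast neural machine translation (NMT) model for plain text-in/text-out
# jobs, and reserve an LLM for requests that need tone, simplification,
# or cultural context. Names and fields are illustrative, not a real API.

from dataclasses import dataclass

@dataclass
class TranslationTask:
    text: str
    target_lang: str
    needs_simplification: bool = False    # e.g., rewrite jargon in plain language
    needs_cultural_context: bool = False  # e.g., politeness norms, family framing
    real_time: bool = False               # e.g., live speech translation

def choose_engine(task: TranslationTask) -> str:
    """Return which class of engine to use for this task."""
    # Low-latency, literal translation: a small NMT model is faster
    # and more predictable than an LLM.
    if task.real_time:
        return "nmt"
    # Rewrites, styling, or added context favor an LLM, despite its
    # higher latency and hallucination risk (which is why human review
    # still matters for clinical text).
    if task.needs_simplification or task.needs_cultural_context:
        return "llm"
    return "nmt"
```

The point of the sketch is only that the choice is driven by task requirements, not by which model is "better" in the abstract.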
SPEAKER_02:Right. This is so interesting. And I do want to go through the findings of the paper, because we've been talking a lot about how machines translate, but the paper actually compared what I've been loosely calling the smaller models, the neural machine translation models, against professionally translated versions of the discharge summaries. And the AI translations were mostly inferior. They were okay for Spanish, but did not do well in simplified Chinese, Somali, and Vietnamese. But at the same time, these systems weren't trained on that many examples in those other languages; they were trained on way more examples in Spanish. And I think it's very interesting, having heard what you've said: the more data you train it on, the more information it has about how this phrase in English maps to that phrase in Somali, the better job it's going to do. And it had the most training data in Spanish and the least in Somali.
SPEAKER_00:Yes. And if you think about an LLM, an LLM could provide that extra context, right? Potentially culturally relevant information that may be useful to support translations in low-resource languages. Think about your Somali interpreter, a human interpreter: they grew up in the culture, so they're familiar with all the cultural norms, the politeness norms, the speaking norms. They're comfortable with the different subcultures that might exist. So it's not just enough to be Somali; maybe you need to be from a specific tribe or a specific group of people within that. They know that right away.
SPEAKER_01:Yeah.
SPEAKER_00:They know how translations work differently in those settings. Your pure machine translation might not have that data. LLMs also might not have the data, but they might have a little bit more. They might know that there is a country called Somalia, that there are these regions, that there are some politeness norms involved. So you could, in theory, prompt the LLM by telling it: take these other factors into account when providing your translation. And then you're providing potentially more nuanced translations that are closer. So there's value in the LLM, is what I'm trying to say, even in those cases.
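What "prompting the LLM with these factors" might look like can be sketched as a template. This is illustrative only: the wording, function name, and guidance notes are assumptions, not taken from the study or any real system.

```python
# Illustrative only: a prompt template for asking an LLM to factor in
# cultural and politeness norms when translating discharge instructions.
# The structure and wording are assumptions made up for this sketch.

def build_translation_prompt(text: str, target_lang: str, notes: list) -> str:
    """Assemble a translation prompt that includes cultural guidance notes."""
    guidance = "\n".join(f"- {n}" for n in notes)
    return (
        f"Translate the following discharge instructions into {target_lang}.\n"
        "Preserve the medical meaning exactly; do not add or remove "
        "clinical content.\n"
        "Take these cultural and stylistic factors into account:\n"
        f"{guidance}\n\n"
        f"Text:\n{text}"
    )

prompt = build_translation_prompt(
    "Take one tablet twice daily. Return to the ER if chest pain worsens.",
    "Somali",
    ["Use formal, respectful address",
     "Phrase warning signs clearly but not alarmingly"],
)
```

Even with a prompt like this, the model can still invent norms that don't exist, which is where the human-in-the-loop check discussed next comes in.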
SPEAKER_02:Yeah, it was very interesting, because you're bringing back memories of the ER, where I would actually use an interpreter not just for, hey, can you say this to the patient, but also to ask them about the cultural norms: what's the best way to deliver this diagnosis to someone? Should all the family be in the room, or not? When you don't know the cultural norms, it can definitely affect your relationship with a patient or family, and getting the interpreter's expertise on that mattered. I essentially had to ask those questions.
SPEAKER_00:Right, exactly. But yeah, I think that's where the LLM still has value. But yes, hallucination is a huge problem. What if it's just making up these norms? You have no way to know one way or the other. With a human in the loop to confirm that it in fact makes sense, the human's job can be pretty straightforward: they can look at the translation and immediately know whether it reads as weird or as normal.
SPEAKER_02:And honestly, in the Affordable Care Act there's language about this, essentially that when accuracy is essential, you can't have automated translation of medical documents without some sort of QA on it. If it's a critical piece of text with really important clinical information, you can't just have automated translation without human review. So the human in the loop is sort of baked into some of this, and I think the concept makes a lot of sense. It's just, how do you implement that at 2 a.m. when you're not able to find someone to do it? I don't have an answer to that question. But I think this does open the door to communicating with patients better, giving them not only verbal but written information about what's next and what they may need to do, to prevent them from bouncing back to the ER or needing a hospitalization. Right.
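The human-in-the-loop gate being described, hold critical machine translations for review rather than releasing them automatically, can be sketched as a simple decision rule. The labels, criticality flag, and fallback paths here are hypothetical, not drawn from any regulation or real hospital workflow.

```python
# A hedged sketch of the human-in-the-loop gate discussed above: machine
# output for critical clinical text is held for human review rather than
# released automatically. Labels and fallbacks are hypothetical.

def release_translation(is_critical: bool, reviewer_available: bool) -> str:
    """Decide how a machine translation is handled before reaching a patient."""
    if not is_critical:
        return "release"                 # low-stakes text can go out as-is
    if reviewer_available:
        return "queue_for_human_review"  # QA step before release
    # The 2 a.m. problem: no reviewer available, so fall back to existing
    # safeguards (phone interpreter, English original plus verbal
    # interpretation) instead of releasing unreviewed machine output.
    return "fallback_to_interpreter"
```

The open question raised here is exactly the last branch: what the safe fallback is when review isn't available.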
SPEAKER_00:And you know, we didn't talk about this, but there's another aspect of context, which is the medical context: jargon and specific medical understanding. Again, if you're building a neural machine translation model, you have to factor that in. It has to be catered to medical discharge summaries in this instance, how they're written, the style, the type of words they use. The system has to try to understand all of that. The LLMs might come in with some of that knowledge; they already have a lot of, quote unquote, medical knowledge, in the sense that they might understand some of the jargon. So they might provide a translation that already takes that into account. But having the human in the loop, again, like I said, is going to be critical, both from the cultural standpoint and from the medical standpoint.
SPEAKER_02:Yeah. Well, I think we can wrap up with this. This was a very interesting topic to me, because I think there's a lot of promise here, and there are lots of different AI models that can be used for this, with different use cases. What's eye-opening is, we talked about Google Translate versus an LLM, and there are definitely benefits to each, but also the importance of being able to prompt it appropriately, being able to fine-tune it, et cetera. I know this paper didn't find great results for machine translation, but I think there's a lot of promise here if we keep pushing the envelope on it. And there's a lot for patients to benefit from too.
SPEAKER_00:Agreed.
SPEAKER_02:All right. Well, we will see you next time on Code and Cure. Thank you for joining us.