Code & Cure

#11 - The Smile Test: How AI Detects Parkinson's Disease

Vasanth Sarathy & Laura Hagopian

Can a smile reveal the early signs of Parkinson’s disease?

New research suggests it can—and AI is making that detection possible. Scientists are training machine learning systems to spot subtle facial changes associated with Parkinson’s, particularly in how we smile. These early signs, often missed by the human eye, could hold the key to faster, more accessible diagnosis.

Parkinson’s typically presents with tremors, muscle rigidity, and slowed movement. But it also affects facial muscles, leading to “hypomimia”—a loss of expressiveness where smiles become slower, less intense, and less spontaneous. Using the Facial Action Coding System, researchers broke down these expressions into measurable muscle movements like the “lip corner puller” and “dimpler,” allowing AI to analyze them with clinical precision.

Interestingly, models trained specifically on smile-related features outperformed those using broader facial data, showing that a targeted approach may yield better diagnostic results. This innovation blends expert medical knowledge with AI—not as a mysterious black box, but as a transparent and focused tool for real-world screening.

While promising, the technology isn’t without challenges. False positives and issues with lighting, camera quality, and cultural differences in facial expressions highlight the need for more testing before widespread use. Still, in clinical settings, especially where neurologists are scarce, this tool could offer meaningful support.

Tune in to explore how artificial intelligence is helping decode the smallest of human expressions—and what that might mean for the future of neurological care.


References:

AI‑Enabled Parkinson’s Disease Screening Using Smile Videos
T. Adnan, et al.
NEJM AI, 2025 

Automated video-based assessment of facial bradykinesia in de-novo Parkinson’s disease
M. Novotny, et al.
npj Digital Medicine, 2022

Detection of hypomimia in patients with Parkinson’s disease via smile videos
G. Su, et al.
Annals of Translational Medicine, 2021

Analysis of facial expressions in Parkinson's disease through video-based automatic methods
A. Bandini, et al.
Journal of Neuroscience Methods, 2017


Credits: 

Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/


Speaker 1:

Hey Vasanth, how's my smile looking today? Hmm, let me see.

Speaker 2:

I can definitely see an inner brow raiser, or even a little bit of a lip part and a dimpler, but I'm not so sure about the lip corner puller? Wait, what? Oh, sorry, that's just the AI talking. Your smile looks great. Oh, thank you. Hello and welcome to Code and Cure. My name is Vasanth Sarathy, I'm an AI researcher, and I'm here with.

Speaker 1:

Laura Hagopian. I'm an emergency medicine physician. Hey, good morning Laura, how's it going? I'm doing well. I'm smiling this morning. Me too.

Speaker 2:

Big smiles on my face too, as you can tell probably from this audio podcast.

Speaker 1:

Today's topic is actually about facial expressions and Parkinson's disease, so this smiling didn't come out of nowhere, although I guess we could have done surprise or disgust or something else instead. Yeah.

Speaker 2:

I have to say I am super psyched about this topic. I mean, you know, I spent a little time this week reading about it and I dug a little deep into it, not too deep, but enough to get a better sense of what this topic is all about, and I'm pretty excited. I have to say there's a lot of cool stuff happening at the intersection of AI and health care, using tools like this to help with Parkinson's.

Speaker 1:

Helping identify Parkinson's. Right, right. Helping screen for Parkinson's, really, like who could have it. But before we dive into that, do you know what Parkinson's is?

Speaker 2:

No, please. I mean, sort of, yes, I know that it's a neurodegenerative disease, but please, yeah, tell me more.

Speaker 1:

Yeah, basically what happens is you lose the dopaminergic neurons in the brain, and so there are a lot of features of Parkinson's, but there are three main motor features. One is a tremor, and that happens at rest. So it's not when you're doing activities, it's when you are not moving that you have this tremor, and people often describe it as a pill-rolling tremor. It looks like you're almost rolling a pill in between your fingers. The second major motor feature is bradykinesia, which means that you're slow to move, and when you do move, it's lower amplitude than you might expect, and this leads to loss of dexterity.

Speaker 1:

If you're looking at someone walking, they might shuffle. If you ask someone to do a task, like tapping their finger over and over again, they're not going to do it as high and they're definitely going to do it slower. And so that's the second major motor symptom. And then the third one is rigidity. It's harder to move, they're stiff. If you go and try to move their arm or their leg, it will kind of feel stiff. In their posture, you might see them stooping over. If you try to move their wrist, it's almost like having a cogwheel, like a ratchet. And so those are the three main motor features. There's a fourth one that gets thrown in sometimes, around postural instability, which can lead to falls, and that happens later on.

Speaker 2:

Resting tremor, bradykinesia, and rigidity. So I guess my question then is, you didn't say anything about facial expressions. So what is the connection? Or maybe that is part of it, because there are muscles in the face too. Yeah, exactly. And so, what does smiling have to do with any of this?

Speaker 1:

Well, that's the thing. If you take some of these symptoms, like the bradykinesia and the rigidity, and you say, okay, well, how does that happen in the face? That's manifested as something called hypomimia, which is like a masked facial expression. So if you have someone smile, for example, it's less spontaneous, it's lower in amplitude, so it's not such a big smile, and it's more rigid, right, it's stiffer.

Speaker 2:

Is this true even if they are actively told, hey, smile as big a smile as you can? Even then they won't be able to do it? They'll still do it, but it will again be that lower amplitude.

Speaker 1:

But the lack of spontaneity and smiling is a key piece too. So it's harder to make someone smile, and that is partially because of the motor features that we just discussed. But there's actually some emotional parts of Parkinson's that come into play here as well.

Speaker 2:

So what do you basically see? And this isn't necessarily limited to smiles.

Speaker 1:

What you see, and this isn't necessarily limited to smiles, is diminished facial expressivity. So the emotions in your face are less intense, they don't develop as quickly, right, and they're often less symmetrical. And if you look at the muscles in the face, like the zygomaticus major muscle, which is responsible for smiling, they've actually tested and seen, okay, there's a delayed onset of this smiling response when there are emotional stimuli in place. A spontaneous smile is one thing, like I tell you a joke, and it's harder for you to smile if you have Parkinson's; it's more rigid, it's lower amplitude, et cetera. But even if I say, oh, look into this camera and smile while I take a video of you, it's going to be slower, with less movement, lower amplitude, and less expressive.

Speaker 2:

That's super interesting, that the smile tells you a lot about whether or not this person has Parkinson's.

Speaker 1:

Yeah, and what's interesting to me, and this is what they kind of started to do in this paper, is, oh well, if I could recognize that as a clinician, could AI recognize this? Like, is this an application for computer vision and machine learning research? You know, in this specific paper they were talking about, hey, in places that are resource-poor, it's hard to actually have enough neurologists to make this diagnosis of Parkinson's disease. So what if we could screen people with something as simple as, smile into the camera and we take a video of that? It's not a photo, right, it's a video, because you want to see how slow it is, for example, and what the amplitude of the smile is, et cetera. But if you could take a video of that, could AI sort of figure it out?
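As a rough illustration of why video matters here, below is a minimal Python sketch that measures a smile's amplitude and how quickly it builds from a per-frame intensity trace. The numbers, the 30-frames-per-second rate, and the variable names are invented for the example and are not taken from the paper.

    import numpy as np

    # Hypothetical per-frame intensity of the lip corner puller (AU12) during a
    # posed smile, sampled at 30 frames per second; real values would come from
    # a video-based action-unit detector.
    fps = 30
    au12 = np.array([0.1, 0.1, 0.2, 0.4, 0.9, 1.6, 2.3, 2.8, 3.0, 3.0, 2.9, 2.8])

    amplitude = au12.max() - au12[0]    # how big the smile gets
    rise_time = au12.argmax() / fps     # how long it takes to peak, in seconds
    mean_speed = amplitude / rise_time  # rough speed of the movement

    print(f"amplitude {amplitude:.2f}, rise time {rise_time:.2f} s, speed {mean_speed:.2f}/s")

A single photo would only give you the final frame; the amplitude and rise time both need the whole sequence.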

Speaker 2:

Right, right, right, and that's what these papers are about, right. But you know what I loved about these papers was the fact that they talked about using AI, but they followed sort of clinicians' guidelines to get there.

Speaker 1:

Right, it's not like a black box, right? They said, oh, as you said in our intro, here are the features that we look for in someone's facial expression.

Speaker 2:

Right, right, exactly, and I believe there's, I'm trying to remember the name of the group, but there's a group that sets these guidelines to help clinicians work through looking at a person's face and what they should be looking at, and we talked about some of that. But this line of research itself is not new, right? I mean, we're talking about a paper that was released this year, but there have been people working on this for a while now, almost a decade, thinking about how AI systems and machine learning systems can be used to look at this. I mean, on the one hand, we might be looking at images. On the other hand, we might be looking at videos. It seems like videos are more important, because you want to catch that movement and the action.

Speaker 1:

And the delay, right? The delay, right, right.

Speaker 2:

So I think what's really cool is, yeah, facial expression and this notion of reduced expressivity is, in some sense, potentially capturable or detectable by machine learning systems. Now the question is, how do we do it, right? The simple, brainless approach would be just to dump the images into a machine learning model, get enough training data, and then just have it detect the disease. That's sort of the black-box approach that a lot of machine learning applications today tend to take. But here the researchers don't do that, because they have a lot of human knowledge about those facial expressions that they can then use. So I thought this was really cool, because they use something called the Facial Action Coding System, which was developed in the 70s by Paul Ekman and colleagues. And Paul Ekman's claim to fame was this whole body of work on microexpressions and identifying deception and lying from looking at people's faces. Oh, is that the inspiration behind the Lie to Me show?

Speaker 2:

I love that show. Great show, and I forget which network it was on, but that was a great show. And yes, so there are a lot of microexpressions, which is what I'm calling them, but they're just different muscle movements in the face that Ekman systematically catalogued and tracked, and he created this whole system called the Facial Action Coding System. In that system there are what are called action units. These are little face dynamics that are important building blocks of a larger expression, and so when we started off the show and you asked me how your smile was, and I said things like inner brow raiser, or dimpler and so on, those are action units as coded by the Ekman system.

Speaker 1:

Well, that makes sense, right? Because your smile is not just your lips going up, right? You smile and maybe you see someone's eyebrows change a little bit. Or when you say, oh, I see the smile in their eyes, it's often a little bit of creasing there. So there's more to an expression; a lot of the muscles in your face move.

Speaker 2:

Yes, and a genuine smile has things like your lips moving up a little bit and your cheeks going up a little bit, and so there's a difference between a genuine smile and a not-so-genuine smile, right? Again, going back to the deception point. So these are all different action units that together signal a particular expression, and the researchers for this study, for the Parkinson's study, focused on three different expressions: smile, disgust, and surprise. And what they did was they had a bunch of people smiling into cameras. They took three short videos of each person doing that, and then they cataloged the various action units. They used a system called OpenFace, which is an open-source, freely available tool that couples with your camera or webcam and allows you to look at faces and identify whether a certain action unit was activated at a certain point, whether there was eyebrow raising or whatever. Right, so...
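For a sense of what that kind of output looks like, here is a hedged Python sketch that reads an OpenFace-style per-frame CSV and summarizes a few smile-related action units. The file name is a placeholder, and the exact column names are an assumption that can vary by tool version.

    import pandas as pd

    # A few smile-related FACS action units and their common names.
    SMILE_AUS = {
        "AU06_r": "cheek raiser",
        "AU12_r": "lip corner puller",
        "AU14_r": "dimpler",
        "AU25_r": "lips part",
    }

    # OpenFace typically writes one row per video frame with per-frame
    # action-unit intensity columns; this path is hypothetical.
    df = pd.read_csv("smile_video_openface.csv")
    df.columns = df.columns.str.strip()  # column names are sometimes padded with spaces

    for column, name in SMILE_AUS.items():
        if column in df.columns:
            print(f"{name:>18}: peak {df[column].max():.2f}, mean {df[column].mean():.2f}")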

Speaker 1:

I went on their website. It's kind of cool.

Speaker 2:

Yeah, it's super cool and anybody can use it.

Speaker 1:

You can see like all the dots, like oh, these are all the spots on the face that they're mapping out for these expressions.

Speaker 2:

Yes, so there are facial landmarks, which are things like your nose and your eyes and your lips and so on. But then there are these action units, which are not just static things; they actually represent movement, and these systems today are able to track that. So in this paper, they sort of plugged and played that kind of system on a bunch of videos. They had something like 1,300 or 1,400 participants, I think, that they looked at, right, right.

Speaker 2:

Not all of them had Parkinson's, right. Right, about 400 of them had Parkinson's. But then what they did was they looked at the videos and identified all the different action units, and they focused on a selection of about seven action units for each expression. There's some overlap between them, but they're different. So, for instance, for the smile expression, they identified that you want things like the lip corner puller and the dimpler and lips apart and so on, and even some eye blinking, which is very interesting, whereas for disgust there was brow lowering and some other stuff that was a little bit different.

Speaker 1:

Nose wrinkling and some other stuff. Jaw dropping.

Speaker 2:

Yeah, I mean, these are different action units, but there's some overlap between them. Still, they identified seven of them that are unique to each expression, that best capture that expression. Then they also tracked other features of the face, and this had more to do with facial landmarks than anything else, like how wide the mouth was or how open the mouth was, and so on. And one question that I had as I was reading this was, well, we all smile differently, we all look different, we all have different faces.

Speaker 2:

Like, if you go to that micro level of detail, how are you going to deal with individual differences? And one of the things that they did was what they call normalizing, which is, they took numbers like those action-unit values, but then they scaled them within the individual participant. So you take out that issue; you don't worry if somebody's eyes are placed differently than somebody else's eyes. That's sort of, quote unquote, normalized, which means that issue is taken out of the picture and you focus on the quality of the movement itself, the actual relative movement, as opposed to the absolute values of those things.
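Here is a small Python sketch of one plausible way to do that kind of within-person normalization. The numbers are made up, and the paper's exact normalization scheme may differ; this is just a sketch of the idea of rescaling per participant.

    import pandas as pd

    # Toy table: one row per (participant, frame) with a raw action-unit intensity.
    df = pd.DataFrame({
        "participant": ["a", "a", "a", "b", "b", "b"],
        "AU12_r":      [0.5, 2.0, 3.0, 0.2, 0.8, 1.2],
    })

    # Rescale within each participant so the model sees relative movement
    # (how much this person's smile changes) rather than absolute intensities.
    per_person_min = df.groupby("participant")["AU12_r"].transform("min")
    per_person_max = df.groupby("participant")["AU12_r"].transform("max")
    df["AU12_norm"] = (df["AU12_r"] - per_person_min) / (per_person_max - per_person_min)

    print(df)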

Speaker 2:

Yeah, yeah, okay, that makes sense. And so they did that, and that I thought was really cool. But what I really loved about this paper was the fact that, again, it was not a machine learning model that was just fed raw images and videos. Instead, these are the features they used as inputs to the machine learning model. It's like they had a human in the loop. Yeah, well, they did have a human in the loop.

Speaker 1:

Yeah, I know, right, we're always talking about it, but it's like they had a human kind of supervise: what are the things that we should be looking for? What are the things that you clinically look for? And translating that into the AI language, like dimpler and lip corner puller, but still, it's like, what are you looking at in terms of hypomimia and masked facial expressions in Parkinson's, and how can you map that onto something that computer vision could also recognize?

Speaker 2:

Yeah, so you take the video, you extract all of these facial action units, per the coding system, and some other landmarks, and then you feed that into the machine learning model. But they did one more thing. You could have just taken all of those features that you've identified, used them as input to your machine learning model, and said, okay, if you have these features, my machine learning model predicts whether you have Parkinson's or not, right? But they went one step further: they took all of those features, analyzed all their data, and figured out that not all features are equal. In other words, not all features are equal in distinguishing a healthy person from a Parkinson's patient. They figured out that smiling was the best differentiator.

Speaker 2:

At least the way they mapped it out, right? As opposed to disgust, surprise, and anything else, the features associated with smiling were the ones that best captured the difference between a healthy person and a Parkinson's patient. Now, why does this matter, one could say? Why don't we just give all of the features to the machine learning model and have it figure it out? Let's take a step back.

Speaker 2:

Going back to what we talked about before, you could have just given it the entire pixel image, frame by frame of the video. Yeah, I'd be curious what would happen then, right?

Speaker 2:

But what happens if you gave it all of the features? That's what they did initially, but they also did the smiling piece, so they were able to generate models both ways. And if you train the model just on the smiling features, because now that you know that smiling is important you discard everything else and just focus on that, you get a more generalizable and explainable model. You get a model that's simpler, which, Occam's razor, right? It's going to work better, but it's also less susceptible to what's called overfitting. Overfitting is a concept in machine learning. All a machine learning model is trying to do is this: you have a bunch of data, and it's trying to find a pattern in that data.

Speaker 2:

And the simplest way to give this example is, imagine a chart in which you have an x-axis and a y-axis, horizontal and vertical, and you have a bunch of dots on that chart. Those are your data points, and let's say the dots kind of look like a line. So you draw a line through them. That line is your model, your predictive model of that data. So in the future, if you want to predict another point, you just extend that line, right? But if you had a bunch of points that are kind of scattered, maybe that line isn't the best. Maybe you need a model that's kind of curvy, that goes through those points better.

Speaker 1:

Does that make sense so far? Yeah, that makes sense.

Speaker 2:

Yeah. So the issue is that if you have a model that is very closely tied to the specific data it's being trained on, you can get it to work really, really well on that data. But the risk is that in the real world there might be data outside of it, and the model might not be robust to that other data. What you've done is overfit the model to the data set you trained on. It means you really carefully tuned that model to work on that one data set, without accounting for the fact that there could be variations outside of it.
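To make that chart analogy concrete, here is a hedged Python sketch using scikit-learn on made-up data: a straight-line model versus a very wiggly polynomial, each scored on the points it was trained on and on held-out points. The data and polynomial degrees are invented purely for illustration.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=(30, 1))
    y = 2 * x.ravel() + rng.normal(scale=0.2, size=30)  # roughly a line plus noise

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=1)

    for degree in (1, 12):  # a simple line versus a very flexible curve
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x_train, y_train)
        train_err = mean_squared_error(y_train, model.predict(x_train))
        test_err = mean_squared_error(y_test, model.predict(x_test))
        print(f"degree {degree:2d}: train error {train_err:.3f}, test error {test_err:.3f}")

The more flexible model hugs its training points more tightly; the interesting comparison is how each one does on the points it has never seen.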

Speaker 1:

Especially in the real world, right yeah.

Speaker 2:

So if you have fewer things to worry about in the model, it's not going to be a perfect model, but it's going to work better in the real world, and you run less of a risk of overfitting it. Otherwise, it's almost as if the model did too well on its own training data, and when it was actually being validated it did very poorly. In some populations, right? Yeah, and that's what they found.

Speaker 2:

They found that when you took a model that only used smile information, it did better on the validation data, when it was being evaluated rather than trained. Even though it was only trained on the smiling features, it did better than the model that was trained on all of the features.
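Here is a hedged Python sketch of how one might set up that kind of comparison on synthetic data: a classifier given every feature versus one given only a few informative "smile" features. The data, feature counts, and any performance gap are invented for illustration and are not the paper's results.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    n = 150
    y = rng.integers(0, 2, size=n)                      # 1 = Parkinson's, 0 = healthy (synthetic labels)

    smile = rng.normal(size=(n, 3)) + 0.8 * y[:, None]  # three informative "smile" features
    other = rng.normal(size=(n, 60))                    # many uninformative features
    X = np.hstack([smile, other])

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

    all_feats = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    smile_only = LogisticRegression(max_iter=2000).fit(X_tr[:, :3], y_tr)

    print("all features AUC:", round(roc_auc_score(y_te, all_feats.predict_proba(X_te)[:, 1]), 3))
    print("smile only   AUC:", round(roc_auc_score(y_te, smile_only.predict_proba(X_te)[:, 1]), 3))

With a small sample and many uninformative features, the smaller model often generalizes at least as well, which is the spirit of the finding discussed here.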

Speaker 1:

It's interesting. I want to back up for a second, because I do want to talk about how they had problems when they tried to extrapolate from this model. But to me it's like, if you said to me, hey, look into this video with your iPhone, take your phone, take a video of yourself smiling, I'd be like, sure, no problem, I can smile, I smile all the time, whatever, right? But if you said, oh, please take a video of yourself with an expression of disgust, I don't know that I could do it right off the bat. The same with surprise. I could make a surprised expression, but that might not be exactly what my surprised face looks like. Do you know what I mean? I feel like I would have to be genuinely surprised or genuinely disgusted, whereas with smiling, well, we talked before about how there are fake and real smiles too, but it's a little bit easier to generate as a human being without a stimulus.

Speaker 2:

That's a great point. I don't know how they prompted these different expressions. You know, I think there have been other studies that I saw that dealt with more spontaneous ways to create those expressions. I mean, yeah, I don't know what the stimuli were for this research project. Great point, which is maybe the reason smile is doing so well: because a smile was easier to elicit than the other expressions.

Speaker 1:

Right, and across the board, not just for Parkinson's, but for people who didn't have it too, right? And that may be a confounding variable.

Speaker 2:

For sure, that could be it. I don't know that they talked about that specifically, but maybe they did. I think that's definitely a fair point, and this is part of the research effort, right? You're trying to figure out the core set of features you care about, so that the machine learning model is general and not overfitted, but at the same time, you want it to actually work in all settings, right? To actually work when it matters. And that's not exactly what happened here.

Speaker 1:

You know, they had trouble when they said, oh, let's take this model to Bangladesh and see how it does across different races and different sexes. And basically what happened there was they found a pretty low positive predictive value. So if you screened positive with this, like, oh, we think you might have Parkinson's, there was over a 60% chance that you didn't. All right, so I can see that being a huge problem.

Speaker 2:

Right. The problem of false positives means that it's a bigger burden for the health care system if, all of a sudden, you're sending all these people who don't have Parkinson's in to get checked, right, in a place where there are already not enough neurologists. Right, exactly.

Speaker 1:

So, you know, the reason for doing this was like, oh, we could do a better job triaging and figuring out who to send to the neurologist versus not. But now you have this positive predictive value which is so low, and I think part of the reason it was low is because the prevalence of Parkinson's was low in this population. That means you can't just release something like this online and let anybody take a video of themselves and ask, oh, do I have Parkinson's or not? Because now you're going to get all these anxious people who think they have Parkinson's, and, in certain subpopulations, there's a big chance that they don't. Right, the chance was more than 60% that they didn't in this Bangladeshi population. And so it argues for not having something like this universally available, and you could overburden the very system you're trying to burden less with a triage tool like this. So, you know, I don't know.
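For intuition on why low prevalence drags the positive predictive value down, here is a small Python sketch with purely illustrative sensitivity, specificity, and prevalence numbers, not the paper's figures.

    def positive_predictive_value(sensitivity, specificity, prevalence):
        """Probability that a positive screen truly has the disease (Bayes' rule)."""
        true_positives = sensitivity * prevalence
        false_positives = (1 - specificity) * (1 - prevalence)
        return true_positives / (true_positives + false_positives)

    # Even a screen that looks strong on paper yields mostly false positives
    # when the disease is rare in the screened population.
    for prevalence in (0.20, 0.05, 0.01):
        ppv = positive_predictive_value(sensitivity=0.90, specificity=0.85, prevalence=prevalence)
        print(f"prevalence {prevalence:4.0%} -> PPV {ppv:.0%}")

With these made-up numbers, the same screening tool goes from a majority of true positives at high prevalence to mostly false positives when the condition is rare, which is the dynamic described in the episode.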

Speaker 1:

I think there's promise for this type of technology, as we discussed. There's the clinical theory behind it: you really could identify the hypomimia or masked facial expressions. Then it's like, how do you translate that into a supervised machine learning and computer vision problem? But then, when they went to put it in the real world, it didn't fully pan out. So it's a great idea, and we're also not all the way there yet.

Speaker 2:

Yeah, and I wonder why, sometimes, right? Because it's not just that. I mean, the microexpressions and the action units are probably very good in terms of features, right? But what else are we missing?

Speaker 2:

And I think there's so much, this is why this topic is super exciting, there's so much more to be done here. Up until this point, there was some other work that people had done in 2021 and 2022 where they didn't use these action units. They used other aspects of the facial geometry to work out what might be a better indicator of Parkinson's versus not. But to me, the Ekman-based system seems to be the most exciting one because, one, we have things like OpenFace that allow us to detect these action units effectively, right? And secondly, we also potentially have an understanding of them at a very fundamental level and can link them up to expressions, which we can then link up to the hypomimia, right? So to me that is super exciting. I guess the missing pieces: there are cultural aspects, there's ethnicity, there are other human aspects.

Speaker 1:

Yeah, definitely. And honestly, I'm going to go less technical here. But what if someone's phone camera is lower quality? Or what if the lighting isn't good? Or what if they took it at a weird angle? I feel like there are so many things that could alter how this works in a real-world scenario that you may not have in the experimental scenario.

Speaker 2:

Right, right, right. Which then comes to the question of who should be the one using a system like this to help detect it. I mean, it's one thing to say everybody's phone should have this, but clearly that's not great.

Speaker 1:

Yeah, I think it would just freak people out at this point.

Speaker 2:

Right. But on the flip side, maybe, you know, a hospital in Bangladesh where you have doctors who have a fixed camera in a fixed room with fixed lighting, and patients walking in. The doctor doesn't have to be a neurologist, but patients walk in, images are being taken, and they're using that.

Speaker 1:

Maybe you have better luck. Maybe you have, like, a primary care provider who's suspicious, and so they're like, oh, maybe I do want to refer this person. Or, you know, it happens more often at certain ages, like older ages, right, 60, 70. So maybe it could apply to a subset of people. I agree, though, this is not something you would want universally available. It's not ready for public prime time at this moment. But I do think there's something here. It just needs to be developed a little bit more first.

Speaker 2:

Yeah, and I think in future episodes we're going to need to talk more about various technologies that use facial information, because that seems to be a very popular direction that a lot of people are exploring, because your phone is in your hand and you can look at your face right away, and the promise that somehow your face is going to tell you about your health and whatever sickness you have is amazing, right?

Speaker 1:

It's incredible. It's so easy to take a selfie or a self-video. Yes, and if we could extract lots of clinical information from it? I mean, there are reports of people doing blood pressure from it, for example. There's a lot we can learn from someone, and the ease of use is right there, because it's right at your fingertips all the time.

Speaker 2:

Yeah, yeah, yeah, absolutely. Great.

Speaker 1:

Well, I think this was a really interesting topic for us to talk about, because there is this parallel between, okay, here are the clinical features that we might be looking for, like the bradykinesia and the rigidity that can be manifested as a masked facial expression, and then you layer on top of that, okay, computer vision and ML research and OpenFace. AI can identify some of these same things; we just call them something different in this setting, like lip corner puller, dimpler, jaw drop, blink, et cetera. At the end of the day, this could be really clinically useful. I don't think we're there quite yet, because when they went to test it in real-world scenarios, it didn't perform as well as we might want it to. But I think there's potential here, going from theory into practice and being able to have this human in the loop, this supervised machine learning where you're saying, this is what we look for clinically, this is what you should look for, here's a translation of that.

Speaker 1:

Yeah that's great.

Speaker 2:

Awesome.

Speaker 1:

Well, we will see you next time on Code and Cure. Thank you.
