Code & Cure

#45 - How Machine Learning Improves Stroke Prediction With AFib

Vasanth Sarathy and Laura Hagopian

What if an irregular heartbeat could quietly set the stage for a stroke? Atrial fibrillation is common, often confusing, and potentially dangerous because it can allow blood to pool in the heart, form clots, and send them traveling to the brain. The challenge is not simply knowing that AFib raises stroke risk—it is deciding who truly needs anticoagulation. Blood thinners can prevent devastating strokes, but they also increase the risk of serious bleeding, making the “right” answer highly dependent on each patient’s risk, context, and values.

We begin by breaking down the clinical basics: what AFib is, why clots can form in the atria, and how those clots can lead to stroke. From there, we unpack CHA₂DS₂-VASc, the standard scoring tool used to estimate stroke risk. Its simplicity makes it practical and easy to communicate, but that same simplicity can also be a limitation. Fixed point values do not always capture the complex ways age, medical conditions, medications, and real-world patient factors interact.

Then we turn to a paper asking a practical question: can machine learning better predict one-year stroke risk after new-onset AFib using information clinicians usually have available from the start? We explore feature selection with BIC, the importance of external validation, and why even a straightforward logistic regression model can outperform a classic clinical score. We also discuss why XGBoost performs so well with tabular clinical data, how it captures nonlinear thresholds and interactions, and how SHAP explanations can make predictions more transparent and clinically useful. We close with a clear stance on “AI said so” medicine: targeted, interpretable models may help with high-stakes risk prediction, but black-box LLMs are not the right tool for deciding who should receive anticoagulation.

References:

Interpretable machine learning models for stroke risk prediction in patients with newly diagnosed atrial fibrillation
Lin et al.
Nature Digital Medicine (2026)

Credits:

Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/

Static Scores Meet Machine Learning

SPEAKER_00

For decades, we've predicted stroke risk using static scoring systems. Machine learning may finally change that.

SPEAKER_01

Hello and welcome back to Code and Cure, where we discuss decoding health in the age of AI. My name is Vasanth Sarathy. I'm an AI researcher and cognitive scientist. And I'm here with Laura Hagopian.

SPEAKER_00

I'm an emergency medicine physician. We're talking about atrial fibrillation today, and stroke risk. That's probably a lot of words where you're like, what are you talking about?

SPEAKER_01

I can't wait for you to unpack this because as we were preparing for this, those were words that I sort of kind of knew what they meant, maybe, but I'm really excited to learn about that.

SPEAKER_00

Yeah. And I mean, atrial fibrillation is very common. I saw a lot of it in the emergency room. Um, and there's a lot that goes along with it. But I think it probably makes sense to back up and and talk about sort of the clinical side of this before we dive in to the AI side of this and to the machine learning side of this. Um

AFib Basics And How Clots Form

SPEAKER_00

atrial fibrillation, uh, back to the beginning, what is it? It's an arrhythmia. It's the most common arrhythmia. It affects over 58 million people across the world. And it's basically a form of an irregular heartbeat. And so what happens is that the upper chambers of the heart, the atria, they quiver or fibrillate. Um, and this often happens quickly, it happens irregularly, and so your heartbeat is kind of uneven. That's probably the best way to describe it. It's more common as people get older, but other things can trigger it as well. Um, and it can cause a lot of issues, right? Especially if your heart is beating too fast and all these other things. But one of the main things we worry about is stroke risk. Um, because what's happening is when those upper chambers, those atria, fibrillate rather than contracting in a coordinated fashion, then the blood kind of pools. Oh, it kind of sits in that one spot in the atria, right? Right. And it can then clot, right? It's because it's not moving like you want it to. Well, actually, the clot doesn't do that. The clot can then travel. Oh, I see. Okay. So when your heart beats, that clot can travel anywhere in the rest of your body through your arteries. Your arteries lead to your legs, your arms, your gut. So you can get clots that go there. But one of the more common places, unfortunately, that they can go with atrial fibrillation is into your brain, causing a stroke.

SPEAKER_01

Got it. Okay. So just to take a step back though, a clot in this particular instance is just bad because it's a clot and it's going to go block something that shouldn't be blocked.

SPEAKER_00

Yeah, exactly. So when the blood pools in the atria it forms that clot, and that clot, because it's coming out of the heart with the rest of your blood, can travel into the blood vessels in the brain and cause a stroke. And so when we treat atrial fibrillation, we often do, you know, do we try to control the heart rate? We see if we can get them back to a normal rhythm. Those are all components of it. But the thing that I want to focus on today, based on the discussion we're having and the paper we reviewed, is hey, how do we figure out who's at high enough risk for strokes that we want to give them blood

When Blood Thinners Help Or Harm

SPEAKER_00

thinners? Because if you give them blood thinners, then they're much less likely to have a clot happen. Right. Much less likely to have that then go up into the brain and cause a stroke.

SPEAKER_01

So atrial fibrillation alone isn't enough to give people blood thinners.

SPEAKER_00

It's an interesting question. It's like, well, what is someone's risk, right? Because any treatment that you give someone, there's like pros and cons to it, right? Like uh something that thins your blood. What do you think might be a con of that? Oh my god, side effects.

SPEAKER_01

If you bleed anywhere, it's just not gonna clot at all. Right, exactly.

SPEAKER_00

So you could have something sort of minor where you just get a big bruise where someone else might get a small bruise, right? But you could have a bleed, I don't know, in your GI tract from an ulcer or something like that. You could fall down and hit your head, and it's not a minor thing anymore. You could have bleeding inside the brain too. So, you know, there are reasons to do it, and then there are reasons not to do it. And you want to make sure that the reasons to do it sort of outweigh the reasons not to do it because there's always risks associated with medications or interventions, you know. If you were to have a surgery, for example, you sign this big long consent form. You're like, okay. Long consent. Yeah. Well, it's like, here's why we're doing this, here's why it's important, here's why it's necessary, here are the bad things that could happen. They're rare, but they could happen, right? And so here it's the same thing. It's like, well, we want to understand who would benefit from blood thinners the most because we know that they can have some adverse effects too. And so there actually are prediction tools for this already that are used in medicine.

CHA₂DS₂-VASc And Its Blind Spots

SPEAKER_00

The most common one that we use right now is called the CHA₂DS₂-VASc score. And what that does is it estimates the stroke risk for patients with atrial fibrillation. And so very low scores may not need blood thinners. Um, high scores should have blood thinners, but of course it's a discussion you end up having with your provider because there are things that you may need to consider about each person's individual situation. Sure. That the score may not pick up, right? Do they have frequent falls or do they have a history of multiple GI bleeds in the past or whatever it is? Um, and so what the CHA₂DS₂-VASc score does, it's like simple addition. Um, and things score you different points, like zero, one point, or two points based on your age, right? So older, we know, is more at risk for stroke. Based on your sex, based on any history of certain medical conditions like congestive heart failure, hypertension, stroke, prior vascular disease, like having a heart attack in the past, or history of diabetes. And so what you do is you answer yes or no to all these questions, and, you know, it adds it up for you, and then you just get this simple sort of score. And from that score, you and your provider can have a discussion around, hey, do we think this patient should get blood thinners to try to prevent them from having a stroke?
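
To make the "simple addition" concrete, here is a minimal Python sketch of the score as it is commonly published. The age points match what is said in the conversation; the two points for a prior stroke or TIA come from the standard published score rather than from this episode, and the `Patient` class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    age: int
    female: bool
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_or_tia: bool = False
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Add up fixed point values; every patient gets the same weights."""
    score = 0
    score += 1 if p.heart_failure else 0        # C: congestive heart failure
    score += 1 if p.hypertension else 0         # H: hypertension
    if p.age >= 75:                             # A2: age 75 or older
        score += 2
    elif p.age >= 65:                           # A: age 65 to 74
        score += 1
    score += 1 if p.diabetes else 0             # D: diabetes
    score += 2 if p.prior_stroke_or_tia else 0  # S2: prior stroke or TIA
    score += 1 if p.vascular_disease else 0     # VA: vascular disease
    score += 1 if p.female else 0               # Sc: sex category
    return score


example = Patient(age=78, female=True, hypertension=True, diabetes=True)
print(cha2ds2_vasc(example))  # 2 (age) + 1 + 1 + 1 (sex) = 5
```

Nothing in this function lets one factor change the weight of another, which is the limitation the conversation turns to next.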

SPEAKER_01

Got it, got it. That's super helpful. Thank you. Yeah, that's that's great.

SPEAKER_00

But it is somewhat limited, right? It's not gonna necessarily capture like a complex pattern.

SPEAKER_01

Yeah, yeah. And we'll talk about that when we talk about the technical side of it.

SPEAKER_00

And it's the same for everyone, right? Everyone has that same score that's being calculated.

The Study Question And Validation

SPEAKER_00

That's right. And so the question they were asking in this paper was like, hey, can we make a machine learning model to predict stroke risk in patients with new onset atrial fibrillation? I think that's a very interesting question.

SPEAKER_01

Yeah, it is a very interesting question, and it's a really nice paper. I would highly recommend it for people who are learning or doing machine learning, or trying to understand how these models are applied and the challenges. This is a very nicely written paper. Um, and one of the big things they're tackling is external validation and making sure that the data is good and all of that stuff, right? That we talked about before in the podcast.

SPEAKER_00

Yeah. So they trained on one set of data, but then they tested on a whole nother set of data that was actually distinct, different.

SPEAKER_01

Yeah.

SPEAKER_00

Uh the patients were different, their demographics were different, and it still worked.

SPEAKER_01

Yeah. Yeah.

SPEAKER_00

You know, their machine learning model still worked.

SPEAKER_01

Yeah. And what's interesting also is the underlying machine learning techniques themselves are not super revolutionary. They're not necessarily new, but like I said, it's a great example of a machine learning algorithm and system that is well designed for the task at hand.

unknown

Yeah.

SPEAKER_00

So I'd love to have you dive into that a little bit more because what they did was they said, okay, well, you know what? Let's do a real-world scenario. You may not have, um, lab values. You may not have a ton of information about these patients if they're being newly diagnosed, right? So let's figure out what would be sort of generally available, right? Like their age, like their medication list and whatever, you know, past medical history they might have, whether that's hypertension or diabetes or something else. Let's just take that information and see if we can use that to predict what someone's stroke risk is.

SPEAKER_01

Yeah, yeah, yeah, yeah. So let's talk about this. So they are predicting the one-year stroke risk after an atrial fibrillation diagnosis.

SPEAKER_00

And you can call it AFib for sure. That sounded like so awkward when you were saying that. It was really funny. In my head, I was thinking the tricks.

SPEAKER_01

In my head, I was like, maybe I should just say AF to be all cool, but that would be. AFib.

SPEAKER_00

AFib.

SPEAKER_01

You can be cool if you say AFib. All right, I'll say AFib. So predict one-year stroke risk after AFib diagnosis. Um, and we talked about this, but the hand score, the CHA₂DS₂-VASc score, is problematic. Well, not problematic necessarily. It's good. It's still very good.

SPEAKER_00

Could it be better?

SPEAKER_01

But the question is, could it be better? It's very rule-based, simplistic, and it has a set of fixed weights that people apply. And it's kind of just like going through the math and doing that.

SPEAKER_00

And it's not personalized, right?

SPEAKER_01

Yeah, but the nice thing is it makes sense. People understand it, it's interpretable. Um, and so there's some benefits to that. So the question is, can we make it better? And this paper addresses that. So one of the things they do first is, um, figure out what should be the input to our model, right? We have all of these different pieces of information coming in, raw hospital data, uh, people's demographic information, all of this stuff. How do we take all of that into account? Let's say we had a data set that, for every single patient, had a bunch of those things, right? And some of those things that you mentioned too, some other measures, right? They have all of that. And let's say you have a data set that tells you whether or not that patient had a stroke a year later, right? Let's say you had that. A lot of those measures are potentially not relevant to the

Feature Selection With BIC

SPEAKER_01

analysis. So one of the first steps in the machine learning process is something called feature selection, which is a method to basically figure out which of these variables that are put in, the inputs to the system, are actually useful for the calculation. Um, because if you just include everything, the results might seem really good, but what ends up happening is you might be overfitting, which is, uh, it might work for this patient population, but it won't work for something else. Because you've given it too many details and it's made the model very specific, and it might work for that very specific situation but not when you go to a different setting, because part of what it learned was noise.

SPEAKER_00

I could also say from like a clinical standpoint, if you have too much data that's going in, you may not actually have that data on the next patient. Oh, that's true, yes. Right? Like you may have XYZ lab on this patient and JKL lab on the other patient, right? And so a lot of times, if you're taking clinical samples and you're looking at the population level, there's gonna be big gaps in the data.

SPEAKER_01

No, exactly. And so they use a technique called Bayesian information criterion, or BIC. Um, and that is a mathematical technique to figure out which of the variables actually matter. What it does is it rewards variables that help the model fit the outcome, and it penalizes having too many variables, so it accounts for this exact thing we're talking about. And so they use that technique, and what they did was they identified between nine and eleven core variables that they considered important for this. And that first step of feature selection, by the way, is a really hard problem in and of itself.
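
As a rough illustration of that trade-off, the sketch below uses the textbook definition, BIC = k·ln(n) − 2·ln(L), where k is the number of fitted parameters, n the number of patients, and L the likelihood of the observed outcomes under the model. Lower is better. The outcomes and predicted probabilities are made-up numbers, and how the paper actually searched over candidate variable sets is not described in the conversation.

```python
import math


def bernoulli_log_likelihood(outcomes, predicted_probs):
    """Log-likelihood of 0/1 outcomes under the predicted stroke probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, predicted_probs)
    )


def bic(log_likelihood, n_params, n_samples):
    """Textbook BIC: a fit term plus a penalty that grows with each parameter."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# Made-up data: 1 = stroke within a year, 0 = no stroke.
Y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
# Hypothetical predictions from a 3-parameter and an 8-parameter model.
P_SMALL = [0.70, 0.20, 0.30, 0.60, 0.20, 0.10, 0.30, 0.70, 0.20, 0.20]
P_LARGE = [0.75, 0.20, 0.25, 0.65, 0.20, 0.10, 0.25, 0.75, 0.15, 0.20]

ll_small = bernoulli_log_likelihood(Y, P_SMALL)
ll_large = bernoulli_log_likelihood(Y, P_LARGE)

print("small model BIC:", bic(ll_small, n_params=3, n_samples=len(Y)))
print("large model BIC:", bic(ll_large, n_params=8, n_samples=len(Y)))
```

In this toy case the larger model fits slightly better but ends up with the worse (higher) BIC, because five extra parameters cost more than the small gain in fit.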

SPEAKER_00

They just, like, breezed past that. They were like, by the way, we did this first.

SPEAKER_01

Yes, they did, but, you know, it's also tied closely with the model you're choosing.

Logistic Regression Versus XGBoost

SPEAKER_01

So there were two models they looked at. One is what's called a logistic regression model, and the other is called an XGBoost model. And we'll talk about each of those models separately. But, uh, for the logistic regression model, they picked nine features, things like age and, you know, prior stroke and chronic lung disease and some other things.

SPEAKER_00

Yeah, and medications were on this list too. Like, oh, are they on an antiplatelet agent? Are they on that's what APA stands for? Yeah, there you go.

SPEAKER_01

Yeah. So they have all of these things, but they figured out that only these nine features are actually, uh, most predictive for the data that they had. Um, and this allows the logistic regression model to be more compact, right? Because your input's only nine things. Um, but logistic regression models are actually really good. And in fact, the one that they had had very high performance. It did really well on this task.

SPEAKER_00

Better than the CHA₂DS₂-VASc score.

SPEAKER_01

Yeah. Yeah. And this is really important because it's a very simple model and it does really well. And some might say it's interpretable, because it kind of is. Like your CHA₂DS₂-VASc score, it also kind of sums up these nine variables and assigns weights to them: which is more important, which is less important.

SPEAKER_00

Like, so if, for example, having a history of hypertension and diabetes, you know, you may find that those people are at higher risk for having a stroke with new-onset AFib. And so you can go in and look at that. Like, here are the variables that are associated with it, here are the ones that are not.

SPEAKER_01

Yeah, yes, and it's a linear model, which means what they're trying to do is fit the data to a line, not literally a 2D line, but some higher-dimensional line. And that's the core idea behind a linear model. Uh, but the problem is, um, some of the behaviors and dependencies between the different variables can be nonlinear. And so an XGBoost model is a newer technique that actually is very interesting. XGBoost is used heavily in cases where you have tabular data and you have, you know, kind of mixed data like that. And it's based on something entirely different. So logistic regression was based on fitting things to a line. Like if you had a line, and if you know the mathematical equation for the line, and I'm getting a little mathy here, but, uh, a line equation is y equals mx plus b, right? People might know that, people might not know that. I remember that. Yeah, where x is the input and y is the output, and m and b are just the knobs that you can turn. And a linear model learns the m's and b's for the data you give it. And so then it finds the line and then it's able to predict for other x's what the y could be. Um, that's broadly what it is, but, uh, an XGBoost model works completely differently. It starts from what's called a decision tree. And a decision tree is very straightforward. So, like if you think about it, if you just picked one of the variables that we talked about, maybe pick age, right? Maybe you have a rule that says if the age is greater than 75, yes to stroke.
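
One way to picture the "y equals mx plus b" idea for logistic regression: the weighted sum gives the log-odds, and a sigmoid squashes that into a probability between 0 and 1. The sketch below is a toy in Python; the feature names, weights, and bias are invented for illustration and are not the paper's fitted values.

```python
import math

# Hypothetical learned "knobs": one weight (m) per feature plus a bias (b).
WEIGHTS = {"age_decades": 0.4, "hypertension": 0.5, "diabetes": 0.3}
BIAS = -5.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(features, weights=WEIGHTS, bias=BIAS):
    """Linear part first (sum of m * x, plus b), then squash to a probability."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


def log_odds(p):
    return math.log(p / (1.0 - p))


young = {"age_decades": 5, "hypertension": 0, "diabetes": 0}
old = {"age_decades": 8, "hypertension": 1, "diabetes": 0}

for label, patient in (("young", young), ("old", old)):
    without = predict_risk(patient)
    with_diabetes = predict_risk({**patient, "diabetes": 1})
    # The shift in log-odds from adding diabetes is the same for both patients.
    print(label, log_odds(with_diabetes) - log_odds(without))
```

The printed shift is the diabetes weight in both cases (up to floating-point rounding), which is what "linear" means here: a feature's contribution to the log-odds does not depend on the patient's other features.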

unknown

Uh-huh.

SPEAKER_01

If it's less than 75, then check the HTN. And if that is high, um, then medium risk. And if that is low, then low risk. So you can imagine a decision logic that keeps branching and creates a little tree. Now, if you have lots of variables, you have to branch quite a bit and you'll get quite a big tree. But the advantage of the tree format is that it naturally captures all the thresholds you care about. If the data is kind of weird and nonlinear, if there are sudden changes because, let's say for instance, age really does matter, then you're able to capture those nonlinearities really well. You can capture interactions, like this and this need to be greater than a certain value before you conclude something. So you can do things like that in a tree structure. Uh, and decision trees have been around for a really long time, but it's really hard in some ways to train them. They're very interpretable because at the end of it, you have a logic for how you arrived at the final.
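
The toy rule described here can be written down directly as a small tree. The sketch below is in Python and uses hypothetical `Leaf` and `Split` classes; the thresholds and labels follow the spoken example, not anything learned from data. It also returns the path taken, which is the "logic for how you arrived" at an answer.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    feature: str
    threshold: float
    if_greater: "Node"
    otherwise: "Node"


Node = Union[Leaf, Split]

# Age over 75 means high risk; otherwise hypertension decides medium or low.
TOY_TREE = Split(
    "age", 75,
    if_greater=Leaf("high"),
    otherwise=Split("hypertension", 0.5, Leaf("medium"), Leaf("low")),
)


def predict(node, patient, path=None):
    """Walk from the root to a leaf, recording each decision along the way."""
    path = [] if path is None else path
    if isinstance(node, Leaf):
        return node.label, path
    went_high = patient[node.feature] > node.threshold
    path.append(f"{node.feature} {'>' if went_high else '<='} {node.threshold}")
    next_node = node.if_greater if went_high else node.otherwise
    return predict(next_node, patient, path)


label, path = predict(TOY_TREE, {"age": 60, "hypertension": 1})
print(label, path)
```

Each extra variable adds another level of branching, which is why hand-built trees get large quickly and why learning them from data is the hard part.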

SPEAKER_00

Yeah, which is always what you want to see with something like this before you make a clinical decision.

SPEAKER_01

Absolutely. Absolutely. And XGBoost is a technique in which you learn the best tree by picking really crappy trees to start with, and then you improve on the error. You look at the error that was produced and you learn to change the tree to get lower and lower error. And it's a technique that people use called gradient boosting. XGBoost stands for extreme gradient boosting. And it's essentially a way to get at the best tree by incrementally starting from the worst model and kind of working your way back upwards. Uh, so it can learn all the thresholds, it can learn all the feature interactions, it can learn conditional effects. For example, diabetes is only dangerous when the age is greater than 70 or something, and you'll notice a rule like that is really hard to express in logistic regression. Or in the CHA₂DS₂-VASc.
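
A bare-bones version of the boosting idea, as a sketch in Python: start with a crude guess, fit a tiny one-split tree (a "stump") to the errors that remain, add a fraction of it to the model, and repeat. In gradient boosting as usually described, the final model is the sum of all these small trees. This toy uses squared error on made-up numbers; XGBoost itself adds regularization and other refinements that are not shown, and the ages and risks below are invented for illustration.

```python
def fit_stump(xs, residuals):
    """Find the one threshold whose two-sided means best fit the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, value_if_below_or_equal, value_if_above)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    base = sum(ys) / len(ys)  # the crude starting guess: the average outcome
    stumps, preds = [], [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # what is still wrong
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def boosted_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((boosted_predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data with a jump in risk around age 75.
AGES = [50, 55, 60, 65, 70, 74, 76, 80, 85, 90]
RISKS = [0.02, 0.02, 0.03, 0.05, 0.06, 0.06, 0.20, 0.22, 0.25, 0.25]

for rounds in (0, 1, 50):
    model = fit_boosted(AGES, RISKS, n_rounds=rounds)
    print(rounds, "rounds, training error:", mean_squared_error(model, AGES, RISKS))
```

The first stump finds the jump between 74 and 76 on its own, which is the kind of threshold a single fixed weight per feature cannot represent.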

SPEAKER_00

And they gave some examples in this paper. I thought this was very interesting, where it's like, oh, in this patient, patient A, being on a specific type of anti-arrhythmic drug increased their risk of stroke, but being on that same drug decreased the risk of stroke in a different patient who was a different age and had a different health history. I think that's so interesting because it means it's, like, truly personalized.

SPEAKER_01

That's right.

SPEAKER_00

And it's not just do you have X or Y, but how do X, Y, and Z interact to create an overall picture of your risk, which is clinically what you want to see, but that's not what you get out of, uh, a simpler scoring model.

SPEAKER_01

Exactly. And it can handle heterogeneous tabular data really well because you might have mixed stuff, you might have noisy stuff, some might be like decimal numbers and some might be other things. Trees are fantastic at handling that kind of thing. Uh, like we just talked about, we can discover interactions between them. Um, and there are techniques in XGBoost that prevent the overfitting that we talked about before. They only used 11 of the features for the XGBoost. So they did the same feature selection thing we talked about at the beginning to arrive at, um, 11 things. XGBoost has been around for a little bit, and it's really a technique that's become very optimized. Um, things have been parallelized, it's very efficient, it's very fast, and it generally beats out a lot of the other models in many of these cases where you have this kind of data and you have this kind of task.

SPEAKER_00

And so right now, for example, I go into a physician calculator, MDCalc, and I'll just be like, okay, this is the age, this is the this, this is the that. Instead, I'd need to use basically an application to do it because it's so complex. Yeah. Or I'd have to integrate it into the medical record somehow where it would pull all that data in.

SPEAKER_01

Yes, exactly. And again, I want to stress this point, which is that in a linear model, you're always multiplying by a fixed weight. Let's say you have a linear model that you've trained and it's really high performing, but it's figured out that any diabetes, sorry, any age factor is, um, multiplied by 0.5, right? Just making up a number, 0.5. Now, that weight, the 0.5, the multiplier that you apply, is a knob that was learned over the course of its training. That's fixed across all patients, right? It's the same amount of weight or risk you apply for a particular variable across all patients.

SPEAKER_00

And that's actually true in the CHA₂DS₂-VASc score. The two in there gives like two points based on age greater than or equal to 75, whereas 65 to 74 is only one point. So it's like a real world thing that happens all the time. Yeah.

SPEAKER_01

Yeah. Um now on the flip side, XGBoost is wonderful and it does a really good job.

SPEAKER_00

And, um, both logistic regression and XGBoost outperformed the calculator, the CHA₂DS₂-VASc calculator.

SPEAKER_01

Yeah, yeah, yeah, yeah. So they also did some additional stuff. So there's additional, um, techniques that can take XGBoost's output and

SHAP Explanations And Why Not LLMs

SPEAKER_01

do more. So if you ask the question of why did this particular patient get a 39% risk score, where did that 39% risk specifically come from? Um, there is another technique called SHAP values, or SHapley Additive exPlanations. Um, and it's a technique that derives from cooperative game theory, but it basically is able to figure out how much each feature contributed to this prediction and kind of walk down that tree for you a little bit. Um, and we don't have to go into details here about SHAP itself, but it allows you to get these kind of local explanations. Um, and it's useful when you want to take a targeted look at a specific result for a specific patient and look backwards.
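
The game-theory idea can be shown on a very small example. The sketch below, in Python, computes exact Shapley values for a hypothetical three-feature risk model by averaging each feature's marginal contribution over every subset of the other features. The model, the patient, and the single "baseline" patient are all invented for illustration; practical SHAP tooling for tree models uses faster algorithms and a background data set rather than one baseline.

```python
from itertools import combinations
from math import factorial


def toy_risk(p):
    """Hypothetical model with an interaction: diabetes only matters above age 70."""
    risk = 0.05
    if p["age"] > 75:
        risk += 0.15
    if p["hypertension"]:
        risk += 0.05
    if p["diabetes"] and p["age"] > 70:
        risk += 0.10
    return risk


def shapley_values(model, patient, baseline):
    """Exact Shapley values: weighted average of marginal contributions."""
    names = list(patient)
    n = len(names)
    values = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features in the subset take the patient's value, the rest the baseline's.
                without = {f: patient[f] if f in subset else baseline[f] for f in names}
                with_feature = {**without, name: patient[name]}
                total += weight * (model(with_feature) - model(without))
        values[name] = total
    return values


PATIENT = {"age": 80, "hypertension": True, "diabetes": True}
BASELINE = {"age": 60, "hypertension": False, "diabetes": False}

contributions = shapley_values(toy_risk, PATIENT, BASELINE)
print(contributions)
print("explained gap:", toy_risk(PATIENT) - toy_risk(BASELINE))
```

The contributions add up to the gap between this patient's predicted risk and the baseline's, and the credit for the age-and-diabetes interaction is split evenly between those two features.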

SPEAKER_00

And I always want to see something that's explainable. It's like if you have a black box that's giving you an answer, it's like, well, how much do I trust it? But if you can look under the hood and be like, hey, it's your history of hypertension and diabetes along with your prior history of a stroke that is triggering such a high risk for you that's making me recommend the anti-coagulation, the blood thinners, then it it just feels better. It feels like, okay, I know why we're doing this thing that we're doing.

SPEAKER_01

Yeah. And we're able to say things like, you know, younger age can reduce the risk and older age can increase the risk, but the effect depends on patient profile, right? And so that's going to be different for different patients. So I'm really excited about this paper because, like I said, obviously the performance is great, right? You have to do that. The performance is great.

SPEAKER_00

Yeah. It's truly letting you, especially with the XGBoost model, personalize this stroke risk because it's not just about X, Y, Z, but about how X, Y, and Z interact.

SPEAKER_01

Yeah, exactly.

SPEAKER_00

Right. And it's the things that I would want to see clinically, which is like, hey, we don't need a ton of input features. We don't need everyone's lab values. We don't need 50 different things. We need 11. Um, and it's explainable. So we can go in and understand, hey, here are the things that are triggering this, and here's why. Here's why we would or would not recommend um anti-coagulation at this time.

SPEAKER_01

Yeah. And you know, the natural question that people always ask is, why can't we use modern AI like ChatGPT or LLMs or something else like that? And we have to remember that those systems are neural networks and neural networks are black boxes. So we lose all of this. Um, we can send in all the features and train it, and it probably will do fine performance-wise. But do we need such big machinery for doing something that is so targeted, when we have a great model like this already?

SPEAKER_00

And it's not explainable then, right? At all. It's not explainable. Part of my problem is, how do you even know if it's good? If you're like, well, it looks good, and then you send out a bunch of people either on or off blood thinners, and they have something bad happen, who do you go back to and say, well, hey, this didn't work out how we wanted it to? This person had a brain bleed and that person had a stroke. I mean, nothing in medicine is 100%, right? You're talking about stroke risk and whether or not you want to put them on blood thinners. Uh, it's possible that someone you recommend no blood thinners to could still have a stroke, and someone you do recommend it to could have a bad effect from it. Yeah. But you want to understand what's the underlying reasoning for doing this rather than just doing it because AI said so.

SPEAKER_01

Yeah, no, absolutely. And that's it. I think that's the key here, and that's the key contribution in this paper.

SPEAKER_00

I do think in general, when you think about risk scores for all sorts of things, this does open the door, though, to have just more personalization and more interaction between the different factors that are being considered, right? Yeah. Simple addition is nice and easy. It's great. There's tons of calculators online for it. But when you look at how well these models performed, both, you know, initially and with the external validation study, it was like, oh wow, we can really hone in on this better and understand who are the patients that are the highest risk. How do the different pieces of their health history interact with each other in order to understand: okay, this person does belong on anticoagulation, this person does not. And here's someone who's in between, where we maybe need to have a discussion and figure out together what the best path forward is. Yep, exactly.

Personalized Risk And Closing

SPEAKER_00

So exciting article. Um, can you say AFib now?

SPEAKER_01

I'll I'll try again. AFib.

SPEAKER_00

There we go. It's perfect. It's a perfect ending to this podcast. We will see you next time on Code and Cure.

SPEAKER_01

Thank you for joining us.