Code & Cure

#12 - Oracle Or Algorithm?

Vasanth Sarathy & Laura Hagopian

What if we could glimpse our future health—not through guesswork, but through data-driven forecasts? A new AI model, codenamed “Delphi,” is redefining what it means to predict disease by learning from massive, population-scale medical histories. Built on transformer architecture, Delphi estimates the risk and timing of over a thousand possible diagnoses—offering a personalized view of what may lie ahead.

We start with familiar ground—cardiovascular risk scores—and explore how predictions only matter when they guide meaningful actions: improved blood pressure control, appropriate statin use, and lifestyle changes that truly bend the curve. But Delphi doesn’t stop at single conditions. It captures the real-world complexity of multimorbidity, mapping how diseases co-occur and unfold over time.

Delphi doesn’t “understand” biology—it recognizes patterns. Much like a weather forecast, it turns complex statistical relationships into calibrated probabilities. We break down how the model handles irregular patient histories, simultaneous diagnoses, and time-to-event forecasting—offering practical insights clinicians can use. We also explore how Delphi was validated across extensive UK and Danish datasets, and why “reliable” beats “flashy” in the real world of medicine.

One of Delphi’s most promising features? Generative timelines. By simulating possible health futures from partial records, the model creates synthetic patients—fueling research while protecting privacy.

At the core is a human question: would you want to know your likely diagnoses decades in advance? We unpack the emotional and ethical dimensions of predictive health—when foresight helps, when it overwhelms, and how to responsibly deliver these insights. If you care about AI in healthcare, predictive analytics, or the ethics of foreknowledge, this episode offers a grounded look at what’s here, what’s coming, and how to use it wisely.

Reference: 

Learning the natural history of human disease with generative transformers
Artem Shmatko et al. 
Nature, 2025

Credits: 

Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/

SPEAKER_01:

In the ancient world, Delphi was the oracle of hidden truths. Today, it's the name of a powerful model designed to cut through the noise, surface insights, and help us see what's coming next. This isn't myth, it's data, prediction, and technology reshaping the way we understand the future.

SPEAKER_00:

Hello and welcome to Code and Cure. My name is Vasanth Sarathy, and I'm an AI researcher, and today I'm with Laura Hagopian. I'm an emergency medicine physician. I'm very excited about this topic. I had to look up my Greek history a little bit because I haven't learned enough Greek history to be... Not well versed, yeah. There you go, versed. That's the word I was looking for. So Delphi was a city in ancient Greece, and it's where the oracle known as Pythia was, the high priestess who would prophesy about things. And this paper, which we're going to go through in more detail, is really cool because they talk about essentially prophesying disease progression, why that matters, and assessing human health risk.

SPEAKER_01:

So an oracle is real, is basically what you're telling me.

SPEAKER_00:

Uh-huh. Except now it's AI, just kind of weird in a way.

SPEAKER_01:

Yeah. Yeah. Uh yes, it is. Gives you a little bit of like a creepy vibe almost.

SPEAKER_00:

Yes. I mean, you think of the high priestess as being some kind of supernatural human or, you know, godlike being, but now we're saying that's AI?

SPEAKER_01:

Yeah. A little strange, a little strange. But it's interesting because the whole concept behind this is, okay, can we predict the natural history of human disease with knowledge that we already have, right? Yeah. And we actually do this in medicine quite a lot already. We don't do it for, I think, what do they do, over a thousand diagnoses here. We don't do it for over a thousand diagnoses, but there are risk calculators out there. One of the common ones that I used to use is the PREVENT calculator. And the whole idea here is to predict what someone's cardiovascular risk is. So you're gonna input certain things like, okay, what was their sex at birth? What's their age? What's their cholesterol level? What's their blood pressure? Do they have diabetes? Do they smoke? Are they using a blood pressure medicine or a statin? What's their body mass index? Do they have overweight or obesity? And what happens when you put all of that information in is it gets synthesized into this calculation, and then it spits out a number and it says, okay, here's what we think this person's 10-year risk of cardiovascular disease is. This is what their 30-year risk is. This is what their risk of heart failure is, this is what their risk of stroke is. And if those numbers are high, then you do something about it, right? And that's the key piece, right?

SPEAKER_00:

It's interesting enough to know the prediction, but the whole point of asking for the prediction is so that we can decide to take action so that it doesn't play out the way we don't want it to, right? Right.

SPEAKER_01:

So this is where it's like, is it slightly different than the oracle? Because you could maybe even change your future. Well, yeah, I don't know if you could ask Pythia what to do if she predicts something that's not favorable for you. Yeah, exactly. But in this case, with the PREVENT calculator, if you had someone with high risk values, you could say, hey, we really need to intensify your blood pressure control. Let's start you on a medication for your high cholesterol, like a statin, and/or let's intensify those lifestyle changes that you've been making. Let's improve your diet, let's encourage more exercise, et cetera. And so the whole idea is, okay, we know you're at higher risk, so let's try to change that risk profile by making these adjustments. And actually, if you ran the risk calculator again and you'd brought someone's cholesterol and blood pressure levels down, you'd actually see a lower risk. So you are potentially changing the trajectory of someone's life course.
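
A minimal sketch of the "run the calculator again" idea, assuming a toy logistic score. The weights, the intercept, and the function name `toy_risk` are hypothetical and chosen only for illustration; they are not the PREVENT equations and should not be read as clinical values.

```python
import math

# Hypothetical weights for illustration only; NOT the PREVENT equations.
TOY_WEIGHTS = {
    "age": 0.05,
    "systolic_bp": 0.02,
    "total_chol": 0.01,
    "smoker": 0.6,
    "diabetes": 0.7,
}
TOY_INTERCEPT = -9.0


def toy_risk(patient):
    """Map risk-factor values to a probability between 0 and 1 (logistic model)."""
    score = TOY_INTERCEPT + sum(TOY_WEIGHTS[k] * patient[k] for k in TOY_WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


# Same made-up person before and after lowering blood pressure and cholesterol.
before = {"age": 55, "systolic_bp": 160, "total_chol": 260, "smoker": 1, "diabetes": 0}
after = dict(before, systolic_bp=125, total_chol=180)

risk_before = toy_risk(before)
risk_after = toy_risk(after)
print(f"before: {risk_before:.2f}, after: {risk_after:.2f}")
```

The only point of the sketch is the direction of the change: with the same person and lower inputs, the score comes out lower.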

SPEAKER_00:

Yeah, so that's great. Yeah, that's really useful then in that setting.

SPEAKER_01:

So it was interesting to read this paper because, like, that's just a couple of disease processes. Yeah.

SPEAKER_00:

So let's talk about that a little bit. I think they said, okay, that's a good starting point, the PREVENT calculators. But what we have here is potentially not just individual diseases, but comorbidities, multimorbidities, where you have diseases that either occur together with other diseases or are somehow secondary to a primary disease, but they happen at the same time, kind of. And there are also temporal dependencies between them. If one happens now, then the likelihood of another one happening later, they're connected somehow. And you have a whole bunch of very diverse sets of data. And so they said, look, we need to incorporate all of this into our model and maybe we'll get better predictions of not just cardiovascular health, whatever, but this collection of things, all the things that can affect an individual's health risk. And all of that is a function, right, of all of the stuff that's happened before, is I guess the assumption here. So they said, okay, let's use transformer models to help us figure this out. Now, let's take a step back. Yeah, I was gonna say, hold on, hold on, what's a transformer model? What's a transformer model? Okay, so first of all, your LLMs in today's world, ChatGPT and so on, are all transformer models. So that's a starting point to think about and visualize it. It's a model, a computer program that represents a neural network that is arranged in a particular way. Now, transformer models were conceived of and invented back in 2017 by some Google researchers. For now you don't have to worry about the internals, just imagine it to be a box, right? At the time, the focus of transformer models was machine translation. That is, you gave it some English text and it would produce French text. But translation wasn't so easy to do, as you can imagine, because it's not like you take it word for word and just translate. The sentence structures are all different.

SPEAKER_01:

Yeah, exactly.

SPEAKER_00:

And so it's much more complicated that way. You had to take into account all kinds of internal dependencies of that sentence, you know, one word might connect with another word or it means something in a certain context, all of that matters. And so transformer models were found to be very good at doing the task of machine translation. Now, fast forward to today, those transformer models, which were originally intended for machine translation, ended up being very good for language modeling, which is predicting the next word. And as everybody knows now, when you scale that up and use trillions of pieces of internet data, you are able to produce something like ChatGPT that is really good at not just translation, but a whole host of tasks. So that's kind of the path of transformer models. But at the very core, they're taking a sequence of data, you know, your words are sort of a sequence of things that happen in time, and they predict the next thing in the sequence.
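
To make "predict the next thing in the sequence" concrete, here is a deliberately tiny stand-in: a bigram counter that looks only at the previous word. This is not a transformer (a transformer conditions on the whole preceding sequence through learned attention), and the corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict


def fit_bigram(sequences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def next_probs(counts, prev):
    """Turn the counts for one context token into next-token probabilities."""
    followers = counts.get(prev, {})
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()} if total else {}


# Made-up training sentences.
corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]
model = fit_bigram(corpus)
probs = next_probs(model, "the")
print(probs)
```

The output is a probability for each candidate next token, not a single fixed answer, which is the same shape of output the hosts describe for disease prediction.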

SPEAKER_01:

Okay, I see where you're going with this. Am I stealing the punchline maybe? But the whole concept here then is you have an idea of the temporal relationships in someone's medical record, for example, or lots of people's medical records, and then you're predicting, okay, if they had obesity at 17 and then they also had hypertension at 24, then we can predict that there's a 70% chance that they're going to develop diabetes in their 30s. I don't know. I'm making the... Exactly, exactly.

SPEAKER_00:

Your sequence is a sequence in time, sort of their age, and you have certain information about them at various points in their lifetime. And the idea is that, much like ChatGPT produces the next sentence, not just the next word, and can keep producing things and keep adding to the sentence, and can produce an entire essay if you wanted it to, this can essentially produce the next 50 years of your life and all of the different things that it can predict from that. Okay. I mean, it's cool.
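
The "keep producing the next thing" loop can be sketched as follows. The probability table `TOY_NEXT` is entirely hypothetical and, unlike a real model, looks only at the most recent event rather than the full history; the event names are placeholders, not the paper's vocabulary.

```python
import random

# Hypothetical next-event probabilities keyed by the most recent event.
# A real sequence model would condition on the entire history.
TOY_NEXT = {
    "start": {"obesity": 0.3, "hypertension": 0.2, "no_event": 0.5},
    "obesity": {"hypertension": 0.4, "diabetes": 0.3, "no_event": 0.3},
    "hypertension": {"diabetes": 0.3, "heart_disease": 0.3, "no_event": 0.4},
    "diabetes": {"heart_disease": 0.4, "no_event": 0.6},
    "heart_disease": {"no_event": 1.0},
    "no_event": {"obesity": 0.1, "hypertension": 0.2, "no_event": 0.7},
}


def rollout(history, n_steps, rng):
    """Extend a partial record one sampled event at a time."""
    timeline = list(history)
    for _ in range(n_steps):
        last = timeline[-1] if timeline else "start"
        options = TOY_NEXT[last]
        nxt = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        timeline.append(nxt)  # the new event becomes context for the next step
    return timeline


future = rollout(["obesity", "hypertension"], n_steps=5, rng=random.Random(0))
print(future)
```

Each run with a different seed gives a different plausible continuation, which is why a generated timeline is one possible future rather than a forecast of the future.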

SPEAKER_01:

Well, no, I'm like, I'm it one part of my brain is like, this is really cool. And the other part of my brain is like, this is really creepy.

SPEAKER_00:

Well, yeah. I mean, and predicting, I mean, part of the prediction is all kinds of diseases and potentially even death. So that's the creepy part of it.

SPEAKER_01:

Like, would you wanna know? And how much is fated versus how much is free will? I don't have an answer to these questions. The scientist in me is like, oh, this could be awesome, to be able to change the course of someone's life and change the course of their disease progression. But what if it's a syndrome where you can't?

SPEAKER_00:

Right, right. But I do want to make the point, and I think this is a super important point, that these models are not predicting the disease or whatever else based on some deep understanding of how the disease works or how diseases work in your body. I cannot stress this enough: it's all about statistical patterns. So this model says, here in your health history, or millions of people's health histories, it seems like if certain diseases happen at certain points in time, it's likely the case that certain other diseases will happen at certain other points in time. That's the key point I want to make here. It is not saying I understand how diabetes works in the body, and based on the information you're giving me, this is how that disease will progress. That is not what this model is doing. Instead, it's saying I have information and I believe that statistical patterns might be useful in predicting what might happen next.

SPEAKER_01:

Okay, so there's two pieces I want to unpack there. One is like you're clearly telling me there's no causality, right? That's correct. They're not, you're we're not saying that X causes Y. Like the model doesn't do that.

SPEAKER_00:

Yeah, and neither does the paper. I just want to be clear. It's not like the paper is suggesting that either. But you know, and they're being very honest and open about that also.

SPEAKER_01:

Oh, for sure, for sure. But I think it's like important to unpack that. What was my second point? Now I don't even remember. It'll come back to me.

SPEAKER_00:

It'll come back to you. Maybe it'll be predicted. If only, if only. And interrupt me if it does, because I want to keep stressing this point, which is this is a statistical pattern machine, like any other neural network. It's saying I have all this data, and you're telling me that there is some temporal sequence of time and age and disease, and you're telling me to find a pattern in that sequence, and I will tell you, yes, I can find the pattern.

SPEAKER_01:

It came back to me. Oh, perfect. You're talking about, like, what is the statistical chance that X will occur? Yeah. So it's not like you, Vasanth, will get diabetes. It's like you, Vasanth, have a 77% chance of getting diabetes, right? It's more like a weather report. Yes. It's not written in stone. It's not truly like the oracle where they're like, you, this is going to happen to you. It's like there's a good chance this will, there's a chance this won't, X, Y, Z, et cetera. But there's a number associated with it that's not 100%.

SPEAKER_00:

Yes. And the weather report is a great example. I always find this funny because the weather report might say something like there's a 50% chance of rain. And you're like, what do I do with that information? 50% chance, is that just a random toss-up? Will it rain or not rain? Is that the issue here? So, you know, I'll bring my umbrella. Yeah, but what does a 50% chance of rain mean? It means that in all of the past data, when the weather conditions were the way they are now, 50% of the time it rained and 50% of the time it didn't, right? You have to take that information and say, okay, that's really not helpful for me because I don't know if it's going to rain or not. But it's not saying that there is actually a random chance of it raining today. It's saying that based on my review of the past data, this is what I think, you know, I'm not sure. It's a toss-up. It's basically giving you a description of the past data as opposed to a prediction of what's going to happen. And I think that's a way to think about it as well, to avoid the confusion of causality versus correlation.
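
One way to read the "50% of the time it rained" interpretation is as a calibration check: among past days that received a given forecast, how often did the event actually happen? The sketch below uses made-up forecasts and outcomes, and `observed_frequency` is a hypothetical helper name.

```python
def observed_frequency(forecasts, outcomes, low, high):
    """Fraction of events that occurred among forecasts in the range [low, high)."""
    hits = [o for f, o in zip(forecasts, outcomes) if low <= f < high]
    return sum(hits) / len(hits) if hits else None


# Made-up history: forecast probability of rain, and whether it rained (1) or not (0).
forecasts = [0.5, 0.5, 0.5, 0.5, 0.9, 0.9, 0.1, 0.1]
rained = [1, 0, 1, 0, 1, 1, 0, 0]

freq_half = observed_frequency(forecasts, rained, 0.45, 0.55)
print(freq_half)
```

A forecaster, or a disease-risk model, is well calibrated when these observed frequencies line up with the stated probabilities across many cases; a single day or a single patient cannot confirm or refute a 50% forecast.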

unknown:

Yeah.

SPEAKER_01:

No, it helps. It's just like my, you know, medical brain doesn't always compute that way, right? Because I'm like, oh, if someone has high cholesterol, then it starts to clog up their arteries and that's what puts them at higher risk for a heart attack or a stroke.

SPEAKER_00:

Yeah.

SPEAKER_01:

And so like my mind always goes to causality.

SPEAKER_00:

Yeah. And I do want to turn back to the model and the approach here, which is very interesting because, outside of the issues with respect to statistics and causality and all of the other stuff, I think it's worthwhile digging deep into how they actually did this. Because I think it's a useful exercise to look at all the things that they thought about when they put this model together. Because it's not just taking a ChatGPT model and giving it health data. That's not what they did. They had to actually change the internals of these systems and change the architecture so that it would be better suited for doing this sort of prediction. Oh, that's so cool. Yeah. So one of the issues they had was, how do you give this information to the model? Like, how do you give age information? It's one thing giving words and text the way we did with LLMs, you know, if I gave it a sentence, the cat sat on the mat or something. That's just a sentence of discrete words, right? There's some order to it. However, it wasn't like there was "the" and then a big gap and then there was "cat". It's not like the gaps are variable. Whereas in a health data context, the inputs are all age dependent. You might get a diagnosis of a disease at the age of 15, but then it's not like you can check every year. It's like maybe at age 17 years, two months, and eight days or whatever, you get another disease, right? So the gaps between these ages are all kind of variable. And so, how do you encode that? So they had to figure out how to change up the model so as to encode age, which was really cool. They also had to do a bunch of other things as well, particularly for comorbidity-type things. 
Again, when you have data that's just text, like the cat sat on the mat, it's just discrete words one after the other. There weren't two words at the same time. Right. Whereas here you have two diseases potentially at the same time, overlapping. So they had to change the model to allow for that to be encoded as well. And then they did some other standard stuff with the prediction itself, which, I shouldn't say standard stuff because they did something very interesting, which is, again, in a ChatGPT model, your output is just more words. But here, that's not enough, right? You can't just say cardiovascular disease. You have to say when that's going to happen. So they had to predict the time and age as well, right? So that's a different piece there too. These are the changes, I shouldn't say small, that they had to make to the architecture of the system itself in order to make it work effectively. But it's fascinating because the input that they give it is essentially a long sequence of information. At each age point, they give it various pieces of information. So every piece of information is like an age and a disease token. They also provide information like sex at birth. I think they also do BMI and then information about whether the person smoked, and alcohol, and so on. The data itself was from the United Kingdom, and they had this large data set of nearly half a million patients that they tested on. And what's really cool was it performed pretty well on that data set, and they were able to take the same model trained on that data set, not change anything, and test it on a Danish registry of over 1.2 million people, and it performed pretty well on that as well. 
And when I say pretty well, it wasn't like amazing, which is fine, but it was good enough to say that hey, maybe there are some useful patterns in the data. Even if it's not causal, even if it's just correlational, the fact that we can use this health data and be able to do something useful is itself a very powerful thing, right? You can have health information that's not causal and still be able to predict something. And that's kind of what this paper was able to show as well.
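
Two of the architectural ideas in this turn, encoding a continuous age instead of a word position and predicting when the next event happens as well as what it is, can be sketched under stated assumptions. The sine/cosine features and the "competing exponential clocks" used here are illustrative choices for this sketch, not a reproduction of the paper's code, and every rate and parameter value is made up.

```python
import math
import random


def age_features(age_years, n_freqs=4, max_period_years=100.0):
    """Encode a continuous age as sine/cosine features at several periods.

    Because age is a real number, events 2.2 years apart and 20 years apart
    get different features, unlike evenly spaced word positions.
    """
    feats = []
    for i in range(n_freqs):
        period = max_period_years / (2 ** i)
        angle = 2 * math.pi * age_years / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def sample_next_event(rates_per_year, rng):
    """Sample which event comes first, and after how long.

    Each event is treated as an exponential clock with its own yearly rate;
    the first clock to ring fires after an exponential wait whose rate is
    the sum of all rates, and it is a given event with probability
    proportional to that event's rate.
    """
    total_rate = sum(rates_per_year.values())
    wait_years = rng.expovariate(total_rate)
    events = list(rates_per_year)
    weights = [rates_per_year[e] for e in events]
    event = rng.choices(events, weights=weights, k=1)[0]
    return event, wait_years


# Hypothetical yearly rates for one person at one point in time.
rates = {"hypertension": 0.05, "diabetes": 0.02, "heart_disease": 0.01}
features = age_features(17 + 2 / 12 + 8 / 365)
event, wait = sample_next_event(rates, random.Random(0))
print(len(features), event, round(wait, 1))
```

In this sketch the output is a pair, an event and a waiting time, rather than just the next token, which mirrors the point that saying "cardiovascular disease" is not enough without saying when.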

SPEAKER_01:

And one of the things you also pointed out to me, which I thought was interesting, was like, okay, this is sort of an initial version of this where they put in things like age and sex and, you know, alcohol use and ICD-10 codes and all of that, but they did not put in other stuff, right? They did not put in wearable data. They did not put in blood tests. They did not put in someone's zip code. Right. And we know that there are other things that are very predictive of what happens with your health down the line. So it's like, could these models then become better with more data?

SPEAKER_00:

Yeah, no, absolutely. And I think they could, right? If a pattern exists with this data, there's a very high likelihood that more information about the patient's health is going to influence this. And I think they did a bunch of other cool things in this paper, actually. It's really worth the read. They were able to not only do the predictive aspect, but they also were able to generate. There are examples in the paper of somebody's health record up to age 40 or something, and then they were able to play out for the next 20 years what their disease progression would be like. And they tested that as well, and the generative abilities are pretty good. So much so that what they were able to do then was to say, hey, you know what? We can maybe just synthesize data, we can create individuals, just have this model produce text, much like ChatGPT makes up a poem or whatever, right? Make it produce something completely different and make sure that it's sort of consistent with the patterns that are in the original data set, which it will be, and then use that data to train other models. And that has a lot of value because if you're in fact doing this on a larger scale, you can have more data. But more importantly, if you're just creating synthetic data, the issues of privacy and data security, all of those things go away because you now have data that's not personal to any individual anymore. Right. But it's still useful for the training. So that of course requires that the patterns kind of work, right? And that the patterns still match and don't shift over time. Right. Yeah, right. So there's that piece as well, which is not just the predictive piece, but the generative piece that is really powerful here. 
So it's not just a prediction engine saying you'll get X disease at a certain date. It says I can generate a lifeline for you, a timeline for you of all the things that are gonna happen. Right. And of course, again, like I said, it's not causal. So those things are not necessarily going to happen. It's just, at this point in time, given the past data, this is the likelihood of these things potentially happening.
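
Generated timelines and likelihoods connect in a simple way: sample many possible futures from the same starting record and count how often an event shows up. The sketch below does this with a hand-written yearly simulator; the three probabilities are hypothetical constants chosen for illustration, not estimates from any data set.

```python
import random

# Hypothetical yearly probabilities, for illustration only.
P_HYPERTENSION = 0.03        # yearly chance of developing hypertension
P_DIABETES_BASE = 0.005      # yearly diabetes chance without hypertension
P_DIABETES_WITH_HTN = 0.02   # yearly diabetes chance with hypertension


def simulate_future(start_age, end_age, has_htn, rng):
    """Play out one possible future, year by year, as a list of (age, event)."""
    events = []
    for age in range(start_age, end_age):
        if not has_htn and rng.random() < P_HYPERTENSION:
            has_htn = True
            events.append((age, "hypertension"))
        p_diabetes = P_DIABETES_WITH_HTN if has_htn else P_DIABETES_BASE
        if rng.random() < p_diabetes:
            events.append((age, "diabetes"))
            break  # stop this toy timeline at the diabetes diagnosis
    return events


def estimated_risk(event, n_samples, rng, **start):
    """Fraction of sampled futures in which the event occurs at least once."""
    hits = 0
    for _ in range(n_samples):
        timeline = simulate_future(rng=rng, **start)
        hits += any(name == event for _, name in timeline)
    return hits / n_samples


rng = random.Random(42)
risk_with = estimated_risk("diabetes", 5000, rng, start_age=40, end_age=60, has_htn=True)
risk_without = estimated_risk("diabetes", 5000, rng, start_age=40, end_age=60, has_htn=False)
print(f"with hypertension at 40: {risk_with:.2f}, without: {risk_without:.2f}")
```

Each sampled timeline is also a synthetic record in its own right; the estimate only reflects the patterns built into the simulator, which is the correlational caveat the hosts keep returning to.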

SPEAKER_01:

Okay, but I have like a question for you. Would you want to know? Like, would you want to know all the diseases that were likely to happen to you ahead of time? Um, I would, yes.

SPEAKER_00:

Always, no matter what. I'm sure you can you're probably teeing me up for an example where I maybe would. Yes, I am. Maybe I wouldn't. But I I personally think, yes, I would like to have that information. And I can choose to do what I want with that information, but I would like to know.

SPEAKER_01:

Yes. It's interesting because I think it makes sense for certain diseases where you know that you could maybe change the natural course of the disease, right? The example I gave up front of the PREVENT calculator for cardiovascular disease is one of them because you could, you know, take blood pressure medicine or exercise more or start on a statin for cholesterol or whatever it is. But then there's diseases where, like, what if you can't change their natural history? Would it just make you more anxious, make you more nervous? Would you want to know? I'll give you an example. This isn't necessarily a risk calculation, but it's an autosomal dominant disease named Huntington's disease, where if you have a parent with it, there's a 50-50 chance you're gonna get it, right? So now we go back to that 50% weather report, except this is a guaranteed 50% chance. But you could do a blood test if you wanted to, to find out if it's 100% or 0%; that's available today. And basically this disease causes some of the nerve cells in the brain to decay over time. And then it leads to problems with movement, problems with cognition, problems with mental health. And so in their 30s to 40s, these patients get a very significant movement disorder where they have chorea, which is these involuntary jerking movements, and other things like trouble walking and rigidity. They have trouble with their cognitive skills, they can't control their impulses, they have trouble focusing on tasks, et cetera, and they often develop depression and other mental health conditions. And there's no real true treatment for it. 
Like, there's supportive care, but the disease will progress along its natural course, and there's nothing you can do to stop yourself from getting Huntington's disease if you have the gene for it. So I bring this example up as sort of the opposite. Some people who are in this 50% group decide to go and get the genetic testing because they want to know. And there are some people who have a parent with Huntington's that decide not to get the test because they'd rather not, right? They'd rather not have the anxiety, they'd rather not have that sort of oracle-like quality of what's gonna happen to their lives. They'd rather live in the present. Yeah. And so I think it depends on what disease state there is and how preventable it is versus how much anxiety it's gonna cause.

SPEAKER_00:

Yeah. Yeah. I think this is uh, yeah, I mean, that's fair. And it's I think a very potentially a very personal decision for somebody to decide whether or not to take this information and who should be using a tool like this is the question you're sort of raising.

SPEAKER_01:

To some extent, right? Like, should it be the patient, should it be the doctor, should it be anyone who wants to? Uh like you can't, it's like you can't unsee it once you see it.

SPEAKER_00:

Well, okay, so there's a couple of things, right? One point I want to make is that, yes, of course, individuals can choose what they want to do in this regard. And personally, I think I would want to know because I would believe that maybe there's no treatment right now, but there might be one later. And if I'm able to track what's happening and know and push this agenda, maybe I have enough, let's say, political clout or money or something, then maybe it can drive research towards that, right? To push for a treatment for that. Now, another point I want to make is that if you have data from not just one individual, but lots of individuals who have this, then potentially you can seek out patterns. And those patterns might help find treatments. So if all of the patients who have certain progressions have certain things that happen to them after the current date, then maybe there's something useful there. In fact, this paper actually does one thing that's really cool, which is they do some explainability analysis. They look back at the trained models and try to extract different things about why a certain disease progression happened a certain way. So they're trying to extract from these patterns: is there any meaning? And that's useful to do separately from telling the individual whether or not something is going to happen, right? And so I think there are two separate issues here. One is the individual wanting to know or not know, but there's also this larger question of finding new treatments and potentially using patterns that you might see here to aid that process.

SPEAKER_01:

Yeah, I mean, you definitely make a good point. It's just difficult in certain situations where there isn't a treatment available or there isn't a ton of hope. I'm sort of the opposite from you. I'm not sure I would want to know in that type of situation. I think I'd want to just kind of peacefully live my life.

SPEAKER_00:

Well, the point is you don't know how peacefully it's going to be, right? You're just gonna keep getting anxious until that moment. Until that moment, yeah. That's kind of my thinking at least. But I respect it. I mean, I get it. People don't want to, people don't want to find out. That makes sense too.

SPEAKER_01:

Um I I'd want to find out if I could change the course, you know? Like I'd want to find out if there was something I could do to help prevent it.

SPEAKER_00:

Yeah, yeah.

SPEAKER_01:

But I'm not sure I'd want to find out if it was like, here, your fate is sealed, and there's nothing that you could do about it. That's more of that oracle-like quality where the prediction is in and nothing you do can change it. Exactly. And I like to believe, you know, it's like me over here just trying to believe in free will. No, I'm just kidding. Some of this stuff, I think it's difficult.

SPEAKER_00:

Yeah. And I think I'm working under the assumption that there is free will and that it's not determined, and that all this is telling you is there are some risks and you can go out and maybe find a treatment, right? The assumption being that there exists a treatment. Yeah, I know.

SPEAKER_01:

And that's why, you know, with the PREVENT calculator and cardiovascular risk, sure.

SPEAKER_00:

Yeah.

SPEAKER_01:

With Huntington's disease, no.

SPEAKER_00:

Right, right.

SPEAKER_01:

So I gave, you know, two examples on each end of the spectrum for that exact reason. And I'm sure there's a lot of stuff in between. And then when it's like, oh, we run this panel on a thousand diseases.

SPEAKER_00:

Yeah.

SPEAKER_01:

All right. Well, I think we are ready to wrap up with our takeaways. I mean, these transformer models can really help predict disease risk, kind of like the weather, right? There's a 70% chance of heart disease or whatever it is.

SPEAKER_00:

And comorbidity stuff too, right? It's not just one disease.

SPEAKER_01:

Yeah, absolutely. A thousand plus diseases, right? And I think these are still in somewhat early stages, right? We could incorporate more data, potentially even making them better, things like zip code, blood tests, wearables, et cetera. Because they didn't even get to look at that yet in this paper, and the predictions were still pretty good. I'm still left with this existential question of fate versus free will. And I think it's an important one to consider when we think about the ethics of making predictions like this. Because I do think if we can change the course of events, it makes a lot of sense. I think if we can't, the waters are a little bit muddy.

SPEAKER_00:

Yeah, that's fair.

SPEAKER_01:

All right. Thanks for joining us. We'll see you next time on Code and Cure.
