Code & Cure

#35 - How AI Image Generators Portray Substance Use Disorder

Vasanth Sarathy & Laura Hagopian


What does an AI-generated image of addiction look like, and why does it so often default to darkness, isolation, and despair? As AI tools make it easier than ever to produce visuals for health education, those same tools can unintentionally reinforce stigma about substance use disorder.

In this episode, we explore how AI image generators shape the way addiction is portrayed. Laura brings the perspective from emergency medicine and digital health, where substance use disorder is part of everyday clinical reality and where language and imagery can influence how patients are perceived. Vasanth breaks down the technical side, explaining how diffusion models create images by gradually denoising noise into structured visuals, guided by text prompts that steer what the model produces.

That process is powerful, but it also means biases from internet training data and the connotations embedded in words can compound. The result? AI outputs that repeatedly frame addiction through dramatic “rock bottom” scenes, lone figures, and visual cues that unintentionally reinforce shame rather than understanding.

We also look at research that systematically tests prompts and applies best-practice guidelines for more respectful depictions. The difference is striking: fewer stigmatizing signals, more human-centered imagery, and practical guardrails such as avoiding drug paraphernalia and moving beyond the isolated, ashamed figure. But sanitization has a price. For healthcare AI teams, the lesson is clear: visuals should be treated like clinical content, not decoration, with thoughtful review processes that protect dignity and support stigma-free health communication.

Reference:

AI-Generated Images of Substance Use and Recovery: Mixed Methods Case Study
Heley et al.
JMIR AI (2026)

Credits:

Theme music: Nowhere Land, Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0
https://creativecommons.org/licenses/by/4.0/

Stigma Hidden In Generated Images

SPEAKER_01

When algorithms visualize substance use, stigma can sneak into the pixels.

SPEAKER_00

Hello, and welcome back to Code and Cure, where we decode health in the age of AI. My name is Vasanth Sarathy. I'm a cognitive scientist and AI researcher, and I'm here with Laura Hagopian.

SPEAKER_01

I'm an emergency medicine physician and I work in digital health.

SPEAKER_00

I think that was a better opener there. I didn't have the alliteration that I had the last time.

SPEAKER_01

Discuss decoding. Yeah, yeah. It was better this time around.

Why People Generate SUD Images

SPEAKER_00

Yeah. So what are we decoding today?

SPEAKER_01

We are discussing AI-generated images of substance use disorder, which is something that I saw all the time as an emergency physician working in the inner city, working night shifts. So I think this is an interesting one, because it's so common and at the same time there's so much stigma associated with it.

SPEAKER_00

Yeah, for sure. And you know, in this paper they talk about using AI to generate images. So my first question was, what are people generating these images for, exactly? And it seemed like there's a whole host of reasons why people would generate images related to substance use disorder, for brochures, for example.

SPEAKER_01

Yeah, I think, you know, public health communication, say you had a newspaper article, or say you're training your future providers. I think there are lots of reasons to want an image around the concept of substance use disorder. And it's anonymous too, right? Because of the stigma associated with it, it's not actually a real person. So AI is, in theory, a great way to do that. It's easy, it's quick, it's low cost, it's not a real human, and you can create images that can be used, like you said, in brochures, for helping families, for substance use recovery centers. You could use it to help educate providers, et cetera. And we're an image-driven society, right? Everybody wants to see videos, see pictures, to be able to illustrate what's happening. And so, I mean, I use AI to generate images often, and sometimes they come out great and sometimes not so great. And sometimes I just need to tweak and adjust my prompts to get the image to show what I really want it to show.

SPEAKER_00

Yeah, yeah, for sure. And there's obviously a whole host of technologies associated with that. People even use it for art and for other purposes, but here we're focused specifically on generating images that have a clinical connection.

SPEAKER_01

Yeah. And when I started to read this paper, what popped into my head first, and I kind of want you to speak to this, is: hey, if you're creating images based on something that was trained on all the internet's data, is the stigma from the internet going to come through? And we've had prior episodes where we talked about, you know, the terminology that people use on Reddit, for example, around substance use disorder. In a clinical setting, we tend to be very careful about what terminology we use. For example, you don't say the word addict, you don't say, hey, you're an addict. People may use that colloquially, but we would use person-first language: this is a person who has substance use disorder.

SPEAKER_00

Yeah.

SPEAKER_01

And so there are a lot of guidelines about how you speak about this.

SPEAKER_00

Yeah.

SPEAKER_01

And there happen to be guidelines about image creation around this that, you know, maybe the LLMs out there don't know. So I guess my first question to you is: before you even read the paper, what would you expect, knowing how these models are trained? What would you expect them to do when asked to create images about substance use and substance use recovery?

SPEAKER_00

I mean, the short answer is I would expect them to create the most generic, stereotyped, biased image you could have, right? Because they are trained on all of the internet's data. And it's not necessarily just LLMs, right? An LLM is a large language model, and usually the input and output to that is text. When it comes to images, it's a different kind of model. What's most popular right now is something called a diffusion model. And a diffusion model works differently. The way you train one of these is actually pretty cool: say you have lots of images, good images that you care about, and you feed those images into the system. During training, it keeps adding noise to the images, up to the point where each one becomes completely grainy. And then it learns, through the learning process, to recover that original image from the grainy image.

SPEAKER_01

Oh, interesting.

SPEAKER_00

By what they call denoising. And that's what's most popular in image generation right now: the diffusion model. So once it's trained, the input is just a bunch of noise, and then it generates an image. Now, when you do that by itself, it can generate whatever image, right? So the next step is to say, what if I gave it some text to condition that image, to make it generate a particular kind of image from that noise? That's what these models do. They take the text you're giving them and turn it into something that can be input into the diffusion model, to make it attend to certain words more than others.
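As a small aside for readers, here is a minimal sketch of the mechanism Vasanth describes, assuming PyTorch. It is a toy, not any production system: random vectors stand in for images, a tiny network stands in for the U-Net, and a fixed vector stands in for the prompt embedding, but the shape of the training objective (add noise, then learn to predict it, conditioned on the text) is the same.

```python
# Toy sketch (editorial aside, not from the paper): train a conditioned
# denoiser the way diffusion models are trained. "Images" are random vectors
# and the "text embedding" is a fixed vector; real systems use a U-Net and a
# learned text encoder, but the objective has the same shape.
import torch
import torch.nn as nn

IMG, TXT, T = 64, 16, 100                    # image size, text size, noise steps
betas = torch.linspace(1e-4, 0.02, T)        # noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(                    # stand-in for a U-Net
    nn.Linear(IMG + TXT + 1, 128), nn.ReLU(), nn.Linear(128, IMG)
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

text_emb = torch.randn(TXT)                  # pretend embedding of a prompt

for step in range(200):
    x0 = torch.randn(32, IMG)                # pretend batch of clean images
    t = torch.randint(0, T, (32,))
    noise = torch.randn_like(x0)
    a = alpha_bar[t].unsqueeze(1)
    x_noisy = a.sqrt() * x0 + (1 - a).sqrt() * noise   # progressively grainier
    cond = text_emb.expand(32, TXT)                    # text steers the denoiser
    inp = torch.cat([x_noisy, cond, t.unsqueeze(1).float() / T], dim=1)
    loss = ((denoiser(inp) - noise) ** 2).mean()       # predict the added noise
    opt.zero_grad(); loss.backward(); opt.step()

# At generation time you start from pure noise and repeatedly apply the
# denoiser, with the prompt embedding nudging each step toward images that
# match the words in the prompt.
```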

SPEAKER_01

So explain something to me. Like when they create these images, the output is an image, right? Is that being trained on text data and image data to get the output that you want? Or how does that work?

How Text Prompts Import Bias

What Stigmatizing Images Look Like

SPEAKER_00

Yeah, sometimes they are. I mean, there's a whole host of different techniques to do that. Sometimes they're only image-based, sometimes they're trained on text and images together, and sometimes the pieces are kind of separate, with the text piece acting on its own and bringing in its own biases. When you give these text models a piece of text, they convert it into numbers, and those numbers inherently carry all the bias of the data. So if you combine these image generation models with the text models, which is what a lot of these systems do, they're bringing in the bias from both. The bias from the text model is what drives the meaning of words like substance abuse, and that meaning then pushes the diffusion model, the image generation model, in a particular direction to generate a particular kind of image. So when combined, the two can in fact amplify the biases that they both carry from the general space of the internet, images and text. So when you have a substance use type example, the highly biased image is anything that shows somebody in their most hopeless possible state, with all of the drug paraphernalia that might exist to suggest that it's a drug thing, in an environment that's about as dramatic as one can get, right?
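As a rough illustration of how word choice carries over into those numbers (this is an editorial aside, not the study's method), you can embed different phrasings with an off-the-shelf text encoder and compare them; the checkpoint name below is just an example.

```python
# Rough illustration (not the study's method): compare how an off-the-shelf
# text encoder places different phrasings in embedding space. The checkpoint
# name is an example; any CLIP-style text encoder would do.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "an addict",
    "a person with a substance use disorder",
    "a hopeless person alone in a dark room",
    "a person in recovery, supported by friends and family",
]
inputs = processor(text=prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)

# Cosine similarity between every pair of phrasings; prompts that internet
# text tends to use in similar visual contexts land closer together, and that
# proximity is what steers the image model toward similar scenes.
print(torch.round(emb @ emb.T, decimals=2))
```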

SPEAKER_01

Yeah.

SPEAKER_00

And so these are all the things that are naturally biased, and that's what causes the stigmatizing effect, right?

SPEAKER_01

And so I'm looking at the images that were generated in this paper, and you're obviously hitting the nail right on the head here. They used generic, general terms, whether stigmatizing or person-first language like "person with a substance use disorder." They just went into ChatGPT and said, hey, please make an image of this. They used please: please make an image of a person with a substance use disorder, please make an image of an addict. And that's exactly what came out. There were a lot of stigmatizing elements here. They would show active use, drug paraphernalia, that kind of stuff. All of them are in this very dark setting, very, very dark. And there's this sort of dramatic effect where someone has hit the bottom of their life. There's one where it looks like the person is in a cellar, in chains, and there are all these needles and syringes attached to the chains. It's very, very dramatic. So, yes, there's this image of struggle, of disappointment. There's one image where the person has the hood up over their head in a public area. And a lot of them have this isolation. Now, the prompt does say show an image of a person, so it's showing one person, but there is this sort of isolation that goes along with it, and this chaos that goes along with it too. So it is interesting that at baseline it's depicting all of the things that, from a clinical angle, I would not want to see depicted. But at baseline, taking all the information from the internet, that's what gets put in there. I would never want any of those images in a brochure or in a teaching moment. And some of them aren't accurate either, right? The one in chains with, you know, seven syringes injecting the chains, that doesn't even make sense.

Demographics And Stigma Coding Results

SPEAKER_00

Yeah. And this study also, I think, looked at all the images that were produced and coded them based on demographics, whether they contained one or more people, and all kinds of other interesting things.

SPEAKER_01

Yeah. So in this base scenario that we're talking about, where they went in and just used general terms and said make an image, without giving it much more information, it was basically always one person. And the prompt does say a person. But it was always one person, almost always white, almost always male. And there was usually some sort of recovery signifier. They went through and coded: hey, was it stigmatizing? And if so, how stigmatizing, how many criteria? They came up with these six criteria, like, oh, is it dark? Does it show paraphernalia? Does it show an internal sense of shame, et cetera? And most of the images were stigmatizing to some degree. And then, was it humanizing? Did it show any sort of happiness or peace? Did it show any social support? And for the most part, in the base scenario, the answer was yes, stigmatizing, and not humanizing.
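For readers who want to picture that coding step, here is a hypothetical sketch of what such a coding sheet could look like in code. The field names paraphrase the criteria as discussed in this conversation; the actual rubric is defined in Heley et al., not here.

```python
# Hypothetical coding sheet, paraphrasing the criteria discussed in the
# episode; the study's actual rubric is defined in Heley et al.
from dataclasses import dataclass

STIGMA_FIELDS = (
    "dark_setting", "drug_paraphernalia", "active_use",
    "shame_or_despair", "isolation", "chaos",
)
HUMANIZING_FIELDS = ("happiness_or_peace", "social_support")

@dataclass
class ImageCoding:
    people_count: int = 1
    dark_setting: bool = False
    drug_paraphernalia: bool = False
    active_use: bool = False
    shame_or_despair: bool = False
    isolation: bool = False
    chaos: bool = False
    happiness_or_peace: bool = False
    social_support: bool = False

    def stigma_score(self) -> int:
        return sum(getattr(self, f) for f in STIGMA_FIELDS)

    def humanizing_score(self) -> int:
        return sum(getattr(self, f) for f in HUMANIZING_FIELDS)

# A baseline-style image as described in the conversation: one person, dark,
# paraphernalia visible, isolated and ashamed.
baseline = ImageCoding(dark_setting=True, drug_paraphernalia=True,
                       shame_or_despair=True, isolation=True)
print(baseline.stigma_score(), baseline.humanizing_score())   # 4 0
```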

SPEAKER_00

Right, right. Which, like I said, is exactly what you would expect with these systems. And people working with these images presumably are hoping to iterate on them and improve them.

Using Guidelines To Improve Prompts

SPEAKER_01

Now that's what they did, right? The question is, how exactly did they try to do that? What they did was give it guidelines. They either had ChatGPT write prompts associated with best practices, like, oh, let's make it respectful, let's make it compassionate, or they actually fed it existing guidelines for how you would create an image of substance use disorder and said, hey, adhere to these guidelines.
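As a rough sketch of that second approach (the guideline text below is paraphrased for illustration, not quoted from any published guideline), the base request stays the same and the guidance is simply prepended before the prompt goes to the image generator.

```python
# Sketch of the guideline-augmented prompting idea; the guideline text is
# paraphrased for illustration, not quoted from any published guideline.
GUIDELINES = (
    "Follow these guidelines: use person-first framing; do not depict drug "
    "paraphernalia or active use; avoid dark, hopeless 'rock bottom' scenes; "
    "show the person with social support rather than alone and ashamed."
)

def build_prompt(base_request: str, with_guidelines: bool = True) -> str:
    """Combine a base image request with best-practice guidance."""
    request = f"Please make an image of {base_request}."
    return f"{GUIDELINES}\n\n{request}" if with_guidelines else request

print(build_prompt("a person with a substance use disorder"))
```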

SPEAKER_00

Right.

SPEAKER_01

And it definitely changed the images.

SPEAKER_00

Yeah, and we can talk about some of those guidelines, right? The stigma guidelines had features like, you know, incorporate more than just one person, and don't always put them in a dark setting, and so on and so forth, right?

SPEAKER_01

Exactly. Don't show drug paraphernalia.

SPEAKER_00

Right, right.

SPEAKER_01

And so what they found was there was a significant difference, and it still probably needs more. And so in this case, in the second case, when there were guidelines.