Medtech Musings

To Err is (In)Human: AI Is Not Infallible

It has erred in the past and will likely slip up in the future because—well, nothing is perfect.

By: Michael Barbella

Managing Editor

Photo: VectographyStudio/stock.adobe.com

Nothing in this world is perfect.

Everything, living beings and inanimate objects alike, is flawed. Even technology that seems faultless is far from ideal.

Perfection remains a constant pursuit, despite its futility. Yet in recent years, artificial intelligence (AI) has triggered a quiet, false sense of hope for finally reaching the promised land. 

Contrary to popular belief, though, AI is not infallible. It has erred in the past and will likely slip up in the future because—well, nothing is perfect. Machine intelligence included.

“…we must not forget that technology is imperfect,” a March 2024 Nature article noted. “AI systems will make mistakes, malfunction, or even break down. Mistakes can include biased outcomes, ‘hallucinations’ and AI drift…”

They have, in fact, included all three. Racial bias turned up in a U.S. hospital algorithm designed to identify high-risk patients needing complex care, as well as in AI skin cancer assessors trained predominantly on images of lighter skin.

AI drift, on the other hand, led an algorithm to misidentify hospitalization and mortality risk for 18,300 U.S. veterans over a five-year period. A study published last year that tracked the algorithm’s performance from 2016 to 2021 (covering both pre-pandemic operations and COVID-19) found that the model’s ability to correctly identify high-risk patients fell 4% and its overall performance dropped by 4.6%. The model also generated 0.34% more “false alarms.”

“Clinical risk algorithms inform clinical decision support and system-level quality metrics,” the JAMA Health Forum authors stated in their 2025 study. “However, algorithm performance can drift over time and possibly promote misinformed decision-making and resource allocation.”

Hallucinations can produce the same misguided outcomes.

Much like humans, artificial intelligence can sometimes perceive patterns or signals that simply aren’t there. Unlike human hallucinations, however, AI hallucinations stem from overfitting, biased or inaccurate training data, and high model complexity rather than from underactive neural activity.

Still, both AI and human hallucinations exist on a spectrum, stretching from the trivial to the profound. The AI-enabled transcription tool Whisper, for example, often invents text, and ChatGPT was caught fabricating scientific paper titles and providing erroneous citations for short write-ups on homocystinuria-associated osteoporosis and late-onset Pompe disease.

Even worse, last summer an AI “therapy chatbot” recommended a “small hit” of methamphetamine to a drug addict calling himself “Pedro” to help him “get through [the] week.” “Go ahead, take that small hit, and you’ll be fine,” the chatbot told Pedro.

An equally colorful example of AI hallucination occurred two years ago, when large language models (LLMs) accepted a fake illness as real.

Driving the hallucination was University of Gothenburg (Sweden) medical researcher Almira Osmanovic Thunström, who wanted to determine whether LLMs would fall for bogus health information.

She invented a disease, Bixonimania, along with its symptoms (sore, itchy eyes; discolored eyelids), built a fake research study around the condition, and uploaded a “preprint” of the research paper to several servers.

“I wanted to see if I [could] create a medical condition that did not exist in the database,” Thunström told Nature. “I wanted to be clear to any physician or any medical staff that this is a made-up condition, because no eye condition would be called mania, that’s a psychiatric term.”

Such logic was lost on the LLMs, as was the meaning of the lead author’s name, Lazljiv Izgubjenovic, which translates from Slovenian as “The Lying Loser.” The LLMs also failed to question the study’s non-existent university and fake city, and either ignored or dismissed warnings such as “this entire paper is made up” and “fifty made-up individuals aged between 20 and 50 years were recruited for the exposure group.”

Despite such obvious clues, the LLMs promoted Bixonimania as a real condition. Within weeks, the fake papers were being cited in peer-reviewed literature: Indian researchers referenced the fake preprints in a study published in the journal Cureus.

“The Bixonimania experiment was never about exposing LLMs as flawed tools, or arguing they have no place in medicine. They do,” Thunström wrote in a LinkedIn post. “It was about demonstrating that any system can be infiltrated…”

Indeed, any AI system can be infiltrated, and as the Bixonimania experiment showed, a flawed system can lead to trouble.

Such was the case with the TruDi Navigation System, a device used in procedures to treat chronic sinusitis (sinus inflammation). Five years ago, the product’s developer enhanced the system with a machine-learning algorithm to assist ENT specialists during surgery.

Since AI was added to the device, however, the U.S. Food and Drug Administration (FDA) has received unconfirmed reports of at least 100 malfunctions and adverse events, with 10 people suffering injuries between late 2021 and November 2025, according to Reuters, which cited FDA data.

Integra LifeSciences, which purchased the TruDi Navigation System and its developer (Acclarent) in 2024, said there was no credible evidence to show “any causal connection” between the product, AI technology, and any injuries.

Maybe it’s just another AI hallucination.

Keep Up With Our Content. Subscribe To Medical Product Outsourcing Newsletters