The Hidden Risks of Using AI Tools in Medical Research


(Especially for Plastic Surgery)

AI tools can seem helpful and even personable, but they lack the reliability and judgment of human medical professionals.

Artificial intelligence (AI) chatbots like ChatGPT (by OpenAI) and Google’s Gemini are rapidly becoming go-to resources for quick answers on medical topics. It’s easy to see the appeal: they can summarize complex information in plain language and are available 24/7. But in a specialized field like plastic surgery, relying on AI for research comes with serious risks. One major concern is AI “hallucinations” – instances where the AI confidently generates false or fabricated information. This blog will explain what AI hallucinations are, why AI is not always a reliable source for medical research, and how these errors can harm patients, mislead healthcare providers, or damage reputations. We’ll also discuss how to verify information through trusted sources (like board certification listings and real patient reviews) to stay safe and informed.

What Are AI Hallucinations?

In the context of AI, hallucination refers to a convincing but false output – the AI essentially “makes something up.” According to a formal definition in a medical journal, an AI hallucination occurs when an AI system produces demonstrably incorrect or misleading output that appears confident and plausible despite being factually wrong. In other words, the chatbot might state a fabricated fact, cite a study that doesn’t exist, or give an answer that sounds right but is entirely incorrect.

These tools don’t intend to lie – hallucination is a byproduct of how they work. Models like ChatGPT are trained to predict likely words and phrases based on patterns in vast amounts of text. They do not verify facts against a database the way a search engine does; they generate responses statistically. If a prompt goes beyond the model’s knowledge or contains ambiguous details, the AI may fill the gaps with its best guess. The result can be a very confident-sounding fiction. Researchers estimate that chatbots like ChatGPT “hallucinate” content roughly 20–30% of the time, and nearly half of their longer responses may contain factual errors. Even Google has acknowledged the issue – a Google AI executive identified reducing hallucinations as a “fundamental” task for improving the company’s chatbot.
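For readers who want to see the mechanics, below is a deliberately tiny, purely illustrative Python sketch of the core idea: the model weighs possible next words and samples one that sounds plausible. The word table and probabilities are invented for this example, and real chatbots use enormous neural networks rather than a hand-written dictionary, but the key point carries over: nothing in the loop ever consults a textbook, a journal, or a database of verified facts.

import random

# Toy illustration with invented numbers: a language model continues text by
# sampling from a probability distribution over plausible next words.
# Nothing here checks the chosen word against a source of truth.
next_word_probs = {
    "The procedure is generally": {
        "safe": 0.55,          # plausible, often true
        "painless": 0.25,      # plausible, but oversimplified
        "FDA-approved": 0.20,  # plausible-sounding, may be flatly wrong
    }
}

def continue_text(prompt: str) -> str:
    """Append one sampled word to the prompt: plausibility, not accuracy."""
    options = next_word_probs[prompt]
    words = list(options.keys())
    weights = list(options.values())
    return f"{prompt} {random.choices(words, weights=weights, k=1)[0]}"

print(continue_text("The procedure is generally"))

A fluent continuation comes out either way, which is exactly why a hallucinated answer can read just as smoothly as a correct one.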

Why AI Isn’t Always Reliable for Medical Research

Lack of Source Verification and Expert Oversight

Traditional medical research involves consulting textbooks, published papers, or clinical guidelines – sources that can be traced and verified. AI-generated answers, by contrast, typically don’t cite sources by default, and even when explicitly asked, the references a chatbot provides may be made up. The AI might pull together bits of information from its training data, which could include outdated research or outright inaccuracies, and present it all in one coherent paragraph. Crucially, it has no built-in mechanism to fact-check itself. As one professor quipped, ChatGPT is like “an omniscient, eager-to-please intern who sometimes lies to you.” It will confidently deliver an answer, and without subject-matter expertise, you might not realize parts of it are wrong.

In medicine, incorrect information can be dangerous. Imagine an AI that mistakenly claims a certain medication is safe during pregnancy, or suggests a nonexistent “new technique” in plastic surgery that hasn’t been proven. The Journal of Medical Internet Research published a 2024 study comparing ChatGPT and Google’s Bard (Gemini) on medical literature tasks. The results were eye-opening: ChatGPT-3.5 produced false references in about 40% of cases, and even the more advanced GPT-4 version hallucinated references ~28% of the time. Google’s model fared worse – in that analysis, Bard (Gemini) made up citations in over 91% of its answers. In academic research terms, that error rate is unacceptable. The authors noted that this high rate of spurious output “highlights the necessity for refining [AI models] before confidently using them for rigorous academic purposes.” Until those refinements happen, any information from an AI needs human verification.
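One practical safeguard, for clinicians or researchers comfortable with a little code, is to check every AI-supplied citation against a bibliographic database before trusting it. The minimal Python sketch below illustrates that workflow; it is not part of any study cited above. It queries the public Crossref REST API (api.crossref.org) for published works matching a title, the example title is hypothetical, and a manual search of PubMed or Google Scholar accomplishes the same thing without any code.

import requests

def reference_seems_real(title: str) -> bool:
    """Rough check: does Crossref list a published work with a similar title?

    A negative result is a strong hint that an AI-supplied citation was
    fabricated, though reading the actual paper is still the final word.
    """
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 3},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    query_words = set(title.lower().split())
    for item in items:
        found_title = " ".join(item.get("title", [])).lower()
        # Crude match: most of the query words should appear in a real title.
        if sum(word in found_title for word in query_words) >= 0.8 * len(query_words):
            return True
    return False

# Hypothetical citation produced by a chatbot; verify it before repeating it.
print(reference_seems_real("Outcomes of abdominoplasty in patients with type 2 diabetes"))

Even a positive match only establishes that a paper with that title exists somewhere; whether the AI summarized it accurately still requires reading the source itself.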

Outdated or Generic Knowledge Base

Another limitation is that AI models have fixed training cut-off dates. For instance, if an AI was trained on data only up to 2021, it wouldn’t know about later breakthroughs, new surgical techniques, or the latest FDA approvals. Medicine changes quickly, so relying on an AI for “current” research could yield answers that are several years behind the times. In a fast-evolving field like cosmetic and reconstructive surgery, that is a big drawback: you might miss out on newer, safer procedure options or updated medical consensus.

Additionally, if you ask a highly specialized question, the AI may revert to generic knowledge. For example, someone asking about a cutting-edge breast reconstruction technique might get a response about general breast surgery risks, which, while not entirely wrong, might not address the specific nuances of the new technique. This can be misleading by omission, giving a false sense of understanding. It’s always better to consult recent journal articles or a board-certified specialist for the latest and most applicable information.

No Clinical Judgment or Context

AI chatbots also lack the clinical context and judgment that medical professionals have. They cannot tailor advice to an individual’s unique situation the way a doctor can during a consultation. For example, consider a patient who asks an AI, “Is it safe for me to have a tummy tuck if I have diabetes?” The AI might produce a general answer about diabetes increasing surgical risks (true) and advise consultation with a doctor (fine) – but it won’t be able to evaluate that specific patient’s health metrics, control of blood sugar, etc. There’s a big difference between general medical info and personal medical advice. AI is only capable of the former.

Even in providing general info, the AI might misinterpret the question. If it latches onto the phrases “tummy tuck” and “diabetes” separately, it might hallucinate a connection or guideline that doesn’t exist. One study on medical record summarization found that AI models often introduced inaccuracies about symptoms and diagnoses. For instance, a patient record noted “sore throat due to COVID-19,” but an AI summary misstated it as a throat infection, which could have led clinicians down the wrong treatment path. Without real-world judgment, an AI doesn’t know which details are critical and which are coincidental, and that can result in dangerous conflations.

Hallucinations in Action: Examples That Hit Home

It’s one thing to talk about hallucinations in theory. Let’s look at some real (and hypothetical) examples of how AI-generated misinformation could play out in the world of plastic surgery and beyond:

How to Safely Use AI as a Supplement (Not a Substitute)

After hearing about these pitfalls, you might wonder if AI tools have any place in medical research or patient education. The answer is that they can be useful if used cautiously and responsibly, as a starting point or adjunct, never the sole source of truth. Here are some tips to protect yourself and get the best out of both AI and reliable resources:

Conclusion: Informed Caution is Key

AI tools like ChatGPT and Gemini are impressive and here to stay, and they may play a growing role in how we access health information. They can be helpful for generating quick summaries or offering educational content in simple language. However, as we’ve detailed, they are far from infallible. AI hallucinations – confidently stated falsehoods – pose a real risk in medical research and communication. In plastic surgery, where details about procedures, credentials, and patient safety are paramount, such misinformation can lead to poor decisions or erode trust.

The bottom line for patients and providers is this: use AI with a healthy dose of skepticism. Verify any critical information through reliable sources. For patients, your health is too important to entrust to an algorithm – always loop back to a qualified medical professional (ideally an experienced, board-certified plastic surgeon for cosmetic questions) before making decisions. For healthcare providers and researchers, think of AI as a tool that requires supervision, not a colleague with expertise. As Judge Brantley Starr wrote in a court order after encountering AI-generated falsehoods, “these systems hold no allegiance to the truth.” It’s a powerful reminder that we must be the ones to uphold accuracy and integrity, whether in the operating room or on the page.

Stay informed, stay critical, and when in doubt, trust the humans who have dedicated their lives to mastering the art and science of medicine. Your safety and well-being deserve nothing less.


Get a one-on-one consultation with Dr. Z at his beautiful practice in Miami, FL

7540 SW 61 Ave, South Miami, FL 33143


Call Today 786.804.1603