(Especially for Plastic Surgery)
AI tools can seem helpful and even personable, but they lack the reliability and judgment of human medical professionals.
Artificial intelligence (AI) chatbots like ChatGPT (by OpenAI) and Google’s Gemini are rapidly becoming go-to resources for quick answers on medical topics. It’s easy to see the appeal: they can summarize complex information in plain language and are available 24/7. But in a specialized field like plastic surgery, relying on AI for research comes with serious risks. One major concern is AI “hallucinations” – instances where the AI confidently generates false or fabricated information. This blog will explain what AI hallucinations are, why AI is not always a reliable source for medical research, and how these errors can harm patients, mislead healthcare providers, or damage reputations. We’ll also discuss how to verify information through trusted sources (like board certification listings and real patient reviews) to stay safe and informed.
What Are AI Hallucinations?
In the context of AI, hallucination refers to a convincing but false output – the AI essentially “makes something up.” According to a formal definition in a medical journal, an AI hallucination occurs when an AI system produces demonstrably incorrect or misleading output that appears confident and plausible despite being factually wrong. In other words, the chatbot might state a fabricated fact, cite a study that doesn’t exist, or give an answer that sounds right but is entirely incorrect.
These tools don’t intend to lie – it’s a byproduct of how they work. Models like ChatGPT are trained to predict likely words and phrases based on patterns in vast amounts of text data. They do not verify facts against a database the way a search engine does; they generate responses statistically. If a prompt goes beyond the model’s knowledge or contains ambiguous information, the AI may fill the gaps with its best guess. The result can be a very confident-sounding fiction. Researchers estimate that chatbots like ChatGPT “hallucinate” content roughly 20–30% of the time, and nearly half of their longer responses may contain factual errors. Even Google has acknowledged the problem – a Google AI executive described reducing hallucinations as a “fundamental” task for improving the company’s chatbot.
Why AI Isn’t Always Reliable for Medical Research
Lack of Source Verification and Expert Oversight
Traditional medical research involves consulting textbooks, published papers, or guidelines – sources that can be traced and verified. AI-generated answers, by contrast, typically don’t cite sources by default (and even when explicitly asked, the references they supply may be made up). The AI might pull together bits of information from its training data, which could include outdated research or outright inaccuracies, and present them in a coherent-sounding paragraph. Crucially, it has no built-in mechanism to fact-check itself. As one professor quipped, ChatGPT is like “an omniscient, eager-to-please intern who sometimes lies to you.” It will confidently deliver an answer, and without subject-matter expertise you might not realize parts of it are wrong.
In medicine, incorrect information can be dangerous. Imagine an AI that mistakenly claims a certain medication is safe during pregnancy, or suggests a nonexistent “new technique” in plastic surgery that hasn’t been proven. The Journal of Medical Internet Research published a 2024 study comparing ChatGPT and Google’s Bard (now Gemini) on medical literature tasks. The results were eye-opening: ChatGPT-3.5 produced false references in about 40% of cases, and even the more advanced GPT-4 hallucinated references roughly 28% of the time. Google’s model fared worse – in that analysis, Bard fabricated citations in over 91% of its answers. By academic research standards, those error rates are unacceptable. The authors noted that this high rate of spurious output “highlights the necessity for refining [AI models] before confidently using them for rigorous academic purposes.” Until those refinements happen, any information from an AI needs human verification.
Outdated or Generic Knowledge Base
Another limitation is that AI models have fixed training cut-off dates. For instance, if an AI was trained on data up to 2021, it wouldn’t know about later breakthroughs, new surgical techniques, or the latest FDA approvals. Medicine changes quickly, so relying on an AI for “current” research could yield answers that are several years behind the times. In a fast-evolving field like cosmetic and reconstructive surgery, that is a big drawback: you might miss out on newer, safer procedure options or updated medical consensus.
Additionally, if you ask a highly specialized question, the AI may revert to generic knowledge. For example, someone asking about a cutting-edge breast reconstruction technique might get a response about general breast surgery risks, which, while not entirely wrong, might not address the specific nuances of the new technique. This can be misleading by omission, giving a false sense of understanding. It’s always better to consult recent journal articles or a board-certified specialist for the latest and most applicable information.
No Clinical Judgment or Context
AI chatbots also lack the clinical context and judgment that medical professionals have. They cannot tailor advice to an individual’s unique situation the way a doctor can during a consultation. For example, consider a patient who asks an AI, “Is it safe for me to have a tummy tuck if I have diabetes?” The AI might produce a general answer about diabetes increasing surgical risks (true) and advise consultation with a doctor (fine) – but it won’t be able to evaluate that specific patient’s health metrics, control of blood sugar, etc. There’s a big difference between general medical info and personal medical advice. AI is only capable of the former.
Even in providing general info, the AI might misinterpret the question. If it latches onto the phrases “tummy tuck” and “diabetes” separately, it might hallucinate a connection or guideline that doesn’t exist. One study on medical record summarization found that AI models often introduced inaccuracies about symptoms and diagnoses. For instance, a patient record noted “sore throat due to COVID-19,” but an AI summary misstated it as a throat infection, which could have led clinicians down the wrong treatment path. Without real-world judgment, an AI doesn’t know which details are critical and which are coincidental, and that can result in dangerous conflations.
Hallucinations in Action: Examples That Hit Home
It’s one thing to talk about hallucinations in theory. Let’s look at some real (and hypothetical) examples of how AI-generated misinformation could play out in the world of plastic surgery and beyond:
- Bogus Treatment Recommendations: A recent JAMA Oncology study evaluated ChatGPT’s advice for cancer treatments. It found that over one-third of ChatGPT’s recommendations for breast, prostate, and lung cancer care were inappropriate or non-concordant with medical guidelines. In a high-stakes field like oncology, that rate of wrong answers is alarming. Now translate this to plastic surgery – if a patient asked about options for a complex reconstructive surgery, an AI might suggest an approach that sounds reasonable but isn’t the current standard of care. Patients could be misled into asking for procedures that aren’t suitable for them (or aren’t even real!). As Dr. Danielle Bitterman, the lead researcher of the oncology study, put it: “Patients need to be aware that the medical advice they get with ChatGPT may be false. It’s not trained – and more importantly, not clinically validated – to make these types of recommendations.” Always double-check treatment advice with a qualified physician.
- Phony Research Citations: Perhaps you’re a provider or student doing a literature review. You ask an AI to “give five references about the safety of XYZ filler injections.” The chatbot returns a nicely formatted list of studies and authors. However, unless you verify each one, you might be citing studies that don’t actually exist. There have been cases of attorneys and researchers unknowingly relying on AI-fabricated citations, with embarrassing and costly consequences. In one highly publicized incident, a lawyer submitted a legal brief containing six case precedents that ChatGPT completely made up – fictional cases with real-looking citations. The result? The motion was dismissed, and the lawyers were fined for the “gibberish” they filed. In scientific writing, imagine the damage to your credibility if you base arguments on references that later turn out to be AI inventions. Hallucinated citations undermine the integrity of research. Always use PubMed, Google Scholar, or official journals to confirm that a reference is real and says what the AI claims.
- Misleading Provider Credentials: Now, consider a patient researching plastic surgeons. Let’s say someone asks a chatbot, “Is Dr. X board-certified, and what have patients said about her?” If the AI has limited or mixed data, it might incorrectly say Dr. X is not board-certified (when in fact she is), or it could confuse Dr. X with another doctor of a similar name. This is a reputational landmine. Board certification status should be verified through official boards – for example, by checking the surgeon’s name on the American Board of Plastic Surgery (ABPS) website. (The ABPS is a member board of the American Board of Medical Specialties, or ABMS, which represents the gold standard in physician certification.) An AI cannot be entrusted to accurately report such details. At Zuri Plastic Surgery, for instance, our lead surgeon, Dr. Alexander Zuriarrain (Dr. Z), is quadruple board-certified, holding certification from the ABPS as well as boards in general, cosmetic, and facial cosmetic surgery. That level of credentialing is rare, and it’s a point of pride. We would not want an internet chatbot to erroneously tell potential patients otherwise. Always cross-check a surgeon’s credentials on official sites or their own practice website, rather than taking an AI’s word for it. The ABPS even offers a public “Is Your Surgeon Certified?” lookup tool for verification.
- Made-Up Patient Testimonials: Along similar lines, AI might fabricate or misattribute patient reviews. If asked, “What do people say about Dr. Z’s clinic?”, a chatbot doesn’t have real-time access to Google Reviews or Yelp unless trained on them, and if trained, it might generalize or even invent sentiments. For example, it might respond with something like, “Patients frequently mention great results and caring staff.” That doesn’t really tell you much, and there’s a risk it could be pulling in comments from another clinic or just using boilerplate language. In contrast, reading actual patient testimonials on trusted platforms gives you unfiltered feedback. Zuri Plastic Surgery’s Google Business Profile, for instance, reflects a 4.9-star rating from over 500 real patient reviews. Those reviews include specific details of patients’ experiences (for example, one patient describes how Dr. Z and his staff “made me feel valued and respected”). You simply won’t get that level of authentic detail from an AI summary. Rely on reputable review sites and profiles for an accurate picture of patient satisfaction.
- Dangerous Health Advice (a hypothetical scenario): Consider a patient recovering from a facelift who asks an AI, “I have some chest pain and swelling in my leg a week after surgery – should I be concerned?” That combination of symptoms could indicate a serious post-surgical complication (like a blood clot or pulmonary embolism). A properly trained medical professional would treat this as an emergency until proven otherwise. But an AI without context might output something like, “Chest pain and leg swelling can sometimes occur after surgery due to inactivity. Try walking around, and if it persists, see a doctor.” Such a hallucinated sense of reassurance could be life-threatening if the patient delays seeking care. While this example is hypothetical, it underscores a key point: AI is not a doctor. It cannot perform a physical exam, gauge severity, or triage situations appropriately. Any concerning or complex symptoms should be evaluated by a healthcare provider in person. AI hallucinations in describing symptoms or severity could lead to dangerous complacency or panic. In fact, a Harvard-affiliated hospital study in late 2023 warned that patients relying on ChatGPT for cancer treatment advice were often led to non-guideline-concordant recommendations, and that false confidence in such AI advice is risky.
- Reputational Harm and Defamation: We’ve touched on reputation in terms of credentials and reviews, but there’s a broader issue, too. AI can potentially defame individuals or organizations by stating false information as if it were true. A dramatic real-world example: in 2023, a radio host in Georgia discovered ChatGPT had inaccurately summarized a legal complaint, falsely stating he was accused of fraud and embezzlement – completely untrue and unrelated to him. He filed a defamation lawsuit against OpenAI. Now, imagine an AI being asked about a medical practice and it erroneously claims something like, “Clinic Y was involved in a malpractice lawsuit in 2020.” If that never happened, the clinic’s reputation is unfairly tarnished by a bot’s fabrication. This is not science fiction – it’s happening. The takeaway: treat any surprising or negative claim from an AI about a person or clinic with skepticism and verify it through other sources. Misinformation can spread easily, and healthcare providers are understandably worried about AI-driven falsehoods causing patients to lose trust unjustly.
How to Safely Use AI as a Supplement (Not a Substitute)
After hearing about these pitfalls, you might wonder if AI tools have any place in medical research or patient education. The answer is that they can be useful if used cautiously and responsibly, as a starting point or adjunct, never the sole source of truth. Here are some tips to protect yourself and get the best out of both AI and reliable resources:
- Double-Check Everything: Treat AI outputs as unverified drafts. If an AI gives you a statistic (e.g., “90% of patients are satisfied after XYZ procedure”), look for that statistic in a credible source. For instance, you could search PubMed or a journal article to see if that figure holds up. If an AI cites a study or guideline, try to find it; for providers and researchers, a short sketch of one way to script that kind of reference check appears after this list. Never assume the AI’s answer is 100% correct without confirmation.
- Use Official and Peer-Reviewed Sources: For medical facts, prioritize information from peer-reviewed journals, medical textbooks, or official guidelines. Websites of respected organizations (like the American Society of Plastic Surgeons, the American Board of Plastic Surgery, or accredited medical centers) are also trustworthy. If you want to check a surgeon’s background, use the ABPS verification tool or the Certification Matters site by ABMS to confirm board certification. These sources have authoritative weight – an AI’s summary does not.
- Leverage AI for Simple Language, Not Final Answers: One potentially positive use of a chatbot is to take very dense or jargon-heavy text and ask it to explain in simpler terms. For example, you could paste an excerpt of a journal article and prompt, “Explain this in plain English.” This can give you a more digestible version to aid your understanding, which you can then discuss with your doctor. But remember, the simplification may lose nuance or even twist meaning (another form of hallucination), so use it only to prep for a conversation with a professional, not as medical advice on its own.
- Be Specific in Prompts and Look for Nuance: If you do turn to AI, the quality of the answer can depend on your question. Vague questions get vague (and more error-prone) answers. Try to frame questions clearly, and even then remain critical of the response. For instance, instead of asking “Is liposuction safe?”, ask “What are the recognized risks of liposuction for a healthy 45-year-old woman?” Then cross-check those risks with a reputable source (like an ASPS patient safety page). A good answer will often mention the need for personalized medical consultation, because a trustworthy AI should know its limits. As one Radiology Society article noted, “it’s dangerous to rely on ChatGPT to make a clinical decision” – it’s okay for generating a template or discussing general knowledge, but not for definitive guidance.
- Prefer Human Interaction for Medical Decisions: AI might save time when searching general knowledge, but there is no replacement for a consultation with a qualified medical professional, especially in plastic surgery. Surgeons consider your medical history, anatomy, and personal goals – things no chatbot can truly understand. Use AI to brainstorm questions to ask your doctor, not to be your doctor. If you’re a patient, it’s fine to do preliminary research (we love well-informed patients!), but bring those findings to a board-certified surgeon who can confirm what’s accurate and what isn’t. If you’re a provider or researcher, think of AI as an assistant for mundane tasks (like transcribing or translating text), but apply rigorous oversight to anything it produces that will inform patient care or academic work.
- Stay Updated on AI Improvements: The landscape of AI is changing fast. Developers are aware of the hallucination problem and are working on it. Future versions of these tools may have improved accuracy, on-the-fly fact-checking, or the ability to cite sources more reliably. For now, though, remain cautious. Keep an eye on validation studies. (For example, if tomorrow a study shows “Gemini passed a plastic surgery board exam with 100% accuracy,” that would be interesting – but as of today, we haven’t seen such a level of perfection, and human experts remain the gold standard.)
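For providers and researchers who find themselves checking AI-supplied citations often, the lookup itself can be scripted. Below is a minimal sketch, assuming the citation includes a DOI and using the public Crossref REST API (api.crossref.org); it illustrates the verification habit from the first tip above and is not a substitute for actually reading the paper.

```python
# Minimal sketch (not production code): does an AI-supplied reference actually exist?
# Assumptions: the citation includes a DOI, and we look it up via the public
# Crossref REST API (api.crossref.org). Python 3 standard library only.

import json
import urllib.error
import urllib.parse
import urllib.request


def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI, printing its registered title."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            record = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:   # Crossref returns 404 for DOIs it has no record of
            return False
        raise                 # other failures (rate limits, outages) are not proof of a fake reference
    # Crossref wraps metadata in a "message" object; show the registered title
    # so a human can compare it against what the chatbot claimed.
    titles = record.get("message", {}).get("title", [])
    print("Registered title:", titles[0] if titles else "<none>")
    return True


if __name__ == "__main__":
    # Example DOI taken from this post's own source list; swap in the DOI the
    # chatbot gave you. A False result means the citation does not resolve and
    # should be treated as suspect until verified another way.
    print(doi_exists("10.2196/53031"))
```

The same idea works with PubMed’s E-utilities for articles indexed there, but the principle is what matters: a citation that cannot be found in an authoritative index should be treated as suspect, no matter how plausible it looks.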
Conclusion: Informed Caution is Key
AI tools like ChatGPT and Gemini are impressive and here to stay, and they may play a growing role in how we access health information. They can be helpful for generating quick summaries or offering educational content in simple language. However, as we’ve detailed, they are far from infallible. AI hallucinations – confidently stated falsehoods – pose a real risk in medical research and communication. In plastic surgery, where details about procedures, credentials, and patient safety are paramount, such misinformation can lead to poor decisions or erode trust.
The bottom line for patients and providers is this: use AI with a healthy dose of skepticism. Verify any critical information through reliable sources. For patients, your health is too important to entrust to an algorithm – always loop back to a qualified medical professional (ideally an experienced, board-certified plastic surgeon for cosmetic questions) before making decisions. For healthcare providers and researchers, think of AI as a tool that requires supervision, not a colleague with expertise. As Judge Brantley Starr wrote in a court order after encountering AI-generated falsehoods, “these systems hold no allegiance to the truth.” It’s a powerful reminder that we must be the ones to uphold accuracy and integrity, whether in the operating room or on the page.
Stay informed, stay critical, and when in doubt, trust the humans who have dedicated their lives to mastering the art and science of medicine. Your safety and well-being deserve nothing less.
Sources:
- Bitterman, Danielle S., et al. “Assessment of ChatGPT’s Responses to Questions About Cancer Care Compared With National Comprehensive Cancer Network Guidelines.” JAMA Oncology, vol. 9, no. 6, 2023, pp. 897–899, doi:10.1001/jamaoncol.2023.2123.
- Gupta, Rohit, and Rahul Katarya. “Understanding AI Hallucination and Mitigation Strategies.” IEEE Access, vol. 11, 2023, pp. 57181–57197, doi:10.1109/ACCESS.2023.3274046.
- Harvard Kennedy School. Misinformation Review: Generative AI and Information Integrity. Harvard Kennedy School Shorenstein Center, 2023, misinforeview.hks.harvard.edu/article/generative-ai-and-information-integrity/.
- Ji, Ziwei, et al. “Survey of Hallucination in Natural Language Generation.” ACM Computing Surveys (CSUR), vol. 55, no. 12, 2023, pp. 1–38, doi:10.1145/3571730.
- Shen, Yoonji, et al. “ChatGPT and Other Large Language Models Are Double-Edged Swords.” Nature Medicine, vol. 29, 2023, pp. 1151–1153, doi:10.1038/s41591-023-02322-5.
- Stanford Center for AI in Medicine & Imaging. 2023 Report on LLMs in Clinical Practice. Stanford University, 2023, aim.stanford.edu/research/2023-report-llms-clinical-practice.
- The American Board of Plastic Surgery. “Is Your Surgeon Certified?” American Board of Plastic Surgery Official Website, 2024, www.abplasticsurgery.org/public/is-your-surgeon-certified/.
- Wang, Kaiwen, et al. “Assessing Hallucination Rates of Large Language Models ChatGPT and Bard in Medical Literature Review Tasks.” Journal of Medical Internet Research (JMIR), vol. 26, 2024, doi:10.2196/53031.
- Zuriarrain, Alexander. “Dr. Zuriarrain Credentials and Patient Reviews.” Zuri Plastic Surgery Official Website, 2024, www.zuriplasticsurgery.com.
- Zuri Plastic Surgery. “Zuri Plastic Surgery Google Business Profile.” Google Business Profiles, accessed 2024, google.com/maps/place/Zuri+Plastic+Surgery.