
[BMJ Christmas issue]: Parallel pressures: the common roots of doctor bullshit and large language model hallucinations

27 December 2025 ⁄ Research commentary, Progress exchange

Feature Christmas 2025: Black Mirror

Parallel pressures: the common roots of doctor bullshit and large language model hallucinations

Roberto A Correa Soto, Liam G McCoy, Camilo Perdomo-Luna, et al

BMJ 2025;391 doi: https://doi.org/10.1136/bmj.r2570 (Published 12 December 2025). Cite this as: BMJ 2025;391:r2570

Roberto Correa Soto and colleagues argue that the drives that lead doctors to make unfounded statements are mirrored in the forces that cause LLMs to generate false information, creating a potentially hazardous positive feedback loop in healthcare

The phenomenon of doctors presenting unfounded statements with unwavering arrogance—colloquially known as “bullshit”—has long been recognised in medical practice.1 2 In parallel, the tendency of large language models (LLMs) to generate plausible but factually incorrect information, termed “hallucinations,” presents a remarkably similar challenge in healthcare related artificial intelligence applications.3 These parallel behaviours stem from shared underlying mechanisms.

In both cases, pressure to produce output regardless of knowledge limitations can lead to a preference for any response over none, driven by a reward seeking intention.4 5 Misinformation then emerges not as deliberate deception, but as a product of structural demands. As Harry Frankfurt noted in his seminal work on bullshit, the core issue is not lying but rather “a lack of concern with the truth,” a description equally applicable to doctors under social pressure and LLMs under algorithmic constraints.2

For doctors who are using LLMs, these parallel pressures create a potentially dangerous feedback loop. By understanding them, we can develop strategies to mitigate misinformation in human and AI systems. This will help to safeguard patient care in a healthcare environment that is increasingly being augmented by AI.

Key messages

Both doctors and large language models (LLMs) are driven to produce misinformation—“bullshit” and “hallucinations”—owing to a shared pressure to provide answers, prioritising the appearance of competence over accuracy.
In medicine, hierarchical culture rewards displays of confidence; for LLMs, “reward hacking” can incentivise answers that sound plausible but are false in order to satisfy users.
Institutional forces, like the “publish or perish” academic culture and commercial pressures in AI, promote the quantity of output over its quality and accuracy.
A dangerous feedback loop emerges where misinformation generated by humans in medical literature and records is used to train LLMs, which then replicate these falsehoods, perpetuating and scaling the spread of inaccurate information in healthcare.

The compulsion to answer

In medical education and practice, doctors face intense pressure to demonstrate competence by having answers readily available.6 The cultural expectation that doctors should possess comprehensive knowledge creates an environment where admitting uncertainty can be perceived as a weakness or incompetence. Thus, medical professionals often feel obligated to provide opinions even when their expertise is limited, driving the practice of bullshitting.4 6

During rounds, resident doctors and trainees might offer explanations for laboratory abnormalities they don’t fully understand. During patient consultations, doctors might present treatment options with unwarranted certainty to maintain authority. At conferences, presenters might overstate conclusions to appear knowledgeable.

LLMs face parallel constraints. They are designed to generate continuations of text based on learned patterns, with a poor capacity to recognise and communicate genuine uncertainty.7 8 Computational pathways for a refusal to provide an answer do exist and are being developed. But current LLMs tend to “hallucinate” false information in their responses rather than abstaining when faced with uncertainty, incomplete information, or prompts that conflict with human values.9 This tendency is exacerbated by the propensity for models to excessively flatter or agree with users, which can lead models to prioritise perceived user satisfaction over truthfulness.8

In healthcare, the consequences of providing answers in the absence of knowledge are concerning. A doctor’s false certainty might lead to inappropriate treatment, and an LLM’s hallucinated medical advice could scale this harm exponentially.7

Performance, authority, and reward hacking

The hierarchical structure of medicine creates a theatre of authority where performance often supersedes accuracy.6 Medical trainees learn early that confident delivery can mask knowledge gaps, while hesitation invites scrutiny.6 10

This performance aspect of medical communication rewards those who can speak authoritatively with more clinical opportunities and professional advancement, regardless of the quality of their work.6 Yet research has repeatedly shown only a weak correlation between doctors’ confidence and their actual clinical competence.6

Similarly, LLMs are developed by using a reward model that provides positive reinforcement for outputs perceived as helpful or satisfactory by human raters.5 11 This can lead to “reward hacking,” where the model learns to maximise these reward signals—generating responses that seem plausible and confident—rather than aligning with factual accuracy or true human preferences. This can result in outputs that exploit loopholes in the reward model, even when incorrect.5 8
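The gap between a reward signal and the truth can be illustrated with a toy sketch. Everything in it is hypothetical and invented for illustration: the `Candidate` type, the two scoring functions, and the numeric scores are assumptions, not a description of how any real reward model is built. The sketch only shows that a scorer which perceives tone but cannot check correctness will prefer a confident wrong answer to an honest admission of uncertainty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate response (hypothetical structure, for illustration only)."""
    text: str
    sounds_confident: bool  # surface feature a human rater can perceive
    is_correct: bool        # ground truth the rater cannot easily check


def proxy_reward(c: Candidate) -> float:
    """Assumed rater score: rewards a confident tone, blind to correctness."""
    return 1.0 if c.sounds_confident else 0.2


def true_reward(c: Candidate) -> float:
    """Assumed ideal score: correct answers are best, and admitting
    uncertainty is better than being confidently wrong."""
    if c.is_correct:
        return 1.0
    return 0.0 if c.sounds_confident else 0.5


candidates = [
    Candidate("The answer is definitely X.", sounds_confident=True, is_correct=False),
    Candidate("I am not sure; this needs checking.", sounds_confident=False, is_correct=False),
]

# Optimising the proxy picks the confident but wrong answer;
# optimising the ideal score picks the honest abstention.
chosen_by_proxy = max(candidates, key=proxy_reward)
chosen_by_truth = max(candidates, key=true_reward)

print("Proxy reward selects:", chosen_by_proxy.text)
print("True reward selects: ", chosen_by_truth.text)
```

The same structure describes the trainee on rounds: if the audience can score only delivery, delivery is what gets optimised.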

The problem is compounded by the audience’s unawareness.4 Patients and medical students often lack the technical knowledge to identify misinformation or to detect inconsistencies in the speaker’s message,4 just as many LLM users cannot easily fact check medical content that has been generated by AI. In both cases, authority derives from the speaker’s ability to sound knowledgeable rather than from the speaker being knowledgeable.

Institutional forces

Broader institutional forces also perpetuate the production of bullshit in medicine and hallucinations in LLMs. In academia, publication quantity is prioritised over quality.12 13 14 Financial interests, conflicts, and prejudices can bias research outcomes. Consequently, many findings reflect prevailing biases shaped by flexibility in designs, analyses, and reporting, rather than empirical truth.13 15

Electronic health records also become susceptible to this pressure because doctors can craft narratives that emphasise coherence and minimise liability. Although records are legally binding documents that must contain only factual information, a medical event can be documented in many correct ways—each potentially leading to different interpretations and becoming a source of data bias.16

For LLMs, commercial imperatives similarly prioritise flawed performance metrics over accuracy and patient outcomes.17 Models that appear knowledgeable attract more users and investment, creating market pressure for capabilities that might exceed reliable knowledge boundaries. The rush to deploy AI in healthcare settings often bypasses rigorous validation, and AI tools often underperform when compared with initial claims.18

Research and medical institutions, alongside the pharmaceutical industry, reward doctors who project certainty and produce voluminous research, while AI developers are incentivised to create models that generate impressively authoritative responses. These parallel institutional pressures compound the individual tendencies towards bullshitting and hallucination, creating systemic vulnerability to misinformation in healthcare.

Human-LLM feedback loop

The positive feedback loop stems from the contamination of AI training data by misinformation generated by humans. Doctor “bullshit”—statements made with indifference to truth, often to appear knowledgeable, gain advantage, or avoid responsibility—manifests in various forms within the medical ecosystem.

This includes potentially biased or inaccurate information presented in medical rounds, classes, and health records, as well as social media posts, non-peer reviewed articles, and research articles, especially those influenced by funding pressures or the “publish or perish” culture. Critically, much of this content can be scraped and incorporated into the massive datasets that are used for the initial training of general LLMs.11

Furthermore, even curated datasets used for fine tuning medical LLMs, such as health records or indexed articles, are not immune to bias.11 16 Health records might contain bullshit driven by doctors’ desires to present a coherent narrative or limit legal risk rather than absolute truth. Similarly, academic literature can be tainted by bullshit disguised as research, aimed at career advancement, financial gain, and prestige over the genuine pursuit of scientific truth.15

Since LLMs learn statistical patterns from these data without verification mechanisms, they ingest and replicate these instances of bullshit.7 8 The models then generate “hallucinations” that mimic the tone of the training data, perpetuating the cycle. This creates a dangerous dynamic where flawed human communication trains AI to produce similarly flawed, confidently asserted outputs, potentially leading to misdiagnosis, inadequate treatment, erosion of trust, and amplification of existing biases within healthcare.

The medical education and practice system and LLMs ultimately suffer from a casual indifference to facts, reality, or the best available evidence—what can be described as “epistemic insouciance.”19

Implications and solutions

The parallel pressures demand solutions that address human and AI systems and break the feedback loop.

In medical education, a fundamental shift is needed from knowledge performance to epistemic humility. Training programmes should explicitly reward phrases like “I don’t know” when appropriate, making uncertainty a mark of professionalism rather than of deficiency.

Case based discussions should incorporate scenarios where the correct answer is to acknowledge knowledge gaps and seek additional information. Furthermore, perverse incentives in clinical research grow out of the institutional focus on publication quantity over clinical usefulness.

Academic culture often minimises study limitations to emphasise “positive” clinical research findings, and selective reporting and data manipulation further distort the picture.15 20 To bolster reliable science, academia should prioritise the totality of evidence over isolated statistically significant findings.15 Rather than fostering prolific doctor-researchers, a more valuable approach is to train doctors broadly in research methods and evidence based medicine.14

Enhancing LLMs’ reliability for clinical applications requires mitigating hallucinations and improving the detection and communication of uncertainty. Eliminating inaccuracies might be impossible, but key strategies aim to improve trustworthiness.7 Developing robust abstention mechanisms—where models decline to answer when uncertain, faced with unanswerable queries, or misaligned with ethical values—is crucial.9 Ultimately, enhancing LLMs’ reliability involves developing systems that not only generate information but also appropriately signal their limitations and abstain when necessary.
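One simple form of abstention can be sketched as a confidence threshold. This is a minimal illustration, not the mechanism described in the cited work: the function name `answer_or_abstain`, the `threshold` value, and the idea that the system exposes a probability for each candidate answer are all assumptions. It also presumes those probabilities are reasonably calibrated, which is itself one of the unsolved problems described above.

```python
from typing import Mapping

ABSTAIN = "I don't know."


def answer_or_abstain(candidate_probs: Mapping[str, float],
                      threshold: float = 0.8) -> str:
    """Return the most probable answer only if its (assumed calibrated)
    probability reaches the threshold; otherwise abstain.

    `candidate_probs` maps each candidate answer to a hypothetical
    confidence estimate between 0 and 1.
    """
    if not candidate_probs:
        # No candidate at all: the query is treated as unanswerable.
        return ABSTAIN
    best_answer, best_prob = max(candidate_probs.items(), key=lambda kv: kv[1])
    return best_answer if best_prob >= threshold else ABSTAIN


# Usage with made-up confidence values:
print(answer_or_abstain({"A": 0.9, "B": 0.1}))    # confident enough to answer
print(answer_or_abstain({"A": 0.55, "B": 0.45}))  # too uncertain, abstains
```

Where the threshold sits is a design choice. A higher value trades fewer wrong answers for more refusals, which mirrors the shift asked of trainees: saying “I don’t know” more often in exchange for fewer unfounded statements.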

The most effective solutions for mitigating misinformation in healthcare will arise from the convergence of human and AI systems, leveraging their complementary strengths and addressing shared weaknesses by using collaborative workflows. As medical practice advances and AI capabilities evolve, future challenges will likely intensify. Pressure on doctors to operate beyond their expertise will rise and LLMs will produce more subtle hallucinations. Vigilance and adaptive educational and technological frameworks will be crucial to address these evolving challenges.

Posted by dubin98 | 中国病理生理学会危重病医学专业委员会 (Critical Care Medicine Committee, Chinese Association of Pathophysiology)
