Affiliation Bias in Peer Review of Abstracts by a Large Language Model

Dario von Wedel; Rico A. Schmitt; Moritz Thiele; et al

JAMA. Published online December 27, 2023. doi:10.1001/jama.2023.24641

Ever-increasing numbers of submitted manuscripts put the academic publishing system and its human peer reviewers under pressure.¹ Researchers have started to use large language models (LLMs) as support for writing and reviewing articles.²,³ A threat to the integrity of peer review is the influence of authors’ institutional prestige on the evaluation of their work (affiliation bias).⁴ Whether using LLMs to assist in peer review will increase or reduce affiliation bias is unknown. We assessed affiliation bias in peer review of medical abstracts by a commonly used LLM.

Methods

On June 16, 2023, the most recent 30 preprint abstracts posted on the medRxiv preprint server were obtained and original author affiliations removed. Abstracts, instead of full-length manuscripts, were reviewed because input length was restricted (token limit). The LLM used was GPT-3.5 (OpenAI), which is based on a proprietary algorithm. To ensure tested affiliations were known to the LLM, it was prompted to provide 10 each of top-, mid-, and low-tier medical universities (30 affiliations in total). In separate sessions, using these previously obtained affiliations and following a randomized crossover approach, the LLM was queried to peer review and decide on acceptance vs rejection for each abstract-affiliation combination (1 affiliation per review) and for each abstract without an affiliation, resulting in 930 (30 × 31) combinations. To address randomness in the LLM’s responses,⁵ each review was repeated 250 times, yielding 232 500 individual reviews. The peer review prompt was based on published reviewer guidelines⁶ and directed the LLM to consider abstract sections (Introduction, Methods, Results, and Conclusion) and general aspects (language, scientific credibility, and ethics) when making acceptance decisions based on scores (range of 0-100 for each section and aspect) without a specific cutoff or weighting (Figure and eAppendix in Supplement 1).
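The review grid described above can be enumerated in a few lines. This is a minimal sketch, not the authors' code: the abstract and affiliation labels are placeholders (the letter does not list them), and only the counts are taken from the text.

```python
from itertools import product

N_ABSTRACTS = 30   # preprint abstracts from medRxiv
N_PER_TIER = 10    # affiliations per tier
N_REPEATS = 250    # repetitions of each review

# Placeholder labels (assumption): the real abstracts and universities
# are not named in the letter.
abstracts = [f"abstract_{i:02d}" for i in range(1, N_ABSTRACTS + 1)]

# 30 affiliations (10 per tier) plus one "no affiliation" condition (None).
conditions = [None] + [
    f"{tier}_{i:02d}"
    for tier in ("top", "mid", "low")
    for i in range(1, N_PER_TIER + 1)
]

# Every abstract is paired with every condition (crossover design).
combinations = list(product(abstracts, conditions))
total_reviews = len(combinations) * N_REPEATS

print(len(conditions))    # 31
print(len(combinations))  # 930
print(total_reviews)      # 232500
```

The sketch leaves out the randomized order of the sessions, which the letter mentions but does not specify.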

The primary outcome was acceptance rate for affiliations grouped by tiers, which was preferred over individual affiliations for better interpretability, generalizability, and lower number of comparisons (top and low tier were compared with mid tier). Acceptance rates are reported with Wilson 95% CIs and were compared using 2-sample proportion tests. Statistical significance for the primary analysis was defined as a Bonferroni-corrected α less than .017 (.05/3). Statistical analyses were performed in Stata, version 16.0 (StataCorp LLC). The Figure was created using BioRender.
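For readers who want to see the statistics concretely, the following is a minimal Python sketch of a Wilson 95% CI, a pooled 2-sample proportion z test, and the Bonferroni threshold. It is an illustration under assumptions, not the authors' analysis: they used Stata, the letter does not state which variant of the proportion test was run, and the function names here are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wilson_ci(successes, n, z=Z_95):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z test for equality of proportions (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Bonferroni-corrected significance threshold from the text (.05/3).
ALPHA = 0.05 / 3
print(round(ALPHA, 3))  # 0.017
```

A comparison would count as statistically significant in this sketch when the returned P value falls below `ALPHA`.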

Results

Across all 232 500 reviews, the acceptance rate was 37.6% (95% CI, 37.4%-37.8%), ranging from 32.9% (95% CI, 31.8%-34.0%) to 42.0% (95% CI, 40.8%-43.1%) for individual affiliations. Compared with mid-tier affiliations (37.5%; 95% CI, 37.1%-37.8%), the mean acceptance rate was higher for top-tier affiliations (38.4%; 95% CI, 38.1%-38.8%; P < .001) and lower for low-tier affiliations (36.7%; 95% CI, 36.3%-37.0%; P = .002) (Table). During review without an affiliation, the acceptance rate was higher (40.0%; 95% CI, 38.9%-41.1%) than the mean acceptance rate during review with any affiliation (37.5%; 95% CI, 37.3%-37.7%; P < .001) or across top-tier affiliations (P < .001).
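The denominators behind these rates follow from the design (30 abstracts, 250 repetitions, 10 affiliations per tier). The sketch below derives them and recomputes the Wilson interval for the no-affiliation condition as a plausibility check. The accepted-review count is an assumption reconstructed from the rounded 40.0%, because the letter reports percentages rather than raw counts.

```python
import math

N_ABSTRACTS, N_REPEATS, N_PER_TIER, N_TIERS = 30, 250, 10, 3

n_single = N_ABSTRACTS * N_REPEATS   # reviews per single affiliation, or with none
n_tier = n_single * N_PER_TIER       # reviews per tier
n_any = n_tier * N_TIERS             # reviews with any affiliation
n_total = n_any + n_single           # all reviews

print(n_single, n_tier, n_any, n_total)  # 7500 75000 225000 232500


def wilson_ci(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Assumption: 40.0% of 7500 reviews, reconstructed from the rounded rate.
accepted_none = round(0.400 * n_single)
lo, hi = wilson_ci(accepted_none, n_single)
print(f"{100 * lo:.1f}%-{100 * hi:.1f}%")  # 38.9%-41.1%
```

The recomputed interval agrees with the reported 38.9%-41.1%. Intervals rebuilt the same way for the tier-level rates can differ in the last digit, since the published percentages are themselves rounded.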

Discussion

Addition of affiliations was significantly associated with acceptance rates by a commonly used LLM, with higher tiers being associated with higher acceptance rates. However, the difference between acceptance rates of the highest tier vs lowest tier was only 1.7 percentage points and may not be practically important. In addition, differences during peer review by the LLM observed in this study appeared to be smaller than those in previous reports on human affiliation bias (12.5–percentage point higher acceptance rates with prestigious vs no affiliation),⁴ although the study designs differed and direct comparison should be done carefully.

Review without affiliation was associated with higher acceptance rates than across top-tier affiliations. The reason is unclear but may be linked to increased input complexity when adding an affiliation.

A limitation of this study is the LLM’s token limit, which did not allow for review of full manuscripts; it is unclear whether the results from abstracts are generalizable to full-length manuscripts. Furthermore, the LLM used is based on a proprietary algorithm; thus, underlying processes leading to associations with affiliation are unclear. In addition, the LLM’s dichotomous decision of acceptance or rejection may not reflect the range of how LLMs may be used to assist in peer review.

