Performance of a Large Language Model in Screening Citations

Takehiko Oami, Yohei Okada, Taka-aki Nakada

JAMA Netw Open. 2024;7(7):e2420496. doi:10.1001/jamanetworkopen.2024.20496

Key Points

Question How accurate and efficient is a large language model (LLM) for screening titles and abstracts for article inclusion in a systematic review?

Findings In this diagnostic study, LLM-assisted citation screening exhibited acceptable sensitivity and reasonably high specificity in evaluating 5 clinical questions, with post hoc prompt modifications further improving accuracy. The screening time for 100 studies was significantly reduced compared with that of conventional methods.

Meaning These findings suggest that LLM-assisted citation screening could offer a reliable and time-efficient alternative to conventional citation screening in systematic reviews.

Abstract

Importance Large language models (LLMs) are promising as tools for citation screening in systematic reviews. However, their applicability has not yet been determined.

Objective To evaluate the accuracy and efficiency of an LLM in title and abstract literature screening.

Design, Setting, and Participants This prospective diagnostic study used the data from the title and abstract screening process for 5 clinical questions (CQs) in the development of the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock. The LLM decided to include or exclude citations based on the inclusion and exclusion criteria in terms of patient, population, problem; intervention; comparison; and study design of the selected CQ and was compared with the conventional method for title and abstract screening. This study was conducted from January 7 to 15, 2024.
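The abstract does not reproduce the command prompt, so the following is only a minimal sketch of how a criteria-based screening request might be assembled and how a one-word reply might be read. The `Criteria` fields, the prompt wording, and the `build_prompt` and `parse_decision` helpers are all hypothetical; no model is called.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    """Eligibility criteria for one clinical question (hypothetical structure)."""
    population: str
    intervention: str
    comparison: str
    study_design: str


def build_prompt(criteria: Criteria, title: str, abstract: str) -> str:
    """Assemble a screening prompt. The wording is illustrative, not the study's prompt."""
    return (
        "You are screening citations for a systematic review.\n"
        f"Population: {criteria.population}\n"
        f"Intervention: {criteria.intervention}\n"
        f"Comparison: {criteria.comparison}\n"
        f"Study design: {criteria.study_design}\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        "Answer with exactly one word: INCLUDE or EXCLUDE."
    )


def parse_decision(reply: str) -> bool:
    """Return True for inclusion.

    Anything other than a clear EXCLUDE is kept, so an ambiguous reply
    is passed on for human review instead of being dropped.
    """
    return reply.strip().upper() != "EXCLUDE"
```

Defaulting ambiguous replies to inclusion is a design choice that favors sensitivity over specificity, which matters at this stage because a wrongly excluded citation is never seen again.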

Exposures LLM (GPT-4 Turbo)–assisted citation screening or the conventional method.

Main Outcomes and Measures The sensitivity and specificity of the LLM-assisted screening process were calculated, with the full-text screening result from the conventional method as the reference standard in the primary analysis. Pooled sensitivity and specificity were also estimated, and screening times of the 2 methods were compared.
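As a rough illustration of these measures, sensitivity and specificity for a single clinical question can be computed from a 2×2 table of LLM decisions against the reference standard. The sketch below uses a Wilson score interval for the confidence bounds; this is an assumed choice for illustration, and the abstract does not say which interval or pooling method the authors used. The counts in the usage note are hypothetical, not study data.

```python
import math
from typing import Tuple


def sensitivity(tp: int, fn: int) -> float:
    """Share of reference-standard inclusions that the screener also included."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of reference-standard exclusions that the screener also excluded."""
    return tn / (tn + fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half
```

For example, with hypothetical counts of 9 true positives and 1 false negative, `sensitivity(9, 1)` is 0.9 and `wilson_interval(9, 10)` gives the corresponding bounds. With so few included studies per question, the interval for sensitivity is wide, which is consistent with the broad intervals reported in the Results.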

Results In the conventional citation screening process, 8 of 5634 publications in CQ 1, 4 of 3418 in CQ 2, 4 of 1038 in CQ 3, 17 of 4326 in CQ 4, and 8 of 2253 in CQ 5 were selected. In the primary analysis of 5 CQs, LLM-assisted citation screening demonstrated an integrated sensitivity of 0.75 (95% CI, 0.43 to 0.92) and specificity of 0.99 (95% CI, 0.99 to 0.99). Post hoc modifications to the command prompt improved the integrated sensitivity to 0.91 (95% CI, 0.77 to 0.97) without substantially compromising specificity (0.98 [95% CI, 0.96 to 0.99]). Additionally, LLM-assisted screening was associated with reduced time for processing 100 studies (1.3 minutes vs 17.2 minutes for conventional screening methods; mean difference, −15.25 minutes [95% CI, −17.70 to −12.79 minutes]).
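The scale of the task can be made concrete with a little arithmetic on the reported counts and per-100-study times. The totals below come directly from the figures above; the whole-corpus screening times are an extrapolation that assumes time scales linearly with the number of citations, which the abstract does not state.

```python
# (selected, screened) per clinical question, as reported in the Results.
counts = {
    "CQ 1": (8, 5634),
    "CQ 2": (4, 3418),
    "CQ 3": (4, 1038),
    "CQ 4": (17, 4326),
    "CQ 5": (8, 2253),
}

total_selected = sum(selected for selected, _ in counts.values())
total_screened = sum(screened for _, screened in counts.values())

# Reported mean time to process 100 studies, in minutes.
LLM_MIN_PER_100 = 1.3
CONVENTIONAL_MIN_PER_100 = 17.2


def extrapolated_minutes(n_citations: int, minutes_per_100: float) -> float:
    """Linear extrapolation from the per-100 figure (an assumption, not a reported result)."""
    return n_citations / 100 * minutes_per_100


llm_minutes = extrapolated_minutes(total_screened, LLM_MIN_PER_100)
conventional_minutes = extrapolated_minutes(total_screened, CONVENTIONAL_MIN_PER_100)

print(f"{total_selected} of {total_screened} citations selected "
      f"({total_selected / total_screened:.2%})")
print(f"LLM: {llm_minutes / 60:.1f} h, conventional: {conventional_minutes / 60:.1f} h")
```

Across the 5 questions, 41 of 16,669 citations were selected, so relevant studies make up well under 1% of the pool. Under the linearity assumption, screening everything would take roughly 3.6 hours with the LLM and roughly 47.8 hours with the conventional method.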

Conclusions and Relevance In this prospective diagnostic study investigating the performance of LLM-assisted citation screening, the model demonstrated acceptable sensitivity and reasonably high specificity with reduced processing time. This novel method could potentially enhance efficiency and reduce workload in systematic reviews.

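Posted by: dubin98

This post was published by dubin98 2 years ago under the News Express and Progress Exchange categories.
When reposting, please credit: [Paper published in JAMA Netw Open]: Performance of a Large Language Model in Screening Citations | Critical Care Medicine Professional Committee of the Chinese Association of Pathophysiology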

