Large Language Model Capabilities in Perioperative Risk Prediction and Prognostication

Philip Chung, Christine T. Fong, Andrew M. Walters, et al

JAMA Surg. Published online June 5, 2024. doi:10.1001/jamasurg.2024.1621

Key Points

Question: Can a large language model perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient’s preoperative clinical notes from the electronic health record?

Findings: In this prognostic study of task-specific datasets, each with up to 1000 cases, the large language model GPT-4 Turbo (OpenAI) achieved an F1 score of 0.50 for American Society of Anesthesiologists Physical Status, 0.64 for hospital admission, 0.81 for intensive care unit (ICU) admission, 0.61 for unplanned admission, and 0.86 for hospital mortality prediction. It was unable to accurately predict duration outcomes such as postanesthesia care unit phase 1 duration, hospital duration, and ICU duration.

Meaning: Current-generation large language models may be able to assist clinicians with perioperative risk stratification in classification tasks but not in numerical prediction tasks.

Abstract

Importance: General-domain large language models may be able to perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient’s electronic health record notes.

Objective: To examine the predictive performance of a large language model on 8 different tasks: prediction of American Society of Anesthesiologists Physical Status (ASA-PS), hospital admission, intensive care unit (ICU) admission, unplanned admission, hospital mortality, postanesthesia care unit (PACU) phase 1 duration, hospital duration, and ICU duration.

Design, Setting, and Participants: This prognostic study included task-specific datasets constructed from 2 years of retrospective electronic health record data collected during routine clinical care. Case and note data were formatted into prompts and given to the large language model GPT-4 Turbo (OpenAI) to generate a prediction and an explanation. The setting was a quaternary care center comprising 3 academic hospitals and affiliated clinics in a single metropolitan area. Patients were included if they had a surgery or procedure with anesthesia and at least 1 clinician-written note filed in the electronic health record before surgery. Data were analyzed from November to December 2023.
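The abstract says each prompt asked GPT-4 Turbo for a prediction together with an explanation, but it does not give the output format. The sketch below assumes, purely for illustration, that the model was told to reply with a JSON object using hypothetical keys `prediction` and `explanation`. It shows how such a reply could be parsed and checked against a task's allowed labels before scoring.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Prediction:
    """One parsed model answer: the predicted value and its free-text rationale."""
    value: str
    explanation: str


def parse_response(raw: str, allowed: set[str] | None = None) -> Prediction:
    """Parse a reply assumed (hypothetically) to be a JSON object such as
    {"prediction": "...", "explanation": "..."}.

    Raises ValueError if the predicted label is not in `allowed`.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    value = str(data["prediction"]).strip()
    explanation = str(data.get("explanation", "")).strip()
    # Reject labels the task does not define instead of scoring them silently.
    if allowed is not None and value not in allowed:
        raise ValueError(f"unexpected label: {value!r}")
    return Prediction(value=value, explanation=explanation)
```

For a binary task such as hospital admission, `allowed` might be `{"yes", "no"}`. A reply outside that set raises `ValueError` and is not counted as a valid prediction.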

Exposures: Prompting strategies using original notes, note summaries, few-shot prompting, and chain-of-thought prompting were compared.
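The four strategies differ mainly in what goes into the prompt. Here is a minimal sketch, assuming a plain-text template. The function name, section labels, and instruction wording are hypothetical and are not the study's actual prompts. The caller passes either original notes or note summaries, optional worked examples for few-shot prompting, and a flag that appends a chain-of-thought instruction.

```python
from __future__ import annotations


def build_prompt(
    procedure: str,
    notes: list[str],
    question: str,
    examples: list[tuple[str, str]] | None = None,
    chain_of_thought: bool = False,
) -> str:
    """Assemble one prompt (hypothetical template, not the study's wording).

    notes            -- original notes or note summaries, as the caller chooses
    examples         -- (case_text, answer) pairs for few-shot prompting
    chain_of_thought -- if True, ask for step-by-step reasoning before the answer
    """
    sections = []
    # Few-shot: worked examples go first, each followed by its known answer.
    for i, (case_text, answer) in enumerate(examples or [], start=1):
        sections.append(f"Example {i}:\n{case_text}\nAnswer: {answer}")
    # Target case: procedure description plus the preoperative notes.
    notes_text = "\n\n".join(notes)
    sections.append(f"Procedure: {procedure}\n\nNotes:\n{notes_text}")
    # Task instruction, optionally extended with a chain-of-thought cue.
    instruction = question
    if chain_of_thought:
        instruction += "\nThink step by step, then give the final answer on the last line."
    sections.append(instruction)
    return "\n\n---\n\n".join(sections)
```

Calling it with `examples=None` and `chain_of_thought=False` gives a zero-shot baseline. Passing summaries instead of original notes changes only the middle section, which keeps each comparison to a single variable.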

Main Outcomes and Measures: F1 score for binary and categorical outcomes; mean absolute error for numerical duration outcomes.
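The two metrics can be written out directly. This sketch computes F1 for one class as the harmonic mean of precision and recall. It computes mean absolute error as the average absolute gap between true and predicted durations. The abstract does not state how per-class F1 scores were combined for the multiclass ASA-PS task, so the unweighted (macro) average used here is an assumption.

```python
def f1_for_class(y_true, y_pred, label) -> float:
    """F1 = 2PR / (P + R), treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 over labels present in y_true (assumed averaging)."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def mean_absolute_error(y_true, y_pred) -> float:
    """Average |true - predicted|, in the outcome's own units (minutes or days)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Take true labels `[1, 1, 0, 0]` and predictions `[1, 0, 0, 1]` as an example. Precision and recall for class 1 are both 0.5, so its F1 is 0.5. Separately, `mean_absolute_error([10, 20], [40, 5])` is 22.5, in minutes if the inputs are minutes.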

Results: Study results were measured on task-specific datasets, each with 1000 cases except unplanned admission, which had 949 cases, and hospital mortality, which had 576 cases. The best result for each task was an F1 score of 0.50 (95% CI, 0.47-0.53) for ASA-PS, 0.64 (95% CI, 0.61-0.67) for hospital admission, 0.81 (95% CI, 0.78-0.83) for ICU admission, 0.61 (95% CI, 0.58-0.64) for unplanned admission, and 0.86 (95% CI, 0.83-0.89) for hospital mortality prediction. Performance on duration prediction tasks was universally poor across all prompt strategies. The large language model achieved a mean absolute error of 49 minutes (95% CI, 46-51 minutes) for PACU phase 1 duration, 4.5 days (95% CI, 4.2-5.0 days) for hospital duration, and 1.1 days (95% CI, 0.9-1.3 days) for ICU duration prediction.
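The abstract reports a 95% CI for every metric but does not name the method used to compute them. A percentile bootstrap is one common choice. It resamples cases with replacement, recomputes the metric each time, and reads off the 2.5th and 97.5th percentiles. The sketch below assumes that approach. Its `bootstrap_ci` helper and its default of 1000 resamples are illustrative and are not taken from the study.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error, used here only as an example metric."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(y_true, y_pred); an assumed method."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample whole cases with replacement, keeping true/pred pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the resampled metric.
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

You can pass any function with the signature `metric(y_true, y_pred)` in place of `mae`, such as an F1 computation for the classification tasks.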

Conclusions and Relevance: Current general-domain large language models may assist clinicians in perioperative risk stratification on classification tasks but are inadequate for numerical duration predictions. Their ability to produce high-quality natural language explanations for the predictions may make them useful tools in clinical workflows and may be complementary to traditional risk prediction models.

Posted by: dubin98
