{"id":26068,"date":"2024-07-23T04:16:00","date_gmt":"2024-07-22T20:16:00","guid":{"rendered":"http:\/\/csccm.org.cn\/?p=26068"},"modified":"2024-07-23T06:02:24","modified_gmt":"2024-07-22T22:02:24","slug":"jama-netw-open%e5%8f%91%e8%a1%a8%e8%ae%ba%e6%96%87%ef%bc%9a%e9%87%87%e7%94%a8%e5%a4%a7%e5%9e%8b%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b%e8%af%84%e4%bb%b7%e9%9a%8f%e6%9c%ba%e4%b8%b4%e5%ba%8a%e8%af%95","status":"publish","type":"post","link":"https:\/\/csccm.org.cn\/?p=26068","title":{"rendered":"[Published in JAMA Netw Open]: Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models"},"content":{"rendered":"\n<p>Original Investigation&nbsp;<\/p>\n\n\n\n<p>Statistics and Research Methods<\/p>\n\n\n\n<p>May&nbsp;22,&nbsp;2024<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">Honghao\u00a0Lai,\u00a0Long\u00a0Ge,\u00a0Mingyao\u00a0Sun,\u00a0et al<\/h3>\n\n\n\n<h3 class=\"wp-block-heading\"><em>JAMA Netw Open.\u00a0<\/em>2024;7(5):e2412687. 
doi:10.1001\/jamanetworkopen.2024.12687<\/h3>\n\n\n\n<p>Key Points<\/p>\n\n\n\n<p><strong>Question<\/strong>&nbsp;&nbsp;Are large language models reliable for assessing risk of bias (ROB) in randomized clinical trials (RCTs)?<\/p>\n\n\n\n<p><strong>Findings<\/strong>&nbsp;&nbsp;In this survey study in which 2 large language models and 3 experts assessed 30 RCTs, a structured prompt was developed to guide ROB assessment; compared with the human reviewers, both large language models achieved high accuracy (mean correct assessment rate &ge;84.5%) across 10 specific domains.<\/p>\n\n\n\n<p><strong>Meaning<\/strong>&nbsp;&nbsp;These findings suggest that both large language models assess ROB in RCTs with substantial accuracy and may serve as supportive tools in systematic review processes.<\/p>\n\n\n\n<p>Abstract<\/p>\n\n\n\n<p><strong>Importance<\/strong>&nbsp;&nbsp;Large language models (LLMs) may facilitate the labor-intensive process of systematic reviews. However, the exact methods and reliability remain uncertain.<\/p>\n\n\n\n<p><strong>Objective<\/strong>&nbsp;&nbsp;To explore the feasibility and reliability of using LLMs to assess risk of bias (ROB) in randomized clinical trials (RCTs).<\/p>\n\n\n\n<p><strong>Design, Setting, and Participants<\/strong>&nbsp;&nbsp;A survey study was conducted between August 10, 2023, and October 30, 2023. Thirty RCTs were selected from published systematic reviews.<\/p>\n\n\n\n<p><strong>Main Outcomes and Measures<\/strong>&nbsp;&nbsp;A structured prompt was developed to guide ChatGPT (LLM 1) and Claude (LLM 2) in assessing ROB in these RCTs with a modified version of the Cochrane ROB tool developed by the CLARITY group at McMaster University. Each model assessed each RCT twice, and the results were documented. These results were compared with an assessment by 3 experts, which served as the criterion standard. 
Correct assessment rates, sensitivity, specificity, and&nbsp;<em>F1<\/em>&nbsp;scores were calculated to measure accuracy, both overall and for each domain of the Cochrane ROB tool; consistent assessment rates and Cohen \u03ba were calculated to gauge consistency; and assessment time was recorded to measure efficiency. The 2 models' performance was compared using risk differences.<\/p>\n\n\n\n<p><strong>Results<\/strong>\u00a0\u00a0Both models demonstrated high correct assessment rates. LLM 1 reached a mean correct assessment rate of 84.5% (95% CI, 81.5%-87.3%), and LLM 2 reached a significantly higher rate of 89.5% (95% CI, 87.0%-91.8%). The risk difference between the 2 models was 0.05 (95% CI, 0.01-0.09). In most domains, correct assessment rates were approximately 80% to 90%; however, sensitivity was below 0.80 in domains 1 (random sequence generation), 2 (allocation concealment), and 6 (other concerns). Domains 4 (missing outcome data), 5 (selective outcome reporting), and 6 had\u00a0<em>F1<\/em>\u00a0scores below 0.50. The consistency rates between the 2 assessments were 84.0% for LLM 1 and 87.3% for LLM 2. Cohen \u03ba exceeded 0.80 in 7 domains for LLM 1 and in 8 domains for LLM 2. 
The mean (SD) time needed for assessment was 77 (16) seconds for LLM 1 and 53 (12) seconds for LLM 2.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/cdn.jamanetwork.com\/ama\/content_public\/journal\/jamanetworkopen\/939364\/zoi240441f1_1715318989.17715.png?Expires=1719698626&amp;Signature=QaKbJ7QY-~EdCcBzw2Lmk1tnaHMIetroWlse3AYPLqrwYFu5Q~jGO25W7hmKGkJ4zTAUcyW9JIymJ6-qotW8vwDRMll-vcyZh6TTSkyhdMJqtkxBK7g2WCfpswHRFb6mxiO3M0poaREI0fh82EMcxGXtpsgDtHWq5YHsrno-ZGJBTGMUR7N03G0YeEFwlqACW-mSLXbuu7~BDCzH43FN77cW6-jam1jOgZJjVf03lRLmY7MWYy0gHmxrY-43lqBskakgPseoJyjyYcDblPEbc2nmgBMctNB6fcWIFohPgHvIbjkDA-pOjmeZm6H0pf1PGrSQUw-XspPAWIt2hrXu1Q__&amp;Key-Pair-Id=APKAIE5G5CRDK6RD3PGA\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/cdn.jamanetwork.com\/ama\/content_public\/journal\/jamanetworkopen\/939364\/zoi240441f2_1715318989.24715.png?Expires=1719698626&amp;Signature=4NYxO8GkOlzHzwplBCL1ZZGXXQJ6xINQQbB1f-Oj4I2LVy6XIAHiSdWzt2VFg84qEKysuvomf8pakUZ2Fgz3RmeHNlzNjX3CkOMt3RAwrIrxx2vVLwDs~JcalX3MRQu2aG6uvw0CJEcPhhWcvgLaOPd5B0A9ns7e3C~0wekH8fh95jpYYs80FaNH-eWVo~sA4Rk-Xwe4Mb6PMLvIxlkyXezETVa5RXYzKRT5MvytvbHUj9ol6PlL--LnsnC3k2SfCMyDqpfW516sdbpgPoi9AYovBDScsX9P0zdewtqr9TMd~5XKq-3o5EGTm4dwFHBgSqEMxW9SLnXywgsWkqL~RQ__&amp;Key-Pair-Id=APKAIE5G5CRDK6RD3PGA\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" 
src=\"https:\/\/cdn.jamanetwork.com\/ama\/content_public\/journal\/jamanetworkopen\/939364\/zoi240441f3_1715318989.25215.png?Expires=1719698626&amp;Signature=RlnPqYKTGAbbadgul~KLgkaBxnwV-SaSiQHWTtP4yK31tzvXvo3LaGi27~7rEGfsUfgbFB4M-sHrwpNUSI2OAt7oxK6YmZkjHW~0Y973Jsbyp3YihaBvGc5GVabw1SXoyeirbXFbIRla0EcJqJeFWPhy8lPx-JHAMDSlNQi1vAlPlirNtPJqWsLaD~EbHJHRe9vf~kuC27ruKTQPnSTEmG~U993tv3sInEjLLwQ1ZlNU4iye~6FxBe5a1DWTsHHnA8CQKT7iY5cuKjulmOIL7oTA39l7hZHFcd1FUc3um9f4dmjksm3EmTijx7UwURHZWeVtORTx5QilF21wvx76YA__&amp;Key-Pair-Id=APKAIE5G5CRDK6RD3PGA\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/cdn.jamanetwork.com\/ama\/content_public\/journal\/jamanetworkopen\/939364\/zoi240441t1_1715318989.23715.png?Expires=1719698627&amp;Signature=b1kGlVQoCSBCks4~kDX~Q4vETeYsLpVF0emkEMm9TVTSXq-PNECPE9XdVgbCc-RtyEbHEdCqROulfpvOqICs3RWxMeB72n2KTYdHFF8ybs7vsnUfEGGrA7PFi07qwC3Oj4L1YcDlnoyp8I81p5rASDLmTufNEXn1xeV4NhzHq-e7Q5VRBe1M~RoAEpljF3EJFp09J037BoT0IGam7Wwkd3ctOUFo~GAen41neIxn5B4LWwvMPlNUiKWqGfiraytKLqlx0TCfDWiOomxVYSMcgHYs6JW65AAGI6N4bbGs9CCfh-i3KdGXtDeFOyPNlKsJ3QBAFcr5QCPMsk7qyMb-3w__&amp;Key-Pair-Id=APKAIE5G5CRDK6RD3PGA\" alt=\"\"\/><\/figure>\n\n\n\n<p><strong>Conclusions<\/strong>&nbsp;&nbsp;In this survey study applying LLMs to ROB assessment, LLM 1 and LLM 2 demonstrated substantial accuracy and consistency in evaluating RCTs, suggesting their potential as supportive tools in systematic review processes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Original Investigation&nbsp; Statistics and Research Me 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[32,23],"tags":[],"_links":{"self":[{"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/posts\/26068"}],"collection":[{"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=26068"}],"version-history":[{"count":2,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/posts\/26068\/revisions"}],"predecessor-version":[{"id":26457,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/posts\/26068\/revisions\/26457"}],"wp:attachment":[{"href":"https:\/\/csccm.org.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=26068"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=26068"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=26068"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}