{"id":27503,"date":"2025-01-01T04:04:00","date_gmt":"2024-12-31T20:04:00","guid":{"rendered":"http:\/\/csccm.org.cn\/?p=27503"},"modified":"2025-01-01T06:28:18","modified_gmt":"2024-12-31T22:28:18","slug":"2024%e5%b9%b4bmj%e5%9c%a3%e8%af%9e%e4%b8%93%e5%88%8a%ef%bc%9a%ef%bc%9a%e5%a4%a7%e5%9e%8b%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b%e5%af%b9%e8%ae%a4%e7%9f%a5%e9%9a%9c%e7%a2%8d%e7%9a%84%e6%98%93%e6%84%9f","status":"publish","type":"post","link":"https:\/\/csccm.org.cn\/?p=27503","title":{"rendered":"[2024 BMJ Christmas Issue]: Susceptibility of large language models to cognitive impairment"},"content":{"rendered":"\n<p><strong>Research<\/strong>&nbsp;Christmas 2024: Could We Start Again, Please?<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"page-title\">Age against the machine\u2014susceptibility of large language models to cognitive impairment: cross sectional analysis<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">Roy Dayan, Benjamin Uliel, Gal Koplewitz<\/h3>\n\n\n\n<h3 class=\"wp-block-heading\"><em>BMJ<\/em>\u00a02024;\u00a0387\u00a0doi:\u00a0<a href=\"https:\/\/doi.org\/10.1136\/bmj-2024-081948\">https:\/\/doi.org\/10.1136\/bmj-2024-081948<\/a>\u00a0(Published 20 December 2024). Cite this as:\u00a0<em>BMJ<\/em>\u00a02024;387:e081948<\/h3>\n\n\n\n<h2 class=\"wp-block-heading\">Abstract<\/h2>\n\n\n\n<p id=\"p-2\"><strong>Objective<\/strong>&nbsp;To evaluate the cognitive abilities of the leading large language models and identify their susceptibility to cognitive impairment, using the Montreal Cognitive Assessment (MoCA) and additional tests.<\/p>\n\n\n\n<p id=\"p-3\"><strong>Design<\/strong>&nbsp;Cross sectional analysis.<\/p>\n\n\n\n<p id=\"p-4\"><strong>Setting<\/strong>&nbsp;Online interaction with large language models via text based prompts.<\/p>\n\n\n\n<p id=\"p-5\"><strong>Participants<\/strong>&nbsp;Publicly available large language models, or \u201cchatbots\u201d: ChatGPT versions 4 and 4o (developed by OpenAI), 
Claude 3.5 \u201cSonnet\u201d (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet).<\/p>\n\n\n\n<p id=\"p-6\"><strong>Assessments<\/strong>&nbsp;The MoCA test (version 8.1) was administered to the leading large language models with instructions identical to those given to human patients. Scoring followed official guidelines and was evaluated by a practising neurologist. Additional assessments included the Navon figure, cookie theft picture, Poppelreuter figure, and Stroop test.<\/p>\n\n\n\n<p id=\"p-7\"><strong>Main outcome measures<\/strong>&nbsp;MoCA scores, performance in visuospatial\/executive tasks, and Stroop test results.<\/p>\n\n\n\n<p id=\"p-8\"><strong>Results<\/strong>\u00a0ChatGPT 4o achieved the highest score on the MoCA test (26\/30), followed by ChatGPT 4 and Claude (25\/30), with Gemini 1.0 scoring lowest (16\/30). All large language models showed poor performance in visuospatial\/executive tasks. Gemini models failed at the delayed recall task. Only ChatGPT 4o succeeded in the incongruent stage of the Stroop test.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.bmj.com\/content\/bmj\/387\/bmj-2024-081948\/F1.large.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.bmj.com\/content\/bmj\/387\/bmj-2024-081948\/F2.large.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.bmj.com\/content\/bmj\/387\/bmj-2024-081948\/F3.large.jpg\" alt=\"\"\/><\/figure>\n\n\n\n<p id=\"p-9\"><strong>Conclusions<\/strong>&nbsp;With the exception of ChatGPT 4o, almost all large language models subjected to the MoCA test showed signs of mild cognitive impairment. Moreover, as in humans, age is a key determinant of cognitive decline: \u201colder\u201d chatbots, like older patients, tend to perform worse on the MoCA test. 
These findings challenge the assumption that artificial intelligence will soon replace human doctors, as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnostics and undermine patients\u2019 confidence.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Research&nbsp;Christmas 2024: Could We Start Again, Ple [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[32,23],"tags":[]}