GPT-4 also surpassed junior doctors and trainee ophthalmologists
AI model and medical professionals tested on eye assessment
The study involved administering a test of 87 multiple-choice questions drawn from a textbook used to train ophthalmologists. The test was given to both the large language models (LLMs) and a group of medical professionals: five expert ophthalmologists, three trainee ophthalmologists, and two junior doctors from non-specialized fields. Notably, the LLMs were not believed to have been trained on these specific questions before the test.
Test results: GPT-4 surpasses trainees and junior doctors in test
ChatGPT, powered by either GPT-4 or GPT-3.5, was given three attempts to answer each question definitively; otherwise, its response was marked as null. The results were striking. GPT-4 correctly answered 60 of the 87 questions, well above the junior doctors’ average of 37 correct answers and marginally ahead of the trainees’ average of 59.7. Interestingly, one expert ophthalmologist answered only 56 questions correctly.
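The scoring rule described here (up to three attempts per question, with non-committal responses marked as null) can be made concrete with a short sketch. This is a hypothetical reconstruction, not the study’s actual code: the ask_model callable and the question dictionary layout are illustrative assumptions.

```python
# Hypothetical sketch of the three-attempt scoring rule; not the study's code.
# `ask_model` and the question dictionary layout are assumptions.

def score_model(ask_model, questions, max_attempts=3):
    """Count correct answers. Each question allows up to `max_attempts`
    tries to elicit a definitive single-option answer; if the model never
    commits to one option, the response stays null and scores as wrong."""
    correct = 0
    for q in questions:
        answer = None  # null until the model commits to an option
        for _ in range(max_attempts):
            reply = ask_model(q["prompt"])
            if reply in q["options"]:  # definitive answer, e.g. "A", "B", "C", "D"
                answer = reply
                break
        if answer == q["correct_option"]:
            correct += 1
    return correct
```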
Comparative performance: How expert ophthalmologists and other LLMs performed
Despite GPT-4’s impressive performance, the five expert ophthalmologists averaged 66.4 correct answers, slightly outdoing GPT-4. Among the other LLMs, Google’s PaLM 2 scored 49, while GPT-3.5 scored 42. Meta’s LLaMA trailed with the lowest score, 28, falling below even the junior doctors’ average. The trials were conducted in mid-2023.
