Quick Buzz Feed

Assessing Artificial Intelligence (AI) in Patient Education: Evaluating Accuracy and Readability of Responses on Surgical Procedures for Patellar Tendon Rupture

Gary Lloyd | May 24,26 | 01:38 EST

This study evaluates the emerging role of Artificial Intelligence (AI) chatbots in patient education, focusing on responses to questions about surgical procedures for patellar tendon rupture. It assesses the accuracy and readability of content generated by four prominent AI models: ChatGPT 3.5, ChatGPT 4, Gemini 1.0, and Perplexity. The research highlights patients' increasing reliance on online medical information and the medical community's responsibility to ensure the quality and clarity of AI-generated educational materials, both to combat misinformation and to support informed decision-making. Readability was consistent across platforms, though often above recommended reading levels, while the quality and reliability of the information varied significantly. Perplexity scored highest, a result attributed largely to its source citation practices.

Introduction

The introduction addresses the growing trend of patients using the internet, including AI-generated content, to seek medical information. It emphasizes the critical need for healthcare professionals to assess the accuracy, quality, and readability of these materials, especially for conditions like patellar tendon ruptures where clear information is vital for informed decision-making and adherence to treatment. This study's objective is to evaluate and compare the readability and information quality of responses from four specific AI chatbots—ChatGPT 3.5, ChatGPT 4, Gemini 1.0, and Perplexity—regarding patellar tendon repair.

Materials and methods

This section details the methodology used to assess the AI chatbots. Four AI models were prompted with 15 frequently asked patient questions about patellar tendon ruptures. Five orthopedic surgeons independently evaluated the quality of the blinded responses using the 16-item DISCERN instrument, which assesses the reliability of a source and the quality of its information on treatment options. Readability was objectively measured using three validated tools: Flesch-Kincaid Reading Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG) score, and Gunning Fog score. Statistical analysis included Kruskal-Wallis tests, along with the intraclass correlation coefficient (ICC) to evaluate consistency among raters.
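The three readability measures named above are simple functions of word, sentence, and syllable counts. The Python sketch below applies the standard published formulas. The function names and the vowel-run syllable counter are illustrative assumptions, not the tools the study used, and dedicated readability software counts syllables more carefully.

```python
import math
import re


def fkgl(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog(polysyllables, sentences):
    """SMOG grade; polysyllables are words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def gunning_fog(words, sentences, complex_words):
    """Gunning Fog index; complex words are approximated as 3+ syllable words."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def count_syllables(word):
    # Assumption: a crude heuristic that counts runs of vowels.
    # It miscounts silent vowels, so treat the result as approximate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text):
    """Collect the counts the three formulas need from plain text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "syllables": sum(syllables),
        "polysyllables": sum(1 for s in syllables if s >= 3),
    }


if __name__ == "__main__":
    # Hypothetical sample text, not taken from any chatbot response.
    sample = "The tendon is torn. Surgery repairs it."
    c = text_counts(sample)
    print(fkgl(c["words"], c["sentences"], c["syllables"]))
    print(smog(c["polysyllables"], c["sentences"]))
    print(gunning_fog(c["words"], c["sentences"], c["polysyllables"]))
```

Each score approximates a U.S. school grade, so a value above 8 on any of these scales indicates text above the eighth-grade level recommended for patient education materials.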

Results

The findings indicate that there were no statistically significant differences in readability among the four AI chatbots, though all produced responses that averaged above the eighth-grade reading level recommended for patient education materials. In terms of informational quality, Perplexity achieved the highest mean DISCERN score (64.2 ± 9.2), categorized as 'excellent,' and this score was significantly higher than that of ChatGPT 3.5 (49 ± 7.97, 'fair'). ChatGPT 4 (52 ± 6.28) and Gemini 1.0 (59.2 ± 7.43) both fell into the 'good' category. Notably, Perplexity consistently scored higher on source citation criteria, while the other models often lacked clear source attribution.
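The category labels attached to the DISCERN scores above follow from score bands. The 16-item instrument rates each item from 1 to 5, so totals range from 16 to 80. The sketch below assumes the commonly used cutoffs (63-80 excellent, 51-62 good, 39-50 fair, 27-38 poor, 16-26 very poor). The article does not state which cutoffs the authors applied, and the function name is hypothetical.

```python
# Assumed band lower bounds; the article does not list the study's cutoffs.
DISCERN_BANDS = [
    (63, "excellent"),
    (51, "good"),
    (39, "fair"),
    (27, "poor"),
    (16, "very poor"),
]


def discern_category(total):
    """Map a total (or mean total) DISCERN score to a quality label."""
    if not 16 <= total <= 80:
        raise ValueError("16-item DISCERN totals range from 16 to 80")
    for lower_bound, label in DISCERN_BANDS:
        if total >= lower_bound:
            return label


if __name__ == "__main__":
    # Mean scores as reported in the results above.
    reported_means = {
        "Perplexity": 64.2,
        "ChatGPT 3.5": 49,
        "ChatGPT 4": 52,
        "Gemini 1.0": 59.2,
    }
    for model, mean_score in reported_means.items():
        print(model, discern_category(mean_score))
```

Under these assumed bands, the reported means map to the same labels the results give: excellent for Perplexity, fair for ChatGPT 3.5, and good for ChatGPT 4 and Gemini 1.0.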

Discussion

The discussion interprets the study's findings within the broader context of AI in orthopedics and patient education. It reiterates that while AI chatbots are powerful tools, their current limitations include content that is often too complex for the average reader and a general lack of transparent source citations, an area where Perplexity stood out. The authors suggest that Perplexity's higher DISCERN scores were likely influenced by its citation-forward design. The section stresses that AI should serve as a supplementary tool, not a replacement for medical advice, and that it always requires physician oversight to ensure appropriate patient interpretation and to mitigate misinformation.

Conclusions

The study concludes that while the readability of AI-generated content on patellar tendon rupture is consistent across AI tools, though at a reading level generally above what is recommended for average readers, the quality and accuracy of the information can vary significantly. Perplexity outperformed ChatGPT 3.5 in providing reliable information, a difference largely attributed to its consistent sourcing. As AI continues its rapid integration into society and the medical field, these chatbots could substantially improve patient education, provided they are continuously evaluated, refined, and used judiciously under clinical supervision as adjuncts to traditional medical advice.