Abstract
This repeated-measures study investigated the immediate combined effect of AI auto-labeling and auto-grading on acquisition time and image quality for the right upper quadrant (RUQ) window of the Focused Assessment with Sonography for Trauma (FAST) exam. Fourteen novice and 10 experienced emergency medicine (EM) physician trainees recorded RUQ windows, with and without AI assistance, on three standardized patients. AI assistance significantly increased acquisition time for both novice (85 vs 53 seconds; p < 0.01) and experienced trainees (44 vs 28 seconds; p < 0.01). While experienced trainees achieved significantly better image quality than novices, AI assistance did not produce an immediate improvement in image quality within either group. These findings suggest that such AI features do not immediately aid physician trainees in acquiring RUQ FAST exam windows during isolated attempts.
Introduction
This section discusses the increasing integration of artificial intelligence (AI) into point-of-care ultrasound (POCUS) machines to aid both image acquisition and interpretation, addressing the operator-dependent nature of POCUS. While AI has shown benefits in cardiac POCUS for novice and experienced users alike, prior studies of its utility for the Focused Assessment with Sonography for Trauma (FAST) exam have produced inconsistent results, particularly for the right upper quadrant (RUQ) window. Existing knowledge gaps include the combined effect of AI auto-labeling and auto-grading, its impact on physician trainees of varying experience levels, and its immediate effect in isolated attempts. The objective of this repeated-measures study was to evaluate the immediate combined effect of AI auto-labeling and auto-grading on RUQ FAST exam window acquisition time and image quality for both novice and experienced physician trainees, with the RUQ window chosen for its importance in the FAST exam.
Materials and methods
This section details the study methodology. Participants included 14 novice (first-year EM residents and fourth-year medical students) and 10 experienced (second- and third-year EM residents) physician trainees from a single program; IRB exemption was obtained. The study used the Butterfly IQ+ probe with ScanLab software, whose AI features include auto-labeling of organs and auto-grading of image quality. Participants acquired RUQ windows on three standardized patients, both with and without AI assistance, in randomized order. Acquisition time was measured in seconds, and image quality was assessed by three independent reviewers, blinded to AI usage, against three criteria: visibility of essential structures (kidney, liver, Morison's pouch, diaphragm), correct imaging plane, and proper probe orientation. Statistical analysis used the Mann-Whitney U test for acquisition time and, for image quality, Pearson's chi-square test and McNemar's test for repeated measures, with a significance level of 0.05. A power analysis estimated a sample size of 60 to detect a 25-second difference in median acquisition time.
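The two tests at the core of this analysis can be sketched in Python with SciPy. This is a minimal illustration, not the authors' code: the acquisition-time lists and the discordant-pair counts below are invented placeholders, not study data, and McNemar's test is implemented here in its exact-binomial form (a binomial test on discordant pairs), which is one common formulation.

```python
# Illustrative sketch of the statistical tests described above.
# All numbers are fabricated stand-ins for demonstration only.
from scipy.stats import mannwhitneyu, binomtest

# Hypothetical acquisition times in seconds.
times_with_ai = [85, 91, 70, 95, 88, 102, 77, 80]
times_without_ai = [53, 59, 48, 61, 50, 55, 62, 45]

# Mann-Whitney U test comparing acquisition times between conditions.
_, p_time = mannwhitneyu(times_with_ai, times_without_ai,
                         alternative="two-sided")

# McNemar's exact test for a paired binary outcome (quality criterion
# met: yes/no, with vs without AI) reduces to a binomial test on the
# discordant pairs: b = met only with AI, c = met only without AI.
b, c = 5, 4  # hypothetical discordant-pair counts
p_quality = binomtest(b, b + c, 0.5).pvalue

print(f"acquisition time: p = {p_time:.4f}")
print(f"image quality:    p = {p_quality:.4f}")
```

With these placeholder inputs the time difference is significant (every "with AI" value exceeds every "without AI" value) while the quality difference is not, mirroring the pattern of results the study reports; a result is called significant when p falls below the study's 0.05 threshold.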
Results
The study recorded a total of 143 RUQ windows; one novice recording made without AI was excluded from the paired analysis. The median (IQR) acquisition time for novice trainees, 69 (72) seconds, was longer than that for experienced trainees, 34 (30) seconds, overall (p < 0.01). Notably, acquisition time was significantly longer when AI assistance was used, for both novice trainees (85 (91) seconds with AI vs 53 (59) seconds without; p < 0.01) and experienced trainees (44 (29) seconds with AI vs 28 (21) seconds without; p < 0.01). Regarding image quality, experienced trainees were significantly more likely than novices to meet all three RUQ quality criteria (p < 0.01 for essential structures, imaging plane, and probe placement). However, within either group, no significant differences in any image quality criterion were observed between recordings made with and without AI assistance.
Discussion
This section interprets the findings, noting that this is the first repeated-measures study to examine the immediate combined effect of AI auto-labeling and auto-grading on RUQ FAST exam window acquisition for both novice and experienced physician trainees. The significantly longer acquisition times with AI assistance in both groups are attributed to trainees spending more time attempting to achieve higher-quality windows, despite no resulting improvement in image quality. The authors hypothesize that novices lacked the probe manipulation skills needed to act on the AI feedback, while experienced trainees had minimal room for improvement. Prior studies produced inconsistent results, examined auto-grading or auto-labeling in isolation, and often enrolled non-physician trainees; this study's evaluation of the combined AI features in physician trainees suggests that auto-grading likely drives the longer acquisition times, while neither auto-labeling nor auto-grading definitively improves RUQ image quality. The repeated-measures design captures the immediate effect in isolated clinical scenarios, suggesting limited benefit of AI acquisition assistance for RUQ FAST exams in such real-world situations.
Limitations
The study acknowledges several limitations. The primary limitation is the small sample size, which increases the risk of a Type II error (failing to detect a true effect). Another is the qualitative image quality analysis, which used a binary yes/no system and may not have captured subtle differences that a quantitative scale (such as the ACEP five-point or 10-point scales used in other studies) could have. However, the system's ability to detect differences between novice and experienced users lends it internal validity. Lastly, the repeated-measures design introduces a potential learning bias: participants might improve their technique on subsequent examinations of the same patient. To mitigate this, the order of AI usage was randomized.
Conclusions
The study concludes that the immediate effect of combined AI auto-labeling and auto-grading on RUQ FAST exam window acquisition was an increase in acquisition time for both novice and experienced physician trainees. Crucially, AI acquisition assistance did not lead to an immediate improvement in image quality for either group. These findings suggest that current AI features for POCUS are unlikely to immediately help physician trainees in acquiring high-quality RUQ FAST exam windows during isolated clinical attempts. The authors recommend further research to evaluate AI acquisition assistance for other FAST exam windows and to validate these findings in a real clinical setting.