ChatGPT-4o's performance on pediatric Vesicoureteral reflux

Onder, ENA; Ensari, E; Ertan, P

Abstract

Introduction: Vesicoureteral reflux (VUR) is a common congenital or acquired urinary disorder in children. Chat Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence-driven platform offering medical information. This research aims to assess the reliability and readability of ChatGPT-4o's answers regarding pediatric VUR for a general, non-medical audience.

Materials and methods: Twenty of the most frequently asked English-language questions about VUR in children were used to evaluate ChatGPT-4o's responses. Two independent reviewers rated reliability and quality using a modified version of the DISCERN tool (mDISCERN) and the Global Quality Scale (GQS). The readability of the responses was assessed with the Flesch Reading Ease (FRE) score, Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and Simple Measure of Gobbledygook (SMOG).

Results: Median mDISCERN and GQS scores were 4 (4-5) and 5 (3-5), respectively. By mDISCERN score, 55% of the responses had moderate reliability and 45% had good reliability; by GQS, 95% were of high quality. The mean ± standard deviation scores for FRE, FKGL, SMOG, GFI, and CLI were 26 ± 12, 15 ± 2.5, 16.3 ± 2, 18.8 ± 2.9, and 15.3 ± 2.2, respectively, indicating a high level of reading difficulty.

Discussion: While ChatGPT-4o offers accurate and high-quality information about pediatric VUR, its readability poses challenges, as the content is difficult for a general audience to understand.

Conclusion: ChatGPT provides high-quality, accessible information about VUR. However, improving readability should be a priority to make this information more user-friendly for a broader audience.
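To make two of the readability measures concrete, the following is a minimal Python sketch of the standard published formulas for the Flesch Reading Ease score and the Flesch-Kincaid Grade Level. It is not the tool the authors used, which the abstract does not name. The helper names (`count_syllables`, `text_counts`) and the vowel-group syllable heuristic are assumptions made for illustration; dedicated readability software counts syllables and sentences more carefully, so scores from this sketch may differ from reported values.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count groups of consecutive vowels."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' as non-syllabic, except in '-le' endings.
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) using simple regex splitting.

    Sentence splitting on '.', '!' and '?' is naive and will miscount
    text containing decimals or abbreviations.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Higher scores mean easier text; scores below 30 are very difficult."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Approximate US school grade level needed to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "Vesicoureteral reflux is the backward flow of urine. It is common in children."
    counts = text_counts(sample)
    print("FRE:", round(flesch_reading_ease(*counts), 1))
    print("FKGL:", round(flesch_kincaid_grade(*counts), 1))
```

Both formulas depend only on average sentence length and average syllables per word, which is why long sentences and polysyllabic medical terms push the FRE down and the FKGL up, consistent with the reading difficulty reported in the results.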