Use of Large Language Models for Medical Synthetic Data Generation in Mental Illness

Date

2023

Abstract

Both the quantity and the quality of data are critical to the development of medical artificial intelligence research. Easier access to data has allowed recent studies in this field to produce strong results. However, factors such as the protection of patient rights and the confidentiality of personal data often prevent researchers from accessing medical data directly. Synthetic data generation is therefore often needed, both to expand training and test sets and to create sample cases for use in the relevant field. In this study, synthetic patient data are created for presentation to a language model that detects psychological disorders from patient text. Synthetic datasets were produced from 200 artificial patient records generated with two popular LLM chatbots, ChatGPT and Google Bard. The quality of the synthetic data was then measured with a pre-trained BERT model using these datasets. In the experiments, the data generated by ChatGPT and Google Bard achieved success rates of 89% and 86%, respectively, with the language representation model. The experimental results suggest that LLM-based approaches can provide more successful results than advanced language models in various medical text production tasks. © The Institution of Engineering & Technology 2023.
