Turkish Medical Text Classification Using BERT
Abstract
Medical text classification is mostly carried out on English data sets. Studies on Turkish are few, both because the complex morphological structure of Turkish is challenging for natural language processing and because few data sets exist in the medical domain. In addition, domain-specific words and abbreviations make natural language processing more challenging. In this study, a classification model is implemented to assign article abstracts to the appropriate disease categories using the multilingual BERT and BERTurk models on a data set of Turkish medical article abstracts. In the experiments, multilingual BERT and BERTurk obtain F-scores of 0.82 and 0.93, respectively. The results show that BERTurk is more successful than the other compared models for Turkish medical text classification.
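For readers unfamiliar with the metric, the following is a minimal sketch of how an F-score can be computed for a multi-class task such as assigning abstracts to disease categories. The abstract does not state which averaging scheme was used, so the macro average shown here is an assumption, and the function name `f_scores`, the category labels, and the predictions are all hypothetical illustrations rather than the study's data or code.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (per-class F1 dict, macro-averaged F1) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted wrongly
            fn[t] += 1      # class t was missed
    per_class = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0
    # Macro average (assumed here): unweighted mean over classes
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Hypothetical disease categories and predictions, for illustration only.
y_true = ["cardiology", "cardiology", "oncology", "oncology", "neurology", "neurology"]
y_pred = ["cardiology", "oncology", "oncology", "oncology", "neurology", "cardiology"]

per_class, macro = f_scores(y_true, y_pred)
for label, score in per_class.items():
    print(f"{label}: {score:.2f}")
print(f"macro F1: {macro:.2f}")
```

A macro average weights every category equally regardless of how many abstracts it contains; a micro or weighted average would give different values on an imbalanced data set.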