Imbalanced data classifier by using ensemble fuzzy c-means clustering
Date
2012
Abstract
Pattern classifiers trained on an imbalanced data set tend to assign an object to the class with the largest number of samples, resulting in higher overall classifier accuracy but lower sensitivity. A new approach based on a dynamic under-sampling procedure is therefore proposed to improve the classification of imbalanced data sets, which are common in biomedicine. To overcome the class imbalance, the data set is resampled using an ensemble fuzzy c-means clustering method, and under-sampling is then applied to the majority class to balance the class sizes. Compared with existing classifiers, the proposed method yields not only higher classification accuracy and sensitivity but also more stable classification performance across different data sets, classifiers, and their parameters, indicating that it is independent of the particular clustering or classification method. © 2012 IEEE.
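The abstract does not spell out how the clustering result is turned into an under-sampled majority class, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that the majority class is clustered into as many fuzzy c-means clusters as there are minority samples, that the majority sample nearest to each cluster center is kept, and that the "ensemble" consists of repeated runs with different random seeds. The function names `fuzzy_c_means`, `undersample_majority`, and `ensemble_balanced_sets`, and the fuzzifier value `m=2.0`, are hypothetical choices made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center (floored to avoid 1/0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def undersample_majority(X, y, majority_label, seed=0):
    """Assumed scheme: keep the majority sample nearest to each FCM center."""
    is_major = (y == majority_label)
    X_maj, X_min, y_min = X[is_major], X[~is_major], y[~is_major]
    # One cluster per minority sample, so class sizes end up comparable.
    centers, _ = fuzzy_c_means(X_maj, n_clusters=len(X_min), seed=seed)
    d = np.linalg.norm(X_maj[:, None, :] - centers[None, :, :], axis=2)
    # Duplicates are dropped, so slightly fewer samples may be kept.
    picked = np.unique(d.argmin(axis=0))
    X_bal = np.vstack([X_maj[picked], X_min])
    y_bal = np.concatenate([np.full(len(picked), majority_label), y_min])
    return X_bal, y_bal

def ensemble_balanced_sets(X, y, majority_label, n_members=5):
    """Assumed ensemble: one balanced subset per random seed."""
    return [undersample_majority(X, y, majority_label, seed=s)
            for s in range(n_members)]
```

Under these assumptions, each balanced subset could be used to train one base classifier, with the ensemble prediction taken by majority vote. The classifier itself is left open here because the abstract reports that the method was evaluated with several different classifiers.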
Keywords
Biomedical equipment, Biosensors, Fuzzy systems, Class imbalance, Classification accuracy, Classification methods, Classification performance, Data sets, Fuzzy C means clustering, Fuzzy c-means clustering method, Imbalanced data, Imbalanced Data-sets, Number of samples, Pattern classifier, Under-sampling, Classification (of information)