Imbalanced data classifier by using ensemble fuzzy c-means clustering

Date

2012

Abstract

Pattern classifiers trained on an imbalanced data set tend to assign an object to the class with the largest number of samples, resulting in higher overall accuracy but lower sensitivity. A new approach based on a dynamic under-sampling procedure is therefore proposed to improve the classification of imbalanced data sets, which are common in biomedicine. To overcome class imbalance, the data set is resampled using an ensemble fuzzy c-means clustering method. The under-sampling procedure is then applied to the majority class to balance the class sizes. Compared with existing classifiers, the proposed method yields not only higher classification accuracy and sensitivity but also more stable classification performance across different data sets, classifiers, and classifier parameters, indicating that it does not depend on a particular clustering or classification method. © 2012 IEEE.
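
The abstract does not give the details of the resampling step, so the following is only a minimal sketch of how clustering-guided under-sampling of the majority class might look, not the authors' procedure. It assumes a binary problem, a plain NumPy fuzzy c-means, and a hypothetical selection rule: the majority samples are clustered several times with different random initialisations (the "ensemble"), and the samples with the highest average peak membership are kept until the majority class matches the minority class in size. The function names `fuzzy_c_means` and `undersample_majority` and the parameters `n_clusters`, `n_runs`, and `m` are illustrative choices.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # u_ik is proportional to d_ik^(-2/(m-1)), normalised per sample.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def undersample_majority(X, y, n_clusters=3, n_runs=5, m=2.0):
    """Assumed rule: keep the majority samples with the highest average
    peak membership over several fuzzy c-means runs (binary labels)."""
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[np.argmax(counts)]
    n_keep = counts.min()  # target size: the minority class size
    maj_idx = np.flatnonzero(y == majority)
    score = np.zeros(maj_idx.size)
    for run in range(n_runs):
        _, U = fuzzy_c_means(X[maj_idx], n_clusters, m=m, seed=run)
        # Peak membership does not depend on how clusters are labelled,
        # so it can be accumulated across runs directly.
        score += U.max(axis=1)
    keep_maj = maj_idx[np.argsort(score)[::-1][:n_keep]]
    keep = np.sort(np.concatenate([np.flatnonzero(y != majority), keep_maj]))
    return X[keep], y[keep]
```

Calling `undersample_majority(X, y)` on a feature matrix `X` and a binary label vector `y` returns a subset in which both classes have the same number of samples; any classifier can then be trained on that subset. Other selection rules, such as sampling from each cluster in proportion to its size, would fit the same outline.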
