Ensemble Learning Based Feature Selection with an Application to Text Classification
Abstract
High dimensionality is an important problem in text classification, and the performance of a given feature selection method can vary with the characteristics of the dataset. In this study, a feature selection method is developed that integrates several filter-based feature selection methods through an ensemble learning approach. In the presented method, the feature rankings obtained by five filter-based feature selection methods (mutual information, chi-square statistic, odds ratio, information gain and weighted log-likelihood ratio) are aggregated by enhanced Borda count rank aggregation. In the experimental analysis, the Reuters-21578 and 20 Newsgroups datasets are evaluated with support vector machine and C4.5 classifiers. The experimental results indicate that the presented method outperforms conventional filter-based feature selection schemes.
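To illustrate the aggregation step, here is a minimal sketch of the classic Borda count applied to feature rankings. It assumes that each filter method (for example mutual information or chi-square) has already produced a best-first list of features. The abstract does not say what makes the paper's Borda count "enhanced", so this sketch uses only the standard scheme and should not be read as the authors' method. The function names `borda_aggregate` and `top_k` are hypothetical, as are the feature names in the example.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate several feature rankings with the classic (unenhanced) Borda count.

    Each ranking is a list of feature names ordered best-first. In a ranking of
    length n, the feature at position i (0-based) receives n - i points. A feature
    that is missing from a ranking receives 0 points from that ranking.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position
    # Sort by total points (highest first). Ties are broken alphabetically
    # so that the aggregated ranking is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))


def top_k(rankings, k):
    """Return the k highest-ranked features after Borda aggregation."""
    return borda_aggregate(rankings)[:k]


if __name__ == "__main__":
    # Hypothetical best-first rankings from three filter methods.
    rankings = {
        "MI": ["goal", "match", "vote", "market", "team"],
        "CHI2": ["match", "goal", "team", "vote", "market"],
        "IG": ["goal", "team", "match", "market", "vote"],
    }
    print(borda_aggregate(list(rankings.values())))
    # → ['goal', 'match', 'team', 'vote', 'market']
    print(top_k(list(rankings.values()), 3))
    # → ['goal', 'match', 'team']
```

In this example, "goal" collects 5 + 4 + 5 = 14 points and "match" collects 4 + 5 + 3 = 12, so both rank ahead of "team" (8), "vote" (6) and "market" (5). The selected subset would then serve as the input representation for a classifier such as an SVM or C4.5.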