Development of majority vote ensemble feature selection algorithm augmented with rank allocation to enhance Turkish text categorization
Date
2021
Abstract
The growing number of digital text documents from sources such as customer reviews,
news, and social media has made text categorization crucial for managing the enormous amount of
data. The high-dimensional nature of these texts requires a preliminary feature selection task to reduce the feature
space with a potential increase in the prediction accuracy. In this study, an ensemble feature selection
method, namely majority vote rank allocation, was developed for Turkish text categorization purposes. The method
uses a majority voting ensemble strategy in combination with a rank allocation approach to combine weak filters such
as information gain, symmetric uncertainty, relief, and correlation-based feature selection. Thus, the proposed method
measures the quality of each feature among all features using the majority votes of the filters and rank allocation. The
feature selection efficacy of the method was tested on two datasets, one from the literature and a newly collected dataset.
The effect of the obtained features on the classification prediction performance was evaluated with the naive Bayes,
support vector machine, J48, and random forest algorithms. It was empirically observed that the developed method
improved the prediction accuracies of the classifiers compared to the mentioned filters. The statistical significance of the
experimental results was also validated with a two-way analysis of variance test.