Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling

dc.contributor.authorOnan A.
dc.date.accessioned2024-07-22T08:10:09Z
dc.date.available2024-07-22T08:10:09Z
dc.date.issued2018
dc.description.abstractText mining is an important research direction that spans several fields, such as information retrieval, information extraction, and text categorization. In this paper, we propose an efficient multiple classifier approach to text categorization based on swarm-optimized topic modelling. Latent Dirichlet allocation (LDA) can overcome the high dimensionality problem of the vector space model, but identifying appropriate parameter values is critical to the performance of LDA. The swarm-optimized approach estimates the parameters of LDA, including the number of topics and all the other parameters involved in LDA. The hybrid ensemble pruning approach, based on combined diversity measures and clustering, aims to obtain a multiple classifier system with high predictive performance and better diversity. In this scheme, four different diversity measures (namely, the disagreement measure, Q-statistics, the correlation coefficient, and the double fault measure) among classifiers of the ensemble are combined. Based on the combined diversity matrix, a swarm intelligence based clustering algorithm is employed to partition the classifiers into a number of disjoint groups, and one classifier (with the highest predictive performance) from each cluster is selected to build the final multiple classifier system. Experiments have been conducted on five biomedical text benchmarks. In the swarm-optimized LDA, different metaheuristic algorithms (such as genetic algorithms, particle swarm optimization, the firefly algorithm, the cuckoo search algorithm, and the bat algorithm) are considered. In the ensemble pruning, five metaheuristic clustering algorithms are evaluated. The experimental results on biomedical text benchmarks indicate that swarm-optimized LDA yields better predictive performance compared to the conventional LDA. In addition, the proposed multiple classifier system outperforms the conventional classification algorithms, ensemble learning, and ensemble pruning methods. © 2018 Aytuǧ Onan.
dc.identifier.DOI-ID10.1155/2018/2497471
dc.identifier.issn1748670X
dc.identifier.urihttp://akademikarsiv.cbu.edu.tr:4000/handle/123456789/15102
dc.language.isoEnglish
dc.publisherHindawi Limited
dc.rightsAll Open Access; Gold Open Access; Green Open Access
dc.subjectAlgorithms
dc.subjectCluster Analysis
dc.subjectData Mining
dc.subjectInformation Storage and Retrieval
dc.subjectBenchmarking
dc.subjectGenetic algorithms
dc.subjectInformation retrieval
dc.subjectNatural language processing systems
dc.subjectParticle swarm optimization (PSO)
dc.subjectStatistics
dc.subjectText mining
dc.subjectVector spaces
dc.subjectClassification algorithm
dc.subjectCorrelation coefficient
dc.subjectCuckoo search algorithms
dc.subjectLatent dirichlet allocations
dc.subjectMeta heuristic algorithm
dc.subjectMultiple classifier approach
dc.subjectMultiple classifier systems
dc.subjectPredictive performance
dc.subjectarticle
dc.subjectclassification algorithm
dc.subjectclassifier
dc.subjectcorrelation coefficient
dc.subjectfirefly
dc.subjectgenetic algorithm
dc.subjectintelligence
dc.subjectlearning
dc.subjectnonhuman
dc.subjectstatistics
dc.subjectalgorithm
dc.subjectcluster analysis
dc.subjectdata mining
dc.subjectinformation retrieval
dc.subjectClustering algorithms
dc.titleBiomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling
dc.typeArticle
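
The four pairwise diversity measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions over two classifiers' per-sample correctness vectors and is not taken from the article itself. The function name `pairwise_diversity`, the dictionary keys, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration. The record does not state how the four measures are combined into one matrix, so that step is left out.

```python
from math import sqrt


def pairwise_diversity(correct_a, correct_b):
    """Return four pairwise diversity measures for two classifiers.

    correct_a and correct_b are equal-length sequences of truthy/falsy
    values; entry k says whether that classifier labelled sample k correctly.
    (Hypothetical helper; names are illustrative only.)
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Contingency counts: n11 both correct, n00 both wrong,
    # n10 only the first correct, n01 only the second correct.
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1

    n = len(correct_a)
    cross = n11 * n00 - n01 * n10
    q_den = n11 * n00 + n01 * n10
    rho_den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

    return {
        # Fraction of samples on which exactly one classifier is correct.
        "disagreement": (n01 + n10) / n,
        # Yule's Q; 0.0 on a zero denominator is an assumed convention.
        "q_statistic": cross / q_den if q_den else 0.0,
        # Correlation between the two correctness vectors (same convention).
        "correlation": cross / rho_den if rho_den else 0.0,
        # Fraction of samples that both classifiers get wrong.
        "double_fault": n00 / n,
    }


if __name__ == "__main__":
    a = [1, 1, 1, 0, 0, 1, 0, 1]
    b = [1, 0, 1, 1, 0, 1, 0, 0]
    print(pairwise_diversity(a, b))
```

Evaluating this function for every pair of ensemble members would give one matrix per measure, which is the kind of input a clustering-based pruning step could work from.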
