An improved ant algorithm with LDA-based representation for text document clustering

dc.contributor.author: Onan A.
dc.contributor.author: Bulut H.
dc.contributor.author: Korukoglu S.
dc.date.accessioned: 2024-07-22T08:10:52Z
dc.date.available: 2024-07-22T08:10:52Z
dc.date.issued: 2017
dc.description.abstract: Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents. © Chartered Institute of Library and Information Professionals.
dc.identifier.DOI-ID: 10.1177/0165551516638784
dc.identifier.issn: 01655515
dc.identifier.uri: http://akademikarsiv.cbu.edu.tr:4000/handle/123456789/15424
dc.language.iso: English
dc.publisher: SAGE Publications Ltd
dc.subject: Benchmarking
dc.subject: Cluster analysis
dc.subject: Data mining
dc.subject: Heuristic methods
dc.subject: Information retrieval
dc.subject: Information retrieval systems
dc.subject: Statistics
dc.subject: Text processing
dc.subject: Ant clustering algorithm
dc.subject: Classification algorithm
dc.subject: Latent Dirichlet allocation
dc.subject: Latent dirichlet allocations
dc.subject: Meta heuristic algorithm
dc.subject: Text Clustering
dc.subject: Text Document Clustering
dc.subject: Text mining
dc.subject: Clustering algorithms
dc.title: An improved ant algorithm with LDA-based representation for text document clustering
dc.type: Article
