Browsing by Subject "Cluster analysis"
An improved ant algorithm with LDA-based representation for text document clustering (SAGE Publications Ltd, 2017)
Onan A.; Bulut H.; Korukoglu S.
Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from high dimensionality and irrelevant text features. Moreover, conventional clustering algorithms have several shortcomings, such as slow convergence and sensitivity to initial values. To tackle these problems, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, in which two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared with that of conventional clustering algorithms on 25 text benchmarks in terms of F-measure. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents. © Chartered Institute of Library and Information Professionals.

A K-medoids based clustering scheme with an application to document clustering (Institute of Electrical and Electronics Engineers Inc., 2017)
Onan A.
Clustering is an important unsupervised data analysis technique that divides data objects into clusters based on similarity. Clustering has been studied and applied in many different fields, including pattern recognition, data mining, decision science and statistics.
Clustering algorithms can be broadly classified into hierarchical and partitional approaches. Partitioning around medoids (PAM) is a partitional clustering algorithm that is less sensitive to outliers but is strongly affected by poor initialization of the medoids. In this paper, we augment PAM with a randomized seeding technique to overcome the problem of poor medoid initialization. The proposed approach (PAM++) is compared with other partitional clustering algorithms, such as K-means and K-means++, on text document clustering benchmarks and evaluated in terms of F-measure. The experimental results indicate that randomized seeding can improve the performance of the PAM algorithm on text document clustering. © 2017 IEEE.

A hybrid ensemble pruning approach based on consensus clustering and multi-objective evolutionary algorithm for sentiment classification (Elsevier Ltd, 2017)
Onan A.; Korukoğlu S.; Bulut H.
Sentiment analysis is a critical task of extracting subjective information from online text documents. Ensemble learning can be employed to obtain more robust classification schemes. However, most approaches in the field have relied on feature engineering to build efficient sentiment classifiers. The purpose of our research is to establish an effective sentiment classification scheme by pursuing the paradigm of ensemble pruning. Ensemble pruning is a crucial method for building classifier ensembles with high predictive accuracy and efficiency. Previous studies have employed exponential search, randomized search, sequential search, ranking-based pruning and clustering-based pruning. However, each of these pruning methods involves trade-offs, so hybrid ensemble pruning schemes may be more promising. In this study, we propose a hybrid ensemble pruning scheme based on clustering and randomized search for text sentiment classification.
Furthermore, a consensus clustering scheme is presented to deal with the instability of clustering results. The classifiers of the ensemble are first clustered into groups according to their predictive characteristics. Then, two classifiers from each cluster are selected as candidate classifiers based on their pairwise diversity. The search space of candidate classifiers is explored by an elitist Pareto-based multi-objective evolutionary algorithm. For evaluation, the proposed scheme is tested on twelve balanced and unbalanced benchmark text classification tasks. In addition, the proposed approach is experimentally compared with three ensemble methods (AdaBoost, Bagging and Random Subspace) and three ensemble pruning algorithms (ensemble selection from libraries of models, Bagging ensemble selection and the LibD3C algorithm). The results demonstrate that consensus clustering and the elitist Pareto-based multi-objective evolutionary algorithm can be used effectively in ensemble pruning. The experimental comparison with conventional ensemble methods and pruning algorithms indicates the validity and effectiveness of the proposed scheme. © 2017 Elsevier Ltd

Large Amplitude Oscillatory Shear (LAOS) analysis of gluten-free cake batters: The effect of dietary fiber enrichment (Elsevier Ltd, 2020)
Ozyigit E.; Eren İ.; Kumcuoglu S.; Tavman S.
The aim of this study is to investigate the effect of dietary fiber enrichment on the non-linear viscoelastic behavior of gluten-free cake batters by using Large Amplitude Oscillatory Shear (LAOS) analysis. The textural properties and specific volume of the gluten-free cakes were also determined to investigate possible correlations with LAOS parameters. Gluten-free cake batters were formulated by replacing buckwheat flour with two different dietary fiber sources, namely orange fiber (OF) and orange pomace powder (OPP), at five different levels (0%, 4%, 8%, 12% and 16%).
All gluten-free cake batter samples exhibited linear viscoelastic behavior at small strain amplitudes, but their rheological properties entered the non-linear region as strain amplitude increased. The Lissajous-Bowditch curves revealed that stored energy increased with increasing dietary fiber content and that the gluten-free cake batters became more elastic in the non-linear region. The normalized elastic Chebyshev coefficients (e3/e1) indicated that the strain-hardening behavior of the cake batters at small strain amplitudes shifted to strain softening in the non-linear region. The e3/e1 and v3/v1 ratios as a function of strain amplitude indicated that non-linearity is more pronounced in the elastic component than in the viscous component. The S and T values calculated at 50% strain amplitude showed strong correlations with water retention capacity, hardness and specific volume of the gluten-free cakes enriched with OF, whereas weaker correlations were obtained for the OPP-containing ones owing to the elastic instability of these batters. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were performed to provide a basic graphical comparison of the differences and similarities between the non-linear rheological properties of the gluten-free cake batters in terms of their LAOS parameters. The viscoelastic properties of the cake batters containing 16% OPP were found to be suitable for high gas retention capacity, since these batters yielded the highest cake specific volume. © 2019 Elsevier Ltd

Weighted word embeddings and clustering-based identification of question topics in MOOC discussion forum posts (John Wiley and Sons Inc, 2021)
Onan A.; Toçoğlu M.A.
Massive open online courses (MOOCs) are recent and widely studied distance learning approaches aimed at providing learning material to learners from geographically dispersed locations without age, gender, or race-related constraints. MOOCs are generally enriched with discussion forums that provide interactions among students, professors, and teaching assistants.
MOOC discussion forum posts provide feedback on the students' learning processes, social interactions, and concerns. The purpose of our research is to present a document-clustering model for MOOC discussion forum posts that combines weighted word embeddings with clustering to identify the question topics of the posts. In this study, four word-embedding schemes (namely, word2vec, fastText, global vectors, and Doc2vec), four weighting functions (i.e., term frequency-inverse document frequency [TF-IDF], IDF, smoothed IDF, and subsampling function), and four clustering algorithms (i.e., K-means, K-means++, self-organizing maps, and divisive analysis clustering algorithm) have been evaluated for document clustering and topic modeling on MOOC discussion forum posts. Twenty different feature representations have been obtained from the word-embedding schemes and weighting functions, and these representations have been evaluated in conjunction with the four clustering methods. For comparison, the empirical results of latent Dirichlet allocation (LDA) have also been included. The empirical results in terms of adjusted Rand index, normalized mutual information, and adjusted mutual information indicate that weighted word-embedding schemes combined with clustering algorithms outperform the conventional schemes. © 2020 Wiley Periodicals, Inc.
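The abstract above does not spell out how a weighted word embedding is formed. As a rough illustration only, the sketch below builds a document vector as a weighted average of word vectors, using the textbook IDF, ln(N / df), and a scikit-learn-style smoothed IDF, ln((1 + N) / (1 + df)) + 1. These formulas, the function names (`idf_weights`, `doc_vector`, `cosine`) and the toy two-dimensional "embeddings" are assumptions for illustration, not the authors' implementation; the subsampling weight is omitted.

```python
import math
from collections import Counter


def idf_weights(docs, smooth=False):
    """Return an IDF weight for every term in the tokenized corpus `docs`.

    Plain IDF:    ln(N / df)
    Smoothed IDF: ln((1 + N) / (1 + df)) + 1   (scikit-learn-style; an assumption)
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    if smooth:
        return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    return {t: math.log(n_docs / d) for t, d in df.items()}


def doc_vector(doc, embeddings, idf, scheme="tfidf"):
    """Weighted average of the word vectors in `doc`.

    scheme="tfidf": weight = term count * IDF; scheme="idf": weight = IDF only.
    Words without an embedding are skipped; a zero vector is returned when
    all weights are zero.
    """
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    total = 0.0
    counts = Counter(t for t in doc if t in embeddings)
    for term, count in counts.items():
        weight = idf.get(term, 0.0) * (count if scheme == "tfidf" else 1)
        for i, x in enumerate(embeddings[term]):
            acc[i] += weight * x
        total += weight
    return [x / total for x in acc] if total > 0 else acc


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


# Hypothetical toy data: 2-d "embeddings" and three tokenized forum posts.
embeddings = {
    "exam": [1.0, 0.0],
    "grade": [0.8, 0.2],
    "video": [0.0, 1.0],
    "lecture": [0.1, 0.9],
    "the": [0.5, 0.5],
}
docs = [
    ["the", "exam", "grade"],
    ["the", "video", "lecture"],
    ["the", "exam", "exam"],
]

if __name__ == "__main__":
    idf = idf_weights(docs)  # "the" occurs in every post, so its plain IDF is 0
    vectors = [doc_vector(d, embeddings, idf) for d in docs]
    for d, v in zip(docs, vectors):
        print(d, [round(x, 3) for x in v])
    print("sim(post 0, post 2):", round(cosine(vectors[0], vectors[2]), 3))
    print("sim(post 0, post 1):", round(cosine(vectors[0], vectors[1]), 3))
```

With plain IDF, a word that occurs in every post gets zero weight, so the exam-related posts end up close to each other and far from the lecture-video post. Vectors built this way could then be passed to any of the clustering algorithms named in the abstract, for example K-means.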