Browsing by Subject "Classification algorithm"
Ensemble methods for opinion mining; [Görüş Madenciliğinde Sınıflandırıcı Toplulukları] (Institute of Electrical and Electronics Engineers Inc., 2015) Onan A.; Korukoglu S.
Opinion mining is an emerging field which uses computer science methods to extract subjective information, such as opinion, emotion, and attitude, from an opinion holder's text. One of the major issues in opinion mining is to enhance the predictive performance of classification algorithms. Ensemble methods used for opinion mining aim to obtain robust classification models by combining the decisions of multiple trained classifiers, rather than depending on a single classifier. In this study, the comparative performance of the Bagging, Dagging, Random Subspace and AdaBoost ensemble methods on opinion mining datasets, with five different classifiers and six different data representation schemes, is presented. The experimental results indicate that ensemble methods can be used for building efficient opinion mining classification methods. © 2015 IEEE.

Artificial immune system based Web page classification (Springer Verlag, 2015) Onan A.
Automated classification of web pages is an important research direction in web mining, which aims to construct a classification model that can classify new instances based on labeled web documents. Machine learning algorithms are adapted to textual classification problems, including web document classification. Artificial immune systems are a branch of computational intelligence inspired by biological immune systems and are utilized to solve a variety of computational problems, including classification. This paper examines the effectiveness and suitability of artificial immune system based approaches for web page classification.
Hence, two artificial immune system based classification algorithms, namely the Immunos-1 and Immunos-99 algorithms, are compared to two standard machine learning techniques, namely the C4.5 decision tree classifier and Naïve Bayes classification. The algorithms are experimentally evaluated on 50 data sets obtained from DMOZ (Open Directory Project). The experimental results indicate that artificial immune based systems achieve higher predictive performance for web page classification. © Springer International Publishing Switzerland 2015.

An improved ant algorithm with LDA-based representation for text document clustering (SAGE Publications Ltd, 2017) Onan A.; Bulut H.; Korukoglu S.
Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. In addition, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle these problems, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to that of conventional clustering algorithms on 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.
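The entry above scores clusterings by F-measure but does not say which variant is used. As a minimal sketch, assuming the common class-weighted definition (for each true class, take the best F-score over all clusters, then average weighted by class size), the computation could look like the following. The function name `clustering_f_measure` and the toy labels are illustrative and do not come from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted clustering F-measure (assumed variant, see lead-in).

    For class i and cluster j with overlap n_ij:
        precision = n_ij / |cluster j|, recall = n_ij / |class i|
    Each class contributes its best F-score over all clusters,
    weighted by the fraction of documents that belong to the class.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no shared documents, F-score is zero
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Toy usage: four documents, two topics.
topics = ["sports", "sports", "politics", "politics"]
print(clustering_f_measure(topics, [0, 0, 1, 1]))  # clusters match the topics
print(clustering_f_measure(topics, [0, 0, 0, 0]))  # everything in one cluster
```

A score of 1.0 means every class is recovered exactly by some cluster; merging all documents into one cluster keeps recall high but lowers precision, so the score drops.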
© Chartered Institute of Library and Information Professionals.

Hybrid supervised clustering based ensemble scheme for text classification (Emerald Group Publishing Ltd., 2017) Onan A.
Purpose: The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of the ensemble, which is a key issue in ensemble design. Design/methodology/approach: An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets and the predictions of individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks. Findings: The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.
Originality/value: The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification. © 2017, Emerald Publishing Limited.

A machine learning based approach to identify geo-location of Twitter users (Association for Computing Machinery, 2017) Onan A.
Twitter, a popular microblogging platform, has attracted great attention. Twitter enables people from all over the world to interact in an extremely personal way. An immense quantity of user-generated text messages is available on Twitter and could potentially serve as an important source of information for researchers and practitioners. The information available on Twitter may be utilized for many purposes, such as event detection, public health and crisis management. In order to effectively coordinate such activities, the identification of Twitter users' geo-locations is extremely important. Though online social networks can provide some geo-location information based on GPS coordinates, Twitter suffers from a geo-location sparseness problem. Identifying Twitter users' geo-locations from the content of the messages they send therefore becomes extremely important. In this regard, this paper presents a machine learning based approach to the problem. In this study, the corpus is represented as a word vector. To obtain a classification scheme with high predictive performance, the performance of five classification algorithms, three ensemble methods and two feature selection methods is evaluated. Among the compared algorithms, the highest result (84.85%) is achieved by the AdaBoost ensemble of Random Forest when the feature set is selected with the consistency-based feature selection method in conjunction with best-first search.
© 2017 ACM.

Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling (Hindawi Limited, 2018) Onan A.
Text mining is an important research direction, which involves several fields, such as information retrieval, information extraction, and text categorization. In this paper, we propose an efficient multiple classifier approach to text categorization based on swarm-optimized topic modelling. Latent Dirichlet allocation (LDA) can overcome the high dimensionality problem of the vector space model, but identifying appropriate parameter values is critical to the performance of LDA. The swarm-optimized approach estimates the parameters of LDA, including the number of topics and all the other parameters involved in LDA. The hybrid ensemble pruning approach, based on combined diversity measures and clustering, aims to obtain a multiple classifier system with high predictive performance and better diversity. In this scheme, four different diversity measures among the classifiers of the ensemble (namely, the disagreement measure, the Q-statistic, the correlation coefficient, and the double fault measure) are combined. Based on the combined diversity matrix, a swarm intelligence based clustering algorithm is employed to partition the classifiers into a number of disjoint groups, and one classifier (with the highest predictive performance) from each cluster is selected to build the final multiple classifier system. Experiments have been conducted on five biomedical text benchmarks. In the swarm-optimized LDA, different metaheuristic algorithms (such as genetic algorithms, particle swarm optimization, the firefly algorithm, the cuckoo search algorithm, and the bat algorithm) are considered. In the ensemble pruning, five metaheuristic clustering algorithms are evaluated. The experimental results on biomedical text benchmarks indicate that swarm-optimized LDA yields better predictive performance compared to conventional LDA.
In addition, the proposed multiple classifier system outperforms the conventional classification algorithms, ensemble learning, and ensemble pruning methods. © 2018 Aytuğ Onan.

Comparative analysis of ensemble learning methods for signal classification; [Sinyal sınıflandırması için topluluk öğrenmesi yöntemlerinin karşılaştırmalı analizi] (Institute of Electrical and Electronics Engineers Inc., 2018) Yildirim P.; Birant K.U.; Radevski V.; Kut A.; Birant D.
In recent years, machine learning algorithms have come to be widely used in signal classification, as in many other areas. Ensemble learning has become one of the most popular machine learning approaches due to the high classification performance it provides. In this study, the application of four fundamental ensemble learning methods (Bagging, Boosting, Stacking, and Voting) with five different classification algorithms (Neural Network, Support Vector Machines, k-Nearest Neighbor, Naive Bayes, and C4.5), each with optimal parameter values, on signal datasets is presented. In the experimental studies, ensemble learning methods were applied to 14 different signal datasets and the results were compared in terms of classification accuracy rates. According to the results, the best classification performance was obtained with the Random Forest algorithm, which is a Bagging based method. © 2018 IEEE.

Semantic query suggestion based on optimized random forests (Springer Verlag, 2019) Onan A.
Query suggestion is an integral part of Web search engines. Data-driven approaches to query suggestion aim to identify more relevant queries for users based on term frequencies and hence cannot fully reveal the underlying semantic intent of queries. Semantic query suggestion seeks to identify relevant queries by taking semantic concepts contained in user queries into account. In this paper, we propose a machine learning approach to semantic query suggestion based on Random Forests.
The presented scheme employs an optimized Random Forest algorithm based on multi-objective simulated annealing and weighted voting. In this scheme, multi-objective simulated annealing is utilized to tune the parameters of the Random Forest algorithm, i.e. the number of trees forming the ensemble and the number of features to split on at each node. In addition, weighted voting is utilized to combine the predictions of trees based on their predictive performance. The predictive performance of the proposed scheme is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and Random Forest) and ensemble learning methods (such as AdaBoost, Bagging and Random Subspace). The experimental results on semantic query suggestion demonstrate the superiority of the proposed scheme. © 2019, Springer International Publishing AG, part of Springer Nature.

Performance Analysis of EEG Signal Processing Based Device Control Applications (Institute of Electrical and Electronics Engineers Inc., 2019) Altundogan T.G.; Karakose M.
Nowadays, many types of devices are controlled by electroencephalography (EEG) signals. In the literature and in daily life, studies of EEG controlled devices are increasing day by day. EEG based control is applied to many devices, such as robot arms, robots, vehicles and unmanned aerial vehicles (UAVs). EEG based control procedures usually involve acquiring, pre-processing and classifying EEG signals, and applying the resulting command to the controlled device. In this study, a performance analysis was carried out by examining the control application studies using EEG signals in the literature. In this analysis, first, all studies related to the subject in the literature are examined, and the devices, methods, signal processing techniques and classification algorithms used in these studies are handled separately.
Selecting electrodes appropriate to the type of device being controlled, and to the type of interaction used to extract commands from the EEG signal, appears to be an important step. In this respect, performance correlations between the types of EEG devices used in the literature studies and the electrode choices used in these studies were compared. Since there are a variety of preprocessing steps for EEG signals, this study provides comparisons based on EEG signal preprocessing techniques. Artificial neural networks (ANN), support vector machines (SVM) and k-nearest neighbours (kNN) are used for classification in the studies in the literature. This study also includes comparisons based on the classification methods used in those studies. As a result, the studies in the literature on device control using EEG signals are examined, compared, interpreted and evaluated, and the points to be considered in future designs in this area are given. © 2018 IEEE.

EBOC: Ensemble-Based Ordinal Classification in Transportation (Hindawi Limited, 2019) Yildirim P.; Birant U.K.; Birant D.; Moghaddam M.H.Y.
Learning the latent patterns of historical data in an efficient way to model the behaviour of a system is a major need for making the right decisions. For this purpose, machine learning solutions have already begun to make a promising mark in transportation, as well as in many other areas such as marketing, finance, education, and health. However, many classification algorithms in the literature assume that the target attribute values in the datasets are unordered, so they lose the inherent order between the class values. To overcome this problem, this study proposes a novel ensemble-based ordinal classification (EBOC) approach which suggests bagging and boosting (AdaBoost algorithm) methods as a solution for the ordinal classification problem in the transportation sector.
This article also compares the proposed EBOC approach with the ordinal class classifier and traditional tree-based classification algorithms (i.e., C4.5 decision tree, RandomTree, and REPTree) in terms of accuracy. The results indicate that the proposed EBOC approach achieves better classification performance than the conventional solutions. © 2019 Pelin Yildirim et al.
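The EBOC entry above notes that many classifiers discard the order between class values, but the abstract does not describe how an ordinal classifier recovers it. One widely used decomposition, sketched here as an assumption rather than as the EBOC method itself, trains K-1 binary models that each estimate P(y > class k) and then turns those estimates into per-class probabilities. In the sketch below the binary estimates are passed in as plain numbers; in practice they might come from bagged or boosted trees. The names `ordinal_class_probabilities` and `predict_ordinal` and the example labels are hypothetical.

```python
def ordinal_class_probabilities(p_greater):
    """Convert threshold estimates into per-class probabilities.

    p_greater[k] is an estimate of P(y > class k) for k = 0 .. K-2,
    where classes are indexed in their natural order.
    """
    if not p_greater:
        raise ValueError("need at least one threshold estimate (two classes)")
    num_classes = len(p_greater) + 1

    probs = [1.0 - p_greater[0]]                     # lowest class
    for k in range(1, num_classes - 1):
        probs.append(p_greater[k - 1] - p_greater[k])  # middle classes
    probs.append(p_greater[-1])                      # highest class

    # Independently trained binary models can disagree slightly,
    # which may produce small negative values; clip them to zero.
    return [max(0.0, p) for p in probs]


def predict_ordinal(p_greater, labels):
    """Return the ordered label with the highest derived probability."""
    probs = ordinal_class_probabilities(p_greater)
    if len(labels) != len(probs):
        raise ValueError("labels must contain one entry per class")
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return labels[best_index]


# Toy usage with three ordered classes (labels are illustrative only).
levels = ["low", "medium", "high"]
print(predict_ordinal([0.9, 0.4], levels))
```

Because each binary model only answers "is the target above this level?", the ordering of the classes is built into the training targets, which is what an unordered multi-class learner lacks.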