Browsing by Author "Yildirim, P"
Now showing 1 - 6 of 6
Item
Naive Bayes Classifier for Continuous Variables using Novel Method (NBC4D) and Distributions
Yildirim, P; Birant, D
In data mining, when using the Naive Bayes classification technique, it is necessary to overcome the problem of how to deal with continuous attributes. Most previous work has solved the problem by using discretization, the normal method, or the kernel method. This study proposes the usage of different continuous probability distribution techniques for Naive Bayes classification. It explores various probability density functions of distributions. The experimental results show that the proposed probability distributions also classify continuous data with potentially high accuracy. In addition, this paper introduces a novel method, named NBC4D, which offers a new approach for classification by applying different distribution types on different attributes. The results (obtained classification accuracy rates) show that our proposed method (the usage of more than one distribution type) has success on real-world datasets when compared with the usage of only one well-known distribution type.

Item
Development of an Interactive Game-Based Learning Environment to Teach Data Mining
Cengiz, M; Birant, KU; Yildirim, P; Birant, D
Game-based learning has become a popular topic in all levels of education. A number of computer games have been developed to teach different subjects such as mathematics, English language, medicine, and music. This paper presents the first study that proposes the development of edutainment games to teach data mining techniques within the scope of game-based learning. The aim of this study is to provide an environment that is both fun and enables the achievement of learning goals in data mining training in computer engineering.
An escape game called Mine4Escape, which consists of different rooms to teach different data mining techniques (classification and association rule mining), has been developed for individuals at the undergraduate and post-graduate levels. The advantages of the proposed approach are discussed in comparison with traditional data mining training. In addition, this paper describes a dynamic scoring system designed for game-based learning. Finally, an experimental study was carried out to evaluate the performance of our learning environment by analyzing feedback received from a test group consisting of 39 undergraduate and graduate students in computer engineering. The findings from the questionnaire show that it is possible to enhance knowledge acquisition about data mining via the game-based approach. However, the degree of learning interest and information acceptance changes according to students' age, gender, educational level, and game habits.

Item
EBOC: Ensemble-Based Ordinal Classification in Transportation
Yildirim, P; Birant, UK; Birant, D
Learning the latent patterns of historical data in an efficient way to model the behaviour of a system is a major need for making the right decisions. For this purpose, machine learning has already begun to make a promising mark in transportation as well as in many other areas such as marketing, finance, education, and health. However, many classification algorithms in the literature assume that the target attribute values in the datasets are unordered, so they lose the inherent order between the class values. To overcome this problem, this study proposes a novel ensemble-based ordinal classification (EBOC) approach, which suggests bagging and boosting (AdaBoost algorithm) methods as a solution for the ordinal classification problem in the transportation sector.
This article also compares the proposed EBOC approach with the ordinal class classifier and traditional tree-based classification algorithms (i.e., C4.5 decision tree, RandomTree, and REPTree) in terms of accuracy. The results indicate that the proposed EBOC approach achieves better classification performance than the conventional solutions.

Item
TTC-3600: A new benchmark dataset for Turkish text categorization
Kilinç, D; Özçift, A; Bozyigit, F; Yildirim, P; Yücalar, F; Borandag, E
Owing to the rapid growth of the World Wide Web, the number of documents that can be accessed via the Internet increases explosively with each passing day. Considering news portals in particular, documents related to categories such as technology, sports and politics sometimes appear in the wrong category, or documents are placed in a generic category called others. At this point, text categorization (TC), which is generally addressed as a supervised learning task, is needed. Although a substantial number of studies have been conducted on TC in other languages, the number of studies conducted in Turkish is very limited owing to the lack of accessibility and usability of the datasets created. In this paper, a new dataset named TTC-3600, which can be widely used in studies of TC of Turkish news and articles, is created. TTC-3600 is a well-documented dataset and its file formats are compatible with well-known text mining tools. Five widely used classifiers within the field of TC and two feature selection methods are evaluated on TTC-3600. The experimental results indicate that, across all comparisons performed after the pre-processing and feature selection steps, the best accuracy value of 91.03% is obtained with the combination of the Random Forest classifier and the attribute ranking-based feature selection method.
The publicly available TTC-3600 dataset and the experimental results of this study can be utilized in comparative experiments by other researchers.

Item
Investigation of BRAF mutation analysis with different technical platforms in metastatic melanoma
Sener, E; Yildirim, P; Tan, A; Gokoz, O; Tezel, GG
In metastatic melanoma, the detection of somatic mutations in the BRAF gene is crucial for patient selection for targeted therapy. Several screening methods have been developed to identify BRAF gene mutations. In this study, our objective was to evaluate the detection of the BRAF V600 mutations using two molecular methods, real-time polymerase chain reaction (real-time PCR) assay and pyrosequencing, and immunohistochemistry (IHC), and to compare the results of these different technical platforms. This study included 98 patients diagnosed with metastatic melanoma at Hacettepe University, Department of Pathology between 2002 and 2014. BRAF mutation analysis was tested with real-time PCR, pyrosequencing and IHC methods. The results of all three tests were compared with a reference test, and the sensitivity, specificity rates and kappa coefficient values were analysed for each test. We successfully analysed BRAF mutations using all three methods in 92 patients. According to our findings, the pyrosequencing method had the highest kappa value regarding the determination of BRAF V600 mutations. The kappa values were at almost perfect agreement levels in pyrosequencing and real-time PCR assay (kappa coefficient for pyrosequencing = 0.895 (95% CI: 0.795-0.995); kappa coefficient for real-time PCR = 0.871 (95% CI: 0.761-0.981)). The kappa value was at a substantial agreement level in the IHC analysis (kappa coefficient = 0.776 (95% CI: 0.629-0.923)). According to our results, we found that real-time PCR and pyrosequencing methods were equally excellent in determination of BRAF V600 mutations.
The IHC method, which is commonly used in routine pathology practice, can also be safely used as a screening test for determination of BRAF V600 mutations. © 2017 Elsevier GmbH. All rights reserved.

Item
Comparative Analysis of Ensemble Learning Methods for Signal Classification
Yildirim, P; Birant, KU; Radevski, V; Kut, A; Birant, D
In recent years, machine learning algorithms have come to be widely used in signal classification as well as in many other areas. Ensemble learning has become one of the most popular machine learning approaches due to the high classification performance it provides. In this study, the application of four fundamental ensemble learning methods (Bagging, Boosting, Stacking, and Voting) with five different classification algorithms (Neural Network, Support Vector Machines, k-Nearest Neighbor, Naive Bayes, and C4.5), using optimal parameter values, on signal datasets is presented. In the experimental studies, ensemble learning methods were applied on 14 different signal datasets and the results were compared in terms of classification accuracy rates. According to the results, the best classification performance was obtained with the Random Forest algorithm, which is a Bagging-based method.
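
Of the four ensemble methods named in the abstract above, Voting is the simplest to illustrate. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `majority_vote` and `accuracy`, the tie-breaking rule, and the toy labels are all hypothetical and chosen only to show how per-classifier predictions could be combined and then scored by classification accuracy.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by plurality vote.

    `predictions` is a list with one inner list per classifier; each inner
    list holds that classifier's predicted label for every sample.
    Ties are broken in favour of the earliest-listed classifier (an
    assumption made for this sketch).
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must predict every sample")

    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # First label, in classifier order, that reaches the top count.
        winner = next(lbl for lbl in sample_votes if counts[lbl] == top)
        combined.append(winner)
    return combined


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy example: three hypothetical classifiers, three samples.
votes = [
    ["a", "b", "a"],
    ["a", "a", "b"],
    ["b", "b", "b"],
]
ensemble = majority_vote(votes)
```

Here each inner list stands in for the output of one trained base classifier; in practice those predictions would come from models such as the ones listed in the abstract.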