Browsing by Author "Bulut, H"
Item An improved ant algorithm with LDA-based representation for text document clustering
Onan, A; Bulut, H; Korukoglu, S
Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.

Item Communication of patient-physician in patients with chronic obstructive pulmonary disease with acute exacerbation
Bulut, H; Ozan, E; Özmen, E; Çimen, P
Purpose: The aim of this study was to investigate the patient-physician communication of patients with chronic obstructive pulmonary disease (COPD) who were hospitalized due to acute exacerbation. Materials and Methods: The study was carried out in the department of pulmonology in a Training and Research Hospital in Izmir with COPD patients who were hospitalized due to acute exacerbation.
A total of 400 patients aged 18-65 who were literate and able to communicate were selected for the study by simple random sampling. Research data were collected with the Patient Identification Form and the Satisfaction Scale of Communication of Physicians. Results: The total score of the Physicians' Communication Form Satisfaction Scale was 100.10 +/- 17.79. The mean scores of the sub-dimensions of Body Language, Speech-Listening, Caring and Giving information were 11.03 +/- 2.83, 39.03 +/- 7.20, 42.06 +/- 9.40 and 7.98 +/- 3.05, respectively. A significant relationship was found between the communication characteristics of the patients, such as knowing the name of the physician, asking the physician questions and answering the questions during the daily interview duration about the disease and treatment. Conclusion: In general, patients reported satisfaction with the total average score of satisfaction from the communication style of physicians.

Item Turkish Medical Text Classification Using BERT
Çelikten, A; Bulut, H
Medical text classification is mostly carried out on English data sets. The limited number of studies in Turkish is due to the challenging morphological structure of Turkish for natural language processing and the limited number of data sets in the medical domain. In addition, the use of domain-specific words and abbreviations makes natural language processing studies more challenging. In this study, a classification model is implemented to assign article abstracts to appropriate disease categories using multilingual BERT and BERTurk models on a data set consisting of Turkish medical article abstracts. As a result of the experimental study, F-scores of 0.82 and 0.93 are obtained for multilingual BERT and BERTurk, respectively.
The results show that BERTurk is more successful than the other compared models for Turkish medical text classification.

Item Fireworks: an intelligent location discovery algorithm for vehicular ad hoc networks
Basaran, I; Bulut, H
Searching for and locating a certain destination in a vehicular ad-hoc network (VANET) are fundamental issues to ensure routing and data dissemination under high mobility and lack of fixed infrastructure. However, naive-flooding searching is too expensive and takes a considerable amount of valuable bandwidth in the network. To overcome this, GPS information of the vehicles can be exploited, which can aid searching and routing in VANETs. In this paper, we present a novel position-based searching algorithm, called Fireworks, that can be used as a location discovery algorithm in VANETs. The proposed scheme is purely reactive and has a limited usage of beacons. The Fireworks algorithm provides the position of the destination vehicle without having a Location Information System infrastructure or a proactive mechanism. We show that the method is efficient and reliable while greatly reducing the searching overhead. The simulations show that the algorithm covers as many nodes as naive-flooding with less than one-fifth of the broadcast messages and with less than one-third of those used by Dynamic Source Routing (DSR). It also performs better than Acknowledgement-Based Broadcast Protocol (ABSM) in terms of total number of broadcast messages, node coverage speed and query success rate.

Item Preventing translation quality deterioration caused by beam search decoding in neural machine translation using statistical machine translation
Satir, E; Bulut, H
Decoding is an important part of machine translation systems, and the most popular inference algorithm used here is beam search. The beam search algorithm improves translation by allowing a larger search space to be traversed than greedy search.
However, as the beam width increases, the translation performance declines after a certain point in neural machine translation (NMT). This problem is usually not observed in statistical machine translation (SMT) due to the decoding method. This paper proposes a hybrid-system-based method that uses SMT predictions to prevent quality deterioration in the beam search algorithm used in NMT decoding. Our approach is based on reranking the n-best list of NMT according to the SMT system translation sentence. We propose two different algorithms for reranking NMT n-best lists. The first algorithm uses the length information of the SMT outputs. In contrast, the second uses a word-based similarity approach with the Jaccard index, Dice's coefficient, and the overlap coefficient. Experiments on three different language pairs show that the method we propose prevents the decrease in translation quality and produces a gain of 1.3 BLEU and 1.6 METEOR for different beam sizes and 1.8 BLEU and 2.1 METEOR average scores compared to the baseline results. (c) 2021 Published by Elsevier Inc.

Item A short-term photovoltaic output power forecasting based on ensemble algorithms using hyperparameter optimization
Basaran, K; Çelikten, A; Bulut, H
The stochastic and intermittent nature of solar energy presents the power grid with the challenge of providing a stable, secure, and economical power supply, especially in the case of large-scale penetration. The prerequisite for addressing these challenges is accurate power output estimation from PV systems. In addition, accurate power estimation also ensures the correct sizing of PV systems for investors. In this study, the PV output prediction model has been developed based on ensemble algorithms using two years of real power and meteorological data from grid-connected PV systems. Grid search, random search, and Bayesian optimization were used to determine the optimal hyperparameters for ensemble algorithms.
The originality of this study lies in (i) the use of hyperparameter optimization for ensemble algorithms in predicting PV performance, (ii) the degradation rate of PV panels by ensemble algorithms using the first two years' data, and (iii) the performance comparison of ensemble algorithms using the hyperparameter optimization technique. The accuracy and precision of the prediction model are determined by the relative root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), mean scaled error (MSE), coefficient of determination (R2), mean absolute percentage error (MAPE), and maximum absolute error (MaxAE). To the best of our knowledge, this is one of the first studies to address the optimization of all hyperparameters to find the best parameters for ensemble algorithms and PV panel degradation rates. The results show that the CatBoost algorithm has better performance than the other algorithms used. The performance metrics of the CatBoost algorithm were determined to be 0.9327 R2, 0.047 MSE, 0.0388 MAE, 0.0003 MBE, 0.069 RMSE, 18.7 MAPE, and 0.79 MaxAE.

Item Ensemble of keyword extraction methods and classifiers in text classification
Onan, A; Korukoglu, S; Bulut, H
Automatic keyword extraction is an important research direction in text mining, natural language processing and information retrieval. Keyword extraction enables us to represent text documents in a condensed way. The compact representation of documents can be helpful in several applications, such as automatic indexing, automatic summarization, automatic classification, clustering and filtering. For instance, text classification is a domain with high dimensional feature space challenge. Hence, extracting the most important/relevant words about the content of the document and using these keywords as the features can be extremely useful.
In this regard, this study examines the predictive performance of five statistical keyword extraction methods (most frequent measure based keyword extraction, term frequency-inverse sentence frequency based keyword extraction, co-occurrence statistical information based keyword extraction, eccentricity-based keyword extraction and TextRank algorithm) on classification algorithms and ensemble methods for scientific text document classification (categorization). In the study, a comprehensive comparison of base learning algorithms (Naive Bayes, support vector machines, logistic regression and Random Forest) with five widely utilized ensemble methods (AdaBoost, Bagging, Dagging, Random Subspace and Majority Voting) is conducted. To the best of our knowledge, this is the first empirical analysis which evaluates the effectiveness of statistical keyword extraction methods in conjunction with ensemble learning algorithms. The classification schemes are compared in terms of classification accuracy, F-measure and area under curve values. To validate the empirical analysis, a two-way ANOVA test is employed. The experimental analysis indicates that the Bagging ensemble of Random Forest with the most-frequent based keyword extraction method yields promising results for text classification. For the ACM document collection, the highest average predictive performance (93.80%) is obtained with the utilization of the most frequent based keyword extraction method with the Bagging ensemble of Random Forest algorithm. In general, Bagging and Random Subspace ensembles of Random Forest yield promising results. The empirical analysis indicates that the utilization of keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification. (C) 2016 Elsevier Ltd.
All rights reserved.

Item A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification
Onan, A; Korukoglu, S; Bulut, H
Typically performed by supervised machine learning algorithms, sentiment analysis is highly useful for extracting subjective information from text documents online. Most approaches that use ensemble learning paradigms toward sentiment analysis involve feature engineering in order to enhance the predictive performance. In response, we sought to develop a paradigm of a multiobjective, optimization-based weighted voting scheme to assign appropriate weight values to classifiers and each output class based on the predictive performance of classification algorithms, all to enhance the predictive performance of sentiment classification. The proposed ensemble method is based on static classifier selection involving majority voting error and forward search, as well as a multiobjective differential evolution algorithm. Based on the static classifier selection scheme, our proposed ensemble method incorporates Bayesian logistic regression, naive Bayes, linear discriminant analysis, logistic regression, and support vector machines as base learners, whose performance in terms of precision and recall values determines weight adjustment. Our experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed classification scheme can predict better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting. Of all datasets examined, the laptop dataset showed the best classification accuracy (98.86%). (C) 2016 Elsevier Ltd. All rights reserved.
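
The weighted voting idea described in the last abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each base classifier outputs a single predicted label and that a weight is already available for every (classifier, class) pair. The function name `weighted_vote`, the labels, and the weight values are all hypothetical, and the multiobjective differential evolution step that the abstract says is used to find the weights is not shown.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the class label with the largest summed weight.

    predictions: list of labels, one per base classifier.
    weights: list of dicts; weights[i][label] is the (hypothetical)
             weight of classifier i when it votes for `label`.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight table is needed per classifier")
    scores = defaultdict(float)
    for label, table in zip(predictions, weights):
        # A missing entry counts as zero weight for that class.
        scores[label] += table.get(label, 0.0)
    # Ties are broken by label order so the result is deterministic.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical example: three classifiers, two classes.
example_predictions = ["pos", "neg", "neg"]
example_weights = [
    {"pos": 0.9, "neg": 0.5},
    {"pos": 0.6, "neg": 0.3},
    {"pos": 0.7, "neg": 0.4},
]
print(weighted_vote(example_predictions, example_weights))
```

In this made-up example the single "pos" vote carries more weight than the two "neg" votes combined, so the weighted result differs from what an unweighted majority vote would choose.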