Browsing by Author "Bulut H."

  • Ensemble of keyword extraction methods and classifiers in text classification
    (Elsevier Ltd, 2016) Onan A.; Korukoğlu S.; Bulut H.
    Automatic keyword extraction is an important research direction in text mining, natural language processing and information retrieval. Keyword extraction enables us to represent text documents in a condensed way. The compact representation of documents can be helpful in several applications, such as automatic indexing, automatic summarization, automatic classification, clustering and filtering. For instance, text classification is a domain challenged by a high-dimensional feature space. Hence, extracting the most important/relevant words about the content of the document and using these keywords as the features can be extremely useful. In this regard, this study examines the predictive performance of five statistical keyword extraction methods (most frequent measure based keyword extraction, term frequency-inverse sentence frequency based keyword extraction, co-occurrence statistical information based keyword extraction, eccentricity-based keyword extraction and TextRank algorithm) on classification algorithms and ensemble methods for scientific text document classification (categorization). The study comprehensively compares base learning algorithms (Naïve Bayes, support vector machines, logistic regression and Random Forest) with five widely utilized ensemble methods (AdaBoost, Bagging, Dagging, Random Subspace and Majority Voting). To the best of our knowledge, this is the first empirical analysis that evaluates the effectiveness of statistical keyword extraction methods in conjunction with ensemble learning algorithms. The classification schemes are compared in terms of classification accuracy, F-measure and area under curve values. To validate the empirical analysis, a two-way ANOVA test is employed. The experimental analysis indicates that Bagging ensemble of Random Forest with the most-frequent based keyword extraction method yields promising results for text classification. For the ACM document collection, the highest average predictive performance (93.80%) is obtained with the utilization of the most frequent based keyword extraction method with Bagging ensemble of Random Forest algorithm. In general, Bagging and Random Subspace ensembles of Random Forest yield promising results. The empirical analysis indicates that the utilization of keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification. © 2016 Elsevier Ltd. All rights reserved.
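As a rough illustration of the simplest method named in this abstract, most-frequent keyword extraction can be sketched as counting tokens and keeping the top k. This is only a minimal sketch, not the authors' implementation: the function name `most_frequent_keywords`, the tokenization rule, and the tiny stop-word list are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with", "by"}


def most_frequent_keywords(text, k=5):
    """Return the k most frequent non-stop-word tokens in `text`."""
    # Lowercase the text and keep runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # most_common sorts by descending count.
    return [word for word, _ in counts.most_common(k)]
```

The returned keywords could then serve as the reduced feature set for a classifier, which is the general idea the abstract describes.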
  • A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification
    (Elsevier Ltd, 2016) Onan A.; Korukoğlu S.; Bulut H.
    Typically performed by supervised machine learning algorithms, sentiment analysis is highly useful for extracting subjective information from text documents online. Most approaches that use ensemble learning paradigms toward sentiment analysis involve feature engineering in order to enhance the predictive performance. In response, we sought to develop a paradigm of a multiobjective, optimization-based weighted voting scheme to assign appropriate weight values to classifiers and each output class based on the predictive performance of classification algorithms, all to enhance the predictive performance of sentiment classification. The proposed ensemble method is based on static classifier selection involving majority voting error and forward search, as well as a multiobjective differential evolution algorithm. Based on the static classifier selection scheme, our proposed ensemble method incorporates Bayesian logistic regression, naïve Bayes, linear discriminant analysis, logistic regression, and support vector machines as base learners, whose performance in terms of precision and recall values determines weight adjustment. Our experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed classification scheme can predict better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting. Of all datasets examined, the laptop dataset showed the best classification accuracy (98.86%). © 2016 Elsevier Ltd
  • An improved ant algorithm with LDA-based representation for text document clustering
    (SAGE Publications Ltd, 2017) Onan A.; Bulut H.; Korukoğlu S.
    Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents. © Chartered Institute of Library and Information Professionals.
  • A hybrid ensemble pruning approach based on consensus clustering and multi-objective evolutionary algorithm for sentiment classification
    (Elsevier Ltd, 2017) Onan A.; Korukoğlu S.; Bulut H.
    Sentiment analysis is a critical task of extracting subjective information from online text documents. Ensemble learning can be employed to obtain more robust classification schemes. However, most approaches in the field incorporated feature engineering to build efficient sentiment classifiers. The purpose of our research is to establish an effective sentiment classification scheme by pursuing the paradigm of ensemble pruning. Ensemble pruning is a crucial method to build classifier ensembles with high predictive accuracy and efficiency. Previous studies employed exponential search, randomized search, sequential search, ranking based pruning and clustering based pruning. However, there are tradeoffs in selecting the ensemble pruning methods. In this regard, hybrid ensemble pruning schemes can be more promising. In this study, we propose a hybrid ensemble pruning scheme based on clustering and randomized search for text sentiment classification. Furthermore, a consensus clustering scheme is presented to deal with the instability of clustering results. The classifiers of the ensemble are initially clustered into groups according to their predictive characteristics. Then, two classifiers from each cluster are selected as candidate classifiers based on their pairwise diversity. The search space of candidate classifiers is explored by the elitist Pareto-based multi-objective evolutionary algorithm. For the evaluation task, the proposed scheme is tested on twelve balanced and unbalanced benchmark text classification tasks. In addition, the proposed approach is experimentally compared with three ensemble methods (AdaBoost, Bagging and Random Subspace) and three ensemble pruning algorithms (ensemble selection from libraries of models, Bagging ensemble selection and LibD3C algorithm). Results demonstrate that the consensus clustering and the elitist Pareto-based multi-objective evolutionary algorithm can be effectively used in ensemble pruning. The experimental analysis with conventional ensemble methods and pruning algorithms indicates the validity and effectiveness of the proposed scheme. © 2017 Elsevier Ltd
  • Fireworks: an intelligent location discovery algorithm for vehicular ad hoc networks
    (Springer New York LLC, 2018) Basaran I.; Bulut H.
    Searching for and locating a certain destination in a vehicular ad-hoc network (VANET) are fundamental issues to ensure routing and data dissemination under high mobility and lack of fixed infrastructure. However, naive-flooding searching is too expensive and takes a considerable amount of valuable bandwidth in the network. To overcome this, GPS information of the vehicles can be exploited, which can aid searching and routing in VANETs. In this paper, we present a novel position-based searching algorithm—called Fireworks—that can be used as a location discovery algorithm in VANETs. The proposed scheme is purely reactive and has a limited usage of beacons. Fireworks algorithm provides the position of the destination vehicle without having a Location Information System infrastructure or a proactive mechanism. We show that the method is efficient and reliable while greatly reducing the searching overhead. The simulations show that the algorithm covers as many nodes as naive-flooding with less than one-fifth of the broadcast messages and with less than one-third of the Dynamic Source Routing (DSR). It also performs better than Acknowledgement-Based Broadcast Protocol (ABSM) in terms of total number of broadcast messages, node coverage speed and query success rate. © 2016, Springer Science+Business Media New York.
  • Preventing translation quality deterioration caused by beam search decoding in neural machine translation using statistical machine translation
    (Elsevier Inc., 2021) Satir E.; Bulut H.
    Decoding is an important part of machine translation systems, and the most popular inference algorithm used here is beam search. The beam search algorithm improves translation by allowing a larger search space to be traversed than greedy search. However, as the beam width increases, the translation performance declines after a certain point in neural machine translation (NMT). This problem is usually not observed in statistical machine translation (SMT) due to the decoding method. This paper proposes a hybrid system-based method that uses SMT predictions to prevent quality deterioration in the beam search algorithm used in NMT decoding. Our approach is based on reranking the n-best list of NMT according to the translation produced by the SMT system. We propose two different algorithms for reranking NMT n-best lists. The first algorithm uses the length information of the SMT outputs. In contrast, the second uses a word-based similarity approach with the Jaccard Index, the Dice's Coefficient, and the Overlap Coefficient. Experiments on three different language pairs show that the method we propose prevents the decrease in translation quality and produces a gain of 1.3 BLEU and 1.6 METEOR for different beam sizes and 1.8 BLEU and 2.1 METEOR average scores compared to the baseline results. © 2021
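The three word-based similarity measures named in this abstract have standard set-based definitions, which can be sketched as follows. The `rerank` helper is a hypothetical illustration of sorting an n-best list by similarity to an SMT output; whitespace tokenization and ranking by similarity alone are assumptions of this sketch, and the paper's actual reranking procedure may differ.

```python
def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice's coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def rerank(nbest, smt_output, similarity=jaccard):
    """Sort NMT hypotheses by word-set similarity to the SMT output (hypothetical helper)."""
    reference = set(smt_output.lower().split())
    return sorted(
        nbest,
        key=lambda hyp: similarity(set(hyp.lower().split()), reference),
        reverse=True,
    )
```

For example, `rerank(["the cat sat", "a dog ran"], "a dog ran fast")` places `"a dog ran"` first, since it shares three of the four SMT tokens while the other hypothesis shares none.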
  • A novel hybrid approach to improve neural machine translation decoding using phrase-based statistical machine translation
    (Institute of Electrical and Electronics Engineers Inc., 2021) Satir E.; Bulut H.
    Phrase-based models are among the best performing statistical machine translation (SMT) systems. These systems translate one phrase at a time. The decoding process is done locally in these systems. In addition, neural machine translation (NMT) systems have become very popular over the past four or five years with essential features such as more fluent translations. However, sometimes NMT systems give up accuracy for fluent translations due to the nature of the decoding technique they use. In this study, we aim to develop a hybrid system by guiding NMT decoding using the output sentences of the phrase-based SMT systems. According to the two-way translation experiments, German-to-English and English-to-German, and the results obtained in terms of two popular machine translation evaluation metrics, BLEU and METEOR, our method improves the quality of NMT system translations. © 2021 IEEE.
  • Turkish medical text classification using BERT; [BERT modeli ile Türkçe medikal metin siniflandirma]
    (Institute of Electrical and Electronics Engineers Inc., 2021) Celikten A.; Bulut H.
    Medical text classification is mostly carried out on English data sets. The limited number of studies in Turkish is due to the challenging morphological structure of Turkish for natural language processing and the limited number of data sets in the medical domain. In addition, the use of domain specific words and abbreviations makes natural language processing studies more challenging. In this study, a classification model is implemented to assign article abstracts to appropriate disease categories using multilingual BERT and BERTurk models on a data set consisting of Turkish medical article abstracts. As a result of the experimental study, F-scores of 0.82 and 0.93 are obtained for multilingual BERT and BERTurk, respectively. The results show that BERTurk is more successful than the other compared models for Turkish medical text classification. © 2021 IEEE.
  • Keyword extraction from biomedical documents using deep contextualized embeddings
    (Institute of Electrical and Electronics Engineers Inc., 2021) Celikten A.; Ugur A.; Bulut H.
    Due to the rapidly increasing amount of biomedical publications, it has become challenging to follow scientific articles and new developments. Keywords in scientific articles provide a quick understanding and summarize the important points of the context. When keywords are not used in some biomedical articles or are not sufficient to express the content of the text, automatic keyword extraction systems are needed. This paper addresses the keyword extraction problem as a sequence labeling task where words are represented as deep contextual embeddings. We predict the keyword tags identified in sequence labeling by fine-tuning XLNet and BERT-based models such as BERT, BioBERT, SciBERT, and RoBERTa. Our proposed method does not need extra dictionaries required by rule-based methods and feature extraction as in traditional machine learning methods. Performance evaluation on the benchmark dataset for biomedical keyword extraction shows that domain-specific contextualized embeddings (BioBERT, SciBERT) achieve state-of-the-art results compared to the general domain embeddings (BERT, RoBERTa, XLNet) and unsupervised methods. © 2021 IEEE.
  • A short-term photovoltaic output power forecasting based on ensemble algorithms using hyperparameter optimization
    (Springer Science and Business Media Deutschland GmbH, 2024) Basaran K.; Çelikten A.; Bulut H.
    The stochastic and intermittent nature of solar energy presents the power grid with the challenge of providing a stable, secure, and economical power supply, especially in the case of large-scale penetration. The prerequisite for addressing these challenges is accurate power output estimation from PV systems. In addition, accurate power estimation also ensures the correct sizing of PV systems for investors. In this study, the PV output prediction model has been developed based on ensemble algorithms using two years of real power and meteorological data from grid-connected PV systems. Grid search, random search, and Bayesian optimization were used to determine the optimal hyperparameters for ensemble algorithms. The originality of this study lies in (i) the use of hyperparameter optimization for ensemble algorithms in predicting PV performance, (ii) the degradation rate of PV panels by ensemble algorithms using the first two years' data, and (iii) the performance comparison of ensemble algorithms using the hyperparameter optimization technique. The accuracy and precision of the prediction model are determined by the relative root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE), mean scaled error (MSE), coefficient of determination (R²), mean absolute percentage error (MAPE), and maximum absolute error (MaxAE). To the best of our knowledge, this is one of the first studies to address the optimization of all hyperparameters to find the best parameters for ensemble algorithms and PV panel degradation rates. The results show that the CatBoost algorithm has better performance than the other algorithms used. The performance metrics of the CatBoost algorithm were determined to be 0.9327 R², 0.047 MSE, 0.0388 MAE, 0.0003 MBE, 0.069 RMSE, 18.7 MAPE, and 0.79 MaxAE. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
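Several of the error measures listed in this abstract have common textbook definitions, sketched below for reference. This is an illustrative sketch only: the function name `forecast_errors` is hypothetical, the sign convention for MBE (predicted minus actual) is an assumption, the abstract's "relative" RMSE and MSE may be normalized differently in the paper, and the code assumes non-zero, non-constant actual values.

```python
import math


def forecast_errors(actual, predicted):
    """Compute common forecast error metrics for two equal-length sequences."""
    n = len(actual)
    # Signed errors; predicted minus actual is an assumed convention.
    diffs = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(d * d for d in diffs)                     # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(d) for d in diffs) / n,
        "MBE": sum(diffs) / n,
        # MAPE is expressed in percent and requires every actual value to be non-zero.
        "MAPE": 100 * sum(abs(d / a) for a, d in zip(actual, diffs)) / n,
        "MaxAE": max(abs(d) for d in diffs),
        # R² requires the actual values not to be all identical.
        "R2": 1 - ss_res / ss_tot,
    }
```

MBE keeps the sign of the errors, so a value near zero can hide large errors that cancel out; that is why it is usually reported alongside MAE or RMSE.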

Manisa Celal Bayar University copyright © 2002-2025 LYRASIS