
Browsing by Author "Onan, A"

Now showing 1 - 20 of 32
  • A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification
    Onan, A; Korukoglu, S; Bulut, H
    Typically performed by supervised machine learning algorithms, sentiment analysis is highly useful for extracting subjective information from text documents online. Most approaches that use ensemble learning paradigms toward sentiment analysis involve feature engineering in order to enhance the predictive performance. In response, we sought to develop a paradigm of a multiobjective, optimization-based weighted voting scheme to assign appropriate weight values to classifiers and each output class based on the predictive performance of classification algorithms, all to enhance the predictive performance of sentiment classification. The proposed ensemble method is based on static classifier selection involving majority voting error and forward search, as well as a multiobjective differential evolution algorithm. Based on the static classifier selection scheme, our proposed ensemble method incorporates Bayesian logistic regression, naive Bayes, linear discriminant analysis, logistic regression, and support vector machines as base learners, whose performance in terms of precision and recall values determines weight adjustment. Our experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed classification scheme can predict better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting. Of all datasets examined, the laptop dataset showed the best classification accuracy (98.86%). (C) 2016 Elsevier Ltd. All rights reserved.
  • Exploring Performance of Instance Selection Methods in Text Sentiment Classification
    Onan, A; Korukoglu, S
    Sentiment analysis is the process of extracting subjective information in source materials. Sentiment analysis is a subfield of web and text mining. One major problem encountered in these areas is the overwhelming amount of data available. Hence, instance selection and feature selection become two essential tasks for achieving scalability in machine learning based sentiment classification. Instance selection is a data reduction technique which aims to eliminate redundant, noisy data from the training dataset so that training time can be reduced and scalability and generalization ability can be enhanced. This paper examines the predictive performance of fifteen benchmark instance selection methods in the text classification domain. The instance selection methods are evaluated with a decision tree classifier (C4.5 algorithm) and radial basis function networks in terms of classification accuracy and data reduction rates. The experimental results indicate that the highest classification accuracies with the C4.5 algorithm are generally obtained by the model class selection method, while the highest classification accuracies with radial basis function networks are obtained by nearest centroid neighbor edition.
  • A feature selection model based on genetic rank aggregation for text sentiment classification
    Onan, A; Korukoglu, S
    Sentiment analysis is an important research direction of natural language processing, text mining and web mining which aims to extract subjective information in source materials. The main challenge encountered in machine learning method-based sentiment classification is the abundant amount of data available. This amount makes it difficult to train the learning algorithms in a feasible time and degrades the classification accuracy of the built model. Hence, feature selection becomes an essential task in developing robust and efficient classification models whilst reducing the training time. In text mining applications, individual filter-based feature selection methods have been widely utilized owing to their simplicity and relatively high performance. This paper presents an ensemble approach for feature selection, which aggregates the several individual feature lists obtained by the different feature selection methods so that a more robust and efficient feature subset can be obtained. In order to aggregate the individual feature lists, a genetic algorithm has been utilized. Experimental evaluations indicated that the proposed aggregation model is an efficient method and it outperforms individual filter-based feature selection methods on sentiment classification.
  • Topic modeling through rank-based aggregation and LLMs: An approach for AI and human-generated scientific texts
    Çelikten, T; Onan, A
    The increasing presence of AI-generated and human-paraphrased content in scientific literature presents new challenges for topic modeling, particularly in maintaining semantic coherence and interpretability across diverse text sources. Traditional topic modeling methods, such as Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF), often suffer from inconsistencies and diminished coherence when applied to heterogeneous sources. Recently, large language models (LLMs) have demonstrated potential for enhanced topic extraction, yet they frequently lack the stability and interpretability required for reliable deployment. In response to these limitations, we propose a novel, robust ensemble framework that integrates rank-based aggregation and LLM-powered topic extraction to achieve consistent, high-quality topic modeling across AI-generated, AI-paraphrased, and human-generated scientific abstracts. Our framework employs a rank-based aggregation scheme to reduce inconsistencies in LLM outputs and incorporates neural topic models to enhance coherence and semantic depth. By combining the strengths of traditional models and LLMs, our framework consistently outperforms baseline methods in terms of topic coherence, diversity, and stability. Experimental results on a diverse dataset of scientific abstracts demonstrate a substantial improvement in coherence scores and topic interpretability, with our ensemble approach outperforming conventional models and leading neural topic models by significant margins. This framework not only addresses the challenges of cross-source topic modeling but also establishes a benchmark for robust, scalable analysis of scientific literature spanning AI and human narratives.
  • Evaluating the Coherence and Diversity in AI-Generated and Paraphrased Scientific Abstracts: A Fuzzy Topic Modeling Approach
    Onan, A; Çelikten, T
    In an era where Artificial Intelligence (AI) plays a pivotal role in the generation and paraphrasing of scientific literature, understanding its impact on the integrity and coherence of scholarly content is crucial. This study embarks on an exploratory analysis to assess the differences in topic modeling outcomes among three distinct sets of radiology-related abstracts: original scientific abstracts from PubMed, AI-paraphrased abstracts, and AI-generated abstracts. Utilizing advanced fuzzy topic modeling techniques, which excel in handling the inherent ambiguity and nuances in natural language, this research aims to provide a comprehensive analysis of topic interpretability, coherence, and diversity within these datasets. By applying methods such as Fuzzy Latent Semantic Analysis (FLSA) and its variants, FLSA-W and FLSA-V, the study endeavors to unearth the subtle semantic shifts and thematic variances introduced by AI in scientific discourse. The findings are expected to reveal critical insights into how AI transformations influence the thematic fabric of scientific literature, potentially reshaping our understanding of AI's role in scholarly communication. This research not only contributes to the discourse on AI in academic writing but also showcases the effectiveness of fuzzy topic modeling in analyzing complex text corpora, underscoring its significance in the ever-evolving landscape of computational linguistics.
  • Artificial Intelligence Methods for Risk Assessment of Neuroblastoma
    Leblebici, A; Uncu, B; Onan, A; Baskin, Y; Olgun, N
  • Satire identification in Turkish news articles based on ensemble of classifiers
    Onan, A; Tocoglu, MA
    Social media and microblogging platforms generally contain elements of figurative and nonliteral language, including satire. The identification of figurative language is a fundamental task for sentiment analysis. It will not be possible to obtain sentiment analysis methods with high classification accuracy if elements of figurative language have not been properly identified. Satirical text is a kind of figurative language, in which irony and humor have been utilized to ridicule or criticize an event or entity. Satirical news is a pervasive issue on social media platforms, which can be deceptive and harmful. This paper presents an ensemble scheme for satirical news identification in Turkish news articles. In the presented scheme, linguistic and psychological feature sets have been utilized to extract the feature sets (i.e. linguistic, psychological, personal, spoken categories, and punctuation). In the classification phase, accuracy rates of five supervised learning algorithms (i.e. naive Bayes algorithm, logistic regression, support vector machines, random forest, and k-nearest neighbor algorithm) with three widely utilized ensemble methods (i.e. AdaBoost, bagging, and random subspace) have been considered. Based on the results, we concluded that the random forest algorithm yielded the highest performance, with a classification accuracy of 96.92% for satire detection in Turkish. For deep learning-based architectures, we have achieved classification accuracy of 97.72% with the recurrent neural network architecture with attention mechanism.
  • Evidence of associations between brain-derived neurotrophic factor (BDNF) serum levels and gene polymorphisms with tinnitus
    Coskunoglu, A; Orenay-Boyacioglu, S; Deveci, A; Bayam, M; Onur, E; Onan, A; Cam, FS
    Background: Brain-derived neurotrophic factor (BDNF) gene polymorphisms are associated with abnormalities in regulation of BDNF secretion. Studies also linked BDNF polymorphisms with changes in brainstem auditory-evoked response test results. Furthermore, BDNF levels are reduced in tinnitus, psychiatric disorders, depression, dysthymic disorder that may be associated with stress, conversion disorder, and suicide attempts due to crises of life. For this purpose, we investigated whether there is any role of BDNF changes in the pathophysiology of tinnitus. Materials and Methods: In this study, we examined the possible effects of BDNF variants in individuals diagnosed with tinnitus for more than 3 months. Fifty-two tinnitus subjects between the ages of 18 and 55, and 42 healthy control subjects in the same age group, who were free of any otorhinolaryngology and systemic disease, were selected for examination. The intensity of tinnitus and depression was measured using the tinnitus handicap inventory, and the differential diagnosis of psychiatric diagnoses was made using the Structured Clinical Interview for Fourth Edition of Mental Disorders. BDNF gene polymorphism was analyzed in the genomic deoxyribonucleic acid (DNA) samples extracted from the venous blood, and the serum levels of BDNF were measured. One-way analysis of variance and Chi-squared tests were applied. Results: Serum BDNF level was found to be lower in the tinnitus patients than in controls, and there appeared to be no correlation between BDNF gene polymorphism and tinnitus. Conclusions: This study suggests neurotrophic factors such as BDNF may have a role in tinnitus etiology. Future studies with larger sample sizes may be required to further confirm our results.
  • Ensemble Methods for Opinion Mining
    Onan, A; Korukoglu, S
    Opinion mining is an emerging field which uses computer science methods to extract subjective information, such as opinion, emotion, and attitude, inherent in an opinion holder's text. One of the major issues in opinion mining is to enhance the predictive performance of classification algorithms. Ensemble methods used for opinion mining aim to obtain robust classification models by combining decisions obtained by multiple classifier training, rather than depending on a single classifier. In this study, the comparative performance of the Bagging, Dagging, Random Subspace and AdaBoost ensemble methods on opinion mining datasets, with five different classifiers and six different data representation schemes, is presented. The experimental results indicate that ensemble methods can be used for building efficient opinion mining classification methods.
  • An improved ant algorithm with LDA-based representation for text document clustering
    Onan, A; Bulut, H; Korukoglu, S
    Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.
  • A review of literature on the use of machine learning methods for opinion mining
    Onan, A; Korukoglu, S
    Opinion mining is an emerging field which uses methods of natural language processing, text mining and computational linguistics to extract subjective information of opinion holders. Opinion mining can be viewed as a classification problem. Hence, machine learning based methods are widely employed for sentiment classification. Machine learning based methods in opinion mining can be mainly classified as supervised, semi-supervised and unsupervised methods. In this study, main existing literature on the use of machine learning methods for opinion mining has been presented. Besides, the weak and strong characteristics of machine learning methods have been discussed.
  • HybridGAD: Identification of AI-Generated Radiology Abstracts Based on a Novel Hybrid Model with Attention Mechanism
    Çelikten, T; Onan, A
    The purpose of this study is to develop a reliable method for distinguishing between AI-generated, paraphrased, and human-written texts, which is crucial for maintaining the integrity of research and ensuring accurate information flow in critical fields such as healthcare. To achieve this, we propose HybridGAD, a novel hybrid model that combines Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), and Bidirectional Gated Recurrent Unit (Bi-GRU) architectures with an attention mechanism. Our methodology involves training this hybrid model on a dataset of radiology abstracts, encompassing texts generated by AI, paraphrased by AI, and written by humans. The major findings of our analysis indicate that HybridGAD achieves a high accuracy of 98%, significantly outperforming existing state-of-the-art models. This high performance is attributed to the model's ability to effectively capture the contextual nuances and structural differences between AI-generated and human-written texts. In conclusion, HybridGAD not only enhances the accuracy of text classification in the field of radiology but also paves the way for more advanced medical diagnostic processes by ensuring the authenticity of textual information. Future research will focus on integrating textual and visual data for comprehensive radiology assessments and improving model generalization with partially labeled data. This study underscores the potential of HybridGAD in transforming medical text classification and highlights its applicability in ensuring the integrity and reliability of research in healthcare and beyond.
  • A Term Weighted Neural Language Model and Stacked Bidirectional LSTM Based Framework for Sarcasm Identification
    Onan, A; Toçoglu, MALP
    Sarcasm identification on text documents, one of the most challenging tasks in natural language processing (NLP), has become an essential research direction due to its prevalence in social media data. The purpose of our research is to present an effective sarcasm identification framework on social media data by pursuing the paradigms of neural language models and deep neural networks. To represent text documents, we introduce an inverse gravity moment based term weighted word embedding model with trigrams. In this way, critical words/terms have higher values while the word-ordering information is kept. In our model, we present a three-layer stacked bidirectional long short-term memory architecture to identify sarcastic text documents. For the evaluation task, the presented framework has been evaluated on three sarcasm identification corpora. In the empirical analysis, three neural language models (i.e., word2vec, fastText and GloVe), two unsupervised term weighting functions (i.e., term-frequency, and TF-IDF) and eight supervised term weighting functions (i.e., odds ratio, relevance frequency, balanced distributional concentration, inverse question frequency-question frequency-inverse category frequency, short text weighting, inverse gravity moment, regularized entropy and inverse false negative-true positive-inverse category frequency) have been evaluated. For the sarcasm identification task, the presented model yields promising results with a classification accuracy of 95.30%.
  • Weighted word embeddings and clustering-based identification of question topics in MOOC discussion forum posts
    Onan, A; Toçoglu, MA
    Massive open online courses (MOOCs) are recent and widely studied distance learning approaches aimed at providing learning material to learners from geographically dispersed locations without age, gender, or race-related constraints. MOOCs are generally enriched by discussion forums to provide interactions among students, professors, and teaching assistants. MOOC discussion forum posts provide feedback regarding the students' learning processes, social interactions, and concerns. The purpose of our research is to present a document-clustering model on MOOC discussion forum posts based on weighted word embeddings and clustering to identify question topics on discussion posts. In this study, four word-embedding schemes (namely, word2vec, fastText, global vectors, and Doc2vec), four weighting functions (i.e., term frequency-inverse document frequency [IDF], IDF, smoothed IDF, and subsampling function), and four clustering algorithms (i.e., K-means, K-means++, self-organizing maps, and divisive analysis clustering algorithm) for document clustering and topic modeling on MOOC discussion forum posts have been evaluated. Twenty different feature representations have been obtained from word-embedding schemes and weighting functions. The feature representation schemes have been evaluated in conjunction with four clustering methods. For the evaluation task, the empirical results for the latent Dirichlet allocation have also been included. The empirical results in terms of adjusted Rand index, normalized mutual information, and adjusted mutual information indicate that weighted word-embedding schemes combined with clustering algorithms outperform the conventional schemes.
  • Review Spam Detection Based on Psychological and Linguistic Features
    Onan, A
    With the advances in information and communication technologies, an immense quantity of review texts has become available on the Web. Review text can serve as an essential source of information for individual decision makers and business organizations. Some of the reviews shared on the Web may contain deceptive information to mislead the existing decision making process. In this study, we have presented a supervised learning based scheme for review spam detection. In the presented study, psychological and linguistic feature sets and their combinations are taken into consideration. In the study, the predictive performances of four conventional supervised learning methods (namely, Naive Bayes classifier, K-nearest neighbor algorithm, support vector machines and C4.5 algorithm) are evaluated on the different feature sets.
  • Sarcasm Identification on Twitter: A Machine Learning Approach
    Onan, A
    In recent years, the remarkable growth in social media and microblogging platforms has provided an essential source of information to identify subjective information of people, such as opinions, sentiments and attitudes. Sentiment analysis is the process of identifying subjective information from source materials towards an entity. Much of the social content online contains nonliteral language, such as irony and sarcasm, which may degrade the performance of sentiment classification schemes. In sarcastic text, the expressed text utterances and the intention of the person employing sarcasm can be completely opposite. In this paper, we present a machine learning approach to sarcasm identification. In this scheme, we utilized lexical, pragmatic, dictionary based and part of speech features. We employed two kinds of features to describe lexical information: unigrams and bigrams. In addition, term-frequency, term-presence and TF-IDF based representations are evaluated. To evaluate the predictive performance of different representation schemes, Naive Bayes, support vector machines, logistic regression and k-nearest neighbor classifiers are utilized.
  • Semantic Query Suggestion Based on Optimized Random Forests
    Onan, A
    Query suggestion is an integral part of Web search engines. Data-driven approaches to query suggestion aim to identify more relevant queries to users based on term frequencies and hence cannot fully reveal the underlying semantic intent of queries. Semantic query suggestion seeks to identify relevant queries by taking semantic concepts contained in user queries into account. In this paper, we propose a machine learning approach to semantic query suggestion based on Random Forests. The presented scheme employs an optimized Random Forest algorithm based on multi-objective simulated annealing and weighted voting. In this scheme, multi-objective simulated annealing is utilized to tune the parameters of Random Forests algorithm, i.e. the number of trees forming the ensemble and the number of features to split at each node. In addition, the weighted voting is utilized to combine the predictions of trees based on their predictive performance. The predictive performance of the proposed scheme is compared to conventional classification algorithms (such as Naive Bayes, logistic regression, support vector machines, Random Forest) and ensemble learning methods (such as AdaBoost, Bagging and Random Subspace). The experimental results on semantic query suggestion prove the superiority of the proposed scheme.
  • The Use of Data Mining for Strategic Management: A Case Study on Mining Association Rules in Student Information System
    Onan, A; Bal, V; Bayam, BY
    In today's competitive conditions, changes in the business environment and business structures make strategic management an effective form of management for businesses and organizations. Strategic management is a current management strategy that requires setting the appropriate strategies, plans and applications and putting them into action in order to reach the aims and goals of organizations. The process of strategic management involves setting the company's vision, mission and objectives, determining the competitive position, and evaluating the results obtained by strategy selection, development and application. In the application of activities related to the strategic management of business processes, the discipline of data mining, which can be defined as the process of extracting useful and meaningful patterns from large volumes of data, emerges as a viable method. In this study, strategic management and data mining disciplines and their basic concepts and applications are introduced. Apart from that, data mining methods in the context of strategic management are taken into consideration. In addition, a sample case study about the use of association rule mining algorithms on student information systems data is presented.
  • A Stochastic Gradient Descent Based SVM with Fuzzy-Rough Feature Selection and Instance Selection for Breast Cancer Diagnosis
    Onan, A
    Breast cancer remains one of the most severe and deadly diseases among women in the world. Fortunately, a long survival rate for patients with non-metastasized breast cancer can be achieved with the help of early detection, proper treatment and therapy. This creates the need to develop efficient classification models with high predictive performance. Machine learning and artificial intelligence based methods are effectively utilized for building classification models in the medical domain. In this paper, a fuzzy-rough feature selection based support vector machine classifier with stochastic gradient descent learning is proposed for breast cancer diagnosis. In the proposed model, fuzzy-rough feature selection with particle swarm optimization based search is used for obtaining a subset of relevant features for the model. In order to select appropriate instances, a fuzzy-rough instance selection method is utilized. The effectiveness of the proposed classification approach is evaluated on the Wisconsin Breast Cancer Dataset (WBCD) with classification evaluation metrics, such as classification accuracy, sensitivity, specificity, F-measure and kappa statistics. Experimental results indicate that the proposed model can achieve a very high predictive performance.
  • Artificial Immune System Based Web Page Classification
    Onan, A
    Automated classification of web pages is an important research direction in web mining, which aims to construct a classification model that can classify new instances based on labeled web documents. Machine learning algorithms are adapted to textual classification problems, including web document classification. Artificial immune systems are a branch of computational intelligence inspired by biological immune systems which is utilized to solve a variety of computational problems, including classification. This paper examines the effectiveness and suitability of artificial immune system based approaches for web page classification. Hence, two artificial immune system based classification algorithms, namely Immunos-1 and Immunos-99 algorithms are compared to two standard machine learning techniques, namely C4.5 decision tree classifier and Naive Bayes classification. The algorithms are experimentally evaluated on 50 data sets obtained from DMOZ (Open Directory Project). The experimental results indicate that artificial immune based systems achieve higher predictive performance for web page classification.

Manisa Celal Bayar University copyright © 2002-2025 LYRASIS
