MCBÜ Açıkerişim Sistemi :: Browsing by Author "Özçift, A"

Browsing by Author "Özçift, A"

Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data
Bozuyla, M; Özçift, A
The massive use of social media causes rapid information dissemination that amplifies harmful messages such as fake news. Fake news is misleading information presented as factual news, generally used to manipulate public opinion. In particular, the spread of fake news related to COVID-19 has been termed an 'infodemic' by the World Health Organization. An infodemic involves misleading information that causes confusion and may harm health. The high volume of misinformation about COVID-19 causes panic and high stress. A COVID-19 fake news identification model is therefore clearly important, and particularly so for the Turkish language. In this article, we propose an advanced deep language transformer model to identify the truthfulness of Turkish COVID-19 news from social media. To this end, we first compiled Turkish COVID-19 news from various sources as a benchmark dataset. Then we applied five conventional machine learning algorithms (i.e., Naive Bayes, Random Forest, K-Nearest Neighbor, Support Vector Machine, Logistic Regression) on top of several language preprocessing tasks. As a next step, we used novel deep learning algorithms such as Long Short-Term Memory, Bidirectional Long Short-Term Memory, Convolutional Neural Networks, Gated Recurrent Unit and Bidirectional Gated Recurrent Unit. For further evaluation, we made use of deep learning based language transformers, i.e., Bidirectional Encoder Representations from Transformers (BERT) and its variations, to improve the efficiency of the proposed approach. From the results obtained, we observed that neural transformers, in particular the Turkish-dedicated transformer BerTURK, are able to identify COVID-19 fake news with 98.5% accuracy.
Development of majority vote ensemble feature selection algorithm augmented with rank allocation to enhance Turkish text categorization
Borandag, E; Özçift, A; Kaygusuz, Y
The growing number of digital text documents from sources such as customer reviews, news, and social media has made text categorization crucial for managing the enormous amount of data. The high-dimensional nature of these texts requires a preliminary feature selection task to reduce the feature space, with a potential increase in prediction accuracy. In this study, an ensemble feature selection method, namely majority vote rank allocation, was developed for Turkish text categorization. The method uses a majority voting ensemble strategy together with a rank allocation approach to combine weak filters such as information gain, symmetric uncertainty, relief, and correlation-based feature selection. Thus, the proposed method measures the quality of each feature among all features using the majority votes of the filters and rank allocation. The feature selection efficacy of the method was tested on two datasets, one from the literature and one newly collected. The effect of the selected features on classification performance was evaluated with the naive Bayes, support vector machine, J48, and random forests algorithms. It was empirically observed that the developed method improved the prediction accuracies of the classifiers compared to the mentioned filters. The statistical significance of the experimental results was also validated with a two-way analysis of variance test.
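The abstract above does not spell out how votes and ranks are combined, so the following is only a minimal sketch under stated assumptions: each filter votes for its `top_k` highest-scoring features, a feature is kept when more than half of the filters vote for it, and the kept features are ordered by the sum of their per-filter ranks. The function name, the `top_k` parameter, and the scores in the usage note are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def majority_vote_rank_selection(filter_scores, top_k):
    """Combine several filter rankings by majority vote and summed rank.

    filter_scores: dict mapping a filter name to a dict of
                   feature -> score (higher score means more relevant).
    top_k:         number of top-ranked features each filter votes for
                   (an assumed parameter, not from the paper).
    Returns the selected features, best (lowest summed rank) first.
    """
    votes = defaultdict(int)
    rank_sum = defaultdict(int)

    for scores in filter_scores.values():
        # Rank features by descending score; ties broken by name
        # so that the result is deterministic.
        ordered = sorted(scores, key=lambda f: (-scores[f], f))
        for rank, feature in enumerate(ordered, start=1):
            rank_sum[feature] += rank
            if rank <= top_k:
                votes[feature] += 1

    n_filters = len(filter_scores)
    # Keep a feature only if a strict majority of filters voted for it.
    selected = [f for f in rank_sum if votes[f] * 2 > n_filters]
    # Order the survivors by their summed rank across all filters.
    return sorted(selected, key=lambda f: (rank_sum[f], f))
```

With three filters, made-up scores for four features, and `top_k=2`, a feature needs at least two votes to be kept; a feature that only one filter places in its top two is dropped even if that filter scores it highly.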
A New Approach for Prediction of Solar Radiation with Using Ensemble Learning Algorithm
Basaran, K; Özçift, A; Kilinç, D
This article investigates the competence of ensemble learning techniques in solar irradiance prediction. The literature survey showed that random forests, an ensemble tree model, is studied more frequently than other ensemble models. However, ensembles of support vector regression (SVR) and artificial neural networks (ANN) are also possible. This study is therefore the first detailed evaluation of ensemble models in the solar irradiance estimation domain. Boosting and bagging ensembles of SVR, ANN and decision tree (DT) are developed to estimate solar irradiance on an hourly basis in five cities in Turkey. First, the frequently used base models (SVR, ANN, and DT) are created and tested using 5 years of meteorological data. Then boosting and bagging ensembles of the base models are developed and tested with the same data. The base models are compared with their ensemble counterparts in terms of average coefficient of determination (R^2) and root mean squared error (RMSE). The comparative results show that boosting and bagging ensemble models improve on SVR, ANN, and DT by between 4.6% and 14.6% in RMSE on average. The results show empirically that ensemble models improve the prediction accuracies of various base regression models and that they can be applied to other machine learning models used in solar irradiance prediction.
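As an illustration of the comparison described in the abstract above, the sketch below computes RMSE and R^2 and shows bagging in its simplest form: base learners are fit on bootstrap resamples and their predictions are averaged. It rests on assumptions. The one-dimensional least-squares line `fit_line` is a hypothetical stand-in for the SVR, ANN, and DT base models, boosting is not shown, and none of the function names or defaults come from the paper.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Hypothetical base learner: 1-D least-squares line.

    Returns a function mapping x to the predicted y.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate resample (all x equal): predict the mean of y.
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def bagging_predict(xs, ys, x_new, n_models=25, seed=0):
    """Average the predictions of base learners fit on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        # Draw n training indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x_new))
    return sum(preds) / n_models


def rmse_improvement_pct(rmse_base, rmse_ensemble):
    """Relative RMSE reduction of an ensemble over its base model, in percent."""
    return 100.0 * (rmse_base - rmse_ensemble) / rmse_base
```

A figure such as "between 4.6% and 14.6%" corresponds to `rmse_improvement_pct` evaluated for each base model and its ensemble counterpart on the same test data.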
A Genetic Optimized Federated Learning Approach for Joint Consideration of End-to-End Delay and Data Privacy in Vehicular Networks
Erel-Özçevik, M; Özçift, A; Özçevik, Y; Yücalar, F
In 5G vehicular networks, two key challenges have become apparent: end-to-end delay minimization and data privacy. Learning-based approaches have been used to alleviate these, either by predicting delay or by protecting privacy. Traditional approaches train machine learning models on local devices or cloud servers, each with its own trade-offs. While pure federated learning protects privacy, it sacrifices delay prediction performance. In contrast, centralized training improves delay prediction but violates privacy. Existing studies in the literature overlook the effect of training location on delay prediction and data privacy. To address both issues, we propose a novel genetic algorithm optimized federated learning (GAoFL) approach in which end-to-end delay prediction and data privacy are jointly considered to obtain an optimal solution. For this purpose, we analytically define a novel end-to-end delay formula and data privacy metrics. Accordingly, a novel fitness function is formulated to optimize both the location of the training model and data privacy. The evaluation results indicate that training location significantly affects privacy and performance. Moreover, the proposed GAoFL improves data privacy compared to centralized learning while achieving better delay prediction than other federated methods, offering a valuable solution for 5G vehicular computing.
Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish
Özçift, A; Akarsu, K; Yumuk, F; Söylemez, C
Language model pre-training architectures have proven useful for learning language representations. Bidirectional encoder representations from transformers (BERT), a recent deep bidirectional self-attention representation learned from unlabelled text, has achieved remarkable results in many natural language processing (NLP) tasks with fine-tuning. In this paper, we demonstrate the efficiency of BERT for a morphologically rich language, Turkish. Traditionally, morphologically difficult languages require dense language pre-processing steps to make the data suitable for machine learning (ML) algorithms. In particular, tokenization, lemmatization or stemming, and feature engineering tasks are needed to obtain an efficient data model that overcomes data sparsity or high-dimensionality problems. In this context, we selected five Turkish NLP research problems from the literature: sentiment analysis, cyberbullying identification, text classification, emotion recognition and spam detection. We then compared the empirical performance of BERT with the baseline ML algorithms. Finally, we found enhanced results compared to the baseline ML algorithms in the selected NLP problems while eliminating heavy pre-processing tasks.
TTC-3600: A new benchmark dataset for Turkish text categorization
Kilinç, D; Özçift, A; Bozyigit, F; Yildirim, P; Yücalar, F; Borandag, E
Owing to the rapid growth of the World Wide Web, the number of documents that can be accessed via the Internet grows explosively with each passing day. On news portals in particular, documents related to categories such as technology, sports and politics sometimes appear in the wrong category or are placed in a generic category called 'others'. At this point, text categorization (TC), which is generally addressed as a supervised learning task, is needed. Although a substantial number of studies have been conducted on TC in other languages, the number of studies conducted in Turkish is very limited owing to the lack of accessible and usable datasets. In this paper, a new dataset named TTC-3600, which can be widely used in studies of TC of Turkish news and articles, is created. TTC-3600 is a well-documented dataset and its file formats are compatible with well-known text mining tools. Five widely used classifiers within the field of TC and two feature selection methods are evaluated on TTC-3600. The experimental results indicate that, across all comparisons performed after the pre-processing and feature selection steps, the best accuracy, 91.03%, is obtained with the combination of the Random Forest classifier and the attribute ranking-based feature selection method. The publicly available TTC-3600 dataset and the experimental results of this study can be utilized in comparative experiments by other researchers.