MetricHunter: A software metric dataset generator utilizing SourceMonitor upon public GitHub repositories
Abstract
Version control systems are now widely mined to obtain software metric datasets, and machine learning is applied to those datasets to predict various aspects of software, including quality monitoring and influence analysis. However, constructing a metric dataset is challenging, and the dataset content may affect the success of learning-based models. In this study, we propose a dataset construction tool, MetricHunter, which produces platform- or language-specific datasets that can be used to predict the features of newly created software. The tool is developed in C# and builds on an established metric gathering tool, SourceMonitor, together with the GitHub REST API for public repositories. A user can thus construct a suitable dataset from a graphical user interface simply by specifying the programming language or target platform. The outputs of the tool on a set of repositories are validated by inspecting the automatically generated attribute values and comparing them with the measurements of metric gathering tools as well as the GitHub metric values. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
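To illustrate the kind of language-based repository selection the abstract describes, the following is a minimal sketch in Python (not the tool's own C# code) that builds a query for the public GitHub REST API repository search endpoint and reduces a search response to a few dataset attributes. The function names `build_search_url` and `extract_repo_records`, the choice of sorting by stars, and the selected fields are illustrative assumptions; the abstract does not state which endpoints or attributes MetricHunter actually uses.

```python
from urllib.parse import urlencode

# Public GitHub REST API endpoint for repository search.
GITHUB_SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(language, per_page=30, page=1):
    """Build a search URL for public repositories in a given language.

    Hypothetical helper: sorting by stars is an assumption made for
    this sketch, not a documented MetricHunter behavior.
    """
    params = {
        "q": f"language:{language}",  # GitHub search qualifier
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
        "page": page,
    }
    return f"{GITHUB_SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_repo_records(search_response):
    """Reduce a decoded search response (a dict) to simple records.

    The keys read here are fields of the GitHub repository object;
    which of them a real dataset should keep is an assumption.
    """
    records = []
    for item in search_response.get("items", []):
        records.append(
            {
                "full_name": item.get("full_name"),
                "clone_url": item.get("clone_url"),
                "stars": item.get("stargazers_count"),
                "forks": item.get("forks_count"),
            }
        )
    return records
```

In a pipeline of this kind, each `clone_url` would then be fetched locally so that a metric gathering tool such as SourceMonitor could be run over the source files, and the resulting measurements could be joined with the GitHub attributes collected above.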