Development of a method for ensuring the quality of comments in version control systems based on transformer models


Bibliographic details
Date: 2025
Authors: Семьонов, Б. О., Погорілий, С. Д.
Format: Article
Language: Ukrainian
Published: Institute for Information Recording of the NAS of Ukraine, 2025
Online access: http://drsp.ipri.kiev.ua/article/view/345503
Journal title: Data Recording, Storage & Processing

Description
Abstract: This study substantiates the need to improve the quality of commit messages in source code version control systems, which play a crucial role in collaborative software development and maintenance. Commit messages often serve as the primary source of contextual information for developers, code reviewers, and automated analysis tools. In practice, however, these messages may be incomplete, overly generic, or uninformative, which makes the change history harder to understand and can hinder further development. The task of automatically identifying low-quality commit messages and helping developers write more meaningful ones is therefore particularly relevant. Machine learning methods, in particular neural networks of various architectures, are applied to commit message filtering and classification. Neural networks are chosen for their ability to capture semantic nuances in short text fragments and to generalize patterns from large sets of repository metadata. The study presents a comparative analysis of Transformer-based language models (BERT, RoBERTa, and DistilBERT) used as binary classifiers for commit message quality filtering. The models were trained on a dataset of commit messages obtained through the GitHub REST API, which includes both high-quality and low-quality real-world examples. Model performance was evaluated using the Accuracy and F1-score metrics, which demonstrated the advantages of Transformer architectures in capturing contextual meaning. The study also confirms that Google Colab is an effective environment for prototyping and experimenting with machine learning models, owing to its accessible computing resources, integration with the Python ecosystem, and suitability for rapid iteration and evaluation. Tables: 7. Figures: 2. References: 20 titles.
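The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `accuracy_and_f1`, the label encoding (1 for a high-quality commit message, 0 for a low-quality one), the choice of the positive class, and the sample labels are all assumptions made for illustration.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` marks the class F1 is computed for; treating
    "high-quality" as positive is an assumption, not from the article.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no
    # true positives, false positives, or false negatives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels: 1 = high-quality message, 0 = low-quality message.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Accuracy counts every correct prediction equally, while F1 balances precision and recall on the positive class, so the two can diverge when the high-quality and low-quality classes are unevenly represented.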