Development a method for ensuring the quality of comments in version control systems based on transformer models



Bibliographic Details
Date:	2025
Main Authors:	Семьонов, Б. О., Погорілий, С. Д.
Format:	Article
Language:	Ukrainian
Published:	Інститут проблем реєстрації інформації НАН України (Institute for Information Recording of the National Academy of Sciences of Ukraine), 2025
Subjects:	AdamW algorithm, BERT, commit message, DistilBERT, F1-score, GitHub REST API, program source code, repository, RoBERTa, software, Transformer, version control system
Online Access:	http://drsp.ipri.kiev.ua/article/view/345503
Journal Title:	Data Recording, Storage & Processing
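
The abstract states that the training data were commit descriptions obtained through the GitHub REST API, but the record does not describe the collection procedure. The following is a minimal sketch of how such messages could be gathered, assuming GitHub's public "List commits" endpoint (GET /repos/{owner}/{repo}/commits) and only Python's standard library. The helper names commits_url, extract_messages, and fetch_messages are hypothetical and are not taken from the article.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional, Tuple

API_ROOT = "https://api.github.com"


def commits_url(owner: str, repo: str, page: int = 1, per_page: int = 100) -> str:
    """Hypothetical helper: URL for GitHub's "List commits" endpoint.

    GitHub caps per_page at 100, so larger values are clamped.
    """
    query = urllib.parse.urlencode({"per_page": min(per_page, 100), "page": page})
    return f"{API_ROOT}/repos/{owner}/{repo}/commits?{query}"


def extract_messages(payload: List[Dict]) -> List[Tuple[str, str]]:
    """Hypothetical helper: (sha, message) pairs from a decoded "List commits" response."""
    return [(item["sha"], item["commit"]["message"]) for item in payload]


def fetch_messages(owner: str, repo: str, page: int = 1,
                   token: Optional[str] = None) -> List[Tuple[str, str]]:
    """Hypothetical helper: fetch one page of commits and return their messages.

    Unauthenticated requests work for public repositories but have a much
    lower rate limit; pass a personal access token to raise it.
    """
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(commits_url(owner, repo, page), headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_messages(json.load(response))
```

For example, fetch_messages("octocat", "Hello-World") would return one page of (sha, message) pairs from a public repository, provided network access is available. The record does not say how the collected messages were labelled as high- or low-quality.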

Institution

Data Recording, Storage & Processing

collection	OJS
datestamp_date	2025-12-21T03:44:45Z
description	This study substantiates the importance of improving the quality of commit message descriptions in source code version control systems, which play a crucial role in collaborative software development and maintenance. Commit messages often serve as a primary source of contextual information for developers, code reviewers, and automated analysis tools. In practice, however, these descriptions may be incomplete, overly generic, or uninformative, which complicates understanding of the change history and can hinder further development. Automatically identifying low-quality commit messages and helping developers write more meaningful descriptions is therefore particularly relevant. Machine learning methods, in particular neural networks of various architectures, are applied to commit message filtering and classification. The use of neural networks is justified by their ability to capture semantic nuances within short text fragments and to generalize patterns from large sets of repository metadata. A comparative analysis of Transformer-based language models, such as BERT, RoBERTa, and DistilBERT, and of their application in binary classifiers for commit message quality filtering is presented. The models were trained on a dataset of commit descriptions obtained through the GitHub REST API, which includes both high-quality and low-quality real-world examples. Model performance was evaluated using the Accuracy and F1-score metrics, and the results demonstrated the advantages of Transformer architectures in capturing contextual meaning. The effectiveness of Google Colab as an environment for prototyping and experimenting with machine learning models was also confirmed, owing to its accessible computing resources, integration with the Python ecosystem, and suitability for rapid iteration and evaluation. Tabl.: 7. Fig.: 2. Refs: 20 titles.
first_indexed	2026-02-08T07:59:59Z
id	drspiprikievua-article-345503
last_indexed	2026-02-08T07:59:59Z
record_format	ojs
abstract (Ukrainian, translated)	The importance of solving the problem of improving the quality of descriptions of changes to program source code in the context of version control systems is substantiated. Machine learning methods, in particular neural networks of various architectures, are applied to filter the comments. The use of neural networks is justified by the need to automatically identify descriptions that accurately reflect the purpose of the changes made. A comparative analysis of models based on Transformer architectures, such as BERT, RoBERTa, and DistilBERT, and of their application in binary classifiers for filtering changes is carried out. The models were trained on a set of change descriptions obtained through the GitHub REST API. Model performance was evaluated using two metrics: accuracy (Accuracy) and the harmonic mean (F1-score). The effectiveness of the Google Colab environment for prototyping machine learning models is also confirmed.
keywords (Ukrainian, translated)	AdamW algorithm, BERT, commit message, DistilBERT, GitHub REST API, RoBERTa, Transformer, program source code, software, message about the changes made, repository, harmonic mean, version control system
date	2025-09-16
type	Article (application/pdf)
doi	10.35681/1560-9189.2025.27.2.345503
source	Data Recording, Storage & Processing; Vol. 27 No. 2 (2025); 38-51 (Russian title: Регистрация, хранение и обработка данных; Ukrainian title: Реєстрація, зберігання і обробка даних)
issn	1560-9189
full text (PDF)	http://drsp.ipri.kiev.ua/article/view/345503/334385
rights	Copyright (c) 2025 Реєстрація, зберігання і обробка даних (Data Recording, Storage & Processing)
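title_alt	Створення методу забезпечення якості коментарів у системах контролю версій на основі трансформерних моделей (Ukrainian; English: Creating a method for ensuring the quality of comments in version control systems based on transformer models)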
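
The Subjects list names the AdamW algorithm, presumably the optimizer used to fine-tune the classifiers; the record gives no hyperparameters. As an illustration only, one AdamW update (Adam with decoupled weight decay) can be written in plain Python. The defaults mirror those of torch.optim.AdamW and are assumptions, not values from the article; adamw_step is a hypothetical helper. In an actual fine-tuning run one would normally call torch.optim.AdamW(model.parameters(), lr=..., weight_decay=...) instead.

```python
import math


def adamw_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """Hypothetical helper: one AdamW update on flat lists of floats.

    t is the 1-based step count used for bias correction.
    Returns the updated (params, m, v) as new lists.
    """
    if t < 1:
        raise ValueError("t must be a 1-based step count")
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment (uncentred) estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias-corrected moments
        v_hat = v_i / (1 - beta2 ** t)
        # Decoupled weight decay: shrink the parameter directly instead of
        # adding weight_decay * p to the gradient (the difference from Adam + L2).
        p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) has magnitude close to 1. So with lr=0.1 and no weight decay, a parameter of 1.0 with gradient 0.5 moves to about 0.9. Adding weight_decay=0.1 shrinks it further, to about 0.89.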
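
The abstract reports Accuracy and F1-score for the binary quality classifiers, with F1-score being the harmonic mean of precision and recall. The following is a minimal sketch of both metrics in plain Python. The label encoding (for instance, 1 = low-quality message to flag) and the name binary_metrics are hypothetical and are not taken from the article.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: Accuracy, precision, recall, and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Consider y_true = [1, 1, 1, 0, 0, 0, 1, 0] and y_pred = [1, 0, 0, 0, 1, 0, 1, 0]. This gives 2 true positives, 1 false positive, and 2 false negatives. Accuracy is therefore 5/8 = 0.625, precision 2/3, recall 1/2, and F1 = 4/7 ≈ 0.571. With the same positive label, sklearn.metrics.accuracy_score and f1_score should give the same values.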


Similar Items