Comparison of the effectiveness of missing data imputation methods in the development of forecasting models
| Date: | 2025 |
|---|---|
| Main Author: | |
| Format: | Article |
| Language: | English |
| Published: | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 2025 |
| Subjects: | |
| Online Access: | http://journal.iasa.kpi.ua/article/view/301918 |
| Journal Title: | System research and information technologies |
| Institution: | System research and information technologies |
| Summary: | Missing data is a common issue in data analysis and machine learning. This article analyzes how the choice of missing data imputation method at the data preprocessing stage affects the quality of forecasting models. The methods compared are listwise deletion, mean imputation, and two implementations of multiple imputation, one in Python and one in R. The classifiers are Logistic Regression, Random Forest (RF), Support Vector Machine, and Light Gradient Boosting Machine (LGBM). Model quality is estimated using accuracy, precision, and recall. Two datasets were used, each posing a binary classification problem with its own target metric. The highest performance was achieved when the R implementation of multiple imputation was combined with the RF and LGBM classifiers. |
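
The two simplest methods named in the summary, listwise deletion and mean imputation, can be illustrated with a minimal sketch. This is not the article's code: the toy `rows` data and the function names `listwise_deletion` and `mean_imputation` are hypothetical, chosen only to show what each method does to a dataset with gaps. Multiple imputation is not covered here.

```python
from statistics import mean

# Hypothetical toy data (not from the article): each row is (age, income),
# and None marks a missing value.
rows = [
    (25.0, 50.0),
    (30.0, None),
    (None, 70.0),
    (40.0, 90.0),
]


def listwise_deletion(data):
    """Drop every row that contains at least one missing value."""
    return [row for row in data if all(v is not None for v in row)]


def mean_imputation(data):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(zip(*data))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(col_means[j] if v is None else v for j, v in enumerate(row))
        for row in data
    ]


deleted = listwise_deletion(rows)  # keeps only the fully observed rows
imputed = mean_imputation(rows)    # keeps all rows, with gaps filled in
```

Listwise deletion shrinks the dataset to the complete rows, while mean imputation preserves the row count at the cost of reducing the variance of the imputed columns. That trade-off is one reason comparisons like the one summarized above also include multiple imputation.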