Порівняння ефективності методів заповнення пропущених даних під час розроблення моделей прогнозування
Missing data is a common issue in data analysis and machine learning. This article analyzes the impact of missing data imputation methods during the data preprocessing stage on the quality of forecasting models. Selected methods are listwise deletion, mean imputation, and two implementations of the...
Gespeichert in:
| Datum: | 2025 |
|---|---|
| 1. Verfasser: | |
| Format: | Artikel |
| Sprache: | Englisch |
| Veröffentlicht: |
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
2025
|
| Schlagworte: | |
| Online Zugang: | http://journal.iasa.kpi.ua/article/view/301918 |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Назва журналу: | System research and information technologies |
Institution
System research and information technologies| Zusammenfassung: | Missing data is a common issue in data analysis and machine learning. This article analyzes the impact of missing data imputation methods during the data preprocessing stage on the quality of forecasting models. Selected methods are listwise deletion, mean imputation, and two implementations of the multiple imputation method in Python and R languages. Selected classifiers are Logistic Regression, Random Forest, Support Vector Machine, and Light Gradient Boosting Machine. The performance quality of forecasting models is estimated using accuracy, precision, and recall metrics. Two datasets were used as binary classification problems with different target metrics. The highest performance was achieved when the R implementation of the multiple imputation method was combined with RF and LGBM classifiers. |
|---|