Studying the relationship between tuberculosis and socioeconomic, medical, and demographic factors in Ukraine
Ukraine is currently experiencing a new, ongoing tuberculosis offensive. Our study analyzes the impact of various socioeconomic and medical factors, including the number of specialized hospitals, fluoroscopic examinations of the population, the number of healthcare workers, the level of alcohol and...
| Date: | 2025 |
|---|---|
| Main authors: | Nevinskyi, Denys; Martjanov, Dmytro; Semianiv, Ihor; Vyklyuk, Yaroslav |
| Format: | Article |
| Language: | English |
| Published: |
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
2025
|
| Subjects: | |
| Online access: | https://journal.iasa.kpi.ua/article/view/303481 |
| Journal title: | System research and information technologies |
Institution
System research and information technologies| _version_ | 1867334443227676672 |
|---|---|
| author | Nevinskyi, Denys Martjanov, Dmytro Semianiv, Ihor Vyklyuk, Yaroslav |
| author_facet | Nevinskyi, Denys Martjanov, Dmytro Semianiv, Ihor Vyklyuk, Yaroslav |
| author_institution_txt_mv | [
{
"author": "Denys Nevinskyi",
"institution": "Lviv Polytechnic National University, Lviv"
},
{
"author": "Dmytro Martjanov",
"institution": "Lviv Polytechnic National University, Lviv"
},
{
"author": "Ihor Semianiv",
"institution": "Bukovinian State Medical University, Chernivtsi"
},
{
"author": "Yaroslav Vyklyuk",
"institution": "Lviv Polytechnic National University, Lviv"
}
] |
| author_sort | Nevinskyi, Denys |
| baseUrl_str | http://journal.iasa.kpi.ua/oai |
| collection | OJS |
| datestamp_date | 2025-05-20T17:56:07Z |
| description | Ukraine is currently experiencing a new, ongoing wave of tuberculosis. Our study analyzes the impact of various socioeconomic and medical factors, including the number of specialized hospitals, fluoroscopic examinations of the population, the number of healthcare workers, the level of alcohol and drug abuse, and others, on the prevalence of tuberculosis among different demographic groups in Ukraine. Artificial intelligence methods made it possible to identify key factors contributing to the growth or decline in tuberculosis incidence. The results of the SHAP (SHapley Additive exPlanations) analysis, which offers a methodology for interpreting complex machine learning models, show the most important factors that influence the incidence of tuberculosis in Ukraine. The sensitivity analysis provided more detailed information, which confirmed the results of the SHAP analysis. |
| doi_str_mv | 10.20535/SRIT.2308-8893.2025.1.02 |
| first_indexed | 2025-07-17T10:28:29Z |
| format | Article |
| fulltext |
Publisher IASA at the Igor Sikorsky Kyiv Polytechnic Institute, 2025
Системні дослідження та інформаційні технології, 2025, № 1 19
UDC 004.02, 004.67, 004.891.3, 616.24-002.5-02:316.342.6:316.62:314(477)
DOI: 10.20535/SRIT.2308-8893.2025.1.02
STUDYING THE RELATIONSHIP BETWEEN TUBERCULOSIS
AND SOCIOECONOMIC, MEDICAL,
AND DEMOGRAPHIC FACTORS IN UKRAINE
D.V. NEVINSKYI, D.I. MARTJANOV, I.O. SEMIANIV, Y.I. VYKLYUK
Abstract. Ukraine is currently experiencing a new, ongoing wave of tuberculosis. Our study analyzes the impact of various socioeconomic and medical factors, including the number of specialized hospitals, fluoroscopic examinations of the population, the number of healthcare workers, the level of alcohol and drug abuse, and others, on the prevalence of tuberculosis among different demographic groups in Ukraine. Artificial intelligence methods made it possible to identify key factors contributing to the growth or decline in tuberculosis incidence. The results of the SHAP (SHapley Additive exPlanations) analysis, which offers a methodology for interpreting complex machine learning models, show the most important factors that influence the incidence of tuberculosis in Ukraine. The sensitivity analysis provided more detailed information, which confirmed the results of the SHAP analysis.
Keywords: artificial intelligence, tuberculosis, incidence, socio-demographic fac-
tors, medical factors, demographic factors.
RELEVANCE OF THE WORK
Ukraine is currently experiencing a new wave of tuberculosis. Under the current conditions of development of Ukrainian society, one of the important problems that needs to be addressed is the spread of tuberculosis, a disease that is closely related to socioeconomic, medical, and demographic factors [1]. As a social disease, tuberculosis mirrors the socioeconomic well-being of the country [2].
The ways in which tuberculosis spreads, its negative consequences for public health, and other aspects of the epidemic have long been the focus of research [3]. At the same time, the socioeconomic, medical, and demographic causes that influence the spread of tuberculosis in Ukrainian society remain largely unexplored.
A purely medical approach to analyzing the socioeconomic, medical, and demographic factors that affect the incidence of tuberculosis in Ukraine is insufficient for timely forecasting of the epidemic and for developing an appropriate plan to counter it. As a result, the incidence of tuberculosis remains a serious threat not only to the life and health of Ukrainian citizens but also to the WHO European Region [4].
Therefore, we used mathematical analysis with the use of artificial intelli-
gence to establish the relationship between tuberculosis and socioeconomic, med-
ical, and demographic factors in Ukraine.
D.V. Nevinskyi, D.I. Martjanov, I.O. Semianiv, Y. I. Vyklyuk
ISSN 1681–6048 System Research & Information Technologies, 2025, № 1 20
ANALYSIS OF RESEARCH
Researchers are actively studying and modeling the spread of tuberculosis [5]. Another study highlights how socioeconomic conditions contribute to the spread of tuberculosis [6]. In [7], the authors analyze how access to health care affects the effectiveness of tuberculosis control. They also consider how demographic changes affect the incidence of tuberculosis [8]. An overview of progress in the use of artificial intelligence (AI) in medicine is given in [9].
The use of artificial intelligence in tuberculosis research is becoming in-
creasingly popular due to its ability to analyze large data sets, identify complex
relationships, and predict epidemiological trends. In particular, [10] uses various
machine learning algorithms to predict the incidence of tuberculosis, which al-
lows for high accuracy predictions and identification of regions at high risk of
disease spread [11]. The authors have developed a deep learning-based system to
automatically detect major chest diseases, including tuberculosis, in X-rays [12].
Although this study focuses on COVID-19, the methodologies and technologies
they use can be adapted to monitor and predict the spread of tuberculosis, demon-
strating the potential of AI in global epidemic management [13]. In this review,
the authors discuss the possibilities of machine learning in the medical field, in-
cluding its ability to integrate and analyze large amounts of data on socioeco-
nomic factors to better understand their impact on the spread of tuberculosis.
However, there are currently no studies that examine the complex impact of
various factors on the spread of tuberculosis based on artificial intelligence tech-
nology.
Therefore, the purpose of our work is to analyze the impact of various so-
cioeconomic, medical, and demographic factors on the incidence of tuberculosis
among the urban and rural population of Ukraine, in order to identify key factors
that can contribute to the development of more effective strategies for controlling
and preventing the disease.
MATERIALS AND METHODS
Description of the dataset. The dataset for analyzing the impact of various socioeconomic, medical, and demographic factors on tuberculosis incidence consists of the fields described below and contains 400 records. The data were collected over the last 16 years and cover all regions of Ukraine. The dataset includes information on the number of specialized hospitals, the number of fluoroscopic examinations per 100,000 population, vaccination data, the number of bacterial excretors, the incidence among urban and rural residents, and the percentage of different demographic groups (workers, employees, healthcare workers, students, pupils, pensioners, unemployed, persons returned from prison, persons without permanent residence, private workers).
The dataset also includes indicators reflecting the level of alcohol abuse and drug use, the morbidity rate of doctors in specialized hospitals per 10 thousand healthcare workers, HIV/TB rates per 100 thousand people, cases of resistant TB, treatment failure, interrupted treatment, patients dropped out of follow-up, treatment outcomes for relapses and multidrug-resistant tuberculosis (MDR-TB), and the number of surgical interventions (lung and extrapulmonary TB surgeries).
Research methodology. The research consists of the following steps:
1. Correlation analysis. At the first stage of the study, correlation analysis
is used to identify statistical relationships between various factors (e.g., number of
hospitals, healthcare workers, vaccination rates) and TB incidence. This allows us
to determine which variables have a potential impact on the prevalence of the dis-
ease. The use of Pearson correlation coefficient helps to assess the strength and
direction of the interaction between variables.
2. Testing different models with cross-validation. The next step is to test different machine learning models: linear regression, decision trees, random forest, k-nearest neighbors (kNN), support vector machine (SVM), adaptive boosting (AdaBoost), stochastic gradient descent, and backpropagation neural networks. Model stability is checked with 5-fold cross-validation, where the data is divided into 5 subsets and the model is tested 5 times, each time using one subset as a test set and the others as training data. The consistency of the cross-validation results across folds served as an indicator of overfitting and guided the selection of hyperparameters. The following hyperparameters were selected: Linear Regression: Elastic Net regularization with L1/L2 = 50/50; Decision Trees and Random Forest: maximum depth d = 5; Nearest Neighbors Method: number of nearest neighbors k = 5; Support Vector Machine (SVM): parameters C = 0.8 and ε = 0.1; Adaptive Boosting: number of base models limited to n_estimators = 50; Stochastic Gradient Descent (SGD): Elastic Net regularization with L1/L2 = 50/50; Backpropagation: regularization coefficient 0.001 and Dropout d = 0.2.
3. Building an ensemble of models. Based on the obtained models, an en-
semble is built that combines the forecasts of the best models to improve the ac-
curacy and reliability of the results. The study used a stacking-based ensemble,
which allowed us to consider various aspects of the data and reduce the variability
of the forecast.
4. Analysis of an ensemble of models. This analysis evaluates the overall
performance of the model ensemble. It evaluates how the combination of models
performs compared to individual models, including an assessment of accuracy,
specificity, and other fit metrics.
5. Determining the importance of factors. Factor importance analysis is
conducted to identify the key variables that have the greatest impact on morbidity.
This may include the use of importance metrics provided by the algorithms that
are included in the ensemble model.
6. Sensitivity analysis. The final step of sensitivity analysis tests the robust-
ness of the model ensemble to changes in the data or in the model parameters.
This involves varying key parameters and assessing the impact of these changes
on the model results.
The study was conducted in the Orange environment. The data flow diagram
is shown in Fig. 1.
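The 5-fold cross-validation described in step 2 can be illustrated with a minimal Python sketch. It is not the Orange workflow used in the study: the function names `k_fold_indices` and `cross_validate`, the toy target values, and the mean-value "model" are all assumptions introduced only to show how each record is held out exactly once.

```python
import random

def k_fold_indices(n_records, k=5, seed=0):
    """Split record indices 0..n_records-1 into k disjoint test folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def cross_validate(n_records, fit, score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat k times."""
    scores = []
    for test_idx in k_fold_indices(n_records, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_records) if i not in held_out]
        model = fit(train_idx)
        scores.append(score(model, test_idx))
    return scores

# Toy data: 400 made-up target values (the dataset size quoted in the text).
targets = [float(i % 7) for i in range(400)]

def fit_mean(train_idx):
    # Hypothetical stand-in model: predicts the mean of the training targets.
    return sum(targets[i] for i in train_idx) / len(train_idx)

def mse(model, test_idx):
    return sum((targets[i] - model) ** 2 for i in test_idx) / len(test_idx)

fold_scores = cross_validate(len(targets), fit_mean, mse)
print(fold_scores)
```

Similar scores across the five folds suggest a stable model, while a large spread between folds is one sign of overfitting.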
RESULTS OF THE STUDY
Correlation analysis. Table 1 shows the results of the correlation analysis, which
presents the values of the coefficients of determination R2 for various factors that
may affect the incidence of tuberculosis. The coefficient of determination R2
measures the proportion of variation in a given variable that can be explained by
the independent variables in the model. The key conclusions from the table include:
• Bacterial excretion has the highest coefficient, R² = 0.641, indicating a strong relationship between the frequency of bacterial excretion in the population and the incidence of tuberculosis.
• HIV/TB (the ratio of HIV and TB incidence per 100,000 population) also has a significant coefficient of R² = 0.542, which emphasizes the link between these two diseases.
• Fluoroscopic examinations have a coefficient of R² = 0.501, which indicates the importance of regular medical examinations in detecting and controlling tuberculosis, especially in risk groups.
• Physician morbidity and surgical treatment also show relatively high R² values, which may reflect the impact of non-compliance with infection control conditions and the importance of surgery in some cases as an additional treatment method.
• The low R² coefficients for variables such as alcohol and drug abuse and demographic groups (e.g., pensioners, students, workers) indicate a less pronounced direct impact of these factors on morbidity compared to medical and epidemiological factors.
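The methodology names the Pearson correlation coefficient, which ranges from -1 to 1, whereas a coefficient of determination cannot be negative. The following Python sketch, using made-up numbers and a hypothetical helper `pearson_r`, illustrates the difference: the sign of r shows the direction of a relationship, and squaring it discards that direction. It makes no claim about how the values in Table 1 were computed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustration data, not values from the study.
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
incidence_up = [12.0, 15.0, 21.0, 24.0, 30.0]    # rises with the factor
incidence_down = [30.0, 24.0, 21.0, 15.0, 12.0]  # falls with the factor

r_up = pearson_r(factor, incidence_up)
r_down = pearson_r(factor, incidence_down)

# r keeps the direction of the relationship; r squared does not.
print(round(r_up, 3), round(r_down, 3))
print(round(r_up ** 2, 3), round(r_down ** 2, 3))
```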
Fig. 1. The scheme of information flows of the study
Table 1. Results of the correlation analysis
| Factor | R² |
|---|---|
| Bacterial excretion | 0.641 |
| HIV/TB (per 100 thousand) | 0.542 |
| Fluoroscopic examinations of the population (per 100 thousand) | 0.501 |
| Morbidity rate of doctors (per 10 thousand medical staff) | 0.48 |
| Surgical treatment (number of lung operations) | 0.468 |
| Resistant TB | 0.466 |
| Interrupted treatment | 0.433 |
| Unsuccessful treatment | 0.387 |
| Relapse rate (interrupted treatment) | 0.379 |
| Relapse rate (cured) | 0.378 |
| Expelled | 0.369 |
| Non-working (% of total) | 0.364 |
| MDR-TB (withdrawn) | 0.335 |
| Surgical treatment (total number of operations) | 0.317 |
| Relapse rate (unsuccessful treatment) | 0.311 |
| Relapse rate (discharged) | 0.308 |
| Pensioners (% of total) | -0.294 |
| Number of hospitals | 0.216 |
| Vaccinations carried out | 0.2 |
| R-treatment of MDR-TB (interrupted treatment) | 0.146 |
| Drug use (% of total) | 0.118 |
| Without a permanent place of residence (% of total) | -0.111 |
| Alcohol abuse (% of total) | -0.107 |
| Employees (% of total) | -0.091 |
| MDR-TB treatment (failed treatment) | 0.076 |
| Private employees (% of total) | -0.056 |
| Students (% of total) | 0.052 |
| Employees (% of total) | -0.047 |
| People who returned from places of deprivation of liberty (% of the total) | -0.019 |
| Students (% of total) | -0.01 |
| Medical workers (% of total) | 0.002 |
Testing different models by cross-validation. The next step was to analyze
the performance of the above machine learning models in the context of tubercu-
losis incidence prediction using the 5-fold cross-validation method. The main
parameters evaluated include the mean square error (MSE), root mean square
error (RMSE), mean absolute error (MAE), mean absolute percentage error
(MAPE), and coefficient of determination (R²). The results of the study are
presented in Table 2.
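For reference, the five evaluation metrics listed above can be computed as in the following Python sketch. The function name `regression_metrics` and the sample values are illustrative assumptions. MAPE is expressed as a fraction, which appears consistent with the scale of the MAPE column in Table 2, and it requires non-zero true values.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (as a fraction) and R2 for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    # R2 = 1 - (residual sum of squares) / (total sum of squares)
    r2 = 1.0 - (mse * n) / ss_total
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

# Made-up incidence values and predictions, for illustration only.
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [55.0, 58.0, 66.0, 83.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because RMSE is the square root of MSE, the two columns of a results table can be cross-checked against each other: for example, the square root of 62.99 is approximately 7.94.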
Table 2. Results of testing different machine learning models
| Machine Learning Model | MSE | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|---|
| Linear Regression | 108.04 | 10.39 | 7.87 | 0.14 | 0.71 |
| Neural Network | 111.52 | 10.56 | 7.54 | 0.14 | 0.70 |
| kNN | 265.11 | 16.28 | 11.93 | 0.26 | 0.29 |
| Decision Tree | 191.64 | 13.84 | 9.42 | 0.18 | 0.49 |
| Random Forest | 80.92 | 9.00 | 6.64 | 0.13 | 0.78 |
| SVM | 255.39 | 15.98 | 11.69 | 0.25 | 0.32 |
| AdaBoost | 72.49 | 8.51 | 6.22 | 0.12 | 0.81 |
| Stochastic Gradient Descent | 132.32 | 11.50 | 8.52 | 0.16 | 0.65 |
| Stacking | 62.99 | 7.94 | 5.78 | 0.11 | 0.83 |
As can be seen from the table, linear regression performed satisfactorily, with a coefficient of determination of R² = 0.71, indicating that the model is moderately adequate for this data set. However, its relatively high MSE and RMSE indicate potential deviations in predictions, especially on large and complex data. Neural networks are almost equal to linear regression in terms of R², but require more careful tuning and more computational resources. This model can be particularly sensitive to overfitting due to the complexity of its structure.
The kNN model showed the worst results, with R² = 0.29, which indicates low prediction accuracy. The high MSE and RMSE values show that the model does not fit the data well, possibly because of insufficient training data or poorly matched model parameters. Decision trees showed average results (R² = 0.49). This model is sensitive to changes in the data and can create complex structures that lead to overfitting, especially when tree pruning techniques are not applied.
Random Forest showed one of the best results (R² = 0.78), demonstrating high accuracy and reliability of predictions. Thanks to the ensemble approach, it controls overfitting well and is suitable for both classification and regression.
The support vector machine (SVM) method showed low efficiency (R² = 0.32) with high MSE and RMSE, which may indicate the need to refine and optimize the kernel parameters to improve prediction.
Adaptive boosting (AdaBoost) showed the highest performance (R² = 0.81) among all the individual models considered, with the lowest MSE and RMSE, indicating high accuracy and reliability. This model adapts well to different datasets, improving accuracy by iteratively reweighting the training data so that later base models concentrate on earlier errors.
Stochastic Gradient Descent performed moderately well (R² = 0.65), showing potential in situations where large datasets need to be optimized quickly. However, the method can be sensitive to noise in the data and requires careful tuning of the learning rate.
Building an ensemble of models. Based on the analysis of the performance of various machine learning models, it is proposed to create an ensemble of models using the Stacking method, including the following estimators: linear regression, neural network, adaptive boosting (AdaBoost), and random forest. These models were chosen because of their high performance and complementarity in solving forecasting problems.
Stacking technology has the following advantages:
1. Complementarity of models: Random Forest and AdaBoost have demonstrated high prediction accuracy, but they may overfit or be biased in certain scenarios. Linear regression, while less accurate, offers stability and good generalization. Neural networks work effectively with non-linear relationships in data. Stacking combines their predictions, which can improve the overall accuracy and reliability of forecasting.
2. Reduce variability and errors: Stacking uses a linear model to stack pre-
dictions from the underlying models. This not only preserves the strengths of each
model, but also effectively reduces the errors that can occur when using any sin-
gle model.
3. Improved generalization: Using the predictions of different models as in-
put to a “metamodel” in stacking allows the ensemble to generalize more effec-
tively on unseen data, which is critical for real-world forecasting tasks.
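The stacking idea described above, where a linear meta-model combines the predictions of base models, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the ensemble built in the study: two single-feature least-squares regressions stand in for the four base estimators, the data are synthetic, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def out_of_fold(xs, ys, k=5):
    """Predict each record with a base model trained on the other folds."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = intercept + slope * xs[i]
    return preds

def fit_meta(p1, p2, ys):
    """Least-squares weights (w1, w2) for y ~ w1*p1 + w2*p2, no intercept."""
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    t1 = sum(a * y for a, y in zip(p1, ys))
    t2 = sum(b * y for b, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det

def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

# Synthetic data: the target depends on two features.
feature_a = [float((i * 7) % 11) for i in range(30)]
feature_b = [float((i * 3) % 13) for i in range(30)]
target = [10.0 + 2.0 * a + 3.0 * b for a, b in zip(feature_a, feature_b)]

# Each base model sees only one feature; the meta-model combines them.
oof_a = out_of_fold(feature_a, target)
oof_b = out_of_fold(feature_b, target)
w1, w2 = fit_meta(oof_a, oof_b, target)
stacked = [w1 * p + w2 * q for p, q in zip(oof_a, oof_b)]

# Note: the stacked error is measured on the data used to fit the weights,
# so this comparison is optimistic; a nested split would be needed in practice.
print(mse(oof_a, target), mse(oof_b, target), mse(stacked, target))
```

Training the meta-model on out-of-fold predictions, rather than on predictions for records the base models have already seen, is what keeps the meta-model from simply rewarding the base model that overfits most.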
Analysis of an ensemble of models. As can be seen in Table 2, the Stacking model shows the best performance among all the methods considered:
• R²: The highest among all models, 0.83, indicating that the Stacking model explains approximately 83% of the variation in the response across the dataset, outperforming its closest competitor (AdaBoost) by 0.02.
• MSE and RMSE: Stacking has the lowest MSE (62.99) and RMSE (7.94), which indicates lower overall prediction errors compared to other models.
• MAE and MAPE: Also the lowest among all the models considered (MAE = 5.78 and MAPE = 0.11), which emphasizes the high accuracy of the forecasts created by the Stacking model.
Compared to individual models such as AdaBoost and Random Forest,
which also showed high accuracy rates, Stacking provides an additional im-
provement in accuracy and stability. This demonstrates the power of a combined
approach that considers different aspects of the data and the problem, while re-
ducing the likelihood of overfitting that can occur with individual models.
Thus, stacking turned out to be the most efficient method among the ana-
lyzed ones, showing the highest performance across all evaluation criteria. This
makes it an ideal candidate for use in real-world environments where high accu-
racy and reliability of forecasts are important.
Determining the importance of factors. The analysis of the importance of
the factors, performed using a stacked model, allows us to identify the key
variables that have the greatest impact on the incidence of tuberculosis.
Assessment of the importance of each factor in the model allows us to better
understand the dynamics of morbidity and optimize intervention strategies. Table 3 shows the importance of factors based on the stacked model.
As we can see from the data, the rate of bacterial shedding differs
significantly from the others, which is fully supported by the literature [14]. It
seems somewhat unexpected that the surgical treatment rate was among the
factors with a significant impact. According to the current global TB treatment
protocols, surgical treatment is indicated only in certain cases and is no longer
used as often as it used to be. All other factors undoubtedly have an impact on the
incidence of tuberculosis, which is confirmed by medical research data [1].
Table 3. Importance of factors in the stacking model
| Feature | Importance |
|---|---|
| Bacterial excretion | 0.405 |
| Fluoroscopic examinations of the population (per 100 thousand) | 0.059 |
| Surgical treatment (number of lung operations) | 0.026 |
| MDR-TB treatment (failed treatment) | 0.020 |
| Expelled | 0.016 |
| Morbidity rate of doctors (per 10 thousand medical staff) | 0.015 |
| Resistant TB | 0.015 |
| MDR-TB (withdrawn) | 0.014 |
| HIV/TB (per 100 thousand) | 0.011 |
| People who returned from places of deprivation of liberty (% of the total) | 0.009 |
| Non-working (% of total) | 0.008 |
| Alcohol abuse (% of total) | 0.007 |
| Pensioners (% of total) | 0.007 |
| R-treatment of MDR-TB (interrupted treatment) | 0.006 |
| Unsuccessful treatment | 0.006 |
| Vaccinations carried out | 0.006 |
| Number of hospitals | 0.005 |
| Relapse rate (unsuccessful treatment) | 0.005 |
| Without a permanent place of residence (% of total) | 0.005 |
| Relapse rate (cured) | 0.004 |
| Surgical treatment (total number of operations) | 0.004 |
| Students (% of total) | 0.004 |
| Relapse rate (discharged) | 0.004 |
| Relapse rate (interrupted treatment) | 0.004 |
| Employees (% of total) | 0.004 |
| Medical workers (% of total) | 0.003 |
| Private employees (% of total) | 0.003 |
| Interrupted treatment | 0.003 |
| Drug use (% of total) | 0.003 |
| Employees (% of total) | 0.002 |
| Students (% of total) | 0.002 |
As one can see from the results:
• Bacterial shedding is the most important factor (0.405), indicating a high level of influence on TB incidence. This emphasizes the need to focus on controlling bacterial shedding, as this indicator correlates with high incidence rates.
• Fluoroscopic examinations are the second most important factor (0.059). This confirms the importance of regular medical examinations, especially for risk groups, in detecting and preventing the disease, as they allow early identification of new cases of tuberculosis.
• Surgical treatment and treatment outcomes for MDR-TB are also important variables. This reflects the importance of surgical interventions in addition to chemotherapy, the importance of successful treatment in fighting resistant forms of TB, and the need to improve and optimize treatment strategies.
• The importance of physician morbidity and resistant TB is also relatively high, which may indicate the risk of non-compliance with infection control measures in healthcare facilities and challenges associated with the spread of resistant forms of TB.
• Less important, but still notable, variables include HIV/TB co-morbidity, return from prison, and socioeconomic indicators such as alcohol abuse. These variables indicate the complexity of the links between social conditions and disease, which requires a comprehensive approach to community health.
Sensitivity analysis. Sensitivity analysis and SHAP analysis are important
tools for analyzing the spread of tuberculosis, which help to better understand the
mechanisms of the model and its response to changes in input data.
SHAP (SHapley Additive exPlanations) analysis offers a methodology for
interpreting complex machine learning models. It allows one to identify the con-
tribution of each factor to the model’s prediction, which is crucial for transpar-
ency and clarity in medical and policy decision-making. In the context of tubercu-
losis, SHAP analysis helps to identify which factors are most important for
disease incidence, which can help to develop targeted interventions.
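The principle behind SHAP can be illustrated with exact Shapley values for a very small model. The sketch below is an assumption-laden illustration rather than the tool used in the study: it enumerates all feature coalitions, replaces "absent" features with a single baseline vector, and uses a made-up three-feature function `toy_model`. Practical SHAP implementations use approximations and a background dataset instead of full enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(features):
    # Hypothetical predictor with one interaction term (a * c).
    a, b, c = features
    return 10.0 + 0.5 * a + 0.1 * b + 0.01 * a * c

x = [40.0, 20.0, 30.0]         # record being explained (made-up values)
baseline = [20.0, 10.0, 10.0]  # reference record (made-up values)
phi = shapley_values(toy_model, x, baseline)

# Additivity: the contributions sum to the difference between the
# prediction for x and the prediction for the baseline.
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The additivity shown in the last line is the property that lets a SHAP plot be read as a decomposition of each prediction relative to the average prediction.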
Sensitivity analysis is used to assess the stability and reliability of predictive
models by determining how they respond to changes in input parameters. In the
context of this study, this analysis allows us to test how small changes in factors,
such as the number of medical examinations or demographic composition, can
affect the model’s conclusions. This is critical to ensure the accuracy and repro-
ducibility of the results, especially in settings where models may be used to sup-
port public health decisions.
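A one-at-a-time sensitivity analysis of this kind can be sketched as follows. The code is a minimal illustration with assumed names (`sensitivity_curves`, `average_curve`), a made-up saturating model, and made-up records. It varies one input over a grid for every record while holding the other inputs fixed, and then averages the resulting curves.

```python
import math

def sensitivity_curves(model, rows, feature_index, grid):
    """For each row, vary one feature over `grid` and record the prediction change."""
    curves = []
    for row in rows:
        reference = model(row)
        curve = []
        for value in grid:
            varied = list(row)
            varied[feature_index] = value
            curve.append(model(varied) - reference)
        curves.append(curve)
    return curves

def average_curve(curves):
    """Point-wise mean of the individual curves."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]

def toy_model(features):
    # Hypothetical model: saturating response to the first feature.
    first, second = features
    return 20.0 * math.log1p(first) + 0.5 * second

rows = [[10.0, 5.0], [30.0, 8.0], [50.0, 2.0]]      # made-up records
grid = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]   # values tried for feature 0

curves = sensitivity_curves(toy_model, rows, 0, grid)
mean_curve = average_curve(curves)
print(mean_curve)
```

Plotting one curve per record together with the averaged curve shows both the typical response of the model and how much individual records deviate from it.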
Fig. 2 shows the SHAP analysis of the stacking model. The graph shows the most important factors of the model. Each point on the graph corresponds to a SHAP value for one factor and one record. The SHAP value is a measure of how much each factor influences the model outcome. A higher absolute SHAP value (greater deviation from the center of the graph) means that the factor value has a greater impact on the prediction. Positive SHAP values (points to the right of the center) correspond to feature values that increase the prediction. The SHAP value shows how far the feature value moves the predicted value away from the average prediction. The colors represent the value of each factor: red represents a higher feature value and blue represents a lower value. The color range is determined from all the values of that feature in the dataset. As can be seen from Fig. 2, the results of the SHAP analysis fully confirm the importance of the factors.
[Figure: SHAP summary plot. Horizontal axis: Impact on model output, from –26 to 26. Features on the vertical axis (labels truncated in the source): R-tattoo of M.., Resistant TB, R-tattoo of M.., HIV/TB (per 100..., Morbidity..., Expelled., Bacterial excretion, Fluoroscopic..., Non-working (%..., The surgical treatment of the…]
Fig. 2. SHAP analysis of the stacking model
Sensitivity analysis provides more detailed information. Figs. 3–5 show the dependence of changes in tuberculosis incidence on changes in the most important factors.
Each figure contains an individual sensitivity curve for every row in the dataset. The yellow curve shows the average over all records.
[Figure: individual sensitivity curves. Vertical axis: ΔP(Active pulmonary TB cases among the total population of Ukraine >= 56.15). Horizontal axis: Bacterial excretion, from 10 to 70.]
Fig. 3. The dependence of changes in tuberculosis incidence on bacterial shedding per 100 thousand people
[Figure: individual sensitivity curves. Vertical axis: ΔP(Active pulmonary TB cases among the total population of Ukraine >= 56.15). Horizontal axis: Fluoroscopic examinations of the population (per 100 thousand), from 10 to 70.]
Fig. 4. The dependence of changes in tuberculosis incidence on fluoroscopic examinations of the population (per 100 thousand)
The logarithmic growth of the incidence shows a rapid increase against the
background of an increase in bacterial shedding, but then a stable saturation level
is determined. From a medical point of view, this is explained by the fact that ac-
tive bacterial shedders quickly infect their contacts, and then the process of infec-
tion spread is suspended until new active patients start infecting others. This
points to the importance of the efforts of health care systems in developed coun-
tries, which are primarily aimed at identifying and starting treatment of patients
with bacterial excretion as soon as possible. Such patients pose a danger to others,
often without realizing it. One undetected patient can infect 10 to 15 people who
are in close daily contact with him or her. Thus, the result fully confirms the
WHO epidemiological studies.
The linear increase in morbidity against the background of the fluoroscopic
examination rate demonstrates a gradual, steady increase in the number of active
TB patients. The importance of fluoroscopic examinations is confirmed by the
latest WHO recommendations, especially the statement that fluoroscopic exami-
nations of the population should focus on high-quality screening of risk groups
rather than on random screening of everyone. Since Ukraine still has a quite high
incidence of tuberculosis, and the number of internally displaced persons reached
4.9 million during the war period, all these people can be considered a risk group.
The importance of regular fluoroscopic preventive examinations has been con-
firmed by numerous studies [15], and the fact that the sensitivity analysis ranked
this indicator second in terms of its impact on morbidity is logical and under-
standable for the medical community.
An analysis of the sensitivity of the active TB incidence rate to the pulmo-
nary tuberculosis surgical treatment rate shows a logarithmic increase at the be-
ginning and a rapid transition to a stable level. This is due to the achievement of
drug-free treatment of tuberculosis over a certain period. The number of surgical
treatments for pulmonary tuberculosis is decreasing every year, but there are no
large studies on the correlation of this indicator with the incidence of pulmonary
Δ
P
(A
ct
iv
e
pu
lm
on
ar
y
T
B
c
as
es
a
m
on
g
th
e
to
ta
l
po
pu
la
ti
on
o
f
U
kr
ai
ne
>
=
5
6.
15
)
0,
0
0
1.
0
0 20 40 60 80 100 120 140 160 80 200 220 240 260 280 300 320 340 360
Surgical treatment (easy number of operations)
Fig. 5. Analysis of the sensitivity of tuberculosis incidence to the rate of surgical treatment
of pulmonary tuberculosis (number of surgeries)
D.V. Nevinskyi, D.I. Martjanov, I.O. Semianiv, Y. I. Vyklyuk
ISSN 1681–6048 System Research & Information Technologies, 2025, № 1 30
tuberculosis [16]. The sensitivity analysis demonstrated exactly these results
for the surgical intervention rate, as we performed statistical processing of
the data from 2007 onward, and for most of this sixteen-year period surgical
treatment was performed along with chemotherapy for tuberculosis.
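The one-at-a-time procedure behind a sensitivity curve of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `toy_model`, its coefficients, the feature names, and the baseline values are all hypothetical stand-ins, chosen only so that the curve rises quickly and then levels off. The article does not publish its fitted model.

```python
import math


def toy_model(features):
    """Hypothetical stand-in for a fitted model (NOT the authors' model).

    Returns a made-up incidence score that rises quickly with the number
    of surgeries and then saturates.
    """
    saturating = 6.0 * (1.0 - math.exp(-features["surgeries"] / 40.0))
    return 50.0 + saturating - 0.01 * features["examinations"]


def sensitivity_curve(predict, baseline, name, grid):
    """Vary one feature over `grid`, hold the others at `baseline`,
    and record the change in prediction relative to the baseline."""
    reference = predict(baseline)
    curve = []
    for value in grid:
        point = dict(baseline)  # copy so the baseline is never mutated
        point[name] = value
        curve.append((value, predict(point) - reference))
    return curve


# Hypothetical baseline scenario; the numbers are illustrative only.
baseline = {"surgeries": 0, "examinations": 300}
grid = list(range(0, 361, 20))  # 0..360, the range of the x-axis in Fig. 5
curve = sensitivity_curve(toy_model, baseline, "surgeries", grid)

# Print the first and last few points to see the rise and the plateau.
for value, delta in curve[:3] + curve[-2:]:
    print(f"surgeries={value:3d}  change in prediction={delta:.3f}")
```

With a real fitted model, `predict` would wrap the model's prediction call and `baseline` would typically hold the mean or median of each feature; both choices are assumptions that affect the shape of the curve.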
CONCLUSIONS
The use of artificial intelligence to analyze socioeconomic, medical, and
demographic data has helped to identify the main factors contributing to the
incidence of tuberculosis in Ukraine. In particular, the analysis confirmed the
significant impact of the number of specialized hospitals, fluoroscopic
examinations of the population, and the frequency of bacterial excretion on the
incidence rate.
The development and validation of machine learning models, including linear
regression, random forests, and adaptive boosting, allowed for accurate
forecasting of tuberculosis incidence. The use of 5-fold cross-validation made
the performance estimates more reliable and indicated that the predictions are
stable and accurate across different demographic groups.
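A comparison of this kind can be sketched with scikit-learn as shown below. The sketch is illustrative only: the data are synthetic and randomly generated, and the hyperparameters and the R^2 scoring metric are assumptions, since this part of the article does not state them.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption): 120 samples, 4 numeric features,
# with a mostly linear target plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=120)

# The three model families named in the text; settings are assumptions.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "adaptive_boosting": AdaBoostRegressor(n_estimators=50, random_state=0),
}

# 5 folds: every sample is used for validation exactly once.
folds = KFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    # One R^2 value per fold, computed on data the model did not train on.
    scores[name] = cross_val_score(model, X, y, cv=folds, scoring="r2")
    print(f"{name}: mean R^2 = {scores[name].mean():.3f} "
          f"(std {scores[name].std():.3f})")
```

The spread of the per-fold scores, not only their mean, is what indicates how stable a model is across different subsets of the data.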
The results of the SHAP analysis, which offers a methodology for interpreting
complex machine learning models, show the most important factors that influence
the incidence of tuberculosis in Ukraine, with bacterial excretion rates and
fluoroscopic examinations of the population having the greatest impact.
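The idea behind SHAP attributions can be illustrated with an exact Shapley computation on a tiny model. The sketch below is a simplification and not the authors' pipeline: it replaces absent features with a single fixed baseline instead of averaging over background data as SHAP implementations usually do, and `toy_model`, its coefficients, and all input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Enumerates every subset of features, so it is only practical for a
    handful of features.
    """
    n = len(x)

    def value(present):
        # Features in `present` take their actual value; the rest stay at baseline.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                phi[i] += weight * (value(present | {i}) - value(present))
    return phi


def toy_model(z):
    """Hypothetical model with one interaction term (NOT the authors' model)."""
    excretion, examinations, hospitals = z
    return 40.0 + 0.3 * excretion - 0.05 * examinations + 0.01 * excretion * hospitals


# Illustrative inputs only; none of these numbers come from the article.
x = [60.0, 400.0, 30.0]
baseline = [50.0, 300.0, 20.0]
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["excretion", "examinations", "hospitals"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions add up to the difference between the prediction at `x` and the prediction at `baseline`, which is the property that makes Shapley-based rankings of factors comparable across features.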
Interpretation of complex models through SHAP analysis and sensitivity
analysis provided a deep understanding of the impact of individual factors,
allowing for the formulation of targeted strategies for TB control and
prevention. This creates the basis for informed decision-making in the field of
public health and optimization of health care resources.
REFERENCES
1. S.S. Chiang et al., “Clinical manifestations and epidemiology of adolescent
tuberculosis in Ukraine,” ERJ Open Res., 6(3):00308-2020, 2020. doi:
10.1183/23120541.00308-2020
2. I. Margineanu et al., “TB therapeutic drug monitoring - analysis of opportunities in
Romania and Ukraine,” Int. J. Tuberc. Lung Dis., 27(11), pp. 816–821, 2023. doi:
10.5588/ijtld.22.0667
3. O.S. Shevchenko, L.D. Todoriko, I.A. Ovcharenko, O.O. Pogorelova, and
I.O. Semianiv, “A mathematical model for predicting the outcome of treatment of
multidrug-resistant tuberculosis,” Wiad. Lek., 74(7), pp. 1649–1654, 2021. doi:
10.36740/WLek202107117
4. D. Butov et al., “National survey on the impact of the war in Ukraine on TB
diagnostics and treatment services in 2022,” Int. J. Tuberc. Lung Dis., 27(1),
pp. 86–88, 2023. doi: 10.5588/ijtld.22.0563
5. K. Lönnroth, E. Jaramillo, B.G. Williams, C. Dye, and M. Raviglione, “Drivers of
tuberculosis epidemics: the role of risk factors and social determinants,” Soc. Sci.
Med., 68(12), pp. 2240–2246, 2009. doi: 10.1016/j.socscimed.2009.03.041
6. R. Atun, D.E.C. Weil, M.T. Eang, and D. Mwakyusa, “Health-system
strengthening and tuberculosis control,” The Lancet, 375(9732), pp. 2169–2178,
2010. doi: 10.1016/S0140-6736(10)60493-X
7. M.A. Mujtaba et al., “Demographic and Clinical Determinants of Tuberculosis and
TB Recurrence: A Double-Edged Retrospective Study from Pakistan,” J. Trop.
Med., vol. 2022, article ID 4408306, 2022. doi: 10.1155/2022/4408306
8. E.J. Topol, “High-performance medicine: the convergence of human and artificial
intelligence,” Nat. Med., vol. 25, pp. 44–56, 2019. doi:
10.1038/s41591-018-0300-7
9. P. Farmer, “The major infectious diseases in the world--to treat or not to treat?” N.
Engl. J. Med., 345(3), pp. 208–210, 2001. doi: 10.1056/NEJM200107193450310
10. N. Tang et al., “Machine Learning Prediction Model of Tuberculosis Incidence
Based on Meteorological Factors and Air Pollutants,” Int. J. Environ. Res. Public
Health, 20(5), 3910, 2023. doi: 10.3390/ijerph20053910
11. E.J. Hwang et al., “Development and Validation of a Deep Learning-Based
Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
[published correction appears in JAMA Netw Open. 2019 Apr 5;2(4):e193260],”
JAMA Netw Open, 2(3):e191095, 2019. doi: 10.1001/jamanetworkopen.2019.1095
12. S. Tuli, S. Tuli, R. Tuli, and S.S. Gill, “Predicting the growth and trend of
COVID-19 pandemic using machine learning and cloud computing,” Internet of Things,
11:100222, 2020. doi: 10.1016/j.iot.2020.100222
13. A. Rajkomar, J. Dean, and I. Kohane, “Machine learning in medicine,” N. Engl. J.
Med., 380(14), pp. 1347–1358, 2019. doi: 10.1056/NEJMra1814259
14. K.E. Wiens et al., “Global variation in bacterial strains that cause tuberculosis
disease: a systematic review and meta-analysis,” BMC Med., 16(1), article no. 196,
2018. doi: 10.1186/s12916-018-1180-x
15. V. Smelov et al., “Rationale and Purpose: The FLUTE Study to Evaluate
Fluorography Mass Screening for Tuberculosis and Other Diseases, as Conducted in
Eastern Europe and Central Asia Countries,” Int. J. Environ. Res. Public Health,
19(14), 8706, 2022. doi: 10.3390/ijerph19148706
16. R. Zaleskis, A.W. Mariani, F. Inzirillo, and I. Vasilyeva, “The Role of Surgery in
Tuberculosis Management: Indications and Contraindications,” in G.B. Migliori,
M.C. Raviglione (eds) Essential Tuberculosis. Springer, Cham, 2021. doi:
10.1007/978-3-030-66703-0_15
Received 10.05.2024
INFORMATION ON THE ARTICLE
Denys V. Nevinskyi, ORCID: 0000-0002-0962-072X, Lviv Polytechnic National University, Ukraine, e-mail: nevinskiy90@gmail.com
Dmytro I. Martjanov, ORCID: 0009-0003-3919-4412, Lviv Polytechnic National University, Ukraine, e-mail: d.martjnoff@gmail.com
Ihor O. Semianiv, ORCID: 0000-0003-0340-0766, Bukovinian State Medical University, Ukraine, e-mail: igor_semianiv@bsmu.edu.ua
Yaroslav I. Vyklyuk, ORCID: 0000-0003-4766-4659, Lviv Polytechnic National University, Ukraine, e-mail: vyklyuk@ukr.net
STUDYING THE RELATIONSHIP BETWEEN TUBERCULOSIS AND SOCIOECONOMIC,
MEDICAL, AND DEMOGRAPHIC FACTORS IN UKRAINE / D.V. Nevinskyi,
D.I. Martjanov, I.O. Semianiv, Y.I. Vyklyuk
Abstract. Ukraine is currently experiencing a new wave of tuberculosis.
This study analyzes the impact of various socioeconomic and medical
factors, including the number of specialized hospitals, fluoroscopic
examinations of the population, the number of healthcare workers, the level of
alcohol and drug abuse, and others, on the prevalence of tuberculosis among
different demographic groups in Ukraine. The use of artificial intelligence
methods made it possible to identify the key factors contributing to the growth
or decline in tuberculosis incidence. The results of the SHAP (SHapley Additive
exPlanations) analysis, which offers a methodology for interpreting complex
machine learning models, show the most important factors that influence
the incidence of tuberculosis in Ukraine. The sensitivity analysis carries more
important information and confirmed the indicators obtained in the SHAP analysis.
Keywords: artificial intelligence, tuberculosis, incidence, socio-demographic
factors, medical factors, demographic factors.
|
| id | journaliasakpiua-article-303481 |
| institution | System research and information technologies |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2025-09-17T09:26:02Z |
| publishDate | 2025 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/36/b42820f6b278760b8c64ed6d2f6ae536.pdf |
| spelling | journaliasakpiua-article-303481 2025-05-20T17:56:07Z Studying the relationship between tuberculosis and socioeconomic, medical, and demographic factors in Ukraine Nevinskyi, Denys Martjanov, Dmytro Semianiv, Ihor Vyklyuk, Yaroslav The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-03-28 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/303481 10.20535/SRIT.2308-8893.2025.1.02 System research and information technologies; No. 1 (2025); 19-31 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/303481/318900 |
| title | Вивчення зв’язку між туберкульозом та соціально-економічними, медичними, демографічними чинниками в Україні |
| title_alt | Studying the relationship between tuberculosis and socioeconomic, medical, and demographic factors in Ukraine |
| topic_facet | штучний інтелект туберкульоз захворюваність соціально-демографічні чинники медичні чинники демографічні чинники artificial intelligence tuberculosis incidence socio-demographic factors medical factors demographic factors |
| url | https://journal.iasa.kpi.ua/article/view/303481 |