Інформаційна система для оцінювання інформативності ознак епідемічного процесу
The primary objective of this study is to assess the informativeness of various parameters influencing epidemic processes utilizing the Shannon and Kullback–Leibler methods. These methods were selected based on their foundation in the principles of information theory and their extensive application...
Gespeichert in:
| Datum: | 2023 |
|---|---|
| Hauptverfasser: | , , , , , , , |
| Format: | Artikel |
| Sprache: | Englisch |
| Veröffentlicht: |
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
2023
|
| Schlagworte: | |
| Online Zugang: | https://journal.iasa.kpi.ua/article/view/297411 |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Назва журналу: | System research and information technologies |
| Завантажити файл: | |
Institution
System research and information technologies| _version_ | 1866302948466753536 |
|---|---|
| author | Bazilevych, Kseniia Kyrylenko, Olena Parfenyuk, Yurii Yakovlev, Sergiy Krivtsov, Serhii Meniailov, Ievgen Kuznietcova, Victoriya Chumachenko, Dmytro |
| author_facet | Bazilevych, Kseniia Kyrylenko, Olena Parfenyuk, Yurii Yakovlev, Sergiy Krivtsov, Serhii Meniailov, Ievgen Kuznietcova, Victoriya Chumachenko, Dmytro |
| author_sort | Bazilevych, Kseniia |
| baseUrl_str | http://journal.iasa.kpi.ua/oai |
| collection | OJS |
| datestamp_date | 2024-02-01T21:03:07Z |
| description | The primary objective of this study is to assess the informativeness of various parameters influencing epidemic processes utilizing the Shannon and Kullback–Leibler methods. These methods were selected based on their foundation in the principles of information theory and their extensive application in machine learning, statistics, and other relevant domains. A comparative analysis was performed between the results acquired from both methods, and an information system was designed to facilitate the uploading of data samples and the calculation of factor informativeness impacting the epidemic processes. The findings revealed that certain features, such as “Chronic lung disease,” “Chronic kidney disease,” and “Weakened immunity,” did not carry significant information for further analysis and hindered the forecasting process, as per the data set examined. The developed information system efficiently supports the assessment of feature informativeness, thereby aiding in the comprehensive analysis of epidemic processes and enabling the visualization of the results. This study contributes to the current body of knowledge by providing specific examples of applying the described algorithmic models, comparing various methods and their outcomes, and developing a supportive tool for analyzing epidemic processes. |
| doi_str_mv | 10.20535/SRIT.2308-8893.2023.4.08 |
| first_indexed | 2025-07-17T10:28:26Z |
| format | Article |
| fulltext |
K. Bazilevych, O. Kyrylenko, Y. Parfeniuk, S. Yakovlev, S. Krivtsov, I. Meniailov, V. Kuznietcova,
D. Chumachenko, 2023
100 ISSN 1681–6048 System Research & Information Technologies, 2023, № 4
UDC 004.942:614.4(460)
DOI: 10.20535/SRIT.2308-8893.2023.4.08
INFORMATION SYSTEM FOR ASSESSING THE
INFORMATIVENESS OF AN EPIDEMIC PROCESS FEATURES
K. BAZILEVYCH, O. KYRYLENKO, Y. PARFENIUK, S. YAKOVLEV,
S. KRIVTSOV, I. MENIAILOV, V. KUZNIETCOVA, D. CHUMACHENKO
Abstract. The primary objective of this study is to assess the informativeness of
various parameters influencing epidemic processes utilizing the Shannon and Kull-
back–Leibler methods. These methods were selected based on their foundation in
the principles of information theory and their extensive application in machine
learning, statistics, and other relevant domains. A comparative analysis was per-
formed between the results acquired from both methods, and an information system
was designed to facilitate the uploading of data samples and the calculation of factor
informativeness impacting the epidemic processes. The findings revealed that cer-
tain features, such as “Chronic lung disease,” “Chronic kidney disease,” and “Weak-
ened immunity,” did not carry significant information for further analysis and hin-
dered the forecasting process, as per the data set examined. The developed
information system efficiently supports the assessment of feature informativeness,
thereby aiding in the comprehensive analysis of epidemic processes and enabling the
visualization of the results. This study contributes to the current body of knowledge
by providing specific examples of applying the described algorithmic models, com-
paring various methods and their outcomes, and developing a supportive tool for
analyzing epidemic processes.
Keywords: information system, epidemic process, informativeness of features,
Shannon method, Kullback–Leibler method.
INTRODUCTION
Predicting morbidity is an essential task in health care and public health. The use
of machine learning in the analysis of epidemic processes is relevant in contem-
porary conditions, as it allows for rapid and efficient processing of large volumes
of data and making accurate forecasts [1]. This helps reduce the consequences of
epidemics and ensures a more effective fight against diseases. Using machine
learning models helps predict morbidity with high accuracy [2].
In the modern world, especially considering the current situation related to
the COVID-19 pandemic, the theme of analyzing data on epidemic processes re-
mains extremely relevant and critically important. Data analysis is an essential
tool that plays a key role and helps understand the spread of disease [3], identify
trends [4], identify risk groups of the population [5], evaluate the effectiveness of
control measures [6], imagine the scale of the problem [7], and predict the future
development of epidemics [8]. It helps scientists, doctors, and relevant authorities
make informed decisions and develop strategies for effective epidemic control [9].
It is also difficult to overestimate the importance of timely medical diagnos-
tics in managing epidemic processes. Rapid and accurate disease diagnosis is
a key factor for successful control and management of epidemics [10]. Ensuring
timely diagnostics allows diagnosing and isolating sick people, starting treatment,
Information system for assessing the informativeness of an epidemic process features
Системні дослідження та інформаційні технології, 2023, № 4 101
taking necessary preventive measures and vaccination, and taking strategic steps
to reduce the spread of the disease.
Laboratory tests are one of the main tools for medical diagnostics of epi-
demic diseases [11]. They allow for detecting the presence of a pathogenic agent,
determining its characteristic properties, and establishing a diagnosis. For exam-
ple, in the case of the COVID-19 pandemic, testing for the SARS-CoV-2 virus is
crucial for detecting infected individuals, even when they do not show symptoms.
This helps to take appropriate control measures and preventive strategies.
Many modern healthcare facilities have information systems for storing
various medical data about patients' health, used by doctors for diagnosing patho-
logical processes [12]. However, when analyzing medical data, identifying pat-
terns, and extracting it, one faces the problem of dimensionality. The dimension-
ality of stored data, determined by the number of different features describing the
patient's health status, is vast and sometimes reaches several tens and hundreds of
indicators [13].
Evaluating informativeness is essential for analyzing epidemic process data,
as it allows for determining the significance of various factors and relationships
associated with diseases [14]. This helps to identify key factors affecting the
spread of epidemics and make effective decisions regarding their prevention and
treatment. Informativeness evaluation also helps detect complex relationships be-
tween different factors and determine which has the most significant impact on
epidemic processes [15]. This allows for making more accurate predictions and
effective decisions regarding epidemic response.
Therefore, the problem of reducing the dimensionality of the feature space
and identifying the most informative features is a very relevant task of epidemic
process data analysis.
The aim of the paper is to develop the information system for evaluation of
the factors’ informativeness for healthcare data.
Research is part of a complex intelligent information system for epidemiol-
ogical diagnostics, the concept of which is discussed in [16, 17].
2. MATERIALS AND METHODS
2.1. Informativeness of features
The informativeness of a feature is an indicator of its significance or usefulness
for solving a specific task or problem. This is an essential concept in many areas,
including machine learning, statistics, signal processing, and many others [18].
The informativeness of features is assessed depending on their ability to classify
or predict the target variable. More informative features have a greater impact on
the model and provide more significant information for the separation or predic-
tion of classes.
Diagnostic features are specific symptoms, indicators, or characteristics used
to diagnose a disease, condition, or problem [19]. In medicine, diagnostic features
help doctors determine a disease or condition based on examination, patient sur-
veys, laboratory tests, examinations, images, and other studies. Diagnostic fea-
tures may include such indicators:
Physical symptoms: for example, pain, pulsation, swelling, bleeding, skin
color change, etc.
K. Bazilevych, O. Kyrylenko, Y. Parfeniuk, S. Yakovlev, S. Krivtsov, I. Meniailov, V. Kuznietcova, D. Chumachenko
ISSN 1681–6048 System Research & Information Technologies, 2023, № 4 102
Behavioral symptoms: for example, nervousness, depression, irritation,
inability to concentrate, sleep change, appetite change, etc.
Laboratory results: such as cell count, hormone level, substance concen-
tration in the blood or urine, or results of other analyses.
Imaging: results of X-rays, CT scans, MRI, or other techniques that may
show changes in the structure or function of organs.
Anamnesis: information obtained from the patient about their medical his-
tory, symptoms, duration, and nature of the disease.
Genetic research: determining the presence or absence of certain genetic
mutations or variants.
2.2. Problem formulation of feature space reduction
The application of modern information technologies in medicine contributes to
accumulating large volumes of medical data, which are stored and processed us-
ing medical information systems (MIS). These data contain medical knowledge
that can be extracted and used for decision-making, such as diagnosing pathologi-
cal processes [20]. The dimensionality of the stored data, defined by the number
of different features describing the patient's health status, is vast and sometimes
reaches several tens and hundreds of indicators. Therefore, the problem of reduc-
ing the dimensionality of the feature space and highlighting the most informative
features is very relevant for MIS development.
Let be a set of objects, and },,{ ,21 nxxxX be the finite set of quanti-
tative features of these objects. For any object , its feature descrip-
tion )}(,),(),({ 21 nxxx is known as a n -dimensional vector, where this
vector's ( аi )-th coordinate equals the ( аi )-th feature's value. The set of fea-
ture descriptions of objects for a given sample of objects A is given as a ma-
trix of size nA || , a table “object – feature”. Let )(ZI be the measure of infor-
mativeness of the subset of features XZ , defined on A . It is necessary to
select some subset XZ * from all different subsets of the set X, such that
)(max)( * ZIZI
XZ
.
The task of features selection is computationally complex; as for nX || , a
permutation of all different subsets XZ requires )2( nO time.
2.3. Kullback–Leibler Method
The Kullback–Leibler method is a statistical approach for measuring the diver-
gence between two probability distributions. This method is popular in many
fields, including statistics, machine learning, and information theory [21]. Using
the Kullback–Leibler method, a measure is calculated that gauges the divergence
between two distributions to assess the informativeness of a feature.
Typically, two distributions are input into the Kullback–Leibler method to
evaluate the informativeness of features [22]: the distribution of data with the fea-
ture value considered and the distribution of data without considering the feature
value. The method estimates the informativeness of the studied feature as a value
ranging from 0 to 2. In this case, it is considered that the closer the informative-
ness measure )(xI is to 2, the higher the informativeness of x, and conversely,
the closer )(xI is to 0, the lower the informativeness of x . The output of the
Information system for assessing the informativeness of an epidemic process features
Системні дослідження та інформаційні технології, 2023, № 4 103
Kullback–Leibler method is a numerical estimate indicating the informativeness
of the feature.
Algorithmic Model of the Kullback–Leibler Method
Step 1. Define the target input set (in this case, it is “Morbidity”).
Step 2. Calculate the probability of the event for each value in the target set:
NXnXQ /)()( , where n is the number of cases X , and N is the total number
of cases.
Step 3. Calculate the probability of the event for each value in the feature:
NynyP /)()( , where n is the number of cases y , and N is the total number of
cases.
Step 4. Calculate the Kullback–Leibler divergence between the two sets P
and Q. The Kullback–Leibler divergence, sometimes called relative entropy, is a
measure of the difference between two probability distributions:
i
iQiPiPQPD ))(/)((log)(),( 2 ,
where P(i) is the joint probability of the event X-target set and y-feature, and Q(i)
is the probability of the event of the target set.
Repeat steps 3-4 for all values in the feature and calculate the overall Kull-
back–Leibler divergence.
Step 5. Calculate the overall informativeness of the feature.
Step 6. Evaluate the obtained results based on the magnitude of the informa-
tiveness of the feature. The higher
the evaluation value, the more in-
formative the feature.
Step 7. Select the features
with the highest values as the most
informative.
The algorithm of the model is
shown in Fig. 1.
2.4. Shannon Method
The Shannon method for calculat-
ing feature informativeness in a
table is based on the concept of
entropy in information theory [23].
Entropy is a measure of uncertainty
or randomness in a data set. En-
tropy reflects the average level of
'information,' 'surprise,' or 'uncer-
tainty' inherent in the possible out-
comes of a random variable [24].
The Shannon method
provides an estimate of the
informativeness of the studied
feature in the form of a normalized
variable, which takes values from 0
to 1 [25]. In this case, the informa-
tiveness of feature x is said to be
higher as ( )I x approaches 1 and
Definition of the
target input set
Start
for each value
Feature umber n=1
Calculation the probability
of an event for a value
in a target
Calculation of the joint
probability of an event for
a value in a future with
a target set
Calculation of Kulback-
Leibler divergence
Calculation of the general
informativeness
for a feature
Calculate the probability of
an event for each value in
the target set
n<N (number
of characters)
Assessment of the
informativeness
of the signs n=n+1
Fig. 1. The algorithm of the Kullback–Leibler method
K. Bazilevych, O. Kyrylenko, Y. Parfeniuk, S. Yakovlev, S. Krivtsov, I. Meniailov, V. Kuznietcova, D. Chumachenko
ISSN 1681–6048 System Research & Information Technologies, 2023, № 4 104
lower as ( )I x approaches 0.
Algorithmic model of the Shannon method
Step 1. Define the target input set (in our case, it is “Morbidity”).
Step 2. Calculate the total entropy for the target set using the Shannon formula
N
i
ii ppSH
0
2log)( ,
where ip is the probability of the occurrence of the i-th class in the data set, H is
the entropy, and S is the set of instances.
Step 3. Divide the data by each unique feature value and calculate the frequency
of each value in the target set.
Step 4. Calculate the entropy for each feature value.
Step 5. Calculate the weighted entropy for each feature value, multiplying
the entropy value by its frequency. Weighted entropy by the Shannon method [26]
is used to measure the informational weight of a random event:
)()( SHSPHweighed ,
where ( ) /P S m N : m is the fre-
quency of the occurrence of the
value in the feature; N is the total
number; ( )P S is the probability of
the occurrence of the S-th class
relative to the target variable.
Step 6. Calculate the informa-
tiveness of features. The informa-
tiveness of a feature is calculated
as the difference between the en-
tropy of the output set and the sum
of the entropy of the subsets
formed by the given feature, with
weights equal to the fraction of the
subset in the output set:
N
i weighedHSHSI 0)()( ,
where )(SI is the informativeness
of the feature of the subset S .
Repeat steps 2-6 for all features
and calculate the informativeness for
each feature.
Step 7. Evaluate the obtained
results based on the informative-
ness of the feature. The higher the
evaluation value, the more infor-
mative the feature.
Step 8. Select features with
the highest values as the most in-
formative.
Figure 2 shows the flowchart
of the algorithmic model.
Definition of the
target input set
Start
Calculation
the total entropy for the
target input set
Feature
n
Separation of data by each
unique value of the char-
acteristic and frequency
Calculation of entropy
for a feature
Calculation of weighted
entropy for a feature
Calculating the informa-
tiveness for a feature
Calculate the probability of
an event for each value in
the target set
Separation of data by each
unique value
and frequency
n<N (number
of characters)
Assessment of the
informativeness
of the signs n=n+1
Fig. 2. The algorithm of the Shannon method
Information system for assessing the informativeness of an epidemic process features
Системні дослідження та інформаційні технології, 2023, № 4 105
3. RESULTS
3.1. Program realization
Various algorithms and methods were employed to develop the information sys-
tem, and Python is an ideal choice for such tasks. Its library, sklearn, includes
many machine learning algorithms, including naive Bayes, logistic regression,
and gradient boosting [27].
For data visualization, tkinter, matplotlib.pyplot, and seaborn were used,
which are powerful visualization tools in Python. These libraries provide many
possibilities for creating plots, diagrams, interactive visualizations, and more.
Based on data from healthcare facilities, the developed software product
predicts the probability of a patient getting sick. The product is a decision-support
system for general practitioners, which is especially important during pandemics
and other disasters that limit the number of doctors.
Figure 3 shows the interface of the software product.
Further, by pressing the "Calculate" button, the calculation of informative-
ness estimation methods is carried out, precisely the Shannon method and the
Kullback–Leibler method.
3.2. Data analysis
The experimental study used data on patients suffering from COVID-19 [28].
Figure 4 depicts the histogram of the input data.
Next, we checked the dataset for empty data that would worsen the predic-
tion. Figure 5 shows all data output in terms of data type, presence of zero, and
the number of records of 950217 patients.
Fig. 3. Decision support system interface
K. Bazilevych, O. Kyrylenko, Y. Parfeniuk, S. Yakovlev, S. Krivtsov, I. Meniailov, V. Kuznietcova, D. Chumachenko
ISSN 1681–6048 System Research & Information Technologies, 2023, № 4 106
Fig. 4. Patient Data Histogram
Fig. 5. Checking for the presence of empty values
Information system for assessing the informativeness of an epidemic process features
Системні дослідження та інформаційні технології, 2023, № 4 107
Figure 6 shows the output of the first 5 rows of the input data table.
3.3. Feature selection
We should note that the Shannon method estimates the informativeness of the
investigated recognition in a normalized quantity, which takes values from 0 to 1.
Comparison of results of both methods allows the following conclusions: the con-
sidered methods do not contradict each other and give similar sets of the most
informative features on the same training samples, and the results of the Shannon
and Kullback methods mostly coincide. Table shows the results of using methods
for assessing the informativeness of features.
Results of calculating the informativeness of features
Name Results (Shannon) Results (Kullback–Leibler)
Treatment in medical institutions 0.92 1.55
Medical insurance 0.44 1.99
Gender 0.99 1.73
Patient type 0.55 1.97
Pneumonia 0.43 0.94
Age 0.86 2.00
Diabetes 0.46 0.99
Chronic lung disease 0.08 0.00
Asthma 0.19 0.46
Weakness of the immune system 0.09 0.025
High blood pressure 0.57 1.13
Another disease 0.16 0.34
Cardiovascular disease 0.12 0.18
Obesity 0.60 1.17
Chronic kidney disease 0.10 0.08
Smoking 0.40 0.89
Covid-19 disease 0.93 1.56
Fig. 6. View of the first 5 rows of input medical data
K. Bazilevych, O. Kyrylenko, Y. Parfeniuk, S. Yakovlev, S. Krivtsov, I. Meniailov, V. Kuznietcova, D. Chumachenko
ISSN 1681–6048 System Research & Information Technologies, 2023, № 4 108
The obtained results were visualized. Figures 7 and 8 show which features
have an impact and informativeness and which can be excluded from the set.
4. DISCUSSION
The evaluation of informativeness is pivotal in understanding the dynamics of
epidemic processes and devising effective disease control strategies. This study
aimed to implement and evaluate methods to assess the informativeness of fea-
tures that influence epidemic processes. The methods examined in this study,
namely the Shannon method and the Kullback–Leibler method, are grounded in
the principles of information theory and have distinct advantages, differences, and
commonalities. Both methods utilize the concept of event probability and employ
a logarithmic scale to measure informativeness, which is particularly helpful
when dealing with extremely small or large probability values. These methods are
Importances0.0 0.2 0.4 0.6 0.4 0.6
Fig. 7. Diagram of informativeness assessment by the Shannon method
Importances
0.00 0.25 0.50 1.00 1.50 2.00 0.75 1.25 1.75
Fig. 8. Diagram of informativeness assessment by the Kullback–Leibler method
Information system for assessing the informativeness of an epidemic process features
Системні дослідження та інформаційні технології, 2023, № 4 109
also extensively applied in machine learning for feature selection, model man-
agement, and assessing feature informativeness.
The study found that the Shannon and Kullback–Leibler methods are valu-
able tools for quantifying the information contained in a random process and thus
can be applied across various fields such as information theory, statistics, and ma-
chine learning. The comparison of different methods and the results they yield is
crucial for understanding their applicability and limitations. It was observed that
certain features, such as "Chronic lung disease," "Chronic kidney disease," and
"Weakness of the immune system," did not carry significant information for fur-
ther analysis and prediction, indicating that not all available features are necessar-
ily informative or relevant for epidemic process analysis.
Developing an information system that facilitates the assessment of feature
informativeness is a significant contribution of this study. This system not only
supports data sample uploading but also enables the calculation of the informa-
tiveness of factors that influence the epidemic process. The visualization of the
system's results aids in the interpretation and application of the findings.
However, there are several limitations to this study. First, the analysis was
based on a specific data set, and the informativeness of features may vary in dif-
ferent contexts or with different diseases. Therefore, the findings of this study
may not be directly generalizable to other epidemic processes. Second, the study
focused on two specific methods of assessing informativeness, and there may be
other methods that could yield different results or insights. Additionally, the study
did not consider the potential interactions between different features, which could
also influence the informativeness of individual features.
The study contributes a novel perspective by demonstrating a methodical
approach to assess the informativeness of various features related to epidemic
processes. By applying the Shannon and Kullback–Leibler methods, this study
brings a quantitative, data-driven approach to a field often dominated by qualita-
tive assessments and heuristic methods. This quantitative approach can lead to
more objective, replicable, and actionable insights into the drivers of epidemic
processes.
Additionally, this study contributes by identifying specific features that are
not informative in the context of the analyzed data set. This is crucial as it chal-
lenges conventional wisdom and prompts a re-evaluation of commonly held be-
liefs about the most critical factors in driving epidemic processes. This can lead to
a paradigm shift in how epidemic processes are analyzed and managed, moving
away from a one-size-fits-all approach to a more nuanced, data-driven approach.
Moreover, the study compares two widely used methods for assessing in-
formativeness, thereby providing insights into their relative merits and limitations.
This can guide researchers and practitioners in selecting the most appropriate
method for their specific context and research questions.
Developing an information system that supports data upload and informa-
tiveness calculations adds a practical tool that researchers and practitioners can
use to assess the informativeness of features in their own data sets. This contrib-
utes to the methodological rigor of future studies and enhances the practical ap-
plicability of the findings by enabling real-world implementation.
Future research should validate the findings of this study in different con-
texts and with different diseases to assess the generalizability of the results. It
would also be beneficial to compare the performance of the Shannon and Kull-
back–Leibler methods with other methods of assessing informativeness. Further-
more, future studies should also explore the potential interactions between differ-
ent features and their impact on the informativeness of individual features.
K. Bazilevych, O. Kyrylenko, Y. Parfeniuk, S. Yakovlev, S. Krivtsov, I. Meniailov, V. Kuznietcova, D. Chumachenko
ISSN 1681–6048 System Research & Information Technologies, 2023, № 4 110
Developing and evaluating more sophisticated information systems that can ac-
count for feature interactions and other complexities in the data would be a valu-
able avenue for future research.
Overall, this study contributes a novel perspective, challenges conventional
wisdom, provides practical insights into the relative merits of different methods,
and offers a practical tool for assessing feature informativeness. These contribu-
tions are crucial for enhancing our understanding of epidemic processes and de-
veloping more effective strategies for their management.
CONCLUSIONS
The use of methods for assessing informativeness is crucial in analyzing epidemic
processes. The main objective of such an analysis is to understand the spread of
the disease and determine the effectiveness of strategies to combat it. Methods of
informativeness assessment allow for determining how well a specific parameter
correlates with the risk of disease. This enables identifying population groups that
may be more susceptible to the disease and considering this when developing
prevention and treatment strategies.
As a result of this study, methods were identified and implemented that al-
low assessing the informativeness of features. Methods for assessing the informa-
tiveness of features were considered; algorithmic models were developed for the
Kullback–Leibler and Shannon methods. Both considered methods are based on
information theory principles and have advantages, differences, and standard fea-
tures. Thus, both the Shannon method and the Kullback–Leibler method are based
on the concept of the probability of events, use a logarithmic scale to measure
informativeness, which helps in dealing with very small or tremendous probabil-
ity values, and is widely used in the field of machine learning for evaluating the
informativeness of features, model management, and feature selection. Overall,
the Shannon and Kullback–Leibler informativeness assessment methods are valuable
tools for measuring the information contained in a random process. They can be used
in various fields, such as information theory, statistics, machine learning, etc.
Specific examples of using the described algorithmic models are presented.
A comparison of different methods and their results was carried out. It was found
that such features as “Chronic lung disease”, “Chronic kidney disease”, and
“Weakness of the immune system” do not carry information for further work with
the table and burden the prediction relative to the presented data set.
An information system for analyzing epidemic process data was developed
to assess the informativeness of features. This system supports data sample up-
loading and calculations of the informativeness of factors affecting the epidemic
process. The results of the system operation are visualized.
Acknowledgements. The study was funded by the National Research Foun-
dation of Ukraine in the framework of the research project 2020.02/0404 on the
topic “Development of intelligent technologies for assessing the epidemic situa-
tion to support decision-making within the population biosafety management”.
REFERENCES
1. K. Batko and A. Ślęzak, “The use of Big Data Analytics in healthcare,” Big Data, vol. 9,
no. 1 (2022), https://doi.org/10.1186/s40537-021-00553-4.
2. I. Izonin, R. Tkachenko, I. Dronyuk, et al., “Predictive modeling based on small data in
clinical medicine: RBF-based additive input-doubling method,” Mathematical Biosci-
ences and Engineering, vol. 18, no. 3, pp. 2599–2613 (2021),
https://doi.org/10.3934/mbe.2021132.
Information system for assessing the informativeness of an epidemic process features
Системні дослідження та інформаційні технології, 2023, № 4 111
3. S.Y. Lee, B. Lei, and B. Mallick, “Estimation of COVID-19 spread curves integrating
global data and borrowing information,” PLOS ONE, vol. 15, no. 7, 0236860 (2020),
https://doi.org/10.1371/journal.pone.0236860.
4. S. Ma, Y. Sun, and S. Yang, “Using Internet Search Data to Forecast COVID-19 Trends:
A Systematic Review,” Analytics, vol. 1, no. 2, pp. 210–227 (2022),
https://doi.org/10.3390/analytics1020014.
5. A. Ibrahim, U. W. Humphries, A. Khan, et al., “COVID-19 Model with High- and Low-
Risk Susceptible Population Incorporating the Effect of Vaccines,” Vaccines, vol. 11, no.
1 (2022), https://doi.org/10.3390/vaccines11010003.
6. N. Davidich, I. Chumachenko, Y. Davidich, et al., “Advanced Traveller Information Sys-
tems to Optimizing Freight Driver Route Selection,” 2020 13th International Conference
on Developments in eSystems Engineering (DeSE) (2020),
https://doi.org/10.1109/dese51703.2020.9450763.
7. S. Fedushko and T. Ustyianovych, “E-Commerce Customers Behavior Research Using
Cohort Analysis: A Case Study of COVID-19,” Journal of Open Innovation: Technol-
ogy, Market, and Complexity, vol. 8, no. 1, pp. 1-12 (2022),
https://doi.org/10.3390/joitmc8010012.
8. P.S. Knopov, O.S. Samosonok, and G.D. Bila, “A Model of Infectious Disease Spread
with Hidden Carriers,” Cybernetics and Systems Analysis, vol. 57, no. 4, pp. 647–655
(2021), https://doi.org/10.1007/s10559-021-00390-6.
9. D.A. Klyushin, “Effective algorithms for solving statistical problems posed by COVID-
19 pandemic,” Elsevier eBooks, pp. 21–44 (2023), https://doi.org/10.1016/b978-0-323-
90531-2.00005-9.
10. I. Krak, H. Kudin, V. Kasianiuk, et al., “Hyperplane Clustering of the Data in the Vector
Space of Features Based on Pseudo Inversion Tools,” CEUR Workshop Proceesings, vol.
3003, pp. 98–105 (2021), https://ceur-ws.org/Vol-3003/short4.pdf
11. O. Filchakova, D. Dossym, A. Ilyas, et al., “Review of COVID-19 testing and diagnostic
methods,” Talanta, vol. 244, 123409 (2022), https://doi.org/10.1016/ j.talanta.2022.123409.
12. S. Patil, H. Lu, C. L. Saunders, et al., “Public preferences for electronic health data stor-
age, access, and sharing — evidence from a pan-European survey,” Journal of the
American Medical Informatics Association, vol. 23, no. 6, pp. 1096–1106 (2016),
https://doi.org/10.1093/jamia/ocw012.
13. V. Berisha, C. Krantsevich, P. R. Hahn, et al., “Digital medicine and the curse of dimensional-
ity,” npj Digital Medicine, vol. 4, no. 1 (2021) https://doi.org/10.1038/ s41746-021-00521-5.
14. K. Bazilevych, S. Krivtsov, and M. Butkevych, “Intelligent Evaluation of the Informative
Features of Cardiac Studies Diagnostic Data using Shannon Method,” CEUR Workshop
Proceedings, vol. 3003, pp. 65–75 (2021).
15. I. Meniailov and H. Padalko, “Application of Multidimensional Scaling Model for Hepatitis C
Data Dimensionality Reduction,” CEUR Workshop Proceedings, vol. 3348, pp. 34–43 (2022).
16. K. O. Bazilevych, D. I. Chumachenko, L. F. Hulianytskyi, et al., “Intelligent Decision-
Support System for Epidemiological Diagnostics. I. A Concept of Architecture Design,”
Cybernetics and Systems Analysis, vol. 58, no. 3, pp. 343–353 (2022),
https://doi.org/10.1007/s10559-022-00466-x.
17. K.O. Bazilevych, D.I. Chumachenko, L.F. Hulianytskyi, et al., Intelligent Decision-
Support System for Epidemiological Diagnostics. II. Information Technologies Devel-
opment,” Cybernetics and Systems Analysis, vol. 58, no. 4, pp. 499–509 (2022).
https://doi.org/10.1007/s10559-022-00484-9
18. D. Panda, R. Ray, and Satya Ranjan Dash, “Feature Selection: Role in Designing Smart
Healthcare Models,” Intelligent systems reference library, vol. 178, pp. 143–162, (2020),
https://doi.org/10.1007/978-3-030-37551-5_9.
19. D. Geiszler, D. A. Polasky, F. Yu, and A. I. Nesvizhskii, “Detecting diagnostic features
in MS/MS spectra of post-translationally modified peptides,” Nature Communications,
vol. 14, no. 1 (2023), https://doi.org/10.1038/s41467-023-39828-0.
20. D.E. Ehrmann, S. Joshi, S.D. Goodfellow, et al., “Making machine learning matter to
clinicians: model actionability in medical decision-making,” npj Digital Medicine, vol. 6,
no. 1 (2023), https://doi.org/10.1038/s41746-023-00753-7.
21. O. Cliff, M. Prokopenko, and R. Fitch, “Minimising the Kullback–Leibler Divergence for
Model Selection in Distributed Nonlinear Systems,” Entropy, vol. 20, no. 2, p. 51 (2018),
doi: https://doi.org/10.3390/e20020051.
K. Bazilevych, O. Kyrylenko, Y. Parfeniuk, S. Yakovlev, S. Krivtsov, I. Meniailov, V. Kuznietcova, D. Chumachenko
ISSN 1681–6048 System Research & Information Technologies, 2023, № 4 112
22. X. Wang, W. Hou, H. Zhang, et al., “KDE-OCSVM model using Kullback–Leibler di-
vergence to detect anomalies in medical claims,” Expert Systems with Applications, vol.
200, 117056 (2022), doi: https://doi.org/10.1016/j.eswa.2022.117056.
23. N. Pudjihartono, T. Fadason, A. W. Kempa-Liehr, et al., “A Review of Feature Selection
Methods for Machine Learning-Based Disease Risk Prediction,” Frontiers in Bioinfor-
matics, vol. 2 (2022), https://doi.org/10.3389/fbinf.2022.927312.
24. J. Li, K. Cheng, S. Wang, et al., “Feature Selection,” ACM Computing Surveys, vol. 50,
no.6, pp. 1–45 (2018), https://doi.org/10.1145/3136625.
25. F. Jalali-najafabadi, M. Stadler, N. Dand, et al., “Application of information theo-
retic feature selection and machine learning methods for the development of genetic
risk prediction models,” Scientific Reports, vol. 11, no. 1 (2021),
https://doi.org/10.1038/s41598-021-00854-x.
26. A. D. Al-Nasser, A. Rawashdeh, and A. Talal, “On using Shannon entropy measure for
formulating new weighted exponential distribution,” Journal of Taibah University for
Science, vol. 16, no. 1, pp. 1035–1047 (2022), https://doi.org/10.1080/ 16583655.2022.2135806.
27. “Scikit-learn: machine learning in Python,” Scikit-learn.org (2019), https://scikit-
learn.org/stable/
28. “COVID-19 Dataset,” www.kaggle.com (2022), https://www.kaggle.com/datasets/
meirnizri/covid19-dataset
Received 06.09.2023
INFORMATION ON THE ARTICLE
Kseniia O. Bazilevych, ORCID: 0000-0001-5332-9545, National Aerospace University
“Kharkiv Aviation Institute”, Ukraine, e-mail: k.bazilevych@khai.edu
Olena Yu. Kyrylenko, ORCID: 0009-0005-8917-0878, National Aerospace University
“Kharkiv Aviation Institute”, Ukraine, e-mail: o.kyrylenko@khai.edu
Yurii L. Parfenyuk, ORCID: 0000-0001-5357-1868, V.N. Karazin Kharkiv National
University, Ukraine, e-mail: parfuriy.l@gmail.com
Sergiy V. Yakovlev, ORCID: 0000-0003-1707-843X, National Aerospace University
“Kharkiv Aviation Institute”, Ukraine, e-mail: s.yakovlev@khai.edu
Serhii O. Krivtsov, ORCID: 0000-0001-5214-0927, National Aerospace University
“Kharkiv Aviation Institute”, Ukraine, e-mail: krivtsovpro@gmail.com
Ievgen S. Meniailov, ORCID: 0000-0002-9440-8378, V.N. Karazin Kharkiv National
University, Ukraine, e-mail: evgenii.menyailov@gmail.com
Victoriya O. Kuznietcova, ORCID: 0000-0003-3882-1333, V.N. Karazin Kharkiv Na-
tional University, Ukraine, e-mail: vkuznietcova@karazin.ua
Dmytro I. Chumachenko, ORCID: 0000-0003-2623-3294, National Aerospace Univer-
sity “Kharkiv Aviation Institute”, Ukraine, e-mail: d.chumachenko@khai.edu
ІНФОРМАЦІЙНА СИСТЕМА ДЛЯ ОЦІНЮВАННЯ ІНФОРМАТИВНОСТІ
ОЗНАК ЕПІДЕМІЧНОГО ПРОЦЕСУ / К.O. Базілевич, О.Ю. Кіріленко, Ю.Л. Пар-
фенюк, С.В. Яковлев, С.О. Кривцов, Є.С. Меняйлов, В.О. Кузнецова, Д.І. Чумаченко
Анотація. Роботп полягає в оцінюванні інформативності параметрів, які впли-
вають на епідемічні процеси, з використанням методів Шенона та Кульбака–
Лейблера на основі їх фундаментальності у принципах теорії інформації та їх
широкого застосування в машинному навчанні, статистиці та інших відповід-
них галузях. Проведено порівняльний аналіз результатів, отриманих обома ме-
тодами, розроблено інформаційну систему для спрощення завантаження вибі-
рок даних та обчислення інформативності факторів, які впливають на
епідемічні процеси. Показано, що деякі ознаки, такі як «хронічне захворюван-
ня легень», «хронічне захворювання нирок» та «ослаблений імунітет», не міс-
тили значущої інформації для подальшого аналізу та ускладнювали процес
прогнозування за даними досліджуваного набору даних. Розроблена інформа-
ційна система ефективно підтримує оцінювання інформативності ознак, тим
самим сприяючи комплексному аналізу епідемічних процесів, візуалізації ре-
зультатів, а також поточному стану знань. Надано конкретні приклади застосу-
вання описаних алгоритмічних моделей, порівняння різних методів та їх результатів
та розроблення підтримувального інструменту для аналізу епідемічних процесів.
Ключові слова: інформаційна система, епідемічний процес, інформативність
ознаки, метод Шенона, метод Кульбака–Лейблера.
|
| id | journaliasakpiua-article-297411 |
| institution | System research and information technologies |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2025-07-17T10:28:26Z |
| publishDate | 2023 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/28/1967b145e8a0d3b695b31b4c308c3928.pdf |
| spelling | journaliasakpiua-article-2974112024-02-01T21:03:07Z Information system for assessing the informativeness of an epidemic process features Інформаційна система для оцінювання інформативності ознак епідемічного процесу Bazilevych, Kseniia Kyrylenko, Olena Parfenyuk, Yurii Yakovlev, Sergiy Krivtsov, Serhii Meniailov, Ievgen Kuznietcova, Victoriya Chumachenko, Dmytro інформаційна система епідемічний процес інформативність ознаки метод Шенона метод Кульбака–Лейблера information system epidemic process informativeness of features Shannon method Kullback–Leibler method The primary objective of this study is to assess the informativeness of various parameters influencing epidemic processes utilizing the Shannon and Kullback–Leibler methods. These methods were selected based on their foundation in the principles of information theory and their extensive application in machine learning, statistics, and other relevant domains. A comparative analysis was performed between the results acquired from both methods, and an information system was designed to facilitate the uploading of data samples and the calculation of factor informativeness impacting the epidemic processes. The findings revealed that certain features, such as “Chronic lung disease,” “Chronic kidney disease,” and “Weakened immunity,” did not carry significant information for further analysis and hindered the forecasting process, as per the data set examined. The developed information system efficiently supports the assessment of feature informativeness, thereby aiding in the comprehensive analysis of epidemic processes and enabling the visualization of the results. This study contributes to the current body of knowledge by providing specific examples of applying the described algorithmic models, comparing various methods and their outcomes, and developing a supportive tool for analyzing epidemic processes. Робота полягає в оцінюванні інформативності параметрів, які впливають на епідемічні процеси, з використанням методів Шенона та Кульбака–Лейблера на основі їх фундаментальності у принципах теорії інформації та їх широкого застосування в машинному навчанні, статистиці та інших відповідних галузях. Проведено порівняльний аналіз результатів, отриманих обома методами, розроблено інформаційну систему для спрощення завантаження вибірок даних та обчислення інформативності факторів, які впливають на епідемічні процеси. Показано, що деякі ознаки, такі як «хронічне захворювання легень», «хронічне захворювання нирок» та «ослаблений імунітет», не містили значущої інформації для подальшого аналізу та ускладнювали процес прогнозування за даними досліджуваного набору даних. Розроблена інформаційна система ефективно підтримує оцінювання інформативності ознак, тим самим сприяючи комплексному аналізу епідемічних процесів, візуалізації результатів, а також поточному стану знань. Надано конкретні приклади застосування описаних алгоритмічних моделей, порівняння різних методів та їх результатів та розроблення підтримувального інструменту для аналізу епідемічних процесів. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2023-12-26 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/297411 10.20535/SRIT.2308-8893.2023.4.08 System research and information technologies; No. 4 (2023); 100-112 Системные исследования и информационные технологии; № 4 (2023); 100-112 Системні дослідження та інформаційні технології; № 4 (2023); 100-112 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/297411/290390 |
| spellingShingle | інформаційна система епідемічний процес інформативність ознаки метод Шенона метод Кульбака–Лейблера Bazilevych, Kseniia Kyrylenko, Olena Parfenyuk, Yurii Yakovlev, Sergiy Krivtsov, Serhii Meniailov, Ievgen Kuznietcova, Victoriya Chumachenko, Dmytro Інформаційна система для оцінювання інформативності ознак епідемічного процесу |
| title | Інформаційна система для оцінювання інформативності ознак епідемічного процесу |
| title_alt | Information system for assessing the informativeness of an epidemic process features |
| title_full | Інформаційна система для оцінювання інформативності ознак епідемічного процесу |
| title_fullStr | Інформаційна система для оцінювання інформативності ознак епідемічного процесу |
| title_full_unstemmed | Інформаційна система для оцінювання інформативності ознак епідемічного процесу |
| title_short | Інформаційна система для оцінювання інформативності ознак епідемічного процесу |
| title_sort | інформаційна система для оцінювання інформативності ознак епідемічного процесу |
| topic | інформаційна система епідемічний процес інформативність ознаки метод Шенона метод Кульбака–Лейблера |
| topic_facet | інформаційна система епідемічний процес інформативність ознаки метод Шенона метод Кульбака–Лейблера information system epidemic process informativeness of features Shannon method Kullback–Leibler method |
| url | https://journal.iasa.kpi.ua/article/view/297411 |
| work_keys_str_mv | AT bazilevychkseniia informationsystemforassessingtheinformativenessofanepidemicprocessfeatures AT kyrylenkoolena informationsystemforassessingtheinformativenessofanepidemicprocessfeatures AT parfenyukyurii informationsystemforassessingtheinformativenessofanepidemicprocessfeatures AT yakovlevsergiy informationsystemforassessingtheinformativenessofanepidemicprocessfeatures AT krivtsovserhii informationsystemforassessingtheinformativenessofanepidemicprocessfeatures AT meniailovievgen informationsystemforassessingtheinformativenessofanepidemicprocessfeatures AT kuznietcovavictoriya informationsystemforassessingtheinformativenessofanepidemicprocessfeatures AT chumachenkodmytro informationsystemforassessingtheinformativenessofanepidemicprocessfeatures AT bazilevychkseniia ínformacíjnasistemadlâocínûvannâínformativnostíoznakepídemíčnogoprocesu AT kyrylenkoolena ínformacíjnasistemadlâocínûvannâínformativnostíoznakepídemíčnogoprocesu AT parfenyukyurii ínformacíjnasistemadlâocínûvannâínformativnostíoznakepídemíčnogoprocesu AT yakovlevsergiy ínformacíjnasistemadlâocínûvannâínformativnostíoznakepídemíčnogoprocesu AT krivtsovserhii ínformacíjnasistemadlâocínûvannâínformativnostíoznakepídemíčnogoprocesu AT meniailovievgen ínformacíjnasistemadlâocínûvannâínformativnostíoznakepídemíčnogoprocesu AT kuznietcovavictoriya ínformacíjnasistemadlâocínûvannâínformativnostíoznakepídemíčnogoprocesu AT chumachenkodmytro ínformacíjnasistemadlâocínûvannâínformativnostíoznakepídemíčnogoprocesu |