Development of Linguistic Approach to the Problem of the Computer Electro-cardiogram's Classifications
Date: 2021
Authors: L.S. Fainzilberg, Ju.R. Dykach
Format: Article
Language: English
Published: International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine, 2021
Series: Control systems & computers
Online access: http://dspace.nbuv.gov.ua/handle/123456789/181260
Collection: Digital Library of Periodicals of National Academy of Sciences of Ukraine
Citation: Development of Linguistic Approach to the Problem of the Computer Electro-cardiogram's Classifications / L.S. Fainzilberg, Ju.R. Dykach // Control systems & computers. 2021. No. 2-3. pp. 28-39. Bibliography: 12 titles. In English.
Section: Fundamental Problems in Computer Science
Abstract (Ukrainian-language version, translated): The purpose of the article is to extend the diagnostic capabilities of the linguistic approach to the analysis and interpretation of electrocardiograms (ECG). Results. Based on processing real clinical data of verified patients and healthy volunteers, reference standards were constructed for patients with the chronic form of coronary artery disease (CAD) and for healthy subjects. The standards were developed using computational procedures adopted in mathematical linguistics: the Levenshtein distance, i.e. the minimum number of editing operations (insertion, deletion and substitution of a symbol) needed to transform one word into another, and the frequency of occurrence of a substring in the analyzed word. Based on these procedures, decision rules were developed that make diagnostic decisions from the Levenshtein distances to the standards and the frequencies of occurrence of one-, two- and three-symbol patterns in the codewords. It is established that combining these two methods extends the diagnostic capabilities of the linguistic approach to ECG analysis and interpretation.
ISSN 2706-8145, Control Systems and Computers, 2021, No. 2–3
DOI: https://doi.org/10.15407/csc.2021.02.028
UDC 004.001
L.S. FAINZILBERG, Dr.Sc. (Eng.), Professor, Chief Researcher,
International Research and Training Center for Information Technologies
and Systems of the NAS and MES of Ukraine,
Acad. Glushkova Ave., 40, Kyiv, 03187, Ukraine,
fainzilberg@gmail.com
Ju.R. DYKACH, Student of the Biomedical Engineering Faculty,
The National Technical University of Ukraine
«Igor Sikorsky Kyiv Polytechnic Institute»,
37, Peremohy Ave., Kyiv, 03056, Ukraine,
jul.dykach@gmail.com
DEVELOPMENT OF LINGUISTIC APPROACH
TO THE PROBLEM OF THE COMPUTER
ELECTROCARDIOGRAM'S CLASSIFICATIONS
Six decision rules based on the analysis of the Levenshtein distances and the number of occurrences of characteristic patterns of codewords are considered and investigated. It has been shown that the use of the developed decision rules makes it possible to increase the sensitivity and specificity of diagnostics even in cases when the ECG does not show traditional electrocardiological signs of myocardial ischemia.
Keywords: ECG, the Levenshtein distance, occurrence frequency of a substring in a code word, decision rule.
Introduction
For more than a hundred years, electrocardiography has been widely used in cardiological practice to diagnose diseases of the cardiovascular system. However, the traditional approach to the analysis and interpretation of the ECG is known not always to provide the required reliability of diagnostic decisions. For example, according to medical statistics [1], a resting ECG assessed according to generally accepted criteria remains normal in almost 50% of patients with chronic coronary artery disease (CAD). Therefore, experts are actively exploring new approaches to computerized ECG processing.
One of these new approaches is an intelligent method of ECG processing called “fasegraphy”. It was developed at the International Research and Training Center for Information Technologies and Systems of the National Academy of Sciences of Ukraine and the Ministry of Education and Science of Ukraine [2]. The method is based on the transition from a scalar signal $x(t)$ to a vector signal on the phase plane $\{x(t), \dot{x}(t)\}$, where $\dot{x}(t)$ is the rate of change of the electrical activity of the heart. This rate is determined by original computational procedures from the signal $x(t)$ recorded in a standard lead, for example, in the first standard lead (left and right hand).
Large-scale clinical trials have shown that the fasegraphy method increases the reliability of detecting latent signs of myocardial ischemia even when the generally accepted electrocardiographic signs of CAD (depression or elevation of the isoelectric line) are absent in all 12 traditional leads. This is achieved through the use of new diagnostic ECG indicators in the phase space, in particular the parameter $\beta_T$, which characterizes the symmetry of the repolarization area on the phase plane [3, 4].
Further studies have shown that not only the average value of $\beta_T$ but also its cycle-to-cycle dynamics has diagnostic value. In the simplest case, the variability of $\beta_T$ is characterized by its root-mean-square (RMS) deviation. For a subtler analysis of changes in $\beta_T$, it proved useful to calculate entropy estimates of the signal, in particular the modified permutation entropy [5].
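The two variability measures mentioned above can be illustrated with a short sketch. It computes the RMS deviation of a $\beta_T$ series and the standard Bandt–Pompe normalized permutation entropy; it does not reproduce the modified permutation entropy of [5], and the embedding order of 3 and the $\beta_T$ values are hypothetical.

```python
import math
from collections import Counter


def rms_deviation(values):
    """Root-mean-square deviation of a series from its mean (population form)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def permutation_entropy(series, order=3):
    """Standard Bandt-Pompe permutation entropy, normalized to [0, 1].

    Each window of `order` consecutive values is replaced by its ordinal
    pattern (the indices that sort the window); ties keep positional order
    because sorted() is stable. This is NOT the modified variant of [5].
    """
    n_windows = len(series) - order + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than the embedding order")
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(n_windows)
    )
    entropy = -sum((c / n_windows) * math.log(c / n_windows)
                   for c in patterns.values())
    return entropy / math.log(math.factorial(order))


# Hypothetical cycle-by-cycle beta_T values (not data from the article).
beta_t = [0.92, 0.88, 0.95, 0.90, 0.87, 0.93, 0.91, 0.89]
print(rms_deviation(beta_t), permutation_entropy(beta_t))
```

A constant or monotonic series yields an entropy of 0, while an irregular series approaches 1.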
An effective method for assessing the dynamics of changes in the shape of ECG cycles is based on a linguistic approach to processing cyclic signals. This approach relies on the transition from the observed ECG to a sequence of symbols (a word) that uniquely encodes the ECG [6].
The purpose of this article is to examine this method further.
Basic Method of Linguistic Analysis and ECG Interpretation
Let us first consider the approach to ECG analysis proposed in [6], which we will need in further studies. Using a microprocessor sensor with finger electrodes, the ECG signal $x(t)$ of the first standard lead is recorded (Fig. 1).
For each $i$-th cycle ($i = 1, \ldots, N$) of the digitized signal $x(t)$, the durations of the cycles ($RR_i$ intervals) and the values of the original indicator $\beta_{T,i}$ mentioned above are determined using special computational procedures implemented in the fasegraphy method.
Next, the dynamics of these indicators during ECG registration is assessed. For this purpose, indicator variables are introduced:
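$$V_i^{(RR)} = \begin{cases} +1, & \text{if } RR_i - RR_{i-1} > 0, \\ -1, & \text{if } RR_i - RR_{i-1} \le 0, \end{cases} \qquad (1)$$
$$V_i^{(\beta)} = \begin{cases} +1, & \text{if } \beta_{T,i} - \beta_{T,i-1} > 0, \\ -1, & \text{if } \beta_{T,i} - \beta_{T,i-1} \le 0, \end{cases} \qquad (2)$$
where $i = 2, \ldots, N$.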
The sequences $V_i^{(RR)}$ and $V_i^{(\beta)}$ make it possible to encode each ECG cycle with one of the symbols of the alphabet A = {a, b, c, d} as follows (Table 1).
As a result, the $(N-1)$-symbol word $S_k$, composed of the symbols a, b, c, d, uniquely encodes the $k$-th processed ECG (Fig. 3).
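The coding step can be sketched as follows. The sketch turns per-cycle $RR_i$ and $\beta_{T,i}$ values into a codeword using the sign pairs of Table 1. It assumes that a zero difference between consecutive cycles counts as $-1$, and the function names and input values are hypothetical.

```python
# Table 1: (V_i^(RR), V_i^(beta)) -> symbol
CODE_TABLE = {(+1, +1): "a", (+1, -1): "b", (-1, +1): "c", (-1, -1): "d"}


def indicator(current, previous):
    """Indicator variable of eq. (1)/(2): +1 if the value increased, else -1.

    Treating a zero difference as -1 is an assumption of this sketch.
    """
    return 1 if current - previous > 0 else -1


def encode_ecg(rr_intervals, beta_t):
    """Encode N cycles (RR_i, beta_T,i) into an (N - 1)-symbol codeword."""
    if len(rr_intervals) != len(beta_t):
        raise ValueError("RR and beta_T sequences must have equal length")
    symbols = []
    for i in range(1, len(rr_intervals)):
        v_rr = indicator(rr_intervals[i], rr_intervals[i - 1])
        v_beta = indicator(beta_t[i], beta_t[i - 1])
        symbols.append(CODE_TABLE[(v_rr, v_beta)])
    return "".join(symbols)


# Hypothetical RR intervals (ms) and beta_T values for five cycles.
print(encode_ecg([800, 820, 830, 810, 790], [10, 12, 9, 11, 8]))  # abcd
```

Five cycles give a four-symbol word, matching the $(N-1)$-symbol length stated above.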
The transition from the observed ECG to the codeword makes it possible to use the methods of mathematical linguistics to solve the problem of ECG analysis and interpretation. In particular, the proposed method assesses the proximity between the codewords $S_\mu$ and $S_\nu$ of processed ECGs by the edit distance $L(S_\mu, S_\nu)$, the Levenshtein distance, which is defined as the minimum number of editing operations (insertion, deletion, and substitution of a symbol) needed to transform word $S_\mu$ into word $S_\nu$ [9].
Fig. 1. Microprocessor ECG recorder¹
¹ Sensor developed by Solvaig, J.S.C. (Kyiv), https://solvaig.com/fasegraphy
Fig. 2. ECG indicators

| $V_i^{(RR)}$ | $V_i^{(\beta_T)}$ | Symbol |
|---|---|---|
| +1 | +1 | a |
| +1 | −1 | b |
| −1 | +1 | c |
| −1 | −1 | d |

Table 1. Principle of ECG cycle coding
To calculate the Levenshtein distance, the Wagner–Fischer algorithm [10], based on dynamic programming, is used. Table 2 shows the optimal path for the transition from the word
$S_\mu$ = ddabdcbadcbadca (3)
to the word
$S_\nu$ = bacdaaacdadccbb. (4)
The closeness of these words is assessed by the Levenshtein distance $L(S_\mu, S_\nu) = 10$.
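As an illustration of the Wagner–Fischer computation, here is a minimal Python sketch with unit costs for insertion, deletion and substitution. It keeps only two rows of the dynamic-programming table. The function name is our own, and the sketch is not the authors' implementation.

```python
def levenshtein(source, target):
    """Levenshtein distance via the Wagner-Fischer dynamic programme.

    Only two rows of the (len(source)+1) x (len(target)+1) table are kept.
    Insertions, deletions and substitutions all cost 1.
    """
    previous = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s_char in enumerate(source, start=1):
        current = [i] + [0] * len(target)
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            current[j] = min(previous[j] + 1,      # deletion
                             current[j - 1] + 1,   # insertion
                             substitution)
        previous = current
    return previous[-1]


print(levenshtein("ddabdcbadcbadca", "bacdaaacdadccbb"))  # 10
```

The printed value matches the 10 editing operations listed in Table 2. The two-row layout needs O(len(target)) memory instead of the full table.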
(Fig. 3 schematic: Electrocardiogram → parameters $RR_i$, $\beta_{T,i}$ ($i = 1, \ldots, N$) → indicator variables $V_i^{(RR)}$, $V_i^{(\beta)}$ → coding word, e.g. ddabdcbadcbadca)
ECG classification based on the proposed approach involves:
• constructing class standards from the Levenshtein distances between pairs of codewords of the training set;
• comparing the codeword of the processed ECG with the standards.
The algorithm for constructing the standards is as follows. Suppose that, as a result of the experiments, $Q_{CAD}$ electrocardiograms of patients with coronary artery disease (CAD) were recorded and coded, in accordance with Table 1, by the words $S_q^{(CAD)}$, $q = 1, \ldots, Q_{CAD}$. Let us determine the Levenshtein distances $L_{\mu\nu}^{(CAD)} = L(S_\mu^{(CAD)}, S_\nu^{(CAD)})$ between each pair of these words $S_\mu^{(CAD)}$, $S_\nu^{(CAD)}$, $\mu = 1, \ldots, Q_{CAD}$, $\nu = 1, \ldots, Q_{CAD}$, and form the square $Q_{CAD} \times Q_{CAD}$ distance matrix

$$\Lambda^{(CAD)} = \begin{pmatrix} L_{11}^{(CAD)} & L_{12}^{(CAD)} & \cdots & L_{1Q_{CAD}}^{(CAD)} \\ L_{21}^{(CAD)} & L_{22}^{(CAD)} & \cdots & L_{2Q_{CAD}}^{(CAD)} \\ \vdots & \vdots & \ddots & \vdots \\ L_{Q_{CAD}1}^{(CAD)} & L_{Q_{CAD}2}^{(CAD)} & \cdots & L_{Q_{CAD}Q_{CAD}}^{(CAD)} \end{pmatrix}. \qquad (5)$$

Then the reference word of CAD patients is the word corresponding to the row of matrix (5) with the minimal sum of elements, i.e.

$$S_0^{(CAD)} = S_{\mu^*}^{(CAD)}, \qquad \mu^* = \arg\min_{1 \le \mu \le Q_{CAD}} \sum_{\nu=1}^{Q_{CAD}} L_{\mu\nu}^{(CAD)}. \qquad (6)$$

The reference word of the healthy group (Healthy) is determined in a similar way from the elements of the Levenshtein distance matrix $L_{\mu\nu}^{(Healthy)} = L(S_\mu^{(Healthy)}, S_\nu^{(Healthy)})$, $\mu = 1, \ldots, Q_{Healthy}$, $\nu = 1, \ldots, Q_{Healthy}$, built for all pairs of codewords of the control group, i.e.

$$S_0^{(Healthy)} = S_{\mu^*}^{(Healthy)}, \qquad \mu^* = \arg\min_{1 \le \mu \le Q_{Healthy}} \sum_{\nu=1}^{Q_{Healthy}} L_{\mu\nu}^{(Healthy)}. \qquad (7)$$

Fig. 3. The principle of forming a code word

| Step | Original word | Operation | Result of editing |
|---|---|---|---|
| 1 | $S_\mu$ = ddabdcbadcbadca | Replacement d → b | S = bdabdcbadcbadca |
| 2 | S = bdabdcbadcbadca | Deletion of d | S = babdcbadcbadca |
| 3 | S = babdcbadcbadca | Replacement b → c | S = bacdcbadcbadca |
| 4 | S = bacdcbadcbadca | Deletion of c | S = bacdbadcbadca |
| 5 | S = bacdbadcbadca | Replacement b → a | S = bacdaadcbadca |
| 6 | S = bacdaadcbadca | Replacement d → a | S = bacdaaacbadca |
| 7 | S = bacdaaacbadca | Replacement b → d | S = bacdaaacdadca |
| 8 | S = bacdaaacdadca | Replacement a → c | S = bacdaaacdadcc |
| 9 | S = bacdaaacdadcc | Insertion of b | S = bacdaaacdadccb |
| 10 | S = bacdaaacdadccb | Insertion of b | $S_\nu$ = bacdaaacdadccbb |

Table 2. Optimal transition from word $S_\mu$ to word $S_\nu$
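In effect, the construction in (5) to (7) picks the medoid of each class under the Levenshtein metric. The Python sketch below builds the full distance matrix and returns the codeword with the smallest row sum. The helper names are hypothetical. The text does not say how ties are resolved, so returning the first minimal row is an assumption.

```python
def levenshtein(source, target):
    """Wagner-Fischer dynamic programme with unit costs, two rows kept."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i] + [0] * len(target)
        for j, t_char in enumerate(target, start=1):
            current[j] = min(previous[j] + 1,                        # deletion
                             current[j - 1] + 1,                     # insertion
                             previous[j - 1] + (s_char != t_char))   # substitution
        previous = current
    return previous[-1]


def reference_word(codewords):
    """Return the codeword whose row of the distance matrix (5) has the minimal sum, as in (6)/(7)."""
    distance_matrix = [[levenshtein(u, v) for v in codewords] for u in codewords]
    row_sums = [sum(row) for row in distance_matrix]
    best = min(range(len(codewords)), key=lambda mu: row_sums[mu])  # first minimum wins
    return codewords[best]


# Hypothetical toy class of four short codewords.
print(reference_word(["abaa", "bbab", "abab", "abcb"]))  # abab
```

Because the Levenshtein distance is symmetric, only the upper triangle of the matrix actually needs to be computed. The full matrix is kept here to mirror (5).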
The reference codewords (6), (7) make it possible to classify the analyzed ECG by comparing the Levenshtein distances between its codeword $S_t$ and the reference words $S_0^{(CAD)}$ and $S_0^{(Healthy)}$ using the following decision rule.
Decision rule 1:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$, (8)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$. (9)
The diagnostic capabilities of the proposed method were studied on real ECGs registered at the Department of Ischemic Heart Diseases of the V.D. Strazhesko AMS of Ukraine (Kyiv) and at four German clinics: Essen University Hospital (Essen), Catholic Hospital “Philippusstift” (Essen), Heart and Diabetes Center of North Rhine-Westphalia (Bad Oeynhausen), and German Heart Center (Berlin).
The clinical material consisted of 100 ECG records of patients with coronary artery disease (CAD), whose diagnosis had previously been established by coronary angiography, and 100 ECG records of healthy volunteers included in the control group. It is important to note that the training set included only those ECGs in which the traditional electrocardiographic signs of coronary artery disease (flat or negative T wave, depression or elevation of the isoelectric line) were absent. In other words, from the point of view of traditional cardiology, all ECGs, including those of verified patients, would be classified as healthy (Healthy).
Table 3 shows a fragment of the database of codewords built for the ECGs of the training set.
Table 3. ECG training set code words (columns: CAD, Healthy)
Using the training set and formulas (6), (7), the reference codewords of the two classes were determined:
$S_0^{(CAD)}$ = adcbdadcadabdabcadabdadcbdab,
$S_0^{(Healthy)}$ = cbcdcdabdcabddcaadcaa.
For illustration, we present the results of the ECG assessment of a verified patient with CAD (male, 69 years old), whose codogram had the form
$S_t^{(1)}$ = adcabdadcadabdaddabdaadabdbdda,
and of a representative of the control group, a 54-year-old man whose codogram was
$S_t^{(2)}$ = bdcbbcdcabcdcabcdcbaa.
It is easy to verify that $L(S_t^{(1)}, S_0^{(CAD)}) = 13$ and $L(S_t^{(1)}, S_0^{(Healthy)}) = 15$, i.e.
$L(S_t^{(1)}, S_0^{(CAD)}) < L(S_t^{(1)}, S_0^{(Healthy)})$,
and, in accordance with rule (8), the subject was assigned to the CAD group.
Similarly, for the second subject we have $L(S_t^{(2)}, S_0^{(CAD)}) = 14$ and $L(S_t^{(2)}, S_0^{(Healthy)}) = 8$, i.e.
$L(S_t^{(2)}, S_0^{(CAD)}) > L(S_t^{(2)}, S_0^{(Healthy)})$,
and, in accordance with rule (9), the subject was assigned to the healthy group.
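Decision rule 1 is a nearest-reference classifier, so the worked example above can be reproduced with a few lines. The sketch below applies rules (8) and (9) to the two codograms quoted in the text and prints the distances so that they can be compared with the reported values. The function names are hypothetical, and the Levenshtein helper is a standard Wagner–Fischer implementation rather than the authors' code.

```python
def levenshtein(source, target):
    """Wagner-Fischer dynamic programme with unit costs, two rows kept."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i] + [0] * len(target)
        for j, t_char in enumerate(target, start=1):
            current[j] = min(previous[j] + 1,                        # deletion
                             current[j - 1] + 1,                     # insertion
                             previous[j - 1] + (s_char != t_char))   # substitution
        previous = current
    return previous[-1]


# Reference codewords reported in the text.
S0_CAD = "adcbdadcadabdabcadabdadcbdab"
S0_HEALTHY = "cbcdcdabdcabddcaadcaa"


def decision_rule_1(codeword, ref_cad=S0_CAD, ref_healthy=S0_HEALTHY):
    """Rules (8)-(9): CAD if the word is at least as close to the CAD reference."""
    if levenshtein(codeword, ref_cad) <= levenshtein(codeword, ref_healthy):
        return "CAD"
    return "Healthy"


for word in ("adcabdadcadabdaddabdaadabdbdda", "bdcbbcdcabcdcabcdcbaa"):
    print(word, levenshtein(word, S0_CAD), levenshtein(word, S0_HEALTHY),
          decision_rule_1(word))
```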
Clinical studies have shown that, despite the absence of traditional electrocardiographic signs of CAD on the ECGs of patients with coronary artery disease, decision rule 1 provides a sensitivity of $S_E$ = 72% and a specificity of $S_P$ = 79%.
Fig. 4 presents estimates of the conditional distributions of the Levenshtein distances to the reference codograms of the sick, $P(L(S_t, S_0^{(CAD)}))$, and of the healthy, $P(L(S_t, S_0^{(Healthy)}))$.
Testing the hypothesis of homogeneity of the conditional distributions $P(L(S_t, S_0^{(CAD)}))$ and $P(L(S_t, S_0^{(Healthy)}))$ with the Kolmogorov–Smirnov test showed that, with high statistical significance (p < 0.001), the hypothesis of equal distributions must be rejected. The Mann–Whitney test for independent samples confirmed this result.
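The homogeneity check above rests on the two-sample Kolmogorov–Smirnov statistic, the largest gap between the empirical distribution functions of the two samples. The pure-Python sketch below computes only that statistic D, on hypothetical distance samples. For p-values one would use SciPy's `scipy.stats.ks_2samp` and, for the Mann–Whitney test, `scipy.stats.mannwhitneyu`.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions, so the supremum is attained at
    one of the pooled sample values; only those points are checked.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        f_a = bisect_right(a, x) / len(a)  # empirical CDF of sample_a at x
        f_b = bisect_right(b, x) / len(b)  # empirical CDF of sample_b at x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical distances L(S_t, S0(CAD)) for test words of the two groups.
cad_group = [11, 12, 13, 13, 14, 15]
healthy_group = [15, 16, 16, 17, 18, 19]
print(ks_statistic(cad_group, healthy_group))
```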
The statistically significant difference between the conditional distributions $P(L(S_t, S_0^{(CAD)}))$ and $P(L(S_t, S_0^{(Healthy)}))$ suggests that the Levenshtein distance is not only a useful diagnostic feature in itself but may also be beneficial in combination with other diagnostic features [11]. Therefore, the next stage of our research was aimed at finding additional diagnostic features that would increase the sensitivity and specificity of decision rule (8), (9).
Extension of the Basic Method
According to [12], the probability of the appearance of symbols in codewords carries valuable information in the linguistic analysis of physiological signals. Based on this idea, let us introduce three types of patterns, which are substrings of the codewords of the training set:
• $\pi_1 = x$, $x \in \{a, b, c, d\}$: one-character pattern;
• $\pi_2 = xy$, $x, y \in \{a, b, c, d\}$: two-character pattern;
• $\pi_3 = xyz$, $x, y, z \in \{a, b, c, d\}$: three-character pattern.
Based on the data of the training sample, we calculate the average frequencies $P^{(Healthy)}(\pi_1)$ and $P^{(CAD)}(\pi_1)$ of the appearance of one-character patterns in the ECG codewords of the healthy and sick groups:
$$P^{(Healthy)}(\pi_1) = \frac{1}{Q_{Healthy}} \sum_{i=1}^{Q_{Healthy}} \frac{G_i(\pi_1)}{W_i}, \qquad (10)$$
$$P^{(CAD)}(\pi_1) = \frac{1}{Q_{CAD}} \sum_{i=1}^{Q_{CAD}} \frac{G_i(\pi_1)}{W_i}, \qquad (11)$$
where $G_i(\pi_1)$ is the number of occurrences of the one-character pattern $\pi_1 = x$, $x \in \{a, b, c, d\}$, in the $i$-th codeword, $W_i$ is the total number of characters in the $i$-th codeword, and $Q_{Healthy}$ and $Q_{CAD}$ are the numbers of ECGs of healthy subjects and CAD patients in the training set.
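Averages (10) and (11) can be computed as sketched below. The sketch generalizes to a pattern of length k by dividing the number of occurrences by the number of length-k windows, $W - k + 1$. This equals $W_i$ for one-character patterns and $W_t - 1$ for the two-character pattern used later. Counting overlapping occurrences is an assumption, since the text does not specify it, and the helper names are hypothetical.

```python
def count_occurrences(word, pattern):
    """Number G(pattern) of (possibly overlapping) occurrences of `pattern` in `word`."""
    k = len(pattern)
    return sum(1 for i in range(len(word) - k + 1) if word[i:i + k] == pattern)


def pattern_frequency(word, pattern):
    """G(pattern) divided by the number of length-k windows, W - k + 1."""
    return count_occurrences(word, pattern) / (len(word) - len(pattern) + 1)


def mean_frequency(codewords, pattern):
    """Group average (1/Q) * sum_i G_i(pattern) / (W_i - k + 1); for k = 1 this is (10)/(11)."""
    return sum(pattern_frequency(w, pattern) for w in codewords) / len(codewords)


# Hypothetical toy group of two codewords.
group = ["abcd", "dddd"]
print(mean_frequency(group, "d"), mean_frequency(group, "dd"))  # 0.625 0.5
```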
The estimated frequencies $P^{(Healthy)}(\pi_1)$ and $P^{(CAD)}(\pi_1)$ of single-character patterns in the two groups are summarized in Table 4. The statistical significance (p-value) of the differences between the mean frequencies $P^{(Healthy)}(\pi_1)$ and $P^{(CAD)}(\pi_1)$ was assessed with Student's t-test, since the Kolmogorov–Smirnov test did not reject the normality of the distribution of the frequencies of patterns $\pi_1$ in the codewords.
Fig. 6. Conditional distributions of the Levenshtein distances $L(S_t, S_0^{(CAD)})$ and $L(S_t, S_0^{(Healthy)})$ to the reference codograms CAD and Healthy
Table 4 shows that the one-character pattern $\pi_1 = d$ occurs in the codewords of healthy subjects and of patients with CAD with different probabilities (p ≈ 0.024). This fact made it possible to formulate the second decision rule.
Decision rule 2:
CAD, if $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| \le \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$, (12)
Healthy, if $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| > \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$, (13)
Else uncertain,
where $G_t(d)$ is the number of occurrences of the character $x = d$ in the analyzed codeword of length $W_t$, and $P^{(CAD)}(d) \approx 0.311$ and $P^{(Healthy)}(d) \approx 0.286$ are the estimated probabilities of the one-character pattern $\pi_1 = d$ in the codewords of the corresponding groups.
However, the single-character pattern $\pi_1 = d$ alone yields decisions of low reliability: decision rule (12), (13) provided a sensitivity of $S_E$ = 59% and a specificity of $S_P$ = 58%.
Therefore, we investigated the diagnostic value of a decision rule that makes a decision from the Levenshtein distances together with an estimate of the number of occurrences $G_t(d)$ of the pattern $\pi_1 = d$ in the analyzed codeword. This rule somewhat improves decision rule 1.
Decision rule 3:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$ AND $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| \le \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$, (14)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$ AND $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| > \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$, (15)
Else uncertain.

| Pattern | $P^{(Healthy)}(\pi_1)$ | $P^{(CAD)}(\pi_1)$ | p-value |
|---|---|---|---|
| a | 0.273275 | 0.274386 | 0.907793 |
| b | 0.203286 | 0.196676 | 0.504594 |
| c | 0.236435 | 0.217466 | 0.082603 |
| d | 0.286128 | 0.311056 | 0.024375 |

Table 4. Estimated frequencies of occurrence of one-symbol patterns

Rule (14), (15) provided decisions with a sensitivity of $S_E$ = 77.2% and a specificity of $S_P$ = 86.2%.
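Rules 2 and 3 compare the observed frequency of the symbol d with the two group estimates from Table 4. The sketch below implements both. It assumes absolute differences in (12) to (15) and the reading of (15) in which the Healthy branch requires the frequency to be strictly closer to the healthy estimate. The function names are hypothetical.

```python
P_CAD_D, P_HEALTHY_D = 0.311056, 0.286128   # Table 4, pattern d


def levenshtein(source, target):
    """Wagner-Fischer dynamic programme with unit costs, two rows kept."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i] + [0] * len(target)
        for j, t_char in enumerate(target, start=1):
            current[j] = min(previous[j] + 1, current[j - 1] + 1,
                             previous[j - 1] + (s_char != t_char))
        previous = current
    return previous[-1]


def d_frequency_closer_to_cad(word):
    """True if G_t(d) / W_t is at least as close to P_CAD(d) as to P_Healthy(d)."""
    f = word.count("d") / len(word)
    return abs(f - P_CAD_D) <= abs(f - P_HEALTHY_D)


def decision_rule_2(word):
    """Rules (12)-(13): the frequency of d alone decides."""
    return "CAD" if d_frequency_closer_to_cad(word) else "Healthy"


def decision_rule_3(word, ref_cad, ref_healthy):
    """Rules (14)-(15): distance and d-frequency must agree, else 'uncertain'."""
    distance_says_cad = levenshtein(word, ref_cad) <= levenshtein(word, ref_healthy)
    freq_says_cad = d_frequency_closer_to_cad(word)
    if distance_says_cad and freq_says_cad:
        return "CAD"
    if not distance_says_cad and not freq_says_cad:
        return "Healthy"
    return "uncertain"
```

Requiring agreement between the two features is what produces the rejections ("uncertain" answers) that rule 1 never gives.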
Let us consider the possibility of improving the decision rule by analyzing two-symbol patterns $\pi_2 = xy$, $x, y \in \{a, b, c, d\}$. Table 5 presents the estimated probabilities of the appearance of such patterns in the words of the training sample.
Since there are more distinct two-character patterns $\pi_2$ than one-character patterns $\pi_1$, the frequency of occurrence of each $\pi_2$ is lower. Nevertheless, as can be seen from Table 5, the two-character pattern $\pi_2 = ba$ is almost twice as common in the codewords of healthy subjects as in those of CAD patients, and this difference is statistically significant with high reliability (p < 0.0001).
Taking this fact into account, we formulate a decision rule based on the joint analysis of the Levenshtein distances and the frequency of the pattern $\pi_2 = ba$.
Decision rule 4:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$ AND $\left| \frac{G_t(ba)}{W_t - 1} - P^{(CAD)}(ba) \right| \le \left| \frac{G_t(ba)}{W_t - 1} - P^{(Healthy)}(ba) \right|$, (16)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$ AND $\left| \frac{G_t(ba)}{W_t - 1} - P^{(CAD)}(ba) \right| > \left| \frac{G_t(ba)}{W_t - 1} - P^{(Healthy)}(ba) \right|$, (17)
Else uncertain,
where $G_t(ba)$ is the number of occurrences of the two-character pattern $\pi_2 = ba$ in the analyzed codeword of length $W_t$ (a word of length $W_t$ contains $W_t - 1$ two-character substrings).
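Rule 4 can be sketched in the same way, using the two-character pattern ba normalized by $W_t - 1$ and the estimates from Table 5. As before, the absolute differences, the tie handling and the function names are assumptions of this sketch. Codewords are assumed to have at least two symbols.

```python
P_CAD_BA, P_HEALTHY_BA = 0.027195, 0.048478   # Table 5, pattern ba


def levenshtein(source, target):
    """Wagner-Fischer dynamic programme with unit costs, two rows kept."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i] + [0] * len(target)
        for j, t_char in enumerate(target, start=1):
            current[j] = min(previous[j] + 1, current[j - 1] + 1,
                             previous[j - 1] + (s_char != t_char))
        previous = current
    return previous[-1]


def freq_ba(word):
    """G_t(ba) / (W_t - 1): share of two-character windows equal to 'ba'."""
    windows = len(word) - 1
    hits = sum(1 for i in range(windows) if word[i:i + 2] == "ba")
    return hits / windows


def decision_rule_4(word, ref_cad, ref_healthy):
    """Rules (16)-(17): Levenshtein distances combined with the ba frequency."""
    distance_says_cad = levenshtein(word, ref_cad) <= levenshtein(word, ref_healthy)
    f = freq_ba(word)
    freq_says_cad = abs(f - P_CAD_BA) <= abs(f - P_HEALTHY_BA)
    if distance_says_cad and freq_says_cad:
        return "CAD"
    if not distance_says_cad and not freq_says_cad:
        return "Healthy"
    return "uncertain"
```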
| Pattern | $P^{(Healthy)}(\pi_2)$ | $P^{(CAD)}(\pi_2)$ | p-value |
|---|---|---|---|
| aa | 0.059126 | 0.038845 | 0.000189 |
| ab | 0.091659 | 0.093691 | 0.400061 |
| ac | 0.025715 | 0.019925 | 0.154795 |
| ad | 0.094804 | 0.118936 | 0.012481 |
| ba | 0.048478 | 0.027195 | 0.00001 |
| bb | 0.02254 | 0.015407 | 0.024597 |
| bc | 0.059796 | 0.063595 | 0.385212 |
| bd | 0.071602 | 0.084538 | 0.166476 |
| ca | 0.082968 | 0.086643 | 0.376135 |
| cb | 0.065612 | 0.064056 | 0.208652 |
| cc | 0.026535 | 0.020845 | 0.087007 |
| cd | 0.0544 | 0.044895 | 0.010992 |
| da | 0.079259 | 0.120013 | 0.000026 |
| db | 0.021287 | 0.022272 | 0.453138 |
| dc | 0.123771 | 0.111342 | 0.044878 |
| dd | 0.057954 | 0.053647 | 0.050011 |

Table 5. Estimated frequencies of occurrence of two-symbol patterns

Rule (16), (17) provided decisions with a sensitivity of $S_E$ = 77.3% and a specificity of $S_P$ = 89.1% for most of the training-set codewords (39.5% rejections, Table 7).
Let us consider making a decision with a simultaneous estimation of the Levenshtein distances and the numbers of occurrences of the one-character and two-character patterns in a codeword. Such decisions form decision rule 5.
Decision rule 5:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$ AND ( $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| \le \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$ OR $\left| \frac{G_t(ba)}{W_t - 1} - P^{(CAD)}(ba) \right| \le \left| \frac{G_t(ba)}{W_t - 1} - P^{(Healthy)}(ba) \right|$ ), (18)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$ AND ( $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| \ge \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$ OR $\left| \frac{G_t(ba)}{W_t - 1} - P^{(CAD)}(ba) \right| \ge \left| \frac{G_t(ba)}{W_t - 1} - P^{(Healthy)}(ba) \right|$ ), (19)
Else uncertain.
Rule (18), (19) makes decisions with a sensitivity of $S_E$ = 76.5% and a specificity of $S_P$ = 84.7% for about 80% of the training-set ECGs (21.5% rejections, Table 7).
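Rule 5 relaxes rules 3 and 4 by accepting either frequency feature. The sketch below encodes (18) and (19) as "distance condition AND (d-condition OR ba-condition)". That grouping, the absolute differences, and normalizing the d count by $W_t$ in both branches are our reading of the formulas, not confirmed details. The names are hypothetical.

```python
# (P_CAD, P_Healthy) estimates from Tables 4 and 5.
ESTIMATES = {"d": (0.311056, 0.286128), "ba": (0.027195, 0.048478)}


def levenshtein(source, target):
    """Wagner-Fischer dynamic programme with unit costs, two rows kept."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i] + [0] * len(target)
        for j, t_char in enumerate(target, start=1):
            current[j] = min(previous[j] + 1, current[j - 1] + 1,
                             previous[j - 1] + (s_char != t_char))
        previous = current
    return previous[-1]


def gaps(word, pattern):
    """|f - P_CAD| and |f - P_Healthy| for the pattern frequency f (overlapping windows)."""
    k = len(pattern)
    windows = len(word) - k + 1
    f = sum(1 for i in range(windows) if word[i:i + k] == pattern) / windows
    p_cad, p_healthy = ESTIMATES[pattern]
    return abs(f - p_cad), abs(f - p_healthy)


def decision_rule_5(word, ref_cad, ref_healthy):
    """Rules (18)-(19): distance AND (d-frequency OR ba-frequency)."""
    distance_says_cad = levenshtein(word, ref_cad) <= levenshtein(word, ref_healthy)
    d_cad, d_healthy = gaps(word, "d")
    ba_cad, ba_healthy = gaps(word, "ba")
    if distance_says_cad and (d_cad <= d_healthy or ba_cad <= ba_healthy):
        return "CAD"
    if not distance_says_cad and (d_cad >= d_healthy or ba_cad >= ba_healthy):
        return "Healthy"
    return "uncertain"
```

The OR makes each branch easier to satisfy than in rules 3 and 4, which is consistent with the lower rejection rate reported for rule 5.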
Finally, let us consider the diagnostic capabilities of a decision rule that analyzes three-symbol patterns $\pi_3 = xyz$, $x, y, z \in \{a, b, c, d\}$. Table 6 shows the estimated probabilities of the appearance of such patterns in the words of the first and second groups of the training sample; for all listed patterns except dba, the differences between the groups are statistically significant (p < 0.05).
Table 6 shows that the pattern $\pi_3 = dad$ is, with high statistical significance, more typical of the group of patients with CAD, whereas the pattern $\pi_3 = caa$ is more typical of the healthy group.
For illustration, Fig. 7 shows parts of real ECGs that generate these patterns in the codewords. Although the presented fragments are visually almost indistinguishable, the proposed ECG processing algorithm unambiguously assigns such fragments to the pattern $\pi_3 = dad$ or $\pi_3 = caa$.
This made it possible to propose another decision rule based on the analysis of the Levenshtein distances and the numbers of occurrences of three-character patterns in the codeword.
Decision rule 6:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$ AND $G_t(dad) \ge G_t(caa)$, (20)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$ AND $G_t(dad) \le G_t(caa)$, (21)
Else uncertain,
where $G_t(dad)$ is the number of occurrences of the three-character pattern $\pi_3 = dad$ in the analyzed codeword of length $W_t$, and $G_t(caa)$ is the number of occurrences of the three-character pattern $\pi_3 = caa$ in the same codeword.

| Pattern | $P^{(Healthy)}(\pi_3)$ | $P^{(CAD)}(\pi_3)$ | p-value |
|---|---|---|---|
| ada | 0.009104 | 0.029465 | 0.000006 |
| add | 0.031545 | 0.025732 | 0.017603 |
| baa | 0.007682 | 0.003707 | 0.045142 |
| bad | 0.023236 | 0.012264 | 0.00159 |
| bca | 0.013526 | 0.02479 | 0.003743 |
| bda | 0.021176 | 0.042309 | 0.000412 |
| bdc | 0.034448 | 0.025401 | 0.003428 |
| caa | 0.01891 | 0.007792 | 0.002103 |
| cab | 0.034292 | 0.026427 | 0.015428 |
| cad | 0.025595 | 0.051444 | 0.00015 |
| cba | 0.023889 | 0.011414 | 0.000466 |
| cbb | 0.012348 | 0.006269 | 0.021577 |
| cbd | 0.018828 | 0.028515 | 0.027272 |
| cdc | 0.019029 | 0.010455 | 0.009801 |
| dab | 0.028581 | 0.052221 | 0.00127 |
| dad | 0.01142 | 0.031005 | 0.000021 |
| dba | 0.009673 | 0.007151 | 0.11723 |

Table 6. Estimated frequencies of occurrence of three-symbol patterns

(Fig. 7 panels: dad, caa)

| Decision rule | $S_E$, % | $S_P$, % | Percentage of rejections, % |
|---|---|---|---|
| 1 | 72 | 79 | 0 |
| 2 | 59 | 58 | 0 |
| 3 | 77.2 | 86.2 | 56 |
| 4 | 77.3 | 89.1 | 39.5 |
| 5 | 76.5 | 84.7 | 21.5 |
| 6 | 74.7 | 79.5 | 12.5 |

Table 7. Operational characteristics of decision rules

(Fig. 8: bar chart of $S_E$, $S_P$ and the percentage of rejections, on a 0–100% scale, for rules 1 to 6)
Fig. 8. Comparative characteristics of the developed decision rules

Rule (20), (21) provided decisions with a sensitivity of $S_E$ = 74.7% and a specificity of $S_P$ = 79.5%, and the share of cases for which a decision was made increased to 87.5%.
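Unlike rules 2 to 5, rule 6 compares raw pattern counts rather than frequencies. The sketch below implements (20) and (21). It assumes that overlapping occurrences are counted (for example, "dadad" contains dad twice). The function names are hypothetical.

```python
def levenshtein(source, target):
    """Wagner-Fischer dynamic programme with unit costs, two rows kept."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i] + [0] * len(target)
        for j, t_char in enumerate(target, start=1):
            current[j] = min(previous[j] + 1, current[j - 1] + 1,
                             previous[j - 1] + (s_char != t_char))
        previous = current
    return previous[-1]


def count_occurrences(word, pattern):
    """Overlapping occurrences of `pattern` in `word` (assumption of this sketch)."""
    k = len(pattern)
    return sum(1 for i in range(len(word) - k + 1) if word[i:i + k] == pattern)


def decision_rule_6(word, ref_cad, ref_healthy):
    """Rules (20)-(21): Levenshtein distances combined with G_t(dad) vs G_t(caa)."""
    distance_says_cad = levenshtein(word, ref_cad) <= levenshtein(word, ref_healthy)
    g_dad = count_occurrences(word, "dad")
    g_caa = count_occurrences(word, "caa")
    if distance_says_cad and g_dad >= g_caa:
        return "CAD"
    if not distance_says_cad and g_dad <= g_caa:
        return "Healthy"
    return "uncertain"
```

When neither pattern occurs, both counts are zero and the rule falls back to the distance comparison alone. This is one reason rule 6 rejects fewer cases than rules 3 to 5.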
Table 7 and Fig. 8 summarize the results of assessing the diagnostic capabilities of the developed decision rules.
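Table 7 reports sensitivity, specificity and the percentage of rejections together. The sketch below computes these three figures for any rule that may answer "uncertain". The article does not define how rejected cases enter $S_E$ and $S_P$. Computing both over the decided cases only is therefore an assumption, and the function name is hypothetical.

```python
def operating_characteristics(true_labels, decisions):
    """Sensitivity, specificity and rejection share of a rule with an 'uncertain' option.

    Assumption: SE and SP are computed over the decided (non-rejected) cases only.
    """
    decided = [(t, d) for t, d in zip(true_labels, decisions) if d != "uncertain"]
    tp = sum(1 for t, d in decided if t == "CAD" and d == "CAD")
    fn = sum(1 for t, d in decided if t == "CAD" and d == "Healthy")
    tn = sum(1 for t, d in decided if t == "Healthy" and d == "Healthy")
    fp = sum(1 for t, d in decided if t == "Healthy" and d == "CAD")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    rejection_rate = 1 - len(decided) / len(true_labels)
    return sensitivity, specificity, rejection_rate


# Hypothetical labels and rule outputs for four ECGs.
labels = ["CAD", "CAD", "Healthy", "Healthy"]
outputs = ["CAD", "uncertain", "Healthy", "CAD"]
print(operating_characteristics(labels, outputs))  # (1.0, 0.5, 0.25)
```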
Fig. 7. Patterns for the CAD group (dad) and the healthy group (caa)
Conclusions
The developed approach is based on the analysis of the dynamics of two ECG indicators calculated over the sequence of cardiac cycles. The first (traditional) indicator is the duration of an individual cycle, and the second (original) indicator characterizes the symmetry of the T wave.
The proposed method is based on the transition from the calculated sequence of these indicators to a codeword encoding the analyzed ECG. This coding makes it possible to use the techniques of mathematical linguistics to solve the problem of analyzing and interpreting real ECGs.
Six decision rules based on the analysis of the Levenshtein distances and the number of occurrences of characteristic patterns of codewords are considered and investigated. The studies carried out show that the proposed approach yields additional diagnostic information on real ECGs that lack the electrocardiographic signs of CAD adopted in traditional electrocardiology.
REFERENCES
1. Connolly D. C., Elveback L. R., Oxman H. A., 1984. “Coronary heart disease in residents of Rochester, Minnesota. IV. Prognostic value of the resting electrocardiogram at the time of initial diagnosis of angina pectoris”, Mayo Clin. Proc., 59, pp. 247–250. DOI: 10.1016/s0025-6196(12)61257-9.
2. Gritsenko V. I., Fainzilberg L. S., 2019. Intellektualnyye informatsionnyye tekhnologii v tsifrovoy meditsine na primere fazagrafii [Intelligent information technologies in digital medicine on the phasegraphy example], Naukova Dumka, Kyiv, 423 p. (In Russian).
3. Dyachuk D. D., Kravchenko A. N., Faynzilberg L. S., Stanislavskaya S. S., Korchinskaya Z. A., Orikhovskaya K. B., Pasko V. S., Mikhalev K. A., 2016. “Skrining ishemii miokarda metodom otsenki fazy repolyarizatsii” [“Screening of myocardial ischemia by the method of assessing the phase of repolarization”], Ukrainian Journal of Cardiology, 6, pp. 82–89. (In Russian).
4. Dyachuk D. D., Gritsenko V. I., Fainzilberg L. S., Kravchenko A. M., et al., 2017. “Zastosuvannya metodu fazahrafiyi pry provedenni skryninhu ishemichnoyi khvoroby sertsya” [“The use of the method of phasegraphy in the screening of coronary artery disease”], Methodological Recommendations of the Ministry of Health of Ukraine № 163.16/13.17, Ukrainian Center for Scientific Medical Information and Patent and License Work, Kyiv, 32 p. (In Ukrainian).
5. Fainzilberg L. S., 2020. “New Approaches to the Analysis and Interpretation of the Shape of Cyclic Signals”, Cybernetics and Systems Analysis, 56 (4), pp. 665–674. DOI: 10.1007/s10559-020-00283-0.
6. Fainzilberg L. S., Dykach Ju. R., 2019. “Linguistic approach for estimation of electrocardiograms’s subtle changes based on the Levenstein distance”, Cybernetics and Computer Engineering, 2 (196), pp. 3–26. DOI: 10.15407/kvt196.02.003.
7. Uspenskiy V. M., 2012. “Diagnostic System Based on the Information Analysis of Electrocardiogram”, Proceedings of the Mediterranean Conference on Embedded Computing, MECO 2012, June 19–21, Montenegro, pp. 74–76.
8. Kolesnikova O. V., Krivenko S. S., 2018. “Informatsiynyy analiz elektrokardiosyhnaliv: obhruntuvannya i mozhlyvosti” [“Information analysis of electrocardiosignals: rationale and possibilities”], Proceedings of the 1st International Scientific and Practical Conference “Information Systems and Technologies in Medicine”, ISM-2018, KHNURE, Kharkiv, pp. 161–163. (In Ukrainian).
9. Levenshteyn V. I., 1965. “Dvoichnyye kody s ispravleniyem vypadeniy, vstavok i zameshcheniy simvolov” [“Binary codes capable of correcting deletions, insertions and substitutions of symbols”], Doklady Akademii Nauk SSSR, 163 (4), pp. 845–848. (In Russian).
10. Wagner R. A., Fischer M. J., 1974. “The String-to-String Correction Problem”, Journal of the ACM, 21 (1), pp. 168–173. DOI: 10.1145/321796.321811.
11. Faynzilberg L. S., 2010. Matematicheskiye metody otsenki poleznosti diagnosticheskikh priznakov [Mathematical methods for evaluating the usefulness of diagnostic features], Osvita Ukrainy, Kyiv, 152 p. (In Russian).
12. Senkevich Yu. I., 2008. “Lingvisticheskiy analiz fiziologicheskikh signalov” [“Linguistic analysis of physiological signals”], Digital Signal Processing, 2, pp. 54–57. (In Russian).
Received 06.04.2021
38 ISSN 2706-8145, Control Systems and Computers, 2021, № 2–3
L.S. Fainzilberg, Ju.R. Dykach
Development of Linguistic Approach to the Problem of the Computer Electrocardiogram's Classifications
L.S. Fainzilberg, Doctor of Engineering Sciences, Professor, Chief Researcher,
International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine,
40, Akademika Glushkova Ave., Kyiv, 03187, Ukraine,
fainzilberg@gmail.com
Ju.R. Dykach, student, Faculty of Biomedical Engineering, National Technical University of Ukraine
"Igor Sikorsky Kyiv Polytechnic Institute",
37, Peremohy Ave., Kyiv, 03056, Ukraine,
jul.dykach@gmail.com
DEVELOPMENT OF THE LINGUISTIC APPROACH TO THE PROBLEM
OF COMPUTER CLASSIFICATION OF ELECTROCARDIOGRAMS
Introduction. The linguistic approach converts an observed cyclic signal into a sequence of symbols (a codeword) that describes how its indicators change from cycle to cycle. This makes it possible to apply procedures of mathematical linguistics to increase the reliability of the decisions made.
Purpose of the article. To extend the diagnostic capabilities of the linguistic approach to the analysis and interpretation of electrocardiograms (ECG).
Methods. Each ECG cycle is encoded by one of four symbols that characterize the changes of two indicators: a traditional one (the cycle duration) and an original one (the symmetry of the repolarization segment).
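The cycle-to-codeword encoding can be illustrated with a minimal Python sketch. For illustration only, it assumes that the four symbols correspond to the four combinations of the direction of change (increase or not) of the two indicators between consecutive cycles. The symbol letters, the treatment of unchanged values as "not increased" and the function name `encode_cycles` are hypothetical and are not the authors' definitions.

```python
from typing import Sequence

# Hypothetical alphabet: one letter per combination of
# (cycle duration increased?, repolarization symmetry increased?).
SYMBOLS = {
    (True, True): "a",    # both indicators increased
    (True, False): "b",   # duration increased, symmetry did not
    (False, True): "c",   # symmetry increased, duration did not
    (False, False): "d",  # neither indicator increased
}


def encode_cycles(durations: Sequence[float], symmetry: Sequence[float]) -> str:
    """Turn the cycle-to-cycle dynamics of two indicators into a codeword.

    Cycle k (k >= 1) is compared with cycle k-1, so a record of n cycles
    yields a codeword of n-1 symbols. An unchanged value counts as
    "not increased" (an assumption of this sketch).
    """
    if len(durations) != len(symmetry):
        raise ValueError("both indicator series must have the same length")
    return "".join(
        SYMBOLS[(durations[k] > durations[k - 1], symmetry[k] > symmetry[k - 1])]
        for k in range(1, len(durations))
    )
```

For example, `encode_cycles([0.80, 0.82, 0.79, 0.81], [1.10, 1.05, 1.08, 1.12])` returns `"bca"`.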
Results. Real clinical data from verified patients and healthy volunteers were processed to construct templates for patients with the chronic form of ischemic heart disease (IHD) and for healthy subjects. The templates were developed with two computational procedures adopted in mathematical linguistics. The first is the Levenshtein distance, the minimum number of editing operations (insertion, deletion and substitution of a symbol) needed to transform one word into another. The second is the frequency of occurrence of a substring in the analyzed word. On the basis of these procedures, decision rules were developed that allow diagnostic decisions to be made from the Levenshtein distances to the templates and from the frequencies of occurrence of one-, two- and three-symbol patterns in the codewords. Combining these two methods was found to extend the diagnostic capabilities of the linguistic approach to ECG analysis and interpretation.
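The Levenshtein distance between two codewords can be computed with the Wagner-Fischer dynamic programming algorithm cited in the reference list. The sketch below pairs it with `nearest_template_class`, a hypothetical rule that assigns a codeword to the class whose templates are closer on average. This rule and the example templates are illustrative assumptions. The paper's own templates and decision rules are not reproduced here.

```python
from statistics import mean
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions and substitutions
    that turn codeword a into codeword b (Wagner-Fischer dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (no cost if equal)
            ))
        prev = curr
    return prev[-1]


def nearest_template_class(word: str,
                           ihd_templates: Sequence[str],
                           healthy_templates: Sequence[str]) -> str:
    """Hypothetical rule: pick the class whose templates have the smaller mean
    Levenshtein distance to the codeword (ties go to "healthy")."""
    d_ihd = mean(levenshtein(word, t) for t in ihd_templates)
    d_healthy = mean(levenshtein(word, t) for t in healthy_templates)
    return "IHD" if d_ihd < d_healthy else "healthy"
```

As a check, `levenshtein("kitten", "sitting")` returns 3. With made-up templates, `nearest_template_class("ddcc", ["ddcc", "dcdc"], ["abab", "baba"])` returns `"IHD"`. The row-by-row formulation keeps memory proportional to the length of the shorter loop dimension, which matters little for short codewords but costs nothing.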
Conclusions. Applying the developed decision rules is shown to increase the sensitivity and specificity of diagnostics even when the ECG shows no traditional electrocardiological signs of myocardial ischemia.
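The second ingredient of the decision rules, the frequency of one-, two- and three-symbol patterns in a codeword, can be sketched in the same way. Several choices here are assumptions for illustration. Overlapping occurrences are counted and divided by the number of positions at which a pattern of that length fits. The alphabet is taken to be four letters. The single-pattern rule `threshold_rule` is hypothetical. Any real pattern and threshold would have to be fitted on verified clinical data.

```python
from itertools import product


def pattern_profile(word: str, alphabet: str = "abcd",
                    max_len: int = 3) -> dict[str, float]:
    """Relative frequency of every pattern of 1..max_len symbols in a codeword.

    Occurrences may overlap; each count is divided by the number of positions
    at which a pattern of that length fits (an assumed normalization).
    """
    profile: dict[str, float] = {}
    for n in range(1, max_len + 1):
        positions = len(word) - n + 1
        for letters in product(alphabet, repeat=n):
            pattern = "".join(letters)
            if positions <= 0:
                profile[pattern] = 0.0  # codeword too short for this length
                continue
            hits = sum(word.startswith(pattern, i) for i in range(positions))
            profile[pattern] = hits / positions
    return profile


def threshold_rule(word: str, pattern: str, threshold: float) -> str:
    """Hypothetical rule: report "IHD" when one characteristic pattern is more
    frequent than a given threshold, otherwise "healthy"."""
    frequency = pattern_profile(word).get(pattern, 0.0)
    return "IHD" if frequency > threshold else "healthy"
```

For the codeword `"ddcd"`, the profile gives 0.75 for `"d"`, 0.25 for `"c"` and 0.5 for `"ddc"`. Under this sketch, `threshold_rule("ddcd", "dd", 0.3)` returns `"IHD"` because `"dd"` occupies one of the three two-symbol positions.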
Keywords: ECG, Levenshtein distance, frequency of occurrence of a substring in a codeword, decision rule.