Development of Linguistic Approach to the Problem of the Computer Electro-cardiogram's Classifications

Six decision rules based on the analysis of Levenshtein distances and the number of occurrences of characteristic patterns in codewords are considered and investigated. It is shown that the developed decision rules make it possible to increase the sensitivity and specificity of d...

Bibliographic details
Date: 2021
Main authors: Fainzilberg, L.S., Dykach, Ju.R.
Format: Article
Language: English
Published: Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України 2021
Series: Control systems & computers
Subjects:
Online access: http://dspace.nbuv.gov.ua/handle/123456789/181260
Journal title: Digital Library of Periodicals of National Academy of Sciences of Ukraine
Cite this: Development of Linguistic Approach to the Problem of the Computer Electro-cardiogram's Classifications / L.S. Fainzilberg, Ju.R. Dykach // Control systems & computers. 2021. № 2-3. P. 28-39. Bibliogr.: 12 titles. In English.

Institution

Digital Library of Periodicals of National Academy of Sciences of Ukraine
id irk-123456789-181260
record_format dspace
spelling irk-123456789-181260 2021-11-10T01:26:28Z
Development of Linguistic Approach to the Problem of the Computer Electro-cardiogram's Classifications. Fainzilberg, L.S.; Dykach, Ju.R. Fundamental Problems in Computer Science.
Six decision rules based on the analysis of Levenshtein distances and the number of occurrences of characteristic patterns in codewords are considered and investigated. It is shown that the developed decision rules make it possible to increase the sensitivity and specificity of diagnostics even in cases when the ECG does not show traditional electrocardiological signs of myocardial ischemia.
The purpose of the article is to expand the diagnostic capabilities of the linguistic approach to the analysis and interpretation of electrocardiograms (ECG). Results. Based on the processing of real clinical data from verified patients and healthy volunteers, reference words were constructed for patients with the chronic form of coronary artery disease (CAD) and for healthy subjects. The references were developed using computational procedures adopted in mathematical linguistics: the Levenshtein distance, which is the minimum number of editing operations (insertion, deletion, and replacement of a symbol) that transforms one word into another, and the frequency of occurrence of a substring in the analyzed word. Based on these procedures, decision rules were developed that make diagnostic decisions from the Levenshtein distance to the references and the frequency of occurrence of one-, two-, and three-character patterns in the code words. It was established that combining these two methods expands the diagnostic capabilities of the linguistic approach to the analysis and interpretation of the ECG.
2021. Article. Development of Linguistic Approach to the Problem of the Computer Electro-cardiogram's Classifications / L.S. Fainzilberg, Ju.R. Dykach // Control systems & computers. 2021. № 2-3. P. 28-39. Bibliogr.: 12 titles. In English.
ISSN 2706-8145. DOI https://doi.org/10.15407/csc.2021.02.028. http://dspace.nbuv.gov.ua/handle/123456789/181260. UDC 004.001. en. Control systems & computers. Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України
institution Digital Library of Periodicals of National Academy of Sciences of Ukraine
collection DSpace DC
language English
topic Fundamental Problems in Computer Science
description Six decision rules based on the analysis of Levenshtein distances and the number of occurrences of characteristic patterns in codewords are considered and investigated. It is shown that the developed decision rules make it possible to increase the sensitivity and specificity of diagnostics even in cases when the ECG does not show traditional electrocardiological signs of myocardial ischemia.
format Article
author Fainzilberg, L.S.
Dykach, Ju.R.
title Development of Linguistic Approach to the Problem of the Computer Electro-cardiogram's Classifications
publisher Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України
publishDate 2021
topic_facet Fundamental Problems in Computer Science
url http://dspace.nbuv.gov.ua/handle/123456789/181260
series Control systems & computers
fulltext
ISSN 2706-8145, Control Systems and Computers, 2021, № 2–3
DOI https://doi.org/10.15407/csc.2021.02.028
UDC 004.001

L.S. FAINZILBERG, Dr.Sc. (Eng.), Professor, Chief Researcher, International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine, Acad. Glushkova ave., 40, Kyiv, 03187, Ukraine, fainzilberg@gmail.com
Ju.R. DYKACH, Student of the Biomedical Engineering Faculty, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», 37, Peremohy ave., Kyiv, 03056, Ukraine, jul.dykach@gmail.com

DEVELOPMENT OF LINGUISTIC APPROACH TO THE PROBLEM OF THE COMPUTER ELECTROCARDIOGRAM'S CLASSIFICATIONS

Six decision rules based on the analysis of Levenshtein distances and the number of occurrences of characteristic patterns in codewords are considered and investigated. It is shown that the developed decision rules make it possible to increase the sensitivity and specificity of diagnostics even in cases when the ECG does not show traditional electrocardiological signs of myocardial ischemia.

Keywords: ECG, the Levenshtein distance, occurrence frequency of a substring in a code word, decision rule.

Introduction

For more than a hundred years, electrocardiography has been widely used in cardiological practice to diagnose diseases of the cardiovascular system. However, the traditional approach to the analysis and interpretation of the ECG does not always provide the required reliability of diagnostic decisions. For example, according to medical statistics [1], the resting ECG, assessed according to generally accepted criteria, remains normal in almost 50% of patients with chronic coronary artery disease (CAD). Therefore, experts are actively exploring new approaches to computerized ECG processing.

One of these new approaches is an intelligent method of ECG processing called “fasegraphy”.
It was developed at the International Research and Training Center for Information Technologies and Systems of the National Academy of Sciences of Ukraine and the Ministry of Education and Science of Ukraine [2]. The method is based on the transition from a scalar signal $x(t)$ to a vector signal $(x(t), \dot{x}(t))$ on the phase plane, where $\dot{x}(t)$ is the rate of change of the electrical activity of the heart. This rate is determined by original computational procedures from the signal $x(t)$ recorded in a standard lead, for example the first standard lead (left and right hand).

Large-scale clinical trials have shown that the fasegraphy method increases the reliability of detecting latent signs of myocardial ischemia even when the generally accepted electrocardiographic signs of CAD (depression or elevation of the isoelectric line) are absent in all 12 traditional leads. This is achieved through the use of new diagnostic ECG indicators in the phase space, in particular the parameter $\beta_T$, which characterizes the symmetry of the repolarization area on the phase plane [3, 4].

Further studies have shown that not only the average value of $\beta_T$ but also the dynamics of its change from cycle to cycle has diagnostic value. In the simplest case, the variability of $\beta_T$ is characterized by its root-mean-square deviation, RMS $\beta_T$. For a more subtle analysis of the changes in $\beta_T$, it proved useful to calculate entropy estimates of the signal, in particular the modified permutation entropy [5].

An effective method for assessing the dynamics of changes in the shape of ECG cycles is based on a linguistic approach to processing cyclic signals.
This approach is based on the transition from the observed ECG to a sequence of symbols (a word) that uniquely encodes the ECG [6].

The purpose of this article is a further examination of this method.

Basic Method of Linguistic Analysis and ECG Interpretation

Let us first consider the approach to ECG analysis proposed in [6], which is needed in the further studies. Using a microprocessor sensor with finger electrodes, the ECG signal $x(t)$ of the first standard lead is recorded (Fig. 1).

[Fig. 1. Microprocessor ECG recorder. Sensor developed by Solvaig, J.S.C. (Kyiv), https://solvaig.com/fasegraphy]

For each cycle $i$ ($i = 1, \dots, N$) of the digitized signal $x(t)$, special computational procedures implemented in the fasegraphy method determine the duration of the cycle (the $RR_i$ interval) and the value $\beta_{T,i}$ of the original indicator mentioned above.

[Fig. 2. ECG indicators $RR_i$ and $\beta_{T,i}$]

The dynamics of these indicators over the course of ECG registration is then assessed. For this purpose, indicator variables are introduced:

$$V_i^{(RR)} = \begin{cases} +1, & \text{if } RR_i - RR_{i-1} > 0, \\ -1, & \text{if } RR_i - RR_{i-1} \le 0, \end{cases} \tag{1}$$

$$V_i^{(\beta)} = \begin{cases} +1, & \text{if } \beta_{T,i} - \beta_{T,i-1} > 0, \\ -1, & \text{if } \beta_{T,i} - \beta_{T,i-1} \le 0, \end{cases} \tag{2}$$

where $i = 2, \dots, N$.

The sequences $V_i^{(RR)}$ and $V_i^{(\beta)}$ make it possible to encode each ECG cycle with one of the symbols of the alphabet $A = \{a, b, c, d\}$ as shown in Table 1.

Table 1. Principle of ECG cycle coding
Indicator variable value $V_i^{(RR)}$      +1   +1   -1   -1
Indicator variable value $V_i^{(\beta)}$   +1   -1   +1   -1
Symbol                                     a    b    c    d

As a result, the $(N-1)$-symbol word $S_k$, composed of the symbols a, b, c, d, uniquely encodes the $k$-th processed ECG (Fig. 3).

[Fig. 3. The principle of forming a code word: electrocardiogram; parameters $RR_i$, $\beta_{T,i}$ ($i = 1, \dots, N$); indicator variables $V_i^{(RR)}$, $V_i^{(\beta)}$ ($i = 1, \dots, N-1$); code word ddabdcbadcbadca]

The transition from the observed ECG to the code word makes it possible to use the methods of mathematical linguistics for the analysis and interpretation of the ECG. In particular, the proposed method assesses the proximity between the codewords $S_\mu$, $S_\nu$ of processed ECGs by the edit distance $L(S_\mu, S_\nu)$, the Levenshtein distance, which is defined as the minimum number of editing operations (insertion, deletion, and replacement of a symbol) that transforms the word $S_\mu$ into the word $S_\nu$ [9].

The Levenshtein distance is calculated with the Wagner-Fischer algorithm [10], which is based on dynamic programming. Table 2 shows an optimal path for the transition from the word

$$S_\mu = \text{ddabdcbadcbadca} \tag{3}$$

to the word

$$S_\nu = \text{bacdaaacdadccbb}. \tag{4}$$

The closeness of these words is measured by the Levenshtein distance $L(S_\mu, S_\nu) = 10$.

Table 2. Optimal transition from word to word
Step  Original word      Operation            Result of editing
1     ddabdcbadcbadca    Replacement d → b    bdabdcbadcbadca
2     bdabdcbadcbadca    Deletion of d        babdcbadcbadca
3     babdcbadcbadca     Replacement b → c    bacdcbadcbadca
4     bacdcbadcbadca     Deletion of c        bacdbadcbadca
5     bacdbadcbadca      Replacement b → a    bacdaadcbadca
6     bacdaadcbadca      Replacement d → a    bacdaaacbadca
7     bacdaaacbadca      Replacement b → d    bacdaaacdadca
8     bacdaaacdadca      Replacement a → c    bacdaaacdadcc
9     bacdaaacdadcc      Insertion of b       bacdaaacdadccb
10    bacdaaacdadccb     Insertion of b       bacdaaacdadccbb

ECG classification based on the proposed approach involves:
• construction of class standards (reference words) from the Levenshtein distances between pairs of code words of the training set;
• comparison of the code word of the processed ECG with the standards.

The algorithm for constructing the standards is as follows. Suppose that the experiments yielded $Q_{CAD}$ electrocardiograms of patients with coronary artery disease (CAD), which are encoded according to Table 1 by the words $S_q^{(CAD)}$, $q = 1, \dots, Q_{CAD}$. Let us determine the Levenshtein distances $L_{\mu\nu} = L(S_\mu^{(CAD)}, S_\nu^{(CAD)})$ between each pair of these words, $\mu = 1, \dots, Q_{CAD}$, $\nu = 1, \dots, Q_{CAD}$, and form the square $Q_{CAD} \times Q_{CAD}$ matrix of distances

$$\Lambda_{CAD} = \begin{pmatrix} L_{11} & L_{12} & \dots & L_{1 Q_{CAD}} \\ L_{21} & L_{22} & \dots & L_{2 Q_{CAD}} \\ \dots & \dots & \dots & \dots \\ L_{Q_{CAD} 1} & L_{Q_{CAD} 2} & \dots & L_{Q_{CAD} Q_{CAD}} \end{pmatrix}. \tag{5}$$

The reference word of the CAD class is then the word corresponding to the row of matrix (5) whose sum of elements is minimal, i.e.

$$S_0^{(CAD)} = \arg\min_{1 \le \mu \le Q_{CAD}} \sum_{\nu=1}^{Q_{CAD}} L_{\mu\nu}. \tag{6}$$

The reference word of the healthy group (Healthy) is determined in the same way from the elements $L_{\mu\nu} = L(S_\mu^{(Healthy)}, S_\nu^{(Healthy)})$, $\mu = 1, \dots, Q_{Healthy}$, $\nu = 1, \dots, Q_{Healthy}$, of the Levenshtein distance matrix built for all pairs of codewords of the control group, i.e.

$$S_0^{(Healthy)} = \arg\min_{1 \le \mu \le Q_{Healthy}} \sum_{\nu=1}^{Q_{Healthy}} L_{\mu\nu}. \tag{7}$$

The reference code words (6), (7) make it possible to classify the analyzed ECG by comparing the Levenshtein distances between its code word $S_t$ and the reference words $S_0^{(CAD)}$ and $S_0^{(Healthy)}$ with the following decision rule.

Decision rule 1:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$, (8)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$. (9)

The diagnostic capabilities of the proposed method were studied on real ECGs registered at the Department of Ischemic Heart Diseases of the V.D. Strazhesko AMS of Ukraine (Kyiv) and at four German clinics: Essen University Hospital (Essen), Catholic Hospital “Philippusstift” (Essen), Heart and Diabetes Center of North Rhine-Westphalia (Bad Oeynhausen), and German Heart Center (Berlin).

The clinical material consisted of 100 ECG records of patients with coronary artery disease (CAD), whose diagnosis had previously been established by coronary angiography, and 100 ECG records of healthy volunteers included in the control group. It is important to note that the training set included only ECGs in which the traditional electrocardiographic signs of coronary artery disease (flat or negative T wave, depression or elevation of the isoelectric line) were absent. In other words, from the point of view of traditional cardiology, all these ECGs, including those of verified patients, would be classified as healthy (Healthy).

Table 3 shows a fragment of the database of code words built for the ECGs of the training set.

[Table 3. ECG training set code words (columns: CAD, Healthy)]

Using the training set and formulas (6), (7), the reference code words of the two classes were determined:

$S_0^{(CAD)}$ = adcbdadcadabdabcadabdadcbdab,
$S_0^{(Healthy)}$ = cbcdcdabdcabddcaadcaa.

For illustration, we present the results of the ECG assessment of a verified patient (male, 69 years old) whose code word had the form

$S_t^{(1)}$ = adcabdadcadabdaddabdaadabdbdda

and of a representative of the control group, a 54-year-old man whose code word was

$S_t^{(2)}$ = bdcbbcdcabcdcabcdcbaa.

It is easy to verify that $L(S_t^{(1)}, S_0^{(CAD)}) = 13$ and $L(S_t^{(1)}, S_0^{(Healthy)}) = 15$, i.e. $L(S_t^{(1)}, S_0^{(CAD)}) < L(S_t^{(1)}, S_0^{(Healthy)})$, and in accordance with rule (8) the subject was assigned to the CAD group.
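The distance computation, the choice of reference words by (6), (7), and decision rule 1 described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `levenshtein`, `reference_word`, and `rule1` are hypothetical, unit costs are assumed for all three editing operations, and ties in (6), (7) are resolved by taking the first minimizing word, which the source does not specify.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, replace; unit costs) via Wagner-Fischer DP."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # cost of deleting the first i symbols of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace (free if the symbols match)
            ))
        prev = curr
    return prev[-1]


def reference_word(words: Sequence[str]) -> str:
    """Word with the minimal row sum of the distance matrix, as in (6), (7).

    Assumption: the first such word is returned if several rows tie.
    """
    return min(words, key=lambda w: sum(levenshtein(w, v) for v in words))


def rule1(word: str, ref_cad: str, ref_healthy: str) -> str:
    """Decision rule 1, formulas (8), (9)."""
    if levenshtein(word, ref_cad) <= levenshtein(word, ref_healthy):
        return "CAD"
    return "Healthy"


# Words (3) and (4) from the text
print(levenshtein("ddabdcbadcbadca", "bacdaaacdadccbb"))  # prints 10
```

For words (3) and (4) the sketch returns 10, which agrees with the distance reported for Table 2. The two-row formulation keeps memory linear in the length of the second word, which is sufficient here because only the distance, not the editing path, is needed.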
Similarly, for the second subject we have $L(S_t^{(2)}, S_0^{(CAD)}) = 14$ and $L(S_t^{(2)}, S_0^{(Healthy)}) = 8$, i.e. $L(S_t^{(2)}, S_0^{(CAD)}) > L(S_t^{(2)}, S_0^{(Healthy)})$, and in accordance with rule (9) the subject was assigned to the healthy group.

Clinical studies have shown that, despite the absence of traditional electrocardiographic signs on the ECGs of patients with coronary artery disease (CAD), decision rule 1 provides sensitivity $S_E = 72\%$ and specificity $S_P = 79\%$.

Fig. 4 presents estimates of the conditional distributions of the Levenshtein distances to the reference code words of the CAD group, $P(L(S_t, S_0^{(CAD)}))$, and of the healthy group, $P(L(S_t, S_0^{(Healthy)}))$.

Testing the hypothesis of homogeneity of the conditional distributions $P(L(S_t, S_0^{(CAD)}))$ and $P(L(S_t, S_0^{(Healthy)}))$ with the Kolmogorov-Smirnov test showed that the hypothesis of equal distributions must be rejected with high statistical significance ($p < 0.001$). The Mann-Whitney test for independent samples confirmed this result.

The statistically significant difference between the conditional distributions $P(L(S_t, S_0^{(CAD)}))$ and $P(L(S_t, S_0^{(Healthy)}))$ suggests that the Levenshtein distance is not only a useful diagnostic feature in itself but is also beneficial in combination with other diagnostic features [11]. Therefore, the next stage of our research was aimed at finding additional diagnostic features that would increase the sensitivity and specificity of decision rule (8), (9).

Extension of the Basic Method

According to [12], the probability of the appearance of symbols in code words carries valuable information in the linguistic analysis of physiological signals.
Based on this idea, let us introduce three types of patterns, which are substrings of the code words of the training set:
• $\pi_1 = x$, $x \in \{a, b, c, d\}$: one-character pattern;
• $\pi_2 = xy$, $x, y \in \{a, b, c, d\}$: two-character pattern;
• $\pi_3 = xyz$, $x, y, z \in \{a, b, c, d\}$: three-character pattern.

Based on the data of the training sample, we calculate the average frequencies $P^{(Healthy)}(\pi_1)$ and $P^{(CAD)}(\pi_1)$ of the appearance of one-character patterns in the ECG code words of the healthy and CAD groups:

$$P^{(Healthy)}(\pi_1) = \frac{1}{Q_{Healthy}} \sum_{i=1}^{Q_{Healthy}} \frac{G_i(\pi_1)}{W_i}, \tag{10}$$

$$P^{(CAD)}(\pi_1) = \frac{1}{Q_{CAD}} \sum_{i=1}^{Q_{CAD}} \frac{G_i(\pi_1)}{W_i}, \tag{11}$$

where $G_i(\pi_1)$ is the number of occurrences of the one-character pattern $\pi_1 = x$, $x \in \{a, b, c, d\}$, in the $i$-th codeword, $W_i$ is the total number of characters in the $i$-th codeword, and $Q_{Healthy}$ and $Q_{CAD}$ are the numbers of ECGs of healthy subjects and CAD patients in the training set.

The estimated frequencies $P^{(Healthy)}(\pi_1)$ and $P^{(CAD)}(\pi_1)$ of occurrence of one-character patterns in the two groups are summarized in Table 4. The statistical significance (p-value) of the differences between the mean frequencies $P^{(Healthy)}(\pi_1)$ and $P^{(CAD)}(\pi_1)$ was assessed with Student's t-test, since the Kolmogorov-Smirnov test confirmed that the frequencies of the patterns $\pi_1$ in the codewords are normally distributed.

[Fig. 6. Conditional distributions of the Levenshtein distances $L(S_t, S_0^{(CAD)})$ and $L(S_t, S_0^{(Healthy)})$ to the reference code words of the CAD and Healthy groups]

Table 4. Estimating the frequencies of occurrence of one-symbol patterns
Pattern  P(Healthy)(π1)  P(CAD)(π1)  p-value
a        0.273275        0.274386    0.907793
b        0.203286        0.196676    0.504594
c        0.236435        0.217466    0.082603
d        0.286128        0.311056    0.024375

Table 4 shows that the one-character pattern $\pi_1 = d$ occurs in the codewords of healthy subjects and of patients with CAD with different probabilities. This fact made it possible to formulate the second decision rule.

Decision rule 2:
CAD, if $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| \le \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$, (12)
Healthy, if $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| > \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$, (13)
where $G_t(d)$ is the number of occurrences of the character $x = d$ in the analyzed codeword of length $W_t$, and $P^{(CAD)}(d) \approx 0.311$ and $P^{(Healthy)}(d) \approx 0.286$ are the estimated probabilities of the one-character pattern $\pi_1 = d$ in the codewords of the corresponding groups.

However, the one-character pattern $\pi_1 = d$ alone provides low reliability of the decisions made: decision rule (12), (13) provided sensitivity $S_E = 59\%$ and specificity $S_P = 58\%$. Therefore, we investigated the diagnostic value of a decision rule that combines the Levenshtein distances with an estimate of the number of occurrences $G_t(d)$ of the pattern $\pi_1 = d$ in the analyzed codeword. This rule somewhat improves on decision rule 1.

Decision rule 3:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$ AND $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| \le \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$, (14)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$ AND $\left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| > \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right|$, (15)
otherwise the decision is uncertain.

Rule (14), (15) provided decisions with sensitivity $S_E = 77.2\%$ and specificity $S_P = 86.2\%$.

Consider the possibility of improving the decision rule by analyzing two-character patterns $\pi_2 = xy$, $x, y \in \{a, b, c, d\}$. Table 5 presents the estimated probabilities of the appearance of such patterns in the words of the training sample.

Table 5. Estimating the frequencies of occurrence of two-symbol patterns
Pattern  P(Healthy)(π2)  P(CAD)(π2)  p-value
aa       0.059126        0.038845    0.000189
ab       0.091659        0.093691    0.400061
ac       0.025715        0.019925    0.154795
ad       0.094804        0.118936    0.012481
ba       0.048478        0.027195    0.00001
bb       0.02254         0.015407    0.024597
bc       0.059796        0.063595    0.385212
bd       0.071602        0.084538    0.166476
ca       0.082968        0.086643    0.376135
cb       0.065612        0.064056    0.208652
cc       0.026535        0.020845    0.087007
cd       0.0544          0.044895    0.010992
da       0.079259        0.120013    0.000026
db       0.021287        0.022272    0.453138
dc       0.123771        0.111342    0.044878
dd       0.057954        0.053647    0.050011

Since there are more distinct two-character patterns $\pi_2$ than one-character patterns $\pi_1$, the frequency of occurrence of each pattern $\pi_2$ is low. Nevertheless, as can be seen from Table 5, the two-character pattern $\pi_2 = ba$ is almost twice as common in the codewords of healthy subjects as in the group of CAD patients, and this difference is statistically significant with high reliability (p-value = 0.0001). Taking this fact into account, we formulate a decision rule based on the joint analysis of the Levenshtein distances and the frequency of the pattern $\pi_2 = ba$.

Decision rule 4:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$ AND $\left| \frac{G_t(ba)}{W_t - 1} - P^{(CAD)}(ba) \right| \le \left| \frac{G_t(ba)}{W_t - 1} - P^{(Healthy)}(ba) \right|$, (16)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$ AND $\left| \frac{G_t(ba)}{W_t - 1} - P^{(CAD)}(ba) \right| > \left| \frac{G_t(ba)}{W_t - 1} - P^{(Healthy)}(ba) \right|$, (17)
otherwise the decision is uncertain,
where $G_t(ba)$ is the number of occurrences of the two-character pattern $\pi_2 = ba$ in the analyzed codeword of length $W_t$.

Rule (16), (17) provided decisions with sensitivity $S_E = 77.3\%$ and specificity $S_P = 89.1\%$ for most of the training set codewords.

Let us now consider making a decision with a simultaneous estimation of the Levenshtein distances and the numbers of occurrences of one-character and two-character patterns in a code word. Such decisions form decision rule 5.
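Before moving on, the frequency estimates (10), (11) and decision rule 4 described above can be sketched in Python. This is a minimal illustration, not the authors' code. Two points are assumptions: occurrences are counted with a sliding window, so overlapping matches count separately (the source does not say how overlaps are treated), and the denominator is generalized to the number of window positions, $W - k + 1$ for a pattern of length $k$, which reproduces $W_t$ in (12) and $W_t - 1$ in (16). The function names are hypothetical, and `rule4` takes the two Levenshtein distances as precomputed inputs.

```python
from typing import Sequence


def count_pattern(word: str, pattern: str) -> int:
    """Occurrences of pattern in word, sliding window (overlaps counted; an assumption)."""
    k = len(pattern)
    return sum(word[i:i + k] == pattern for i in range(len(word) - k + 1))


def frequency(word: str, pattern: str) -> float:
    """Occurrences divided by window positions: W for k = 1, W - 1 for k = 2."""
    positions = len(word) - len(pattern) + 1
    return count_pattern(word, pattern) / positions if positions > 0 else 0.0


def mean_frequency(words: Sequence[str], pattern: str) -> float:
    """Class-average frequency in the spirit of (10), (11)."""
    return sum(frequency(w, pattern) for w in words) / len(words)


def rule4(word: str, lev_cad: int, lev_healthy: int,
          p_cad: float, p_healthy: float, pattern: str = "ba") -> str:
    """Decision rule 4, formulas (16), (17), with precomputed distances."""
    f = frequency(word, pattern)
    closer_to_cad = abs(f - p_cad) <= abs(f - p_healthy)
    if lev_cad <= lev_healthy and closer_to_cad:
        return "CAD"
    if lev_cad > lev_healthy and not closer_to_cad:
        return "Healthy"
    return "uncertain"  # the two criteria disagree


# Table 5 estimates for the pattern "ba"
P_CAD_BA, P_HEALTHY_BA = 0.027195, 0.048478

# The two subjects from the text, with the distances reported there
print(rule4("adcabdadcadabdaddabdaadabdbdda", 13, 15, P_CAD_BA, P_HEALTHY_BA))  # CAD
print(rule4("bdcbbcdcabcdcabcdcbaa", 14, 8, P_CAD_BA, P_HEALTHY_BA))            # Healthy
```

Note that Python's built-in `str.count` counts non-overlapping occurrences, which is why the sketch uses an explicit window. For the pattern ba the two conventions coincide, but for patterns such as aa or dad they can differ.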
Decision rule 5:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$ AND
$\left[ \left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| \le \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right| \text{ OR } \left| \frac{G_t(ba)}{W_t - 1} - P^{(CAD)}(ba) \right| \le \left| \frac{G_t(ba)}{W_t - 1} - P^{(Healthy)}(ba) \right| \right]$, (18)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$ AND
$\left[ \left| \frac{G_t(d)}{W_t} - P^{(CAD)}(d) \right| \ge \left| \frac{G_t(d)}{W_t} - P^{(Healthy)}(d) \right| \text{ OR } \left| \frac{G_t(ba)}{W_t - 1} - P^{(CAD)}(ba) \right| \ge \left| \frac{G_t(ba)}{W_t - 1} - P^{(Healthy)}(ba) \right| \right]$, (19)
otherwise the decision is uncertain.

Rule (18), (19) makes it possible to reach decisions with sensitivity $S_E = 76.5\%$ and specificity $S_P = 84.7\%$ for more than 80% of the ECGs of the training set.

Finally, let us consider the diagnostic capabilities of a decision rule that analyzes three-character patterns $\pi_3 = xyz$, $x, y, z \in \{a, b, c, d\}$. Table 6 shows the results confirming statistically significant ($p < 0.05$) differences in the probability of the appearance of such patterns in the words of the first and second groups of the training sample.

Table 6 shows that the pattern $\pi_3 = dad$ is, with high statistical significance, more typical of the group of patients with CAD, and the pattern $\pi_3 = caa$ is more typical of the healthy group.

For illustration, Fig. 7 shows parts of real ECGs that generate these patterns in code words. Although the presented fragments are visually almost indistinguishable, the proposed ECG processing algorithm assigns such fragments unambiguously to the pattern $\pi_3 = dad$ or $\pi_3 = caa$.

This made it possible to propose another decision rule based on the analysis of the Levenshtein distances and the numbers of occurrences of three-character patterns in the codeword.
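To make concrete how a short ECG fragment ends up as a pattern such as dad or caa, the cycle coding of Table 1 and formulas (1), (2) can be sketched as below. This is an illustration only. The numeric values of the cycle durations and of $\beta_T$ are invented for the example, the function names are hypothetical, and mapping a zero change to -1 is an assumption made so that every cycle receives a symbol.

```python
from typing import Sequence

# Table 1: (sign of RR change, sign of beta_T change) -> symbol
SYMBOLS = {(+1, +1): "a", (+1, -1): "b", (-1, +1): "c", (-1, -1): "d"}


def sign(delta: float) -> int:
    """+1 for an increase, -1 otherwise (a zero change maps to -1; an assumption)."""
    return 1 if delta > 0 else -1


def encode(rr: Sequence[float], beta: Sequence[float]) -> str:
    """Code word of length N - 1 built from N cycle durations and N beta_T values."""
    if len(rr) != len(beta):
        raise ValueError("rr and beta must have the same length")
    return "".join(
        SYMBOLS[(sign(rr[i] - rr[i - 1]), sign(beta[i] - beta[i - 1]))]
        for i in range(1, len(rr))
    )


# Invented four-cycle fragments (RR in seconds, beta_T dimensionless)
print(encode([0.80, 0.78, 0.82, 0.79], [0.70, 0.68, 0.72, 0.69]))  # dad
print(encode([0.80, 0.78, 0.80, 0.82], [0.60, 0.62, 0.64, 0.66]))  # caa
```

Because each symbol depends only on the signs of the cycle-to-cycle changes, two fragments whose waveforms look almost identical can still differ in the direction of a small change in $RR_i$ or $\beta_{T,i}$ and therefore produce different patterns.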
Decision rule 6:
CAD, if $L(S_t, S_0^{(CAD)}) \le L(S_t, S_0^{(Healthy)})$ AND $G_t(dad) \ge G_t(caa)$, (20)
Healthy, if $L(S_t, S_0^{(CAD)}) > L(S_t, S_0^{(Healthy)})$ AND $G_t(dad) \le G_t(caa)$, (21)
otherwise the decision is uncertain,
where $G_t(dad)$ is the number of occurrences of the three-character pattern $\pi_3 = dad$ in the analyzed codeword of length $W_t$, and $G_t(caa)$ is the number of occurrences of the three-character pattern $\pi_3 = caa$ in the same codeword.

Table 6. Estimating the frequencies of occurrence of three-symbol patterns
Pattern  P(Healthy)(π3)  P(CAD)(π3)  p-value
ada      0.009104        0.029465    0.000006
add      0.031545        0.025732    0.017603
baa      0.007682        0.003707    0.045142
bad      0.023236        0.012264    0.00159
bca      0.013526        0.02479     0.003743
bda      0.021176        0.042309    0.000412
bdc      0.034448        0.025401    0.003428
caa      0.01891         0.007792    0.002103
cab      0.034292        0.026427    0.015428
cad      0.025595        0.051444    0.00015
cba      0.023889        0.011414    0.000466
cbb      0.012348        0.006269    0.021577
cbd      0.018828        0.028515    0.027272
cdc      0.019029        0.010455    0.009801
dab      0.028581        0.052221    0.00127
dad      0.01142         0.031005    0.000021
dba      0.009673        0.007151    0.11723

[Fig. 7. Patterns for the CAD group (dad) and the healthy group (caa)]

Rule (20), (21) provided decisions with sensitivity $S_E = 74.7\%$ and specificity $S_P = 79.5\%$, and the share of ECGs for which a decision was made increased to 87.5%.

Table 7 and Fig. 8 summarize the results of assessing the diagnostic capabilities of the developed decision rules.

Table 7. Operational characteristics of decision rules
Decision rule  S_E, %  S_P, %  Percentage of rejections, %
1              72      79      0
2              59      58      0
3              77.2    86.2    56
4              77.3    89.1    39.5
5              76.5    84.7    21.5
6              74.7    79.5    12.5

[Fig. 8. Comparative characteristics of the developed decision rules: Se, Sp, and percentage of rejections for rules 1 to 6]

Conclusions

The developed approach is based on the analysis of the dynamics of two ECG indicators calculated on the sequence of cardiac cycles. The first (traditional) indicator is the duration of an individual cycle, and the second (original) indicator characterizes the symmetry of the T wave.

The proposed method is based on the transition from the calculated sequence of these indicators to a codeword that encodes the analyzed ECG. This coding method makes it possible to use the techniques of mathematical linguistics for analyzing and interpreting real ECGs.

Six decision rules based on the analysis of Levenshtein distances and the number of occurrences of characteristic patterns in codewords are considered and investigated. The studies show that the proposed approach yields additional diagnostic information from real ECGs that lack the electrocardiographic signs of CAD adopted in traditional electrocardiology.

References

1. Connolly D.C., Elveback L.R., Oxman H.A., 1984. “Coronary heart disease in residents of Rochester, Minnesota. IV. Prognostic value of the resting electrocardiogram at the time of initial diagnosis of angina pectoris”, Mayo Clin. Proc., 59, pp. 247–250. DOI: 10.1016/s0025-6196(12)61257-9.
2. Gritsenko V.I., Fainzilberg L.S., 2019. Intellektualnyye informatsionnyye tekhnologii v tsifrovoy meditsine na primere fazagrafii [Intelligent information technologies in digital medicine on the phase-graphy example], Naukova Dumka, Kyiv, 423 p. (In Russian).
3. Dyachuk D.D., Kravchenko A.N., Faynzilberg L.S., Stanislavskaya S.S., Korchinskaya Z.A., Orikhovskaya K.B., Pasko V.S., Mikhalev K.A., 2016. “Skrining ishemii miokarda metodom otsenki fazy repolyarizatsii” [“Screening of myocardial ischemia by the method of assessing the phase of repolarization”], Ukrainian Journal of Cardiology, 6, pp. 82–89. (In Russian).
4. Dyachuk D.D., Gritsenko V.I., Fainzilberg L.S., Kravchenko A.M., et al., 2017. “Zastosuvannya metodu fazahrafiyi pry provedenni skryninhu ishemichnoyi khvoroby sertsya” [“The use of the method of fasegraphy in the screening of coronary artery disease”], Methodological Recommendations of the Ministry of Health of Ukraine № 163.16/13.17, Ukrainian Center for Scientific Medical Information and Patent and License Work, Kyiv, 32 p. (In Ukrainian).
5. Fainzilberg L.S., 2020. “New Approaches to the Analysis and Interpretation of the Shape of Cyclic Signals”, Cybernetics and Systems Analysis, 56 (4), pp. 665–674. DOI: 10.1007/s10559-020-00283-0.
6. Fainzilberg L.S., Dykach Ju.R., 2019. “Linguistic approach for estimation of electrocardiograms’s subtle changes based on the Levenstein distance”, Cybernetics and Computer Engineering, 2 (196), pp. 3–26. DOI: 10.15407/kvt196.02.003.
7. Uspenskiy V.M., 2012. “Diagnostic System Based on the Information Analysis of Electrocardiogram”, Proceedings of Mediterranean Conference on Embedded Computing, MECO 2012, June 19–21, Montenegro, pp. 74–76.
8. Kolesnikova O.V., Krivenko S.S., 2018. “Informatsiynyy analiz elektrokardiosyhnaliv: obhruntuvannya i mozhlyvosti” [“Information analysis of electrocardiosignals: rationale and possibilities”], Proceedings of the 1st International Scientific and Practical Conference “Information systems and technologies in medicine”, ISM-2018, KHNURE, Kharkiv, pp. 161–163. (In Ukrainian).
9. Levenshteyn V.I., 1965. “Dvoichnyye kody s ispravleniyem vypadeniy, vstavok i zameshcheniy simvolov” [“Binary codes with correction of deletions, insertions and symbol substitutions”], Dokl. Academy of Sciences of the USSR, 163 (4), pp. 845–848. (In Russian).
10. Wagner R.A., Fischer M.J., 1974. “The String-to-String Correction Problem”, Journal of the ACM, 21 (1), pp. 168–173. DOI: 10.1145/321796.321811.
11. Faynzilberg L.S., 2010. Matematicheskiye metody otsenki poleznosti diagnosticheskikh priznakov [Mathematical methods for evaluating the usefulness of diagnostic features], Osvita Ukrainy, Kyiv, 152 p. (In Russian).
12. Senkevich Yu.I., 2008. “Lingvisticheskiy analiz fiziologicheskikh signalov” [“Linguistic analysis of physiological signals”], Digital Signal Processing, 2, pp. 54–57. (In Russian).

Received 06.04.2021

L.S. Fainzilberg, Doctor of Technical Sciences, Professor, Chief Researcher, International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine, 03187, Kyiv, Acad. Glushkova ave., 40, Ukraine, fainzilberg@gmail.com
Ju.R. Dykach, student of the Faculty of Biomedical Engineering, National Techn.
ун-т України «Київський політехнічний інститут імені Ігоря Сікорського» НТУУ «КПІ ім . І . Сікорського», 03056, м . Київ, просп . Перемоги, 37, Україна, jul .dykach@gmail .com РОЗВИТОК ЛІНГВІСТИЧНОГО ПІДХОДУ ДО ЗАДАЧІ КОМП’ЮТЕРНОЇ КЛАСИФІКАЦІЇ ЕЛЕКТРОКАРДІОГРАМ Вступ . Лінгвістичний підхід, що заснований на переході від спостережуваного циклічного сигналу до послідовності символів (кодового слова), які характеризують динаміку показників від циклу до циклу, дає змогу використовува- ти процедури математичної лінгвістики для підвищення достовірності прийнятих рішень . Мета статті — розширення діагностичних можливостей лінгвістичного підходу до аналізу та інтерпретації електрокардіограм (ЕКГ) . Методи . Кожен цикл ЕКГ кодують одним з чотирьох символів, що характеризують зміни двох показників: традиційного (тривалість циклу) і оригінального (симетрія ділянки реполяризації) . Результати . На основі обробки реальних клінічних даних верифікованих пацієнтів і здорових волонтерів побу- довано еталони хворих на хронічну форму ішемічної хвороби серця (ІХС) і здорових пацієнтів . Еталони розробле- но з використанням обчислювальних процедур, прийнятих в математичній лінгвістиці — відстані Левенштейна, що являє собою мінімальну кількість операцій редагування (вставки, видалення та заміни символу), що забезпечує перехід від одного слова до іншого і частоти входження підрядка в аналізоване слово . На основі цих процедур роз- роблено вирішальні правила, що дають змогу ухвалювати діагностичні рішення, виходячи з відстані Левенштейна до еталонів і частоти входження одно-, дво- і трисимвольних патернів в кодові слова . Встановлено, що поєднання цих двох методів розширює діагностичні можливості лінгвістичного підходу до аналізу та інтерпретації ЕКГ . Висновки . Показано, що застосування розроблених вирішальних правил дає змогу підвищити чутливість і специфічність діагностики навіть тоді, коли на ЕКГ відсутні традиційні електрокардіологічні ознаки ішемії міокарда . 
Keywords: ECG, Levenshtein distance, frequency of occurrence of a substring in a codeword, decision rule.
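
The two procedures named in the abstract, the Levenshtein distance and the frequency of occurrence of a pattern in a codeword, can be illustrated with a short Python sketch. It is not the authors' implementation. The function names, the reference codewords "dadd" and "caac", the choice to count overlapping occurrences, and the nearest-reference rule are all assumptions made for illustration. Only the patterns "dad" and "caa" are taken from the text.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions and
    substitutions needed to turn codeword a into codeword b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty word takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def count_pattern(codeword: str, pattern: str) -> int:
    """Number of occurrences of pattern in codeword.
    Assumption: overlapping occurrences are counted separately."""
    if not pattern:
        return 0
    last_start = len(codeword) - len(pattern)
    return sum(codeword.startswith(pattern, i) for i in range(last_start + 1))


def nearest_reference(codeword: str, references: dict[str, str]) -> str:
    """Hypothetical decision rule: return the label of the reference
    codeword with the smallest Levenshtein distance to codeword."""
    return min(references, key=lambda label: levenshtein(codeword, references[label]))


# Hypothetical reference codewords, invented for this example only.
references = {"CAD": "dadd", "healthy": "caac"}
print(levenshtein("dadc", "dadd"))             # 1
print(count_pattern("dadad", "dad"))           # 2 (overlapping occurrences)
print(nearest_reference("dadc", references))   # CAD
```

The two-row dynamic-programming form keeps memory proportional to the length of one codeword. Python's built-in `str.count` would count only non-overlapping occurrences, so an explicit scan is used here. Whether overlapping patterns should be counted is a modelling choice that this section does not settle.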