Determining the level of propaganda in opera librettos using data mining and machine learning

The article presents an adapted multifactorial model that can be used to determine the level of propaganda in librettos to world operas. This model was created using the linear convolution method, for which eight indicators were selected that are most effective in identifying elements of propaganda...

Bibliographic details
Date: 2025
Authors: Dats, Iryna, Gavrilenko, Olena, Feshchenko, Kyrylo
Format: Article
Language: English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025
Online access: https://journal.iasa.kpi.ua/article/view/335973
Journal title: System research and information technologies

Repositories

System research and information technologies
author Dats, Iryna
Gavrilenko, Olena
Feshchenko, Kyrylo
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2025-07-25T15:56:08Z
doi_str_mv 10.20535/SRIT.2308-8893.2025.2.05
first_indexed 2025-07-27T04:04:08Z
format Article
fulltext Publisher IASA at the Igor Sikorsky Kyiv Polytechnic Institute, 2025 System Research and Information Technologies, 2025, № 2 81
UDC 519.688; 004.89; 004.9; 78.071.4: 78.078(477)+316.74
DOI: 10.20535/SRIT.2308-8893.2025.2.05

DETERMINING THE LEVEL OF PROPAGANDA IN OPERA LIBRETTOS USING DATA MINING AND MACHINE LEARNING

I. DATS, O. GAVRILENKO, K. FESHCHENKO

Abstract. The article presents an adapted multifactorial model that can be used to determine the level of propaganda in librettos to world operas. This model was created using the linear convolution method, for which eight indicators were selected that are most effective in identifying elements of propaganda in the text, taking into account the subject area's peculiarities. Each of the selected indicators was calculated using statistical analysis, data mining, and machine learning methods. As a result of applying the proposed method, the value function is calculated for each libretto, based on which a conclusion is made as to whether it contains elements of propaganda or not.

Keywords: art, propaganda, opera, libretto, multivariate model, statistical analysis, Data Mining, Machine Learning, information technology.

INTRODUCTION

Propaganda in art is the use of artistic forms to influence public opinion, shape ideas, and spread specific ideologies or political views. It can be explicit or subtle, and it serves as an instrument of the state, religion, or social movements. When studying the factors that influence human opinion in various areas of activity, it is worth paying attention to the vast and diverse realm of “agitation” in art. Since classical times, this realm has included visual and monumental art; during the Renaissance, masterpieces carried the propaganda of a new era for humanity. Later, theatrical art acquired a dual meaning, while musical compositions and cinema, with their strong emotional impact, took on a special role in global propaganda.
The development of propaganda in art is based on:
• The promotion of an individual’s or a collective’s creative activity (promotional advertising), which helped advance the careers of “useful” figures in the creative field.
• The involvement of specialists in the propaganda of artistic products, where musical content and literary foundations contributed to patriotic songwriting (particularly from the perspective of socialist state leaders).

Quite often, musical works used for propaganda incorporated compositions by other composers or folk songs, embedding entirely new meanings into them. For example:

The anthem of the USSR (at least its musical material) was taken from Mykola Lysenko’s “Epic Fragment”, whose impact and emotional depth made it highly suitable for Soviet state propaganda.

The agitational song “Far Beyond the River”, which fully adopted a Ukrainian insurgent song about a fallen hero, was repurposed by the Red Army to promote the fight against what it considered old and bourgeois elements.

I. Dats, O. Gavrilenko, K. Feshchenko ISSN 1681–6048 System Research & Information Technologies, 2025, № 2 82

These are just a few examples of musical works that, in addition to raising issues of plagiarism in music, also highlight the problem of identifying propaganda. Given the vast diversity of art forms, this article focuses on the propagandistic impact on opera audiences, considering opera as a genre with a long history, an elite form of art, and a significant part of world culture.

Propaganda in opera has been particularly evident in productions staged in China [1], Nazi Germany, and the Soviet Union. For example, the works of Richard Wagner, which glorify ancient Germanic legends, were used to emphasize the superiority of the German nation and the Aryan race, reinforcing the ideology of world domination.
Similarly, the soviet regime implemented propaganda slogans by repurposing older russian operas and creating new, ideologically charged soviet works that praised and glorified the soviet government, its achievements, and its way of life. In Ukraine, examples include “The Death of the Squadron” by Yuliy Meitus and “Standard-Bearers” by Oleksandr Bilash; in russia, “In the Storm and Alpine Story” by Tikhon Khrennikov, as well as the film-operetta “Wedding in Malinovka” and the film-musical “Three Fat Men”, among others.

Another intriguing aspect of musical propaganda is its presence in modern advertising. Commercials often feature simple, easily memorable melodies consisting of just a few notes, making them instantly recognizable and associated with the promoted product or message. In instrumental, vocal, and stage music, propaganda can be embedded in an emphasized form, calling for specific conclusions or even radical actions.

Overall, propaganda in opera has significant historical importance, particularly in societies where culture was used as a tool of ideological influence. As a synthesis of music, drama, and visual art, opera has a strong emotional impact, making it an effective medium for conveying political and ideological messages. Given the large volume of textual data in opera librettos and arias, identifying propagandistic elements requires advanced technologies.

Therefore, addressing this issue requires integrating artistic expertise, including the work of playwrights, directors, actors, composers, and poets, with information technologies such as mathematical modeling, Data Mining, statistical analysis, and Machine Learning. This combination will enable the systematic detection of propaganda in opera librettos, providing new insights into how ideological messages are embedded in classical and modern operatic works.
ANALYSIS OF LITERARY SOURCES AND PROBLEM STATEMENT

Research on propaganda detection demonstrates a variety of approaches and conclusions. Scholars increasingly leverage modern techniques, particularly machine learning models such as BERT and GPT-4, to analyze and detect propaganda in textual data streams. These models can identify and classify different propaganda techniques across various texts.

Study [2] used a pre-trained BERT model to improve the detection of propaganda in news articles. The model processed text at the word level and integrated sentence-level features, effectively distinguishing between propagandistic and non-propagandistic content. However, issues such as data imbalance were identified, leading the researchers to employ methods like oversampling and data augmentation to address them.

Study [3] focused on annotating and detecting propaganda using GPT-4. The research involved a multi-stage annotation process to ensure high-quality data, compiling a dataset of annotated paragraphs from diverse news sources to analyze propaganda techniques across different topics.

Study [4] examined the impact of propaganda on the political landscape in the U.S., revealing that disinformation in mass media significantly influenced social discourse and policymaking. This study proposed further research through ontology construction based on interdisciplinary methods from computer science and the social sciences.

Study [5] conducted a detailed text analysis, identifying 18 propaganda techniques in manually annotated news articles. The research also introduced a new BERT-based neural network to enhance propaganda detection.
Study [6] presented a credibility assessment methodology for questionable information, using semantic similarity metrics on knowledge graphs to calculate the shortest paths between conceptual nodes.

Study [7] explored the history and evolution of information warfare methodologies, comparing American, British, and Russian models and introducing the concept of “semantic warfare” in the modern world.

A crucial limitation of current machine learning models is their reliance on supervised learning: they require human-labeled training datasets. This introduces an element of subjectivity, as the classification of certain texts as propaganda depends on human judgment.

Additionally, social media plays a significant role in propaganda dissemination today [8]. For example:

Study [9] introduced the CatRevenge model, designed to identify active and passive revenge communication in social media, a task that aligns with propaganda detection. The model used Slangzy (an internet slang dictionary) for preprocessing, assigned TF-IDF-based weights to words, and employed a CatBoost classifier to reduce overfitting.

Study [10] investigated influential individuals in knowledge-sharing processes within internal social networks, predicting future knowledge-flow patterns and analyzing propaganda’s ideological impact through a four-phase methodology that combines social network analysis and structural modeling.

Study [11] analyzed how social media posts by influential figures affected cryptocurrency markets, an example of propaganda in commerce.

Study [12] addresses the problem of detecting propaganda in text files. The authors consider methods for classifying textual information for spam filtering, contextual advertising, news categorization, and the creation of thematic catalogs.

Study [13] presents a multifactorial model for determining the level of propaganda in a publication. The publications used were text news and social media posts.
The model was created based on the linear convolution method. It considered 10 indicators, a high level of each of which indicates the presence of propaganda in the publication. The model relies only on statistical data and on calculations made using Data Mining, statistical analysis, and Decision Theory algorithms.

Study [14] provides an overview of multilingual models for working with limited datasets and analyzes their development. The following models are considered: XLM-RoBERTa, mBERT, LASER, and MUSE.

These studies emphasize the importance of sophisticated Machine Learning, Statistical Analysis, and Data Mining methods, together with careful data annotation, for detecting and analyzing propaganda. They provide valuable insights into methodologies that can improve the accuracy and reliability of propaganda detection systems, which is crucial for understanding and mitigating the impact of propaganda.

It should be noted that propaganda detection still requires the development of various mathematical models to better identify this form of communication. Moreover, none of the proposed models has been used to identify propaganda in the musical and theatrical arts in general, or in opera in particular.

The authors of this study propose a modified version of the Multi-Factor Propaganda Detection Model (MMDP) from Study [13], adapted specifically for evaluating propaganda levels in opera librettos. Additionally, the study examines how the propaganda detection results of the MMDP correlate with the assessments of opera experts. An information technology was developed to conduct the experimental research.
By integrating Machine Learning, Statistical Analysis, and Data Mining with artistic expertise, this study aims to fill the existing gap in identifying propaganda in opera as an elite and historically significant art form.

OBJECTIVE AND TASKS OF THE RESEARCH

The objective of this research is to adapt the MMDP [13] for processing and analyzing the librettos of world operas to identify signs of propaganda within them. To achieve this objective, the following tasks have been set:
• Compile a dataset of librettos from well-known world operas that differs from the dataset presented in study [13].
• Select, from the 10 propaganda indicators outlined in study [13], those that are most relevant to the chosen artistic domain.
• Improve the methods for determining the indicators that characterize propaganda in publications.
• Use the MMDP to calculate the level of propaganda content within the compiled dataset.
• Draw conclusions regarding the presence of propaganda indicators in the libretto texts.

MATERIALS AND METHODS OF RESEARCH

The object of the study is the process of identifying propaganda in opera librettos (hereafter referred to as publications) based on an analysis of information about them. Specifically, the study considers the following factors:
• Primary source of the publication (in this context, the literary work that served as the basis for the opera libretto).
• Brief description of the primary source.
• Word count in the publication.
• Sentence count in the publication.
• Syllable count in the publication.
• Total number of opera productions currently available on streaming platforms.
• Number of productions of operas based on the librettos contained in the dataset.
• Number of reviews of opera performances based on the librettos in the dataset.
• Number of re-posts of the publication (in this context, the number of video recordings of the opera based on the given libretto on a streaming platform).
• Number of likes under the video recordings of the opera based on the given libretto on a streaming platform.
• Number of comments under the video recordings of the opera based on the given libretto on a streaming platform.
• Sources of re-posts (in this context, channels that share opera video recordings based on the given libretto).

The set of publications and all the information necessary for this research were obtained from [15]. Successful completion of the study requires both basic statistical data and data obtained using Data Mining and Machine Learning techniques. Fig. 1 illustrates the main steps involved in determining the level of propaganda in publications.

The proposed propaganda detection principle is based on calculating a metric that reflects the degree of correspondence between a given publication and pre-selected propaganda indicators. This is achieved using the linear convolution method. To compute the values of the indicators, the study employs statistical analysis methods as well as Data Mining and Machine Learning techniques. Additionally, specialized software was developed for conducting intelligent analysis and obtaining results based on these methods.

ADAPTATION OF A MULTIFACTOR MODEL FOR CALCULATING THE LEVEL OF PROPAGANDA IN OPERA LIBRETTOS

The process of constructing a multifactor model for calculating the level of propaganda in publications, based on the convolution method, can be outlined in the following stages [16; 17]:
Stage 0: Preprocessing of the publication text.
Stage 1: Calculation of numerical indicators for the model.
Stage 2: Calculation of importance coefficients for each indicator.
Stage 3: Calculation of the value function.
Stage 4: Formulation of conclusions regarding whether the given publication is propagandistic.

Fig. 1.
Main Steps for Determining the Level of Propaganda in Publications

These stages are illustrated in Fig. 2.

PHASE 0
Input: a set of publications $P = (P_1, \ldots, P_l)$.
Output: sets of words $A_1, A_2, \ldots, A_l$ used in the publications $P_1, \ldots, P_l$, respectively.
To form each set, it is recommended to preprocess the text of the publications using lemmatization and stemming. This reduces the size of the set by merging words that share a root and by eliminating auxiliary parts of speech.

PHASE 1
Input: a set of publications $P = (P_1, \ldots, P_l)$ and a set of propaganda features $X = (x_1, \ldots, x_8)$ [13]:
• $x_1$: attempts to manipulate the audience;
• $x_2$: the publication is aimed at evoking emotions;
• $x_3$: frequent repetition of a specific idea in the publication;
• $x_4$: frequent reposting of the publication;
• $x_5$: simplicity of the publication’s text;
• $x_6$: a high level of propaganda in the original source;
• $x_7$: belonging to a specific topic that is particularly susceptible to propaganda;
• $x_8$: the publication has an impact on the viewer.
It is necessary to calculate the level of propaganda for each of the given features. The output is a set $K^j = (K_1^j, \ldots, K_8^j)$, $j = 1, \ldots, l$, where $K_i^j$, $i = 1, \ldots, 8$, are the values of the metrics that indicate the level of propaganda in publication $P_j$ according to feature $x_i$.

1. Calculation of the metric $K_1^j$. A numerical assessment of manipulative attempts in texts can be based on methods of computational linguistics, sentiment analysis, lexical analysis, and machine learning.

Emotional tone analysis. Manipulative texts often contain emotionally charged words (e.g., expressing fear, threats, or exaltation). Emotional dictionaries (SentiWordNet, LIWC, NRC, VADER) are widely used to determine the emotional tone of a text. For example, if a publication has a negative tone (fear, anger), manipulation is possible.
If a publication contains excessive positivity, propaganda is possible. If the negative-emotion index is high and the aggregated sentiment score is very low, deliberate escalation is possible.

Detection of logical fallacies and manipulative techniques. Manipulators use certain rhetorical techniques:
• Appeal to Fear (e.g., the phrase “Either you are mine, or death!” from G. Puccini’s opera Tosca).
• False Dilemma (e.g., the phrase “Who does not fall at my feet will perish!” from G. Verdi’s opera Nabucco).
• Ad Hominem (e.g., the phrase “God, who has placed a ray of His divinity within us, created man to rule!” from G. Verdi’s opera Don Carlos).

Fig. 2. The process of building a multifactor model

Lexical patterns and Machine Learning methods are used to detect emotional fallacies. The model is trained on datasets containing labeled manipulative phrases. If the model’s predictions for a text are excessively negative, this may indicate an attempt at manipulation.

Lexical analysis: frequency of manipulative constructions. Manipulative texts often contain:
• Generalizations (“Everyone knows this!” from G. Verdi’s opera Rigoletto; “No sinner will escape God’s judgment!” from G. Verdi’s opera Don Carlos).
• Evaluative judgments (“There has never been a more ruthless tyrant!” from G. Verdi’s opera The Sicilian Vespers).
• Appeals to authority (“The law is the law!” from G. Puccini’s opera Tosca).
If a text contains many generalizations and emotionally charged evaluative judgments, it may be manipulative.

Text style analysis (stylometry). Manipulative texts may contain a high number of exclamations, many interrogative sentences (rhetorical questions), and excessively long or very short sentences.
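The stylometric cues listed above can be turned into a number in many ways, and the article does not fix one. A minimal sketch, assuming the stylometric component is the share of four cue types (frequent exclamations, rhetorical questions, very long sentences, very short sentences) detected in a text; the thresholds and the function names `split_sentences` and `stylometry_share` are hypothetical, not the authors’ implementation:

```python
import re

# Hypothetical thresholds: the article does not specify them.
LONG_SENTENCE_WORDS = 30
SHORT_SENTENCE_WORDS = 3
EXCLAMATION_SHARE = 0.2
QUESTION_SHARE = 0.2


def split_sentences(text: str) -> list[str]:
    """Split text into sentences ending in '.', '!' or '?', keeping the punctuation."""
    parts = re.findall(r"[^.!?]+[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def stylometry_share(text: str) -> float:
    """Share of the four stylistic cue types (out of 4) detected in the text."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    total = len(sentences)
    word_counts = [len(s.split()) for s in sentences]
    checks = [
        # many exclamations
        sum(s.endswith("!") for s in sentences) / total >= EXCLAMATION_SHARE,
        # many (rhetorical) questions
        sum(s.endswith("?") for s in sentences) / total >= QUESTION_SHARE,
        # at least one excessively long sentence
        any(n >= LONG_SENTENCE_WORDS for n in word_counts),
        # at least one very short sentence
        any(n <= SHORT_SENTENCE_WORDS for n in word_counts),
    ]
    return sum(checks) / len(checks)
```

For example, `stylometry_share("Either you are mine, or death! The law is the law! Who are you?")` detects three of the four cue types and returns 0.75. A real system would tune the thresholds on annotated librettos.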
Thus, the score $K_1^j$ for a publication $P_j$, $j = 1, \ldots, l$, is calculated as follows:
$$K_1^j = \alpha_1 S_j + \beta_1 L_j + \gamma_1 F_j + \delta_1 C_j, \qquad (1)$$
where $S_j$ is the sentiment of the text, determined using a word dictionary with specific polarity (positive, negative, neutral): $S_j = 1$ for a positive or negative tone and $S_j = 0$ for neutral text; $L_j$ is the relative frequency of manipulative clichés (lexical features) compared to their total variety; $F_j$ is the relative frequency of logical fallacies (fallacy detection) compared to their total variety; $C_j$ is the relative frequency of identified stylistic characteristics (stylometry) compared to their total variety; $\alpha_1, \beta_1, \gamma_1, \delta_1$ are weight coefficients. In this study, $\alpha_1 = 0.4$, $\beta_1 = 0.3$, $\gamma_1 = 0.2$, $\delta_1 = 0.1$.

The values of the weight coefficients, as well as those in the subsequent models, were chosen according to the specifics of the subject area and agreed upon with an expert, M.I. Hamkalo, director of a musical-dramatic theater and associate professor at the Tchaikovsky National Music Academy of Ukraine.

Since the weights sum to 1 and each term lies in $[0, 1]$, $0 \le K_1^j \le 1$; the closer its value is to one, the more manipulative features the publication contains. Thus, based on criterion $x_1$, such a publication can be considered propagandistic.

2. Calculation of the metric $K_2^j$. The emotional orientation of a text indicates the extent to which it evokes specific emotions (fear, joy, anger, etc.). It can be assessed using the following approaches:
1. Sentiment analysis.
2. Emotion detection.
3. Lexical analysis of emotional intensity.
4. Deep learning (NLP models: BERT, GPT, LSTM).
In this study, sentiment analysis was used to evaluate the emotional orientation of the text.
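Returning to formula (1): once its four components are known, $K_1^j$ is a plain weighted sum. A minimal sketch with the weights reported above, assuming $S_j$, $L_j$, $F_j$ and $C_j$ have already been extracted (the dictionaries, fallacy classifier, and stylometric rules that produce them are not reproduced here); the name `k1_score` is hypothetical:

```python
# Weights reported in the article for formula (1): alpha_1, beta_1, gamma_1, delta_1.
K1_WEIGHTS = (0.4, 0.3, 0.2, 0.1)


def k1_score(s: int, l: float, f: float, c: float) -> float:
    """Formula (1): K1 = a1*S + b1*L + g1*F + d1*C.

    s is 1 for a positive or negative tone and 0 for neutral text;
    l, f, c are relative frequencies in [0, 1].
    """
    if s not in (0, 1):
        raise ValueError("S_j must be 0 (neutral) or 1 (positive or negative tone)")
    for name, value in (("L", l), ("F", f), ("C", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}_j must lie in [0, 1], got {value}")
    a1, b1, g1, d1 = K1_WEIGHTS
    return a1 * s + b1 * l + g1 * f + d1 * c
```

Because the weights sum to 1, the largest possible value is reached when every component equals 1, which is why the range check on the inputs is enough to keep $K_1^j$ within $[0, 1]$.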
Thus, the metric $K_2^j$ for a publication $P_j$, $j = 1, \ldots, l$, is calculated as the overall emotional score:
$$K_2^j = \alpha_2 S_j + \beta_2 E_j + \gamma_2 C_j, \qquad (2)$$
where $S_j$ is the sentiment of the text (described above); $E_j$ is the proportion of emotional words in the text; $C_j$ is the text-style score (also described above); $\alpha_2, \beta_2, \gamma_2$ are weight coefficients. In this study, $\alpha_2 = 0.5$, $\beta_2 = 0.3$, $\gamma_2 = 0.2$.

It is evident that $0 \le K_2^j \le 1$, and the closer its value is to one, the higher the level of emotional intensity in the publication. Thus, based on this criterion, it can be considered propagandistic.

3. Calculation of the metric $K_3^j$. If an idea is expressed using different words, vector models (Word2Vec, BERT) can be used to find similar expressions. In this study, a Word2Vec model was used, with cosine similarity as the similarity measure. Thus, the metric $K_3^j$ for a publication $P_j$, $j = 1, \ldots, l$, is calculated as follows:
$$K_3^j = \cos(\theta) = \frac{B_j \cdot D_j}{\lVert B_j \rVert \, \lVert D_j \rVert}, \qquad (3)$$
where $B_j$ and $D_j$ are vectors representing objects (word vectors extracted from publication $P_j$); $B_j \cdot D_j$ is the dot product of the vectors; $\lVert B_j \rVert$ and $\lVert D_j \rVert$ are the magnitudes (norms) of the vectors; $\cos(\theta)$ is the cosine of the angle $\theta$ between them.

It is evident that $0 \le K_3^j \le 1$, and the closer its value is to one, the more frequently a particular idea is repeated in the publication. Thus, based on this criterion, it can be considered propagandistic.

4. Calculation of the metric $K_4^j$. The frequency of reposting a publication refers to the number of video recordings of opera performances based on the analyzed libretto found on streaming platforms (Netflix, YouTube, etc.).
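Formulas (2) and (3) above can be sketched in a few lines. This is a minimal illustration, assuming the vectors $B_j$ and $D_j$ are already available as plain lists of floats (for instance, averaged Word2Vec vectors of two text fragments; how the article builds them is not specified); the function names are hypothetical. Note that for arbitrary embeddings the raw cosine lies in $[-1, 1]$; the sketch returns it unmodified:

```python
import math

# Weights reported in the article for formula (2): alpha_2, beta_2, gamma_2.
K2_WEIGHTS = (0.5, 0.3, 0.2)


def k2_score(s: int, e: float, c: float) -> float:
    """Formula (2): K2 = a2*S + b2*E + g2*C."""
    a2, b2, g2 = K2_WEIGHTS
    return a2 * s + b2 * e + g2 * c


def cosine_similarity(b: list[float], d: list[float]) -> float:
    """Formula (3): (B . D) / (||B|| * ||D||); returns 0.0 if either vector is zero."""
    if len(b) != len(d):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(b, d))
    norm_b = math.sqrt(sum(x * x for x in b))
    norm_d = math.sqrt(sum(y * y for y in d))
    if norm_b == 0.0 or norm_d == 0.0:
        return 0.0  # cosine is undefined for a zero vector
    return dot / (norm_b * norm_d)
```

Parallel vectors such as `[1, 2, 3]` and `[2, 4, 6]` give a similarity of 1 (the same idea repeated), while orthogonal vectors give 0.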
Thus, the metric $K_4^j$ for a publication $P_j$, $j = 1, \ldots, l$, is calculated as the relative frequency of performances of the opera based on $P_j$ on a streaming platform, using the following formula [18; 19]:
$$K_4^j = \frac{n_j}{n}, \qquad (4)$$
where $n_j$ is the number of video recordings of the opera based on libretto $P_j$, and $n$ is the total number of opera video recordings found on the platform.

It is evident that $0 \le K_4^j \le 1$, and the closer its value is to one, the more frequently the publication is reposted. Thus, based on this criterion, it can be considered propagandistic.

It should be noted that the accuracy of the metric $K_4^j$ depends on the choice of streaming platform. The more popular the platform, the larger the audience it covers. On the other hand, major platforms require processing a large volume of statistical data, which may complicate the calculation of this metric. For example, on the OperaVision website [20], 264 video recordings of opera performances were found, 8 of which featured G. Verdi’s opera Aida. Thus, for the libretto of this opera, $K_4^j = 8/264 \approx 0.03$. On other platforms, this metric may take a different value because of differences in statistical data.

5. Calculation of the metric $K_5^j$. This metric indicates the readability of the publication’s text. The metric $K_5^j$ for a publication $P_j$, $j = 1, \ldots, l$, is calculated as follows:
$$K_5^j = 0.01\left(206.835 - 1.015\,\frac{a_j}{b_j} - 84.6\,\frac{c_j}{a_j}\right), \qquad (5)$$
where $a_j$ is the total number of words, $b_j$ is the total number of sentences, and $c_j$ is the total number of syllables.

The metric $K_5^j$ is the Flesch Reading Ease Index [21] scaled by a factor of 0.01. The interpretation of its values is shown in Table 1.

Table 1. Interpretation of Flesch Reading Ease Index values

| Score | School level | Notes |
|---|---|---|
| 0.9–1.0 | Grade 5 | Very easy to read. Easily understood by an average 11-year-old student |
| 0.8–0.9 | Grade 6 | Easy to read. Conversational language for consumers |
| 0.7–0.8 | Grade 7 | Fairly easy to read |
| 0.6–0.7 | Grades 8–9 | Standard language. Easily understood by 13–15-year-old students |
| 0.5–0.6 | Grades 10–12 | Fairly difficult to read |
| 0.3–0.5 | College | Difficult to read |
| 0.1–0.3 | Technical graduate | Very difficult to read. Best understood by university graduates |
| 0.0–0.1 | Professional | Extremely difficult to read. Best understood by university graduates |

It is evident that $0 \le K_5^j \le 1$, and the closer its value is to one, the easier the publication is to read. Thus, based on this criterion, it can be considered propagandistic.

6. Calculation of the metric $K_6^j$. The primary source refers to the literary work that served as the basis for the libretto (publication). The metric $K_6^j$ for a publication $P_j$, $j = 1, \ldots, l$, is calculated as follows:
Step 1. Identify the primary source.
Step 2. Find a brief description of this work.
Step 3. Use a Word2Vec model to determine the key words from the text description.
Step 4. Calculate the cosine similarity (formula (3)) between the key-word vector and a predefined reference vector
$$Q = (\text{fight};\ \text{danger};\ \text{trait};\ \text{enemy};\ \text{alliance};\ \text{tragedy};\ \text{destruction};\ \text{patriot};\ \text{glory};\ \text{unite};\ \text{fame};\ \text{recretion};\ \text{hero};\ \text{unbeatable}). \qquad (6)$$
The vector $Q$ in (6) used in this study was constructed from a set of words characteristic of propaganda. It was reviewed and approved by an expert, M.I. Hamkalo, director of a musical-dramatic theater and associate professor at the Tchaikovsky National Music Academy of Ukraine. This vector can be adjusted depending on the specific subject area of analysis.

It is evident that $0 \le K_6^j \le 1$, and the closer its value is to one, the higher the likelihood that the publication is propagandistic in nature.
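Formulas (4) and (5) and the banding of Table 1 are easy to check numerically. A minimal sketch, assuming word, sentence, and syllable counts are already available (syllable counting is itself non-trivial and is not shown); the function names and the choice to treat each band’s lower bound as inclusive are assumptions, since Table 1 shares endpoints between adjacent rows:

```python
def k4_score(n_j: int, n: int) -> float:
    """Formula (4): share of the platform's opera recordings that use this libretto."""
    if n <= 0:
        raise ValueError("the platform must contain at least one recording")
    return n_j / n


def k5_score(words: int, sentences: int, syllables: int) -> float:
    """Formula (5): Flesch Reading Ease scaled by 0.01."""
    if words <= 0 or sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    return 0.01 * (206.835 - 1.015 * words / sentences - 84.6 * syllables / words)


# Lower bounds of the Table 1 bands, from easiest to hardest.
TABLE_1_BANDS = [
    (0.9, "Grade 5"),
    (0.8, "Grade 6"),
    (0.7, "Grade 7"),
    (0.6, "Grades 8-9"),
    (0.5, "Grades 10-12"),
    (0.3, "College"),
    (0.1, "Technical graduate"),
    (0.0, "Professional"),
]


def school_level(k5: float) -> str:
    """Map a K5 value onto the school levels of Table 1 (lower bound inclusive)."""
    for lower, label in TABLE_1_BANDS:
        if k5 >= lower:
            return label
    return "Professional"  # very dense text can score below 0
```

With the Aida figures from the text, `k4_score(8, 264)` is about 0.0303, i.e. 0.03 after rounding. A hypothetical text of 100 words, 10 sentences, and 130 syllables gives `k5_score(100, 10, 130)` of about 0.867, which falls in the “Grade 6” band.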
Thus, based on criterion $x_6$, it can be considered propagandistic.

As an example, consider the opera “The Golden Ring” by the Ukrainian composer Borys Lyatoshynsky, with a libretto by Yakiv Mamontiv inspired by Ivan Franko’s novel “Zakhar Berkut”. It is well known that the novel contains a call to struggle against external and internal enemies. This leitmotif was transferred into the libretto and, consequently, into the opera. Thus, according to criterion $x_6$, the opera “The Golden Ring” exhibits propaganda elements.

7. Calculation of the metric $K_7^j$. Consider a publication $P_j$, $j = 1, \ldots, l$; the set of words $A_j$ used in the publication; the set of topics $S = (s_1; s_2; \ldots; s_r)$ in which propagandistic publications are most frequently found; and the dictionaries of characteristic words for these topics, $T_1; T_2; \ldots; T_r$. The topics and their corresponding dictionaries should be predefined. Some of these topics include:
• Politics: “power”, “tyranny”, “monarchy”, “autocracy”, “rebellion”, “discord”, “revolutionary movement”, “coup”, “betrayal”, “intrigue”, “enemies”, “opponents”,...
• Military conflicts: “army”, “legion”, “foreign rule”, “tyranny of conquerors”, “conquest”,...
• Ideology: “people”, “nation”, “society”, “unity”, “solidarity”, “cohesion”, “alliance”, “threat”, “danger”, “monarchy”,...
• Conspiracies and disinformation: “the real truth”, “triumphant truth”, “secret conspiracy”, “treacherous plan”, “spies”, “accomplices”,...
The metric $K_7^j$ indicates whether publication $P_j$ belongs to one of the topics in the set $S$. It is calculated as follows:
Step 1. Compute the Jaccard similarity coefficients between the set of words $A_j$ and each topic dictionary $T_k$, $k = 1, 2, \ldots, r$ [22]:
$$J(A_j, T_k) = \frac{|A_j \cap T_k|}{|A_j \cup T_k|}; \qquad (7)$$
Step 2.
Select the maximum Jaccard coefficient: ,),(),( ,,2,1 kj max rkkjmax TAJTAJ  lj ,,1 and establish which topic T corresponds to this maximum value Ssk  . Step 3. The metric jK7 is defined as: ),(max7 kj j TAJK  . (8) It is evident that 10 7  jK and the closer its value is to one, the more closely the publication aligns with topics that are most susceptible to propaganda. Thus, based on this criterion 7x it can be considered propagandistic. As an example, we can again consider the opera “The Golden Ring”. The libretto of this opera can be categorized under the “Ideology” topic, which is frequently influenced by propaganda. Therefore, for this libretto (publication), the value of the metric jK7 is quite close to 1. 8. Calculation of the Metric jK8 . To assess the audience reach and its impact, the overall score is calculated as follows: 3322118 XXXK j  , (9) where 1X — relative number of likes to the total number of opera views; 2X — proportion of opera views relative to the most popular opera in the dataset; 3X — relative number of comments to the total number of opera views; 321 ,,  — weight coefficients. It is evident that 10 8  jK and the closer its value is to one, the greater the level of influence the given publication has on the audience. Thus, based on criterion 8x , it can be considered propagandistic. It should be noted that the accuracy of the metric jK8 similar to jK4 depends on the choice of the streaming platform. PHASE 2 Input: indicators ljiK j i ,,1;8,,1,  , calculated using formulas (1)–(9). It is necessary to calculate importance coefficients for each criterion to determine the value function. Output: сoefficients i . To compute these coefficients i , the following steps must be performed: Step 1: Form statistical samples from the indicators j iK with corresponding names. Step 2: Select a threshold value, exceeding which a publication can be considered propagandistic. I. Dats, O. Gavrilenko, K. 
Feshchenko ISSN 1681–6048 System Research & Information Technologies, 2025, № 2 92 In this study, by analogy with Chaddock’s scale [18; 19], which defines the strength of correlation between two random variables, the following scaling was proposed: 1,00,0  — no propaganda; 3,01,0  — low level of propaganda; 5,03,0  — noticeable level of propaganda; 7,05,0  — moderate level of propaganda; 9,07,0  — high level of propaganda; 0,19,0  — very high level of propaganda. In this study, all levels of propaganda starting from the noticeable level were considered. Thus, the threshold value was set at .3,0iK This threshold was introduced to facilitate further statistical calculations and ensure the convenient comparison of results with expert opinions. It should be noted that no universally defined percentage threshold exists in scientific sources that explicitly determines when a text is considered propagan- distic [23]. This study emphasizes the importance of qualitative analysis and the recognition of specific influence techniques rather than establishing a universal quantitative threshold. In future research, a more personalized approach is planned for each propa- ganda characteristic. Step 3: If i j i KK  , the given publication jP is considered propagandistic based on the feature ix . Otherwise, it is classified as non-propagandistic. Each publication is assigned the value »1« , if it is propaganda based on this feature, and »0« otherwise.        ., , if0 ;if1~ i j i K i j ij ij K KKKP The transition from quantitative values j iK to boolean functions j iK ~ was made to facilitate the comparison of results with expert opinions. Step 4: Calculate the Relative Frequency of Propagandistic Publications for Each Feature ix . n m w i i  , where im — the number of propagandistic publications based on feature ix ; n — the total number of publications in the dataset. Step 5: Normalize the Relative Frequencies iw : 821 www wi i   . 
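The two metrics defined before Phase 2, $K_7^j$ (formulas (7)–(8)) and $K_8^j$ (formula (9)), can be made concrete with a minimal Python sketch. It assumes that a libretto has already been tokenized into a set of normalized words. The topic dictionaries, word sets, audience counts, the function names, and the equal default weights (called `betas` here) are hypothetical illustrations, not the authors' data, code, or settings.

```python
# Hypothetical sketch of metrics K7 (formulas (7)-(8)) and K8 (formula (9)).
# Topic dictionaries, word sets, audience counts and default weights are
# illustrative assumptions, not data or settings from the study.
from typing import Dict, Set, Tuple


def jaccard(words: Set[str], dictionary: Set[str]) -> float:
    """Formula (7): |A_j ∩ T_k| / |A_j ∪ T_k| (0.0 if both sets are empty)."""
    union = words | dictionary
    return len(words & dictionary) / len(union) if union else 0.0


def metric_k7(words: Set[str], topics: Dict[str, Set[str]]) -> Tuple[float, str]:
    """Formula (8): K7 = max over k of J(A_j, T_k), plus the topic attaining it."""
    best_topic = max(topics, key=lambda name: jaccard(words, topics[name]))
    return jaccard(words, topics[best_topic]), best_topic


def metric_k8(likes: int, views: int, comments: int, max_views: int,
              betas: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Formula (9): K8 = beta1*X1 + beta2*X2 + beta3*X3 (equal weights assumed)."""
    x1 = likes / views if views else 0.0          # X1: likes relative to views
    x2 = views / max_views if max_views else 0.0  # X2: views relative to the most-viewed opera
    x3 = comments / views if views else 0.0       # X3: comments relative to views
    b1, b2, b3 = betas
    return b1 * x1 + b2 * x2 + b3 * x3


if __name__ == "__main__":
    topics = {  # abbreviated, hypothetical topic dictionaries
        "Politics": {"power", "tyranny", "rebellion", "betrayal"},
        "Ideology": {"people", "nation", "unity", "threat"},
    }
    libretto_words = {"people", "nation", "unity", "love", "ring"}
    k7, topic = metric_k7(libretto_words, topics)
    print(topic, round(k7, 3))                                      # Ideology 0.5
    print(round(metric_k8(900, 10_000, 100, max_views=20_000), 3))  # 0.2
```

Since $|A_j \cap T_k| \le |T_k|$ and $|A_j \cup T_k| \ge |A_j|$, the raw coefficient cannot exceed $|T_k| / |A_j|$. When a full libretto vocabulary is much larger than a topic dictionary, raw values are therefore small, so how they are scaled before comparison with a 0.3 threshold is a design choice worth checking.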
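Phase 2 above and the aggregation and decision rules of Phases 3 and 4 that follow (formula (10), the threshold $V^* = 0.3$, and the Precision/Recall check) can be chained in a short Python sketch. It assumes that the indicators $K_i^j$ have already been computed and are stored row-wise (one row per publication), and it uses a strict "exceeds the threshold" comparison, following the wording of Step 2. The function names, indicator values, and expert labels are hypothetical, and only three features are used instead of eight, for brevity.

```python
# Hypothetical sketch of Phases 2-4: thresholding, importance coefficients,
# linear aggregation (formula (10)), the decision rule, and Precision/Recall.
# Rows are publications P_j; columns are indicators K_i^j in [0, 1].
from typing import List, Sequence, Tuple

THRESHOLD = 0.3  # K_i^* = V^* = 0.3 ("noticeable level" and above)


def binarize(scores: Sequence[Sequence[float]], threshold: float = THRESHOLD) -> List[List[int]]:
    """Phase 2, Step 3: 1 if K_i^j exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in scores]


def importance_coefficients(flags: Sequence[Sequence[int]]) -> List[float]:
    """Phase 2, Steps 4-5: w_i = m_i / n, then alpha_i = w_i / (w_1 + ... + w_N)."""
    n = len(flags)
    w = [sum(row[i] for row in flags) / n for i in range(len(flags[0]))]
    total = sum(w)
    return [wi / total for wi in w] if total else [0.0] * len(w)


def value_function(row: Sequence[float], alpha: Sequence[float]) -> float:
    """Phase 3, formula (10): V^j = sum_i alpha_i * K_i^j."""
    return sum(a * k for a, k in zip(alpha, row))


def precision_recall(predicted: Sequence[int], expert: Sequence[int]) -> Tuple[float, float]:
    """Phase 4 check: Precision = tp/(tp+fp), Recall = tp/(tp+fn); 1 = propaganda."""
    pairs = list(zip(predicted, expert))
    tp = sum(1 for p, e in pairs if p == 1 and e == 1)
    fp = sum(1 for p, e in pairs if p == 1 and e == 0)
    fn = sum(1 for p, e in pairs if p == 0 and e == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    scores = [  # hypothetical K_i^j for four publications and three features
        [0.8, 0.6, 0.4],
        [0.7, 0.2, 0.5],
        [0.1, 0.2, 0.1],
        [0.2, 0.4, 0.0],
    ]
    alpha = importance_coefficients(binarize(scores))
    values = [value_function(row, alpha) for row in scores]
    predicted = [1 if v > THRESHOLD else 0 for v in values]  # Phase 4 decision rule
    print([round(a, 3) for a in alpha])  # [0.333, 0.333, 0.333]
    print(predicted)                     # [1, 1, 0, 0]
    p, r = precision_recall(predicted, expert=[1, 1, 0, 1])
    print(round(p, 3), round(r, 3))      # 1.0 0.667
```

The equal coefficients in this toy example arise because each hypothetical feature flags exactly two of the four publications. With real data, features that flag propaganda more often would receive proportionally larger $\alpha_i$.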
PHASE 3
Input: a set of publications $P = (P_1, \dots, P_l)$, indicators $K_i^j$, $i = 1, \dots, 8$; $j = 1, \dots, l$, and coefficients $\alpha_i$. The value function must be calculated for each publication to determine whether it exhibits features of propaganda.
Output: the value function result $V^j$.
The value function $V^j$ is computed using the linear aggregation method as follows [16; 17]:
$$V^j = \sum_{i=1}^{8} \alpha_i K_i^j. \tag{10}$$
Based on the values $V^j$ obtained from equation (10), a statistical sample of value function results is formed: $V = (V^1, V^2, \dots, V^l)$.

PHASE 4
Input: a set of publications $P = (P_1, \dots, P_l)$ and the statistical sample $V = (V^1, V^2, \dots, V^l)$ (see Phase 3).
Output: conclusions regarding which publications in $P$ are propagandistic.
Recommendations are made according to the following rule [24]:
• If $V^j > V^*$, $j = 1, \dots, l$, then the publication $P_j$ is recommended as propagandistic.
• If $V^j \le V^*$, $j = 1, \dots, l$, then the publication $P_j$ is not recommended as propagandistic.
In this rule, $V^* = 0.3$ is the threshold value for the sample $V$ (analogous to Step 2 of Phase 2):
$$\tilde{V}^j(P_j) = \begin{cases} 1, & \text{if } V^j > V^*; \\ 0, & \text{if } V^j \le V^*. \end{cases}$$
Thus, a publication is assigned 1 if it is considered propaganda and 0 otherwise.
The correctness of these conclusions is evaluated using the Precision and Recall metrics:
$$\text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Recall} = \frac{tp}{tp + fn},$$
where $tp$ is the number of correctly identified propagandistic publications (true positives); $fp$ is the number of non-propagandistic publications incorrectly identified as propagandistic (false positives); and $fn$ is the number of propagandistic publications incorrectly identified as non-propagandistic (false negatives).

OBTAINED RESULTS
As part of this study, a dataset containing the librettos of 10 operas was compiled (Table 2). For these operas, the value function was calculated, and on this basis conclusions were drawn regarding the presence of propaganda elements in their librettos.

Table 2. Compiled Dataset
Libretto | Opera Title | Composer
$P_1$ | The Huguenots | Giacomo Meyerbeer
$P_2$ | The Mastersingers of Nuremberg | Richard Wagner
$P_3$ | Fidelio | Ludwig van Beethoven
$P_4$ | The Troubadour | Giuseppe Verdi
$P_5$ | A Life for the Tsar | Mikhail Glinka
$P_6$ | La Traviata | Giuseppe Verdi
$P_7$ | Carmen | Georges Bizet
$P_8$ | Madame Butterfly | Giacomo Puccini
$P_9$ | Turandot | Giacomo Puccini
$P_{10}$ | The Marriage of Figaro | Wolfgang Amadeus Mozart

The obtained results are presented in Table 3.

Table 3. Obtained Results
Libretto | $\tilde{K}_1^j$ | $\tilde{K}_2^j$ | $\tilde{K}_3^j$ | $\tilde{K}_4^j$ | $\tilde{K}_5^j$ | $\tilde{K}_6^j$ | $\tilde{K}_7^j$ | $\tilde{K}_8^j$ | $\tilde{V}^j$
$P_1$ | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1
$P_2$ | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1
$P_3$ | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1
$P_4$ | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
$P_5$ | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
$P_6$ | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
$P_7$ | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
$P_8$ | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
$P_9$ | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
$P_{10}$ | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0

In Table 3, each publication $P_j$, $j = 1, \dots, 10$, is assigned the value 1 if it is considered propaganda based on feature $x_i$, $i = 1, \dots, 8$, and 0 otherwise.

DISCUSSION OF RESEARCH RESULTS
The obtained results were compared with the expert opinion of M.I. Hamkalo, associate professor in the field of musical directing at the Tchaikovsky National Music Academy of Ukraine. The comparison is presented in Table 4.

Table 4 shows that the proposed MMDP identified propaganda elements in the same opera librettos as the expert. Accordingly, Precision = 1 and Recall = 1, which confirms the high accuracy of the MMDP on this dataset.

Table 4. Comparison of MMDP Results with Expert Opinion
Libretto | $\tilde{V}^j$ | Expert Opinion | Expert’s Argumentation
$P_1$ | 1 | 1 | Propaganda: Anti-Catholicism
$P_2$ | 1 | 1 | Propaganda: German nationalism
$P_3$ | 1 | 1 | Propaganda: Liberalism and the struggle for freedom
$P_4$ | 1 | 1 | Propaganda: Revolutionary spirit and the fight for independence
$P_5$ | 1 | 1 | Propaganda: Russian imperial narrative
$P_6$ | 0 | 0 | No propaganda: Pure melodrama about personal emotions, without political or social context
$P_7$ | 0 | 0 | No propaganda: The opera has no ideological connotations, only depicting emotions and the fatality of destiny
$P_8$ | 0 | 0 | No propaganda: A personal tragedy and cultural misunderstandings, without a political message
$P_9$ | 0 | 0 | No propaganda: A mythical story not tied to specific political events
$P_{10}$ | 0 | 0 | No propaganda: Despite criticism of the feudal system, it is more about romantic twists than politics

CONCLUSIONS
Propaganda in opera is a powerful tool for influencing society, using the impact of music, librettos, and stage performances to shape specific ideological narratives. Throughout different historical periods, opera has served as an instrument of state propaganda, expressing political, social, and nationalist ideas. In the 19th century, during the era of Romanticism, opera was often used to elevate national spirit and support struggles for independence (for example, Giuseppe Verdi’s “Nabucco” became a symbol of the Italian liberation movement). In the 20th century, totalitarian regimes actively used opera to reinforce state ideology: Soviet socialist realism, Nazi Germany, and Maoist China promoted productions that glorified the party, leaders, or the “ideal citizen”.

Despite this, opera also served as a means of protest and counter-propaganda. It became a tool for criticizing authority or social structures, often through allegorical plots or hidden messages.
Thus, opera not only reflects historical context but also actively shapes public consciousness, making it a significant instrument of both official and oppositional propaganda.

This study presents an adapted multifactor model that allows the level of propaganda in the librettos of world opera masterpieces to be assessed. The model is based on the linear aggregation method, for the implementation of which eight indicators were selected. These indicators are the most effective in detecting propaganda elements in a text, taking into account the specific features of the subject area. Each of the selected indicators was calculated using statistical analysis, Data Mining methods, and Machine Learning techniques. Applying the proposed method yields a value function for each publication, from which a conclusion is drawn as to whether it contains propaganda elements.

Advantages of the Proposed Model:
1. Elimination of human (subjective) influence: the model’s calculations rely solely on statistical data or data obtained through Data Mining and Machine Learning methods, ensuring objectivity in detecting propaganda indicators.
2. Scalability: the model can easily be extended by adding new indicators or removing outdated ones, making it adaptable to evolving research needs.
3. Result accuracy: the correctness of the obtained results is guaranteed by the use of classical Data Mining and Machine Learning methods.

Disadvantages of the Proposed Model:
1. Large data requirements: the model requires the collection and storage of large amounts of statistical and textual data, which may pose data-management challenges.
2. Continuous accuracy monitoring: the reliability of the conclusions must be evaluated regularly. In this study, an expert in the subject area was consulted. In other domains, the accuracy of the MMDP should be validated using multiple propaganda detection methods.

The obtained results can be used as an effective tool in information warfare, both in Ukraine and globally, serving as a powerful element of intent analysis. They can also assist directors and actors in musical-dramatic theaters, including opera houses and operetta theaters.

With respect to artistic propaganda specifically, the proposed methodology can be applied to any art form that involves textual data, such as songs, films, theater, literature, and poetry. For these domains, the methodology would differ only in the input data (for example, song lyrics, brief descriptions of literary works, or play scripts), in the values of the weight coefficients in formulas (1), (2), and (9), and in the adaptation of the propaganda features presented in [13], some of which may be added or removed depending on the specific artistic field.

REFERENCES
1. Z. Zhang, “‘Model Opera’ of the 20th Century in Chinese Musical Culture,” Art History Notes, no. 43, pp. 206–210, 2023. doi: https://doi.org/10.32461/2226-2180.43.2023.286862
2. W. Li, S. Li, C. Liu, L. Lu, Z. Shi, S. Wen, “Span identification and technique classification of propaganda in news articles,” Complex Intell. Syst., vol. 8, pp. 3603–3612, 2022. doi: https://doi.org/10.1007/s40747-021-00393-y
3. M. Hasanain, F. Ahmed, F. Alam, “Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles,” Computation and Language (cs.CL), 2024. doi: https://doi.org/10.48550/arXiv.2402.17478
4. K. Hamilton, “Towards an Ontology for Propaganda Detection in News Articles,” in R. Verborgh et al. (Eds), The Semantic Web: ESWC 2021 Satellite Events. ESWC 2021. Lecture Notes in Computer Science, vol. 12739. Springer, Cham, 2021. doi: https://doi.org/10.1007/978-3-030-80418-3_35
5. G. Da San Martino, S. Yu, A. Barrón-Cedeño, R. Petrov, P. Nakov, “Fine-grained analysis of propaganda in news article,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, pp. 5635–5645, 2019. doi: https://doi.org/10.18653/v1/D19-1565
6. G.L. Ciampaglia, P. Shiralkar, L.M. Rocha, J. Bollen, F. Menczer, A. Flammini, “Computational fact checking from knowledge networks,” PLoS One, 10(6), 15, 2015. doi: https://doi.org/10.1371/journal.pone.0128193
7. G. Pocheptsov, Modern information wars. Kyiv: Kyiv-Mogylianska Academy, 2015, 497 p.
8. S. Ghosal, A. Jain, “CatRevenge: towards effective revenge text detection in online social media with paragraph embedding and CATBoost,” Multimed. Tools Appl., vol. 83, pp. 89607–89633, 2024. doi: https://doi.org/10.1007/s11042-024-18791-y
9. R. Alhajj, J. Rokne (Eds), Encyclopedia of Social Network Analysis and Mining. Springer, New York, NY, 2018, 2200 p. doi: https://doi.org/10.1007/978-1-4614-7163-9
10. R.-D. Leon, R. Rodríguez-Rodríguez, P. Gómez-Gasquet, J. Mula, “Social network analysis: A tool for evaluating and predicting future knowledge flows from an insurance organization,” Technological Forecasting and Social Change, vol. 114, pp. 103–118, 2017. doi: https://doi.org/10.1016/j.techfore.2016.07.032
11. S. Telenyk, G. Nowakowski, O. Gavrilenko, M. Miahkyi, O. Khalus, “Analysis of the influence of posts of famous people in social networks on the cryptocurrency course,” Bulletin of the Polish Academy of Sciences Technical Sciences, vol. 72, no. 4, 2024. doi: https://doi.org/10.24425/bpasts.2024.150117
12. O. Gavrilenko, Y. Oliinyk, H. Khanko, “Analysis of Propaganda Elements Detecting Algorithms in Text Data,” in Z. Hu, S. Petoukhov, I. Dychka, M. He (Eds), Advances in Computer Science for Engineering and Education II. ICCSEEA 2019. Advances in Intelligent Systems and Computing, vol. 938. Springer, Cham, 2020, pp. 438–447. doi: https://doi.org/10.1007/978-3-030-16621-2_41
13. O. Gavrilenko, K. Feshchenko, “Detecting propaganda in news flows,” Adaptive systems of automatic control, no. 1 (46), pp. 160–177, 2025. doi: https://doi.org/10.20535/1560-8956.46.2025.323759
14. V. Oliinyk, I. Matviichuk, “Low-resource text classification using cross-lingual models for bullying detection in the Ukrainian language,” Adaptive systems of automatic control, no. 1 (42), pp. 87–100, 2023. doi: https://doi.org/10.20535/1560-8956.42.2023.279093
15. Opera librettos and arias by foreign authors notes. Accessed on: Feb. 23, 2025. [Online]. Available: https://musicinukrainian.wordpress.com/biblio/import_opera/
16. J. Branke, K. Deb, K. Miettinen, R. Słowiński, Multiobjective Optimization. Springer-Verlag Berlin Heidelberg, 2008, 470 p. doi: https://doi.org/10.1007/978-3-540-88908-3
17. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms. Wiley, 2001, 536 p.
18. R.E. Walpole, R.H. Myers, S.L. Myers, K. Ye, Probability and Statistics for Engineers and Scientists, 9th ed. Pearson, 2016, 816 p.
19. S. Ross, A First Course in Probability, 10th ed. Pearson, 2018, 528 p.
20. Operavision. Accessed on: Feb. 23, 2025. [Online]. Available: https://operavision.eu/
21. R. Flesch, How to Write Plain English: A Book for Lawyers and Consumers. Harper & Row, 1979, 126 p.
22. J. Leskovec, A. Rajaraman, J.D. Ullman, Mining of Massive Datasets. Cambridge University Press, 2014, 326 p.
23. G. Da San Martino, S. Yu, A. Barrón-Cedeño, R. Petrov, P. Nakov, “Fine-Grained Analysis of Propaganda in News Articles,” Computation and Language (cs.CL), 2019. doi: https://doi.org/10.48550/arXiv.1910.02517
24. P.G. Preethi, V. Uma, A. Kumar, “Temporal Sentiment Analysis and Causal Rules Extraction from Tweets for Event Prediction,” Procedia Computer Science, vol. 48, pp. 84–89, 2015. doi: https://doi.org/10.1016/j.procs.2015.04.154

Received 01.03.2025

ARTICLE INFORMATION
Iryna V. Dats, ORCID: 0000-0003-3851-2047, Tchaikovsky National Music Academy of Ukraine, Ukraine, e-mail: irynadats@gmail.com
Olena V. Gavrilenko, ORCID: 0000-0003-0413-6274, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: gelena1980@gmail.com
Kyrylo Yu. Feshchenko, ORCID: 0009-0002-8142-179X, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: fkirill440@gmail.com

DETERMINING THE LEVEL OF PROPAGANDA IN OPERA LIBRETTOS USING DATA MINING AND MACHINE LEARNING / I.V. Dats, O.V. Gavrilenko, K.Yu. Feshchenko
Abstract. An adapted multifactor model is presented that can be used to determine the level of propaganda in the librettos of world operas. The model is built on the linear convolution method, for which 8 indicators were selected that are the most effective for detecting propaganda elements in text, taking into account the specifics of the subject area. Each of the selected indicators was calculated using statistical analysis, Data Mining, and machine learning methods. Applying the proposed method, the value function is calculated for each libretto, and on this basis a conclusion is drawn as to whether it contains propaganda elements.
Keywords: art, propaganda, opera, libretto, multifactor model, statistical analysis, Data Mining, Machine Learning, information technology.
id journaliasakpiua-article-335973
institution System research and information technologies
language English
last_indexed 2025-09-17T09:26:03Z
publishDate 2025
publisher The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format ojs
resource_txt_mv journaliasakpiua/4f/b716dc234c2656b284d618d5d00e284f.pdf
spelling journaliasakpiua-article-335973 2025-07-25T15:56:08Z Determining the level of propaganda in opera librettos using data mining and machine learning Визначення рівня пропаганди в оперних лібрето за допомогою засобів data mining та machine learning Dats, Iryna Gavrilenko, Olena Feshchenko, Kyrylo art propaganda opera libretto multivariate model statistical analysis Data Mining Machine Learning information technology The article presents an adapted multifactorial model that can be used to determine the level of propaganda in librettos to world operas. This model was created using the linear convolution method, for which eight indicators were selected that are most effective in identifying elements of propaganda in the text, taking into account the subject area's peculiarities. Each of the selected indicators was calculated using statistical analysis, data mining, and machine learning methods. As a result of applying the proposed method, the value function is calculated for each libretto, based on which a conclusion is made as to whether it contains elements of propaganda or not.
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-06-28 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/335973 10.20535/SRIT.2308-8893.2025.2.05 System research and information technologies; No. 2 (2025); 81-97 Системные исследования и информационные технологии; № 2 (2025); 81-97 Системні дослідження та інформаційні технології; № 2 (2025); 81-97 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/335973/324763
title Визначення рівня пропаганди в оперних лібрето за допомогою засобів data mining та machine learning
title_alt Determining the level of propaganda in opera librettos using data mining and machine learning
topic art
propaganda
opera
libretto
multivariate model
statistical analysis
Data Mining
Machine Learning
information technology
url https://journal.iasa.kpi.ua/article/view/335973