A Conceptual Model and a System for Replacing Text in an Image While Preserving the Style




Bibliographic details
Date:	2025
Main authors:	Maslianko, Pavlo; Romanov, Mykola
Format:	Article
Language:	English
Published:	The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025
Keywords:	systems engineering method, Eriksson–Penker business profile, conceptual model, system for replacing text in an image while preserving the style
Online access:	https://journal.iasa.kpi.ua/article/view/335821
Journal title:	System research and information technologies

Institution

System research and information technologies

author	Maslianko, Pavlo; Romanov, Mykola
author_institution_txt_mv	[ { "author": "Pavlo Maslianko", "institution": "National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" }, { "author": "Mykola Romanov", "institution": "National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" } ]
baseUrl_str	http://journal.iasa.kpi.ua/oai
collection	OJS
datestamp_date	2025-07-25T15:56:08Z
description	Text replacement in images, particularly while preserving its style, is a complex task that requires solving a range of scientific challenges and developing new technical solutions. One of the main issues is maintaining the authenticity and harmony of the image after modifications. The Research Objective is the development of a conceptual model and a system for text replacement in images with style preservation based on systems engineering methodology and the Eriksson-Penker business profile, ensuring the natural integration of new text elements into the image’s context. Implementation Methodology – the systems engineering methodology and the Eriksson-Penker business profile are used to formalize the structured process of developing a system for text replacement in images with style preservation. Research Results – a method for developing the system based on systems engineering techniques was proposed, consisting of four main stages. In the first stage, the system structure is modeled as an Eriksson-Penker business profile. In the second stage, a set of processes is defined that are characteristic of the Data Science system class and the CRISP-DM international standard. Also, the structural and dynamic representations of the conceptual model, as well as the component interaction interfaces, are modeled. The third stage involves implementing a specific version of the system, while the fourth stage focuses on system verification and validation. A systems engineering method for the conceptual model and system for text replacement in images with style preservation has been proposed. It is based on a modified Eriksson-Penker business profile for metalevel system representation and international standards for Data Science and Data Mining processes.
doi_str_mv	10.20535/SRIT.2308-8893.2025.2.04
first_indexed	2025-07-27T04:04:08Z
format	Article
fulltext	© Publisher IASA at the Igor Sikorsky Kyiv Polytechnic Institute, 2025
System Research and Information Technologies, 2025, № 2
UDC 004.6:33
DOI: 10.20535/SRIT.2308-8893.2025.2.04

A CONCEPTUAL MODEL AND A SYSTEM FOR REPLACING TEXT IN AN IMAGE WHILE PRESERVING THE STYLE
P. MASLIANKO, M. ROMANOV

Abstract. Text replacement in images, particularly while preserving its style, is a complex task that requires solving a range of scientific challenges and developing new technical solutions. One of the main issues is maintaining the authenticity and harmony of the image after modifications. Research objective: to develop a conceptual model and a system for text replacement in images with style preservation, based on the systems engineering methodology and the Eriksson–Penker business profile, that ensures the natural integration of new text elements into the image's context. Implementation methodology: the systems engineering methodology and the Eriksson–Penker business profile are used to formalise a structured process for developing a system for text replacement in images with style preservation. Research results: a four-stage method for developing the system based on systems engineering techniques is proposed. In the first stage, the system structure is modelled as an Eriksson–Penker business profile. In the second stage, a set of processes characteristic of the Data Science class of systems and of the CRISP-DM international standard is defined, and the structural and dynamic representations of the conceptual model, as well as the component interaction interfaces, are modelled. The third stage involves implementing a specific version of the system, while the fourth stage focuses on system verification and validation. A systems engineering method for the conceptual model and system for text replacement in images with style preservation has been proposed.
It is based on a modified Eriksson–Penker business profile for meta-level system representation and on international standards for Data Science and Data Mining processes.
Keywords: systems engineering method, Eriksson–Penker business profile, conceptual model, system for text replacement in images with style preservation.

BACKGROUND
Replacing text in an image, in particular while preserving its style, is a complex task that requires solving a number of scientific problems and developing new technical solutions. One of the main challenges is to preserve the authenticity and harmony of the image after the changes have been made, since any modification of the text can disrupt the original style, fonts, colours and visual composition. For such changes to look natural, a mechanism is needed that can reliably identify the text style, recreate it and adapt it to the new content.
Key aspects of the task:
1. Preserving the style: when replacing text, it is important not only to reproduce the font, size, colour and other visual characteristics correctly, but also to integrate the text into the overall visual aesthetics of the image (including background, shadows and textures).
2. Adaptability to different styles: text in images can be rendered in a wide variety of styles, from handwriting to complex graphic elements. The system should be able to work with different text styles and formats and detect them automatically.
P. Maslianko, M. Romanov. ISSN 1681–6048. System Research & Information Technologies, 2025, № 2
3. Contextual replacement: text is often not a separate element but part of a complex visual scene. Therefore, the replacement should take into account not only the style but also the context of the image, so that the new text looks appropriate.
4. Automation and scalability: in many cases, manual text replacement is a time-consuming process.
Automating this task saves substantial time when processing large volumes of images, which is important for industries such as marketing, design and content localisation (Table 1).

Table 1. Overview and comparative analysis of existing solutions
Approach | Advantages | Disadvantages
Manual tools | High precision given sufficient skill; full user control | Labour-intensive; difficult to scale
Classic computer vision algorithms | Speed; automation of key processes | Limited ability to preserve complex text styles; need for additional customisation
Generative models (GAN) | Ability to preserve complex stylistic elements of text automatically | High computational cost; dependence on data quality; long learning curve

Current methods for style-preserving text replacement in images are mostly based on generative adversarial networks and deep learning. StyleGAN remains one of the most effective solutions, owing to its ability to reproduce the stylistic elements of text accurately and integrate them into the overall context of the image.
Motivation for developing the conceptual model and the system for replacing text in an image while preserving the style. The main motivation is the need for a system that automatically replaces text in an image while maintaining its style. Such a system is needed in the following areas:
1. Content localisation: many companies and brands need to adapt their visuals to different markets and languages. For example, advertising banners or posters may contain text that must be translated and replaced without losing the style and graphic authenticity.
2. Graphic design: quickly changing text on design mock-ups can be important when working on prototypes or making changes during the project approval phase.
Thus, the relevance of the problem lies in the need to develop a scientifically grounded conceptual model of this class of systems and to implement an automated tool for replacing text in images while preserving the style.
Solving this problem makes it possible to speed up content creation and to improve the quality and accuracy of visual adaptation for various purposes.

PROBLEM STATEMENT
Object of research. Conceptual ideas and approaches, existing mathematical models and algorithms for detecting, analysing and synthesising text in images, their theoretical aspects and architecture, and existing software tools for image and text processing.
Subject of research. The conceptual model and the system for text replacement in an image with style preservation; the structural and dynamic representations of the system; image generation algorithms; models and algorithms for style control during image generation.
Research objective. To develop a conceptual model and a system for replacing text in images while preserving its style that can accurately identify, analyse and reproduce the visual characteristics of the text (font, size, colour, orientation, textures, shadows and other stylistic elements), ensuring the natural integration of new text elements into the context of the image.
Final result. A conceptual model and a system for replacing text in an image while preserving its style, together with verification and validation results.

A systems engineering method for the conceptual model and the system for replacing text in images while preserving its style
The main idea behind the development of the conceptual model is based on [1–4] and consists in applying the systems engineering methodology and the Eriksson–Penker business profile to formalise an orderly way of developing a system for replacing text in images with style preservation (Fig. 1).
One of the most common models for representing activities is the Eriksson–Penker business profile [5–8], within which the authors formulated four main entities for the formal representation of the activity of any business system (Fig. 1):
– goals (represent the purpose of the system; as a rule, goals can be broken down into sub-goals and are achieved through the execution of processes);
– processes (the main actions that make up the system's activity and are intended to achieve the goal in accordance with the established business rules; processes are usually subject to rules, can change the state of input resources and produce new resources, the system's output resources, in accordance with the conditions and requirements set by stakeholders);
– resources (physical, abstract or informational objects that the system consumes, uses, processes and produces throughout its activity to achieve the goal);
– rules (formalised restrictions, frameworks, conditions, requirements, etc. that are imposed on processes and describe the nature of the relationships between resources).
Such an ordered set of formalised entities (in particular, in UML notation) and system representations based on the Eriksson–Penker business profile formalises and systematises the conceptual model (meta-model) of a system for replacing text in images with style preservation (Fig. 1).
Thus, the essence of the systems engineering method for the conceptual model of the image text replacement system with style preservation is to apply the systems approach and the Eriksson–Penker business profile to formalise and build such systems. At the first stage, the system structure is modelled as an Eriksson–Penker business profile. At the second stage, we formalise the structural and dynamic representations of the conceptual model of the system, the interfaces for component interaction, and the technical requirements and specifications of all stakeholders.
At the third stage, we implement a specific version of the system using technological and mathematical tools, in accordance with the technical requirements and specifications of all stakeholders. The fourth stage involves verification and validation of this specific version of the system.

Formalisation of the Eriksson–Penker business profile classes for the conceptual model and the system for replacing text in images while preserving its style
Let us define the content of each class of the diagram (Fig. 1) in terms of the systems engineering problem:
1. Problem (a pressing issue that requires an appropriate solution; the main motivation for developing the system, which leads to the formulation of a specific goal. The problem of this work is the need to develop a conceptual model and an automated tool for replacing text in images while preserving the style. Solving this problem makes it possible to significantly speed up content creation and to improve the quality and accuracy of visual adaptation for various purposes).
2. Purpose (expresses the global goal of the work, which is intended to solve the problem. The purpose of this work is to develop a system for automatically replacing text in images while preserving its style, which allows accurate identification, analysis and reproduction of the visual characteristics of the text (font, size, colour, orientation, textures, shadows and other stylistic elements), ensuring the natural integration of new text elements into the image context).
3. Process (the set of processes of the system's activity that results in achieving the goal; a clearly defined sequence of actions/subprocesses that leads to the fulfilment of a certain task.
The processes of this system are: Loading training data, Pre-processing training data, Training the model, Replacing text in an image with preservation of style, and Functioning of the web application).
4. State change (possible changes in certain resources as a result of the processes. The system has three state changes: Image with text and word coordinate file → Segmented images (the Pre-process training data process); Segmented images with text labels → Transformed images and text labels (the Replace text in image with preservation of style process); Initialised model → Trained model (the Train model process)).
Fig. 1. Improved Eriksson–Penker business profile. Class diagram in UML notation [1]
5. Resource (any entities, tangible or intangible, that are consumed and produced by the system under development. The lowest-level resources of the hierarchy, which are directly involved in the processes, are further divided into three classes according to the nature of their influence on the processes:
– Business process output (resources produced by the system, the end result of its operation; this includes the generated image with the new text).
– Business process support resource (resources that support the execution of processes but are not the final result of the work: Trained model, Computer hardware and computing resources, Pre-processing algorithms, Metadata files, Software).
– Business process input (primary input resources of the initial processes that start the system cycle: Image with text, Metadata file for segmentation, Image for text replacement, New text).
6. Event (occurs due to certain external factors or as a result of interaction between processes.
The potential events of this system are: Uploading an image with text by a user, Entering new replacement text, Loading training data, Completing preprocessing, Completing model training, Transforming input parameters, and Outputting results via the interface).
7. Business Rule (BR) (formal instructions that regulate and limit the functioning of processes and establish its context and framework. Example of a business rule: images must be in JPG, JPEG or PNG format).
Thus, on the basis of the conceptual model (meta-model) of the image text replacement system with style preservation shown in Fig. 1, we can formalise the structural representation of this system in the form of a component diagram (Fig. 2).

Formalisation of the structural representation of the conceptual model of the system
This conceptual model of the structural representation (Fig. 2) formalises the class of systems for replacing text in an image while preserving the style as an ordered set of entities and relations between them. An important property of such a conceptual model is that the internal structure of the system components can be implemented with whatever engineering and mathematical tools a particular system requires, without changing the structure of the system or the interfaces between its components.
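As a minimal Python sketch of this property (not the authors' code), a component can depend only on an interface contract, so that the implementation behind it can be swapped freely. The names `TextReplacementModel`, `generate`, `TextReplacementFacade`, `EchoModel` and `TaggingModel` are hypothetical, and the two stub models merely stand in for real implementations such as a trained GAN generator.

```python
from typing import Protocol


class TextReplacementModel(Protocol):
    """Hypothetical contract for the trained model handed over between components."""

    def generate(self, style_image: bytes, new_text: str) -> bytes:
        ...


class TextReplacementFacade:
    """Facade used by the user interface; it depends only on the protocol above."""

    def __init__(self, model: TextReplacementModel) -> None:
        self._model = model

    def replace(self, style_image: bytes, new_text: str) -> bytes:
        if not new_text:
            raise ValueError("new text must not be empty")
        return self._model.generate(style_image, new_text)


class EchoModel:
    """Stub implementation: returns the input image unchanged."""

    def generate(self, style_image: bytes, new_text: str) -> bytes:
        return style_image


class TaggingModel:
    """Another stub: appends the text bytes, standing in for a real generator."""

    def generate(self, style_image: bytes, new_text: str) -> bytes:
        return style_image + new_text.encode("utf-8")
```

Replacing `EchoModel` with `TaggingModel` requires no change to `TextReplacementFacade`, which is exactly what fixing the interfaces between components buys.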
Here is a short list of the system's interfaces and their functionality:
– IDD (Interface Dataset Download): interface for downloading an external dataset;
– IIP (Interface Image Preprocessing): interface for transferring the initially processed downloaded dataset;
– ISIP (Interface Segmented Image Processing): interface for transferring processed segmented images with annotations;
– IMT (Interface Model Training): interface for transferring the trained model;
– IIITP (Interface Input Image and Text Processing): interface for transmitting the result of processing the images and text entered by the user.
Based on the model of the structural representation, we formalise the model of the dynamic representation of the system, which shows the internal structure of the components and the algorithm of interaction between them (Figs. 3, 4).

Formalisation of the dynamic representation of the conceptual model of the system
The dynamic representation is formalised on the basis of a set of processes specific to the class of Data Science systems, following the Data Science process defined by O'Neil and Schutt [2] and the international standard CRISP-DM as interpreted by Provost and Fawcett [3] (Figs. 3, 4).
At this level, the organisation of activities can be refined to take the specifics of the system into account and decomposed into the following three sub-stages:
1. Collection, analysis and processing of the training text data used in model training, according to the Data Science process model proposed by O'Neil and Schutt [2], or the Data Understanding and Data Preparation stages of CRISP-DM (CRoss Industry Standard Process for Data Mining) analysed by Provost and Fawcett [3].
2. The actual construction (architecture development) and training of the model, analogous to the Machine Learning Algorithms / Statistical Models stage [2] or the Modeling stage of the standard Data Mining process [3].
3. Determining metrics for evaluating the effectiveness of both the trained models and the system as a whole, analogous to the Report Findings stage [2] or the Evaluation stage of the standard Data Mining process [3].
Fig. 2. Conceptual model of a system for replacing text in an image while preserving the style. Component diagram in UML notation
Thus, the Eriksson–Penker business profile is a system of classes and relationships between them that are necessary and sufficient to represent and develop a conceptual model of the system, and the corresponding set of Data Science technologies provides the tools for implementing the components of this conceptual model.
An exhaustive list of business rules (specifications) for the business profile regulates the functionality of the user interface of a particular system, for example:
BR1: The user must enter the new text and select a replacement image before starting the process. If this data is not provided, the system does not start processing and asks the user to fill in all fields.
BR2: If the user uploads an image for the style, it must be in one of the accepted formats (JPEG, JPG, PNG) and be no larger than 10 MB.
BR3: After the text replacement is complete, the user can preview the result before downloading it.
BR4: If an error occurs (incorrect image loading, segmentation errors, insufficient data for training, etc.), the system should stop the process and show the user a clear error message indicating possible ways to fix it.

Implementation of the conceptual model for a specific version of the system for replacing text in an image while preserving the style
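Business rules BR1 and BR2 translate directly into input checks in the user interface component, with BR4 governing how failures are reported. The Python sketch below is illustrative only: it assumes that "10 MB" means 10 × 1024 × 1024 bytes and checks the format both by file extension and by the standard PNG and JPEG file signatures; `validate_request` and the constant names are hypothetical.

```python
import os
from typing import List, Optional

MAX_STYLE_IMAGE_BYTES = 10 * 1024 * 1024  # BR2 limit; "10 MB" assumed to mean 10 MiB
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # BR2 formats
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"  # JPEG start-of-image marker followed by a segment marker


def validate_request(new_text: Optional[str], filename: Optional[str],
                     data: Optional[bytes]) -> List[str]:
    """Check BR1 and BR2; return user-facing messages (BR4), empty if the request is valid."""
    errors: List[str] = []
    # BR1: new text and an image must both be provided before processing starts.
    if not new_text or not new_text.strip():
        errors.append("Please enter the new text.")
    if not filename or data is None:
        errors.append("Please select an image.")
        return errors
    # BR2: accepted formats, checked by extension and by file signature.
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append("Unsupported format: please use JPG, JPEG or PNG.")
    elif not (data.startswith(PNG_SIGNATURE) or data.startswith(JPEG_SIGNATURE)):
        errors.append("The file content is not a valid JPEG or PNG image.")
    # BR2: size limit.
    if len(data) > MAX_STYLE_IMAGE_BYTES:
        errors.append("The image is larger than 10 MB.")
    return errors
```

Returning a list of messages, rather than raising on the first problem, lets the interface show every missing field at once, which matches the "fill in all fields" wording of BR1.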
Implementation of the structural representation of the system
Since the conceptual model of the system is a system of classes and relations between them that is necessary and sufficient for the functioning of the system (Fig. 1), and the conceptual model in the form of a component diagram is shown in Fig. 2, the purpose and functionality of the system components are as follows:
Fig. 3. Conceptual model of a system for replacing text in an image while preserving the style. The first-level activity diagram in UML notation
1. Training data loading. This module is responsible for downloading images from the IMGUR5K dataset [9], checking their integrity using hashes, and loading annotations for use in further processing steps.
2. Training data preprocessing. This module segments the downloaded images and saves them to the appropriate directories for training, validation and testing. The size of the training dataset can also be reduced by random sampling, which is useful when computing power is limited.
3. Model training. The main module of the system, which produces a trained model for transforming the input images and text. Training uses pre-segmented images with corresponding annotations that describe the text in each segmented image.
Fig. 4. Conceptual model of a system for replacing text in an image while preserving the style. A detailed process diagram based on a second-level activity diagram in UML notation
4. Replacing text in images.
A module that serves as a facade for interacting with the trained model: the trained model is loaded into memory, the input parameters are passed to it, and the results are returned as generated images.
5. User interface. A module implemented as a web page where the user can upload an image with the desired style and enter text to generate an image with that text in the style of the uploaded image.

Implementation of the dynamic representation of the system. The dynamic representation of the system as an activity diagram (Fig. 3) visualises the main process cycles and their first-level interaction.
Fig. 4 shows the processes of the conceptual model in the form of an interaction diagram of the components (see Fig. 2), including their internal structure. This representation makes it possible to group the main processes of the system by the relevant components. The process diagram is an integral part of the conceptual model based on the Eriksson–Penker business profile, as it combines both types of representation of the system, structural and dynamic, and formalises its activities.

Implementation of the Training Data Loading and Training Data Preprocessing components
Data download module. This module is responsible for downloading images from the IMGUR5K dataset [9], checking their integrity using hashes, and loading annotations for use in further processing steps.
Downloading images: for each image, the module generates a URL, uses the requests library to download it, and saves the file in a specific directory. After the image is downloaded, its hash is checked to ensure authenticity.
Parallel downloading: to optimise the download process, the module uses ThreadPoolExecutor from the concurrent.futures module, which downloads several images simultaneously and thus significantly speeds up processing.
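Parallel, hash-checked downloading can be sketched as follows. This is a minimal illustration rather than the module itself: it uses `urllib.request` from the standard library in place of `requests`, and it assumes, hypothetically, that each annotation stores an MD5 hex digest of its image and that image URLs can be built from a template. `URL_TEMPLATE`, `download_one` and `download_all` are illustrative names, not part of the IMGUR5K tooling; the `fetch` parameter lets a stub replace the network call for offline testing.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict
from urllib.request import urlopen

# Hypothetical URL template; the real pattern depends on the dataset's download script.
URL_TEMPLATE = "https://example.com/images/{image_id}.jpg"


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download raw bytes (stdlib stand-in for the requests library used in the text)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_one(image_id: str, expected_md5: str, out_dir: Path,
                 fetch: Callable[[str], bytes], url_template: str) -> bool:
    """Download one image and keep it only if its MD5 digest matches the annotation."""
    try:
        data = fetch(url_template.format(image_id=image_id))
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False
    if hashlib.md5(data).hexdigest() != expected_md5:
        return False  # corrupted or replaced file: do not save it
    (out_dir / f"{image_id}.jpg").write_bytes(data)
    return True


def download_all(expected: Dict[str, str], out_dir: Path,
                 fetch: Callable[[str], bytes] = fetch_url,
                 url_template: str = URL_TEMPLATE,
                 max_workers: int = 8) -> Dict[str, bool]:
    """Download images in worker threads; return a mapping image_id -> success flag."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            image_id: pool.submit(download_one, image_id, md5, out_dir, fetch, url_template)
            for image_id, md5 in expected.items()
        }
        return {image_id: future.result() for image_id, future in futures.items()}
```

Threads suit this workload because each task spends most of its time waiting on the network, so the interpreter lock is rarely the bottleneck.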
Loading annotations: after the images have been downloaded successfully, the module loads the annotations for each image in JSON format, containing information about the image, its hash and the coordinates of the text elements. The annotations are also divided into three sets: training, validation and test.
Training data processing module. This module is responsible for processing the downloaded images, extracting text features and preparing the dataset for model training.
Processing of text elements: for each image, the module uses the annotations to locate the text features in the image. The crop_minAreaRect() function crops the image around the text, taking its coordinates and orientation into account. This function is based on a perspective transformation computed with cv2.getPerspectiveTransform() from the OpenCV library, which makes it possible to extract the text accurately even when it is tilted.
Saving processed images: the cropped text elements are saved in PNG format, and for each of them a record containing the text in the image is created in a JSON file.
Reducing the data sample: if the reduce option is specified, the module randomly selects a subset of the images for processing, which makes it possible to work with a smaller sample for testing or to speed up the work.

Implementation of the Model Training component
Model training process:
1. Loading the training dataset (segmented images with annotations).
2. Creating an initialised model of the initial architecture using available Python libraries and frameworks, or loading a pre-trained model (the latest saved version of the model).
3. Training the model with the GAN algorithm on the training dataset by optimising the loss function.
4. Running validation on the validation dataset, computing the MSE, SSIM, PSNR and FID metrics, to determine the optimal model architecture and hyperparameters.
5. Final testing on the test dataset.
6. Saving the current version of the model and selecting the most efficient version for processing user requests.

Mathematical support of the model training component
Fig. 5 shows a high-level diagram of the model training process. The diagram is based on the TextStyleBrush architecture [9]. Its main components are:
1. Content: an image of the text we want to see in the generated image. This image is produced from the image annotations: the text is taken and rendered as an image using a font file (VerilySerifMono) and functions from the PIL library. All images have a fixed size of 192 by 64 pixels, so that all modules of the model work with data of the same shape, which greatly simplifies the implementation.
2. Style content: an image of the text that appears in the image with the desired style.
3. Style image: an image that contains the desired style for the generated image.
4. Content Encoder: a pre-trained ResNet18 model without the average pooling layer and all subsequent layers, which preserves the spatial properties of the image.
5. Style Encoder: a pre-trained ResNet18 model without the fully connected layers.
Fig. 5. High-level diagram of the model training process
6. Style mapping network: a key component of the StyleGAN model that uses fully connected layers to transform the style vector produced by the Style Encoder into a vector of parameters that controls the style at each layer of the generator. The network consists of the following layers: PixelNorm (normalises the vector produced by the Style Encoder), Linear (fully connected layer) and LeakyReLU (nonlinear activation function).
7. StyleGAN generator: the main component of the model, responsible for generating images; it takes as input an image with text and the vector of parameters obtained from the Style mapping network. The synthesis network consists of several convolutional blocks that gradually increase the image resolution, starting from a small initial feature map. Each block uses stylised layers, such as AdaIN [10], to control the style of the image.
AdaIN (Adaptive Instance Normalization) is a key component of StyleGAN that styles images based on the style produced by the mapping network. AdaIN normalises the activations of each feature map separately and then scales and shifts them according to parameters obtained from the style vector. Formally, the operation of this layer can be described as follows. Let
$x$ be the input feature tensor of the convolutional layer;
$w$ be the style vector obtained from the mapping network;
$\mu(x)$ and $\sigma(x)$ be the mean and standard deviation of each channel.
First, we compute the affine transformation:
$$y_s = A_s w + b_s, \qquad y_b = A_b w + b_b,$$
where $A_s$ and $A_b$ are weight matrices applied to the vector $w$, which provide different scaling depending on the style, and $b_s$ and $b_b$ are bias vectors; $y_s$ controls the standard deviation and $y_b$ the mean of the features in image space.
After the affine transformation, the style vector $w$ yields the parameters $y_s$ and $y_b$, which are then passed to AdaIN. Each feature channel of the tensor is normalised and modified according to these parameters, allowing each channel to respond to the style $w$. First, the features are normalised: the mean $\mu(x)$ and standard deviation $\sigma(x)$ are computed for each channel, and each value is normalised:
$$\hat{x}_{ij} = \frac{x_{ij} - \mu(x)}{\sigma(x)}.$$
After normalisation, each channel is scaled by $y_s$ and shifted by $y_b$, which changes the style:
$$\mathrm{AdaIN}(x, y) = y_s \hat{x} + y_b.$$
Thus, the affine transformation transfers the style parameters from the style vector $w$ to the feature level, where they determine the colour, texture and other characteristics of the generated image. This allows flexible control of visual attributes through the latent vector $w$.
8. Prediction: an image generated by the generator that combines the style from the Style image and the text from the Content image.
9. Reconstruction: an image generated by the generator that combines the style from the Style image and the text from the Style content image; ideally, the Reconstruction image should be identical to the Style image.
Next, we describe the additional components of the system that are used to calculate the loss function of the model.
We start with the discriminator (Fig. 6). The Discriminator component is used to calculate two loss functions: Discriminator adversarial loss and Generator adversarial loss. To calculate them, we use the classical approach for GAN models:
$$L_{discriminator} = \frac{1}{2}\Big(\mathrm{MSE}\big(D(z_{style\,image}), 1\big) + \mathrm{MSE}\big(D(G(z_{style\,content}, z_{style\,image})), 0\big)\Big),$$
where MSE is the mean squared error; $D$ is the discriminator; $G$ is the generator; $z_{style\,image}$ and $z_{style\,content}$ are the outputs of the Style Encoder and Content Encoder components, respectively.
The generator loss function is calculated as follows:
$$L_{generator} = \mathrm{MSE}\big(D(G(z_{style\,content}, z_{style\,image})), 1\big).$$
To calculate the OCR loss function (Fig. 7), we use a pre-trained TRBA (TPS-ResNet-BiLSTM-Attention) text recognition model, denoted OCR Recognizer in the diagram. Content label and Style content label are the texts in the Content and Style content images, respectively, which are available during training.
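The AdaIN step and the two adversarial losses defined above can be written in a few lines of NumPy. This is a sketch under simplifying assumptions: a single feature tensor of shape (C, H, W) without a batch dimension, a small `eps` added to σ(x) for numerical stability (not part of the formula above), and discriminator scores passed in directly instead of being computed by real D and G networks. All function names are hypothetical.

```python
import numpy as np


def adain(x: np.ndarray, w: np.ndarray, A_s: np.ndarray, b_s: np.ndarray,
          A_b: np.ndarray, b_b: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """AdaIN for one feature tensor x of shape (C, H, W) and a style vector w of shape (D,)."""
    y_s = A_s @ w + b_s  # per-channel scale, shape (C,)
    y_b = A_b @ w + b_b  # per-channel shift, shape (C,)
    mu = x.mean(axis=(1, 2), keepdims=True)  # mu(x) for each channel
    sigma = x.std(axis=(1, 2), keepdims=True)  # sigma(x) for each channel
    x_hat = (x - mu) / (sigma + eps)  # eps guards against constant channels
    return y_s[:, None, None] * x_hat + y_b[:, None, None]


def mse(pred: np.ndarray, target: float) -> float:
    """Mean squared error between scores and a constant target label."""
    return float(np.mean((pred - target) ** 2))


def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """L_discriminator: scores for real inputs pushed to 1, generated ones to 0."""
    return 0.5 * (mse(d_real, 1.0) + mse(d_fake, 0.0))


def generator_loss(d_fake: np.ndarray) -> float:
    """L_generator: the generator wants its outputs scored as real (1)."""
    return mse(d_fake, 1.0)
```

Setting `A_s` and `A_b` to zero makes every output channel have mean `b_b` and a standard deviation of approximately `b_s`, a quick check of the normalise-then-restyle behaviour. Using MSE against 0/1 targets, as the formulas do, corresponds to a least-squares GAN objective rather than the original cross-entropy one.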
To compute the OCR loss, we perform the following steps:
• Normalize the image size.
• Encode the text labels Content label and Style content label.
• Pass the Prediction and Reconstruction images to the model and obtain the predicted text for these images in encoded form.
• Compute the cross-entropy for the pairs (Prediction, Content label) and (Reconstruction, Style content label).
• Sum the two values and divide the result by two.
Fig. 6. Diagram of the process of calculating the adversarial loss functions for the discriminator and generator
Formally, the OCR loss is computed as
$$L_{\text{OCR}} = \frac{1}{2}\Big(\mathrm{CE}\big(\mathrm{TRBA}(\text{Prediction}), \text{Content label}\big) + \mathrm{CE}\big(\mathrm{TRBA}(\text{Reconstruction}), \text{Style content label}\big)\Big),$$
where CE is the cross-entropy and TRBA is the pretrained TRBA model.
To compute the Reconstruction loss (Fig. 8), we measure the difference between the Style image and the Reconstruction image:
$$L_{\text{reconstruction}} = L_1(\text{Style image}, \text{Reconstruction}),$$
where $L_1$ is the L1 loss, i.e. the mean absolute error.
Fig. 7. Diagram of the OCR loss function calculation process
Fig. 8. Diagram of the process of calculating the Reconstruction loss function
To compute the Cycle loss (Fig. 9), we first obtain a Reconstruction image, then pass it to the model input again and obtain another image. The idea is that as little information as possible should be lost in the transformations performed by the model. Once both images are available, we compute the mean absolute error (L1 loss) between them.
To compute the Typeface loss (Fig. 10), we use a pretrained VGG16 model with its last classification layer removed, so that the model returns a feature vector instead of class predictions. We pass two images, the Style image and the Reconstruction, to the model and compute the L1 loss between the resulting features:
$$L_{\text{typeface}} = L_1\big(\mathrm{VGG16}(\text{Style image}), \mathrm{VGG16}(\text{Reconstruction})\big).$$
Fig. 9. Diagram of the process of calculating the Cycle loss function
Fig. 10. Diagram of the Typeface loss function calculation process
To compute the Perceptual and Texture losses (Fig. 11), we also use the trained VGG16 model. For each VGG16 block, the difference between the activations of the corresponding layers for the input and target images is measured with the L1 loss, which gives the Perceptual loss:
$$L_{\text{perceptual}} = \frac{L_1(X, Y)}{h\,w},$$
where $X$ and $Y$ are the activations of the $i$-th layer for the input and target images, respectively, and $h$ and $w$ are the height and width of these activations.
The Texture loss is computed in the same way, but on Gram matrices. The element $G_{ij}$ of a Gram matrix is the inner product of the activation maps of the $i$-th and $j$-th channels: it measures how strongly variations in one channel co-occur with variations in the other, and thus reflects how different filters "cooperate" to encode the style or texture of the image:
$$L_{\text{texture}} = \frac{L_1\big(G(X), G(Y)\big)}{h\,w},$$
where $G(X)$ and $G(Y)$ are Gram matrices defined as $G(X) = X X^{\mathsf{T}}$.
The final generator loss is a weighted sum of all these terms:
$$L_{\text{final}} = \lambda_{\text{gen}} L_{\text{generator}} + \lambda_{\text{OCR}} L_{\text{OCR}} + \lambda_{\text{rec}} L_{\text{reconstruction}} + \lambda_{\text{cyc}} L_{\text{cycle}} + \lambda_{\text{tf}} L_{\text{typeface}} + \lambda_{\text{per}} L_{\text{perceptual}} + \lambda_{\text{tex}} L_{\text{texture}},$$
where the $\lambda$ are weighting coefficients.
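The remaining loss terms can be sketched in plain Python. The sketch is illustrative and rests on several assumptions: images and VGG16 activations are represented as flat lists or lists of channels; the TRBA recognizer's output is assumed to be one probability distribution per decoded character; the helper names (`l1`, `ocr_loss`, `gram`, `texture_loss`, `final_loss`, ...) are hypothetical and not taken from the paper; and the weights passed to `final_loss` are placeholders, since the paper's coefficients were chosen empirically.

```python
import math


def l1(a, b):
    """Mean absolute error between two equally sized flat lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cross_entropy(probs, target_ids):
    """Average negative log-likelihood of the target character ids.

    probs: one probability distribution (list) per decoded position,
    a hypothetical stand-in for the recognizer's output.
    """
    return -sum(math.log(p[t]) for p, t in zip(probs, target_ids)) / len(target_ids)


def ocr_loss(pred_probs, content_ids, recon_probs, style_content_ids):
    # L_OCR = 1/2 * (CE(TRBA(Prediction), Content label)
    #                + CE(TRBA(Reconstruction), Style content label))
    return 0.5 * (cross_entropy(pred_probs, content_ids)
                  + cross_entropy(recon_probs, style_content_ids))


def gram(features):
    """Gram matrix G = X X^T for C channels, each a flat list of h*w activations."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]


def flatten(rows):
    return [v for row in rows for v in row]


def perceptual_loss(x_feats, y_feats, h, w):
    # L1 between the activations of one VGG16 layer, divided by h * w
    return l1(flatten(x_feats), flatten(y_feats)) / (h * w)


def texture_loss(x_feats, y_feats, h, w):
    # Same normalization as the perceptual loss, but applied to Gram matrices
    return l1(flatten(gram(x_feats)), flatten(gram(y_feats))) / (h * w)


def final_loss(terms, weights):
    """Weighted sum of named loss terms; the weights are placeholders."""
    return sum(weights[name] * value for name, value in terms.items())


# Two channels of a 1x3 activation map (h = 1, w = 3)
x = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
y = [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
terms = {"perceptual": perceptual_loss(x, y, 1, 3), "texture": texture_loss(x, y, 1, 3)}
total = final_loss(terms, {"perceptual": 1.0, "texture": 1.0})  # hypothetical weights
```

In a real training loop these functions would operate on tensors produced by VGG16 and TRBA; the plain-list version only makes the arithmetic of each formula explicit, for example that the texture loss compares channel co-activation statistics rather than the activations themselves.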
The coefficients were selected empirically.
Fig. 11. Diagram of the process of calculating Perceptual and Texture loss functions
Verification and validation of the system engineering method of the system
The fourth stage involves verification and validation of the developed system to ensure that all technical specifications and stakeholder requirements are met. First of all, one should take into account the model used to implement the Model Training component. It can be based on one of the following types of neural network architectures (the list is not exhaustive):
• convolutional neural networks;
• recurrent neural networks;
• generative adversarial networks.
Next, the functionality, adequacy, and performance of the developed system can be analyzed using the selected evaluation metrics.
Thus, based on the definition of the concept of “method” [4], we can define the System Engineering Method of the conceptual model and the system for text replacement in images with style preservation: it is an ordered set of classes of tasks, processes, resources, business rules, and the relationships between them, used to produce a system on the basis of the system engineering methodology, the Eriksson–Penker business profile, and Data Science technologies.
Defining criteria for evaluating system performance
When designing a specific system for replacing text in images, it is important to use appropriate metrics to evaluate the quality of image generation. Such metrics measure how well the new images match the originals in terms of visual similarity, stylistic authenticity, and detail accuracy. The metrics used in this paper are mean squared error (MSE), the structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), and Fréchet Inception Distance (FID).
Each of them assesses a different aspect of system quality and is widely used to evaluate generative models.
1. Mean squared error (MSE)
Mean Squared Error (MSE) is a metric that evaluates the difference between the pixel values of the original image and the generated image. It is calculated as the arithmetic mean of the squared deviations between corresponding pixels of the two images:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\big(I_{\text{orig}}(i) - I_{\text{gen}}(i)\big)^2,$$
where $N$ is the number of pixels in the image, and $I_{\text{orig}}(i)$ and $I_{\text{gen}}(i)$ are the pixel values of the original and generated images, respectively.
Advantages of MSE: easy to implement and interpret; well suited for evaluating pixel-level reproduction accuracy.
Why it’s useful: MSE measures the exact difference between the original and the generated image, which is important for assessing how well text elements and image details are preserved after text replacement.
2. Structural similarity (SSIM)
Structural Similarity Index Measure (SSIM) is a metric that evaluates the similarity between two images in terms of their structure, brightness, and contrast. SSIM attempts to model human perception, which makes it more sensitive to changes in image structure [11]. SSIM is calculated as
$$\mathrm{SSIM}(I_{\text{orig}}, I_{\text{gen}}) = \frac{(2\mu_{\text{orig}}\mu_{\text{gen}} + C_1)(2\sigma_{\text{orig,gen}} + C_2)}{(\mu_{\text{orig}}^2 + \mu_{\text{gen}}^2 + C_1)(\sigma_{\text{orig}}^2 + \sigma_{\text{gen}}^2 + C_2)},$$
where $\mu_{\text{orig}}$ and $\mu_{\text{gen}}$ are the mean pixel values of the original and generated images; $\sigma_{\text{orig}}^2$ and $\sigma_{\text{gen}}^2$ are their variances; $\sigma_{\text{orig,gen}}$ is the covariance between the original and generated images; and $C_1$ and $C_2$ are small constants that stabilize the division.
Advantages of SSIM: it reveals structural changes in an image that are difficult to detect with MSE, and it simulates visual perception, which makes it more suitable for assessing image quality from the perspective of the human eye.
Why it’s useful: SSIM helps to assess how well the basic structures and textures of the original image are preserved after text replacement, which is important for maintaining the style and authenticity of visual content.
3. Peak Signal-to-Noise Ratio (PSNR)
Peak Signal-to-Noise Ratio (PSNR) is a metric that evaluates the difference between an original and a generated image by measuring the level of noise or distortion. PSNR is calculated from MSE and is expressed in decibels (dB) [12]:
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),$$
where MAX is the maximum possible pixel value in the image (for example, 255 for 8-bit images) and MSE is the mean squared error.
Advantages of PSNR: it is sensitive to small changes in brightness and colour, and a high PSNR value indicates that the image is close to the original.
Why it’s useful: PSNR makes it possible to evaluate how well the system preserves brightness and contrast after text replacement, which is important for the visual quality of the edited image.
4. Fréchet Inception Distance (FID)
Fréchet Inception Distance (FID) is a metric that evaluates the quality of generated images by comparing the distributions of real and generated images. FID measures the distance between Gaussian distributions fitted to features extracted from the images by a pretrained neural network (usually InceptionV3) [13]:
$$\mathrm{FID} = \lVert \mu_{\text{orig}} - \mu_{\text{gen}} \rVert^2 + \mathrm{Tr}\Big(\Sigma_{\text{orig}} + \Sigma_{\text{gen}} - 2\big(\Sigma_{\text{orig}}\Sigma_{\text{gen}}\big)^{1/2}\Big),$$
where $\mu_{\text{orig}}$ and $\mu_{\text{gen}}$ are the mean feature vectors of the original and generated images, and $\Sigma_{\text{orig}}$ and $\Sigma_{\text{gen}}$ are the covariance matrices of these features.
Advantages of FID: it evaluates not only pixel similarity but also deeper image features, which makes it more suitable for evaluating high-level stylistic characteristics; it is also sensitive to small changes that can matter for generating visually realistic images.
Why it’s useful: FID is one of the main metrics for evaluating generative models such as StyleGAN. It shows how well the system generates realistic images while preserving the visual and stylistic characteristics of the text and the overall composition.
Conclusion. Together, MSE, SSIM, PSNR, and FID provide a comprehensive evaluation of a style-preserving text replacement system: MSE and PSNR evaluate pixel-level reproduction accuracy, SSIM accounts for structural similarity, and FID indicates how realistic the generated images look. Combined, they characterize the quality of the system and its ability to reproduce the stylistic and visual features of images after text replacement.
Verification and validation of the system engineering method
The fourth stage involves verification and validation of the conceptual model and of the developed version of the system for text replacement in images with style preservation. The model was built on the TextStyleBrush architecture [9] and trained for 50 epochs on a dataset of 20 thousand segmented images. The results of verification and validation of the system are shown in Table 2.
Table 2. Comparative analysis of the metrics used to assess the quality of the developed system
Metric      Test set 1   Test set 2   Test set 3   Average
MSE         0.0196       0.0229       0.0288       0.0238
SSIM        0.5071       0.5489       0.4460       0.5007
PSNR, dB    15.3077      15.8087      15.4113      15.5092
FID         1.3876       0.7256       1.1510       1.0881
Interpreting MSE, SSIM, PSNR, and FID gives an idea of the quality of the generated images compared with the originals. Each metric typically shows the following.
1. Mean squared error (MSE)
MSE quantifies the average squared difference between the pixel values of two images. Lower values indicate greater similarity.
Although there is no absolute scale, an MSE value close to zero indicates high similarity, while higher values indicate greater divergence.
2. Structural similarity index (SSIM)
SSIM ranges from −1 to 1, where 1 indicates identical images. An SSIM value of 0.5 indicates moderate structural similarity. In general, values above 0.7 are considered acceptable, and values above 0.9 are considered excellent.
3. Peak signal-to-noise ratio (PSNR)
PSNR is a logarithmic measure that compares the maximum possible signal power with the power of the distorting noise. In image processing, a PSNR of 20–30 dB is generally considered acceptable, above 30 dB good, and above 40 dB excellent. A PSNR of about 15 dB indicates poor quality and usually means that the generated images differ noticeably from the originals.
4. Fréchet Inception Distance (FID)
FID compares the distribution of generated images with that of real images. Lower FID scores indicate better quality and diversity of the generated images. Scores below 10 are generally considered good, while scores above 50 indicate poor quality. The FID score of 1.0881 indicates that the GAN generates high-quality images whose distribution is very close to that of the real dataset.
Overall assessment:
• MSE: low, which is good.
• SSIM: moderate; can be improved.
• PSNR: low; indicates noticeable differences in image quality.
• FID: excellent; indicates good GAN performance.
Conclusion. MSE, SSIM, and PSNR are used in almost every work on generative networks because they are simple and quick to compute. However, this simplicity usually makes them insufficient for evaluating systems of this kind; they are reported mainly for comparability with earlier work. The FID results, which more accurately reflect the quality achieved on this particular problem, deserve more attention.
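The four metrics defined earlier in this section can be sketched in plain Python for flat lists of grayscale pixel values. This is a simplified illustration under stated assumptions, not the evaluation code behind Table 2: `ssim_global` evaluates the SSIM formula once over the whole image (standard SSIM averages it over local windows); the constants $C_1 = (0.01\,\mathrm{MAX})^2$ and $C_2 = (0.03\,\mathrm{MAX})^2$ are commonly used defaults rather than values from the paper; and `fid_1d` is the one-dimensional special case of the FID formula, whereas real FID uses multidimensional InceptionV3 features and a matrix square root.

```python
import math


def mse(orig, gen):
    """Mean squared error: (1/N) * sum_i (I_orig(i) - I_gen(i))^2."""
    return sum((a - b) ** 2 for a, b in zip(orig, gen)) / len(orig)


def psnr(orig, gen, max_value=255.0):
    """PSNR in dB: 10 * log10(MAX^2 / MSE); identical images give infinity."""
    err = mse(orig, gen)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)


def mean_and_var(values):
    mu = sum(values) / len(values)
    return mu, sum((v - mu) ** 2 for v in values) / len(values)


def ssim_global(orig, gen, max_value=255.0):
    """SSIM evaluated once over the whole image (single window)."""
    mu_x, var_x = mean_and_var(orig)
    mu_y, var_y = mean_and_var(gen)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(orig, gen)) / len(orig)
    c1 = (0.01 * max_value) ** 2  # common default, K1 = 0.01
    c2 = (0.03 * max_value) ** 2  # common default, K2 = 0.03
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


def fid_1d(feats_orig, feats_gen):
    """FID for scalar features: (mu1 - mu2)^2 + s1^2 + s2^2 - 2 * s1 * s2."""
    mu1, var1 = mean_and_var(feats_orig)
    mu2, var2 = mean_and_var(feats_gen)
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)


original = [52.0, 55.0, 61.0, 59.0, 79.0, 61.0, 76.0, 61.0]
generated = [50.0, 57.0, 60.0, 60.0, 75.0, 63.0, 77.0, 60.0]
scores = (mse(original, generated), psnr(original, generated), ssim_global(original, generated))
```

`max_value` defaults to 255 for 8-bit images; for images normalized to [0, 1], pass `max_value=1.0` so that PSNR and the SSIM constants use the same scale as the pixel values.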
CONCLUSIONS
1. A method of system engineering of a conceptual model and a system for replacing text in images with style preservation is proposed. It is based on a modified Eriksson–Penker business profile [5–8] for representing the system at the meta-level, as well as on international standards for Data Science [2] and Data Mining [3] processes, and it serves as a basis for algorithmizing the development of specific system components. The effectiveness of the method is investigated using the example of developing a system for automatic text replacement in an image with style preservation.
2. The proposed system engineering method is aimed at developing specialized systems for replacing text on design layouts and on various products of companies and brands that need to adapt their visual materials to different markets and languages when promoting goods and services.
3. The use of the system engineering method significantly speeds up and streamlines the implementation of a particular system and reduces the cost of its development, verification, and validation.
4. Further research will focus on applying the system engineering method to implement systems based on other mathematical models, on forming performance evaluation metrics, and on developing scientifically sound methods for verifying and validating the method.
REFERENCES
1. P.P. Maslianko, O.S. Maystrenko, “The system engineering of organizational system informatization projects,” KPI Science News, no. 6, pp. 34–42, 2008.
2. C. O’Neil, R. Schutt, Doing Data Science: Straight Talk from the Frontline. O’Reilly Media, Inc., 2013, 406 p.
3. F. Provost, T. Fawcett, Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O’Reilly Media, Inc., 2013.
4. P.P. Maslianko, Ye.P. Sielskyi, “Method of system engineering of neural machine translation systems,” KPI Science News, no. 2, pp. 46–55, 2021. doi: https://doi.org/10.20535/kpisn.2021.2.236939
5. H.-E. Eriksson, M. Penker, Business Modeling with UML. New York: John Wiley & Sons, 2000, 459 p.
6. A. Kossiakoff, W. Sweet, S. Seymour, S. Biemer, System Engineering Principles and Practice. Moscow: DMK Press, 2014, 624 p.
7. D.K. Hitchins, Systems Engineering: A 21st Century Systems Methodology. Wiley, 2007, 528 p.
8. S. Krymskyi, “Metod,” in Filosofskyi Entsyklopedychnyi Slovnyk, V.I. Shynkaruk, Ed. Kyiv, Ukraine: Abrys, 2002, 742 p.
9. P. Krishnan, R. Kovvuri, G. Pang, B. Vassilev, T. Hassner, “TextStyleBrush: Transfer of Text Aesthetics from a Single Example,” arXiv preprint arXiv:2106.08385, 2021. Available: https://arxiv.org/pdf/2106.08385
10. X. Huang, S. Belongie, “Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization,” 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 1510–1519. doi: https://doi.org/10.1109/ICCV.2017.167
11. P. Ndajah, H. Kikuchi, M. Yukawa, H. Watanabe, S. Muramatsu, “SSIM image quality metric for denoised images,” International Conference on Visualization, Imaging and Simulation (VIS '10), pp. 53–57, 2010.
12. A. Horé, D. Ziou, “Image Quality Metrics: PSNR vs. SSIM,” 20th International Conference on Pattern Recognition, Istanbul, Turkey, 2010, pp. 2366–2369. doi: https://doi.org/10.1109/ICPR.2010.579
13. Y. Benny, T. Galanti, S. Benaim, L. Wolf, “Evaluation Metrics for Conditional Image Generation,” International Journal of Computer Vision, vol. 129, pp. 1712–1731, 2021. doi: https://doi.org/10.1007/s11263-020-01424-w
Received 25.11.2024
INFORMATION ON THE ARTICLE
Pavlo P. Maslianko, ORCID: 0000-0003-4001-7811, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: masliankop@gmail.com
Mykola D. Romanov, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: kolya.romanov8@gmail.com
CONCEPTUAL MODEL AND SYSTEM FOR TEXT REPLACEMENT IN AN IMAGE WITH STYLE PRESERVATION / P.P. Maslianko, M.D. Romanov
Abstract. Text replacement in an image, in particular with preservation of its style, is a complex task that requires solving a number of scientific problems and developing new technical solutions. One of the main problems is preserving the authenticity and harmony of the image after changes are made. The purpose of the research is to develop a conceptual model and a system for text replacement in images with style preservation based on the systems engineering methodology and the Eriksson–Penker business profile, ensuring natural integration of new text elements into the context of the image. Implementation methodology: the systems engineering methodology and the Eriksson–Penker business profile are used to formalize an ordered process of developing a system for text replacement in an image with style preservation. Research results: a method for developing the system based on systems engineering techniques, consisting of four main stages. In the first stage, the system structure is modeled as an Eriksson–Penker business profile; in the second, a set of processes characteristic of the Data Science class of systems and of the international CRISP-DM standard is defined, and the structural and dynamic representations of the conceptual model of the system and the component interaction interfaces are modeled; in the third stage, a specific version of the system is implemented; and in the fourth, the system is verified and validated. A systems engineering method for the conceptual model and the system for text replacement in images with style preservation is proposed, based on a modified Eriksson–Penker business profile for representing the system at the meta-level, as well as on international standards for Data Science and Data Mining processes.
Keywords: systems engineering method, Eriksson–Penker business profile, conceptual model, system for text replacement in an image with style preservation.
id	journaliasakpiua-article-335821
institution	System research and information technologies
language	English
last_indexed	2025-09-17T09:26:03Z
publishDate	2025
publisher	The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format	ojs
resource_txt_mv	journaliasakpiua/d6/1c1ab8e4af3bd066d118a4936c2b03d6.pdf
spelling	journaliasakpiua-article-335821 2025-07-25T15:56:08Z A conceptual model and a system for replacing text in an image while preserving the style Maslianko, Pavlo Romanov, Mykola systems engineering method Eriksson-Penker business profile conceptual model system for text replacement in images with style preservation The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-06-28 Article application/pdf https://journal.iasa.kpi.ua/article/view/335821 10.20535/SRIT.2308-8893.2025.2.04 System research and information technologies; No. 2 (2025); 61-80 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/335821/324682
title	Conceptual model and system for text replacement in an image with style preservation
title_alt	A conceptual model and a system for replacing text in an image while preserving the style
topic_facet	systems engineering method Eriksson-Penker business profile conceptual model system for text replacement in images with style preservation
url	https://journal.iasa.kpi.ua/article/view/335821