Quality assessment of models and deep learning methods for super-resolution image formation
This article examines evaluation metrics for the results of super-resolution image generation in solving the SISR task. The study comprises two experiments: the implementation of custom network architectures for SRGAN, VDSR, and SRCNN, and fine-tuning of pre-trained SRGAN, VDSR, and SRCNN models. An...
| Date: | 2025 |
|---|---|
| Main authors: | , |
| Format: | Article |
| Language: | English |
| Published: |
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
2025
|
| Subjects: | |
| Online access: | https://journal.iasa.kpi.ua/article/view/351424 |
| Journal title: | System research and information technologies |
| Download file: | |
Institution
System research and information technologies

| _version_ | 1867334455919640576 |
|---|---|
| author | Lanko, Anna Nedashkovskaya, Nadezhda |
| author_facet | Lanko, Anna Nedashkovskaya, Nadezhda |
| author_institution_txt_mv | [
{
"author": "Anna Lanko",
"institution": "National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv"
},
{
"author": "Nadezhda Nedashkovskaya",
"institution": "National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv"
}
] |
| author_sort | Lanko, Anna |
| baseUrl_str | http://journal.iasa.kpi.ua/oai |
| collection | OJS |
| datestamp_date | 2026-02-02T20:49:24Z |
| description | This article examines evaluation metrics for the results of super-resolution image generation in solving the SISR task. The study comprises two experiments: the implementation of custom network architectures for SRGAN, VDSR, and SRCNN, and fine-tuning of pre-trained SRGAN, VDSR, and SRCNN models. An algorithm for assessing the quality of models and deep learning methods for generating super-resolution images is suggested. The VDSR model performed best in terms of pixel, structural, and perceptual metrics, as well as training time and visual confirmation by a human, highlighting that residual learning is more effective than recursive learning under the conditions of the two conducted experiments. Threshold values for practically acceptable and high-quality results were determined through visual analysis of many generated images and their corresponding quality metrics, including those reported by other researchers. |
| doi_str_mv | 10.20535/SRIT.2308-8893.2025.4.06 |
| first_indexed | 2026-02-08T08:06:12Z |
| format | Article |
| fulltext |
N. Nedashkovskaya, A. Lanko, 2025
104 ISSN 1681–6048 System Research & Information Technologies, 2025, № 4
UDC 519.816; 004.032.26; 004.9; 004.85
DOI: 10.20535/SRIT.2308-8893.2025.4.06
QUALITY ASSESSMENT OF MODELS AND DEEP LEARNING
METHODS FOR SUPER-RESOLUTION IMAGE FORMATION
N. NEDASHKOVSKAYA, A. LANKO
Abstract. This article examines evaluation metrics for the results of super-resolution
image generation in solving the SISR task. The study comprises two experiments:
the implementation of custom network architectures for SRGAN, VDSR, and
SRCNN, and fine-tuning of pre-trained SRGAN, VDSR, and SRCNN models. An
algorithm for assessing the quality of models and deep learning methods for generat-
ing super-resolution images is suggested. The VDSR model performed best in terms
of pixel, structural, and perceptual metrics, as well as training time and visual con-
firmation by a human, highlighting that residual learning is more effective than re-
cursive learning under the conditions of the two conducted experiments. Threshold
values for practically acceptable and high-quality results were determined through
visual analysis of many generated images and their corresponding quality metrics,
including those reported by other researchers.
Keywords: single image super-resolution, quality assessment, generative models,
deep learning methods, convolutional neural network, residual learning, recursive
learning, fine-tuning of pre-trained models, perceptual metric, LPIPS, multicriteria
decision analysis, DIV2K dataset, thresholds for practically acceptable and high-
quality generated images.
INTRODUCTION
The task of Single Image Super-Resolution (SISR) involves the formation of
highly detailed versions of low-resolution images [1]. Despite significant progress
in modern imaging technologies, this task remains relevant due to such factors as
image quality deterioration after transmission through communication channels
and hardware failures, image compression for compact storage on data carriers,
and the inability to use professional equipment in certain natural conditions.
The goal of SISR methods is to create high-quality images by restoring or
adding details missing in the original low-resolution images. To achieve this, gen-
erative models and deep learning methods are used [2].
Generative models form new image details by modeling the data distribution of the
training set [2]. Among them, the most common for SISR are modifications
of generative adversarial networks (GANs); diffusion models are more complex
and more effective, and flow-based models and autoencoders are also used [3].
Deep learning methods analyze important features of training images to re-
construct image details [2]. These include convolutional neural networks (CNN),
recurrent neural networks (RNN), and residual neural networks (ResNet) [3]. These
methods are often part of the architecture of generative models that
implement a particular learning principle. For example, the generator and
discriminator in a GAN are deep neural networks.
SISR models are trained on pairs of low- and high-resolution images from the
training set. The effectiveness of super-resolution image generation is
assessed using a set of indicators, which must include both quantitative and
perceptual metrics. An important step in evaluating the results of SISR is
the visual analysis of the generated images by a human.
It should be noted that SISR algorithms are complex and time-consuming, so
they require powerful computing resources, and model optimization is still the
main focus of researchers’ work on this topic. That is why, when choosing the
optimal model, technical indicators are added to the evaluation criteria, including
time of training, training cost, and the availability of a hardware accelerator in the
form of a graphics processing unit (GPU) [4].
PROBLEM STATEMENT
Let us introduce the notation $H$ for height, $W$ for width, and $C$ for the number
of image channels (e.g. RGB). Let $I_{LR} \in \mathbb{R}^{H \times W \times C}$ be a low-resolution image, and
$I_{HR} \in \mathbb{R}^{H \times W \times C}$ be its corresponding high-resolution image. The goal of the SISR
problem is to find the following mapping

$$f: I_{LR} \to I_{HR}, \tag{1}$$

that will ensure the most accurate recovery of the details of the $I_{HR}$ image based
on the information from $I_{LR}$.
Mapping (1) is a formalization tool, as it can describe different processes
depending on the resolution enhancement method. That is why we will further
consider the implementation of (1), the model $f_\theta \in F$, where $\theta$ are the model
parameters and $F$ is the set of all SISR models. The target super-resolution image is
the output of $f_\theta$ and the result of solving the problem:

$$I_{SR} = f_\theta(I_{LR}).$$

An important step in the process of training models from $F$ is to solve the
optimization problem

$$\min_{\theta} L(I_{HR}, I_{SR}),$$

where $L(I_{HR}, I_{SR})$ is the model loss function. The objective is to find such model
parameters $\theta$ that the value of the loss function $L$ is minimal.
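As a toy illustration of this optimization problem, the sketch below fits a one-parameter model $f_\theta(I_{LR}) = \theta \cdot I_{LR}$ by gradient descent on an MSE loss. The scalar model, the learning rate, the iteration count, and the pixel values are all assumptions chosen for illustration; the models studied in the paper are deep networks with far more parameters.

```python
def mse_loss(hr, sr):
    """Mean squared error L(I_HR, I_SR) over flattened pixel lists."""
    return sum((h - s) ** 2 for h, s in zip(hr, sr)) / len(hr)


def fit_theta(lr_img, hr_img, theta=0.0, step=0.1, iters=200):
    """Gradient descent on theta for the toy model f_theta(x) = theta * x."""
    n = len(lr_img)
    for _ in range(iters):
        # dL/dtheta = (2/n) * sum((theta * x - h) * x)
        grad = 2.0 / n * sum((theta * x - h) * x for x, h in zip(lr_img, hr_img))
        theta -= step * grad
    return theta


lr_img = [0.1, 0.4, 0.5, 0.8]         # hypothetical normalized pixels
hr_img = [2.0 * x for x in lr_img]    # toy target, generated with theta = 2
theta = fit_theta(lr_img, hr_img)
sr_img = [theta * x for x in lr_img]  # I_SR = f_theta(I_LR)
final_loss = mse_loss(hr_img, sr_img)
```

The same loop structure (forward pass, loss, gradient step) carries over to real networks, where an automatic differentiation library would compute the gradient.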
In this paper, the task of multicriteria quality assessment of images generated
(formed) by different models and deep learning methods is set. Let
$A = \{a_i \mid i = 1, 2, \dots, n\}$ be a set of super-resolution images $I_{SR}$ generated by
different deep learning models based on a single low-resolution image $I_{LR}$;
$C = \{c_j \mid j = 1, 2, \dots, m\}$ be a set of quality criteria for the generated images and
technical characteristics of model training. In the following, $a_i$ will be considered
as alternatives, and $c_j$ as decision criteria.
The task is to find the aggregated or global weights

$$W^{aggr} = \{w_i^{aggr} \mid i = 1, 2, \dots, n\} \tag{2}$$

of the alternative generated (formed) images according to the set of criteria $C$
and to select the best generated image.
The quality criteria for the generated images are:
• traditional quantitative metrics PSNR [5], SSIM [6], MSSIM [6] (1st group of criteria);
• perceptual indicators BRISQUE [7], NIQE [8], PIQUE [9], LPIPS [10]
and their modifications (e.g., LR-PSNR) (2nd group).
The decision criteria also include technical characteristics (3rd group):
• training time and cost;
• availability of a hardware accelerator in the form of a graphics processing
unit (GPU).
The purpose of the studied generative models and deep learning methods is
to increase the resolution of images, scale them by 4, 8, or more times, and gener-
ate realistic and beautiful images based on a given low-resolution image for fur-
ther display of the generated images on large screens and human perception.
Therefore, another group of criteria (4th group) ensures that the generated image is
evaluated directly by a human: effects of smoothing, blurring, edge lightening,
and photorealism of the image.
The coefficients of relative importance of decision criteria are determined by
decision support methods [11–13] using expert pairwise comparison judgements
depending on the application. The interdependence between individual decision
criteria and the need to take into account fuzzy judgements provided by an expert
require the use of hybrid methods [14; 15].
MATERIALS AND METHODS
Deep learning models for generating super-resolution images
The following models were used in the study, representing generative and deep
learning methods.
1. SRGAN (Super-Resolution Generative Adversarial Network) is a
generative adversarial network for increasing the resolution, where the generator
creates super-resolution images, and the discriminator is trained to recognize real
and generated images. The generator is optimized using a combination of loss
functions: adversarial loss for plausibility and content loss for pixel accuracy. Full
implementations also use a perceptual loss function to improve textures [16].
2. VDSR (Very Deep Super Resolution) is a very deep convolutional neural
network for resolution enhancement tasks [17]. Its main advantage is the use of
residual connections, which allow the model to learn the difference between
the input low-resolution image and the corresponding super-resolution image.
This reduces the risk of vanishing gradients during training, accelerates
convergence, and increases training stability. Owing to its large number of
convolutional layers, VDSR effectively captures both fine textures and complex
object structures in the image, which ensures high-quality results.
3. DRCN (Deeply-Recursive Convolutional Network) uses the concept of
recursive blocks, where the same set of parameters is applied repeatedly. This
allows for significant depth without increasing the number of model parameters,
which reduces its computational complexity and memory requirements. As a re-
sult, DRCN effectively recovers the details of a high-resolution image while
maintaining resource efficiency. The network also uses averaging of the
output results and supervised skip connections, which increase the stability and
accuracy of detail recovery [18].
4. SRCNN (Super-Resolution Convolutional Neural Network) is a con-
volutional neural network for resolution enhancement that performs the following
three sequential operations: interpolation of the input image to high resolution,
feature extraction using convolutional layers, and reconstruction of the super-
resolution image [19]. The model is simple and efficient, but limited in depth and
ability to reconstruct complex textures. In this study, it is used as a discriminator
in our implementation of SRGAN, as well as a separate pre-trained model in the
framework of retraining experiments.
Two types of blocks were also used in the networks:
1) a residual block to maintain the stability of the gradients;
2) a recursive block that repeats convolutional layers with the same weights
multiple times to enhance the selected features and create a more complex archi-
tecture.
The architecture of the implemented models [20] is shown in Table 1, and
the architecture of their component blocks is further explained in Table 2.
Table 1. Architecture of the models implemented in-house

| Model | Architecture |
|---|---|
| SRGAN, generator (SRResNet) | Consists of an initial 9×9 convolutional layer, 5 residual blocks (ResidualBlock), an intermediate 3×3 convolutional block, a resolution upscaling block (two 3×3 convolutional layers with PixelShuffle), and a final 9×9 convolutional layer |
| SRGAN, discriminator (SRCNN) | Consists of eight 3×3 convolutional layers with an increasing number of channels, with normalization (BatchNorm2d) and LeakyReLU activation, one adaptive averaging layer, and two final fully connected layers. The filter size for all convolutional layers is 3×3 |
| VDSR | Consists of an initial convolutional layer, 18 convolutional layers with ReLU activation, and an output layer that adds the residual to the input image. The filter size for all convolutional layers is 3×3 |
| DRCN | Consists of an input convolutional layer, a recursive block (RecursiveBlock) that is repeated a specified number of times (16), and an output convolutional layer. The filter size for all convolutional layers is 3×3 |
Table 2. Architecture of the model components

| Block | Architecture |
|---|---|
| ResidualBlock | Contains two 3×3 convolutional layers, a normalization layer (BatchNorm2d) after each convolutional layer, and a PReLU activation function after the 1st layer |
| RecursiveBlock | Contains one 3×3 convolutional layer with ReLU activation |
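To make the parameter saving of a recursive block concrete, the following sketch counts the learnable parameters of one shared 3×3 convolution applied 16 times versus 16 separate layers. The feature width of 64 channels is an assumption, since the text does not state it; only the 3×3 kernel size and the 16 repetitions come from Tables 1 and 2.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Number of learnable parameters in a k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


CHANNELS = 64    # assumed feature width, not given in the text
RECURSIONS = 16  # number of repetitions of RecursiveBlock (Table 1)

# One RecursiveBlock whose weights are reused on every recursion.
shared = conv_params(CHANNELS, CHANNELS)
# The same depth built from 16 independent convolutional layers.
unshared = RECURSIONS * conv_params(CHANNELS, CHANNELS)
```

Under these assumptions the recursive design reaches the same depth with one sixteenth of the convolutional parameters, while the number of multiply-add operations per forward pass stays the same.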
Algorithm for training and evaluating models from scratch
The following algorithm is suggested for training the SRGAN, VDSR, and DRCN
models for generating super-resolution images and for evaluating these models
in terms of quantitative and perceptual indicators:
1. Splitting the set into training and validation samples. In the case of using
the DIV2K set [1], this stage is skipped, since the images are already distributed
in the set.
2. Initialization of model weights using the methods of Kaiming He [21] or
Xavier Glorot [22], depending on the characteristics of the model to be trained.
3. Training for a given number of epochs (200 for the generative model with
a batch size of 16, and 100 for the deep learning methods with a batch size of
32) on the training set while tracking the values of the loss function (adversarial
loss (MSE+BCE) for the generative model, MSE for the deep learning methods).
4. Saving model weights in case of training interruption or early stopping.
5. Calculating the training time of models.
6. Evaluation of the results on the test sample: calculation of the quantitative
indicators PSNR, SSIM, MSSIM and the perceptual indicator LPIPS of the generated
images. The pre-trained VGG19 network is used to calculate the LPIPS metric.
For each model, the indicator values are averaged over 10 random images.
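Step 2 of the algorithm can be illustrated with the standard scaling rules behind the two initializers: He (Kaiming) normal initialization draws weights with standard deviation $\sqrt{2 / \text{fan\_in}}$, and Glorot (Xavier) uniform initialization draws from $[-b, b]$ with $b = \sqrt{6 / (\text{fan\_in} + \text{fan\_out})}$. The sketch below is plain Python rather than the authors' PyTorch code, and the 64-channel 3×3 layer is an assumed example.

```python
import math
import random


def conv_fans(c_in, c_out, k):
    """fan_in and fan_out of a k x k convolution."""
    return c_in * k * k, c_out * k * k


def kaiming_normal_std(fan_in):
    """Standard deviation of He initialization for ReLU layers."""
    return math.sqrt(2.0 / fan_in)


def glorot_uniform_bound(fan_in, fan_out):
    """Half-width of the Glorot uniform initialization interval."""
    return math.sqrt(6.0 / (fan_in + fan_out))


fan_in, fan_out = conv_fans(64, 64, 3)  # assumed layer shape
std = kaiming_normal_std(fan_in)
bound = glorot_uniform_bound(fan_in, fan_out)

rng = random.Random(0)
he_weights = [rng.gauss(0.0, std) for _ in range(fan_in)]
glorot_weights = [rng.uniform(-bound, bound) for _ in range(fan_in)]
```

He initialization is usually paired with ReLU-family activations, and Glorot initialization with symmetric activations such as tanh, which is one way to read "depending on the characteristics of the model".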
Algorithm for training models using pre-training technology
An algorithm for training of pre-trained models for the formation of super-
resolution images is suggested, which consists of the following steps:
1. Careful selection of a pre-trained model, which must be aimed at the same
task and preferably trained on a large universal data set.
2. Loading the weights for the selected model, with the values of which
training will continue.
3. Determining the number of epochs for which the model should be retrained.
4. Fine-tuning the model: freezing layers (usually the initial ones) and add-
ing new ones which extract high-level features (residual blocks, convolutions with
small kernels, normalization layers, Upsampling or PixelShuffle), using a low
learning rate to ensure its stability, combining the main loss with the perceptual
loss to focus on the visual quality of the generated images.
5. Applying early stopping in case of signs of model overfitting according to
metrics PSNR, SSIM, MSSIM and a perceptual metric LPIPS.
The experiment on retraining of pre-trained models was conducted for 20 epochs.
The purposes of the experiment are to improve the result of image generation,
to check whether it is possible to obtain a result better than that of
other researchers [23], and to check whether overfitting occurs.
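The early stopping rule in step 5 can be sketched as a small helper that watches one validation metric. The class name, the patience of 3 epochs, and the LPIPS history below are hypothetical; the paper does not state which stopping rule or tolerance was used.

```python
class EarlyStopping:
    """Stop when a monitored value has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0, lower_is_better=True):
        self.patience = patience
        self.min_delta = min_delta
        self.lower_is_better = lower_is_better
        self.best = None
        self.bad_epochs = 0

    def step(self, value):
        """Record one epoch; return True if training should stop."""
        if self.best is None:
            improved = True
        elif self.lower_is_better:
            improved = value < self.best - self.min_delta
        else:
            improved = value > self.best + self.min_delta
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation LPIPS per epoch (lower is better).
lpips_history = [0.30, 0.22, 0.18, 0.17, 0.19, 0.20, 0.21, 0.16]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, value in enumerate(lpips_history, start=1):
    if stopper.step(value):
        stopped_at = epoch
        break
```

For PSNR, SSIM, or MSSIM, which improve upward, the same helper would be created with `lower_is_better=False`.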
Quantitative and perceptual metrics and indicators
The quality of SISR models is traditionally evaluated based on metrics and indi-
cators that compare the SR image generated by the model with the original HR
image from a labeled test image set [24].
The classical PSNR (Peak Signal-to-Noise Ratio) metric has limitations for
evaluating structured data such as images, as it assumes pixel independence.
PSNR measures the difference between the pixels of a pair of images as the ratio
between the maximum possible signal power and the noise power. For example,
blurring an image can cause a large perceptual change and at the same time only
a small change in the $L_2$ measure. The SSIM index [6] assesses the structural
similarity of two images.
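For reference, PSNR is commonly computed as $10 \log_{10}(MAX^2 / MSE)$, where $MAX$ is the largest possible pixel value. The following minimal sketch works on flattened pixel lists; the 8-bit range and the sample pixels are assumptions for illustration.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


hr = [52, 55, 61, 59, 79, 61, 76, 61]  # hypothetical 8-bit pixels
sr = [p + 1 for p in hr]               # every pixel off by one level
value = psnr(hr, sr)
```

Because only the mean squared pixel difference enters the formula, two distortions with the same MSE receive the same PSNR regardless of how visible they are.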
The perceptual distance estimates the similarity of high-level features of two
images similar to human visual perception. Perceptual indicators such as
BRISQUE [7], NIQE [8], PIQUE [9], LPIPS [10], and others have been sug-
gested. Let us describe some of them in more detail.
SSIM (Structural Similarity Index Measure) evaluates the similarity of two images
$x$ and $y$ based on three image components: brightness, contrast, and structure [6]:

$$SSIM(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma},$$

where $\alpha, \beta, \gamma > 0$ are parameters that set the relative importance of the three
components.
The SSIM satisfies the following properties: symmetry, $SSIM(x, y) = SSIM(y, x)$;
boundedness, $SSIM(x, y) \le 1$; and unique maximum, $SSIM(x, y) = 1$ if and only if
$x = y$.
Later, the authors of [6] move on to the following simplified expression:

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \tag{3}$$

where $\mu_x$ is the average intensity of image $x$; $\sigma_x$ is the standard deviation of
image $x$, which serves as an unbiased estimate of its contrast; $\sigma_{xy}$ is the covariance
between the two images $x$ and $y$, which is the basis for comparing image structures
after subtracting brightness and normalizing variance. They also use the
following modified estimates of the local statistics $\mu_x$, $\sigma_x$, and $\sigma_{xy}$:

$$\mu_x = \sum_{i=1}^{N} v_i x_i, \quad \sigma_x = \left( \sum_{i=1}^{N} v_i (x_i - \mu_x)^2 \right)^{1/2}, \quad \sigma_{xy} = \sum_{i=1}^{N} v_i (x_i - \mu_x)(y_i - \mu_y)$$

with a circularly symmetric normalized Gaussian weight function
$v = \{v_i \mid i = 1, 2, \dots, N\}$ with a standard deviation of 1.5 samples, $\sum_{i=1}^{N} v_i = 1$, and a sliding
window approach that ensures the property of local isotropy of the quality maps.
The constants $C_1$ and $C_2$ are included in (3) to avoid instability when the
expressions $\mu_x^2 + \mu_y^2$ and $\sigma_x^2 + \sigma_y^2$ are practically zero. They are defined as
$C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$, where $L$ is the dynamic range of pixel values, e.g., $L = 255$ for 8-bit
grayscale images, and $K_1 \ll 1$ and $K_2 \ll 1$ are small constants, for example,
$K_1 = 0.01$, $K_2 = 0.03$ [6].
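Expression (3) can be evaluated for a single local window as in the sketch below. For brevity it defaults to uniform weights $v_i = 1/N$ instead of the Gaussian window described in the text (an assumption of this example), and the pixel values are hypothetical.

```python
def ssim_window(x, y, dynamic_range=255.0, k1=0.01, k2=0.03, weights=None):
    """SSIM of two flattened windows following expression (3)."""
    n = len(x)
    # Uniform weights by default; pass normalized Gaussian weights if available.
    v = weights if weights is not None else [1.0 / n] * n
    mu_x = sum(vi * xi for vi, xi in zip(v, x))
    mu_y = sum(vi * yi for vi, yi in zip(v, y))
    var_x = sum(vi * (xi - mu_x) ** 2 for vi, xi in zip(v, x))
    var_y = sum(vi * (yi - mu_y) ** 2 for vi, yi in zip(v, y))
    cov_xy = sum(vi * (xi - mu_x) * (yi - mu_y) for vi, xi, yi in zip(v, x, y))
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


x = [52, 55, 61, 59, 79, 61, 76, 61]  # hypothetical reference window
y = [50, 57, 60, 62, 75, 63, 78, 58]  # hypothetical distorted window
s_xy = ssim_window(x, y)
s_yx = ssim_window(y, x)
s_xx = ssim_window(x, x)
```

The three results illustrate the properties listed earlier: symmetry, boundedness by 1, and the maximum being reached for identical windows.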
In practice, in cases where a single overall measure of quality of the entire
image is required, the average value of the SSIM indices (3) over a set of image
pixels, called MSSIM, is suggested; it aggregates the structural similarity between
the reference and distorted images. MSSIM is calculated as the arithmetic mean
of $SSIM(x_j, y_j)$ over the image content in the $j$-th local window [6].
In this paper, a weighted average of different samples in the SSIM index
map is proposed:

$$WM\_SSIM(X, Y) = \sum_{j=1}^{M} w_j \, SSIM(x_j, y_j),$$

where $M$ is the number of local windows in the image, $x_j$ and $y_j$ are the
contents of the reference image $X$ and the distorted image $Y$ at the $j$-th local window, and
$w_j$ are weighting coefficients for different samples (e.g., different image textures
attract a person's attention to varying degrees). The weights $w_j$ are calculated
depending on the practical problem by analyzing decision hierarchies or networks,
taking human assessments into account [11; 12; 14].
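A minimal sketch of the weighted aggregation follows. It normalizes the weights to sum to 1, which is an assumption of this example (the formula above does not state it), and it recovers the plain MSSIM when no weights are given. The per-window SSIM values and weights are hypothetical.

```python
def wm_ssim(ssim_values, weights=None):
    """Weighted mean of per-window SSIM values; uniform weights give MSSIM."""
    m = len(ssim_values)
    if weights is None:
        weights = [1.0] * m
    total = sum(weights)
    # Normalizing keeps the result in the same range as a single SSIM value.
    return sum(w / total * s for w, s in zip(weights, ssim_values))


window_ssim = [0.9, 0.8, 0.6, 0.7]  # hypothetical SSIM index map, M = 4
attention = [4.0, 3.0, 2.0, 1.0]    # hypothetical importance of each window

mssim = wm_ssim(window_ssim)
weighted = wm_ssim(window_ssim, attention)
```

Here the weighted score is higher than the plain mean because the windows that were assumed to attract more attention also happen to be the better reconstructed ones.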
LPIPS (Learned Perceptual Image Patch Similarity) is a perceptual metric
aimed at evaluating how a person visually perceives an image at the level
of details; it uses deep neural networks to assess the visual similarity of a pair of
images based on extracted features [10]:

$$LPIPS(I_{SR}, I_{HR}) = \sum_{l} w_l \, \frac{\| \phi_l(I_{SR}) - \phi_l(I_{HR}) \|_2^2}{H_l W_l C_l},$$

where $\phi_l(I_{SR})$ is the activation of VGG or another deep network at the $l$-th layer
for the image $I_{SR}$; $H_l$, $W_l$, $C_l$ are the height, width, and number of channels of
the $l$-th feature map; $w_l$ is a weighting factor that adjusts the contribution of
different layers.
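The structure of this formula can be shown without a neural network by treating the layer activations as given lists of numbers. In the sketch below the feature values and layer weights are hypothetical stand-ins; a real LPIPS computation takes them from a pre-trained network and learned weights, so this only mirrors the arithmetic of the expression above.

```python
def lpips_like(feats_sr, feats_hr, layer_weights):
    """Weighted sum over layers of the size-normalized squared L2 distance.

    Each entry of feats_sr / feats_hr is one layer's activations flattened
    to a list of length H_l * W_l * C_l.
    """
    total = 0.0
    for f_sr, f_hr, w in zip(feats_sr, feats_hr, layer_weights):
        squared_distance = sum((a - b) ** 2 for a, b in zip(f_sr, f_hr))
        total += w * squared_distance / len(f_sr)
    return total


# Hypothetical activations of two layers for the SR and HR images.
feats_sr = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0]]
feats_hr = [[0.0, 1.0, 2.0, 1.0], [1.0, 0.0]]
layer_weights = [0.5, 0.5]

distance = lpips_like(feats_sr, feats_hr, layer_weights)
identical = lpips_like(feats_hr, feats_hr, layer_weights)
```

A lower value means the two images are closer in feature space, which is why LPIPS is marked with a downward arrow in the tables.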
An explanation of the values for each indicator is provided in Table 3.
Through visual analysis of a large number of generated images and the corre-
sponding values of quality indicators, thresholds for practically acceptable and
high-quality results were obtained, which are given in the last two columns of
Table 3.
Table 3. Indicator analysis criteria for the SISR task [20]

| Indicator | Value range | Practically acceptable result | High-quality result |
|---|---|---|---|
| PSNR↑ | [0; ∞), dB | >20 | >30 |
| MSSIM↑ | [0; 1] | >0.7 | >0.9 |
| LPIPS↓ | [0; 1] | <0.3 | <0.1 |
For an objective evaluation of the models, it is necessary to add the training
time of the models to the indicator analysis. It should also be checked that
the indicator values are not worse than those of bicubic upscaling (scaling LR
to HR): values worse than this baseline indicate extremely poor model quality
even if they fall within the practically acceptable range.
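The thresholds of Table 3 and the bicubic baseline check can be combined into a small grading helper, sketched below. The function names and the example scores are hypothetical; only the threshold values come from Table 3.

```python
THRESHOLDS = {
    "PSNR": {"higher_is_better": True, "acceptable": 20.0, "high": 30.0},
    "MSSIM": {"higher_is_better": True, "acceptable": 0.7, "high": 0.9},
    "LPIPS": {"higher_is_better": False, "acceptable": 0.3, "high": 0.1},
}


def is_better(metric, value, reference):
    """Strict comparison in the metric's direction of improvement."""
    if THRESHOLDS[metric]["higher_is_better"]:
        return value > reference
    return value < reference


def grade(metric, value):
    """Map an indicator value to the categories of Table 3."""
    t = THRESHOLDS[metric]
    if is_better(metric, value, t["high"]):
        return "high-quality"
    if is_better(metric, value, t["acceptable"]):
        return "practically acceptable"
    return "below threshold"


# Hypothetical scores of one model and of bicubic upscaling.
model = {"PSNR": 26.7, "MSSIM": 0.77, "LPIPS": 0.31}
bicubic = {"PSNR": 25.8, "MSSIM": 0.74, "LPIPS": 0.46}

report = {m: (grade(m, v), is_better(m, v, bicubic[m])) for m, v in model.items()}
```

Each entry of `report` pairs the Table 3 category with a flag telling whether the model beats the bicubic baseline on that indicator.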
Algorithm for assessing the quality of models and deep learning methods in
terms of multiple quantitative and qualitative criteria
The generative models and deep learning methods studied here are aimed at
increasing the resolution of images, scaling them by a factor of 4 or more, and,
as a result, generating realistic and beautiful images for further human perception.
Therefore, it is necessary to add another group of qualitative decision criteria,
including effects of smoothing, blurring, edge lightening, and photorealism of the
image. In terms of these criteria, we evaluate the set of images (decision
alternatives) generated by different generative models and deep learning methods.
The evaluation is made directly by a human using one of the pairwise comparison
methods [11–15]. The decision support (DS) problem of multiple criteria evaluation
of decision alternatives can be solved using a systematic approach and methodology
based on hierarchical and network models [25]. On their basis, an algorithm to
solve the problem is suggested, which has the following five stages:
1. Determine interdependencies among decision criteria and decision alter-
natives. A hierarchy or DS network is formed, which includes the overall goal —
selection of the best generated image, qualitative decision criteria: effects of
smoothing, blurring, edge lightening, and photorealism of the image, and decision
alternatives: image_SRGAN, image_VDSR and image_DRCN (Fig. 1).
2. The importance of the decision criteria in relation to the main goal is
assessed by experts using the pairwise comparison method on a special scale. Based
on the results of the assessment, pairwise comparison matrices (PCMs) are
constructed, and the quality of expert opinions is analyzed and, if necessary,
improved using the method of evaluation and consistency improvement, which
identifies the most inconsistent expert opinion. As a result, for all elements of
the hierarchy or the DS network, we obtain a set of PCMs of acceptable quality.
3. The coefficients of relative importance (local weights) of the elements of
the hierarchy or the DS network are calculated based on the PCMs.
4. The local weights are aggregated using different methods depending on
whether the decision criteria are independent (hierarchy case), interdependent (hi-
erarchy case with a loop at the criterion level), or whether there are feedbacks
from alternatives to decision criteria (DS network case).
5. The sensitivity analysis of aggregated results (2) is performed.
The purposes of the algorithm are to calculate local weights for the decision
alternatives (image_SRGAN, image_VDSR, and image_DRCN) in terms of each
decision criterion, to calculate aggregated weights, and to perform their
sensitivity analysis.
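Stages 3 and 4 can be sketched for the simplest case of a hierarchy with independent criteria. The example below derives local weights with the row geometric mean method and aggregates them by a weighted sum; both choices are assumptions, since the text leaves the concrete methods to [11–15]. All pairwise judgements are invented for illustration and are not results of the study; consistency checking (stage 2) and sensitivity analysis (stage 5) are omitted.

```python
import math


def local_weights(pcm):
    """Normalized row geometric means of a pairwise comparison matrix."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in pcm]
    total = sum(gm)
    return [g / total for g in gm]


def aggregate(criteria_w, alt_w_by_criterion):
    """Weighted sum of local alternative weights over independent criteria."""
    n_alt = len(alt_w_by_criterion[0])
    return [
        sum(cw * aw[i] for cw, aw in zip(criteria_w, alt_w_by_criterion))
        for i in range(n_alt)
    ]


alternatives = ["image_SRGAN", "image_VDSR", "image_DRCN"]

# Hypothetical judgements. Criteria order: smoothing, blurring,
# edge lightening, photorealism.
criteria_pcm = [
    [1, 1, 2, 1 / 3],
    [1, 1, 2, 1 / 3],
    [1 / 2, 1 / 2, 1, 1 / 5],
    [3, 3, 5, 1],
]
P = [[1, 1 / 3, 1 / 2], [3, 1, 2], [2, 1 / 2, 1]]
Q = [[1, 1 / 2, 1], [2, 1, 2], [1, 1 / 2, 1]]
alt_pcms = [P, P, Q, P]  # one PCM of alternatives per criterion

criteria_w = local_weights(criteria_pcm)
alt_w = [local_weights(m) for m in alt_pcms]
global_w = aggregate(criteria_w, alt_w)
best = alternatives[global_w.index(max(global_w))]
```

With interdependent criteria or feedback from alternatives to criteria, the weighted sum would have to be replaced by a network-based aggregation, as stage 4 notes.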
RESULTS OF THE EXPERIMENTS
Dataset
The DIV2K dataset [1] was introduced as part of the NTIRE 2017 Challenge on
Single Image Super-Resolution, held during the CVPR Workshops 2017 conference.
It was created to enhance the effectiveness of solving the SISR problem by
addressing the limitations of existing datasets, namely insufficient scene diversity
and the limited number of images.
Fig. 1. A hierarchy for assessing the quality of images generated by different models
DIV2K consists of a labeled set of 1000 pairs of low-resolution (LR) and
high-resolution (HR) color images. The dataset is divided into three subsets: 800
samples for training, 100 samples for testing, and 100 samples for validation.
Historically, the test set was designed for contestants to evaluate their models
after training, while the validation set was reserved for organizers to determine
the winners. The validation set initially included only LR images, and participants
were required to generate their super-resolution (SR) counterparts. Once the HR
versions of the validation set were made publicly available, both the test and
validation sets could be utilized to assess model performance (Fig. 2).
The low-resolution (LR) images in the DIV2K dataset are derived from the
original high-resolution (HR) images using either bicubic downscaling or more
advanced methods that simulate real-world degradations. These methods include
modeling blurring caused by motion, introducing fractional noise, and applying
distortions due to uneven pixel mapping, among others.
The dataset includes images reduced by scaling factors of 2 (×2), 3 (×3), and
4 (×4). Greater downscaling significantly diminishes image quality (Fig. 3) while
also reducing the time required for model training. The classical approach to
Single Image Super-Resolution (SISR) typically employs LR images generated
through a 4-fold reduction of the original HR images using bicubic interpolation.
After its introduction in 2017, the DIV2K dataset has been extensively used
to evaluate various super-resolution (SR) models, including in studies conducted
in 2019 [26], 2020 [23], and 2023 [27].
Fig. 2. Example of images for model evaluation from the DIV2K set [1] (panels: DIV2K 100 validation images, DIV2K 100 test images)
Fig. 3. Demonstration of image quality deterioration with a 2-fold and 4-fold reduction in resolution
Training process and results
In the first experiment (Section 3.2), we trained our own implementations of the
SRGAN, VDSR, and DRCN models from scratch using the DIV2K dataset. The
optimization processes of their respective loss functions during training are illus-
trated in Figs. 4 and 5, while the metric values obtained are presented in Table 4.
The second experiment (Section 3.3) involved fine-tuning pre-trained SRGAN,
VDSR, and SRCNN models. The results of this retraining process
are provided in Table 5, and the evolution of perceptual quality, as measured by
the LPIPS metric, is shown in Fig. 6. For the pre-trained models, we used
implementations of SRGAN [16; 28], VDSR [17; 29], and SRCNN [19; 30].
Fig. 4. The process of optimising the loss functions of the generator and discriminator of the SRGAN model [20] (x-axis: Epochs; y-axis: Loss)
Fig. 5. The process of optimising the loss functions of the VDSR and DRCN networks (x-axis: Epoch; y-axis: MSE Loss)
Table 4. Values of quality indicators of the generated super-resolution images
for our own model implementations at 4-fold image magnification [20]

| Model | PSNR↑ | MSSIM↑ | LPIPS↓ | Training time (h) |
|---|---|---|---|---|
| Bicubic | 25.80 | 0.74 | 0.46 | – |
| SRGAN | 24.50 | 0.71 | 0.33 | 32 |
| VDSR | 26.73 | 0.77 | 0.31 | 16 |
| DRCN | 26.41 | 0.76 | 0.37 | 25 |
Table 5. Values of quality indicators of images enlarged by 4 times as a result
of retraining of pre-trained models

| Model | PSNR↑ | MSSIM↑ | LPIPS↓ | Training time (min) |
|---|---|---|---|---|
| Bicubic | 25.80 | 0.74 | 0.46 | – |
| EDSR [31] | 28.98 | 0.83 | 0.270 | – |
| RRDB [32] | 29.44 | 0.84 | 0.253 | – |
| ESRGAN [32] | 26.22 | 0.75 | 0.124 | – |
| pre-trained SRGAN | 26.9 | 0.79 | 0.16 | 27 |
| pre-trained VDSR | 28.9 | 0.84 | 0.1 | 11 |
| pre-trained SRCNN | 27.5 | 0.81 | 0.12 | 2 |
The software solutions for these experiments were developed in the Jupyter
Notebook environment using Python, along with the PyTorch library for model
development and the matplotlib library for visualization. The models were trained
on a PC equipped with an Nvidia GeForce RTX 4060 GPU accelerator.
ANALYSIS OF THE RESULTS AND DISCUSSION
The results of the first experiment (Section 3.2, Figs. 4, 5, Table 4) demonstrate
practically acceptable outcomes for all considered models, with VDSR performing
the best. This highlights, in particular, that residual learning proved to be
more effective than recursive learning. The SRGAN architecture, in this
experiment, was too simplistic for the given task, as generating new details often
outperforms feature refinement.
Fig. 6. Change in the perceptual quality (LPIPS) of images enlarged by a factor of 4 when
retraining pre-trained SRGAN, VDSR and SRCNN models (x-axis: Epoch; y-axis: LPIPS; curves 1 to 3)
A comparison of the results in Table 4 with those obtained by other re-
searchers [23] indicates that the metrics in Table 4 are worse than those reported
for other SISR models [23]. However, the visual comparison of the generated su-
per-resolution (SR) images with their low-resolution (LR) and high-resolution
(HR) counterparts (Fig. 7) shows satisfactory results, provided that the models
were trained using the algorithm proposed in Section 3.2.
The results of the second experiment (Section 3.3, Table 5), which employed
pre-training techniques, are comparable to those achieved by other researchers
[23]. Specifically, the VDSR model, implemented and fine-tuned using the algo-
rithm proposed in this study, achieved an MSSIM value of 0.84, which is on par
with the RRDB model [32] and surpasses the MSSIM values of other models de-
veloped and fine-tuned in this study: SRGAN (MSSIM = 0.79), SRCNN (MSSIM
= 0.81), as well as EDSR [31] and ESRGAN [32].
In terms of the perceptual quality metric LPIPS, the VDSR model trained
with the proposed algorithm outperformed other SRGAN and SRCNN models
implemented in this study, as well as the EDSR [31], RRDB [32], and ESRGAN
[32] models.
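The per-indicator comparison of the fine-tuned models can be reproduced directly from the last three rows of Table 5. The sketch below only restates those published values; the helper name and the dictionary layout are illustrative choices.

```python
# Values copied from Table 5 (fine-tuned models only).
fine_tuned = {
    "SRGAN": {"PSNR": 26.9, "MSSIM": 0.79, "LPIPS": 0.16, "minutes": 27},
    "VDSR": {"PSNR": 28.9, "MSSIM": 0.84, "LPIPS": 0.10, "minutes": 11},
    "SRCNN": {"PSNR": 27.5, "MSSIM": 0.81, "LPIPS": 0.12, "minutes": 2},
}
HIGHER_IS_BETTER = {"PSNR": True, "MSSIM": True, "LPIPS": False, "minutes": False}


def best_by(metric):
    """Name of the model with the best value of one indicator."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(fine_tuned, key=lambda name: fine_tuned[name][metric])


winners = {metric: best_by(metric) for metric in HIGHER_IS_BETTER}
```

VDSR leads on the three quality indicators, while SRCNN has the shortest fine-tuning time, which is why an aggregated multicriteria score depends on how much weight the training time receives.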
Fig. 7. Visual comparison of the generated SR images with the high-resolution (HR)
original and the low-resolution (LR) input image for the custom implementation of the
VDSR model
The second experiment (Section 3.3) revealed no signs of overfitting, and the
generated SR images demonstrated high quality compared to the input LR-HR
pairs (Fig. 8). The VDSR model consistently produced the best visual results,
underscoring the advantage of feature enhancement when addressing SISR tasks for
highly detailed data and complex real-world scenes.
CONCLUSIONS
This study presents an algorithm for the comprehensive evaluation of image
super-resolution results based on quantitative metrics, perceptual indicators,
technical characteristics, and aspects of human image perception. Threshold criteria
for practically acceptable and high-quality results were determined through visual
analysis of many generated images and their corresponding quality metrics,
including those obtained by other researchers.
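As an illustration of how such threshold criteria could be applied, the sketch below grades a single metric value as high-quality, practically acceptable, or below threshold. The function name `grade_metric` and all threshold numbers are hypothetical placeholders chosen only for the example; they are not the thresholds determined in this study.

```python
def grade_metric(value, acceptable, high, higher_is_better=True):
    """Grade one metric value against two thresholds.

    `acceptable` and `high` are the thresholds for practically acceptable
    and high-quality results. For metrics where lower values are better
    (e.g. LPIPS), pass higher_is_better=False.
    """
    if not higher_is_better:
        # Flip signs so the comparisons below work in one direction only.
        value, acceptable, high = -value, -acceptable, -high
    if value >= high:
        return "high-quality"
    if value >= acceptable:
        return "practically acceptable"
    return "below threshold"


if __name__ == "__main__":
    # PLACEHOLDER thresholds (assumptions for illustration only).
    print(grade_metric(0.84, acceptable=0.70, high=0.80))
    print(grade_metric(0.30, acceptable=0.40, high=0.25, higher_is_better=False))
```

Keeping the direction of each metric as an explicit argument avoids mixing up metrics that should be maximised (PSNR, MSSIM) with those that should be minimised (LPIPS).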
The VDSR model was identified as the optimal one (among those considered)
in terms of pixel, structural, and perceptual metrics, as well as training time.
The absence of overfitting and the quality of super-resolution images generated
by VDSR were visually confirmed on selected test set samples depicting various
shapes, textures, and color combinations. Overall, deep learning methods
demonstrated superiority over generative models in the conducted experiments based on
the results of the comprehensive evaluation.
Fig. 8. Visual comparison of the generated SR images with the high-resolution (HR)
original and low-resolution (LR) input image for the VDSR model trained with the
suggested algorithm
REFERENCES
1. E. Agustsson, R. Timofte, “NTIRE 2017 Challenge on Single Image Super-Resolution:
Dataset and Study,” 2017 IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017.
doi: https://doi.org/10.1109/cvprw.2017.150
2. Z. Wang, J. Chen, S.C.H. Hoi, “Deep Learning for Image Super-resolution: A Survey,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43,
no. 10, pp. 3365–3387, 2020. doi: https://doi.org/10.1109/tpami.2020.2982166
3. R. Timofte et al., “NTIRE 2017 Challenge on Single Image Super-Resolution:
Methods and Results,” 2017 IEEE Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017.
doi: https://doi.org/10.1109/cvprw.2017.149
4. T. Ausare, “Ultimate Guide to Selecting a GPU for Deep Learning. Latest AI, ML &
GPU Updates,” NeevCloud. Available:
https://blog.neevcloud.com/ultimate-guide-to-selecting-a-gpu-for-deep-learning
5. F.A. Fardo, V.H. Conforto, F.C. de Oliveira, P.S. Rodrigues, “A Formal Evaluation of
PSNR as Quality Measurement Parameter for Image Segmentation Algorithms,”
2016. doi: https://doi.org/10.48550/arXiv.1605.07116
6. Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, “Image Quality
Assessment: From Error Visibility to Structural Similarity,” IEEE Transactions on
Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
doi: https://doi.org/10.1109/TIP.2003.819861
7. A. Mittal, A. Moorthy, A. Bovik, “Referenceless image spatial quality evaluation
engine,” in 45th Asilomar Conference on Signals, Systems and Computers, vol. 38,
pp. 53–54, 2011. doi: https://doi.org/10.1109/ACSSC.2011.6190099
8. A. Mittal, R. Soundararajan, A.C. Bovik, “Making a ‘completely blind’ image quality
analyser,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, 2013.
doi: https://doi.org/10.1109/LSP.2012.2227726
9. N. Venkatanath, D. Praneeth, Bh. Maruthi Chandrasekhar, S.S. Channappayya, S.S.
Medasani, “Blind image quality evaluation using perception based features,” 2015
Twenty First National Conference on Communications (NCC), Mumbai, India, 2015,
pp. 1–6. doi: https://doi.org/10.1109/NCC.2015.7084843
10. R. Zhang, P. Isola, A.A. Efros, E. Shechtman, O. Wang, “The Unreasonable
Effectiveness of Deep Features as a Perceptual Metric,” 2018 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018,
pp. 586–595. doi: https://doi.org/10.1109/CVPR.2018.00068
11. N.I. Nedashkovskaya, “Method for weights calculation based on interval
multiplicative pairwise comparison matrix in decision-making models,” Radio Electronics,
Computer Science, Control, no. 3, pp. 155–167, 2022.
doi: https://doi.org/10.15588/1607-3274-2022-3-15
12. N.I. Nedashkovskaya, “Estimation of the accuracy of methods for calculating
interval weight vectors based on interval multiplicative preference relations,” IEEE 3rd
International Conference on System Analysis & Intelligent Computing (SAIC), 2022.
doi: https://doi.org/10.1109/SAIC57818.2022.9922977
13. N.I. Nedashkovskaya, “Method for Evaluation of the Uncertainty of the Paired
Comparisons Expert Judgements when Calculating the Decision Alternatives
Weights,” Journal of Automation and Information Sciences, vol. 47, no. 10,
pp. 69–82, 2015. doi: https://doi.org/10.1615/JAutomatInfScien.v47.i10.70
14. N.D. Pankratova, N.I. Nedashkovskaya, “Hybrid Method of Multicriteria Evaluation
of Decision Alternatives,” Cybernetics and Systems Analysis, vol. 50, no. 5, pp. 701–711,
2014. doi: https://doi.org/10.1007/s10559-014-9660-2
15. N.I. Nedashkovskaya, “Investigation of methods for improving consistency of a
pairwise comparison matrix,” Journal of the Operational Research Society, vol. 69,
no. 12, pp. 1947–1956, 2018. doi: https://doi.org/10.1080/01605682.2017.1415640
16. C. Ledig et al., “Photo-Realistic Single Image Super-Resolution Using a Generative
Adversarial Network,” 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Honolulu, HI, 21–26 July 2017, pp. 105–114.
doi: https://doi.org/10.1109/cvpr.2017.19
17. J. Kim, J.K. Lee, K.M. Lee, “Accurate Image Super-Resolution Using Very Deep
Convolutional Networks,” 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016, pp. 1646–1654.
doi: https://doi.org/10.1109/cvpr.2016.182
18. J. Kim, J.K. Lee, K.M. Lee, “Deeply-Recursive Convolutional Network for Image
Super-Resolution,” 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016, pp. 1637–1645.
doi: https://doi.org/10.1109/cvpr.2016.181
19. C. Dong et al., “Image Super-Resolution Using Deep Convolutional Networks,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2,
pp. 295–307, 2016. doi: https://doi.org/10.1109/tpami.2015.2439281
20. A.A. Lanko, N.I. Nedashkovskaya, “Generative models and methods of deep learning
for the SISR problem,” System sciences and informatics: collection of reports of
the 3rd All-Ukrainian scientific and practical conference “System sciences and
informatics”, November 25–29, 2024, Kyiv. Kyiv: IASA KPI, 2024, pp. 176–181. Available:
http://mmsa.kpi.ua/sites/default/files/systemni_nauky_ta_informatyka_2024.pdf
21. K. He et al., “Delving Deep into Rectifiers: Surpassing Human-Level Performance
on ImageNet Classification,” 2015 IEEE International Conference on Computer
Vision (ICCV), Santiago, Chile, 7–13 December 2015, pp. 1026–1034.
doi: https://doi.org/10.1109/iccv.2015.123
22. X. Glorot, Y. Bengio, “Understanding the difficulty of training deep feedforward
neural networks,” Proceedings of the Thirteenth International Conference on
Artificial Intelligence and Statistics (AISTATS), Sardinia, Italy, 13–15 May 2010, PMLR,
vol. 9, pp. 249–256. Available: http://proceedings.mlr.press/v9/glorot10a.html
23. A. Lugmayr et al., “SRFlow: Learning the Super-Resolution Space with Normalizing
Flow,” Computer Vision – ECCV 2020, Cham, 2020, pp. 715–732.
doi: https://doi.org/10.1007/978-3-030-58558-7_42
24. Q. Jiang et al., “Single Image Super-Resolution Quality Assessment: A Real-World
Dataset, Subjective Studies, and an Objective Metric,” IEEE Transactions on Image
Processing, vol. 31, pp. 2279–2294, 2022. doi: https://doi.org/10.1109/tip.2022.3154588
25. N.I. Nedashkovskaya, “A system approach to decision support on basis of
hierarchical and network models,” System Research and Information Technologies, no. 1,
pp. 7–18, 2018. doi: https://doi.org/10.20535/srit.2308-8893.2018.1.01
26. A. Ignatov et al., “PIRM challenge on perceptual image enhancement on
smartphones: report,” European Conference on Computer Vision (ECCV) Workshops, 2019.
doi: https://doi.org/10.1007/978-3-030-11021-5_20
27. D. Gao, D. Zhou, “A very lightweight and efficient image super-resolution
network,” Expert Systems with Applications, vol. 213, Part A, 1 March
2023, 118898. doi: https://doi.org/10.1016/j.eswa.2022.118898
28. “GitHub - tensorlayer/SRGAN: Photo-Realistic Single Image Super-Resolution
Using a Generative Adversarial Network,” GitHub. Available:
https://github.com/tensorlayer/SRGAN
29. “GitHub - twtygqyy/pytorch-vdsr: VDSR (CVPR2016) pytorch implementation,”
GitHub. Available: https://github.com/twtygqyy/pytorch-vdsr
30. “GitHub - Lornatang/SRCNN-PyTorch: Pytorch framework can easily implement
srcnn algorithm with excellent performance,” GitHub. Available:
https://github.com/Lornatang/SRCNN-PyTorch
31. B. Lim, S. Son, H. Kim, S. Nah, K.M. Lee, “Enhanced deep residual networks for
single image super-resolution,” IEEE Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW), 2017, pp. 1132–1140.
doi: https://doi.org/10.1109/CVPRW.2017.151
32. X. Wang et al., “ESRGAN: Enhanced super-resolution generative adversarial networks,”
Computer Vision – ECCV 2018 Workshops: Munich, Germany, September 8–14, 2018,
Proceedings, Part V, pp. 63–79. doi: https://doi.org/10.1007/978-3-030-11021-5_5
Received 27.12.2024
INFORMATION ON THE ARTICLE
Anna A. Lanko, ORCID: 0009-0005-8370-5739, Educational and Research Institute for
Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky
Kyiv Polytechnic Institute”, Ukraine, e-mail: lanko.anna@lll.kpi.ua
Nadezhda I. Nedashkovskaya, ORCID: 0000-0002-8277-3095, Educational and
Research Institute for Applied System Analysis of the National Technical University
of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail:
nedashkovskaya.nadezhda@lll.kpi.ua
QUALITY ASSESSMENT OF MODELS AND DEEP LEARNING METHODS
FOR SUPER-RESOLUTION IMAGE FORMATION / N.I. Nedashkovskaya,
A.A. Lanko
Abstract. Evaluation metrics for the results of super-resolution image generation
in solving the SISR task are examined. The study comprises two experiments:
a custom implementation of network architectures for SRGAN, VDSR, and
SRCNN, and fine-tuning of pre-trained SRGAN, VDSR, and SRCNN models.
An algorithm for assessing the quality of models and deep learning methods for
generating super-resolution images is proposed. The VDSR model showed the best
results in terms of pixel, structural, and perceptual metrics, as well as training
time and visual confirmation of the generated image quality by a human,
highlighting that residual learning is more effective than recursive learning under
the conditions of the two conducted experiments. Threshold values for acceptable
and high-quality results were determined through visual analysis of many generated
images and the corresponding quality metrics, including those reported by other
researchers.
Keywords: SISR task, quality assessment, generative models, deep learning
methods, convolutional neural network, residual learning, recursive learning,
fine-tuning of pre-trained models, perceptual metric, LPIPS, multicriteria
decision analysis, DIV2K dataset, thresholds for acceptable and high-quality
generated images.
|
| id | journaliasakpiua-article-351424 |
| institution | System research and information technologies |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2026-02-08T08:06:12Z |
| publishDate | 2025 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/cb/d57ede9a694071bd3d052c7b7e33f3cb.pdf |
| spelling | journaliasakpiua-article-3514242026-02-02T20:49:24Z Quality assessment of models and deep learning methods for super-resolution image formation Оцінювання якості моделей та методів глибокого навчання для формування суперроздільних зображень Lanko, Anna Nedashkovskaya, Nadezhda single image super-resolution quality assessment generative models deep learning methods convolutional neural network residual learning recursive learning fine-tuning of pre-trained models perceptual metric LPIPS multicriteria decision analysis DIV2K dataset thresholds for practically acceptable and high-quality generated images задача SISR оцінювання якості генеративні моделі методи глибокого навчання згорткова нейронна мережа залишкове навчання рекурсивне навчання тонке налаштування попередньо навчених моделей перцептивна метрика LPIPS багатокритеріальний аналіз розв’язань набір даних DIV2K порогові значення для прийнятних і високоякісних згенерованих зображень This article examines evaluation metrics for the results of super-resolution image generation in solving the SISR task. The study comprises two experiments: the implementation of custom network architectures for SRGAN, VDSR, and SRCNN, and fine-tuning of pre-trained SRGAN, VDSR, and SRCNN models. An algorithm for assessing the quality of models and deep learning methods for generating super-resolution images is suggested. The VDSR model performed best in terms of pixel, structural, and perceptual metrics, as well as training time and visual confirmation by a human, highlighting that residual learning is more effective than recursive learning under the conditions of the two conducted experiments. Threshold values for practically acceptable and high-quality results were determined through visual analysis of many generated images and their corresponding quality metrics, including those reported by other researchers. 
Розглянуто метрику оцінювання результатів генерації суперроздільних зображень під час розв’язання задачі SISR. Дослідження включає два експерименти: власну реалізацію мережевих архітектур для SRGAN, VDSR і SRCNN, і точне налаштування попередньо навчених моделей SRGAN, VDSR і SRCNN. Запропоновано алгоритм оцінювання якості моделей і методів глибокого навчання для генерації суперроздільних зображень. Модель VDSR продемонструвала найкращі результати з точки зору піксельного, структурних і перцептивних показників, а також часу навчання та візуального підтвердження якості згенерованого зображення людиною, підкреслюючи, що залишкове навчання є більш ефективним, ніж рекурсивне навчання за умов двох проведених експериментів. Порогові значення для прийнятних і високоякісних результатів визначено шляхом візуального аналізу багатьох згенерованих зображень і відповідних показників якості, включно з тими, про які повідомляли інші дослідники. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-12-29 Article Article Peer-reviewed Article application/pdf https://journal.iasa.kpi.ua/article/view/351424 10.20535/SRIT.2308-8893.2025.4.06 System research and information technologies; No. 4 (2025); 104-119 Системные исследования и информационные технологии; № 4 (2025); 104-119 Системні дослідження та інформаційні технології; № 4 (2025); 104-119 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/351424/338446 |
| spellingShingle | задача SISR оцінювання якості генеративні моделі методи глибокого навчання згорткова нейронна мережа залишкове навчання рекурсивне навчання тонке налаштування попередньо навчених моделей перцептивна метрика LPIPS багатокритеріальний аналіз розв’язань набір даних DIV2K порогові значення для прийнятних і високоякісних згенерованих зображень Lanko, Anna Nedashkovskaya, Nadezhda Оцінювання якості моделей та методів глибокого навчання для формування суперроздільних зображень |
| title | Оцінювання якості моделей та методів глибокого навчання для формування суперроздільних зображень |
| title_alt | Quality assessment of models and deep learning methods for super-resolution image formation |
| title_full | Оцінювання якості моделей та методів глибокого навчання для формування суперроздільних зображень |
| title_fullStr | Оцінювання якості моделей та методів глибокого навчання для формування суперроздільних зображень |
| title_full_unstemmed | Оцінювання якості моделей та методів глибокого навчання для формування суперроздільних зображень |
| title_short | Оцінювання якості моделей та методів глибокого навчання для формування суперроздільних зображень |
| title_sort | оцінювання якості моделей та методів глибокого навчання для формування суперроздільних зображень |
| topic | задача SISR оцінювання якості генеративні моделі методи глибокого навчання згорткова нейронна мережа залишкове навчання рекурсивне навчання тонке налаштування попередньо навчених моделей перцептивна метрика LPIPS багатокритеріальний аналіз розв’язань набір даних DIV2K порогові значення для прийнятних і високоякісних згенерованих зображень |
| topic_facet | single image super-resolution quality assessment generative models deep learning methods convolutional neural network residual learning recursive learning fine-tuning of pre-trained models perceptual metric LPIPS multicriteria decision analysis DIV2K dataset thresholds for practically acceptable and high-quality generated images задача SISR оцінювання якості генеративні моделі методи глибокого навчання згорткова нейронна мережа залишкове навчання рекурсивне навчання тонке налаштування попередньо навчених моделей перцептивна метрика LPIPS багатокритеріальний аналіз розв’язань набір даних DIV2K порогові значення для прийнятних і високоякісних згенерованих зображень |
| url | https://journal.iasa.kpi.ua/article/view/351424 |
| work_keys_str_mv | AT lankoanna qualityassessmentofmodelsanddeeplearningmethodsforsuperresolutionimageformation AT nedashkovskayanadezhda qualityassessmentofmodelsanddeeplearningmethodsforsuperresolutionimageformation AT lankoanna ocínûvannââkostímodelejtametodívglibokogonavčannâdlâformuvannâsuperrozdílʹnihzobraženʹ AT nedashkovskayanadezhda ocínûvannââkostímodelejtametodívglibokogonavčannâdlâformuvannâsuperrozdílʹnihzobraženʹ |