Multi-criteria mathematical model of credit scoring in data science problems
A multi-criteria optimization mathematical model of credit scoring is proposed. The model is derived using a nonlinear trade-off scheme to solve multi-criteria optimization problems, allowing for the construction of a Pareto-optimal solution. The proposed approach forms an integrated assessment of a...
| Date: | 2025 |
|---|---|
| Main authors: | Pysarchuk, Oleksii; Vasylieva, Maria; Baran, Danylo; Pysarchuk, Illya |
| Format: | Article |
| Language: | English |
| Published: | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 2025 |
| Online access: | https://journal.iasa.kpi.ua/article/view/308786 |
| Journal title: | System research and information technologies |
| Institution: | System research and information technologies |

| _version_ | 1866302971909767168 |
|---|---|
| author | Pysarchuk, Oleksii Vasylieva, Maria Baran, Danylo Pysarchuk, Illya |
| author_facet | Pysarchuk, Oleksii Vasylieva, Maria Baran, Danylo Pysarchuk, Illya |
| author_sort | Pysarchuk, Oleksii |
| baseUrl_str | http://journal.iasa.kpi.ua/oai |
| collection | OJS |
| datestamp_date | 2025-11-09T00:01:30Z |
| description | A multi-criteria optimization mathematical model of credit scoring is proposed. The model is derived using a nonlinear trade-off scheme to solve multi-criteria optimization problems, allowing for the construction of a Pareto-optimal solution. The proposed approach forms an integrated assessment of a borrower’s creditworthiness based on a structured set of indicators that reflect the financial, credit, and social profile of clients. The model is designed for use in intelligent CRM and ERP systems operating on Big Data and does not rely on labeled training samples, making it applicable to unsupervised learning tasks. It can also serve as a foundational layer for further deep-learning analysis. Methodological steps for implementing the model, from indicator normalization to final decision-making, are described. A technological implementation demonstrates the model’s effectiveness in automated loan decisions and fraud detection. |
| doi_str_mv | 10.20535/SRIT.2308-8893.2025.3.08 |
| first_indexed | 2025-11-09T02:11:02Z |
| format | Article |
| fulltext |
O.O. Pysarchuk, M.D. Vasylieva, D.R. Baran, І.О. Pysarchuk, 2025
System Research & Information Technologies, 2025, № 3
MATHEMATICAL METHODS, MODELS, PROBLEMS AND TECHNOLOGIES FOR COMPLEX SYSTEMS RESEARCH
UDC 004.5
DOI: 10.20535/SRIT.2308-8893.2025.3.08
MULTI-CRITERIA MATHEMATICAL MODEL OF CREDIT
SCORING IN DATA SCIENCE PROBLEMS
O.O. PYSARCHUK, M.D. VASYLIEVA, D.R. BARAN, І.О. PYSARCHUK
Abstract. A multi-criteria optimization mathematical model of credit scoring is proposed. The model is derived using a nonlinear trade-off scheme to solve multi-criteria optimization problems, allowing for the construction of a Pareto-optimal solution. The proposed approach forms an integrated assessment of a borrower’s creditworthiness based on a structured set of indicators that reflect the financial, credit, and social profile of clients. The model is designed for use in intelligent CRM and ERP systems operating on Big Data and does not rely on labeled training samples, making it applicable to unsupervised learning tasks. It can also serve as a foundational layer for further deep-learning analysis. Methodological steps for implementing the model, from indicator normalization to final decision-making, are described. A technological implementation demonstrates the model’s effectiveness in automated loan decisions and fraud detection.
Keywords: Data Science, Big Data, SCORING, machine learning, decision making, multi-criteria mathematical models, intelligent CRM, ERP systems.
INTRODUCTION
The development of the modern IT industry determines the methodologies and
technologies of electronic banking. This also affects the automation of intelligent
decision-making processes. One of these directions is making decisions about
granting loans to consumers of lending services (clients) provided by banking
institutions. This process relates to the field of credit scoring (SCORING). It is
based on the analysis of a set of indicators of the client’s creditworthiness and
establishing an individual integrated assessment (SCORE) in order to make an
informed decision on granting a loan. The practice of scoring analysis is not
limited to a binary yes/no assessment of lending. Scoring analysis should ensure
the formation of an adequate risk assessment that determines a specific credit
program adapted in the loan life cycle to the properties of a specific client. The
process of scoring analysis should ensure high economic performance indicators
of the banking institution. For lending programs, this means maximizing the
number of loans issued, but through programs that are adequate to the risks of
non-repayment of loan funds by the client. Currently, credit scoring is
implemented by automated software tools that have the properties of intelligence
and are organized in the format of distributed CRM or ERP systems.
Credit scoring in modern CRM and ERP systems is, in effect, an application of Data Science technologies to Big Data arrays, which imposes rather strict requirements on the computational complexity of credit scoring models. The quality of automated credit scoring decisions is determined by the mathematical models underlying the software tools. The drive of banking institutions to increase economic efficiency therefore makes it relevant to develop effective mathematical models of credit scoring for Data Science tasks on Big Data arrays.
ANALYSIS OF RESEARCH AND PUBLICATIONS
Credit scoring is usually treated as a classification problem. Classical machine learning methods are most often used [1–5]: discriminant analysis, logistic regression, decision trees, support vector machines, the naive Bayes classifier, neural networks, and others. In general, the assessment task belongs to the theory of system analysis [6].
Performance evaluation is carried out in the following sequence: determining
factors, indicators and criteria; forming the decision-making model; interpreting
the obtained result. Single-criteria and multi-criteria models of efficiency
evaluation are distinguished [7, 8], with the latter being comparatively more
adequate. The main drawback of traditional credit scoring approaches is the
consideration of the assessment task at the level of classification and forming an
integrated assessment on a discrete field of static numerical representations of
many factors of a particular borrower.
Digital twin technologies are currently developing rapidly. They transfer business processes into a digital virtual environment in order to automate and optimize them. A digital twin is valuable not so much for automation as for approximating the real physical process as closely as possible. This means that the twin must be highly adequate, yet abstract enough to allow productive processing of, for example, Big Data arrays.
The main idea of the approach proposed in the article is to approximate the
problem of credit scoring to its real physical essence at the formalization level.
This means that a real client should come as close as possible to the image of an
unattainable ideal based on a set of indicators. In this formulation, the task of
scoring analysis reflects the task of multi-criteria evaluation. This will be applied
to the synthesis of a mathematical model of scoring analysis, as a multi-criteria
optimization mathematical model of evaluation.
Multi-criteria formalization of the decision-making task a priori has a higher
level of adequacy since it mathematically allows describing a specific practical
task of natural language expressed by the scheme “how best...”. Moreover, the
criterion allows describing the entire set of possible indicator values, even though
represented by a set of discrete limited realizations.
Thus, the increased adequacy of the evaluation task at the formalization stage
ensures that the “twin” closely approximates the real physical process. Therefore,
we should potentially expect an increase in the efficiency of the final result of the
scoring analysis.
Formulation of the problem. The aim of this article is to synthesize a
multicriteria optimization mathematical model for credit scoring.
PRESENTATION OF THE MAIN MATERIAL
The achievement of the stated goal is implemented at three levels: model,
methodological, and technological.
І. Multicriteria Optimization Mathematical Model of Credit Scoring. The
synthesis of the mathematical model is implemented in stages: defining factors,
indicators, and criteria; forming the decision-making model; interpreting the
obtained result.
Defining factors, indicators, and criteria are the initial data for evaluation
and represent the scoring card. The scoring card structure is known but may vary
in the number and values of indicators according to the specific conditions of a
particular banking institution.
The classical structure of the scoring card includes the following groups of indicators [1–5]: information from the banking institution (credit product) and information about the borrower/client (credit history; financial; social), see Table 1.

Table 1. Scoring card: general structure

| Group | Indicators |
|---|---|
| Credit product (bank) | Amount; Term; The purpose of the loan; … |
| Financial (borrower) | Assets; Obligations; Monthly income; Monthly expenses; … |
| Credit history (borrower) | In the current bank; In other banks; Credit bureau data; … |
| Social indicators (borrower) | Work experience; Time of residence at the current address; Marital status; … |
The specificity of indicator values in the scoring card is formed as a dynamic
database of client interactions.
In the classic setting, the task of scoring analysis is formalized as the task of
classifying new customers based on information about existing clients.
Let the set of bank customers be given

$$\{Z_i\}, \quad i = 1 \ldots n. \tag{1}$$

Each client is characterized by a $p$-dimensional vector of heterogeneous features

$$X_i = [x_{i1}, \ldots, x_{ip}]^{T}. \tag{2}$$

It is known that each client $Y_i$ belongs to one of two creditworthiness classes ($k = 2$):

$$Y = \{y\}, \quad y = \begin{cases} 1, & \text{the client is creditworthy}, \\ 0, & \text{the client is not creditworthy}. \end{cases} \tag{3}$$

New clients are characterized by a sample $\{W_j\}$, $j = 1 \ldots m$.
A sample of clients with known creditworthiness class serves as the training set $\{Z_i^{N}\}$, $i = 1 \ldots n(N)$.
It is necessary to implement a scoring algorithm that classifies new clients $\{W_j\}$, $j = 1 \ldots m$, based on their feature vectors $X_i = [x_{i1}, \ldots, x_{ip}]^{T}$.
This scheme corresponds to supervised learning. In the practice of scoring analysis, however, a training sample is rarely available, and forming a training set becomes a separate, rather complex task.
Additionally, discrete indicators of the scoring card may not be informative
when considered individually. This necessitates calculating secondary indicators
and comparing them with other clients.
Therefore, the article considers a modified formulation of the problem of
scoring analysis.
Let a set of bank clients (1) be given. Each client is characterized by a $p$-dimensional vector of heterogeneous features (2), $X_i = [x_{i1}, \ldots, x_{ip}]^{T}$. It is necessary to classify each bank client $Y_i$ into two classes (3).
The multi-criteria scoring method is proposed to solve the classification
problem formalized in this way. This decision is based on the following
considerations. Scoring analysis is essentially aimed at building a digital twin of
the banking institution’s team, which forms the requirements for the ideal client.
This enables the analysis of alternatives for binary classification of clients based
on a multitude of factors, comparing the ideal image with the real client —
evaluating the degree of closeness between them. This process is accompanied by
considerations/doubts/analysis of many factors—often following the “best–worst”
scheme. It is multicriteria scoring that allows incorporating the decision maker’s
considerations in transforming static indicators into dynamic requirements of criteria.
In addition, the scoring model must meet a number of technical requirements
[1–5]:
1. High adequacy in dividing borrowers into two categories from the
perspective of credit issuance: “positive” and “negative”;
2. The scoring point is a measure of the probability of the borrower
belonging to the “positive” or “negative” class;
3. The scoring model should form the average rating of “negative”
borrowers significantly lower than the average rating of “positive” ones;
4. There should be a ranking of borrowers within the rating of “positive”
decisions;
5. There is a cut-off point when it is unprofitable for the bank to issue loans
to borrowers below a certain scoring point;
6. The scoring model should ensure detection of fraud.
Multi-criteria scoring implements the given list of requirements and has a
number of unique advantages, which will be proved with a computational
example [8].
Based on the structure of the scoring card (Table 1) and setting extremum
requirements for its indicators, we generally obtain a system of criteria for the
categories of the scoring card:
credit product (bank)
$$P = [p_i]^{T} \to \text{extreme}, \quad i = 1 \ldots k_p, \tag{4}$$
financial (borrower)
$$F = [f_i]^{T} \to \text{extreme}, \quad i = 1 \ldots k_f,$$
credit history (borrower)
$$K = [k_i]^{T} \to \text{extreme}, \quad i = 1 \ldots k_k,$$
social (borrower)
$$S = [s_i]^{T} \to \text{extreme}, \quad i = 1 \ldots k_s,$$
extended vector of criteria
$$W = [w_i]^{T} = [p_i, f_i, k_i, s_i]^{T} \to \text{extreme}, \quad i = 1 \ldots k_p + k_f + k_k + k_s.$$
The directions of the extremum (extreme=min, max) of each indicator of the
scoring card are unique for each banking institution. This effectively reflects the
bank’s understanding of the image of the ideal client and considers the logic of
mental deliberations “best-case scenario – worst-case scenario”.
Analysis of the content and practical significance of indicators in the scoring
card suggests a conflicting nature of criteria for the “ideal” borrower. Therefore,
we have a multicriteria optimization problem in scoring.
The decision-making model is formed by aggregating/integrating partial
criteria vectors (4) into a generalized/integrated assessment score using
convolution through a non-linear trade-off scheme [8].
Compared to other aggregation schemes of partial criteria [9], convolution has a number of proven advantages [8]. The convolution uses a non-linear trade-off scheme, which allows obtaining a Pareto-optimal solution with low computational costs. The optimization problem is solved under constraints, ensuring unimodality of the generalized criterion function and guaranteeing a unique solution in any case. Convolution enables the use of a minimax approach, focusing on maximizing the dominant partial criterion of optimality. Weight coefficients of partial criteria allow consideration of subjective factors in dominating their influence on scoring results.
The convolution criterion for discretely given partial optimality criteria has the form [8]:

$$Y(y_0) = \sum_{l=1}^{b} \gamma_{0l}\,(1 - y_{0l})^{-1} \to \min, \tag{5}$$

where $l = 1 \ldots b$ indexes the partial optimality criteria included in the convolution; $\gamma_{0l}$ is the normalized weight coefficient; $y_{0l}$ is the normalized partial criterion.
The values of weight coefficients are assigned within a unified rating scale and normalized according to the expression:

$$\gamma_{0l} = \frac{\gamma_l}{\sum_{l=1}^{b} \gamma_l}, \tag{6}$$

where $\gamma_l$ is the current (non-normalized) value of the weight coefficient.
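As a rough illustration of expressions (5) and (6), the following Python sketch computes the convolution for a single client. The function names `normalize_weights` and `tradeoff_convolution` and the sample numbers are illustrative assumptions, not taken from the authors' repository; the criteria are assumed to be already normalized to the interval [0, 1) and oriented toward minimization.

```python
import numpy as np


def normalize_weights(gamma):
    """Scale raw weight coefficients so that they sum to 1, as in (6)."""
    gamma = np.asarray(gamma, dtype=float)
    return gamma / gamma.sum()


def tradeoff_convolution(y0, gamma):
    """Nonlinear trade-off convolution (5): sum of weight / (1 - criterion).

    y0    -- normalized partial criteria, each in [0, 1), smaller is better
    gamma -- raw (non-normalized) weight coefficients (hypothetical values)
    """
    y0 = np.asarray(y0, dtype=float)
    if np.any((y0 < 0) | (y0 >= 1)):
        raise ValueError("normalized criteria must lie in [0, 1)")
    return float(np.sum(normalize_weights(gamma) / (1.0 - y0)))


# Three criteria; the third one is given double weight.
score = tradeoff_convolution([0.2, 0.5, 0.8], [1, 1, 2])
print(round(score, 4))
```

Each term grows without bound as its normalized criterion approaches 1, so a single very poor indicator dominates the score; a smaller value of the convolution corresponds to a client closer to the ideal.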
The normalization of partial criteria aims to bring them to a single scale of change (0...1) and to the direction of minimization. Therefore, partial criteria that are minimized and those that are maximized are normalized separately.
Normalization can be implemented, for example, relative to the maximum (minimum) values characterizing the change in partial optimality criteria by expressions

$$y^{\min}_{0l} = \frac{y^{\min}_{l}}{\max y^{\min}_{l} + \delta}, \qquad y^{\max}_{0l} = \frac{\min y^{\max}_{l}}{y^{\max}_{l} + \delta}, \tag{7}$$

where $\max y^{\min}_{l}$, $\min y^{\max}_{l}$ are the maximum and minimum values of the minimizing and maximizing criteria in the interval of their consideration; $\delta$ is the reserve coefficient, which varies between 0.1 and 0.3 and ensures the elimination of the operation of division by zero for normalization of values $\max y^{\min}_{l}$, $\min y^{\max}_{l}$.
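The sketch below shows one possible reading of the normalization step in Python. It assumes positive-valued indicators and assumes that the reserve coefficient is added to the denominator, so that every normalized value stays strictly below 1; the function name `normalize_criterion`, the default `delta=0.2`, and the sample values are hypothetical.

```python
import numpy as np


def normalize_criterion(values, direction, delta=0.2):
    """Bring one indicator to a (0, 1) scale where smaller means better.

    direction -- "min" if the bank wants the indicator as small as possible,
                 "max" if it wants the indicator as large as possible
    delta     -- reserve coefficient (assumed here to be additive)
    """
    values = np.asarray(values, dtype=float)
    if direction == "min":
        # The largest (worst) value maps just below 1.
        return values / (values.max() + delta)
    if direction == "max":
        # The smallest (worst) value maps just below 1.
        return values.min() / (values + delta)
    raise ValueError("direction must be 'min' or 'max'")


debts = normalize_criterion([0.0, 50.0, 100.0], "min")     # lower debt is better
incomes = normalize_criterion([10.0, 20.0, 40.0], "max")   # higher income is better
print(debts, incomes)
```

After this step the best client on each indicator has the smallest normalized value, regardless of whether the original indicator was to be minimized or maximized.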
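Convolution (5) can be presented in matrix form

$$Y(y_0) = G\,\Phi,$$
$$G = [\gamma_1, \gamma_2, \gamma_3, \ldots, \gamma_l], \quad l = 1 \ldots b, \tag{8}$$
$$\Phi = \left[(1 - y_{01})^{-1}, (1 - y_{02})^{-1}, (1 - y_{03})^{-1}, \ldots, (1 - y_{0l})^{-1}\right]^{T}, \quad l = 1 \ldots b.$$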
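The matrix form is convenient for scoring many clients at once. The NumPy sketch below is an illustration under assumptions: the array `W0` is a hypothetical table of already normalized criteria (rows are clients, columns are criteria) and `G` holds normalized weights.

```python
import numpy as np

# Hypothetical normalized scoring table: 3 clients (rows) x 3 criteria (columns)
W0 = np.array([
    [0.2, 0.5, 0.8],
    [0.1, 0.1, 0.1],
    [0.6, 0.3, 0.4],
])
G = np.array([0.25, 0.25, 0.5])   # normalized weights, summing to 1

Phi = 1.0 / (1.0 - W0)            # element-wise (1 - w)^(-1)
scores = Phi @ G                  # one convolution value per client

# The same value for the first client, written as the explicit sum
first = sum(g / (1.0 - w) for g, w in zip(G, W0[0]))
print(scores, first)
```

A single matrix-vector product replaces the per-client loop, which keeps the computational cost low on large client tables.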
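To form a multi-criteria mathematical model of bank scoring, we formalize the appearance of the scoring card in the accepted notation (4), see Table 2.

Table 2. The scoring card: formalized structure

Column groups: P ($p_1, p_2, \ldots, p_{k_p}$); F ($f_1, f_2, \ldots, f_{k_f}$); K ($k_1, k_2, \ldots, k_{k_k}$); S ($s_1, s_2, \ldots, s_{k_s}$).

| № | $w_1$ | $w_2$ | … | $w_{k_p}$ | $w_{1+k_p}$ | $w_{2+k_p}$ | … | $w_{k_p+k_f}$ | … | $w_{k_p+k_f+k_k+k_s}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $w_{1(1)}$ | $w_{1(2)}$ | … | $w_{1(k_p)}$ | $w_{1(1+k_p)}$ | $w_{1(2+k_p)}$ | … | $w_{1(k_p+k_f)}$ | … | $w_{1(k_p+k_f+k_k+k_s)}$ |
| 2 | $w_{2(1)}$ | $w_{2(2)}$ | … | $w_{2(k_p)}$ | $w_{2(1+k_p)}$ | $w_{2(2+k_p)}$ | … | $w_{2(k_p+k_f)}$ | … | $w_{2(k_p+k_f+k_k+k_s)}$ |
| … | … | … | … | … | … | … | … | … | … | … |
| ν | $w_{\nu(1)}$ | $w_{\nu(2)}$ | … | $w_{\nu(k_p)}$ | $w_{\nu(1+k_p)}$ | $w_{\nu(2+k_p)}$ | … | $w_{\nu(k_p+k_f)}$ | … | $w_{\nu(k_p+k_f+k_k+k_s)}$ |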
Taking into account the given designations, the generalized assessment of the v-th borrower according to the vector of criterion requirements (4) for the scoring card of Table 1 in accordance with the convolution (5) is determined by the expression:
by the extended vector of criteria in scalar form:

$$Y(w_0) = \sum_{l=1}^{k_p+k_f+k_k+k_s} \gamma_{(v0)l}\,(1 - w_{(v0)l})^{-1} \to \min, \tag{9}$$

by the extended vector of criteria in matrix form:

$$Y(w_0) = G\,\Phi,$$
$$G = [\gamma_{(1)}, \gamma_{(2)}, \gamma_{(3)}, \ldots, \gamma_{(l)}], \quad l = 1 \ldots k_p+k_f+k_k+k_s, \tag{10}$$
$$\Phi = \left[(1 - w_{(v0)1})^{-1}, (1 - w_{(v0)2})^{-1}, (1 - w_{(v0)3})^{-1}, \ldots, (1 - w_{(v0)l})^{-1}\right]^{T}, \quad l = 1 \ldots k_p+k_f+k_k+k_s.$$
Normalization of weight coefficients, as well as partial criteria included in
(9), (10), is implemented according to expressions (6), (7), taking into account the
direction of the extremum.
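To account for a significant number of criteria in the generalized multicriteria assessment, it is advisable to use the technology of nested convolutions. This approach also allows regulating the influence of groups of scoring card indicators on the assessment result. Partial criteria are reduced sequentially, within the four groups of partial criteria (4), first to the generalized group criteria and then to the integrated efficiency criterion in scalar form:

$$P(p_0) = \sum_{l=1}^{k_p} \gamma_{(v0)l}\,(1 - p_{(v0)l})^{-1} \to \min,$$
$$F(f_0) = \sum_{l=1}^{k_f} \gamma_{(v0)l}\,(1 - f_{(v0)l})^{-1} \to \min,$$
$$K(k_0) = \sum_{l=1}^{k_k} \gamma_{(v0)l}\,(1 - k_{(v0)l})^{-1} \to \min,$$
$$S(s_0) = \sum_{l=1}^{k_s} \gamma_{(v0)l}\,(1 - s_{(v0)l})^{-1} \to \min,$$
$$Y(w_0) = \gamma^{P}_{(v0)l}\,(1 - P(p_0)_0)^{-1} + \gamma^{F}_{(v0)l}\,(1 - F(f_0)_0)^{-1} + \gamma^{K}_{(v0)l}\,(1 - K(k_0)_0)^{-1} + \gamma^{S}_{(v0)l}\,(1 - S(s_0)_0)^{-1} \to \min, \tag{11}$$
$$P(p_0)_0 = \left[\sum_{i=1}^{k_p}\left(1 - \max[p_{v0l}]\right)^{-1}\right]^{-1},$$
$$F(f_0)_0 = \left[\sum_{i=1}^{k_f}\left(1 - \max[f_{v0l}]\right)^{-1}\right]^{-1},$$
$$K(k_0)_0 = \left[\sum_{i=1}^{k_k}\left(1 - \max[k_{v0l}]\right)^{-1}\right]^{-1},$$
$$S(s_0)_0 = \left[\sum_{i=1}^{k_s}\left(1 - \max[s_{v0l}]\right)^{-1}\right]^{-1}. \tag{12}$$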
Normalization (12) is performed relative to the worst assessment — the
maximum value of the normalized indicator, which characterizes the partial
criterion of the scoring card.
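A two-level (nested) convolution can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: it reads normalization (12) as dividing each group score by a worst-case sum built from the largest normalized indicator values in that group, it assumes at least two indicators per group so that the normalized group score stays below 1, and all names and numbers are hypothetical.

```python
import numpy as np


def convolve(y0, gamma):
    """Trade-off convolution per client; rows of y0 are clients."""
    y0 = np.asarray(y0, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    gamma = gamma / gamma.sum()              # weight normalization
    return (1.0 / (1.0 - y0)) @ gamma


def group_score(y0, gamma):
    """Group convolution scaled by the worst-case (unweighted) group value."""
    y0 = np.asarray(y0, dtype=float)
    worst = np.sum(1.0 / (1.0 - y0.max(axis=0)))
    return convolve(y0, gamma) / worst


def nested_score(groups, group_weights):
    """groups maps a group name to (normalized criteria matrix, raw weights)."""
    names = list(groups)
    stacked = np.column_stack([group_score(*groups[name]) for name in names])
    return convolve(stacked, [group_weights[name] for name in names])


# Three clients, two indicator groups with two normalized indicators each
groups = {
    "P": (np.array([[0.1, 0.2], [0.5, 0.5], [0.8, 0.9]]), [1, 1]),
    "F": (np.array([[0.2, 0.1], [0.4, 0.6], [0.9, 0.7]]), [1, 1]),
}
scores = nested_score(groups, {"P": 1, "F": 1})
print(scores)
```

Changing the entries of `group_weights` shifts the influence of a whole indicator group without touching the weights inside the group.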
Similarly, the matrices and the generalized group criteria ratings that are part of the matrix model of multicriteria scoring (10) are formed and normalized:

$$Y(w_0) = \gamma^{P}_{(v0)l}\,(1 - G_{0pv}\Phi_{0pv})^{-1} + \gamma^{F}_{(v0)l}\,(1 - G_{0fv}\Phi_{0fv})^{-1} + \gamma^{K}_{(v0)l}\,(1 - G_{0kv}\Phi_{0kv})^{-1} + \gamma^{S}_{(v0)l}\,(1 - G_{0sv}\Phi_{0sv})^{-1} \to \min, \tag{13}$$

where $G_{0pv}, G_{0fv}, G_{0kv}, G_{0sv} = \text{normalization}[\gamma_{(v0)l}]$, $l = 1 \ldots k_p, k_f, k_k, k_s$, are the same in structure, but may have different values, and

$$\Phi_{0pv} = \text{normalization}\left[(1 - p_{v0l})^{-1}\right]^{T}, \quad l = 1 \ldots k_p,$$
$$\Phi_{0fv} = \text{normalization}\left[(1 - f_{v0l})^{-1}\right]^{T}, \quad l = 1 \ldots k_f,$$
$$\Phi_{0kv} = \text{normalization}\left[(1 - k_{v0l})^{-1}\right]^{T}, \quad l = 1 \ldots k_k, \tag{14}$$
$$\Phi_{0sv} = \text{normalization}\left[(1 - s_{v0l})^{-1}\right]^{T}, \quad l = 1 \ldots k_s.$$
The interpretation of the obtained result involves bringing the value of the generalized assessment (9), (10), or in the form (11), (13) to a unified scale, for example, from 0 (the worst rating) to 1 (the best rating). This is achieved by normalizing the generalized score to the abstract worst customer score according to the expression

$$I_0 = 1 - \frac{I}{I_{\max}}, \qquad I_{\max} = \sum_{i=1}^{5}\left(1 - [\max F_i - \delta]\right)^{-1}, \tag{15}$$

where $\max F_i$ is the worst possible value of the partial indicator; $\delta$ is the reserve coefficient, which ensures the avoidance of incorrect operations during normalization.
The obtained numerical assessment can be converted to the linguistic category of the client’s solvency according to the fundamental assessment scale of Table 3.

Table 3. Fundamental rating scale

| Integrated performance assessment $I_0$ | Linguistic category of efficiency |
|---|---|
| 1.0 – 0.7 | High |
| 0.7 – 0.5 | Good |
| 0.5 – 0.4 | Satisfactory |
| 0.4 – 0.2 | Low |
| 0.2 and less | Unsatisfactory |
The numerical evaluation of the normalized generalized indicator (15) (see
the left column of Table 3) is proportional to the probability of the client returning
the loan. That is, it characterizes the risk of providing a credit loan.
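The mapping from the normalized assessment to the linguistic scale of Table 3 can be sketched as a small Python function. The table does not say which category a value exactly on a boundary belongs to, so the choice below (a boundary value goes to the higher category, except that 0.2 counts as "Unsatisfactory") is an assumption, as is the function name.

```python
def linguistic_category(i0):
    """Map a normalized assessment in [0, 1] to a category of Table 3."""
    if i0 >= 0.7:
        return "High"
    if i0 >= 0.5:
        return "Good"
    if i0 >= 0.4:
        return "Satisfactory"
    if i0 > 0.2:
        return "Low"
    return "Unsatisfactory"   # 0.2 and less


labels = [linguistic_category(x) for x in (0.85, 0.55, 0.45, 0.3, 0.1)]
print(labels)
```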
Thus, expressions (9)–(15) form a multi-criteria mathematical model of credit scoring. The differences, advantages and features of the model are as follows. The model treats the indicators of the scoring card in terms of their infological connections (factors, indicators, criteria), which increases the adequacy of the ideal client profile of a banking institution. The model yields a generalized client assessment as the solution to an optimization problem using a minimax approach to the requirements of the ideal image. The obtained solution is Pareto-optimal. Subjective priorities of scoring card indicators can be taken into account in client assessment by adjusting both partial and group criteria weights. The model is structurally open to adding scoring card indicators. The proposed model does not require a priori data on loan issuance or refusal, that is, it implements an unsupervised learning scheme. The proposed model can therefore serve as a primary layer, acting as a binary classifier that precedes deep learning methods based on artificial neural networks. Artificial neural networks are designed to exploit large volumes of labeled data, and the proposed multi-criteria model cannot compete with them in that setting. In the context of unsupervised learning, however, the multi-criteria model has better potential properties than existing approaches because of how it formalizes the classification task. The computational example presented below indicates that the model can detect fraud and meets the requirements for bank scoring models.
ІІ. The methodology of multi-criteria credit scoring determines the sequence of actions for performing calculations and obtaining the resulting assessment, including the following stages:
1. Establishing a set of indicators from the scoring card (Table 1) in the form of (4).
2. Normalizing criteria (4) using expressions (6), (7).
3. Formulating the generalized client assessment, expressions (9)–(14).
4. Interpreting the generalized assessment with normalization (15) and in
accordance with Table 3.
ІІІ. The technology of multicriteria credit scoring involves practical aspects
of implementing the synthesized model (9)–(15) and the methodology of its
application to the architecture of the software system and to a specific script-
based implementation.
The technological processes of multi-criteria credit scoring can be
represented by the structural diagram in Fig. 1, which implements the
architecture of the software script of credit scoring.
Fig. 1. Structural diagram of the software implementation of the mathematical model
І. Preparation of input data:
1.1. Parsing the input data file
1.2. Analysis of the structure of input data
1.3. File parsing and analysis of the structure of scoring indicators
1.4. Initial formation of the scoring table
1.5. Data cleaning
1.5.1. Analysis of the intersection of scoring indicators and input data segment
1.5.2. Formation of a DataFrame of data taking into account the missing indicators of the scoring table
1.5.3. Clearing the scoring table from omissions
ІІ. Formation of the scoring model:
2.1. Parsing the file of indicators (criteria) of the scoring card / data
2.2. Determination of normalizing parameters
2.3. Normalization of indicators (criteria)
2.4. Integrated multicriteria assessment (SCOR)
Technological processes are divided into two blocks: preparation of input
data and formation of the scoring model. The input data includes two files:
scoring card indicators in “sample_data.xlsx” and indicator descriptions in
“data_description.xlsx” (Fig. 2). In total, there are 121 scoring card indicators and
500 records of potential bank clients.
The implementation of credit scoring technology in script form, according to the developed model and its application methodology, is carried out using the Python programming language with libraries such as NumPy and Pandas. The original project with the code is available at https://github.com/Pysarchuk-O/scoring.git.
Fig. 2. Structure of input data: a, scoring card sample_data.xlsx; b, indicators of the scoring card data_description.xlsx
Over the input data files, a series of preparatory stages is implemented, dictated by the specifics of the data: parsing the data files and converting them to the pandas DataFrame format (blocks 1.1, 1.3); analyzing the structure of the input data by size, data types, and presence of missing values (block 1.2); initial formation of the scoring card and analysis of its structure for compliance with the fields of Table 1 (block 1.4); cleaning the input data from gaps using the rejection strategy where gaps are numerous and cannot be recovered given the nature of the indicator (registration address, residence, place of work, etc.); selection of values for the scoring card based on the intersection of input data (Fig. 2, a) and indicators (Fig. 2, b), with further control that the structure of Table 1 is preserved (block 1.5). After these classic stages, a content analysis of indicators is carried out (also within block 1.5). It selects the objective and subjective indicators that characterize the borrower directly and are not secondary values assigned by the system (product status, product profile ID, etc.). For the data in Fig. 2, these are the indicators marked in the scoring card description (Fig. 2, b) as “borrower indication” and “parameters related to the issued product”.
The analysis showed that the data in Fig. 2 contain no decisions on issuing the product, that is, there is no data structure of the “training pair” type: fields related to “decision-making” (Fig. 2, b) are absent from the scoring card (Fig. 2, a). We are therefore dealing with data oriented towards implementing an unsupervised machine learning model.
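The preparation stages can be sketched with pandas roughly as follows. The sketch uses small in-memory tables instead of the real files (which could be loaded with `pd.read_excel`); the column names, the `name_col` parameter, and the 50% missing-value threshold are hypothetical and are not taken from the authors' data.

```python
import numpy as np
import pandas as pd


def build_scoring_table(data, description, name_col="indicator", max_missing=0.5):
    """Keep described indicators present in the data and reject gaps."""
    # Intersection of described indicators and actual data columns
    common = [c for c in description[name_col] if c in data.columns]
    table = data[common]
    # Rejection strategy: drop indicators with too many missing values
    table = table.loc[:, table.isna().mean() <= max_missing]
    # Drop client records that still contain gaps
    return table.dropna(axis=0)


# Toy stand-ins for sample_data.xlsx and data_description.xlsx
data = pd.DataFrame({
    "loan_amount": [1000, 2500, np.nan, 4000],
    "monthly_income": [900, 1500, 1200, 2000],
    "work_address": [np.nan, np.nan, np.nan, "Kyiv"],
    "internal_id": [1, 2, 3, 4],
})
description = pd.DataFrame(
    {"indicator": ["loan_amount", "monthly_income", "work_address", "credit_term"]}
)

scoring_table = build_scoring_table(data, description)
print(scoring_table)
```

Here `work_address` is rejected because most of its values are missing, `internal_id` is dropped because it is not a described indicator, and the record with a missing loan amount is removed.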
The result of the data preparation stage is a scoring card in general form, as
shown in Fig. 2, a, which contains a list of indicators presented in Fig. 3. Analysis
of the scoring card structure according to Fig. 2 and comparison with Table 1
confirms the correspondence and presence of the main categories for scoring.
Further, a series of stages for script deployment and application of the
proposed multicriteria credit scoring model is implemented. This is implemented
in accordance with the formulated methodology and is reflected in the structural
diagram in Fig. 1 by blocks 2.1–2.4.
The initial step to initiate the model’s operation involves parsing the
indicators (criteria) file of the scoring card (block 2.1). The criteria requirements
for the scoring card indicators are formulated based on the values remaining after
cleaning and balancing (Fig. 3). These criteria should reflect the ideal client
profile and are unique to each banking institution. It is possible, for example, to
establish a system of criteria requirements shown in Fig. 4. They are used for the
calculation example.
Fig. 3. Scoring card indicators after data cleaning and balancing (d_segment_data_description_cleaning.xlsx)
Further, the selection of minimax indicators indicated in Fig. 4 from the
scoring map of Fig. 1 and the formation of an integrated multicriteria assessment
(scor-py) according to model (9) is implemented. The calculation results are
presented in the graphs of Fig. 5.
The graphs in Fig. 5 show the unnormalized scoring values (scor axis) for
each client included in the scoring card in Fig. 2, a (client axis). Fig. 5, a presents
the scoring evaluation for 150 clients without abnormally high scores. Fig. 5, b
and 5, c show the scoring evaluations for 250 and all 500 clients with abnormally
high score values. The red line characterizes the empirical value of dividing
customers into creditworthy and non-creditworthy. The results shown in Fig. 5, a
demonstrate a strictly binary classification of clients into the mentioned
categories. Differences in the scores of creditworthy clients reflect their internal
distribution, proportional to the risk characteristics of granting credit. Abnormally
high scoring values presented in Fig. 5, b and 5, c indicate possible fraud in the
data or incorrectly filled fields in the scoring card. Fraud analysis involves
reversing the process of analyzing the criteria vector for each indicator
individually and in the multicriteria scoring complex.
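One simple way to separate the three situations described above (creditworthy, not creditworthy, abnormally high score) is sketched below. The article sets the dividing line empirically and does not specify a rule for abnormal values; the median-based rule, the parameter `k`, the assumption that lower unnormalized scores are better, and the sample numbers are all illustrative assumptions.

```python
import numpy as np


def flag_scores(scores, cutoff, k=5.0):
    """Return boolean masks (creditworthy, anomalous) for unnormalized scores.

    cutoff -- empirical dividing value (lower scores are assumed better)
    k      -- how many median absolute deviations count as abnormal
    """
    scores = np.asarray(scores, dtype=float)
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    anomalous = scores > median + k * mad      # candidates for fraud review
    creditworthy = (scores <= cutoff) & ~anomalous
    return creditworthy, anomalous


scores = [1.2, 1.3, 1.25, 1.9, 1.4, 25.0]
creditworthy, anomalous = flag_scores(scores, cutoff=1.5)
print(creditworthy, anomalous)
```

Records flagged as anomalous would then be inspected indicator by indicator to find the field responsible for the extreme score.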
Fig. 4. Criterion requirements for scoring card indicators (d_segment_data_description_cleaning_minimax.xlsx)
Fig. 5. Client Creditworthiness Assessment (scor), beginning; axes: Client, Scor
a: the scoring evaluation for 150 clients without abnormally high scores
The results obtained with the proposed approach were compared with discriminant analysis on a similar segment of data. The two methods did not contradict each other in more than 80% of cases, which indicates the effectiveness of the proposed approach.
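The share of matching decisions between two classifiers can be computed as in the sketch below. The label vectors are made-up placeholders, not the article's data, and the function name is hypothetical.

```python
import numpy as np


def agreement_rate(labels_a, labels_b):
    """Fraction of clients on which two binary classifications coincide."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.shape != b.shape:
        raise ValueError("label vectors must have the same length")
    return float(np.mean(a == b))


# Placeholder decisions: 1 = creditworthy, 0 = not creditworthy
model_labels = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
reference_labels = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
rate = agreement_rate(model_labels, reference_labels)
print(rate)
```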
CONCLUSIONS
In this research, a multi-criteria mathematical model of credit scoring for Data
Science tasks was developed. The model forms the ideal client profile as a set of
criterion requirements. The integrated assessment is formed by convolution
according to a nonlinear trade-off scheme and is the solution to an optimization
problem that belongs to the set of Pareto-optimal solutions. The model operates
on unlabeled data, in line with the principles of unsupervised learning. It can be
applied independently or serve as a foundational layer for deep learning
methods. The practical application of the model is facilitated by its
methodological and technological representations. A computational example
applying the model to real big data demonstrated its effectiveness at the
decision-making stage of customer lending and in fraud detection.
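As an illustration of the convolution step, the sketch below uses a form commonly given for the nonlinear trade-off scheme, Y = sum_i w_i / (1 - y_i), where each criterion y_i is normalized to [0, 1) so that smaller is better. It is an assumption that this matches the article's exact normalization and weighting; the function name `nonlinear_tradeoff_score` and the sample values are hypothetical.

```python
from typing import Sequence


def nonlinear_tradeoff_score(normalized: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Scalar convolution Y = sum_i w_i / (1 - y_i).

    Assumption: each y_i is a criterion normalized to [0, 1) so that
    smaller is better; a smaller Y is therefore a better overall profile.
    """
    if len(normalized) != len(weights):
        raise ValueError("one weight is required per criterion")
    total = 0.0
    for y, w in zip(normalized, weights):
        if not 0.0 <= y < 1.0:
            raise ValueError("normalized criteria must lie in [0, 1)")
        total += w / (1.0 - y)
    return total


# Hypothetical two-criterion profiles with equal weights.
balanced = nonlinear_tradeoff_score([0.0, 0.5], [0.5, 0.5])
near_limit = nonlinear_tradeoff_score([0.0, 0.99], [0.5, 0.5])
print(balanced, near_limit)
```

In this form, a single criterion that approaches its limit inflates the score sharply, which is one way a badly filled field could show up as an abnormally high value.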
Fig. 5. Client Creditworthiness Assessment, scor (Continued). Panel b: the scoring
evaluations for 250 clients with abnormally high score values. Panel c: the scoring
evaluations for 500 clients with abnormally high score values. Axis labels: Client, Scor.
REFERENCES
1. Mohamed Hani AbdElHamid, Mohamed Tawfik ElMasry, Machine learning approach
for credit score analysis: a case study of predicting mortgage loan defaults. NOVA In-
formation Management School Instituto Superior de Estatística e Gestão de Informação
Universidade Nova de Lisboa, 2019, 60p. Available: https://run.unl.pt/bitstream/
10362/62427/ 4/TEGI0439.pdf
2. Maria Fernandez Vidal, Fernando Barbon, Credit scoring in financial inclusion. How to use
advanced analytics to build credit-scoring models that increase access. Washington: Consul-
tative Group to Assist the Poor: World Bank, 2019, 52 p. Available: https://www.
findevgateway.org/guide-toolkit/2019/07/credit-scoring-financial-inclusion
3. “Credit scoring approaches guidelines,” The World Bank Group, 2019, 64 p. Avail-
able: https://thedocs.worldbank.org/en/doc/935891585869698451-0130022020/ original/
CREDITSCORINGAPPROACHESGUIDELINESFINALWEB.pdf
4. Rory P. Bunker, M. Asif Naeem, Wenjun Zhang, “Improving a Credit Scoring Model by
Incorporating Bank Statement Derived Features,” Preprint submitted to Expert Systems
with Applications, 2017. Available: https://www.researchgate.net/ publication/309606871_
Improving_a_Credit_Scoring_Model_by_Incorporating_Bank_Statement_Derived_Features
5. Engku Muhammad Nazri E.A. Bakar, “Credit scoring models: techniques and issues,”
Journal of Advanced Research in Business and Management Studies, vol. 7, issue 2,
pp. 29–41, 2017. Available: https://www.researchgate.net/publication/323306120_
Credit_scoring_models_techniques_and_issues
6. Gerrit Muller, System Modeling and Analysis: a Practical Approach. University of
South-Eastern Norway-NISE, 2023, 128 p. Available: https://www.gaudisite.nl/ Sys-
temModelingAndAnalysisBook.pdf
7. Nolberto Munier, Mathematical Modelling of Decision Problems. Using the SIMUS
Method for Complex Scenarios. Springer, 2021, 196 p.
8. Albert Voronin, Multi-Criteria Decision Making for the Management of Complex Sys-
tems. IGI Global, 2017, 220 p.
9. Goran Cirovi´c, Dragan Pamuˇcar, Multiple-Criteria Decision Making. MDPI, 2022, 313 p.
Received 10.07.2024
INFORMATION ON THE ARTICLE
Oleksii O. Pysarchuk, ORCID: 0000-0001-5271-0248, National Technical University of Ukraine
“Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: PlatinumPA2212@gmail.com
Maria D. Vasylieva, ORCID: 0000-0003-2217-643X, National Technical University of
Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: mdvasilieva@gmail.com
Danylo R. Baran, ORCID: 0000-0002-3251-8897, National Technical University of Ukraine
“Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: danil.baran15@gmail.com
Illya O. Pysarchuk, ORCID: 0000-0003-4343-0142, National Technical University of
Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: flimka134@gmail.com
MULTI-CRITERIA MATHEMATICAL MODEL OF CREDIT SCORING
IN DATA SCIENCE PROBLEMS / O.O. Pysarchuk, M.D. Vasylieva, D.R. Baran, I.O. Pysarchuk
Abstract. A multi-criteria optimization mathematical model of credit scoring is
proposed. The model is derived using a nonlinear trade-off scheme for solving
multi-criteria optimization problems, which allows a Pareto-optimal solution to
be constructed. The proposed approach forms an integrated assessment of a
borrower's creditworthiness based on a structured set of indicators that reflect
the financial, credit, and social profile of clients. The model is intended for use
in intelligent CRM and ERP systems that work with big data and does not
require labeled training samples, which makes it suitable for unsupervised
learning tasks. It can also serve as a base layer for further analysis using deep
learning methods. The methodological steps for implementing the model, from
indicator normalization to final decision-making, are described. A technological
implementation demonstrates the effectiveness of the model in automated
lending decisions and fraud detection.
Keywords: Data Science, Big Data, scoring, machine learning, decision making,
multi-criteria mathematical models, intelligent CRM, ERP systems.
|
| id | journaliasakpiua-article-308786 |
| institution | System research and information technologies |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2025-11-09T02:11:02Z |
| publishDate | 2025 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/1e/0654b8d1cc2e0d4998d221429122ae1e.pdf |
| spelling | journaliasakpiua-article-308786 2025-11-09T00:01:30Z The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-09-29 Article application/pdf https://journal.iasa.kpi.ua/article/view/308786 10.20535/SRIT.2308-8893.2025.3.08 System research and information technologies; No. 3 (2025); 99-112 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/308786/331015 |
| title | Багатокритеріальна математична модель кредитного скорингу в задачах data science |
| title_alt | Multi-criteria mathematical model of credit scoring in data science problems |
| url | https://journal.iasa.kpi.ua/article/view/308786 |