Multi-criteria mathematical model of credit scoring in data science problems

A multi-criteria optimization mathematical model of credit scoring is proposed. The model is derived using a nonlinear trade-off scheme to solve multi-criteria optimization problems, allowing for the construction of a Pareto-optimal solution. The proposed approach forms an integrated assessment of a...

Bibliographic details
Date: 2025
Main authors: Pysarchuk, Oleksii, Vasylieva, Maria, Baran, Danylo, Pysarchuk, Illya
Format: Article
Language: English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025
Online access: https://journal.iasa.kpi.ua/article/view/308786
Journal title: System research and information technologies

Institution

System research and information technologies
_version_ 1866302971909767168
author Pysarchuk, Oleksii
Vasylieva, Maria
Baran, Danylo
Pysarchuk, Illya
author_facet Pysarchuk, Oleksii
Vasylieva, Maria
Baran, Danylo
Pysarchuk, Illya
author_sort Pysarchuk, Oleksii
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2025-11-09T00:01:30Z
description A multi-criteria optimization mathematical model of credit scoring is proposed. The model is derived using a nonlinear trade-off scheme to solve multi-criteria optimization problems, allowing for the construction of a Pareto-optimal solution. The proposed approach forms an integrated assessment of a borrower’s creditworthiness based on a structured set of indicators that reflect the financial, credit, and social profile of clients. The model is designed for use in intelligent CRM and ERP systems operating on Big Data and does not rely on labeled training samples, making it applicable to unsupervised learning tasks. It can also serve as a foundational layer for further deep-learning analysis. Methodological steps for implementing the model, from indicator normalization to final decision-making, are described. A technological implementation demonstrates the model’s effectiveness in automated loan decisions and fraud detection.
doi_str_mv 10.20535/SRIT.2308-8893.2025.3.08
first_indexed 2025-11-09T02:11:02Z
format Article
fulltext O.O. Pysarchuk, M.D. Vasylieva, D.R. Baran, І.О. Pysarchuk, 2025. Системні дослідження та інформаційні технології, 2025, № 3, p. 99

MATHEMATICAL METHODS, MODELS, PROBLEMS AND TECHNOLOGIES FOR COMPLEX SYSTEMS RESEARCH

UDC 004.5
DOI: 10.20535/SRIT.2308-8893.2025.3.08

MULTI-CRITERIA MATHEMATICAL MODEL OF CREDIT SCORING IN DATA SCIENCE PROBLEMS

O.O. PYSARCHUK, M.D. VASYLIEVA, D.R. BARAN, І.О. PYSARCHUK

Abstract. A multi-criteria optimization mathematical model of credit scoring is proposed. The model is derived using a nonlinear trade-off scheme to solve multi-criteria optimization problems, allowing for the construction of a Pareto-optimal solution. The proposed approach forms an integrated assessment of a borrower’s creditworthiness based on a structured set of indicators that reflect the financial, credit, and social profile of clients. The model is designed for use in intelligent CRM and ERP systems operating on Big Data and does not rely on labeled training samples, making it applicable to unsupervised learning tasks. It can also serve as a foundational layer for further deep-learning analysis. Methodological steps for implementing the model, from indicator normalization to final decision-making, are described. A technological implementation demonstrates the model’s effectiveness in automated loan decisions and fraud detection.

Keywords: Data Science, Big Data, scoring, machine learning, decision making, multi-criteria mathematical models, intelligent CRM, ERP systems.

INTRODUCTION

The development of the modern IT industry shapes the methodologies and technologies of electronic banking, including the automation of intelligent decision-making. One such area is the decision to grant loans to the clients of banking institutions. This process belongs to the field of credit scoring.
It is based on analyzing a set of indicators of the client’s creditworthiness and deriving an individual integrated assessment (score) that supports an informed lending decision. In practice, scoring analysis is not limited to a binary yes/no lending decision. It should produce an adequate risk assessment that determines a specific credit program, adapted over the loan life cycle to the properties of the particular client. It should also ensure high economic performance of the banking institution. For lending programs, this means maximizing the number of loans issued, but through programs that match the client's risk of non-repayment. Currently, credit scoring is implemented by automated software tools with intelligent features, organized as distributed CRM or ERP systems.

O.O. Pysarchuk, M.D. Vasylieva, D.R. Baran, І.О. Pysarchuk ISSN 1681–6048 System Research & Information Technologies, 2025, № 3 100

Credit scoring in modern CRM and ERP systems is, in effect, an application of Data Science technologies to Big Data arrays. This imposes rather strict requirements on the computational complexity of credit scoring models. The quality of automated credit scoring decisions is determined by the mathematical models underlying the software tools. Therefore, the pragmatic drive of banking institutions to increase economic efficiency makes the development of effective mathematical models of credit scoring for Data Science tasks on Big Data arrays a relevant problem.

ANALYSIS OF RESEARCH AND PUBLICATIONS

Credit scoring is usually treated within the class of classification methods and models.
Classical machine learning methods are used most often [1–5]: discriminant analysis, logistic regression, decision trees, support vector machines, the naive Bayes classifier, neural networks, and others. In general, the assessment task belongs to the theory of system analysis [6]. Performance evaluation is carried out in the following sequence: determining factors, indicators, and criteria; forming the decision-making model; interpreting the obtained result. Single-criteria and multi-criteria models of efficiency evaluation are distinguished [7, 8], with the latter being comparatively more adequate. The main drawback of traditional credit scoring approaches is that they treat assessment purely as classification and form the integrated assessment on a discrete field of static numerical representations of a particular borrower's many factors.

Digital twin technologies are now developing rapidly. They transfer business processes into a digital virtual environment in order to automate and optimize them. A digital twin is effective not so much through automation as through the closest possible approximation of the “twin” to the real physical process. The twin must therefore be highly adequate while remaining abstract enough to allow productive processing of, for example, Big Data arrays.

The main idea of the approach proposed in the article is to bring the formalization of the credit scoring problem closer to its real physical essence: a real client should come as close as possible, across a set of indicators, to the image of an unattainable ideal. In this formulation, scoring analysis is a multi-criteria evaluation task. This is applied to the synthesis of a mathematical model of scoring analysis as a multi-criteria optimization mathematical model of evaluation.
Multi-criteria formalization of the decision-making task a priori has a higher level of adequacy, since it allows a specific practical task, expressed in natural language by the scheme “how best...”, to be described mathematically. Moreover, the criterion describes the entire set of possible indicator values, even though that set is represented by a limited number of discrete realizations. Thus, the increased adequacy of the evaluation task at the formalization stage ensures that the “twin” closely approximates the real physical process. We should therefore expect a potential increase in the efficiency of the final result of the scoring analysis.

Formulation of the problem. The aim of this article is to synthesize a multi-criteria optimization mathematical model for credit scoring.

PRESENTATION OF THE MAIN MATERIAL

The stated goal is achieved at three levels: model, methodological, and technological.

І. Multi-criteria optimization mathematical model of credit scoring. The mathematical model is synthesized in stages: defining factors, indicators, and criteria; forming the decision-making model; interpreting the obtained result.

The factors, indicators, and criteria are the initial data for evaluation and constitute the scoring card. The structure of the scoring card is known, but the number and values of indicators may vary with the specific conditions of a particular banking institution. The classical structure of the scoring card includes the following groups of indicators [1–5]: information from the banking institution (credit product) and information about the borrower/client (credit history, financial, social); see Table 1.

Table 1. Scoring card: general structure

Credit product (bank):
• Amount;
• Term;
• The purpose of the loan;
• …

Financial (borrower):
• Assets;
• Obligations;
• Monthly income;
• Monthly expenses;
• …

Credit history (borrower):
• In the current bank;
• In other banks;
• Credit bureau data;
• …

Social indicators (borrower):
• Work experience;
• Time of residence at the current address;
• Marital status;
• …

The indicator values of the scoring card are accumulated as a dynamic database of client interactions.

In the classic setting, scoring analysis is formalized as the task of classifying new customers based on information about existing clients. Let the set of bank customers be given:

$$\{Z_i\}, \quad i = 1 \ldots n. \tag{1}$$

Each client is characterized by a p-dimensional vector of heterogeneous features:

$$X_i = [x_{i1}, \ldots, x_{ip}]^{T}. \tag{2}$$

It is known that each client $Y_i$ belongs to one of two creditworthiness classes ($k = 2$), encoded by a binary label:

$$Y_i = y \in \{1, 0\}, \tag{3}$$

where one value marks a creditworthy client and the other a client who is not creditworthy.

New clients are characterized by a sample $\{W_j\}$, $j = 1 \ldots m$. A sample of clients with a known creditworthiness class serves as the training set $\{Z_i^{N}\}$, $i = 1 \ldots n\,(N)$.

It is necessary to implement a scoring algorithm that classifies new clients $\{W_j\}$, $j = 1 \ldots m$, based on their feature vectors $X_i = [x_{i1}, \ldots, x_{ip}]^{T}$.

This scheme corresponds to supervised learning. In scoring practice, however, a ready training sample is rarely available, and forming one becomes a separate, rather complex task. In addition, discrete indicators of the scoring card may not be informative when considered individually, which makes it necessary to calculate secondary indicators and compare them with those of other clients. Therefore, the article considers a modified formulation of the problem of scoring analysis.
Let a set of bank clients (1) be given. Each client is characterized by a p-dimensional vector of heterogeneous features (2), $X_i = [x_{i1}, \ldots, x_{ip}]^{T}$. It is necessary to assign each bank client $Y_i$ to one of the two classes (3).

The multi-criteria scoring method is proposed to solve the classification problem formalized in this way. This choice rests on the following considerations. Scoring analysis essentially aims at building a digital twin of the banking institution’s team that formulates the requirements for the ideal client. This makes it possible to analyze alternatives for the binary classification of clients over many factors by comparing the ideal image with the real client, that is, by evaluating the degree of closeness between them. The process involves weighing many factors, often following the “best–worst” scheme. Multi-criteria scoring allows the decision maker’s considerations to be incorporated by transforming static indicators into dynamic criterion requirements.

In addition, the scoring model must meet a number of technical requirements [1–5]:
1. High adequacy in dividing borrowers into two categories from the perspective of credit issuance: “positive” and “negative”.
2. The scoring point is a measure of the probability of the borrower belonging to the “positive” or “negative” class.
3. The average rating that the scoring model assigns to “negative” borrowers should be significantly lower than the average rating of “positive” ones.
4. Borrowers should be ranked within the rating of “positive” decisions.
5. There is a cut-off point below which it is unprofitable for the bank to issue loans.
6. The scoring model should ensure detection of fraud.

Multi-criteria scoring meets the listed requirements and has a number of unique advantages, which will be demonstrated with a computational example [8].
Based on the structure of the scoring card (Table 1) and setting extremum requirements for its indicators, we obtain, in general, a system of criteria for the categories of the scoring card:

$$
\begin{aligned}
&\text{credit product (bank):} && P = [p_i]^{T} \to \text{extreme}, && i = 1 \ldots k_p,\\
&\text{financial (borrower):} && F = [f_i]^{T} \to \text{extreme}, && i = 1 \ldots k_f,\\
&\text{credit history (borrower):} && K = [k_i]^{T} \to \text{extreme}, && i = 1 \ldots k_k,\\
&\text{social (borrower):} && S = [s_i]^{T} \to \text{extreme}, && i = 1 \ldots k_s,\\
&\text{extended vector of criteria:} && W = [w_i] = [p_i, f_i, k_i, s_i]^{T} \to \text{extreme}, && i = 1 \ldots k_p + k_f + k_k + k_s.
\end{aligned} \tag{4}
$$

The direction of the extremum (extreme = min or max) of each indicator of the scoring card is specific to each banking institution. It reflects the bank’s understanding of the image of the ideal client and follows the logic of “best-case scenario, worst-case scenario” deliberation. Analysis of the content and practical significance of the scoring card indicators shows that the criteria for the “ideal” borrower conflict with one another. Scoring is therefore a multi-criteria optimization problem.

The decision-making model is formed by aggregating the partial criteria vectors (4) into a generalized (integrated) assessment score using convolution through a non-linear trade-off scheme [8].

Compared to other schemes for aggregating partial criteria [9], this convolution has a number of proven advantages [8]. The non-linear trade-off scheme yields a Pareto-optimal solution at low computational cost. The optimization problem is solved under constraints that ensure unimodality of the generalized criterion function and guarantee a unique solution in any case. The convolution also supports a minimax approach, focusing on the dominant partial criterion of optimality.
Weight coefficients of partial criteria make it possible to account for subjective factors by letting selected criteria dominate the scoring result.

The convolution criterion for discretely given partial optimality criteria has the form [8]:

$$Y(y_0) = \sum_{l=1}^{b} \gamma_{0l}\,(1 - y_{0l})^{-1} \to \min, \tag{5}$$

where $l = 1 \ldots b$, with $b$ the number of partial optimality criteria included in the convolution; $\gamma_{0l}$ is the normalized weight coefficient; $y_{0l}$ is the normalized partial criterion.

The values of the weight coefficients are assigned within a unified rating scale and normalized according to the expression

$$\gamma_{0l} = \frac{\gamma_l}{\sum_{l=1}^{b} \gamma_l}, \tag{6}$$

where $\gamma_l$ is the current (non-normalized) value of the weight coefficient.

Normalization of the partial criteria brings them to a single scale of change (0...1) and to the direction of minimization. Therefore, partial criteria that are minimized and those that are maximized are normalized separately.

Normalization can be implemented, for example, relative to the maximum (minimum) values of the partial optimality criteria:

$$y_{0l}^{\min} = \frac{y_l^{\min}}{\max y_l^{\min}}, \qquad y_{0l}^{\max} = \frac{\min y_l^{\max}}{y_l^{\max}}, \tag{7}$$

where $\max y_l^{\min}$ and $\min y_l^{\max}$ are the maximum and minimum values of the minimized and maximized criteria, respectively, over the interval of their consideration. A reserve coefficient between 0.1 and 0.3 is applied to the values $\max y_l^{\min}$, $\min y_l^{\max}$ to rule out division by zero during normalization.

Convolution (5) can be presented in matrix form:

$$Y(y_0) = G\,\Phi^{T}, \quad G = [\gamma_1, \gamma_2, \gamma_3, \ldots, \gamma_l], \quad \Phi = [(1 - y_{01})^{-1}, (1 - y_{02})^{-1}, (1 - y_{03})^{-1}, \ldots, (1 - y_{0l})^{-1}]^{T}, \quad l = 1 \ldots b. \tag{8}$$

To form a multi-criteria mathematical model of bank scoring, we formalize the scoring card in the accepted notation (4); see Table 2.

Table 2. The scoring card: formalized structure

$$
\begin{array}{c|cccc|cccc|c|c}
 & P & & & & F & & & & K & S \\
 & p_1 & p_2 & \ldots & p_{k_p} & f_1 & f_2 & \ldots & f_{k_f} & k_1, k_2, \ldots, k_{k_k} & s_1, s_2, \ldots, s_{k_s} \\
\text{№} & w_1 & w_2 & \ldots & w_{k_p} & w_{k_p+1} & w_{k_p+2} & \ldots & w_{k_p+k_f} & \ldots & w_{k_p+k_f+k_k+k_s} \\
\hline
1 & w_{1(1)} & w_{1(2)} & \ldots & w_{1(k_p)} & w_{1(1+k_p)} & w_{1(2+k_p)} & \ldots & w_{1(k_p+k_f)} & \ldots & w_{1(k_p+k_f+k_k+k_s)} \\
2 & w_{2(1)} & w_{2(2)} & \ldots & w_{2(k_p)} & w_{2(1+k_p)} & w_{2(2+k_p)} & \ldots & w_{2(k_p+k_f)} & \ldots & w_{2(k_p+k_f+k_k+k_s)} \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\
\nu & w_{\nu(1)} & w_{\nu(2)} & \ldots & w_{\nu(k_p)} & w_{\nu(1+k_p)} & w_{\nu(2+k_p)} & \ldots & w_{\nu(k_p+k_f)} & \ldots & w_{\nu(k_p+k_f+k_k+k_s)}
\end{array}
$$

With these designations, the generalized assessment of the $\nu$-th borrower according to the vector of criterion requirements (4) for the scoring card of Table 1, in accordance with convolution (5), is determined as follows.

By the extended vector of criteria in scalar form:

$$Y(w_0) = \sum_{l=1}^{k_p+k_f+k_k+k_s} \gamma_{(0\nu)l}\,(1 - w_{(0\nu)l})^{-1} \to \min. \tag{9}$$

By the extended vector of criteria in matrix form:

$$Y(w_0) = G\,\Phi^{T}, \quad G = [\gamma_{(1)}, \gamma_{(2)}, \gamma_{(3)}, \ldots, \gamma_{(l)}], \quad \Phi = [(1 - w_{(0\nu)1})^{-1}, (1 - w_{(0\nu)2})^{-1}, (1 - w_{(0\nu)3})^{-1}, \ldots, (1 - w_{(0\nu)l})^{-1}]^{T}, \quad l = 1 \ldots k_p+k_f+k_k+k_s. \tag{10}$$

Normalization of the weight coefficients and of the partial criteria in (9), (10) follows expressions (6), (7), taking into account the direction of the extremum.

To account for a significant number of criteria in the generalized multi-criteria assessment, it is advisable to use nested convolutions. This approach also makes it possible to regulate the influence of groups of scoring card indicators on the assessment result.
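As an illustration of expressions (6), (7), (9), and (10), the flat convolution over all indicators might be sketched in NumPy as follows. This is a minimal sketch, not the authors' script: the function names, the toy data, and in particular the way the reserve coefficient enters the normalization (as a factor `1 + reserve` in the denominator) are assumptions made here, since the text only states that the coefficient lies between 0.1 and 0.3. Indicator values are assumed to be strictly positive.

```python
import numpy as np

def normalize_weights(gamma):
    """Expression (6): scale raw weights so that they sum to 1."""
    gamma = np.asarray(gamma, dtype=float)
    return gamma / gamma.sum()

def normalize_criteria(X, directions, reserve=0.2):
    """Expression (7): bring every criterion to (0, 1), smaller is better.

    ASSUMPTION: the reserve coefficient is applied as a factor (1 + reserve)
    in the denominator so that no normalized value reaches 1.
    """
    X = np.asarray(X, dtype=float)
    Y0 = np.empty_like(X)
    for j, direction in enumerate(directions):
        col = X[:, j]
        if direction == "min":      # minimized criterion: y / max(y)
            Y0[:, j] = col / (col.max() * (1.0 + reserve))
        else:                       # maximized criterion: min(y) / y
            Y0[:, j] = col.min() / (col * (1.0 + reserve))
    return Y0

def convolution(Y0, gamma0):
    """Expressions (9)/(10): weighted sum of (1 - y0)^-1, one value per client."""
    Phi = 1.0 / (1.0 - Y0)          # rows of Phi, one per client
    return Phi @ gamma0             # matrix form: weights times Phi

# Toy scoring card (hypothetical): columns are monthly income and debt.
X = np.array([[1000.0, 200.0],
              [2000.0, 100.0],
              [500.0, 400.0]])
directions = ["max", "min"]         # income is maximized, debt is minimized
gamma0 = normalize_weights([1.0, 1.0])

Y0 = normalize_criteria(X, directions)
scores = convolution(Y0, gamma0)    # lower score = closer to the ideal client
best_client = int(np.argmin(scores))
```

In this toy card the second client (highest income, lowest debt) receives the lowest score, and the third client, who is worst on both indicators, receives the highest one.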
Nested convolution is implemented by sequentially reducing the partial criteria, within the four groups of partial criteria (4), to generalized group criteria and then to the integrated efficiency criterion. In scalar form:

$$
\begin{aligned}
P(p_0) &= \sum_{l=1}^{k_p} \gamma_{(0\nu)l}\,(1 - p_{(0\nu)l})^{-1} \to \min, \\
F(f_0) &= \sum_{l=1}^{k_f} \gamma_{(0\nu)l}\,(1 - f_{(0\nu)l})^{-1} \to \min, \\
K(k_0) &= \sum_{l=1}^{k_k} \gamma_{(0\nu)l}\,(1 - k_{(0\nu)l})^{-1} \to \min, \\
S(s_0) &= \sum_{l=1}^{k_s} \gamma_{(0\nu)l}\,(1 - s_{(0\nu)l})^{-1} \to \min, \\
Y(w_0) &= \gamma^{P}_{(0\nu)l}\,(1 - P_0(p_0))^{-1} + \gamma^{F}_{(0\nu)l}\,(1 - F_0(f_0))^{-1} + \gamma^{K}_{(0\nu)l}\,(1 - K_0(k_0))^{-1} + \gamma^{S}_{(0\nu)l}\,(1 - S_0(s_0))^{-1} \to \min,
\end{aligned} \tag{11}
$$

where the normalized group assessments are

$$
\begin{aligned}
P_0(p_0) &= P(p_0)\Big[\sum_{l=1}^{k_p} \gamma_{(0\nu)l}\,(1 - \max[p_{(0\nu)l}])^{-1}\Big]^{-1}, &
F_0(f_0) &= F(f_0)\Big[\sum_{l=1}^{k_f} \gamma_{(0\nu)l}\,(1 - \max[f_{(0\nu)l}])^{-1}\Big]^{-1}, \\
K_0(k_0) &= K(k_0)\Big[\sum_{l=1}^{k_k} \gamma_{(0\nu)l}\,(1 - \max[k_{(0\nu)l}])^{-1}\Big]^{-1}, &
S_0(s_0) &= S(s_0)\Big[\sum_{l=1}^{k_s} \gamma_{(0\nu)l}\,(1 - \max[s_{(0\nu)l}])^{-1}\Big]^{-1}.
\end{aligned} \tag{12}
$$

Normalization (12) is performed relative to the worst assessment, that is, the maximum value of the normalized indicator that characterizes the partial criterion of the scoring card.

Similarly, the matrices are formed and the generalized group criteria ratings that enter the matrix model of multi-criteria scoring (10) are normalized:

$$
Y(w_0) = \gamma^{P}_{(0\nu)l}\,(1 - G_{0p\nu}\Phi_{0p\nu}^{T})^{-1} + \gamma^{F}_{(0\nu)l}\,(1 - G_{0f\nu}\Phi_{0f\nu}^{T})^{-1} + \gamma^{K}_{(0\nu)l}\,(1 - G_{0k\nu}\Phi_{0k\nu}^{T})^{-1} + \gamma^{S}_{(0\nu)l}\,(1 - G_{0s\nu}\Phi_{0s\nu}^{T})^{-1} \to \min, \tag{13}
$$

$$G_{0p\nu},\, G_{0f\nu},\, G_{0k\nu},\, G_{0s\nu} = \operatorname{normalization}[\gamma_{(0\nu)l}],$$

where for $l = 1 \ldots k_p, k_f, k_k, k_s$ the matrices are the same in structure but may have different values, and

$$
\begin{aligned}
\Phi_{0p\nu} &= \operatorname{normalization}[(1 - p_{(0\nu)l})^{-1}]^{T}, & l &= 1 \ldots k_p, \\
\Phi_{0f\nu} &= \operatorname{normalization}[(1 - f_{(0\nu)l})^{-1}]^{T}, & l &= 1 \ldots k_f, \\
\Phi_{0k\nu} &= \operatorname{normalization}[(1 - k_{(0\nu)l})^{-1}]^{T}, & l &= 1 \ldots k_k, \\
\Phi_{0s\nu} &= \operatorname{normalization}[(1 - s_{(0\nu)l})^{-1}]^{T}, & l &= 1 \ldots k_s.
\end{aligned} \tag{14}
$$
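The nested scheme of (11) and (12) can be sketched in the same way: each group of indicators is first convolved on its own, the group value is scaled by the worst attainable group value, and the four scaled values are convolved once more with group weights. The sketch below is an illustration under stated assumptions, not the published code: the helper names, the sample clients, the bound `y0_worst` on normalized indicators, and the factor `1 + reserve` that keeps each group assessment strictly below 1 are all hypothetical choices made here.

```python
import numpy as np

def normalize_weights(gamma):
    """Expression (6): weights scaled to sum to 1."""
    gamma = np.asarray(gamma, dtype=float)
    return gamma / gamma.sum()

def convolve(y0, gamma0):
    """Non-linear trade-off convolution: sum of gamma0 * (1 - y0)^-1."""
    y0 = np.asarray(y0, dtype=float)
    gamma0 = np.asarray(gamma0, dtype=float)
    return float(np.sum(gamma0 / (1.0 - y0)))

def group_assessment(y0, gamma, y0_worst=0.9, reserve=0.2):
    """Group convolution scaled by the worst-case group value, as in (12).

    ASSUMPTIONS: normalized indicators never exceed y0_worst, and the
    worst-case value is enlarged by (1 + reserve) so the result stays < 1.
    """
    gamma0 = normalize_weights(gamma)
    raw = convolve(y0, gamma0)
    worst = convolve(np.full(len(gamma0), y0_worst), gamma0)
    return raw / (worst * (1.0 + reserve))

def nested_score(client, indicator_weights, group_weights):
    """Integrated criterion (11): convolution of the four group assessments."""
    groups = list(client)
    g0 = [group_assessment(client[g], indicator_weights[g]) for g in groups]
    big_gamma0 = normalize_weights([group_weights[g] for g in groups])
    return convolve(g0, big_gamma0)

# Hypothetical weights for groups P, F, K, S and their indicators.
indicator_weights = {"P": [1, 1], "F": [2, 1, 1], "K": [1, 1], "S": [1]}
group_weights = {"P": 1, "F": 2, "K": 2, "S": 1}

# Already normalized indicators (0 is ideal, values near 1 are worst).
good = {"P": [0.2, 0.3], "F": [0.1, 0.2, 0.3], "K": [0.1, 0.1], "S": [0.4]}
risky = {"P": [0.6, 0.7], "F": [0.8, 0.7, 0.9], "K": [0.9, 0.8], "S": [0.5]}

score_good = nested_score(good, indicator_weights, group_weights)
score_risky = nested_score(risky, indicator_weights, group_weights)
```

Raising a group weight in `group_weights` increases that group's influence on the integrated score without touching the weights of its individual indicators, which is the regulating effect the nested scheme is meant to provide.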
The interpretation of the obtained result involves bringing the value of the generalized assessment (9), (10), or in the form (11), (13), to a unified scale, for example from 0 (the worst rating) to 1 (the best rating). This is achieved by normalizing the generalized score to the score of an abstract worst customer according to the expression

$$I_0 = 1 - \frac{I}{\max I}, \qquad \max I = \sum_{i=1}^{5} (1 - [\max F_i])^{-1}, \tag{15}$$

where $\max F_i$ is the worst possible value of the partial indicator. A reserve coefficient is used in this normalization to avoid incorrect operations.

The obtained numerical assessment can be converted to a linguistic category of the client’s solvency according to the fundamental rating scale of Table 3.

Table 3. Fundamental rating scale

Integrated performance assessment $I_0$    Linguistic category of efficiency
1.0 – 0.7                                  High
0.7 – 0.5                                  Good
0.5 – 0.4                                  Satisfactory
0.4 – 0.2                                  Low
0.2 and less                               Unsatisfactory

The numerical value of the normalized generalized indicator (15) (see the left column of Table 3) is proportional to the probability that the client repays the loan; that is, it characterizes the risk of granting the loan.

Thus, expressions (9)–(15) form a multi-criteria mathematical model of credit scoring. The distinctions, advantages, and features of the model are as follows. The model treats the indicators of the scoring card in terms of infological connections (factors, indicators, criteria), which increases the adequacy of the banking institution's ideal client profile. The model yields a generalized client assessment as the solution to an optimization problem using a minimax approach to the requirements of the ideal image. The obtained solution is Pareto-optimal.
Subjective priorities of scoring card indicators can be taken into account in client assessment by adjusting both partial and group criteria weights. The model is structurally open to adding scoring card indicators. The proposed model does not require a priori data on loan issuance or refusal, that is, it implements an unsupervised learning scheme. It can therefore serve as a primary layer, acting as a highly accurate binary classifier, for deep learning methods based on artificial neural networks. Artificial neural networks are designed for and capable of learning from large volumes of labeled data, and the proposed multi-criteria model cannot compete with them in that setting. In the context of unsupervised learning, however, the multi-criteria model has better potential than existing approaches because of the way it formalizes the classification task. The research presented below demonstrates the model’s capability to detect fraud. The model meets the requirements for bank scoring models, which is also shown by a computational example.

ІІ. The methodology of multi-criteria credit scoring determines the sequence of actions for performing calculations and obtaining the resulting assessment. It includes the following stages:
1. Establishing a set of indicators from the scoring card (Table 1) in the form of (4).
2. Normalizing criteria (4) using expressions (6), (7).
3. Formulating the generalized client assessment, expressions (9)–(14).
4. Interpreting the generalized assessment with normalization (15) and in accordance with Table 3.

ІІІ. The technology of multi-criteria credit scoring covers the practical aspects of implementing the synthesized model (9)–(15) and its methodology in the architecture of a software system and in a specific script-based implementation. The technological processes of multi-criteria credit scoring can be represented by the structural diagram in Fig.
1, which implements the architecture of the credit scoring software script.

Fig. 1. Structural diagram of the software implementation of the mathematical model. Blocks of the diagram:
І. Preparation of input data
1.1. Parsing the input data file
1.2. Analysis of the structure of input data
1.3. File parsing and analysis of the structure of scoring indicators
1.4. Initial formation of the scoring table
1.5. Data cleaning
1.5.1. Analysis of the intersection of scoring indicators and input data segment
1.5.2. Formation of a DataFrame of data taking into account the missing indicators of the scoring table
1.5.3. Clearing the scoring table from omissions
ІІ. Formation of the scoring model
2.1. Parsing the file of indicators (criteria) of the scoring card / data
2.2. Determination of normalizing parameters
2.3. Normalization of indicators (criteria)
2.4. Integrated multicriteria assessment (SCOR)

Technological processes are divided into two blocks: preparation of input data and formation of the scoring model. The input data includes two files: scoring card indicators in “sample_data.xlsx” and indicator descriptions in “data_description.xlsx” (Fig. 2). In total, there are 121 scoring card indicators and 500 records of potential bank clients. The credit scoring technology is implemented as a script, according to the developed model and its application methodology, in the Python programming language with the NumPy and pandas libraries. The original project with the code is available at https://github.com/Pysarchuk-O/scoring.git.

Fig. 2. Structure of input data: a, scoring card (sample_data.xlsx); b, indicators of the scoring card (data_description.xlsx)

A series of preparatory stages, dictated by the specifics of the data, is applied to the input files: parsing the data files and converting them to the pandas DataFrame format (blocks 1.1, 1.3); analyzing the structure of the input data by size, data types, and presence of missing values (block 1.2); initial formation of the scoring card and analysis of its structure for compliance with the fields of Table 1 (block 1.4); cleaning the input data of gaps by rejecting records when the gaps are numerous and cannot be recovered because of the nature of the indicator (registration address, residence, place of work, etc.); selecting values for the scoring card based on the intersection of the input data (Fig. 2, a) and the indicators (Fig. 2, b), while checking that the structure of Table 1 is preserved (block 1.5).

After these classic stages, a content analysis of the indicators is carried out (also within block 1.5). It selects the objective and subjective indicators that characterize the borrower directly and are not secondary, system-assigned fields (product status, product profile ID, etc.). For the data in Fig. 2, these are the indicators marked in the scoring card fields (see Fig. 2, b) as “borrower indication” and “parameters related to the issued product”. The analysis showed that the data in Fig. 2 contain no decisions on issuing the product, that is, there is no data structure of the “training pair” type. Thus, fields related to “decision-making” (Fig. 2, b) are absent from the scoring card (Fig. 2, a). We are therefore dealing with data oriented towards an unsupervised machine learning model.

The result of the data preparation stage is a scoring card in general form, as shown in Fig. 2, a, which contains the list of indicators presented in Fig. 3. Analysis of the scoring card structure according to Fig.
2 and comparison with Table 1 confirms that the main scoring categories are present and correspond to each other.

Next, the script is deployed and the proposed multi-criteria credit scoring model is applied. This follows the formulated methodology and is reflected in the structural diagram in Fig. 1 by blocks 2.1–2.4.

The first step in running the model is parsing the indicators (criteria) file of the scoring card (block 2.1). The criterion requirements for the scoring card indicators are formulated for the values that remain after cleaning and balancing (Fig. 3). These criteria should reflect the ideal client profile and are unique to each banking institution. For example, the system of criterion requirements shown in Fig. 4 can be established; it is used in the calculation example.

Fig. 3. Scoring card indicators after data cleaning and balancing (d_segment_data_description_cleaning.xlsx)

Next, the minimax indicators listed in Fig. 4 are selected from the scoring card of Fig. 1 and the integrated multi-criteria assessment (scor-py) is formed according to model (9). The calculation results are presented in the graphs of Fig. 5.

The graphs in Fig. 5 show the unnormalized scoring values (scor axis) for each client included in the scoring card in Fig. 2, a (client axis). Fig. 5, a presents the scoring evaluation for 150 clients without abnormally high scores. Fig. 5, b and 5, c show the scoring evaluations for 250 and for all 500 clients, including abnormally high score values. The red line marks the empirical value that divides customers into creditworthy and non-creditworthy. The results shown in Fig. 5, a demonstrate a strictly binary classification of clients into these categories.
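Blocks 2.1–2.4, followed by interpretation with (15) and Table 3, might look roughly as follows in pandas. This is a hedged sketch rather than the code from the authors' repository: the column names, the four sample clients, the layout of the criteria table (`indicator`, `direction`, `weight`), the placement of the reserve coefficient, the cut-off value, and the handling of the boundaries of Table 3 (each boundary value assigned to the lower category) are all assumptions introduced for illustration.

```python
import pandas as pd

# Hypothetical scoring card: one row per client, system fields included.
card = pd.DataFrame({
    "client": [101, 102, 103, 104],
    "monthly_income": [1200.0, 2500.0, 800.0, 1500.0],
    "monthly_expenses": [700.0, 900.0, 750.0, 600.0],
    "work_experience": [3.0, 10.0, 1.0, 6.0],
    "product_status": ["new", "new", "closed", "new"],  # not a criterion
}).set_index("client")

# Hypothetical criterion requirements (the role played by Fig. 4).
criteria = pd.DataFrame({
    "indicator": ["monthly_income", "monthly_expenses", "work_experience"],
    "direction": ["max", "min", "max"],
    "weight": [2.0, 1.0, 1.0],
})

def category(i0):
    """Table 3, assuming each boundary value belongs to the lower category."""
    if i0 > 0.7:
        return "High"
    if i0 > 0.5:
        return "Good"
    if i0 > 0.4:
        return "Satisfactory"
    if i0 > 0.2:
        return "Low"
    return "Unsatisfactory"

def score_card(card, criteria, reserve=0.2):
    spec = criteria.set_index("indicator")
    cols = [c for c in spec.index if c in card.columns]   # block 2.1
    X = card[cols].dropna().astype(float)
    gamma0 = spec.loc[cols, "weight"] / spec.loc[cols, "weight"].sum()  # (6)
    parts = {}
    for c in cols:                                        # blocks 2.2, 2.3
        if spec.loc[c, "direction"] == "min":
            parts[c] = X[c] / (X[c].max() * (1.0 + reserve))
        else:
            parts[c] = X[c].min() / (X[c] * (1.0 + reserve))
    Y0 = pd.DataFrame(parts)
    # Block 2.4: convolution (9); a lower raw score is closer to the ideal.
    score = (1.0 / (1.0 - Y0)).mul(gamma0, axis="columns").sum(axis=1)
    # Abstract worst client: every indicator at its largest normalized value.
    worst = 1.0 / (1.0 - 1.0 / (1.0 + reserve))
    i0 = 1.0 - score / worst                              # first part of (15)
    return pd.DataFrame({"score": score, "I0": i0, "category": i0.map(category)})

result = score_card(card, criteria)
cutoff = 3.0  # hypothetical empirical threshold on the raw score
result["creditworthy"] = result["score"] < cutoff
```

With these toy values, client 103 has the highest raw score and falls above the assumed cut-off, while client 104 has the lowest raw score and therefore the best normalized assessment.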
Differences in the scores of creditworthy clients reflect their internal ranking, proportional to the risk of granting credit. Abnormally high scoring values in Fig. 5, b and 5, c indicate possible fraud in the data or incorrectly filled fields in the scoring card. Fraud analysis involves tracing the criteria vector back, analyzing each indicator individually and within the multi-criteria scoring as a whole.

Fig. 4. Criterion requirements for scoring card indicators (d_segment_data_description_cleaning_minimax.xlsx)

Fig. 5. Client creditworthiness assessment, scor (beginning): a, the scoring evaluation for 150 clients without abnormally high scores. Axes: Client, Scor.

The results obtained with the proposed approach were compared with discriminant analysis on a similar segment of data. In more than 80% of cases the two sets of results did not contradict each other, which indicates the effectiveness of the proposed approach.

CONCLUSIONS

A multi-criteria mathematical model of credit scoring for Data Science tasks was developed. The model forms the ideal client profile as a set of criterion requirements. The integrated assessment is formed using convolution according to a nonlinear trade-off scheme and is the solution to an optimization problem that belongs to the set of Pareto-optimal solutions. The model operates on unlabeled data, in line with the principles of unsupervised learning. It can be applied independently or serve as a foundational layer for deep learning methods. The practical application of the model is facilitated by its methodological and technological representations.
A computational example applying the model to real big data demonstrated its effectiveness at the decision-making stage of customer lending and in fraud detection.

Fig. 5 (continued). Client creditworthiness assessment, scor (axes: client, scor). b: the scoring evaluations for 250 clients with abnormally high score values; c: the scoring evaluations for 500 clients with abnormally high score values

Received 10.07.2024

INFORMATION ON THE ARTICLE

Oleksii O. Pysarchuk, ORCID: 0000-0001-5271-0248, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: PlatinumPA2212@gmail.com
Maria D. Vasylieva, ORCID: 0000-0003-2217-643X, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: mdvasilieva@gmail.com
Danylo R. Baran, ORCID: 0000-0002-3251-8897, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: danil.baran15@gmail.com
Illya O. Pysarchuk, ORCID: 0000-0003-4343-0142, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: flimka134@gmail.com

MULTI-CRITERIA MATHEMATICAL MODEL OF CREDIT SCORING IN DATA SCIENCE PROBLEMS / O.O. Pysarchuk, M.D. Vasylieva, D.R. Baran, I.O. Pysarchuk

Abstract. A multi-criteria optimization mathematical model of credit scoring is proposed. The model is derived using a nonlinear trade-off scheme to solve multi-criteria optimization problems, allowing for the construction of a Pareto-optimal solution. The proposed approach forms an integrated assessment of a borrower’s creditworthiness based on a structured set of indicators that reflect the financial, credit, and social profile of clients. The model is designed for use in intelligent CRM and ERP systems operating on Big Data and does not rely on labeled training samples, making it applicable to unsupervised learning tasks. It can also serve as a foundational layer for further deep-learning analysis. Methodological steps for implementing the model, from indicator normalization to final decision-making, are described. A technological implementation demonstrates the model’s effectiveness in automated loan decisions and fraud detection.

Keywords: Data Science, Big Data, SCORING, machine learning, decision making, multi-criteria mathematical models, intelligent CRM, ERP systems.
id journaliasakpiua-article-308786
institution System research and information technologies
keywords_txt_mv keywords
language English
last_indexed 2025-11-09T02:11:02Z
publishDate 2025
publisher The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format ojs
resource_txt_mv journaliasakpiua/1e/0654b8d1cc2e0d4998d221429122ae1e.pdf
spelling journaliasakpiua-article-3087862025-11-09T00:01:30Z Multi-criteria mathematical model of credit scoring in data science problems Багатокритеріальна математична модель кредитного скорингу в задачах data science Pysarchuk, Oleksii Vasylieva, Maria Baran, Danylo Pysarchuk, Illya Data Science Big Data SCORIG machine learning decision making multi-criteria mathematical models intelligent CRM ERP systems Data Science Big Data SCORIG машинне навчання прийняття рішень багатокритеріальні математичні моделі інтелектуальні CRM ERP системи A multi-criteria optimization mathematical model of credit scoring is proposed. The model is derived using a nonlinear trade-off scheme to solve multi-criteria optimization problems, allowing for the construction of a Pareto-optimal solution. The proposed approach forms an integrated assessment of a borrower’s creditworthiness based on a structured set of indicators that reflect the financial, credit, and social profile of clients. The model is designed for use in intelligent CRM and ERP systems operating on Big Data and does not rely on labeled training samples, making it applicable to unsupervised learning tasks. It can also serve as a foundational layer for further deep-learning analysis. Methodological steps for implementing the model, from indicator normalization to final decision-making, are described. A technological implementation demonstrates the model’s effectiveness in automated loan decisions and fraud detection. Запропоновано багатокритеріальну оптимізаційну математичну модель кредитного скорингу. Модель отримано з використанням нелінійної схеми компромісів для вирішення задач багатокритеріальної оптимізації, що дозволяє побудувати Парето-оптимальне рішення. Запропонований підхід формує інтегровану оцінку кредитоспроможності позичальника на основі структурованого набору показників, що відображають фінансовий, кредитний та соціальний профіль клієнтів. 
Модель призначено для використання в інтелектуальних CRM та ERP системах, що працюють з великими даними, і не потребує розмічених навчальних вибірок, що робить її придатною для задач навчання без учителя. Вона також може слугувати базовим рівнем для подальшого аналізу з використанням методів глибинного навчання. Описано методологічні кроки впровадження моделі, від нормалізації показників до прийняття остаточних рішень. Технологічна реалізація демонструє ефективність моделі в автоматизованому прийнятті рішень щодо кредитування і виявленні шахрайства. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-09-29 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/308786 10.20535/SRIT.2308-8893.2025.3.08 System research and information technologies; No. 3 (2025); 99-112 Системные исследования и информационные технологии; № 3 (2025); 99-112 Системні дослідження та інформаційні технології; № 3 (2025); 99-112 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/308786/331015
spellingShingle Data Science
Big Data
SCORIG машинне навчання
прийняття рішень
багатокритеріальні математичні моделі
інтелектуальні CRM
ERP системи
Pysarchuk, Oleksii
Vasylieva, Maria
Baran, Danylo
Pysarchuk, Illya
Багатокритеріальна математична модель кредитного скорингу в задачах data science
title Багатокритеріальна математична модель кредитного скорингу в задачах data science
title_alt Multi-criteria mathematical model of credit scoring in data science problems
title_full Багатокритеріальна математична модель кредитного скорингу в задачах data science
title_fullStr Багатокритеріальна математична модель кредитного скорингу в задачах data science
title_full_unstemmed Багатокритеріальна математична модель кредитного скорингу в задачах data science
title_short Багатокритеріальна математична модель кредитного скорингу в задачах data science
title_sort багатокритеріальна математична модель кредитного скорингу в задачах data science
topic Data Science
Big Data
SCORIG машинне навчання
прийняття рішень
багатокритеріальні математичні моделі
інтелектуальні CRM
ERP системи
topic_facet Data Science
Big Data
SCORIG machine learning
decision making
multi-criteria mathematical models
intelligent CRM
ERP systems
Data Science
Big Data
SCORIG машинне навчання
прийняття рішень
багатокритеріальні математичні моделі
інтелектуальні CRM
ERP системи
url https://journal.iasa.kpi.ua/article/view/308786
work_keys_str_mv AT pysarchukoleksii multicriteriamathematicalmodelofcreditscoringindatascienceproblems
AT vasylievamaria multicriteriamathematicalmodelofcreditscoringindatascienceproblems
AT barandanylo multicriteriamathematicalmodelofcreditscoringindatascienceproblems
AT pysarchukillya multicriteriamathematicalmodelofcreditscoringindatascienceproblems
AT pysarchukoleksii bagatokriteríalʹnamatematičnamodelʹkreditnogoskoringuvzadačahdatascience
AT vasylievamaria bagatokriteríalʹnamatematičnamodelʹkreditnogoskoringuvzadačahdatascience
AT barandanylo bagatokriteríalʹnamatematičnamodelʹkreditnogoskoringuvzadačahdatascience
AT pysarchukillya bagatokriteríalʹnamatematičnamodelʹkreditnogoskoringuvzadačahdatascience