Analysis of actuarial risks with generalized linear models

The problem of applying generalized linear models to the analysis of actuarial risks in the context of premium charges to clients was considered. The Monte-Carlo method for Markov chains was applied. Two situations were considered for the computational experiment. For the first one, insurance indica...

Bibliographic details
Date: 2025
Main authors: Panibratov, Roman; Bidyuk, Petro
Format: Article
Language: English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025
Online access: https://journal.iasa.kpi.ua/article/view/351421
Journal title: System research and information technologies

Institution

System research and information technologies
_version_ 1867334455850434560
author Panibratov, Roman
Bidyuk, Petro
author_facet Panibratov, Roman
Bidyuk, Petro
author_institution_txt_mv [ { "author": "Roman Panibratov", "institution": "National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”" }, { "author": "Petro Bidyuk", "institution": "National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv" } ]
author_sort Panibratov, Roman
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2026-02-02T20:49:24Z
description The problem of applying generalized linear models to the analysis of actuarial risks in the context of premium charges to clients was considered. The Markov chain Monte Carlo method was applied. Two situations were considered for the computational experiment. In the first, insurance indicators and the target variable were randomly generated because of limited public access to data. To create three datasets, charges were generated from normal, gamma, and Pareto distributions with dynamic variance, and noise was added to simulate a non-stationary process. In the second situation, actual actuarial data from the Singapore Actuarial Society was used. Generalized linear models with a normal distribution and logarithmic link function, an exponential distribution and logarithmic link function, and a Laplace distribution with identity link function were constructed. Based on the model-fitting quality metrics, conclusions were drawn about their structure.
doi_str_mv 10.20535/SRIT.2308-8893.2025.4.04
first_indexed 2026-02-08T08:06:11Z
format Article
fulltext © R.S. Panibratov, P.I. Bidyuk, 2025. ISSN 1681–6048 System Research & Information Technologies, 2025, № 4

METHODS OF ANALYSIS AND CONTROL OF SYSTEMS UNDER CONDITIONS OF RISK AND UNCERTAINTY

UDC 004.852
DOI: 10.20535/SRIT.2308-8893.2025.4.04

ANALYSIS OF ACTUARIAL RISK WITH GENERALIZED LINEAR MODELS

R.S. PANIBRATOV, P.I. BIDYUK

Abstract. The problem of applying generalized linear models to the analysis of actuarial risks in the context of premium charges to clients was considered. The Markov chain Monte Carlo method was applied. Two situations were considered for the computational experiment. In the first, insurance indicators and the target variable were randomly generated because of limited public access to data. To create three datasets, charges were generated from normal, gamma, and Pareto distributions with dynamic variance, and noise was added to simulate a non-stationary process. In the second situation, actual actuarial data from the Singapore Actuarial Society was used. Generalized linear models with a normal distribution and logarithmic link function, an exponential distribution and logarithmic link function, and a Laplace distribution with identity link function were constructed. Based on the model-fitting quality metrics, conclusions were drawn about their structure.

Keywords: actuarial risk, generalized linear models, simulation modeling, exponential family of distributions, Bayesian data analysis, Markov chain Monte Carlo method.

INTRODUCTION

Since insurance protects people and organizations financially against a variety of risks, it is seen as a fundamental component of the economy. Because it helps manage and reduce the risks involved in providing insurance to both consumers and businesses, actuarial science is essential to the insurance sector. A thorough understanding of mathematics, statistics, finance, and economics is necessary to work as an actuary.
Actuaries apply their knowledge to help insurance companies estimate the cost of possible risks and the probability of future events.

The insurance sector is essential for reducing risks and minimizing the financial impact of unpredictable events. The frequency and timing of these events, however, cannot be predicted. Actuarial risk, that is, the likelihood of an event happening and the financial impact it may have, is a key quantity that insurance companies use to protect themselves from financial catastrophe. Because assessing actuarial risk is a complicated process that calls for specific knowledge and skills, actuaries are important to the insurance sector. Actuarial risk is fundamentally about estimating the probability of an unfavorable event and its possible financial consequences. Actuaries analyze data and forecast the probability of an event using complex mathematical models. They then use these forecasts to estimate the event's financial effect and compute the premium needed to cover the risk. The business of insurance companies is risk management. Actuaries are essential in helping insurance firms determine how much risk they can accept while maintaining their financial stability. They do this by examining historical data and applying statistical techniques to forecast the probability that comparable events will occur in the future.

The insurance business uses the Generalized Linear Model (GLM), a statistical technique, to calculate insurance policy prices. Generalized linear models are used to analyze and forecast the expected cost of claims based on different risk indicators related to the insured entities.
Compared to simpler linear models, these models offer a more flexible and precise pricing mechanism by allowing actuaries and analysts to include various data types and variable relationships, such as a linear or exponential relationship between risk factors and claim costs.

Linear models are a special case of the broader class of GLMs. GLMs relax the assumptions of normality, constant variance, and additivity of effects that restrict linear models. Instead, it is assumed that the response variable belongs to the exponential family of distributions, which has the following structure [1]:

$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\},$$

where $a_i(\phi)$, $b(\theta_i)$ and $c(y_i, \phi)$ are functions specified in advance; $\theta_i$ is a parameter related to the mean; $\phi$ is a parameter related to the variance.

Additionally, the variance is allowed to change with the mean of the distribution. Lastly, the effects of the variables on the response are assumed to be additive on a transformed scale [2].

For GLM, the following assumptions are made:
1. Stochastic component: every component of $Y$ is independent and comes from a single exponential family distribution.
2. Systematic component: the linear predictor $\eta$ is formed from $p$ explanatory variables: $\eta = X\beta$, where $X$ is the design matrix and $\beta$ is the vector of parameters to be estimated.
3. Link function: the relationship between the stochastic and systematic components is defined by the link function $g$, which is monotonic and differentiable: $E[Y] = g^{-1}(\eta)$.

Problem Statement. The purpose of the study is to apply GLMs to the analysis of actuarial risks using different distributions and specified link functions, after first applying Bayesian data analysis.

IMPORTANCE OF GLM

GLMs are also used in insurance pricing because they offer a versatile framework for modeling the link between the response variable (such as the frequency or cost of a claim) and one or more predictors (such as age, vehicle type or geographic area).

The authors of [3] emphasized that non-robustness against outliers is an important consideration in statistical studies with GLMs. They also showed that few reliable alternatives exist, particularly for Bayesian statistical analysis. Focusing on the gamma GLM, a widely used tool in actuarial science, they put forward a robust and efficient model-based method that can be applied in both frequentist and Bayesian studies. The suggested model can be estimated easily, at least on small to moderate-sized datasets, and is simple to analyze and interpret.

The authors of [4] presented a new deep learning technique called Deeply-learned Generalized Linear Model with Missing Data (DLGLM), which can make predictions and estimate coefficients even when data are missing not at random (MNAR). DLGLM uses a deep neural network architecture to model the generation of the data matrix and the relationships between the response variable and the mask of missing values. In this way they generalized the conventional GLM, taking into account both ignorable and non-ignorable types of missing values as well as intricate nonlinear relationships between the features. Through simulations and analyses of actual data, the authors also showed that DLGLM outperforms alternative impute-then-regress techniques, such as mean and MICE imputation, in coefficient estimation and prediction when MNAR missing values are present.

The problem of GLM transfer learning was studied in [5].
The authors proposed GLM transfer learning algorithms and derived bounds for the estimation error and a prediction error measure, with fast and slow rates under various scenarios. To construct confidence intervals for each coefficient component with theoretical guarantees, they considered a two-step transfer learning approach. Finally, they used simulations and a real-data study to show how effective their algorithms were.

In the context of claim count modeling, the authors of [6] suggested a method for identifying the next-best interaction to be added to an arbitrary but fixed benchmark GLM. First, they trained a combined actuarial neural network (CANN) model, which is essentially a neural network that improves the benchmark GLM. Second, they quantified the strength of interaction between each pair of features using a fast model-specific technique called Neural Interaction Detection and ranked the interactions by strength. Third, they compared a few small GLMs that included the top-ranked interactions to determine the next-best interaction. This technique offers two benefits. First, it is a fully automatable method of adding the next-best interaction that is absent from the benchmark GLM. Second, the authors' methodology is faster than alternative strategies based on Friedman's H-statistic. As a result, the proposed technique is particularly well suited to very large datasets containing millions of observations and dozens of attributes. Consequently, it can significantly reduce the time that pricing actuaries spend looking for interactions to enhance their GLMs, which is often a time-consuming and visual process.

It was demonstrated in [7] that GLM is the best choice for estimation of operational risk. This approach demonstrated excellent risk estimation quality with minimal errors.
Alternative methods of estimating the parameters of GLM were analyzed in [8].

MARKOV CHAIN MONTE CARLO METHOD

Finding the posterior distribution is the primary objective of Bayesian data analysis:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)},$$

where $X$ is the state space vector; $\theta$ is a parameter of the distribution; $P(X \mid \theta)$ is the likelihood; $P(\theta)$ is the prior; $P(X)$ is a normalizing constant, also known as the evidence or marginal likelihood.

The denominator can be expressed as follows:

$$P(X) = \int P(X \mid \theta^{*})\, P(\theta^{*})\, d\theta^{*}.$$

Evaluating the integral in the denominator is the main computational difficulty. Markov Chain Monte Carlo (MCMC) is the most significant of the Monte Carlo techniques that may be employed.

MCMC is a method that uses a Markov chain mechanism to generate samples $x^{(i)}$ while exploring the state space $X$. The purpose of this technique is to make the chain spend more time in the most important regions [9]. It is specifically designed so that the samples $x^{(i)}$ resemble samples drawn from the desired distribution $p(x)$.

Monte Carlo is a method for approximating a desired quantity by sampling from a probability distribution. It estimates a deterministic quantity of interest using randomization. The Monte Carlo approach approximates such quantities by averaging over samples. For example, an expectation $s$ to be estimated may be an extremely complicated integral, or even impossible to evaluate exactly:

$$s = \int p(x) f(x)\, dx = E_p[f(x)], \qquad \tilde{s}_n = \frac{1}{n} \sum_{i=1}^{n} f(x^{(i)}),$$

where $p(x)$ is the probability density function and $f(x)$ is the function whose expectation is estimated.

Averaging over a large number of samples decreases the standard error and yields a reasonably good estimate. One drawback of this approach is the assumption that sampling from the probability distribution is simple, which is not always the case. In many cases, sampling from the distribution directly is not feasible at all. In these situations, Markov chains are used to sample efficiently from an intractable probability distribution.

MCMC techniques work like ordinary Monte Carlo methods with one modification: the produced draws $x_1, \dots, x_n$ are serially correlated rather than independent. Specifically, they are realizations of a Markov chain consisting of $n$ random variables $X_1, \dots, X_n$. A random sequence $\{X_i\}$ is a Markov chain if and only if, for all positive integers $k$ and $n$, the future observations $X_{i+n}$ are conditionally independent of the previous values $X_{i-k}$ given the present value $X_i$:

$$P(X_{i+n} = x \mid X_i, X_{i-1}, \dots, X_{i-k}) = P(X_{i+n} = x \mid X_i).$$

This condition, sometimes referred to as the Markov property, indicates that the process is memoryless: the probability distribution of the chain's future values depends only on its present value $X_i$, regardless of how that value was reached (e.g. the chain's previous transition).

Although MCMC comes in a variety of flavors, the Metropolis–Hastings random walk algorithm is the easiest to implement. Applying the Metropolis–Hastings algorithm requires the standard uniform distribution, a proposal distribution $p(x)$ and the target distribution.

Given an initial guess for $\theta$ that has a positive probability of being drawn, the algorithm works as follows.
1. Select a new proposed value $\theta_p = \theta + \Delta\theta$, where $\Delta\theta$ follows a specified transition distribution (for example, normal).
2. Calculate the ratio

$$\rho = \frac{g(\theta_p \mid X)}{g(\theta \mid X)},$$

where $g$ is the posterior probability.
3. To preserve the detailed balance of the stationary distribution when the proposal distribution is not symmetric, the acceptance probability must be weighted and is then calculated as

$$\rho = \frac{g(\theta_p \mid X)\, p(\theta \mid \theta_p)}{g(\theta \mid X)\, p(\theta_p \mid \theta)}.$$
Because only a ratio is taken, the normalizing constant cancels, so any function proportional to $g$ may be used instead:

$$\rho = \frac{p(X \mid \theta_p)\, p(\theta_p)}{p(X \mid \theta)\, p(\theta)}.$$

4. If $\rho \ge 1$, then set $\theta = \theta_p$. If $\rho < 1$, then set $\theta = \theta_p$ with probability $\rho$; otherwise keep $\theta$ unchanged. This is where the standard uniform distribution is used.
5. Repeat the previous steps.

The authors of [10] showed that MCMC approaches are useful in a wide range of applications. However, because MCMC methods are approximate and random, their results may deviate from the correct ones. Because no guarantee can be provided, MCMC should be used only in extreme cases and only when there are no other options. Performance may also be improved by dynamically adapting the parameters, especially the covariance matrix, as they change over time, without changing the distribution. Furthermore, for low correlations in higher dimensions, other modifications to Metropolis–Hastings are needed.

The authors of [11] presented a two-parameter Poisson–Rayleigh model, also known as the PR distribution, and derived a number of its properties. The parameters of the PR distribution were estimated using Bayesian methods, maximum likelihood, and maximum product spacing. For Bayesian estimation under a symmetric loss function, point and interval estimates were approximated with the MCMC approach. A Bayesian estimator based on gamma priors was proposed.

New diagnostics for evaluating the efficiency, reliability, and flexibility of MCMC algorithms using control and attainment maps were presented in [12]. The results of these diagnostics may shorten the time needed for hyper-parameter tuning.
The diagnostics themselves can be carried out on computationally reasonable test problems with known posteriors, as demonstrated in [12], but they require a non-trivial computational experiment. The results of these diagnostics may be used to determine the best algorithm and matching hyper-parameter setup for calibrating a real-world problem that is more computationally demanding and shares traits with the test problems. The convergence of that particular search procedure may then be evaluated by applying existing MCMC diagnostics to the single calibration run of the real-world problem.

To increase the effectiveness of posterior exploration with MCMC techniques, a Kalman-inspired proposal distribution was presented in [13]. Similar to the analysis step of the Kalman filter, this proposal distribution creates candidate states by making use of the cross-covariances of model parameters, measurements, and model outputs. The asymmetric nature of the Kalman-inspired proposal distribution limits its application to a brief burn-in period, after which the chains are evolved using a combination of parallel direction and snooker candidate states. Thanks to diminishing adaptation, the sampled chains converge to the exact target distribution. The new proposal distribution can easily be included in any suitable MCMC technique and is not restricted to a particular MCMC methodology.

The authors of [14] investigated the convergence rates of Metropolis–Hastings Markov chains. Qualitative convergence rates can ensure the validity of the appropriate central limit theorems for Markov chains. Explicit convergence rates help explain how growing dimension, data size, and other variables affect the efficiency of these algorithms. However, a significant amount of work is still needed in this field, since explicit quantitative convergence rates are difficult to establish and remain elusive in many situations of interest.
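The Metropolis–Hastings steps listed earlier can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the toy target (an exponential likelihood with rate `exp(theta)` and a standard normal prior on `theta`), the function names, the step size, the seed, and the burn-in length are all assumptions made for the example. Because the normal random-walk proposal is symmetric, the proposal terms in the acceptance ratio cancel.

```python
import math
import random


def log_posterior(theta, n_obs, total):
    """Unnormalized log-posterior of theta = log(rate).

    Toy assumption: exponential likelihood with rate exp(theta)
    and a standard normal prior on theta.
    """
    log_likelihood = n_obs * theta - math.exp(theta) * total
    log_prior = -0.5 * theta * theta
    return log_likelihood + log_prior


def metropolis_hastings(log_post, theta0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric normal proposal."""
    theta = theta0
    current = log_post(theta)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Step 1: propose theta_p = theta + delta, delta ~ Normal(0, step).
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_post(proposal)
        # Steps 2-3: posterior ratio, capped at 1; computed on the log
        # scale for numerical stability. Proposal terms cancel (symmetry).
        rho = math.exp(min(0.0, candidate - current))
        # Step 4: accept the proposal with probability rho (uniform draw).
        if rng.random() < rho:
            theta, current = proposal, candidate
            accepted += 1
        # Step 5: record the current state and repeat.
        samples.append(theta)
    return samples, accepted / n_samples


rng = random.Random(0)                                 # hypothetical seed
data = [rng.expovariate(2.0) for _ in range(200)]      # synthetic data, true rate 2
n_obs, total = len(data), sum(data)

samples, acceptance_rate = metropolis_hastings(
    lambda t: log_posterior(t, n_obs, total),
    theta0=0.0, n_samples=5000, step=0.2, rng=rng,
)
kept = samples[1000:]                                  # drop burn-in draws
posterior_mean_rate = sum(math.exp(t) for t in kept) / len(kept)
```

With these settings the posterior mean of the rate should land close to the value 2 used to generate the synthetic data; in practice the step size would be tuned so that the acceptance rate is neither close to 0 nor close to 1.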
Convergence-rate results of this kind are crucial for understanding the behavior of Metropolis–Hastings in modern problems that involve large amounts of data, many dimensions, or both.

NUMERICAL EXPERIMENT WITH ARTIFICIAL DATA

Because actuarial data are not always available, the first experiment used artificially simulated actuarial insurance data with the following structure. Three datasets were created for the experiment. The following features were used to imitate policyholder data:
1. Age: numerical variable showing the client's age, ranging between 19 and 64.
2. Sex: categorical variable identifying the client's sex, with states ‘M’ for male and ‘F’ for female.
3. BMI: numerical variable showing the client's body mass index. A uniform distribution was used for generation.
4. Region: categorical variable showing the client's place of residence, with states ‘A’, ‘B’, ‘C’ and ‘D’.
5. Medical History: categorical variable identifying the client's history of previous illnesses, with states ‘Diabetes’, ‘High blood pressure’ or ‘None’.
6. Exercise: categorical variable showing whether the client exercises, with states ‘Always’, ‘Rarely’ or ‘Never’.
7. Worker Status: categorical variable showing the client's working status, with states ‘Employed’, ‘Student’ and ‘Unemployed’.
8. Charges: numerical variable showing the total charges by the insurance company. This is the target variable.

For the last feature, the following three distributions were used:
• Normal;
• Gamma;
• Pareto.

To make the charges a non-stationary process, a mixture-distribution algorithm was applied, consisting of the following steps:
1. Generate a random variable $p$ with uniform distribution, $p \sim U(0, 1)$.
2. If $p \in \left[ \sum_{i=1}^{k-1} p_i, \; \sum_{i=1}^{k} p_i \right)$, where $p_i$ is the weight of the $i$-th mixture component, generate a value from the chosen distribution with a fixed location (centre) parameter and a randomly generated scale parameter.
3. Repeat until the required dataset size is reached.

After the target variable was generated, noise with zero mean and variable standard deviation was added.

Three GLMs with specified link functions were implemented for forecasting:
1. GLM with normal distribution and logarithmic link function.
2. GLM with exponential distribution and logarithmic link function.
3. GLM with Laplace distribution and identity link function.

After the GLMs were fitted with the MCMC method, the following model quality metrics were used:
• Logarithm of the maximized value of the likelihood function.
• Akaike information criterion (AIC):

$$AIC = 2k - 2\ln(\tilde{L}),$$

where $\tilde{L}$ is the maximized value of the likelihood function; $k$ is the number of estimated parameters.
• Bayesian information criterion (BIC):

$$BIC = k\ln(n) - 2\ln(\tilde{L}),$$

where $\tilde{L}$ is the maximized value of the likelihood function; $k$ is the number of estimated parameters; $n$ is the number of data points.

The metric results of GLM parameter estimation for the three datasets are shown in Tables 1–3.

Table 1. Results of GLM construction using simulated actuarial insurance data, where charges have normal distribution

| Metric | GLM Normal | GLM Exponential | GLM Laplace |
|---|---|---|---|
| Log-Likelihood | 1.248 | 2.552 | 1.943 |
| AIC | 15.503 | 10.896 | 14.113 |
| BIC | 56.464 | 47.305 | 55.073 |

Table 2. Results of GLM construction for simulated actuarial insurance data, where claim payments have gamma distribution

| Metric | GLM Normal | GLM Exponential | GLM Laplace |
|---|---|---|---|
| Log-Likelihood | 1.16 | 3.305 | 2.232 |
| AIC | 15.68 | 9.39 | 13.535 |
| BIC | 56.64 | 35.798 | 54.495 |

Table 3. Results of GLM construction for simulated actuarial insurance data, where claim payments have Pareto distribution

| Metric | GLM Normal | GLM Exponential | GLM Laplace |
|---|---|---|---|
| Log-Likelihood | 0.261 | 1.678 | 0.641 |
| AIC | 17.479 | 12.644 | 16.718 |
| BIC | 58.439 | 49.053 | 57.677 |

The fitting results show that the GLM with exponential distribution and log link function gave the best results for all datasets. The GLM with Laplace distribution and identity link function also showed acceptable results for the dataset with normally distributed charges.

The forecasting results of the best GLMs for the different datasets are shown in Figs. 1–4.

Fig. 1. Result of forecasting GLM with exponential distribution and log link function for charges, which have normal distribution
Fig. 2. Result of forecasting GLM with Laplace distribution and identity link function for charges, which have normal distribution
Fig. 3. Result of forecasting GLM with exponential distribution and log link function for charges, which have gamma distribution
Fig. 4. Result of forecasting GLM with exponential distribution and log link function for charges, which have Pareto distribution

Tables 4–7 show numerical summaries of the posterior parameter estimates for the best GLMs on the different datasets, including the mean, the standard deviation and the bounds of the highest density interval (3% and 97%).

Table 4. Numerical characteristics of posterior parameter estimates for exponential GLM and charges with normal distribution

| Parameter | Mean | Std | HDI-3% | HDI-97% |
|---|---|---|---|---|
| Intercept | -1.986 | 0.147 | -2.263 | -1.722 |
| Age | 0.047 | 0.124 | -0.192 | 0.283 |
| Sex | -0.035 | 0.076 | -0.181 | 0.097 |
| BMI | 0.071 | 0.131 | -0.149 | 0.338 |
| Region | -0.003 | 0.033 | -0.066 | 0.056 |
| MedHistory | 0.001 | 0.047 | -0.079 | 0.092 |
| Exercise | -0.045 | 0.048 | -0.131 | 0.038 |
| WorkerStatus | 0.000 | 0.046 | -0.089 | 0.088 |

Table 5. Numerical characteristics of posterior parameter estimates for Laplace GLM and charges with normal distribution

| Parameter | Mean | Std | HDI-3% | HDI-97% |
|---|---|---|---|---|
| b | 0.082 | 0.003 | 0.077 | 0.088 |
| Intercept | 0.087 | 0.012 | 0.062 | 0.109 |
| Age | 0.019 | 0.012 | -0.003 | 0.043 |
| Sex | -0.012 | 0.007 | -0.025 | 0.001 |
| BMI | 0.015 | 0.012 | -0.007 | 0.038 |
| Region | -0.000 | 0.003 | -0.006 | 0.005 |
| MedHistory | -0.007 | 0.004 | -0.015 | 0.000 |
| Exercise | -0.000 | 0.004 | -0.009 | 0.007 |
| WorkerStatus | 0.003 | 0.004 | -0.005 | 0.010 |

Table 6. Numerical characteristics of posterior parameter estimates for exponential GLM and charges with gamma distribution

| Parameter | Mean | Std | HDI-3% | HDI-97% |
|---|---|---|---|---|
| Intercept | -2.975 | 0.157 | -3.266 | -2.659 |
| Age | -0.091 | 0.136 | -0.335 | 0.17 |
| Sex | -0.062 | 0.077 | -0.203 | 0.083 |
| BMI | 0.253 | 0.139 | -0.014 | 0.498 |
| Region | 0.150 | 0.036 | 0.078 | 0.212 |
| MedHistory | -0.001 | 0.049 | -0.087 | 0.1 |
| Exercise | 0.094 | 0.047 | 0.008 | 0.185 |
| WorkerStatus | -0.06 | 0.046 | -0.142 | 0.027 |

Table 7. Numerical characteristics of posterior parameter estimates for exponential GLM for claim payments with Pareto distribution

| Parameter | Mean | Std | HDI-3% | HDI-97% |
|---|---|---|---|---|
| Intercept | -1.034 | 0.152 | -1.326 | -0.774 |
| Age | 0.013 | 0.13 | -0.218 | 0.271 |
| Sex | 0.069 | 0.075 | -0.061 | 0.212 |
| BMI | -0.174 | 0.135 | -0.42 | 0.093 |
| Region | 0.067 | 0.037 | -0.004 | 0.136 |
| MedHistory | -0.019 | 0.047 | -0.114 | 0.059 |
| Exercise | -0.087 | 0.047 | -0.17 | 0.002 |
| WorkerStatus | 0.038 | 0.046 | -0.044 | 0.121 |

NUMERICAL EXPERIMENT WITH ACTUAL DATA

For this scenario, actual actuarial data from an insurance company were used to fit the GLMs.
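The AIC and BIC formulas used as quality metrics in both experiments are simple functions of the maximized log-likelihood, and a short Python sketch makes the arithmetic concrete. The log-likelihood value below is the one reported for the normal GLM in Table 1; the number of parameters `k = 9` and the sample size `n = 700` are not stated in the text and are assumptions chosen here only because they are consistent with the reported AIC and BIC values. The function names are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


log_lik_normal = 1.248   # Table 1, GLM Normal, Log-Likelihood
k_assumed = 9            # assumption: number of estimated parameters
n_assumed = 700          # assumption: number of data points

aic_normal = aic(log_lik_normal, k_assumed)
bic_normal = bic(log_lik_normal, k_assumed, n_assumed)
```

Under these assumptions the sketch gives values of about 15.50 and 56.46, which agree with the AIC and BIC reported for the normal GLM in Table 1 up to rounding. Lower values of either criterion indicate a better trade-off between fit and model size, and BIC penalizes each extra parameter more heavily than AIC whenever ln(n) exceeds 2.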
The dataset was taken from the Singapore Actuarial Society. All workers' compensation insurance policies in this dataset have experienced an accident. The following features were used:
1. Age.
2. Sex.
3. MaritalStatus: categorical variable identifying the client's marital status.
4. DependentChildren: numerical variable showing the number of dependent children.
5. DependentOthers: numerical variable showing the number of dependents other than children.
6. WeeklyWages: numerical variable showing the total weekly wage.
7. PartFullTime: categorical variable showing the working mode.
8. HoursWorkedPerWeek: numerical variable showing the total hours worked per week.
9. DaysWorkedPerWeek: numerical variable showing the number of days worked per week.
10. UltimateIncurredClaimCost: numerical variable showing the total claim payments by the insurance company. This is the target variable.

The results of fitting the same GLMs as in the previous experiment are shown in Table 8.

Table 8. Results of GLM construction using actual insurance actuarial data

| Metric | GLM Normal | GLM Exponential | GLM Laplace |
|---|---|---|---|
| Log-Likelihood | 2.155 | 6.231 | 3.593 |
| AIC | 17.691 | 7.538 | 14.815 |
| BIC | 67.753 | 53.048 | 64.877 |

The exponential GLM with logarithmic link gave the best results among the three models for the real dataset. The forecasting results of the best GLM for the real dataset are shown in Fig. 5.

Fig. 5. Result of forecasting GLM with exponential distribution and log link function for actual actuarial insurance data

Table 9 shows numerical summaries of the posterior parameter estimates for the best GLM.

Table 9. Numerical characteristics of posterior parameter estimates for exponential GLM and claim payments from real dataset

| Parameter | Mean | Std | HDI-3% | HDI-97% |
|---|---|---|---|---|
| Intercept | -4.051 | 0.246 | -4.485 | -3.594 |
| Age | 0.934 | 0.191 | 0.577 | 1.266 |
| Sex | -0.326 | 0.1 | -0.515 | -0.142 |
| MaritalStatus | -0.346 | 0.064 | -0.467 | -0.229 |
| DependentChildren | 3.440 | 0.502 | 2.521 | 4.403 |
| DependentOthers | -0.232 | 0.505 | -1.064 | 0.776 |
| WeeklyWages | 4.722 | 0.459 | 3.882 | 5.582 |
| PartFullTime | -0.217 | 0.199 | -0.579 | 0.146 |
| HoursWorkedPerWeek | 0.986 | 0.497 | 0.1 | 1.964 |
| DaysWorkedPerWeek | -1.799 | 0.572 | -2.864 | -0.759 |

CONCLUSIONS

The application of GLMs to the analysis of actuarial risks in the context of client claim payments was considered. The MCMC method was used to estimate the model parameters. Because actuarial insurance data are frequently not made public, the insurance indicators and the target variable were created artificially: age, sex, BMI, region, medical history, exercise, worker status and charges. The charges were generated with a mixture-distribution algorithm using normal, gamma and Pareto distributions, with added Gaussian noise that had zero mean and variable standard deviation to create a non-stationary process. Real actuarial insurance data from the Singapore Actuarial Society were also used in the experiments. Three GLMs were implemented: normal with logarithmic link function, exponential with logarithmic link function, and Laplace with identity link function. The experiments indicate that the exponential GLM generally produced the best results for both artificial and real data. For normally distributed charges, the Laplace GLM also produced acceptable results on artificial data.

In future studies it is planned to automate the process of insurance data analysis using artificial intelligence and simulation techniques.
Since most financial processes are non-linear and non-stationary, a methodology for constructing such models will be proposed. It is also planned to apply methods for generating alternative managerial decisions using a Bayesian approach to the analysis of data and expert estimates.

Received 09.01.2025

INFORMATION ON THE ARTICLE

Roman S. Panibratov, ORCID: 0000-0002-8604-4420, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: roman.panibratov@gmail.com

Petro I. Bidyuk, ORCID: 0000-0002-7421-3565, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: pbidyuke_00@ukr.net

ANALYSIS OF ACTUARIAL RISK WITH GENERALIZED LINEAR MODELS / R.S. Panibratov, P.I. Bidyuk

Abstract.
The problem of constructing generalized linear models for the analysis of actuarial risks in the setting of premium payments to clients is considered. The Markov chain Monte Carlo (MCMC) method is applied for this purpose. Two situations are considered in the study. In the first, the insurance indicators and the target variable were assigned randomly because of limited public access to data. To create three datasets, payments were generated from the normal, gamma, and Pareto distributions with variable variance and added noise to simulate a non-stationary process. In the second, real actuarial data from the Singapore Actuarial Society were used. Generalized linear models were constructed with a normal distribution and logarithmic link function, an exponential distribution and logarithmic link function, and a Laplace distribution with identity link function. Conclusions about their structure were drawn from the model-fitting quality metrics.

Keywords: actuarial risk, generalized linear models, simulation modeling, exponential family of distributions, Bayesian data analysis, Monte Carlo method for Markov chains.
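The kind of posterior summary reported in the article for the exponential GLM (mean, standard deviation, HDI bounds) can be illustrated with a minimal random-walk Metropolis sketch in Python. This is not the authors' implementation: the data are simulated, the model has a single hypothetical covariate, and the normal priors with standard deviation 10, the proposal step size, and the 94% interval (suggested by the HDI-3% and HDI-97% column labels) are all assumptions.

```python
import math
import random
import statistics


def simulate(n, beta, seed=0):
    """Simulate (x, y) with y exponential and log E[y] = beta[0] + beta[1] * x."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    # expovariate takes the rate, which is 1 / mean = exp(-eta)
    ys = [rng.expovariate(math.exp(-(beta[0] + beta[1] * x))) for x in xs]
    return xs, ys


def log_posterior(beta, xs, ys, prior_std=10.0):
    """Log posterior (up to a constant) of an exponential GLM with log link."""
    lp = -sum(b * b for b in beta) / (2.0 * prior_std ** 2)  # assumed normal priors
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x          # linear predictor = log of the mean
        lp += -eta - y * math.exp(-eta)      # exponential log-density, mean exp(eta)
    return lp


def metropolis(xs, ys, n_iter=6000, burn_in=1000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys)
    draws = []
    for i in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(list(beta))
    return draws


def hdi(samples, prob=0.94):
    """Narrowest interval of sorted samples covering the fraction prob."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(prob * n))          # number of points inside the interval
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]


def summarize(draws):
    """Return (mean, std, hdi_low, hdi_high) for each coefficient."""
    rows = []
    for j in range(len(draws[0])):
        column = [d[j] for d in draws]
        low, high = hdi(column)
        rows.append((statistics.fmean(column), statistics.stdev(column), low, high))
    return rows


if __name__ == "__main__":
    xs, ys = simulate(200, (1.0, 0.5), seed=1)
    for name, row in zip(("Intercept", "x"), summarize(metropolis(xs, ys, seed=2))):
        print(name, *(round(v, 3) for v in row))
```

Under the log link each coefficient acts multiplicatively on the expected payment, so a coefficient whose interval excludes zero corresponds to a multiplicative effect different from one. A production analysis would normally use several chains and convergence diagnostics, which this sketch omits.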
id journaliasakpiua-article-351421
institution System research and information technologies
keywords_txt_mv keywords
language English
last_indexed 2026-02-08T08:06:11Z
publishDate 2025
publisher The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format ojs
resource_txt_mv journaliasakpiua/b1/d7118dca98c2e7d45b26f7c07cd883b1.pdf
spelling journaliasakpiua-article-3514212026-02-02T20:49:24Z Analysis of actuarial risk with generalized linear models Аналіз актуарних ризиків за допомогою узагальнених лінійних моделей Panibratov, Roman Bidyuk, Petro актуарний ризик узагальнені лінійні моделі імітаційне моделювання експоненційна множина розподілів Байєсівський аналіз даних метод Монте-Карло для Марківських ланцюгів actuarial risk generalized linear models simulation modeling exponential family of distributions Bayesian data analysis Monte Carlo method for Markov chains The problem of applying generalized linear models to the analysis of actuarial risks in the context of premium charges to clients was considered. The Monte-Carlo method for Markov chains was applied. Two situations were considered for the computational experiment. For the first one, insurance indicators and the target variable were randomly assigned due to the problem of public data access. To create three datasets, charges were generated from normal, gamma, and Pareto distributions with dynamic variance, and noise was added to simulate a non-stationary process. In the second situation, actual actuarial data from the Singapore Actuarial Society was used. Generalized Linear Models with normal distribution and logarithmic link function, an exponential distribution and logarithmic link function, and Laplace distribution with identity link function were constructed. Based on the model-fitting quality metrics, conclusions were drawn about their structure. Розглянуто задачу побудови узагальнених лінійних моделей для аналізу актуарних ризиків із ситуацією виплат премій клієнтам. Для цього застосовано метод Монте-Карло для Марківських ланцюгів. Для дослідження розглянуто дві ситуації. У першій ситуації страхові показники та цільова змінна налаштовувалися випадковим чином через проблему вільного доступу до даних. 
Для створення трьох наборів даних виплати генерувалися за допомогою нормального, гамма та розподілу Парето зі змінною дисперсією та додаванням шуму для імітації нестаціонарного процесу. У другій ситуації використано реальні актуарні дані, узяті з Singapore Actuarial Society. Побудовано узагальнені лінійні моделі з нормальним розподілом із логарифмічною функцією зв’язку, експоненційним розподілом із логарифмічною функцією зв’язку і розподіл Лапласа з тотожною функцією зв’язку. За метриками якості побудови моделей зроблено висновки щодо їх структури. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2025-12-29 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/351421 10.20535/SRIT.2308-8893.2025.4.04 System research and information technologies; No. 4 (2025); 58-70 Системные исследования и информационные технологии; № 4 (2025); 58-70 Системні дослідження та інформаційні технології; № 4 (2025); 58-70 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/351421/338436
spellingShingle актуарний ризик
узагальнені лінійні моделі
імітаційне моделювання
експоненційна множина розподілів
Байєсівський аналіз даних
метод Монте-Карло для Марківських ланцюгів
Panibratov, Roman
Bidyuk, Petro
Аналіз актуарних ризиків за допомогою узагальнених лінійних моделей
title Аналіз актуарних ризиків за допомогою узагальнених лінійних моделей
title_alt Analysis of actuarial risk with generalized linear models
title_full Аналіз актуарних ризиків за допомогою узагальнених лінійних моделей
title_fullStr Аналіз актуарних ризиків за допомогою узагальнених лінійних моделей
title_full_unstemmed Аналіз актуарних ризиків за допомогою узагальнених лінійних моделей
title_short Аналіз актуарних ризиків за допомогою узагальнених лінійних моделей
title_sort аналіз актуарних ризиків за допомогою узагальнених лінійних моделей
topic актуарний ризик
узагальнені лінійні моделі
імітаційне моделювання
експоненційна множина розподілів
Байєсівський аналіз даних
метод Монте-Карло для Марківських ланцюгів
topic_facet актуарний ризик
узагальнені лінійні моделі
імітаційне моделювання
експоненційна множина розподілів
Байєсівський аналіз даних
метод Монте-Карло для Марківських ланцюгів
actuarial risk
generalized linear models
simulation modeling
exponential family of distributions
Bayesian data analysis
Monte Carlo method for Markov chains
url https://journal.iasa.kpi.ua/article/view/351421
work_keys_str_mv AT panibratovroman analysisofactuarialriskwithgeneralizedlinearmodels
AT bidyukpetro analysisofactuarialriskwithgeneralizedlinearmodels
AT panibratovroman analízaktuarnihrizikívzadopomogoûuzagalʹnenihlíníjnihmodelej
AT bidyukpetro analízaktuarnihrizikívzadopomogoûuzagalʹnenihlíníjnihmodelej