Uncertainties in data processing, forecasting and decision making

Forecasting, dynamic planning, and current statistical data processing are defined as the process of estimating an enterprise’s current state on the market compared to other competing enterprises and determining further goals as well as sequences of actions and resources necessary for reaching the g...

Bibliographic details
Date:	2023
Authors:	Levenchuk, Liudmyla; Tymoshchuk, Oxana; Huskova, Vira; Bidyuk, Petro
Format:	Article
Language:	English
Published:	The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 2023
Topics:	mathematical model; statistical data uncertainties; system analysis principles; forecasting; decision support system
Online access:	https://journal.iasa.kpi.ua/article/view/290369
Journal title:	System research and information technologies
Repositories

System research and information technologies

_version_	1867334439179124736
author	Levenchuk, Liudmyla Tymoshchuk, Oxana Huskova, Vira Bidyuk, Petro
author_facet	Levenchuk, Liudmyla Tymoshchuk, Oxana Huskova, Vira Bidyuk, Petro
author_institution_txt_mv	[ { "author": "Liudmyla Levenchuk", "institution": "Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" }, { "author": "Oxana Tymoshchuk", "institution": "Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" }, { "author": "Vira Huskova", "institution": "Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" }, { "author": "Petro Bidyuk", "institution": "Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" } ]
author_sort	Levenchuk, Liudmyla
baseUrl_str	http://journal.iasa.kpi.ua/oai
collection	OJS
datestamp_date	2023-11-07T22:19:24Z
description	Forecasting, dynamic planning, and current statistical data processing are defined as the process of estimating an enterprise’s current state on the market compared to other competing enterprises and determining further goals as well as sequences of actions and resources necessary for reaching the goals stated. In order to perform high-quality forecasting, it is proposed to identify and consider possible uncertainties associated with data and expert estimates. This is one of the system analysis principles to be applied for achieving high-quality final results. A review of some uncertainties is given, and an illustrative example showing improvement of the final result after considering possible stochastic uncertainty is provided.
doi_str_mv	10.20535/SRIT.2308-8893.2023.3.05
first_indexed	2025-07-17T10:28:22Z
format	Article
fulltext	L.B. Levenchuk, O.L. Tymoshchuk, V.H. Guskova, P.I. Bidyuk, 2023. ISSN 1681–6048 System Research & Information Technologies, 2023, № 3, p. 66. UDC 004.942:519.216.3 DOI: 10.20535/SRIT.2308-8893.2023.3.05
UNCERTAINTIES IN DATA PROCESSING, FORECASTING AND DECISION MAKING
L.B. LEVENCHUK, O.L. TYMOSHCHUK, V.H. GUSKOVA, P.I. BIDYUK
Abstract. Forecasting, dynamic planning, and current statistical data processing are defined as the process of estimating an enterprise’s current state on the market compared to other competing enterprises and determining further goals as well as sequences of actions and resources necessary for reaching the goals stated. In order to perform high-quality forecasting, it is proposed to identify and consider possible uncertainties associated with data and expert estimates. This is one of the system analysis principles to be applied for achieving high-quality final results. A review of some uncertainties is given, and an illustrative example showing improvement of the final result after considering possible stochastic uncertainty is provided.
Keywords: mathematical model, statistical data uncertainties, system analysis principles, forecasting, decision support system.
INTRODUCTION
Analysis of dynamic processes in forecasting and planning procedures is an urgent problem not only for financial organizations and companies but for all industrial enterprises, small and medium businesses, investment and insurance companies, etc. Forecasting, dynamic planning (DP) and current data processing could be defined as the process by which an enterprise estimates its current state on the market in comparison with other competing enterprises and determines further goals as well as the sequences of actions and resources that are necessary for reaching the goals stated. 
The process of forecasting and planning is performed continuously (or quasi-continuously) as new information (knowledge) is acquired about the market, technologies, forecast estimates of the necessary variables, and current and future situations. All this knowledge is used for correcting the actions and activities of an enterprise and supporting its competitiveness over time. Formally, DP could be presented in the form:
$$DSP = \{X_0, G, R, D(t), K, T, F, \Delta D(t), \Delta R(t)\},$$
where $X_0$ is the initial state of an enterprise; $G$ are the goals stated by the enterprise management; $R$ are the resources necessary for reaching the goals stated; $D(t)$ is a sequence of actions that should be performed on the planning interval; $K$ is new knowledge about the environment; $T$ are new technologies; $F$ designates possible results of forecasting and foresight; $\Delta D(t)$ are corrections that are to be performed for reaching the goals; $\Delta R(t)$ are the necessary extra resources. One of the main problems to be solved within the DP paradigm is high-quality forecasting of the relevant processes.
Adequate models of the process and the forecasts generated with them help to take into consideration a set of various influencing factors and to make managerial decisions based on objective planning. Another purpose of the studies is estimating possible risks using forecasts of volatility. There are several types of processes that could be described with mathematical models in the form of appropriately constructed equations or probability distributions. Among them are processes with deterministic and stochastic trends, and heteroscedastic processes. 
As of today, the following mathematical models are widely used for describing the nonlinear dynamics of processes relevant to planning: linear and nonlinear regression (logit and probit, polynomials, splines), autoregressive integrated moving average (ARIMA) models, autoregressive conditionally heteroscedastic (ARCH) models, generalized ARCH (GARCH), dynamic Bayesian networks, the support vector machine (SVM) approach, neural networks and neuro-fuzzy techniques, as well as combinations of the approaches mentioned [1–5].
All types of mathematical modeling usually need to cope with various kinds of uncertainties associated with statistical/experimental data, the structure of the process under study and its model, parameter uncertainty, and uncertainties relevant to the quality of models and forecasts. Reasoning and decision making are very often performed while many facts remain unknown or are only vaguely represented in the processing of data and expert estimates. To avoid the uncertainties or to take them into consideration, and in this way to improve the quality of the final result (estimates of process forecasts and the planning decisions based upon them), it is necessary to construct appropriate computer-based decision support systems (DSS) for solving multiple specific problems.
Selection and application of a specific model for process description and forecast estimation depend on the application area, the availability of statistical/experimental data, the qualification of the personnel who work on the data analysis problems, and the availability of appropriate applied software. Better results in estimating process forecasts are usually achieved by applying ideologically different techniques combined within one specialized computer system. Such an approach to the problem of estimating quality forecasts can be implemented within the framework of modern decision support systems. 
DSS today (especially intelligent DSS) are a powerful instrument for supporting the user’s (managerial) decision making, because they combine a set of appropriately selected procedures for processing data and expert estimates that aim at a final result of high quality: objective, high-quality alternatives for a decision making person (DMP). Development of a DSS is based on the modern theory and techniques of system analysis, data processing systems, estimation and optimization theories, mathematical and statistical modeling and forecasting, decision making theory, as well as many other results of the theory and practice of processing data and expert estimates [6–8].
The paper considers the problem of constructing adequate models for modeling and estimating forecasts for selected types of dynamic processes, with the possibility of applying alternative techniques for data processing, modeling, and estimation of the parameters and states of the processes under study in the presence of possible uncertainties.
PROBLEM FORMULATION
The purpose of the study is as follows: 1) analysis of the uncertainty types characteristic of model building and forecasting of dynamic processes; 2) selection of techniques for taking the detected uncertainties into consideration; 3) selection of mathematical modeling and forecasting techniques for nonstationary and nonlinear heteroscedastic processes; 4) illustration of the methodology by applying it to a selected forecast estimation problem using appropriate statistical data. 
COPING WITH UNCERTAINTIES
All types of mathematical modeling that use statistical/experimental data usually need to consider various kinds of uncertainties associated with the data, the informational structure of the process under study and its model, parameter estimate uncertainty, and uncertainties relevant to the quality of models and forecasts. In many cases a researcher has to cope with the following basic types of uncertainties: structural, statistical and parametric. Structural uncertainties are encountered when the structure of the process under study (and, respectively, of its model) is unknown or not defined clearly enough, in other words, is known only partially. For example, when the functional approach to model construction is applied, we usually do not know the details of the object (or process) structure, and it is estimated with appropriate model structure estimation techniques: correlation analysis, estimation of mutual information, lags, testing for nonlinearity and nonstationarity, identification of external disturbances, etc. Uncertainty could also be introduced by an expert who studies the process and provides estimates of the model structure, parameter restrictions, selection of computational procedures, etc. The sequence of actions necessary for identifying, processing and taking uncertainties into consideration could be formulated as follows:
– identification and reduction of data uncertainty;
– model structure and parameters estimation;
– reduction of uncertainties related to the model structure and parameters estimation;
– reduction of uncertainties relevant to expert estimates;
– estimation of forecasts and reduction of the respective uncertainties;
– selection of the best final result using an appropriate set of quality statistics.
All the tasks mentioned above are usually solved sequentially (in an adaptive loop) with an appropriately designed and implemented DSS. 
Here we consider uncertainties as the factors that negatively influence the whole process of constructing mathematical models, estimating forecasts and possible risks, and generating alternative decisions. These factors lower the quality of the intermediate and final results of the computations performed within the selected or designed system. They are inherent to the process being studied due to incomplete or noise-corrupted data, complex stochastic external influences, incompleteness or inexactness of our knowledge regarding the structure of the objects (systems), incorrect application of computational procedures, etc. The uncertainties very often appear due to incompleteness of data or noisy measurements, or they are caused by sophisticated stochastic external disturbances with complex unknown probability distributions, by poor estimates of the model structure, or by a wrong selection of the parameter estimation procedure. The problem of uncertainty identification is solved by applying special statistical tests, visually studying the available data, and using appropriate expert estimates.
As we usually work with stochastic data, correct application of the existing statistical techniques makes it possible to estimate the structure of a system (and of its model) approximately. To find “the best” model structure it is recommended to apply adaptive estimation schemes that provide an automatic search in a pre-defined range of possible model structures and parameters (model order, time lags, and possible nonlinearities). 
It is often possible to perform the search in the class of regression-type models with the use of an information criterion of the following type [2]:
$$N \log(FPE) = N \log(V_N(\hat{\theta})) + N \log\left(\frac{N + p}{N - p}\right),$$ (1)
where $\hat{\theta}$ is a vector of model parameter estimates; $N$ is the length (sample size) of the time series used; $FPE$ is the final prediction error term; $V_N(\hat{\theta})$ can be determined by the sum of squared errors; $p$ is the number of model parameters. The value of criterion (1) is asymptotically equivalent to the Akaike information criterion as $N \to \infty$. As the amount of data $N$ may be limited, an alternative, the minimum description length (MDL) criterion
$$MDL = \log(V_N(\hat{\theta})) + \frac{p \log(N)}{N},$$
could be used to find the model that adequately represents the available data with the minimum amount of available information.
There are several possibilities for adaptive model structure estimation: 1) application of statistical criteria for detecting possible nonlinearities and the type of nonstationarity (integrated or heteroskedastic process); 2) analysis of partial autocorrelation for determining the autoregression order; 3) automatic estimation of the exogenous variable lag (detection of leading indicators); 4) automatic analysis of residual properties; 5) analysis of the data distribution type and its use for selecting the correct model estimation method; 6) adaptive model parameter estimation using extra data; 7) optimal selection of weighting coefficients for exponential smoothing, nearest neighbor and other techniques. The development and use of a specific adaptation scheme depend on the volume and quality of data, the specific problem statement, the requirements for forecast estimates, etc. The adaptive estimation schemes also help to cope with model parameter uncertainties. New data are used to re-compute the model parameter estimates so that they correspond to possible changes in the object under study. 
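The order search with criterion (1) and the MDL criterion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a pure autoregressive model fitted by least squares, takes $V_N(\hat{\theta})$ as the mean squared residual, and uses synthetic AR(2) data; the function names `ar_design` and `order_criteria` and all numeric values are hypothetical.

```python
import numpy as np

def ar_design(y, p, p_max):
    """Lag matrix for AR(p); rows start at p_max so every order is scored on the same targets."""
    rows = range(p_max, len(y))
    X = np.array([[y[k - i] for i in range(1, p + 1)] for k in rows])
    return X, y[p_max:]

def order_criteria(y, p_max):
    """Return {p: (N*log(FPE), MDL)} for p = 1..p_max."""
    results = {}
    for p in range(1, p_max + 1):
        X, target = ar_design(y, p, p_max)
        theta, *_ = np.linalg.lstsq(X, target, rcond=None)   # least-squares estimate
        resid = target - X @ theta
        N = len(target)
        V = resid @ resid / N                                # V_N(theta_hat), assumed = mean squared error
        n_log_fpe = N * np.log(V) + N * np.log((N + p) / (N - p))   # criterion (1)
        mdl = np.log(V) + p * np.log(N) / N                          # MDL criterion
        results[p] = (n_log_fpe, mdl)
    return results

# Synthetic AR(2) series (assumed coefficients 0.6 and -0.5, unit-variance noise).
rng = np.random.default_rng(0)
y = np.zeros(2000)
for k in range(2, len(y)):
    y[k] = 0.6 * y[k - 1] - 0.5 * y[k - 2] + rng.normal()

scores = order_criteria(y, p_max=8)
best_fpe = min(scores, key=lambda p: scores[p][0])
best_mdl = min(scores, key=lambda p: scores[p][1])
print("order by criterion (1):", best_fpe, "| order by MDL:", best_mdl)
```

Because the MDL penalty grows faster with $p$ than the penalty in (1) for samples of this size, MDL never selects a higher order than criterion (1) in such a search; which of the two is preferable depends on the sample length and on the forecasting task.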
In the cases when the model is nonlinear, alternative parameter estimation techniques (say, MCMC) could be used to compute alternative (though admissible) sets of parameters and to select the most suitable of them using statistical quality criteria.
Processing some types of possible stochastic uncertainties. In practical modeling the statistical characteristics (covariance matrices) of stochastic external disturbances and measurement noise (errors) are very often unknown. To eliminate this uncertainty, optimal filtering algorithms are usually applied that make it possible to estimate the object (system) states and the covariance matrices simultaneously. One way to solve the problem is to apply the optimal Kalman filter. The Kalman filter is used to find optimal estimates of the system states on the basis of a system model represented in the widely used and convenient state space form:
$$\mathbf{x}(k) = \mathbf{A}(k, k-1)\,\mathbf{x}(k-1) + \mathbf{B}(k, k-1)\,\mathbf{u}(k-1) + \mathbf{w}(k),$$ (2)
where $\mathbf{x}(k)$ is an $n$-dimensional vector of system states; $k = 0, 1, 2, \ldots$ is discrete time; $\mathbf{u}(k-1)$ is an $m$-dimensional vector of deterministic control variables; $\mathbf{w}(k)$ is an $n$-dimensional vector of external random disturbances; $\mathbf{A}(k, k-1)$ is the $(n \times n)$ matrix of system dynamics; $\mathbf{B}(k, k-1)$ is the $(n \times m)$ matrix of control coefficients. The double argument $(k, k-1)$ means that the variable or parameter is used at the time moment $k$, but its value is based on the former (earlier) data processing, up to and including the moment $(k-1)$. Usually the matrices $\mathbf{A}$ and $\mathbf{B}$ are written with one argument, as $\mathbf{A}(k)$ and $\mathbf{B}(k)$, to simplify the text. 
Besides the main task, optimal state estimation, the Kalman filter can be used to solve the following problems: computing short-term forecasts; estimating unknown model parameters, including the statistics of external disturbances and measurement errors (adaptive extended Kalman filter); estimating state vector components that cannot be measured directly; and fusing data coming from various external sources (combining the available data so as to enhance its information content).
A stationary system model is described with constant parameters such as $\mathbf{A}$ and $\mathbf{B}$. As the matrix $\mathbf{A}$ links two consecutive system states, it is also called the state transition matrix. Discrete time $k$ and continuous time $t$ are linked to each other via the data sampling time $T_s$: $t = k T_s$. In the classic problem statement for optimal filtering, the vector sequence of external disturbances $\mathbf{w}(k)$ is supposed to be zero-mean white Gaussian noise with covariance matrix $\mathbf{Q}$, i.e. the noise statistics are as follows:
$$E[\mathbf{w}(k)] = 0, \ \forall k; \qquad E[\mathbf{w}(k)\,\mathbf{w}^T(j)] = \mathbf{Q}(k)\,\delta_{kj},$$
where $\delta_{kj}$ is the Kronecker delta function, $\delta_{kj} = 0$ for $k \neq j$ and $\delta_{kj} = 1$ for $k = j$; $\mathbf{Q}(k)$ is a positive definite $(n \times n)$ covariance matrix. The diagonal elements of the matrix are the variances of the components of the disturbance vector $\mathbf{w}(k)$. The initial system state $\mathbf{x}_0$ is supposed to be known, and the measurement equation for the vector $\mathbf{z}(k)$ of output variables is:
$$\mathbf{z}(k) = \mathbf{H}(k)\,\mathbf{x}(k) + \mathbf{v}(k),$$ (3)
where $\mathbf{H}(k)$ is the $(r \times n)$ observation (coefficient) matrix; $\mathbf{v}(k)$ is an $r$-dimensional vector of measurement noise with the statistics $E[\mathbf{v}(k)] = 0$, $E[\mathbf{v}(k)\,\mathbf{v}^T(j)] = \mathbf{R}(k)\,\delta_{kj}$, where $\mathbf{R}(k)$ is the $(r \times r)$ positive definite measurement noise covariance matrix, the diagonal elements of which are the variances of the additive noise for each measurable variable. The measurement noise is also supposed to be a zero-mean white noise sequence that is not correlated with the external disturbance $\mathbf{w}(k)$ or with the initial system state. 
For the system (2), (3) with state vector $\mathbf{x}(k)$ it is necessary to find the optimal state estimate $\hat{\mathbf{x}}(k)$ at an arbitrary moment $k$ as a linear combination of the estimate $\hat{\mathbf{x}}(k-1)$ at the previous moment $(k-1)$ and the last available measurement $\mathbf{z}(k)$. The estimate of the state vector $\hat{\mathbf{x}}(k)$ is computed as an optimal one by minimizing the expectation of the sum of squared errors, i.e.:
$$E[(\hat{\mathbf{x}}(k) - \mathbf{x}(k))^T (\hat{\mathbf{x}}(k) - \mathbf{x}(k))] \to \min_{\mathbf{K}},$$ (4)
where $\mathbf{x}(k)$ is the exact value of the state vector, which can be found using the deterministic part of the state equation (2); $\mathbf{K}$ is the optimal gain matrix, which is determined as a result of minimizing the quadratic criterion (4).
Thus, the filter is constructed to compute the optimal state vector estimate $\hat{\mathbf{x}}(k)$ under the influence of external random system disturbances and measurement noise. One of the possible uncertainties arises here when the covariance matrices $\mathbf{Q}$ and $\mathbf{R}$ are not known. To solve the problem, an adaptive Kalman filter is to be constructed that computes the estimates $\hat{\mathbf{Q}}$ and $\hat{\mathbf{R}}$ simultaneously with the state vector $\hat{\mathbf{x}}(k)$. Another choice is to construct a separate algorithm for computing the values of $\hat{\mathbf{Q}}$ and $\hat{\mathbf{R}}$. A convenient statistical algorithm for estimating the covariance matrices was proposed in [11]:
$$\hat{\mathbf{R}} = \frac{1}{2}\left[\hat{\mathbf{B}}_1 + \mathbf{A}^{-1}(\hat{\mathbf{B}}_1 - \hat{\mathbf{B}}_2)(\mathbf{A}^{-1})^T\right]; \qquad \hat{\mathbf{Q}} = \hat{\mathbf{B}}_1 - \hat{\mathbf{R}} - \mathbf{A}\hat{\mathbf{R}}\mathbf{A}^T,$$
where
$$\hat{\mathbf{B}}_1 = E\{[\mathbf{z}(k) - \mathbf{A}\mathbf{z}(k-1)][\mathbf{z}(k) - \mathbf{A}\mathbf{z}(k-1)]^T\}; \qquad \hat{\mathbf{B}}_2 = E\{[\mathbf{z}(k) - \mathbf{A}^2\mathbf{z}(k-2)][\mathbf{z}(k) - \mathbf{A}^2\mathbf{z}(k-2)]^T\}.$$
The matrices $\hat{\mathbf{Q}}$ and $\hat{\mathbf{R}}$ are used in the optimal filtering procedure as follows:
$$\mathbf{S}(k) = \mathbf{A}\mathbf{P}(k-1)\mathbf{A}^T + \hat{\mathbf{Q}}; \qquad \boldsymbol{\Delta}(k) = \mathbf{S}(k)[\mathbf{S}(k) + \hat{\mathbf{R}}]^{\#}; \qquad \mathbf{P}(k) = [\mathbf{I} - \boldsymbol{\Delta}(k)]\,\mathbf{S}(k), \quad k = 0, 1, 2, \ldots,$$
where $\mathbf{S}(k)$ and $\mathbf{P}(k)$ are the prior and posterior covariance matrices of the estimation errors, respectively; the symbol “$\#$” denotes the pseudo-inverse; $\mathbf{A}^T$ denotes matrix transposition; $\boldsymbol{\Delta}(k)$ is a matrix of intermediate covariance results. 
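The covariance estimates and the filtering recursion above can be sketched in code. This is an illustrative sketch under stated assumptions rather than the algorithm of [11] as published: the expectations in $\hat{\mathbf{B}}_1$ and $\hat{\mathbf{B}}_2$ are replaced by sample averages, the observation matrix is taken as $\mathbf{H} = \mathbf{I}$ (which is what the two formulas presuppose), there is no control input, and the scalar test system with its numeric values is hypothetical.

```python
import numpy as np

def estimate_noise_covariances(z, A):
    """Sample version of the B1/B2 formulas; z has shape (N, n), H = I is assumed."""
    A2 = A @ A
    d1 = z[1:] - z[:-1] @ A.T            # z(k) - A z(k-1)
    d2 = z[2:] - z[:-2] @ A2.T           # z(k) - A^2 z(k-2)
    B1 = d1.T @ d1 / len(d1)
    B2 = d2.T @ d2 / len(d2)
    A_inv = np.linalg.inv(A)
    R_hat = 0.5 * (B1 + A_inv @ (B1 - B2) @ A_inv.T)
    Q_hat = B1 - R_hat - A @ R_hat @ A.T
    return Q_hat, R_hat

def kalman_filter(z, A, Q, R, x0, P0):
    """Recursion S(k), Delta(k), P(k) with H = I and no control term."""
    n = A.shape[0]
    x_hat, P = x0.copy(), P0.copy()
    estimates = []
    for zk in z:
        S = A @ P @ A.T + Q                      # prior error covariance S(k)
        delta = S @ np.linalg.pinv(S + R)        # Delta(k); pinv plays the role of "#"
        x_prior = A @ x_hat
        x_hat = x_prior + delta @ (zk - x_prior)
        P = (np.eye(n) - delta) @ S              # posterior error covariance P(k)
        estimates.append(x_hat)
    return np.array(estimates), P

# Hypothetical scalar system: x(k) = 0.8 x(k-1) + w(k), z(k) = x(k) + v(k).
rng = np.random.default_rng(0)
A = np.array([[0.8]])
q_true, r_true, N = 0.5, 1.0, 20000
x = np.zeros((N, 1))
for k in range(1, N):
    x[k] = A @ x[k - 1] + rng.normal(0.0, np.sqrt(q_true), size=1)
z = x + rng.normal(0.0, np.sqrt(r_true), size=(N, 1))

Q_hat, R_hat = estimate_noise_covariances(z, A)
x_est, P_last = kalman_filter(z, A, Q_hat, R_hat, x0=np.zeros(1), P0=np.eye(1))
mse = float(np.mean((x_est - x) ** 2))
print("Q_hat:", Q_hat[0, 0], "R_hat:", R_hat[0, 0], "filter MSE:", mse)
```

In this setting the identities $\mathbf{B}_1 = \mathbf{Q} + \mathbf{R} + \mathbf{A}\mathbf{R}\mathbf{A}^T$ and $\mathbf{B}_2 = \mathbf{Q} + \mathbf{A}\mathbf{Q}\mathbf{A}^T + \mathbf{R} + \mathbf{A}^2\mathbf{R}(\mathbf{A}^2)^T$ hold, which is why the two formulas recover $\mathbf{R}$ and $\mathbf{Q}$; with a short sample the estimates are noisy and may even lose positive definiteness, so they should be checked before being passed to the filter.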
The algorithm was successfully applied to covariance estimation in many practical applications. The computational experiments showed that the values of $\boldsymbol{\Delta}(k)$ become stationary after about 20–25 sampling periods in the scalar case, though this figure grows substantially with the dimensionality of the system under study. It was also determined that the parameter estimators are very sensitive to the initial conditions of the system. The initial conditions should differ from zero enough to provide stability for the estimates generated.
Other appropriate instruments for taking possible statistical uncertainties into consideration are fuzzy logic, neuro-fuzzy models, Bayesian networks, appropriate types of distributions, etc. Some statistical data uncertainties, such as missing measurements, extreme values and high-level jumps of stochastic origin, could be processed with appropriately selected statistical procedures. There exist a number of data imputation schemes that help to complete the collected data sets and improve their quality. For example, missing measurements in time series could very often be generated from appropriately selected distributions or in the form of short-term forecasts. Appropriate processing of jumps and extreme values helps to adjust for data nonstationarity and to estimate correctly the probability distribution of the stochastic processes under study.
Processing data with missing observations (data are in the form of time series). 
As of today, the most suitable imputation techniques for data in time series form are as follows: simple averaging when it is possible (when only a few values are missing); generation of forecast estimates with a model constructed using the available measurements; generation of the missing values from distributions whose form and parameters are again determined using the available part of the data and expert estimates; the use of optimization techniques, say appropriate forms of the EM (expectation maximization) algorithm; exponential smoothing, etc. It should also be mentioned that the optimal Kalman filter can be used for imputation of missing data because it contains an “internal” forecasting function that makes it possible to generate quality short-term forecasts [12]. Besides, it is capable of fusing data coming from various external sources and in this way improving the quality (information content) of the state vector and its forecasts.
Further reduction of this uncertainty is possible by applying several forecasting techniques to the same problem and subsequently combining the separate forecasts using appropriate weighting coefficients. The best results of combining the forecasts are achieved when the variances of the forecasting errors for the different forecasting techniques do not differ substantially (at any rate, the variances should be of the same order).
Coping with uncertainties of model parameters estimates. Usually uncertainties of model parameter estimates, such as bias and inconsistency, arise when the data are not informative enough or do not correspond to the normal distribution, which is required when least squares (LS) is applied for parameter estimation. This situation may also take place in the case of multicollinearity of the independent variables or of a substantial influence of process nonlinearity that for some reason was not taken into account when the model structure was estimated. 
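Multicollinearity of the independent variables, named above as one source of unreliable least-squares estimates, can be screened before a model is fitted. The sketch below uses variance inflation factors (VIF), a standard diagnostic that the paper itself does not name, so it should be read as one possible check under that assumption; the function name and the synthetic regressors are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the other columns."""
    n_obs, n_vars = X.shape
    vifs = []
    for j in range(n_vars):
        target = X[:, j]
        others = np.column_stack([np.ones(n_obs), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic regressors: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(42)
x1 = rng.normal(size=500)
x2 = x1 + 0.05 * rng.normal(size=500)
x3 = rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

A common rule of thumb treats values above roughly 10 as a sign that a regressor is close to a linear combination of the others; the threshold is a convention, not a result of the paper.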
When the size of the data sample is not sufficient for model construction, the sample could be expanded by applying special techniques, or simulation can be used, or special model building techniques, such as the group method of data handling (GMDH), are applied. Very often GMDH produces results of acceptable quality with rather short samples. If the data do not correspond to the normal distribution, then the maximum likelihood technique or appropriate Markov chain Monte Carlo (MCMC) procedures could be used [13]. The latter techniques could be applied at quite acceptable computational cost when the number of parameters is not very high.
Generally, model structure and parameter estimation problems are at the core of modeling, and they should be given appropriate attention. Here several techniques should be applied to generate alternative sets of parameter estimates, which makes it possible to select the best alternative. Statistical criteria indicating model adequacy help to select the best estimates. Also, parameter estimates exhibiting lower variances are better than those having higher variances. Usually parameter estimation techniques make it possible to estimate these variances.
Dealing with model structure uncertainties. 
When considering mathematical models it is convenient to use the unified notion (representation) of a model structure proposed here, which we define as follows: $S = \{r, p, m, n, d, w, l\}$, where $r$ is the model dimensionality (number of equations); $p$ is the model order (maximum order of a differential or difference equation in the model); $m$ is the number of independent variables on the right-hand side of the model; $n$ is the nonlinearity and its type (nonlinearity with respect to variables and parameters); $d$ is a lag or output reaction delay time; $w$ is the stochastic external disturbance and its type; $l$ are possible restrictions imposed on the model variables and/or parameters. When a DSS is used, the model structure can practically always be estimated from data. This means that the elements of the model structure almost always take only approximate values. When a model is constructed for the purpose of forecasting, we build several candidates and select the best one with a set of model quality statistics.
Generally, we could define the following techniques to fight structural uncertainties: gradual improvement of the model order (AR(p) or ARMA(p, q)) by applying an adaptive approach to modeling and an automatic search for the “best” structure using complex statistical quality criteria; adaptive estimation (improvement) of the input delay time (lag) and of the data distribution type with its parameters; describing the detected process nonlinearities with alternative analytical forms, with subsequent estimation of model adequacy and forecast quality. 
Another example of a complex statistical criterion of model adequacy and forecast quality could be the following:
$$J = \left|1 - R^2\right| + \alpha \ln\left(\sum_{k=1}^{N} e^2(k)\right) + \left|2 - DW\right| + \beta \ln(1 + MAPE) + U \to \min_{\hat{\theta}_i},$$
where $R^2$ is the determination coefficient; $DW$ is the Durbin-Watson statistic; $MAPE$ is the mean absolute percentage error of the estimated forecasts; $\sum_{k=1}^{N} e^2(k) = \sum_{k=1}^{N} [y(k) - \hat{y}(k)]^2$ is the sum of squared model errors; $U$ is the Theil coefficient, which measures the forecasting ability of a model; $\alpha$, $\beta$ are appropriately selected weighting coefficients (their sum should be equal to 1); $\hat{\theta}_i$ is the parameter vector of the $i$-th candidate model. A criterion of this type is used for automatic selection of the best candidate model. The criterion also allows a DSS to operate in an automatic adaptive mode. Other forms of the complex criterion are possible. While constructing the criterion it is important not to overweight separate terms on the right-hand side of the expression. As a general recommendation, model structure estimation should rely on an appropriate adaptation scheme.
Coping with uncertainties of a level (amplitude) type. The use of random variables (i.e. those with a random amplitude or level) and/or non-measurable variables makes it necessary to use fuzzy sets for describing such situations. A variable with random amplitude can be described with some probability distribution if measurements are available or arrive for analysis within an acceptable time span. However, some variables cannot be measured (registered) in principle, say the amount of shadow capital that “disappears” every month offshore, or the amount of shadow salaries paid at some company, or a technology parameter of a control system that cannot be measured on-line due to the absence of an appropriate sensor. 
In such situations we could assign to the variable a set of possible values in linguistic form as follows: capital amount = {very low, low, medium, high, very high}. There exists a complete set of the mathematical operations needed to work with such fuzzy variables. Finally, a fuzzy value could be transformed into the usual exact (crisp) form using known techniques.
An appropriately constructed optimal Kalman filter can also be applied for estimating non-measurable variables using known covariances between the measurable and non-measurable variables.
Processing probabilistic uncertainties. To fight probabilistic uncertainties it is possible to use the Bayesian approach, which helps to construct models in the form of conditional distributions for sets of random variables. Usually such models represent the variables of the process under study themselves, stochastic disturbances, and measurement errors or noise. The problem of distribution type identification also arises in regression modeling. Each probability distribution is characterized by the set of specific values that the random variable could take and by the probabilities of these values. The problem lies in identifying the distribution type and estimating its parameters. The probabilistic uncertainty (will some event happen or not) could be resolved with various models of Bayesian type. This approach is known as Bayesian programming, or the Bayesian paradigm. 
The generalized structure of applying a Bayesian program includes the following steps: 1) problem description and statement, with the question posed as estimation of a conditional probability in the form $p(X_i \mid D, Kn)$, where $X_i$ is the main (goal) variable or event, and the probability $p$ should be found as a result of applying some probabilistic inference procedure; 2) statistical (experimental) data $D$ and knowledge $Kn$ are to be used for estimating the model and the parameters of a specific type; 3) the selected and applied probabilistic inference technique should give an answer to the question posed above; 4) analysis of the quality of the final result using appropriate statistics. The steps given above are to some extent “standard” for constructing a model and computing probabilistic inference using the available statistical data. This sequence of actions is naturally consistent with the methods of cyclic structural and parametric model adaptation to new data and operating modes (and possibly expert estimates).
One of the most popular Bayesian approaches today is based on models in the form of static and dynamic Bayesian networks (BN). Bayesian networks are probabilistic and statistical models represented graphically as directed acyclic graphs (DAG), with the vertices being the variables of the object (system) under study and the arcs showing the existing causal relations between the variables. Each BN variable is characterized by a complete finite set of mutually exclusive states. Formally, a BN could be represented with the following four components: $N = \langle V, G, P, T \rangle$, where $V$ stands for the set of model variables, $V = \{X_1, \ldots, X_n\}$; $G$ represents the directed acyclic graph; $P$ is the joint probability distribution of the graph variables (vertices); and $T$ denotes the conditional and unconditional probability tables for the variables of the graphical model [14; 15]. 
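The four components $\langle V, G, P, T \rangle$ and the query $p(X_i \mid D, Kn)$ can be made concrete with a toy network. The sketch below is purely illustrative: the three variables (`Market`, `Sales`, `Profit`), their states and all probabilities are hypothetical, and inference is done by brute-force enumeration of the joint distribution, which is practical only for very small networks.

```python
from itertools import product

# V and G: variables with their parents (a chain Market -> Sales -> Profit).
parents = {"Market": (), "Sales": ("Market",), "Profit": ("Sales",)}
states = {"Market": ("up", "down"), "Sales": ("high", "low"), "Profit": ("pos", "neg")}

# T: (un)conditional probability tables, indexed by the tuple of parent states.
cpt = {
    "Market": {(): {"up": 0.6, "down": 0.4}},
    "Sales": {("up",): {"high": 0.8, "low": 0.2}, ("down",): {"high": 0.3, "low": 0.7}},
    "Profit": {("high",): {"pos": 0.9, "neg": 0.1}, ("low",): {"pos": 0.4, "neg": 0.6}},
}

def joint(assignment):
    """P: joint probability of a full assignment as the product of table entries."""
    prob = 1.0
    for var, pa in parents.items():
        key = tuple(assignment[p] for p in pa)
        prob *= cpt[var][key][assignment[var]]
    return prob

def query(target, evidence):
    """Posterior distribution of `target` given `evidence`, by enumeration."""
    names = list(parents)
    scores = {s: 0.0 for s in states[target]}
    for values in product(*(states[v] for v in names)):
        assignment = dict(zip(names, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            scores[assignment[target]] += joint(assignment)
    total = sum(scores.values())
    return {s: p / total for s, p in scores.items()}

posterior = query("Market", {"Profit": "pos"})
print(posterior)
```

With these tables, observing a positive profit raises the probability of an "up" market from the prior 0.6 to 0.48/0.70, about 0.686; larger networks would normally use a dedicated inference algorithm rather than enumeration.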
The relations between the variables are established via expert estimates or by applying special statistical and probabilistic tests to the statistical data (when available) that characterize the dynamics of the variables used to construct the model.

The procedure of constructing a BN is generally the same as for models of other types, say regression models. The set of model variables should satisfy the Markov condition: each variable of the network is conditionally independent of its non-descendants given its parents. In the process of BN construction, the mutual information between all variables of the network is computed first. Then an optimal BN structure is searched for using an acceptable quality criterion, say the well-known minimum description length (MDL), which allows the graph (model) structure to be analyzed and improved at each iteration of the learning algorithm applied. Bayesian networks provide the following advantages for modeling: the model may include qualitative and quantitative variables simultaneously, as well as discrete and continuous ones; the number of variables can be very large (thousands); the values of the conditional probability tables can be computed from statistical data and expert estimates; the methodology of BN construction is directed towards identifying actual causal relations between the variables used, which results in high adequacy of the model; and the model remains operable when some data are missing.

To reduce the influence of probabilistic and statistical uncertainties on the quality of models and of the forecasts based upon them, it is also possible to use models in the form of Bayesian regression, which is based on analysis of the actual distributions of model variables and parameters.
Consider a simple two-variable regression model:

$$y(k) \mid x(k) = \beta_1 + \beta_2 x(k) + u(k), \quad k = 1, \dots, N.$$

It is supposed that the random values $u_1, \dots, u_N$ are independent and belong, for example, to the normal distribution $\{u(k)\} \sim N(0, \sigma_u^2)$; here the vector of unknown parameters includes three elements, $(\beta_1, \beta_2, \sigma_u^2)^T$. The likelihood function for the dependent variable $\mathbf{y} = (y_1, \dots, y_N)^T$ and the predictor $\mathbf{x} = (x_1, \dots, x_N)^T$, without the proportionality coefficient, is determined as follows:

$$L(\mathbf{y} \mid \mathbf{x}, \beta_1, \beta_2, \sigma_u) = \frac{1}{\sigma_u^{N}} \exp\left\{ -\frac{1}{2\sigma_u^2} \sum_{k=1}^{N} [y(k) - \beta_1 - \beta_2 x(k)]^2 \right\}.$$

Using simplified (non-informative) prior distributions for the model parameters, $g(\beta_1, \beta_2, \sigma_u) = g_1(\beta_1)\, g_2(\beta_2)\, g_3(\sigma_u)$; $g_1(\beta_1) \propto \mathrm{const}$; $g_2(\beta_2) \propto \mathrm{const}$; $g_3(\sigma_u) \propto 1/\sigma_u$, and Bayes' theorem, it is possible to find the joint posterior distribution of the parameters in the form [16]:

$$h(\beta_1, \beta_2, \sigma_u \mid \mathbf{y}, \mathbf{x}) \propto \frac{1}{\sigma_u^{N+1}} \exp\left\{ -\frac{1}{2\sigma_u^2} \sum_{k=1}^{N} [y(k) - \beta_1 - \beta_2 x(k)]^2 \right\}, \quad -\infty < \beta_1, \beta_2 < \infty, \quad 0 < \sigma_u < \infty.$$

Maximum likelihood estimates of the model parameters are determined as follows:

$$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}; \qquad \hat{\beta}_2 = \frac{\sum_{k=1}^{N} [x(k) - \bar{x}][y(k) - \bar{y}]}{\sum_{k=1}^{N} [x(k) - \bar{x}]^2},$$

where $\bar{x} = N^{-1} \sum_{k=1}^{N} x(k)$, $\bar{y} = N^{-1} \sum_{k=1}^{N} y(k)$, with the unbiased sample estimate of variance:

$$s^2 = \hat{\sigma}_u^2 = \frac{1}{N-2} \sum_{k=1}^{N} [y(k) - \hat{\beta}_1 - \hat{\beta}_2 x(k)]^2.$$

The joint posterior density of the model parameters corresponds to a two-dimensional Student distribution:

$$h_1(\beta_1, \beta_2 \mid \mathbf{y}, \mathbf{x}) \propto \left\{ (N-2)s^2 + N(\beta_1 - \hat{\beta}_1)^2 + (\beta_2 - \hat{\beta}_2)^2 \sum_{k=1}^{N} x^2(k) + 2(\beta_1 - \hat{\beta}_1)(\beta_2 - \hat{\beta}_2) \sum_{k=1}^{N} x(k) \right\}^{-0.5N}.$$

In this way it becomes possible to use more exact distributions of model variables and parameters, which is necessary to enhance model quality. Using a new observation $\tilde{x}$ and prior information regarding the particular model, it is possible to determine the forecast interval for the dependent variable $\tilde{y}$ from the predictive density:

$$p(\tilde{y} \mid \tilde{x}, \mathbf{y}, \mathbf{x}) = \iiint L(\tilde{y} \mid \tilde{x}, \beta_1, \beta_2, \sigma_u)\, h(\beta_1, \beta_2, \sigma_u \mid \mathbf{y}, \mathbf{x})\, d\beta_1\, d\beta_2\, d\sigma_u.$$
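The estimates and the forecast interval above can be computed directly. The sketch below assumes the standard result that, under the non-informative prior used here, the predictive distribution of the new value is a Student distribution with N - 2 degrees of freedom centered at the fitted value. The function names, the sample data, and the choice to pass the Student quantile as an argument (rather than compute it) are illustrative assumptions.

```python
import math


def fit_simple_regression(x, y):
    """Estimates of beta_1, beta_2 and the unbiased residual variance s^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta2 = sxy / sxx
    beta1 = y_bar - beta2 * x_bar
    s2 = sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return beta1, beta2, s2, x_bar, sxx


def predictive_interval(x_new, x, y, t_quantile):
    """Forecast interval for a new observation at x_new.

    t_quantile is the Student quantile for N - 2 degrees of freedom at the
    desired level; it is supplied by the caller (an assumption made to keep
    the sketch free of external libraries).
    """
    n = len(x)
    beta1, beta2, s2, x_bar, sxx = fit_simple_regression(x, y)
    center = beta1 + beta2 * x_new
    scale = math.sqrt(s2 * (1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx))
    return center - t_quantile * scale, center + t_quantile * scale


# Hypothetical data; 2.776 is the 97.5% Student quantile for 4 degrees of freedom.
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
low, high = predictive_interval(7.0, x_obs, y_obs, t_quantile=2.776)
print(f"95% forecast interval at x = 7: [{low:.2f}, {high:.2f}]")
```

The interval widens as the new observation moves away from the sample mean of the predictor, which reflects the parameter uncertainty carried by the posterior.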
Another useful Bayesian approach is hierarchical modeling, which is based on a set of simple conditional distributions that together comprise one model. The approach combines naturally with the theory of computing Bayesian probabilistic inference using modern computational procedures [17]. Hierarchical models belong to the class of marginal models, where the final result is provided in the form of a distribution $P(y)$, where $y$ is the available data vector. The models are formed from a sequence of conditional distributions for selected variables, including hidden ones. The hierarchical representation of parameters usually supposes that the data $y$ are situated at the lowest (first) level; the model parameters of the second level, $\theta_i \sim N(\mu, \tau^2)$, $i = 1, 2, \dots, n$, determine the distributions of the dependent variables, $y_i \sim N(\theta_i, \sigma^2)$, $i = 1, 2, \dots, n$; and the parameters $\{\theta_i\}$ are determined by the pair $(\mu, \tau^2)$ of the third level. Supposing that the parameters $\sigma^2$ and $\tau^2$ take known finite values and the parameter $\mu$ is unknown with prior $\pi(\mu)$, the joint prior density for $(\mu, \theta)$ can be presented in the form $\pi(\mu) \prod_i p(\theta_i \mid \mu)$, and the prior for the parameter vector $\theta$ is defined by the integral

$$p(\theta) = \int \pi(\mu) \prod_i p(\theta_i \mid \mu)\, d\mu.$$

Uncertainties associated with expert estimates. To decrease the influence of expert estimate uncertainties, the estimates should be processed adequately before practical use. Possible uncertainties of expert estimates can be caused by the following: uncertainties associated with the input information and with the knowledge and experience of the expert; uncertainties associated with the way of thinking of a specific expert and the methodology the expert applies, as well as with the "analytic" information processing machine functioning in the expert's mind, etc. Such uncertainties require the application of special techniques to reduce their influence on the quality of the final result.
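Returning to the three-level normal hierarchy described earlier in this section, the sketch below shows how the second-level parameters are pulled towards the common mean when both variances are known. It assumes a flat prior on the third-level mean, which the article does not specify; the function name and the sample data are hypothetical.

```python
def shrinkage_estimates(y, sigma2, tau2):
    """Posterior means for a normal hierarchy with known variances.

    Assumed model: y_i ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2),
    flat prior on mu (an illustrative assumption).
    """
    n = len(y)
    # Marginally y_i ~ N(mu, sigma2 + tau2), so mu | y ~ N(mean(y), (sigma2 + tau2) / n).
    mu_mean = sum(y) / n
    mu_var = (sigma2 + tau2) / n
    # Shrinkage weight: the share of each estimate taken from the common mean.
    weight = sigma2 / (sigma2 + tau2)
    theta_means = [(1.0 - weight) * yi + weight * mu_mean for yi in y]
    return mu_mean, mu_var, theta_means


# Hypothetical observations, one per group.
mu_hat, mu_variance, theta_hat = shrinkage_estimates([1.0, 2.0, 6.0], sigma2=1.0, tau2=1.0)
print(mu_hat, mu_variance, theta_hat)
```

When the observation variance is large relative to the between-group variance, the weight approaches one and every group estimate moves close to the common mean; in the opposite case the raw observations are kept almost unchanged.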
DATA, MODEL AND FORECASTS QUALITY CRITERIA

To achieve a reliable, high-quality final result of risk estimation and forecasting, separate sets of statistical quality criteria are used at each stage of the computational hierarchy. Data quality control is performed with the following criteria:
– analysis of the database for missing values using developed logical rules, and imputation of the missing values with appropriately selected techniques;
– analysis of the data for outliers with special statistical tests, and processing of the outliers to reduce their negative influence on the statistical properties of the available data;
– normalizing the data to a selected range when necessary;
– application of low-order digital filters (usually low-pass filters) for separation of observations from measurement noise;
– application of optimal (very often Kalman) filters for optimal state estimation and handling of stochastic uncertainties;
– application of the principal component method to achieve the desired level of orthogonalization between the selected variables;
– computing of extra indicators for use in regression and other models (say, moving average processes based upon measurements of dependent variables).

It is also useful to test how informative the collected data are. A very formal indicator of data informativeness is the sample variance: it is formally supposed that the higher the variance, the richer the data are in information. Another criterion is based on computing derivatives of a polynomial that describes the data in the form of a time series.
For example, the equation given below can describe a rather complex process with a nonlinear trend and short-term variations imposed on the trend curve:

$$y(k) = a_0 + \sum_{i=1}^{p} a_i\, y(k-i) + c_1 k + c_2 k^2 + \dots + c_m k^m + \varepsilon(k),$$

where $y(k)$ is the basic dependent variable; $a_i$, $c_i$ are model parameters; $k = 0, 1, 2, \dots$ is discrete time; $\varepsilon(k)$ is a random process that integrates the influence of external disturbances on the process being modeled as well as errors of the model structure and parameters. The autoregressive part of this model describes the deviations imposed on the trend, and the trend itself is described by the $m$-th order polynomial of discrete time $k$. In this case the maximum number of derivatives is $m$, though in practice the actual number of derivatives is defined by the largest index $i$ of a parameter $c_i$ that is statistically significant.

To select the best of the constructed models the following statistical criteria are used: the determination coefficient ($R^2$); the Durbin-Watson statistic (DW); the Fisher F-statistic; the Akaike information criterion (AIC); the residual sum of squares (SSE); and some others. The quality of forecasts is estimated using the criteria mentioned above in expressions (1) and (2). To perform automatic model selection, the combined criterion (1) mentioned above can be used. The power of the criterion was tested experimentally and confirmed on a wide set of models and statistical data. Thus, three sets of quality criteria are used to ensure high quality of the final result.

ILLUSTRATIVE EXAMPLE OF DATA PROCESSING METHODOLOGY APPLICATION AND REDUCTION OF THE INFLUENCE OF UNCERTAINTIES

Consider closing stock prices in USD for the IBM company as provided by the Yahoo! Finance site. The learning sample covers the period from August 8, 2016 to November 23, 2020. The test sample covers the period from November 24, 2020 to May 12, 2021.
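The model adequacy statistics named in the preceding section can be computed from a fitted series in a few lines. The sketch below is illustrative: the function name and the toy data are hypothetical, and the AIC is written in one common least-squares form (up to an additive constant), which may differ from the definition the authors used.

```python
import math


def adequacy_criteria(y, y_fit, n_params):
    """R^2, Durbin-Watson statistic, SSE and a least-squares form of AIC."""
    n = len(y)
    residuals = [obs - fit for obs, fit in zip(y, y_fit)]
    sse = sum(e * e for e in residuals)
    y_bar = sum(y) / n
    sst = sum((obs - y_bar) ** 2 for obs in y)
    r2 = 1.0 - sse / sst
    # DW near 2 suggests no first-order autocorrelation in the residuals.
    dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / sse
    # Assumed form: AIC = n * ln(SSE / n) + 2 * (number of parameters).
    aic = n * math.log(sse / n) + 2 * n_params
    return {"R2": r2, "DW": dw, "SSE": sse, "AIC": aic}


# Hypothetical observed and fitted values for a two-parameter model.
criteria = adequacy_criteria([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], n_params=2)
print(criteria)
```

Such a function makes it straightforward to rank several candidate models on the same learning sample before comparing their forecasts on the test sample.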
Short-term forecasting was performed with the neural networks MLP and LSTM. For convenience the networks are designated as follows: MLP (n1, n2, w), where n1 is the number of neurons in the first hidden layer; n2 is the number of neurons in the second hidden layer; w is the size of the input data window, i.e. the number of preceding measurements of the time series that influence the current value. The LSTM network is designated LSTM (n, w), where n is the number of neurons in the hidden layer and w is the size of the data window. The best forecasting results were achieved with the networks MLP (32, 16, 5) and LSTM (64, 75). Statistical characteristics of the results are given in Table 1 below.

Table 1. Results of short-term forecasting with ARIMA, MLP and LSTM

Statistic   ARIMA(5,1,5)   MLP(32,16,10)   LSTM(64,75)
RMSE        10.202         5.145           6.123
MAPE        5.702          3.129           4.099
MSE         104.094        26.498          37.621

Better forecasting results were achieved after applying exponential smoothing to the initial data (Table 2). In this way noise uncertainties were reduced and the statistical data were prepared for further use by the neural networks.

Table 2. Results of short-term forecasting with ARIMA, MLP and LSTM and preliminary data processing

Statistic   ARIMA(5,1,5)   MLP(32,16,10)   LSTM(64,75)
RMSE        8.974          4.320           5.382
MAPE        4.682          2.935           3.116
MSE         95.169         22.806          33.127

It can be seen that MLP again produced the best short-term forecast, and that all statistics are lower (better) than in the previous case without preliminary data processing by exponential smoothing.

CONCLUSIONS

A general methodology based on system analysis principles was proposed for mathematical modeling and forecasting of the dynamics of economic and financial processes. One of the main principles is identifying and taking into consideration possible uncertainties associated with data and expert estimates.
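The preprocessing step and the error statistics used in the illustrative example (Tables 1 and 2) can be sketched as follows. This is a minimal illustration, not the authors' code: simple (single) exponential smoothing is assumed, the smoothing constant `alpha` is a hypothetical parameter whose value the article does not report, and MAPE is expressed in percent.

```python
import math


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[k] = alpha * x[k] + (1 - alpha) * s[k-1]."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def forecast_errors(actual, predicted):
    """MSE, RMSE and MAPE (in percent) of a forecast; actual values must be non-zero."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    mape = 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


# Hypothetical prices and forecasts, used only to exercise the functions.
smoothed_prices = exponential_smoothing([10.0, 20.0, 30.0], alpha=0.5)
errors = forecast_errors([100.0, 200.0], [110.0, 190.0])
print(smoothed_prices, errors)
```

A smaller `alpha` suppresses more noise but makes the smoothed series lag behind the raw prices, so in practice the constant would be tuned on the learning sample.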
As instruments for coping with possible structural, statistical and parametric uncertainties, the following techniques are proposed: digital filtering, the optimal Kalman filter, various missing data imputation techniques, multiple methods for model parameter estimation, and the Bayesian programming approach. The computational experiments carried out by the authors showed that these instruments have always provided better results regarding model adequacy and quality of forecasts than processing the data without them. Thus, it is highly advisable to use these data processing instruments to improve the quality of the final results of statistical and experimental data analysis. The illustrative example given above shows that an appropriate preliminary data processing technique improves model adequacy and short-term forecasts.

REFERENCES

1. R.S. Tsay, Analysis of financial time series. Chicago: Wiley & Sons, Ltd., 2010, 715 p.
2. L. Harris, X. Hong, and Q. Gan, Adaptive Modeling, Estimation and Fusion from Data. Berlin: Springer, 2002, 323 p.
3. P. Congdon, Applied Bayesian Modeling. Chichester: John Wiley & Sons, Ltd., 2003, 472 p.
4. S.M. DeLurgio, Forecasting Principles and Applications. Boston: McGraw-Hill, 1998, 802 p.
5. S.J. Taylor, "Modeling stochastic volatility: a review and comparative study," Mathematical Finance, vol. 4, no. 2, pp. 183–204, 1994.
6. F. Burstein and C.W. Holsapple, Handbook of Decision Support Systems. Berlin: Springer-Verlag, 2008, 908 p.
7. C.W. Holsapple and A.B. Whinston, Decision Support Systems. Saint Paul (MN): West Publishing Company, 1996, 860 p.
8. P.I. Bidyuk and O.P. Gozhiy, Computer decision support systems. Mykolaiv: Petro Mohyla Black Sea National University, 2012, 380 p.
9. E. Xekalaki and S. Degiannakis, ARCH Models for Financial Applications. Chichester: Wiley & Sons, Inc., 2010, 550 p.
10. C. Chatfield, Time Series Forecasting. Boca Raton: Chapman & Hall/CRC, 2000, 267 p.
11. W.N. Anderson, G.B. Kleindorfer, P.R. Kleindorfer, and M.B. Woodroofe, "Consistent estimates of the parameters of a linear system," The Annals of Mathematical Statistics, vol. 40, no. 6, pp. 2064–2075, 1969.
12. B.P. Gibbs, Advanced Kalman Filtering, Least-squares and Modeling. Hoboken (New Jersey): John Wiley & Sons, Inc., 2011, 627 p.
13. W.R. Gilks, S. Richardson, and D.J. Spiegelhalter, Markov Chain Monte Carlo in Practice. New York: Chapman & Hall/CRC, 2000, 486 p.
14. F.V. Jensen and Th.D. Nielsen, Bayesian Networks and Decision Graphs. New York: Springer, 2007, 457 p.
15. M.Z. Zgurovsky, P.I. Bidyuk, O.M. Terentyev, and T.I. Prosyankina-Zharova, Bayesian Networks in Decision Support Systems. Kyiv: Edelweiss, 2015, 300 p.
16. J.M. Bernardo and A.F.M. Smith, Bayesian theory. New York: John Wiley & Sons, Ltd., 2000, 586 p.
17. W.M. Bolstad, Understanding Computational Bayesian Statistics. Hoboken (New Jersey): John Wiley & Sons, Ltd., 2010, 334 p.

Received 23.06.2023

INFORMATION ON THE ARTICLE

Liudmyla B. Levenchuk, ORCID: 0000-0002-8600-0890, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Ukraine, e-mail: lusi.levenchuk@gmail.com

Oxana L. Tymoshchuk, ORCID: 0000-0003-1863-3095, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Ukraine, e-mail: oxana.tim@gmail.com

Vira H. Huskova, ORCID: 0000-0001-7637-201X, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Ukraine, e-mail: guskovavera2009@gmail.com

Petro I. Bidyuk, ORCID: 0000-0002-7421-3565, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Ukraine, e-mail: pbidyuke_00@ukr.net

UNCERTAINTIES IN DATA PROCESSING, FORECASTING AND DECISION MAKING / L.B. Levenchuk, O.L. Tymoshchuk, V.H. Huskova, P.I. Bidyuk

Abstract. Forecasting, dynamic planning and processing of current statistical data are defined as the process of estimating the current state of an enterprise on the market compared with other competing enterprises and determining further goals, as well as the sequences of actions and resources necessary for reaching the stated goals. To perform high-quality forecasting, it is proposed to identify and take into account possible uncertainties associated with data and expert estimates. This is one of the system analysis principles applied to achieve a high-quality final result. A review of some uncertainties is given, together with an illustrative example showing the improvement of the final result after taking possible stochastic uncertainty into account.

Keywords: mathematical model, statistical data uncertainties, system analysis principles, forecasting, decision support system.
id	journaliasakpiua-article-290369
institution	System research and information technologies
keywords_txt_mv	keywords
language	English
last_indexed	2025-07-17T10:28:22Z
publishDate	2023
publisher	The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format	ojs
resource_txt_mv	journaliasakpiua/12/eca1b72d5a6ca1ecb1b73ae6b664bb12.pdf
spelling	journaliasakpiua-article-2903692023-11-07T22:19:24Z Uncertainties in data processing, forecasting and decision making Невизначеності в обробленні даних, прогнозування і прийняття рішень Levenchuk, Liudmyla Tymoshchuk, Oxana Huskova, Vira Bidyuk, Petro mathematical model statistical data uncertainties system analysis principles forecasting decision support system математична модель невизначеності статистичних даних принципи системного аналізу прогнозування система підтримання прийняття рішень Forecasting, dynamic planning, and current statistical data processing are defined as the process of estimating an enterprise’s current state on the market compared to other competing enterprises and determining further goals as well as sequences of actions and resources necessary for reaching the goals stated. In order to perform high-quality forecasting, it is proposed to identify and consider possible uncertainties associated with data and expert estimates. This is one of the system analysis principles to be hired for achieving high-quality final results. A review of some uncertainties is given, and an illustrative example showing improvement of the final result after considering possible stochastic uncertainty is provided. Прогнозування, динамічне планування та оброблення поточних статистичних даних визначаються як процес оцінювання поточного стану підприємства на ринку порівняно з іншими конкуруючими підприємствами та визначення подальших цілей, а також послідовностей дій та ресурсів, необхідних для досягнення визначених цілей. Для здійснення прогнозування високої якості запропоновано визначити та врахувати можливі невизначеності, пов’язані з даними та експертними оцінками. Це один з принципів системного аналізу, який застосовується для досягнення високої якості кінцевого результату. Наведений огляд деяких невизначеностей та ілюстративний приклад, який показує поліпшення кінцевого результату після врахування можливої стохастичної невизначеності. 
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2023-09-29 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/290369 10.20535/SRIT.2308-8893.2023.3.05 System research and information technologies; No. 3 (2023); 66-80 Системные исследования и информационные технологии; № 3 (2023); 66-80 Системні дослідження та інформаційні технології; № 3 (2023); 66-80 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/290369/283956
spellingShingle	математична модель невизначеності статистичних даних принципи системного аналізу прогнозування система підтримання прийняття рішень Levenchuk, Liudmyla Tymoshchuk, Oxana Huskova, Vira Bidyuk, Petro Невизначеності в обробленні даних, прогнозування і прийняття рішень
title	Невизначеності в обробленні даних, прогнозування і прийняття рішень
title_alt	Uncertainties in data processing, forecasting and decision making
topic	математична модель невизначеності статистичних даних принципи системного аналізу прогнозування система підтримання прийняття рішень
topic_facet	mathematical model statistical data uncertainties system analysis principles forecasting decision support system математична модель невизначеності статистичних даних принципи системного аналізу прогнозування система підтримання прийняття рішень
url	https://journal.iasa.kpi.ua/article/view/290369
