Uncertainties in data processing, forecasting and decision making
Forecasting, dynamic planning, and current statistical data processing are defined as the process of estimating an enterprise’s current state on the market compared to other competing enterprises and determining further goals as well as sequences of actions and resources necessary for reaching the g...
| Date: | 2023 |
|---|---|
| Authors: | Levenchuk, Liudmyla; Tymoshchuk, Oxana; Huskova, Vira; Bidyuk, Petro |
| Format: | Article |
| Language: | English |
| Published: | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 2023 |
| Subjects: | |
| Online access: | https://journal.iasa.kpi.ua/article/view/290369 |
| Journal title: | System research and information technologies |
Repositories
System research and information technologies

| _version_ | 1866302936592678912 |
|---|---|
| author | Levenchuk, Liudmyla; Tymoshchuk, Oxana; Huskova, Vira; Bidyuk, Petro |
| author_facet | Levenchuk, Liudmyla; Tymoshchuk, Oxana; Huskova, Vira; Bidyuk, Petro |
| author_sort | Levenchuk, Liudmyla |
| baseUrl_str | http://journal.iasa.kpi.ua/oai |
| collection | OJS |
| datestamp_date | 2023-11-07T22:19:24Z |
| description | Forecasting, dynamic planning, and current statistical data processing are defined as the process of estimating an enterprise’s current state on the market compared to other competing enterprises and determining further goals as well as sequences of actions and resources necessary for reaching the goals stated. In order to perform high-quality forecasting, it is proposed to identify and consider possible uncertainties associated with data and expert estimates. This is one of the system analysis principles to be applied for achieving high-quality final results. A review of some uncertainties is given, and an illustrative example showing improvement of the final result after considering possible stochastic uncertainty is provided. |
| doi_str_mv | 10.20535/SRIT.2308-8893.2023.3.05 |
| first_indexed | 2025-07-17T10:28:22Z |
| format | Article |
| fulltext |
L.B. Levenchuk, O.L. Tymoshchuk, V.H. Guskova, P.I. Bidyuk, 2023
66 ISSN 1681–6048 System Research & Information Technologies, 2023, № 3
UDC 004.942:519.216.3
DOI: 10.20535/SRIT.2308-8893.2023.3.05
UNCERTAINTIES IN DATA PROCESSING, FORECASTING
AND DECISION MAKING
L.B. LEVENCHUK, O.L. TYMOSHCHUK, V.H. GUSKOVA, P.I. BIDYUK
Abstract. Forecasting, dynamic planning, and current statistical data processing are defined as the process by which an enterprise estimates its current state on the market relative to competing enterprises and determines further goals, together with the sequences of actions and resources necessary for reaching them. To perform high-quality forecasting, it is proposed to identify and take into account possible uncertainties associated with data and expert estimates. This is one of the system analysis principles to be applied for achieving high-quality final results. A review of some uncertainties is given, and an illustrative example is provided that shows the improvement of the final result after possible stochastic uncertainty is taken into account.
Keywords: mathematical model, statistical data uncertainties, system analysis principles, forecasting, decision support system.
INTRODUCTION
Analysis of dynamic processes in forecasting and planning procedures is an urgent problem not only for financial organizations and companies but for all industrial enterprises, small and medium businesses, investment and insurance companies, etc. Forecasting, dynamic planning (DP), and current data processing could be defined as the process by which an enterprise estimates its current state on the market in comparison with competing enterprises and determines further goals, as well as the sequences of actions and resources necessary for reaching the goals stated. Forecasting and planning are performed continuously (or quasi-continuously) as new information (knowledge) is acquired about the market, technologies, forecast estimates of the necessary variables, and current and future situations. All this knowledge is used for correcting the actions and activities of an enterprise and for supporting its competitiveness over time.
Formally, DP could be presented in the form:
$$S_{DP} = \{X_0,\, G,\, R,\, D(t),\, K,\, T,\, F,\, \Delta D(t),\, \Delta R(t)\},$$
where $X_0$ is the initial state of an enterprise; $G$ are the goals stated by the enterprise management; $R$ are the resources necessary for reaching the goals stated; $D(t)$ is the sequence of actions that should be performed on the planning interval; $K$ is new knowledge about the environment; $T$ are new technologies; $F$ designates possible results of forecasting and foresight; $\Delta D(t)$ are the corrections to be performed for reaching the goals; $\Delta R(t)$ are the necessary extra resources. One of the main problems to be solved within the DP paradigm is high-quality forecasting of relevant processes.
Adequate models of the process and the forecasts generated with them help to take into account a set of various influencing factors and to make objective managerial planning decisions. Another purpose of such studies is to estimate possible risks using forecasts of volatility. There are several types of processes that can be described with mathematical models in the form of appropriately constructed equations or probability distributions. Among them are processes with deterministic and stochastic trends, and heteroscedastic processes. The following mathematical models are currently widely used for describing nonlinear dynamics of processes relevant to planning: linear and nonlinear regression (logit and probit, polynomials, splines), autoregressive integrated moving average (ARIMA) models, autoregressive conditionally heteroscedastic (ARCH) models, generalized ARCH (GARCH) models, dynamic Bayesian networks, the support vector machine (SVM) approach, neural networks and neuro-fuzzy techniques, as well as combinations of the approaches mentioned [1–5].
All types of mathematical modeling usually need to cope with various kinds of uncertainties associated with statistical/experimental data, the structure of the process under study and its model, parameter uncertainty, and uncertainties relevant to the quality of models and forecasts. Reasoning and decision making are very often performed with many facts left unknown or only vaguely represented in the processing of data and expert estimates. To avoid the uncertainties or take them into account, and in this way to improve the quality of the final result (estimates of process forecasts and the planning decisions based upon them), it is necessary to construct appropriate computer-based decision support systems (DSS) for solving multiple specific problems.
Selection and application of a specific model for process description and forecast estimation depends on the application area, the availability of statistical/experimental data, the qualification of the personnel who work on the data analysis problems, and the availability of appropriate applied software. Better forecast estimates are usually achieved by applying conceptually different techniques combined within one specialized computer system. Such an approach to high-quality forecast estimation can be implemented within modern decision support systems. Today's DSS (especially intelligent DSS) are a powerful instrument for supporting a user's (managerial) decision making because they combine a set of appropriately selected procedures for processing data and expert estimates, aimed at a high-quality final result: objective, high-quality alternatives for a decision-making person (DMP). The development of a DSS is based on the modern theory and techniques of system analysis, data processing systems, estimation and optimization theories, mathematical and statistical modeling and forecasting, and decision-making theory, as well as many other results from the theory and practice of processing data and expert estimates [6–8].
The paper considers the problem of constructing adequate models for modeling and forecast estimation for selected types of dynamic processes, with the possibility of applying alternative techniques for data processing, modeling, and estimation of the parameters and states of the processes under study in the presence of possible uncertainties.
PROBLEM FORMULATION
The purpose of the study is as follows: 1) analysis of the uncertainty types characteristic of model building and forecasting of dynamic processes; 2) selection of techniques for taking the detected uncertainties into account; 3) selection of mathematical modeling and forecasting techniques for nonstationary and nonlinear heteroscedastic processes; 4) illustration of the methodology by applying it to a selected forecast estimation problem using appropriate statistical data.
COPING WITH UNCERTAINTIES
All types of mathematical modeling that use statistical/experimental data usually need to consider various kinds of uncertainties associated with the data, the informational structure of the process under study and its model, parameter estimate uncertainty, and uncertainties relevant to the quality of models and forecasts. In many cases a researcher has to cope with the following basic types of uncertainties: structural, statistical, and parametric. Structural uncertainties are encountered when the structure of the process under study (and, respectively, of its model) is unknown or not defined clearly enough, in other words, is known only partially. For example, when the functional approach to model construction is applied, we usually do not know the details of the structure of an object (or a process), and it is estimated with appropriate model structure estimation techniques: correlation analysis, estimation of mutual information and lags, testing for nonlinearity and nonstationarity, identification of external disturbances, etc. Uncertainty could also be introduced by an expert who studies the process and provides estimates of the model structure, parameter restrictions, selection of computational procedures, etc. The sequence of actions necessary for identifying, processing, and taking uncertainties into account could be formulated as follows:
– identification and reduction of data uncertainty;
– estimation of the model structure and parameters;
– reduction of uncertainties related to the estimation of the model structure and parameters;
– reduction of uncertainties relevant to expert estimates;
– estimation of forecasts and reduction of the respective uncertainties;
– selection of the best final result using an appropriate set of quality statistics.
All the tasks mentioned above are usually solved sequentially (in an adaptive loop) with an appropriately designed and implemented DSS.
Here we consider uncertainties as factors that negatively influence the whole process of constructing a mathematical model, estimating forecasts and possible risks, and generating alternative decisions. These factors lower the quality of the intermediate and final results of computations performed within the selected or designed system. They are inherent to the process being studied due to incomplete or noise-corrupted data, complex stochastic external influences, incompleteness or inexactness of our knowledge regarding the structure of the objects (systems), incorrect application of computational procedures, etc. The uncertainties very often appear due to incompleteness of data or noisy measurements, or they are caused by sophisticated stochastic external disturbances with complex unknown probability distributions, by poor estimates of the model structure, or by a wrong selection of the parameter estimation procedure. The problem of uncertainty identification is solved by applying special statistical tests, by visually studying the available data, and by using appropriate expert estimates.
Since we usually work with stochastic data, correct application of existing statistical techniques makes it possible to estimate approximately the structure of a system (and of its model). To find “the best” model structure, it is recommended to apply adaptive estimation schemes that search automatically in a predefined range of possible model structures and parameters (model order, time lags, and possible nonlinearities). It is often possible to perform the search in the class of regression-type models using an information criterion of the following type [2]:
$$N \log(FPE) = N \log\big(V_N(\hat{\theta})\big) + N \log\frac{N+p}{N-p}, \tag{1}$$
where $\hat{\theta}$ is the vector of model parameter estimates; $N$ is the length of the time series used; $FPE$ is the final prediction error term; $V_N(\hat{\theta})$ can be determined by the sum of squared errors; $p$ is the number of model parameters. The value of criterion (1) is asymptotically equivalent to the Akaike information criterion as $N \to \infty$. As the amount of data $N$ may be limited, an alternative, the minimum description length (MDL) criterion
$$MDL = \log\big(V_N(\hat{\theta})\big) + p\,\frac{\log(N)}{N},$$
could be used to find the model that adequately represents the available data with the minimum amount of available information.
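The two criteria can be compared on a set of candidate model orders. The following is a minimal sketch under stated assumptions: $V_N(\hat{\theta})$ is taken as the mean squared residual of a fitted model, the function names are hypothetical, and the loss values in `candidates` are made up for illustration rather than taken from the paper.

```python
import math


def n_log_fpe(v_n: float, n: int, p: int) -> float:
    """Criterion (1): N*log(FPE) = N*log(V_N) + N*log((N + p) / (N - p))."""
    if n <= p:
        raise ValueError("the sample must be longer than the number of parameters")
    return n * math.log(v_n) + n * math.log((n + p) / (n - p))


def mdl(v_n: float, n: int, p: int) -> float:
    """MDL criterion: log(V_N) + p*log(N)/N."""
    return math.log(v_n) + p * math.log(n) / n


# Hypothetical mean squared residuals V_N for candidate models with p parameters.
candidates = {1: 2.10, 2: 1.20, 3: 1.19, 4: 1.185}
n_samples = 100

best_by_fpe = min(candidates, key=lambda p: n_log_fpe(candidates[p], n_samples, p))
best_by_mdl = min(candidates, key=lambda p: mdl(candidates[p], n_samples, p))
print("order selected by FPE:", best_by_fpe)
print("order selected by MDL:", best_by_mdl)
```

In this example the loss barely improves beyond two parameters, so the penalty terms dominate for larger `p`. For large `N` the penalty in `n_log_fpe` approaches `2 * p`, which is the Akaike penalty mentioned in the text.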
There are several possibilities for adaptive model structure estimation: 1) application of statistical criteria for detecting possible nonlinearities and the type of nonstationarity (integrated or heteroscedastic process); 2) analysis of partial autocorrelation for determining the autoregression order; 3) automatic estimation of the lag of an exogenous variable (detection of leading indicators); 4) automatic analysis of residual properties; 5) analysis of the data distribution type and its use for selecting the correct model estimation method; 6) adaptive model parameter estimation using additional data; 7) optimal selection of weighting coefficients for exponential smoothing, nearest neighbor, and other techniques. The development and use of a specific adaptation scheme depends on the volume and quality of data, the specific problem statement, the requirements for forecast estimates, etc.
The adaptive estimation schemes also help to cope with model parameter uncertainties. New data are used to recompute model parameter estimates so that they correspond to possible changes in the object under study. When the model is nonlinear, alternative parameter estimation techniques (say, MCMC) could be used to compute alternative (though admissible) sets of parameters and to select the most suitable of them using statistical quality criteria.
Processing some types of possible stochastic uncertainties. In practical modeling, the statistical characteristics (covariance matrices) of stochastic external disturbances and measurement noise (errors) are very often unknown. To eliminate this uncertainty, optimal filtering algorithms are usually applied that make it possible to estimate the object (system) states and the covariance matrices simultaneously. One way to solve the problem is to apply the optimal Kalman filter. The Kalman filter is used to find optimal estimates of system states on the basis of a system model represented in the widely used and convenient state space form:
$$x(k) = A(k, k-1)\,x(k-1) + B(k, k-1)\,u(k-1) + w(k), \tag{2}$$
where $x(k)$ is the $n$-dimensional vector of system states; $k = 0, 1, 2, \ldots$ is discrete time; $u(k-1)$ is the $m$-dimensional vector of deterministic control variables; $w(k)$ is the $n$-dimensional vector of external random disturbances; $A(k, k-1)$ is the $(n \times n)$ matrix of system dynamics; $B(k, k-1)$ is the $(n \times m)$ matrix of control coefficients. The double argument $(k, k-1)$ means that the variable or parameter is used at time moment $k$, but its value is based on earlier data processing up to and including moment $(k-1)$. Usually the matrices $A$ and $B$ are written with one argument, as $A(k)$ and $B(k)$, to simplify the text. Besides the main task of optimal state estimation, the Kalman filter can be used to solve the following problems: computing short-term forecasts; estimating unknown model parameters, including the statistics of external disturbances and measurement errors (adaptive extended Kalman filter); estimating state vector components that cannot be measured directly; and fusing data coming from various external sources (combining available data to enhance its information content).
Obviously, a stationary system model is described with constant parameters such as $A$ and $B$. Because matrix $A$ links two consecutive system states, it is also called the state transition matrix. Discrete time $k$ and continuous time $t$ are linked via the data sampling time $T_s$: $t = k T_s$. In the classic problem statement for optimal filtering, the vector sequence of external disturbances $w(k)$ is supposed to be zero-mean white Gaussian noise with covariance matrix $Q$, i.e. the noise statistics are as follows:
$$E[w(k)] = 0,\ \forall k; \qquad E[w(k)\,w^{T}(j)] = Q(k)\,\delta_{kj},$$
where $\delta_{kj}$ is the Kronecker delta function, $\delta_{kj} = 0$ for $k \neq j$ and $\delta_{kj} = 1$ for $k = j$; $Q(k)$ is the positive definite $(n \times n)$ covariance matrix. The diagonal elements of the matrix are the variances of the components of the disturbance vector $w(k)$. The initial system state $x_0$ is supposed to be known, and the measurement equation for the vector $z(k)$ of output variables is described by the equation:
$$z(k) = H(k)\,x(k) + v(k), \tag{3}$$
where $H(k)$ is the $(r \times n)$ observation (coefficients) matrix; $v(k)$ is the $r$-dimensional vector of measurement noise with statistics $E[v(k)] = 0$, $E[v(k)\,v^{T}(j)] = R(k)\,\delta_{kj}$, where $R(k)$ is the $(r \times r)$ positive definite measurement noise covariance matrix, the diagonal elements of which represent the variances of additive noise for each measurable variable. The measurement noise is also supposed to be a zero-mean white noise sequence that is not correlated with the external disturbance $w(k)$ or the initial system state. For the system (2), (3) with state vector $x(k)$ it is necessary to find the optimal state estimate $\hat{x}(k)$ at an arbitrary moment $k$ as a linear combination of the estimate $\hat{x}(k-1)$ at the previous moment $(k-1)$ and the last available measurement $z(k)$. The estimate of the state vector $\hat{x}(k)$ is computed as the optimal one by minimizing the expectation of the sum of squared errors, i.e.:
$$E\big[(\hat{x}(k) - x(k))^{T}(\hat{x}(k) - x(k))\big] \to \min_{K}, \tag{4}$$
where $x(k)$ is the exact value of the state vector that can be found using the deterministic part of the state equation (2); $K$ is the optimal gain matrix that is determined as a result of minimizing the quadratic criterion (4).
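The recursion that results from minimizing a criterion of this kind can be illustrated for a scalar system. This is a minimal sketch, not the authors' implementation: it assumes scalar, time-invariant coefficients `a`, `b`, `h` and known noise variances `q` and `r`, and all function names and numerical values are hypothetical.

```python
import random


def kalman_step(x_prev, p_prev, z, a, h, q, r, b=0.0, u_prev=0.0):
    """One predict/update cycle of a scalar Kalman filter.

    Model: x(k) = a*x(k-1) + b*u(k-1) + w(k),  z(k) = h*x(k) + v(k),
    with Var[w] = q and Var[v] = r.
    """
    x_pred = a * x_prev + b * u_prev          # deterministic part of the state equation
    s = a * p_prev * a + q                    # prior (predicted) error variance
    gain = s * h / (h * s * h + r)            # optimal gain
    x_post = x_pred + gain * (z - h * x_pred)  # correction by the new measurement
    p_post = (1.0 - gain * h) * s             # posterior error variance
    return x_post, p_post, gain


def run_filter(measurements, a, h, q, r, x0=0.0, p0=1.0):
    """Filter a whole measurement sequence and return the state estimates."""
    x, p, estimates = x0, p0, []
    for z in measurements:
        x, p, _ = kalman_step(x, p, z, a, h, q, r)
        estimates.append(x)
    return estimates


# Simulated example with made-up parameters.
rng = random.Random(0)
a, h, q, r = 0.95, 1.0, 0.1, 1.0
x, states, measurements = 0.0, [], []
for _ in range(2000):
    x = a * x + rng.gauss(0.0, q ** 0.5)
    states.append(x)
    measurements.append(h * x + rng.gauss(0.0, r ** 0.5))

estimates = run_filter(measurements, a, h, q, r)
mse_measured = sum((z - s) ** 2 for z, s in zip(measurements, states)) / len(states)
mse_filtered = sum((e - s) ** 2 for e, s in zip(estimates, states)) / len(states)
print("MSE of raw measurements:", round(mse_measured, 3))
print("MSE of filtered estimates:", round(mse_filtered, 3))
```

The new estimate is a linear combination of the previous estimate and the latest measurement, as the problem statement requires; the gain decides how much weight the measurement receives.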
Thus, the filter is constructed to compute the optimal state vector estimate $\hat{x}(k)$ under the influence of external random system disturbances and measurement noise. Here one of the possible uncertainties arises when the covariance matrices $Q$ and $R$ are unknown. To solve the problem, an adaptive Kalman filter is constructed that computes the estimates $\hat{Q}$ and $\hat{R}$ simultaneously with the state vector estimate $\hat{x}(k)$. Another option is to construct a separate algorithm for computing the values of $\hat{Q}$ and $\hat{R}$. A convenient statistical algorithm for estimating the covariance matrices was proposed in [11]:
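$$\hat{R} = \frac{1}{2}\left[\hat{B}_1 + A^{-1}(\hat{B}_1 - \hat{B}_2)(A^{-1})^{T}\right]; \qquad \hat{Q} = \hat{B}_1 - \hat{R} - A \hat{R} A^{T},$$
where
$$\hat{B}_1 = E\{[z(k) - A z(k-1)][z(k) - A z(k-1)]^{T}\};$$
$$\hat{B}_2 = E\{[z(k) - A^{2} z(k-2)][z(k) - A^{2} z(k-2)]^{T}\}.$$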
The matrices $\hat{Q}$ and $\hat{R}$ are used in the optimal filtering procedure as follows:
$$S(k) = A P(k-1) A^{T} + \hat{Q}; \qquad \Gamma(k) = S(k)\,[S(k) + \hat{R}]^{\#};$$
$$P(k) = [I - \Gamma(k)]\,S(k), \quad k = 0, 1, 2, \ldots,$$
where $S(k)$ and $P(k)$ are the prior and posterior covariance matrices of the estimate errors, respectively; the symbol “$\#$” denotes the pseudo-inverse; $A^{T}$ denotes matrix transposition; $\Gamma(k)$ is a matrix of intermediate covariance results. The algorithm was successfully applied to covariance estimation in many practical applications. Computational experiments showed that the values of $\Gamma(k)$ become stationary after about 20–25 sampling periods in the scalar case, though this figure grows substantially with the dimensionality of the system under study. It was also determined that the parameter estimators are very sensitive to the initial conditions of the system. The initial conditions should differ from zero enough to provide stability for the estimates generated.
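The covariance estimation step can be sketched for the scalar case. The sketch assumes a zero-mean stationary process with a known scalar coefficient `a` and a directly measured state ($H = 1$), so that $z(k) - a z(k-1) = w(k) + v(k) - a v(k-1)$ and therefore $B_1 = Q + R + a^2 R$, while $B_2 = a^2 Q + Q + R + a^4 R$; solving these two equations for $R$ and $Q$ gives the expressions used below. Expectations are replaced by sample averages, and the function names and simulation parameters are hypothetical.

```python
import random


def sample_moments(z, a):
    """Sample estimates of B1 and B2 from measurements z (scalar case, H = 1)."""
    n = len(z)
    b1 = sum((z[k] - a * z[k - 1]) ** 2 for k in range(1, n)) / (n - 1)
    b2 = sum((z[k] - a * a * z[k - 2]) ** 2 for k in range(2, n)) / (n - 2)
    return b1, b2


def noise_from_moments(b1, b2, a):
    """Scalar form of R = 0.5*[B1 + (B1 - B2)/a^2] and Q = B1 - R - a^2*R."""
    r_hat = 0.5 * (b1 + (b1 - b2) / (a * a))
    q_hat = b1 - r_hat - a * a * r_hat
    return q_hat, r_hat


def simulate(n, a, q, r, seed=1):
    """Generate measurements of x(k) = a*x(k-1) + w(k), z(k) = x(k) + v(k)."""
    rng = random.Random(seed)
    x, z = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        z.append(x + rng.gauss(0.0, r ** 0.5))
    return z


# Made-up true values: a = 0.8, Q = 0.5, R = 1.0.
z = simulate(50000, a=0.8, q=0.5, r=1.0)
q_hat, r_hat = noise_from_moments(*sample_moments(z, 0.8), 0.8)
print("estimated Q:", round(q_hat, 3), "estimated R:", round(r_hat, 3))
```

Because the estimates are differences of sample moments divided by $a^2$, they become noisy for short series or for small `a`, which is in line with the sensitivity noted above.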
Other appropriate instruments for taking possible statistical uncertainties into account are fuzzy logic, neuro-fuzzy models, Bayesian networks, appropriate types of distributions, etc. Some statistical data uncertainties, such as missing measurements, extreme values, and high-level jumps of stochastic origin, could be processed with appropriately selected statistical procedures. There exist a number of data imputation schemes that help to complete the collected data sets and improve their quality. For example, missing measurements in time series could very often be generated from appropriately selected distributions or in the form of short-term forecasts. Appropriate processing of jumps and extreme values helps to adjust for data nonstationarity and to estimate correctly the probability distribution of the stochastic processes under study.
Processing data with missing observations (data in the form of time series). For data in time series form, the most suitable imputation techniques today are as follows: simple averaging when it is possible (when only a few values are missing); generation of forecast estimates with a model constructed using the available measurements; generation of the missing values from distributions whose form and parameters are again determined using the available part of the data and expert estimates; the use of optimization techniques, say appropriate forms of the EM (expectation maximization) algorithm; exponential smoothing, etc. It should also be mentioned that the optimal Kalman filter can be used for imputation of missing data because it contains an “internal” forecasting function that makes it possible to generate quality short-term forecasts [12]. Besides, it is capable of fusing data coming from various external sources and in this way improving the quality (information content) of the state vector and its forecasts.
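The second technique in the list above, filling gaps with forecasts from a model built on the available measurements, can be sketched as follows. The choice of a first-order autoregression fitted by least squares is an assumption made for brevity, missing values are represented by `None`, and the function name and sample series are hypothetical.

```python
def impute_ar1(series):
    """Fill None entries with one-step AR(1) forecasts.

    The AR(1) coefficient is estimated by least squares from pairs of
    consecutive observed values, after removing the sample mean.
    """
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    mean = sum(observed) / len(observed)

    pairs = [
        (series[k - 1] - mean, series[k] - mean)
        for k in range(1, len(series))
        if series[k - 1] is not None and series[k] is not None
    ]
    denominator = sum(x * x for x, _ in pairs)
    coef = sum(x * y for x, y in pairs) / denominator if denominator > 0 else 0.0

    filled = list(series)
    for k, value in enumerate(filled):
        if value is None:
            # Forecast from the previous (observed or already imputed) value.
            previous = filled[k - 1] if k > 0 else mean
            filled[k] = mean + coef * (previous - mean)
    return filled, coef


raw = [10.0, 10.8, None, 11.9, 12.1, None, None, 12.8]
completed, ar_coef = impute_ar1(raw)
print("estimated AR(1) coefficient:", round(ar_coef, 3))
print("completed series:", [round(v, 2) for v in completed])
```

Observed values are left untouched; runs of consecutive gaps are filled recursively, so the imputed values drift toward the sample mean, as multi-step forecasts of a stationary AR(1) process do.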
Further reduction of this uncertainty is possible by applying several forecasting techniques to the same problem and subsequently combining the separate forecasts using appropriate weighting coefficients. The best results of combining the forecasts are achieved when the variances of the forecasting errors of the different techniques do not differ substantially (at any rate, the variances should be of the same order).
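One common way to choose the weighting coefficients is to make them inversely proportional to the forecast error variances. The text does not say which weighting scheme the authors use, so inverse-variance weighting is an assumption here, and the numbers are made up.

```python
def inverse_variance_weights(error_variances):
    """Weights proportional to 1/variance, normalized to sum to one."""
    inverse = [1.0 / v for v in error_variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine_forecasts(forecasts, weights):
    """Weighted sum of forecasts produced by different techniques."""
    return sum(f * w for f, w in zip(forecasts, weights))


# Hypothetical forecasts of the same quantity from three techniques and the
# error variances observed for each technique on past data.
forecasts = [10.0, 12.0, 11.0]
error_variances = [1.0, 2.0, 4.0]

weights = inverse_variance_weights(error_variances)
combined = combine_forecasts(forecasts, weights)
print("weights:", [round(w, 3) for w in weights])
print("combined forecast:", round(combined, 3))
```

When one variance is much larger than the others, its weight becomes negligible and the combination adds little, which agrees with the remark that the variances should be of the same order.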
Coping with uncertainties of model parameter estimates. Uncertainties of model parameter estimates, such as bias and inconsistency, usually result from data with low information content or from data that do not follow the normal distribution, which is assumed when least squares (LS) is applied for parameter estimation. This situation may also arise in the case of multicollinearity of the independent variables or of a substantial influence of process nonlinearity that for some reason was not taken into account when the model structure was estimated. When the size of the data sample is insufficient for model construction, the sample could be expanded by applying special techniques, simulation can be used, or special model building techniques, such as the group method of data handling (GMDH), can be applied. Very often GMDH produces results of acceptable quality with rather short samples. If the data do not follow the normal distribution, then the maximum likelihood technique or appropriate Markov chain Monte Carlo (MCMC) procedures could be used [13]. The latter techniques could be applied at quite acceptable computational cost when the number of parameters is not very high.
Generally, the problems of model structure and parameter estimation are at the core of modeling, and they should be given appropriate attention. Here several techniques should be applied to generate alternative sets of parameter estimates and in this way to make it possible to select the best alternative. Statistical criteria indicating model adequacy help to select the best estimates. Also, parameter estimates with the lowest variances are preferable to those with higher variances. Usually parameter estimation techniques make it possible to estimate these variances.
Dealing with model structure uncertainties. When considering mathematical models, it is convenient to use the unified notion (representation) of a model structure proposed here, which we define as follows: $S = \{r, p, m, n, d, w, l\}$, where $r$ is the model dimensionality (number of equations); $p$ is the model order (maximum order of a differential or difference equation in the model); $m$ is the number of independent variables in the right-hand side of the model; $n$ is the nonlinearity and its type (nonlinearity with respect to variables and parameters); $d$ is the lag, or output reaction delay time; $w$ is the stochastic external disturbance and its type; $l$ are possible restrictions imposed on the model variables and/or parameters. When using a DSS, the model structure can practically always be estimated from data. This means that the elements of the model structure almost always take only approximate values.
When a model is constructed for the purpose of forecasting, we build several candidates and select the best of them with a set of model quality statistics. Generally we could define the following techniques to reduce structural uncertainties: gradual improvement of the model order (AR(p) or ARMA(p, q)) by applying an adaptive approach to modeling and an automatic search for the “best” structure using complex statistical quality criteria; adaptive estimation (improvement) of the input delay time (lag) and of the data distribution type with its parameters; description of the detected process nonlinearities with alternative analytical forms, with subsequent estimation of model adequacy and forecast quality. Another example of a complex statistical criterion of model adequacy and forecast quality could be the following:
$$J = |1 - R^2| + \alpha \ln\left(\sum_{k=1}^{N} e^2(k)\right) + |2 - DW| + \beta \ln(1 + MAPE) + U \to \min_{\hat{\theta}_i},$$
where $R^2$ is the coefficient of determination; $DW$ is the Durbin-Watson statistic; $MAPE$ is the mean absolute percentage error of the estimated forecasts; $\sum_{k=1}^{N} e^2(k) = \sum_{k=1}^{N} [y(k) - \hat{y}(k)]^2$ is the sum of squared model errors; $U$ is the Theil coefficient, which measures the forecasting characteristics of a model; $\alpha$, $\beta$ are appropriately selected weighting coefficients (their sum should be equal to 1); $\hat{\theta}_i$ is the parameter vector of the $i$-th candidate model. A criterion of this type is used for automatic selection of the best candidate model. The criterion also allows the DSS to operate in an automatic adaptive mode. Obviously, other forms of complex criteria are possible. When constructing the criterion, it is important not to give excessive weight to separate terms in the right-hand side of the expression. A general recommendation for model structure estimation is to apply an appropriate adaptation scheme.
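A criterion of this kind can be evaluated for each candidate model as sketched below. Several details are assumptions rather than statements from the paper: the combined form $J = |1 - R^2| + \alpha \ln(SSE) + |2 - DW| + \beta \ln(1 + MAPE) + U$, MAPE expressed in percent, and the $U_1$ variant of the Theil coefficient. Function names and data are hypothetical.

```python
import math


def quality_stats(y, y_hat):
    """R^2, Durbin-Watson, MAPE (percent), SSE and Theil U1 for one candidate."""
    n = len(y)
    errors = [a - b for a, b in zip(y, y_hat)]
    sse = sum(e * e for e in errors)
    mean_y = sum(y) / n
    sst = sum((a - mean_y) ** 2 for a in y)
    dw = sum((errors[k] - errors[k - 1]) ** 2 for k in range(1, n)) / sse
    mape = 100.0 / n * sum(abs(e / a) for e, a in zip(errors, y))
    rmse = math.sqrt(sse / n)
    theil_u = rmse / (
        math.sqrt(sum(a * a for a in y) / n) + math.sqrt(sum(b * b for b in y_hat) / n)
    )
    return {"r2": 1.0 - sse / sst, "dw": dw, "mape": mape, "sse": sse, "theil_u": theil_u}


def complex_criterion(y, y_hat, alpha=0.5, beta=0.5):
    """Assumed combined criterion; smaller values indicate a better candidate."""
    s = quality_stats(y, y_hat)
    return (
        abs(1.0 - s["r2"])
        + alpha * math.log(s["sse"])
        + abs(2.0 - s["dw"])
        + beta * math.log(1.0 + s["mape"])
        + s["theil_u"]
    )


# Made-up data: one candidate with small alternating errors, one with a constant bias.
y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
candidate_a = [10.1, 11.9, 11.1, 12.9, 12.1, 13.9]
candidate_b = [11.0, 13.0, 12.0, 14.0, 13.0, 15.0]

scores = {"a": complex_criterion(y, candidate_a), "b": complex_criterion(y, candidate_b)}
best = min(scores, key=scores.get)
print({name: round(value, 3) for name, value in scores.items()}, "best:", best)
```

The sketch requires nonzero observations (for MAPE) and nonzero residuals (for the logarithm and the Durbin-Watson ratio). The constant bias of the second candidate produces fully autocorrelated residuals, so its Durbin-Watson term alone adds 2 to the criterion.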
Coping with uncertainties of a level (amplitude) type. The use of random variables (i.e. variables with a random amplitude or level) and/or non-measurable variables makes it necessary to apply fuzzy sets for describing such situations. A variable with random amplitude can be described with some probability distribution if measurements are available or arrive for analysis within an acceptable time span. However, some variables cannot be measured (registered) in principle, say the amount of shadow capital that “disappears” offshore every month, the amount of shadow salaries paid at some company, or a technology parameter of a control system that cannot be measured on-line because an appropriate sensor is absent. In such situations we could assign to the variable a set of possible values in linguistic form as follows: capital amount = { very low, low, medium, high, very high }. There exists a complete set of the mathematical operations that need to be applied to such fuzzy variables. Finally, a fuzzy value could be transformed into the usual exact form using known techniques.
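A linguistic variable of this kind can be sketched with triangular membership functions and a simple defuzzification rule. The normalized 0 to 100 scale, the triangular shape of the terms, and the weighted-average (height) defuzzification method are all assumptions chosen for illustration; the names are hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical terms for "capital amount" on a normalized 0..100 scale,
# each given as (left, peak, right).
TERMS = {
    "very low": (-25.0, 0.0, 25.0),
    "low": (0.0, 25.0, 50.0),
    "medium": (25.0, 50.0, 75.0),
    "high": (50.0, 75.0, 100.0),
    "very high": (75.0, 100.0, 125.0),
}


def fuzzify(x):
    """Degree of membership of a crisp value in every linguistic term."""
    return {name: triangular(x, *shape) for name, shape in TERMS.items()}


def defuzzify(memberships):
    """Weighted average of term peaks (height method)."""
    total = sum(memberships.values())
    if total == 0:
        raise ValueError("all membership degrees are zero")
    return sum(mu * TERMS[name][1] for name, mu in memberships.items()) / total


degrees = fuzzify(40.0)
expert_opinion = {"high": 0.7, "very high": 0.3}  # made-up expert estimate
crisp_value = defuzzify(expert_opinion)
print("memberships of 40:", {k: round(v, 2) for k, v in degrees.items()})
print("crisp value of the expert opinion:", crisp_value)
```

The second call shows the direction that matters for a non-measurable variable: an expert supplies membership degrees for the linguistic terms, and defuzzification turns them into an exact number usable in a model.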
An appropriately constructed optimal Kalman filter can also be applied for estimating non-measurable variables using known covariances between measurable and non-measurable variables.
Processing probabilistic uncertainties. To reduce probabilistic uncertainties it is possible to use the Bayesian approach, which helps to construct models in the form of conditional distributions for sets of random variables. Usually such models represent the variables of the process under study, stochastic disturbances, and measurement errors or noise. The problem of distribution type identification also arises in regression modeling. Each probability distribution is characterized by the set of specific values that the random variable could take and the probabilities of these values. The problem is to identify the distribution type and to estimate its parameters. Probabilistic uncertainty (will some event happen or not) could be resolved with various models of the Bayesian type. This approach is known as Bayesian programming, or the Bayesian paradigm. The generalized structure of a Bayesian program application includes the following steps: 1) problem description and statement, with the question posed as the estimation of a conditional probability in the form $p(X_i \mid D, Kn)$, where $X_i$ is the main (goal) variable or event, and the probability $p$ should be found as a result of applying some probabilistic inference procedure; 2) statistical (experimental) data $D$ and knowledge $Kn$ are used for estimating a model of a specific type and its parameters; 3) the selected probabilistic inference technique is applied to answer the question posed above; 4) the quality of the final result is analyzed using appropriate statistics. The steps given above are to some extent “standard” for model construction and for computing probabilistic inference using the available statistical data. This sequence of actions is naturally consistent with the methods of cyclic structural and parametric model adaptation to new data and operating modes (and possibly expert estimates).
One of the most popular Bayesian approaches today is based on models in the form
of static and dynamic Bayesian networks (BN). Bayesian networks are probabilistic
and statistical models represented graphically as directed acyclic graphs (DAG),
whose vertices are the variables of the object (system) under study and whose arcs
show the existing causal relations between the variables. Each variable of a BN is
characterized by a complete finite set of mutually exclusive states. Formally, a BN
can be represented by four components: $N = \langle V, G, P, T \rangle$, where
$V = \{X_1, \ldots, X_n\}$ is the set of model variables; $G$ is the directed acyclic
graph; $P$ is the joint probability distribution of the graph variables (vertices);
and $T$ denotes the conditional and unconditional probability tables for the
variables of the graphical model [14; 15]. The relations between the variables are
established from expert estimates or by applying special statistical and
probabilistic tests to the statistical data (when available) that characterize the
dynamics of the variables used to construct the model.
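The four components can be made concrete with a very small discrete network. The following sketch is an assumption-laden illustration: the three binary variables `A`, `B`, `C`, the arcs between them and every probability value are hypothetical, and inference is done by brute-force enumeration, which is only practical for a handful of variables.

```python
from itertools import product

# G: parents of each vertex in the DAG (hypothetical structure A -> B, A -> C, B -> C).
parents = {"A": [], "B": ["A"], "C": ["A", "B"]}

# T: probability tables, cpt[v][parent values] = P(v = True | parents).
cpt = {
    "A": {(): 0.3},
    "B": {(True,): 0.9, (False,): 0.2},
    "C": {(True, True): 0.95, (True, False): 0.6,
          (False, True): 0.5, (False, False): 0.05},
}


def joint(assign):
    """P: joint probability of a full assignment, as a product of table entries."""
    p = 1.0
    for v, pa in parents.items():
        p_true = cpt[v][tuple(assign[u] for u in pa)]
        p *= p_true if assign[v] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), summing the joint over all consistent assignments."""
    names = list(parents)  # V: the set of model variables
    num = den = 0.0
    for values in product([True, False], repeat=len(names)):
        assign = dict(zip(names, values))
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(assign)
        den += p
        if assign[target]:
            num += p
    return num / den
```

For example, `query("A", {"C": True})` returns the probability of `A` after `C` has been observed. Practical BN software replaces the enumeration with more efficient inference algorithms.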
The procedure for constructing a BN is generally the same as for models of other
types, say regression models. The set of model variables should satisfy the Markov
condition: each variable of the network is conditionally independent of its
non-descendants given its parents. When a BN is constructed, the mutual information
values between all pairs of variables of the network are computed first. Then an
optimal BN structure is searched for using an acceptable quality criterion, say the
well-known minimum description length (MDL), which allows the graph (model)
structure to be analyzed and improved at each iteration of the learning algorithm
applied. Bayesian networks provide the following advantages for modeling: the model
may include qualitative and quantitative variables simultaneously, as well as
discrete and continuous ones; the number of variables can be very large (thousands);
the values of the conditional probability tables can be computed from statistical
data and expert estimates; the methodology of BN construction is directed towards
identifying the actual causal relations between the variables used, which results in
high adequacy of the model; and the model remains operable when data are missing.
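The first stage of structure learning, computing mutual information between pairs of variables, can be sketched for discrete data as follows. This is a minimal illustration that assumes two equally long sequences of observed states and uses plain frequency estimates of the probabilities; the function name `mutual_information` is hypothetical, and the subsequent MDL-based structure search is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal length")
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in cxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), with p estimated by frequencies
        mi += (c / n) * log2(c * n / (cx[x] * cy[y]))
    return mi
```

Pairs with a high value are natural candidates for arcs in the initial graph, while values close to zero suggest that the two variables can be left unconnected.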
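To reduce the influence of probabilistic and statistical uncertainties on the
quality of models and of the forecasts based upon them, it is also possible to use
models in the form of Bayesian regression, based on analysis of the actual
distributions of model variables and parameters. Consider a simple regression model
with two variables:
$$y(k) \mid x(k) = \beta_1 + \beta_2 x(k) + u(k), \quad k = 1, \ldots, N.$$
It is supposed that the random values $u(1), \ldots, u(N)$ are independent and
belong, for example, to the normal distribution $\{u(k)\} \sim N(0, \sigma_u^2)$; here
the vector of unknown parameters includes three elements,
$(\beta_1, \beta_2, \sigma_u^2)^T$. The likelihood function for the dependent
variable $y = (y(1), \ldots, y(N))^T$ and the predictor $x = (x(1), \ldots, x(N))^T$
is determined, up to a proportionality constant, as follows:
$$L(y \mid x, \beta_1, \beta_2, \sigma_u) \propto \frac{1}{\sigma_u^{N}} \exp\left\{ -\frac{1}{2\sigma_u^2} \sum_{k=1}^{N} [y(k) - \beta_1 - \beta_2 x(k)]^2 \right\}.$$
Using simplified (non-informative) prior distributions for the model parameters,
$$g(\beta_1, \beta_2, \sigma_u) = g_1(\beta_1)\, g_2(\beta_2)\, g_3(\sigma_u); \quad g_1(\beta_1) = \mathrm{const}; \quad g_2(\beta_2) = \mathrm{const}; \quad g_3(\sigma_u) \propto 1/\sigma_u,$$
and the Bayes theorem, it is possible to find the joint posterior distribution of the
parameters in the form [16]:
$$h(\beta_1, \beta_2, \sigma_u \mid y, x) \propto \frac{1}{\sigma_u^{N+1}} \exp\left\{ -\frac{1}{2\sigma_u^2} \sum_{k=1}^{N} [y(k) - \beta_1 - \beta_2 x(k)]^2 \right\},$$
$$-\infty < \beta_1, \beta_2 < \infty, \quad 0 < \sigma_u < \infty.$$
Maximum likelihood estimates of the model parameters are determined as follows:
$$\hat\beta_1 = \bar y - \hat\beta_2 \bar x; \quad \hat\beta_2 = \frac{\sum_{k=1}^{N} [x(k) - \bar x]\,[y(k) - \bar y]}{\sum_{k=1}^{N} [x(k) - \bar x]^2},$$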
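where $\bar x = N^{-1} \sum_{k=1}^{N} x(k)$, $\bar y = N^{-1} \sum_{k=1}^{N} y(k)$, with
the unbiased sample estimate of variance:
$$\hat\sigma_u^2 = s^2 = \frac{1}{N-2} \sum_{k=1}^{N} [y(k) - \hat\beta_1 - \hat\beta_2 x(k)]^2.$$
The joint posterior density of the parameters $\beta_1, \beta_2$ corresponds to a
two-dimensional Student distribution:
$$h_1(\beta_1, \beta_2 \mid y, x) \propto \Big\{ (N-2)s^2 + N(\beta_1 - \hat\beta_1)^2 + (\beta_2 - \hat\beta_2)^2 \sum_{k=1}^{N} x^2(k) + 2(\beta_1 - \hat\beta_1)(\beta_2 - \hat\beta_2) \sum_{k=1}^{N} x(k) \Big\}^{-0.5N}.$$
In this way it becomes possible to use more exact distributions of the model
variables and parameters, which is necessary to enhance model quality. Using a new
observation $\tilde x$ and prior information regarding the particular model, it is
possible to determine the forecast interval for the dependent variable $\tilde y$
from the predictive density:
$$p(\tilde y \mid \tilde x, y, x) = \iiint L(\tilde y \mid \tilde x, \beta_1, \beta_2, \sigma_u)\, h(\beta_1, \beta_2, \sigma_u \mid y, x)\, d\beta_1\, d\beta_2\, d\sigma_u.$$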
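The forecast interval can be approximated numerically instead of evaluating the integral in closed form. The sketch below is a hedged illustration rather than the authors' procedure: it assumes the non-informative prior described above, computes the intercept and slope estimates and the unbiased variance estimate, and then draws Monte Carlo samples from the predictive distribution of the dependent variable at a new predictor value. The function names, the number of draws and the seed are assumptions.

```python
import random
from math import sqrt


def fit(x, y):
    """Least-squares (maximum likelihood) estimates of intercept and slope,
    and the unbiased residual variance with N - 2 degrees of freedom."""
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are needed")
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    b2 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    b1 = yb - b2 * xb
    s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b1, b2, s2, xb, sxx


def predictive_interval(x, y, x_new, level=0.95, draws=20000, seed=0):
    """Monte Carlo forecast interval for the response at x_new."""
    rng = random.Random(seed)
    n = len(x)
    b1, b2, s2, xb, sxx = fit(x, y)
    nu = n - 2
    centre = b1 + b2 * x_new
    # Variance inflation from parameter uncertainty plus the new disturbance.
    spread = 1.0 + 1.0 / n + (x_new - xb) ** 2 / sxx
    samples = []
    for _ in range(draws):
        # Under the 1/sigma prior: nu * s^2 / sigma^2 follows chi-square(nu),
        # and chi-square(nu) is Gamma(shape = nu / 2, scale = 2).
        sigma2 = nu * s2 / rng.gammavariate(nu / 2.0, 2.0)
        samples.append(rng.gauss(centre, sqrt(sigma2 * spread)))
    samples.sort()
    lo = samples[int((1.0 - level) / 2.0 * draws)]
    hi = samples[int((1.0 + level) / 2.0 * draws) - 1]
    return centre, lo, hi
```

With a few observations the interval is noticeably wider than the one obtained by plugging in the variance estimate, because the uncertainty of the variance itself is integrated out.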
Another useful Bayesian approach is hierarchical modeling, which is based on a set
of simple conditional distributions that together form one model. The approach
combines naturally with the theory of computing Bayesian probabilistic inference
using modern computational procedures [17]. Hierarchical models belong to the class
of marginal models, where the final result is provided in the form of a distribution
$P(y)$, where $y$ is the vector of available data. The models are formed from a
sequence of conditional distributions for selected variables, including hidden ones.
The hierarchical representation of parameters usually supposes that the data $y$ are
situated at the lower (first) level; the model parameters of the second level,
$\theta_i$ $(i = 1, 2, \ldots, n)$, $\theta_i \sim N(\mu, \tau^2)$, determine the
distributions of the dependent variables, $y_i \sim N(\theta_i, \sigma^2)$,
$i = 1, 2, \ldots, n$; and the parameters $\{\theta_i\}$ are determined by the pair
$(\mu, \tau^2)$ of the third level. Supposing that the parameters $\sigma^2$ and
$\tau^2$ take known finite values and that the parameter $\mu$ is unknown with the
prior $\pi(\mu)$, the joint prior density for $(\mu, \theta)$ can be presented in the
form $\pi(\mu) \prod_i \pi(\theta_i \mid \mu)$, and the prior for the parameter vector
$\theta$ is defined by the integral:
$$p(\theta) = \int \pi(\mu) \prod_i \pi(\theta_i \mid \mu)\, d\mu.$$
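For the normal three-level model with known variances, the posterior quantities have closed forms that are easy to compute. The sketch below is a minimal illustration that additionally assumes a flat prior on the third-level mean, which the text leaves unspecified; the function name and its arguments (`sigma2` for the data-level variance, `tau2` for the second-level variance) are hypothetical.

```python
def hierarchical_normal(y, sigma2, tau2):
    """Posterior summaries for a normal hierarchy with known variances.

    Assumes a flat prior on the top-level mean, so that marginally each
    observation is normal with that mean and variance sigma2 + tau2.
    """
    n = len(y)
    if n == 0 or sigma2 <= 0.0 or tau2 <= 0.0:
        raise ValueError("need data and positive variances")
    mu_mean = sum(y) / n              # posterior mean of the top-level mean
    mu_var = (sigma2 + tau2) / n      # posterior variance of the top-level mean
    w = tau2 / (sigma2 + tau2)        # weight given to each own observation
    # Posterior means of the second-level parameters: each observation is
    # shrunk towards the common mean by the factor (1 - w).
    theta_means = [w * yi + (1.0 - w) * mu_mean for yi in y]
    return mu_mean, mu_var, theta_means
```

When the second-level variance is small relative to the data-level variance, the weight `w` approaches zero and all second-level estimates collapse towards the common mean; in the opposite case they stay close to the individual observations.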
Uncertainties associated with expert estimates. To decrease the influence of
uncertainties in expert estimates, the estimates should be processed adequately
before practical use. Such uncertainties can arise from the following sources: the
input information, and the knowledge and experience of the expert; the way of
thinking of a specific expert and the methodology he or she uses; and the “analytic”
information-processing machine that functions in the expert's mind, etc. Such
uncertainties require the application of special techniques to reduce their
influence on the quality of the final result.
DATA, MODEL AND FORECASTS QUALITY CRITERIA
To achieve a reliable, high-quality final result of risk estimation and forecasting,
separate sets of statistical quality criteria are used at each stage of the
computational hierarchy. Data quality control is performed with the following
procedures:
– analysis of the database for missing values using developed logical rules, and
imputation of missing values with appropriately selected techniques;
– analysis of the data for outliers with special statistical tests, and processing of
outliers to reduce their negative influence on the statistical properties of the
available data;
– normalization of the data to a selected range where necessary;
– application of low-order digital filters (usually low-pass filters) to separate the
observations from measurement noise;
– application of optimal (very often Kalman) filters for optimal state estimation
and for fighting stochastic uncertainties;
– application of the principal component method to achieve the desired level of
orthogonalization between the selected variables;
– computing of extra indicators for use in regression and other models (say, moving
average processes based upon measurements of dependent variables).
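Three of the listed procedures can be sketched compactly. The code below is an illustration under explicit assumptions, not the authors' toolchain: missing values are marked with `None` and imputed by linear interpolation, normalization is min-max scaling, and the Kalman filter is the scalar "local level" case (random-walk state observed with noise) with hypothetical noise variances `q` and `r`.

```python
def fill_missing(series):
    """Impute None entries by linear interpolation (nearest value at the edges)."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            t = (i - left) / (right - left)
            out[i] = series[left] + t * (series[right] - series[left])
    return out


def minmax(series, lo=0.0, hi=1.0):
    """Scale the series linearly into the range [lo, hi]."""
    a, b = min(series), max(series)
    if a == b:
        return [lo for _ in series]
    return [lo + (hi - lo) * (v - a) / (b - a) for v in series]


def kalman_local_level(z, q, r, x0=None, p0=1.0):
    """Scalar Kalman filter: state x(k) = x(k-1) + w, measurement z(k) = x(k) + v,
    with Var(w) = q and Var(v) = r. Returns the filtered state estimates."""
    x = z[0] if x0 is None else x0
    p = p0
    estimates = []
    for zk in z:
        p = p + q                 # predict: state variance grows by q
        gain = p / (p + r)        # Kalman gain
        x = x + gain * (zk - x)   # update the estimate with the innovation
        p = (1.0 - gain) * p      # update the estimate variance
        estimates.append(x)
    return estimates
```

A typical chain would be `kalman_local_level(minmax(fill_missing(raw)), q, r)`; the ratio `q / r` controls how strongly the filter smooths the measurements.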
It is also useful to test how informative the collected data are. A very formal
indicator of informativeness is the sample variance: it is formally supposed that
the higher the variance, the richer the data are in information. Another criterion
is based on computing the derivatives of a polynomial that describes the data in the
form of a time series. For example, the equation given below can describe a rather
complex process with a nonlinear trend and short-term variations imposed on the
trend curve:
$$y(k) = a_0 + \sum_{i=1}^{p} a_i\, y(k-i) + c_1 k + c_2 k^2 + \ldots + c_m k^m + \varepsilon(k),$$
where $y(k)$ is the basic dependent variable; $a_i$, $c_i$ are model parameters;
$k = 0, 1, 2, \ldots$ is discrete time; $\varepsilon(k)$ is a random process that
integrates the influence of external disturbances on the process being modeled, as
well as the errors of the model structure and parameters. The autoregressive part of
this model describes the deviations imposed on the trend, and the trend itself is
described by the polynomial of order $m$ in discrete time $k$. In this case the
maximum number of derivatives is $m$, though in practice the actual number of
derivatives is defined by the largest index $i$ for which the parameter $c_i$ is
statistically significant. To select the best of the constructed models, the
following statistical criteria are used: the determination coefficient ($R^2$); the
Durbin-Watson statistic ($DW$); the Fisher F-statistic; the Akaike information
criterion (AIC); the residual sum of squares (SSE); and some others. The quality of
forecasts is estimated using the criteria mentioned above in expressions (1) and
(2). To perform automatic model selection, the above-mentioned combined criterion
(1) could be used. The power of the criterion was tested experimentally and
confirmed on a wide set of models and statistical data. Thus, three sets of quality
criteria are used to ensure high quality of the final result.
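Several of the model adequacy statistics named above can be computed directly from the observed values and the values fitted by a candidate model. The sketch below is illustrative only: the function name `adequacy` is hypothetical, and the AIC is taken in one common least-squares form, n·ln(SSE/n) + 2·(number of parameters), which is an assumption since the text does not state which form is used.

```python
from math import log


def adequacy(y, y_fit, n_params):
    """SSE, R^2, Durbin-Watson statistic and a least-squares AIC for a fitted model."""
    n = len(y)
    e = [a - b for a, b in zip(y, y_fit)]          # residuals
    sse = sum(v * v for v in e)
    mean = sum(y) / n
    sst = sum((v - mean) ** 2 for v in y)          # total sum of squares
    if sse == 0.0 or sst == 0.0:
        raise ValueError("statistics are undefined for a perfect fit or constant data")
    # DW is close to 2 when consecutive residuals are uncorrelated.
    dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / sse
    return {
        "SSE": sse,
        "R2": 1.0 - sse / sst,
        "DW": dw,
        "AIC": n * log(sse / n) + 2 * n_params,
    }
```

When candidate models are compared on the same sample, lower SSE and AIC and a higher determination coefficient are preferred, while a DW value far from 2 signals autocorrelated residuals, that is, structure the model has not captured.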
ILLUSTRATIVE EXAMPLE OF DATA PROCESSING METHODOLOGY
APPLICATION AND REDUCTION OF THE INFLUENCE OF UNCERTAINTIES
Consider the closing stock prices (in USD) of the IBM company, as given by the
Yahoo! Finance site. The learning sample covers the period from August 8, 2016 to
November 23, 2020. The test sample covers the period from November 24, 2020 to
May 12, 2021. Short-term forecasting was performed with the neural networks MLP and
LSTM. For convenience, the networks are designated as follows: MLP (n1, n2, w),
where n1 is the number of neurons in the first hidden layer; n2 is the number of
neurons in the second hidden layer; and w is the size of the input data window, i.e.
the number of preceding measurements of the time series that influence the current
value. The LSTM network is designated LSTM (n, w), where n is the number of neurons
in the hidden layer and w is the size of the data window. The best forecasting
results were achieved with the networks MLP (32, 16, 5) and LSTM (64, 75).
Statistical characteristics of the results are given in Table 1 below.
Table 1. Results of short-term forecasting with ARIMA, MLP and LSTM

| Statistic | ARIMA(5,1,5) | MLP(32,16,10) | LSTM(64,75) |
|---|---|---|---|
| RMSE | 10.202 | 5.145 | 6.123 |
| MAPE | 5.702 | 3.129 | 4.099 |
| MSE | 104.094 | 26.498 | 37.621 |

Better forecasting results were achieved after exponential smoothing was applied to
the initial data (Table 2). In this way the noise uncertainties were reduced and the
statistical data were prepared for further use by the neural networks.
Table 2. Results of short-term forecasting with ARIMA, MLP and LSTM and
preliminary data processing

| Statistic | ARIMA(5,1,5) | MLP(32,16,10) | LSTM(64,75) |
|---|---|---|---|
| RMSE | 8.974 | 4.320 | 5.382 |
| MAPE | 4.682 | 2.935 | 3.116 |
| MSE | 95.169 | 22.806 | 33.127 |

It can be seen that MLP again produced the best short-term forecasting result, and
that all statistics are lower (better) than in the previous case without preliminary
data processing by exponential smoothing.
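The preprocessing step and the three reported statistics can be sketched as follows. This is a minimal illustration under assumptions: simple (single) exponential smoothing is used, the smoothing factor `alpha` is a free parameter whose value is not given in the text, and MAPE is expressed in percent.

```python
from math import sqrt


def exp_smooth(series, alpha):
    """Simple exponential smoothing: s(k) = alpha * y(k) + (1 - alpha) * s(k-1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [series[0]]  # the first smoothed value is the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def forecast_errors(actual, forecast):
    """MSE, RMSE and MAPE (in percent) of a forecast against the actual values."""
    n = len(actual)
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n
    mape = 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return {"MSE": mse, "RMSE": sqrt(mse), "MAPE": mape}
```

In a setup like the one described, the smoothed learning sample would be passed to the forecasting model, and `forecast_errors` would then be evaluated on the test sample. A smaller `alpha` removes more noise but also delays the response to genuine level changes.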
CONCLUSIONS
The general methodology was proposed for mathematical modeling and forecasting of
the dynamics of economic and financial processes, based on the system analysis
principles. One of the main principles is the identification and consideration of
possible uncertainties associated with data and expert estimates. As instruments for
fighting possible structural, statistical and parametric uncertainties, the
following techniques are proposed: digital filtering, the optimal Kalman filter,
various missing data imputation techniques, multiple methods of model parameter
estimation, and the Bayesian programming approach. The computational experiments
carried out by the authors showed that these instruments for fighting uncertainties
always provided better results regarding model adequacy and quality of forecasts
than processing the data without them. Thus, it is highly advisable to use these
data processing instruments to improve the quality of the final results of
statistical and experimental data analysis. The illustrative example given above
shows that an appropriate preliminary data processing technique improves model
adequacy and short-term forecasts.
REFERENCES
1. R.S. Tsay, Analysis of Financial Time Series. Chicago: Wiley & Sons, Ltd., 2010, 715 p.
2. L. Harris, X. Hong, and Q. Gan, Adaptive Modeling, Estimation and Fusion from Data. Berlin: Springer, 2002, 323 p.
3. P. Congdon, Applied Bayesian Modeling. Chichester: John Wiley & Sons, Ltd., 2003, 472 p.
4. S.M. DeLurgio, Forecasting Principles and Applications. Boston: McGraw-Hill, 1998, 802 p.
5. S.J. Taylor, “Modeling stochastic volatility: a review and comparative study,” Mathematical Finance, vol. 4, no. 2, pp. 183–204, 1994.
6. F. Burstein and C.W. Holsapple, Handbook of Decision Support Systems. Berlin: Springer-Verlag, 2008, 908 p.
7. C.W. Holsapple and A.B. Whinston, Decision Support Systems. Saint Paul (MN): West Publishing Company, 1996, 860 p.
8. P.I. Bidyuk and O.P. Gozhiy, Computer Decision Support Systems. Mykolaiv: Petro Mohyla Black Sea National University, 2012, 380 p.
9. E. Xekalaki and S. Degiannakis, ARCH Models for Financial Applications. Chichester: Wiley & Sons, Inc., 2010, 550 p.
10. C. Chatfield, Time Series Forecasting. Boca Raton: Chapman & Hall/CRC, 2000, 267 p.
11. W.N. Anderson, G.B. Kleindorfer, P.R. Kleindorfer, and M.B. Woodroofe, “Consistent estimates of the parameters of a linear system,” The Annals of Mathematical Statistics, vol. 40, no. 6, pp. 2064–2075, 1969.
12. B.P. Gibbs, Advanced Kalman Filtering, Least-squares and Modeling. Hoboken (New Jersey): John Wiley & Sons, Inc., 2011, 627 p.
13. W.R. Gilks, S. Richardson, and D.J. Spiegelhalter, Markov Chain Monte Carlo in Practice. New York: Chapman & Hall/CRC, 2000, 486 p.
14. F.V. Jensen and Th.D. Nielsen, Bayesian Networks and Decision Graphs. New York: Springer, 2007, 457 p.
15. M.Z. Zgurovsky, P.I. Bidyuk, O.M. Terentyev, and T.I. Prosyankina-Zharova, Bayesian Networks in Decision Support Systems. Kyiv: Edelweiss, 2015, 300 p.
16. J.M. Bernardo and A.F.M. Smith, Bayesian Theory. New York: John Wiley & Sons, Ltd., 2000, 586 p.
17. W.M. Bolstad, Understanding Computational Bayesian Statistics. Hoboken (New Jersey): John Wiley & Sons, Ltd., 2010, 334 p.
Received 23.06.2023
INFORMATION ON THE ARTICLE
Liudmyla B. Levenchuk, ORCID: 0000-0002-8600-0890, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: lusi.levenchuk@gmail.com
Oxana L. Tymoshchuk, ORCID: 0000-0003-1863-3095, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: oxana.tim@gmail.com
Vira H. Huskova, ORCID: 0000-0001-7637-201X, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: guskovavera2009@gmail.com
Petro I. Bidyuk, ORCID: 0000-0002-7421-3565, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: pbidyuke_00@ukr.net
UNCERTAINTIES IN DATA PROCESSING, FORECASTING AND DECISION MAKING /
L.B. Levenchuk, O.L. Tymoshchuk, V.H. Huskova, P.I. Bidyuk
Abstract. Forecasting, dynamic planning, and current statistical data processing
are defined as the process of estimating an enterprise's current state on the market
compared to other competing enterprises and determining further goals, as well as
the sequences of actions and resources necessary for reaching the stated goals. To
perform high-quality forecasting, it is proposed to identify and take into account
possible uncertainties associated with data and expert estimates. This is one of the
system analysis principles applied to achieve a high-quality final result. A review
of some uncertainties is given, together with an illustrative example that shows the
improvement of the final result after possible stochastic uncertainty is taken into
account.
Keywords: mathematical model, statistical data uncertainties, system analysis
principles, forecasting, decision support system.
|
| id | journaliasakpiua-article-290369 |
| institution | System research and information technologies |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2025-07-17T10:28:22Z |
| publishDate | 2023 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/12/eca1b72d5a6ca1ecb1b73ae6b664bb12.pdf |
| spelling | journaliasakpiua-article-2903692023-11-07T22:19:24Z Uncertainties in data processing, forecasting and decision making Невизначеності в обробленні даних, прогнозування і прийняття рішень Levenchuk, Liudmyla Tymoshchuk, Oxana Huskova, Vira Bidyuk, Petro mathematical model statistical data uncertainties system analysis principles forecasting decision support system математична модель невизначеності статистичних даних принципи системного аналізу прогнозування система підтримання прийняття рішень Forecasting, dynamic planning, and current statistical data processing are defined as the process of estimating an enterprise’s current state on the market compared to other competing enterprises and determining further goals as well as sequences of actions and resources necessary for reaching the goals stated. In order to perform high-quality forecasting, it is proposed to identify and consider possible uncertainties associated with data and expert estimates. This is one of the system analysis principles to be hired for achieving high-quality final results. A review of some uncertainties is given, and an illustrative example showing improvement of the final result after considering possible stochastic uncertainty is provided. Прогнозування, динамічне планування та оброблення поточних статистичних даних визначаються як процес оцінювання поточного стану підприємства на ринку порівняно з іншими конкуруючими підприємствами та визначення подальших цілей, а також послідовностей дій та ресурсів, необхідних для досягнення визначених цілей. Для здійснення прогнозування високої якості запропоновано визначити та врахувати можливі невизначеності, пов’язані з даними та експертними оцінками. Це один з принципів системного аналізу, який застосовується для досягнення високої якості кінцевого результату. Наведений огляд деяких невизначеностей та ілюстративний приклад, який показує поліпшення кінцевого результату після врахування можливої стохастичної невизначеності. 
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2023-09-29 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/290369 10.20535/SRIT.2308-8893.2023.3.05 System research and information technologies; No. 3 (2023); 66-80 Системные исследования и информационные технологии; № 3 (2023); 66-80 Системні дослідження та інформаційні технології; № 3 (2023); 66-80 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/290369/283956 |
| spellingShingle | математична модель невизначеності статистичних даних принципи системного аналізу прогнозування система підтримання прийняття рішень Levenchuk, Liudmyla Tymoshchuk, Oxana Huskova, Vira Bidyuk, Petro Невизначеності в обробленні даних, прогнозування і прийняття рішень |
| title | Невизначеності в обробленні даних, прогнозування і прийняття рішень |
| title_alt | Uncertainties in data processing, forecasting and decision making |
| title_full | Невизначеності в обробленні даних, прогнозування і прийняття рішень |
| title_fullStr | Невизначеності в обробленні даних, прогнозування і прийняття рішень |
| title_full_unstemmed | Невизначеності в обробленні даних, прогнозування і прийняття рішень |
| title_short | Невизначеності в обробленні даних, прогнозування і прийняття рішень |
| title_sort | невизначеності в обробленні даних, прогнозування і прийняття рішень |
| topic | математична модель невизначеності статистичних даних принципи системного аналізу прогнозування система підтримання прийняття рішень |
| topic_facet | mathematical model statistical data uncertainties system analysis principles forecasting decision support system математична модель невизначеності статистичних даних принципи системного аналізу прогнозування система підтримання прийняття рішень |
| url | https://journal.iasa.kpi.ua/article/view/290369 |
| work_keys_str_mv | AT levenchukliudmyla uncertaintiesindataprocessingforecastinganddecisionmaking AT tymoshchukoxana uncertaintiesindataprocessingforecastinganddecisionmaking AT huskovavira uncertaintiesindataprocessingforecastinganddecisionmaking AT bidyukpetro uncertaintiesindataprocessingforecastinganddecisionmaking AT levenchukliudmyla neviznačenostívobroblennídanihprognozuvannâíprijnâttâríšenʹ AT tymoshchukoxana neviznačenostívobroblennídanihprognozuvannâíprijnâttâríšenʹ AT huskovavira neviznačenostívobroblennídanihprognozuvannâíprijnâttâríšenʹ AT bidyukpetro neviznačenostívobroblennídanihprognozuvannâíprijnâttâríšenʹ |