Генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник

Encoder-decoder neural network models have found widespread use in recent years for solving various machine learning problems. In this paper, we investigate the variety of such models, including the sparse, denoising and variational autoencoders. To predict non-stationary time series, a generative m...

Full description

Saved in:
Bibliographic Details
Date:2022
Main Authors: Nedashkovskaya, Nadezhda, Androsov, Dmytro
Format: Article
Language:English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2022
Subjects:
Online Access:https://journal.iasa.kpi.ua/article/view/259236
Tags: Add Tag
No Tags, Be the first to tag this record!
Journal Title:System research and information technologies
Download file: Pdf

Institution

System research and information technologies
_version_ 1867334426183073792
author Nedashkovskaya, Nadezhda
Androsov, Dmytro
author_facet Nedashkovskaya, Nadezhda
Androsov, Dmytro
author_institution_txt_mv [ { "author": "Nadezhda Nedashkovskaya", "institution": "Educational and Scientific Complex \"Institute for Applied System Analysis\" of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" }, { "author": "Dmytro Androsov", "institution": "Educational and Scientific Complex \"Institute for Applied System Analysis\" of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" } ]
author_sort Nedashkovskaya, Nadezhda
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2022-06-21T10:27:50Z
description Encoder-decoder neural network models have found widespread use in recent years for solving various machine learning problems. In this paper, we investigate the variety of such models, including the sparse, denoising and variational autoencoders. To predict non-stationary time series, a generative model is presented and tested, which is based on a variational autoencoder, GRU recurrent networks, and uses elements of neural ordinary differential equations. Based on the constructed model, the system is implemented in the Python3 environment, the TensorFlow2 framework and the Keras library. The developed system can be used for modeling continuous time-dependent processes. The system minimizes a human factor in the process of time series analysis, and presents a high-level modern interface for fast and convenient construction and training of deep models.
doi_str_mv 10.20535/SRIT.2308-8893.2022.1.08
first_indexed 2025-07-17T10:27:53Z
format Article
fulltext  N.I. Nedashkovskaya, D.V. Androsov, 2022 Системні дослідження та інформаційні технології, 2022, № 1 97 UDC 004.85 DOI: 10.20535/SRIT.2308-8893.2022.1.08 GENERATIVE TIME SERIES MODEL BASED ON ENCODER-DECODER ARCHITECTURE N.I. NEDASHKOVSKAYA, D.V. ANDROSOV Abstract. Encoder-decoder neural network models have found widespread use in recent years for solving various machine learning problems. In this paper, we inves- tigate the variety of such models, including the sparse, denoising and variational autoencoders. To predict non-stationary time series, a generative model is presented and tested, which is based on a variational autoencoder, GRU recurrent networks, and uses elements of neural ordinary differential equations. Based on the constructed model, the system is implemented in the Python3 environment, the TensorFlow2 framework and the Keras library. The developed system can be used for modeling continuous time-dependent processes. The system minimizes a human factor in the process of time series analysis, and presents a high-level modern interface for fast and convenient construction and training of deep models. Keywords: prediction, variational autoencoder, GRU recurrent neural network, neu- ral ordinary differential equation, latent space, nonstationary time series. INTRODUCTION Classical methods of autoregression with moving average (ARMA) [1, 2] are used to analyze and predict stationary time series. Autoregressive models with integrated moving average (ARIMA) [1, 3], heteroskedastic (ARCH/GARCH) [1, 4, 5] and other [6] are designed to analyze a wider class of nonstationary proc- esses. GARCH models, in particular, help to provide the volatility analysis of fi- nancial time series [7]. ARIMA models are based on numerical differentiation technique and an operator of finite differences to make time series stationary. Moving variance is applied in GARCH models to model heteroskedasticity. The choice of degree of autoregression and moving average in ARMA, ARIMA, ARCH and GARCH models, when analyzing the autocorrelation, is often carried out manually. Recurrent neural networks (RNNs) of the long short-term memory (LSTM) type have also been used in recent years to predict time series [8–11]. Gated re- current unit (GRU), proposed in 2014 [11], is a simplified version of the LSTM network, probably shows as good results as LSTM [12], and therefore is widely used in recent years. Machine learning and deep learning techniques [8–15] main- ly require scaling of the input data and presentation of the series in the form of “values for previous periods – values for the current period” or “features – the resulting value”. The main problem of such a representation is the invariance of the fixed values of the series with respect to time. This representation of the series assumes that each value of the series is fixed at the same interval, although in practice this is not always the case. The paper aims to develop a generative model on basis of the autoencoder for time series prediction, which will be sensitive to different intervals of fixing N.I. Nedashkovskaya, D.V. Androsov ISSN 1681–6048 System Research & Information Technologies, 2022, № 1 98 values of the series and will be able to find hidden patterns in the data. The goal is also to minimize human interference in the data processing, leaving only the re- quirement to scale the input data. AUTOENCODER MODELS An autoencoder is an artificial neural network that, without a teacher, based on an unmarked data is able to recognize encodings – effective representations of input data [8, 13]. Such encodings often have a much smaller dimension compared to the input data, so autoencoders are also a means to reduce dimensionality. An important feature of the autoencoder is that it can be a generative model, capable of randomly generating new data that is very similar to the input. Goals of the autoencoder are as follows: to reconstruct the input data, as well as to identify features hidden in the input data. The typical autoencoder model consists of two parts (Fig.1): the coder and the decoder networks. The coder has to recognize and convert the input data into a latent space, that is the internal representation of the input data. The decoder, in turn, is seen as a generating network that converts the internal representation into outputs. Typically, the decoder has the same architec- ture as the coder, but symmetrically mapped relative to the layer responsible for creating the latent space (Fig.1). To achieve the first goal, namely, to provide the reconstruction of input data, training of the autoencoder is performed by minimizing the loss function, which is called the reconstruction error: )))((,( xfgxE , (1) where х is an input vector, E is a function that penalizes ))(( xfg for dissimilarity to x , )(hg is the decoder output, )(xfh  is the coder output. In order to better identify the features hidden in the input data, a regulariza- tion is added to the autoencoder model. This allows the model to obtain more properties in addition to the ability to copy input data. The desirable properties of the model are as follows: Fig. 1. An example of a deep autoencoder model for the mnist dataset reconstruction, adapted from [8] Generative time series model based on encoder-decoder architecture Системні дослідження та інформаційні технології, 2022, № 1 99 – presentation of sparsity of data; – resistance to noise in the input data and the absence of part of the inputs; – small values of the derivatives of codings relative to the input data. The regularized autoencoder is a model with the loss function presented in the form [13]: ),())(,( xhhgxE  , (2) where E is the reconstruction loss (1), h is a coder layer, ),( xh is a coder layer penalty. The peculiarity of regularized autoencoders is the absence of an obvious Bayesian interpretation. Thus, other known regularized models, for example the ridge regression and other, are an approximation of the Bayesian maximum of the a posteriori probability with the addition of a regularizing penalty, which corre- sponds to the a priori probability distribution of the model parameters. Regularized autoencoders have a different interpretation, because the ),( xh penalty depends on the x – an input data and therefore cannot be formally considered as a priori distribution. However, it is still believed that the introduction of the ),( xh regu- larizer helps to implicitly prefer certain functions. Let us consider several models of regularized autoencoders depending on how the penalty ),( xh in (2) is defined. Model of sparse autoencoder. One of the key reasons for the high energy ef- ficiency of the human brain is the sparse activation of its neurons: only a small part of the neurons is active in the brain at any given time. To model the sparseness in an artificial neural network, we consider the probabilistic interpretation of neuronal activation. Let the artificial neuron of the hidden layer be a Bernoulli random variable, and the average value of activation of this neuron corresponds to the probability of obtaining a unit in the Bernoulli test. The probability of activation of each such individual neuron should be low to increase the sparseness of the neurons of the hidden (latent) layer. Let the desired probability of neuron activation be equal to  , and let the empirical average value of neuron activation on the basis of train data be equal to ̂ . Sparsity loss [16, 17] is considered as the ),( xh penalty in expression (2), a measure of dissimilarity between distributions and is based on the Kullback–Leibler divergence between the model distribution and the data distribution:    ˆ ln)ˆ||(KL . Learning criterion for the sparse autoencoder can be considered as follows: ),()))((,()1( xhxfgxE  ,      N i i i i N i iiKLxh 11 ˆ ln)ˆ||(),( , where ),( xh is the sparsity loss, N is a number of neurons in a coder layer, ]1,0[ is a sparsity weight, which is a hyperparameter of the model. N.I. Nedashkovskaya, D.V. Androsov ISSN 1681–6048 System Research & Information Technologies, 2022, № 1 100 If  is large, the model will pay more attention to the target sparsity, but will not be able to reconstruct the inputs properly. If, on the contrary, the weight is too low, the model will mostly ignore the sparsity and will not find interesting features in the data. Methods of decision support [18, 19] can be used to deter- mine the most acceptable  value based on quantitative and qualitative decision criteria in a particular practical problem. An important property of the sparse autoencoder is that it can be considered as a generative model with latent variables, which approximates the maximum likelihood. Let us consider a model [13] with an input vector of visible variables x, latent variables h and a common probability distribution )|()(),( modelmodelmodel hxphphxp  . )(model hp is called the a priori distribution of latent variables and represents the a priori belief of the model that it will “see” the input vector x, where h is still the output of the coder. This interpretation differs from the traditional use of the term “a priori”, which denotes the distribution )(wp , which describes the hypotheses about the parameters of the model before reading the training data. The logarithm of the plausibility of the model can be represented as:  h hxpxp ),(ln)(ln modelmodel . The autoencoder is considered as an approximation of this sum by a point es- timate for only one value of h, which has a high probability. With this choice of h, the following function is maximized: )|(ln)(ln),(ln modelmodelmodel hxphphxp  . The term )(ln model hp can cause sparsity. For example, the a priori Laplace distribution |)|(exp 2 )(model ii hhp    corresponds to the sparsity penalty in terms of the 1L norm. Denoising autoencoder. Another way to make the autoencoder show interest- ing features is to add noise to the inputs and teach it to restore the initial not noisy input [8, 13, 20, 21]. There are two ways:  a random variable is added to the input vector, normally distributed with a small variance, which determines the noise level;  part of the input neurons is set to zero. The level of noise is determined by what part it is. This method is more used in image processing problems. In the denoising autoencoder models, a conditional distribution )|ˆ( xxC of noisy examples under the condition of true examples is introduced. Next, the au- toencoder learns the distribution of the reconstruction )ˆ|(reconstr xxp , which is estimated on the basis of training pairs )ˆ,( xx as follows [13]:  select example x from the training set;  select the noisy version x̂ with )|ˆ( xxC ; Generative time series model based on encoder-decoder architecture Системні дослідження та інформаційні технології, 2022, № 1 101  use )ˆ,( xx as a training example to estimate the distribution of reconstruc- tion )|()ˆ|( decoderreconstr hxpxxp  , where h is the output of the coder, and decoderp is determined by the decoder )(hg ;  minimize the following loss function using the mini-batch gradient de- scent: )|(ln)ˆ|(ln decoderreconstr hxpxxp  . If the encoder is deterministic, then the autoencoder is often a feedforward neural network, and the same methods can be used to train it as for any feedfor- ward neural network, for example, the mini-batch stochastic gradient descent. The variational autoencoder (VAE) model was proposed in 2014 and is de- signed to reconstruct the law of distribution of training data for artificial genera- tion of samples from the general distribution [22]. This is a probabilistic model, because its output after training is determined randomly. VAE has a basic archi- tecture common to all autoencoders (Fig. 2): the first part corresponds to the en- coder network (it consists of the hidden layers 1 and 2 in the example in Fig. 2), followed by the decoder network (the hidden layers 3 and 4 in Fig. 2). The differ- ence from deterministic encoders is that the VAE encoder for a given input results in the average encoding μ and the standard deviation σ. The coding is then chosen randomly from the Gaussian distribution with mean μ and standard deviation σ. A standard distribution other than Gaussian can also be used. Next, the decoder de- codes the received encoding in the usual way. Inputs (on the right in Fig. 2) can have a complex distribution. During train- ing, the coding moves inside the latent space, the coding space, and occupies an approximately spherical region similar to a cloud of Gaussian points. The loss function of the variational autoencoder is the sum of two terms [8]: ),())(,( xhLhgxE  , Fig. 2. An example of the variational autoencoder model, VAE [8] N.I. Nedashkovskaya, D.V. Androsov ISSN 1681–6048 System Research & Information Technologies, 2022, № 1 102 where E is the reconstruction error (1) of the input vector x, as before; ),( xhL is the latent loss, which is often the Kullbak–Leibler divergence between the target distribution, such as Gaussian, and the actual coding distribution. It is easy to generate a new example based on a trained variational autoen- coder: you need to choose a random encoding from the Gaussian distribution and decode it. LODE-GRU-VAE VARIATIONAL AUTOENCODER MODEL FOR TIME SERIES PREDICTION Fig. 3 shows a simplified architecture of the proposed model. It consists of an en- coder – a LODE-GRU network, and a decoder network. Two modifications are proposed: with simulation of timestamp distribution and without it. In both cases, the pair  ii tx , – the data tensor together with the corresponding timestamps is the input for the model (Fig. 3). In the second case we suggest the uniform distri- bution of these time slices on the observation interval. The proposed model is based on the ideas of variational encoder and GRU recurrent neural net- works. Features of the GRU unit in comparison with the known LSTM are as follows. Firstly, a single state vector )(th is used. Secondly, the forgetting gateway and the input gateway are controlled by a single con- troller – a fully connected layer with a logistical acti- vation function. If the result of this controller is equal to 1, then the forget gateway opens and the input gateway closes. If the result of the controller is equal to 0, then the opposite action occurs. This means that the place is first cleared before saving a certain memory. Thirdly, there is no output gateway in the GRU, so a complete state vector is the result at each time step. The main fully connected layer analyzes the current inputs and some parts of the previous state, which are determined by the additional gateway controller. The LODE-GRU encoder in Fig. 3 is a modifica- tion of recurrent networks of the GRU type. The modification is to use the ordinary differential equa- tions (ODEs) to predict the values of hidden trajectories. As a result, the LODE- GRU network layers are defined as follows: ),, , ( 11   tthfSolh , (3) )( uhuxu bhWxWu   , (4) )( rhrxr bhWxWr   , (5) tanh ( )( )xh hh hh W x W r h b         , (6) (1 )h u h u h         . (7) LODE-GRU Decoder zi <xi, ti> xi Fig. 3. A simplified LODE-GRU-VAE model Generative time series model based on encoder-decoder architecture Системні дослідження та інформаційні технології, 2022, № 1 103 The block described by equation (3) generates a set of points according to the trajectory of the studied process under the conditions (4)–(7) of process transformations, which project the trajectory to another space called latent or hidden. The combination of neural ordinary differential equations and neural RNN networks began to be studied in 2018 to model irregularly observed time series [23], and was further developed in [24–26]. Equation (3) is called the latent ordinary differential equation (Latent ODE). Model (3)–(7) helps to generate new series values at intermediate points between observations, in contrast to standard neural RNN LSTM networks. This is especially useful when the time interval be- tween adjacent observations is large. Let us modify the described model (3)–(7) to obtain the probability distributions of the hidden trajectories. Let us consider the case when the source vector for the encoder is a multidimensional Gaussian vector with a mathematical expectation equal to 0z and the variance equal to 0z during the last observation  . Therefore, the additional layer described by the next formula is added: ), (~),( 000 zzhgz    . (8) Thus, the LODE-GRU-VAE model is described by formulas (3)–(8) and represents a variation encoder. The coding of the input data x in this model is characterized by a conditional probability distribution )/( xzq , where z is a ran- dom vector in the latent space. The loss function in this case is the mathematical expectation of losses relative to z : ~ ( / ) ~ ( / )( , ) ln ( , ) ( / ) (( ))z q z x z q z x KLL z q p x z D q z x p z    , (9) where ),( zxp is the joint distribution of x and z ,  KLD A B is the Kullbak– Leibler divergence, which determines the degree of “dissimilarity” of distribu- tions A and B . The rationale for the variation encoder is based on the Expectation Maximi- zation (EM) method. For the autoencoder, the input and target are the same vec- tor x. Therefore, the decoder returns the conditional distribution )/( zxp when the code z is an input, and the error function determines the plausibility of the “binding” of the error function to the output of the last layer of the network. The EM algorithm assumes that we can calculate the distribution )/(),/( ),(  xpxzpzxp  . The problem is to maximize )/( xp with respect to the parameters  . Let us take the logarithm of both parts of the equality above and express )/(ln xp : ),/(ln)/,(ln)/(log  xzpzxpxp . Next let us take the mathematical expectation with respect to z :     dzxzpzqdzzxpzqdzxpzq ZZZ ),/(ln)/,(ln)/(ln)(   . After the transformations we get: dz zq xzp zqdz zq zpzxp zqxp zZ                 )( ),/( ln)( )( )/(), /( ln)()/(ln . N.I. Nedashkovskaya, D.V. Androsov ISSN 1681–6048 System Research & Information Technologies, 2022, № 1 104 The last term on the right is the Kullbak–Leibler divergence, which is always non-negative quantity. Therefore, the expression dz zq zpzxp zq Z         )( )/(), /( ln)( can be considered as the lower estimate of the value of )/(ln xp . Let us denote ),( )( )/(), /( ln)(         qELBOdz zq zpzxp zq Z . Then ( ,θ) ln ( /θ) ( )( ) ( / )KLELBO q p x D q z p z x   . (10) In general, maximizing (10) by the parameter  , the approximation ) (zq to )/( xzp is performed. Let us show that the loss function (9) is equal to ),( qELBO (10). Using the formulas of conditional probability, let us rewrite (9) in the form: ~ ( / )( ) ln ( / ) ( / ( ))( )z q z x KLq p x z D q z x p z   . (11) Since ( / ) ( ))(KLD q z x p z is close to zero for the same distributions )/( xzq and )(zp , and )(ln)/(ln xpzxp  , then the upper limit of the loss function will also be )(ln xp . That is, we came to the same result. The described LODE-GRU-VAE model can be trained by Adam, Rprop or similar methods with respect to the sample ));(;()/(~  xfzqxzqz in order to obtain a gradient with respect to  and subsequent generation of distribution pa- rameters. With this approximation, the Monte Carlo method can be used to calcu- late the mathematical expectation for )/(ln zxp and the Kullbak–Leibler diver- gence at a fixed  . Let us consider how the described LODE-GRU-VAE model can be used to generate a new sequence of series. For example, the initial sequence of n values is first fed to the input of the model, and the prediction of one next value is per- formed on the basis of the model. Next, the predicted value is joined to the sequence. The prediction for the next value is calculated on basis of the last n values, which are given to the model. This process can generate a new sequence that is similar to the original time series. STATEMENT AND RESULTS OF THE EXPERIMENT. SELECTION OF MODEL HYPERPARAMETERS For the time series prediction experiment, let us choose a synthetic nonstationary time series, which is a function of ], 0[, ),2(sin Tttx   . To bring the data closer to the real ones, the Gaussian noise is added to each value of the series:  ), , 0(~, )(sin tx . Not all historical data is always available in real forecasting problems. Therefore, additional values are introduced: d is the sampling step and i is the indicator set of points to be left in the sample. The simulated series can be for- mally written as follows: Generative time series model based on encoder-decoder architecture Системні дослідження та інформаційні технології, 2022, № 1 105 ittiXX dkkk T ,1 T ))(sin,(ˆ   . The following parameters were selected for the experiment:  25,0 .  1000d .  Number of points to stay is 150.  The number of layers in the network that simulates the behavior of ODE and their dimension are 1 and 6, respectively.  Learning rate is adaptive, initially equal to 0,01.  The method for solving the differential equation is Dormand–Prince.  The optimization method for learning the neural network is Adamax. Gradients within the ODE-layers are calculated by the method of conjugate equa- tions.  The dimension of the hidden space is 6.  The initial value of the variance is 0,1.  Number of epochs is 200. The next step is the analysis and processing of the experimental results. Met- rics for assessing the quality of the model can be components of the error func- tion. For example, the Kullbak–Leibler divergence can be chosen as a metric. But the limitation of the latter is that it is difficult to interpret. In this work, the mean square error MSE and the coefficient of determination 2R are used. Several experiments have been carried out to prove the efficiency of the model (3) – (8) (Fig. 4, 5). In the first experiment, acceptable values of model quality metrics were obtained, namely MSE 11,0 and 78,02 R . In the second experiment, the values of the model quality metrics were almost the same and Fig. 4. Real and predicted values of the target variable (experiment 1) 2,0 1,5 1,0 0,5 0,0 –0,5 time y N.I. Nedashkovskaya, D.V. Androsov ISSN 1681–6048 System Research & Information Technologies, 2022, № 1 106 also acceptable: MSE 10,0 , 80,02 R . The observed points on which the model was built are marked in Fig. 4, 5 in blue color. The unobserved values of the test set, which were used to assess the quality of the obtained model are marked in orange color. The graph of the predicted values is marked in yellow. Model’s predicted values are of admissible MAPE and RMSE rates. The software in the Python3 environment using the TensorFlow2 framework and the Keras library was developed, which implements the proposed encoder- decoder system. The main arguments in favor of choosing the Python 3 program- ming language were the speed of writing code and the popularity of this language. Tensorflow2 uses the capabilities of the Nvidia CUDA and has an integrated Keras library, and thus was chosen as the best framework in terms of hardware and performance resources. It is a high-level modern API for fast and easy design and learning of deep models. CONCLUSIONS The work is devoted to the research and development of a neural network model based on the encoder-decoder architecture and recurrent blocks for predicting the values of nonstationary time series. Models and methods of machine and deep learning used for sequence processing are studied: LSTM and GRU models of recurrent neural networks, generative models, such as VAE, encoder-decoder models for detecting hidden patterns in data, Adam and Adamax stochastic opti- mization learning methods. The LODE-GRU-VAE model was built and tested to reconstruct the dynam- ics of nonstationary time series. The model allows to generate values at interme- diate points between observations, and therefore it is possible to generate new 2,5 2,0 1,5 1,0 0,5 0,0 time y Fig. 5. Real and predicted values of the target variable (experiment 2) Generative time series model based on encoder-decoder architecture Системні дослідження та інформаційні технології, 2022, № 1 107 values of a series where the time interval between two adjacent observations is large. In standard GRU-type RNN networks, the latent state is updated with each observation and remains constant between them. Conversely, within the frame- work of the LODE-GRU structure, a neural ordinary differential equation learns to model a continuous change in the latent state of a network between two obser- vations. The encoder-decoder system is implemented based on the proposed model in the Python3 environment using the TensorFlow2 framework and the Keras li- brary. Experiments have been carried out to prove the efficiency of this system in the problems of modeling processes that depend on continuous time. REFERENCES 1. P.I. Bidyuk, V.D. Romanenko, and O.L. Timoshchuk, Time series analysis. Kyiv: Polytechnika, NTUU “KPI”, 2013. 2. Terence C. Mills, “Chapter 3 - ARMA Models for Stationary Time Series”, Applied Time Series Analysis. A Practical Guide to Modeling and Forecasting, pp. 31–56, 2019. Available: https://doi.org/10.1016/B978-0-12-813117-6.00003-X. 3. Terence C. Mills, “Chapter 4 - ARIMA Models for Nonstationary Time Series”, Ap- plied Time Series Analysis. A Practical Guide to Modeling and Forecasting, pp. 57–69, 2019. Available: https://doi.org/10.1016/B978-0-12-813117-6.00004-1. 4. Amélie Charles and Olivier Darné, “The accuracy of asymmetric GARCH model es- timation”, International Economics, vol. 157, pp. 179–202, May 2019. Available: https://doi.org/10.1016/j.inteco.2018.11.001 5. O.L. Tymoshchuk, V.H. Huskova, and P.I. Bidyuk, “A combined approach to mod- eling nonstationary heteroscedastic processes”, Radio Electronics, Computer Sci- ence, Control, (2), pp. 80–89, 2019. Available: https://doi.org/10.15588/10.15588/ 1607-3274-2019-2-9. 6. P. Bidyuk, T. Prosyankina-Zharova, and O. Terentiev, “Modelling nonlinear nonsta- tionary processes in macroeconomy and finances”, Advances in Computer Science for Engineering and Education, vol. 754, pp. 735–745, 2019. Available: https://doi.org/10.1007/978-3-319-91008-6_72. 7. Anoop S. Kumar and S. Anandarao, “Volatility spillover in crypto-currency markets: Some evidences from GARCH and wavelet analysis”, Physica A: Statistical Me- chanics and its Applications, vol. 524, pp. 448–458,15 June 2019. Available: https://doi.org/10.1016/j.physa.2019.04.154. 8. Aurelien Geron, Hands-On Machine Learning with Scikit-Learn and TensorFlow. Sebastopol, CA: O’Reilly Media Inc., 2017, 760 p. 9. Alex Sherstinsky, “Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network”, Physica D: Nonlinear Phenomena, vol. 404, 132306, March 2020. Available: https://doi.org/10.1016/j.physd.2019.132306. 10. Mikel Canizo, Isaac Triguero, Angel Conde, and Enrique Onieva, “Multi-head CNN–RNN for multi-time series anomaly detection: an industrial case study”, Neu- rocomputing, vol. 363, pp. 246–260, 21 October 2019. Available: https://doi.org/ 10.1016/j.neucom.2019.07.034. 11. Kyunghyun Cho et al., “Learning Phrase Representations using RNN Encoder– Decoder for Statistical Machine Translation”, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), October 25–29, 2014, pp. 1724–1734, Available: https://aclanthology.org/D14-1179.pdf. 12. Klaus Greff et al., LSTM: A Search Space Odyssey. 2015. Available: arXiv:1503.04069. N.I. Nedashkovskaya, D.V. Androsov ISSN 1681–6048 System Research & Information Technologies, 2022, № 1 108 13. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning. Massachu- setts London, England: The MIT Press Cambridge, 2016, 802 p. 14. Henrik Brink, Joseph Richards, and Mark Fetherolf, Machine Learning. StPb.: Piter, 2017, 336 p. 15. Jake VanderPlas, Python Data Science Handbook. O’Reilly Media, Inc, 2016, 576 p. 16. Z. Deng et al., “Sparse stacked autoencoder network for complex system monitoring with industrial applications”, Chaos, Solitons & Fractals, vol. 137, August 2020. Available: https://doi.org/10.1016/j.chaos.2020.109838. 17. H. Zhu et al., “Stacked pruning sparse denoising autoencoder based intelligent fault diagnosis of rolling bearings”, Applied Soft Computing, vol. 88, March 2020. Avail- able: https://doi.org/10.1016/j.asoc.2019.106060. 18. N.I. Nedashkovskaya, “Method for Evaluation of the Uncertainty of the Paired Comparisons Expert Judgements when Calculating the Decision Alternatives Weights”, Journal of Automation and Information Sciences, vol. 47, no. 10, pp. 69–82, 2015. Available: https://doi.org/10.1615/JAutomatInfScien.v47.i10.70. 19. N.I. Nedashkovskaya, “A system approach to decision support on basis of hierarchi- cal and network models”, System research and information technologies, no. 1, pp. 7–18, 2018. Available: https://doi.org/10.20535/SRIT.2308-8893.2018.1.01 . 20. J.Yu, “Manifold regularized stacked denoising autoencoders with feature selection”, Neurocomputing, vol. 358, pp. 235–245, 17 September 2019. Available: https://doi.org/10.1016/j.neucom.2019.05.050. 21. N. Abiri et al., “Establishing strong imputation performance of a denoising autoen- coder in a wide range of missing data problems”, Neurocomputing, vol. 365, pp. 137–146, 6 November 2019. Available: https://doi.org/10.1016/j.neucom. 2019.07.065. 22. Diederik P. Kingma and Max Welling, Auto-Encoding Variational Bayes, 2014. Available: arXiv:1312.6114v10. 23. Ricky T.Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud, Neural ordinary differential equations. NeurIPS, 2018. 24. Yulia Rubanova, Ricky T.Q. Chen, and David K. Duvenaud, Latent ordinary differen- tial equations for irregularly-sampled time series. NeurIPS, 2019. 25. Calypso Herrera, Florian Krach, and Josef Teichmann, Neural Jump Ordinary Dif- ferential Equations: Consistent Continuous-Time Prediction and Filtering, 2020. Available: arXiv:2006.04727. 26. J. Lu et al., “Neural-ODE for pharmacokinetics modeling and its advantage to alter- native machine learning models in predicting new dosing regimens”, iScience, vol. 24, issue 7, 23 July 2021. Available: https://doi.org/10.1016/j.isci.2021.102804. Received 09.08.2021 INFORMATION ON THE ARTICLE Nadezhda I. Nedashkovskaya, ORCID: 0000-0002-8277-3095, Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: n.nedashkivska@gmail.com Dmytro V. Androsov, Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: an- drosovdmitry80@gmail.com ГЕНЕРАТИВНА МОДЕЛЬ ДЛЯ ПРОГНОЗУВАННЯ ЧАСОВИХ РЯДІВ НА ОСНОВІ АРХІТЕКТУРИ КОДУВАЛЬНИК-ДЕКОДУВАЛЬНИК / Н.І. Недашківська, Д.В. Андросов Анотація. Моделі нейронних мереж на основі архітектури кодувальник- деко- дувальник знайшли широке застосування в останні роки для розв’язання різ- номанітних задач машинного навчання. Досліджено різновиди таких моделей, Generative time series model based on encoder-decoder architecture Системні дослідження та інформаційні технології, 2022, № 1 109 серед яких розріджений, шумопригнічувальний та варіаційний автокодуваль- ники. Для прогнозування нестаціонарного часового ряду подано і протестова- но модель, що базується на варіаційному автокодувальнику, блоках рекурент- них мереж типу GRU і використовує елементи нейронних звичайних диференціальних рівнянь. На основі побудованої моделі реалізовано систему у середовищі Рython3 з використанням фреймворку TensorFlow2 та бібліотеки Keras. Розроблена система може використовуватися для моделювання проце- сів, що залежать від неперервного часу. Система мінімізує втручання людини у процес аналізу часових рядів, представляє високорівневий сучасний інтер- фейс для швидкого і зручного конструювання та навчання глибоких моделей. Ключові слова: прогнозування, варіаційний автокодувальник, рекурентна нейронна мережа типу GRU, нейронне звичайне диференціальне рівняння, ла- тентний простір, нестаціонарний часовий ряд. ГЕНЕРАТИВНАЯ МОДЕЛЬ ДЛЯ ПРОГНОЗИРОВАНИЯ ВРЕМЕННЫХ РЯДОВ НА ОСНОВЕ АРХИТЕКТУРЫ КОДИРОВЩИК-ДЕКОДИРОВЩИК / Н.И. Недашковская, Д.В. Андросов Аннотация. Модели нейронных сетей на основе архитектуры кодировщик- декодировщик нашли широкое распространение в последние годы при реше- нии различных задач машинного обучения. Исследованы разновидности таких моделей, среди которых разреженный, шумоподавляющий и вариационный автокодировщики. Для прогнозирования нестационарного временного ряда представлена и протестирована порождающая модель, которая основана на ва- риационном автокодировщике, блоках рекуррентных сетей типа GRU и испо- льзует элементы нейронных обыкновенных дифференциальных уравнений. На основе построенной модели реализована система в среде Рython3 с использо- ванием фреймворка TensorFlow2 и библиотеки Keras. Разработанная система может использоваться для моделирования процессов, зависящих от непрерыв- ного времени. Система минимизирует вмешательство человека в процесс ана- лиза временных рядов, представляет высокоуровневый современный интер- фейс для быстрого и удобного конструирования и обучения глубоких моделей. Ключевые слова: прогнозирование, вариационный автокодировщик, рекур- рентная нейронная сеть типа GRU, нейронное обыкновенное дифференциаль- ное уравнение, латентное пространство, нестационарный временной ряд.
id journaliasakpiua-article-259236
institution System research and information technologies
keywords_txt_mv keywords
language English
last_indexed 2025-07-17T10:27:53Z
publishDate 2022
publisher The National Technical University of Ukraine &quot;Igor Sikorsky Kyiv Polytechnic Institute&quot;
record_format ojs
resource_txt_mv journaliasakpiua/62/b998461a512abab8a7a4fa93b5ba8162.pdf
spelling journaliasakpiua-article-2592362022-06-21T10:27:50Z Generative time series model based on encoder-decoder architecture Генеративная модель для прогнозирования временных рядов на основе архитектуры кодировщик-декодировщик Генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник Nedashkovskaya, Nadezhda Androsov, Dmytro прогнозування варіаційний автокодувальник рекурентна нейронна мережа типу GRU нейронне звичайне диференціальне рівняння латентний простір нестаціонарний часовий ряд прогнозирование вариационный автокодировщик рекуррентная нейронная сеть типа GRU нейронное обыкновенное дифференциальное уравнение латентное пространство нестационарный временной ряд prediction variational autoencoder GRU recurrent neural network neural ordinary differential equation latent space nonstationary time series Encoder-decoder neural network models have found widespread use in recent years for solving various machine learning problems. In this paper, we investigate the variety of such models, including the sparse, denoising and variational autoencoders. To predict non-stationary time series, a generative model is presented and tested, which is based on a variational autoencoder, GRU recurrent networks, and uses elements of neural ordinary differential equations. Based on the constructed model, the system is implemented in the Python3 environment, the TensorFlow2 framework and the Keras library. The developed system can be used for modeling continuous time-dependent processes. The system minimizes a human factor in the process of time series analysis, and presents a high-level modern interface for fast and convenient construction and training of deep models. Модели нейронных сетей на основе архитектуры кодировщик- декодировщик нашли широкое распространение в последние годы при решении различных задач машинного обучения. Исследованы разновидности таких моделей, среди которых разреженный, шумоподавляющий и вариационный автокодировщики. Для прогнозирования нестационарного временного ряда представлена и протестирована порождающая модель, которая основана на вариационном автокодировщике, блоках рекуррентных сетей типа GRU и использует элементы нейронных обыкновенных дифференциальных уравнений. На основе построенной модели реализована система в среде Рython3 с использованием фреймворка TensorFlow2 и библиотеки Keras. Разработанная система может использоваться для моделирования процессов, зависящих от непрерывного времени. Система минимизирует вмешательство человека в процесс анализа временных рядов, представляет высокоуровневый современный интерфейс для быстрого и удобного конструирования и обучения глубоких моделей. Моделі нейронних мереж на основі архітектури кодувальник- декодувальник знайшли широке застосування в останні роки для розв’язання різноманітних задач машинного навчання. Досліджено різновиди таких моделей, серед яких розріджений, шумопригнічувальний та варіаційний автокодувальники. Для прогнозування нестаціонарного часового ряду подано і протестовано модель, що базується на варіаційному автокодувальнику, блоках рекурентних мереж типу GRU і використовує елементи нейронних звичайних диференціальних рівнянь. На основі побудованої моделі реалізовано систему у середовищі Рython3 з використанням фреймворку TensorFlow2 та бібліотеки Keras. Розроблена система може використовуватися для моделювання процесів, що залежать від неперервного часу. Система мінімізує втручання людини у процес аналізу часових рядів, представляє високорівневий сучасний інтерфейс для швидкого і зручного конструювання та навчання глибоких моделей. The National Technical University of Ukraine &quot;Igor Sikorsky Kyiv Polytechnic Institute&quot; 2022-04-25 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/259236 10.20535/SRIT.2308-8893.2022.1.08 System research and information technologies; No. 1 (2022); 97-109 Системные исследования и информационные технологии; № 1 (2022); 97-109 Системні дослідження та інформаційні технології; № 1 (2022); 97-109 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/259236/255872
spellingShingle прогнозування
варіаційний автокодувальник
рекурентна нейронна мережа типу GRU
нейронне звичайне диференціальне рівняння
латентний простір
нестаціонарний часовий ряд
Nedashkovskaya, Nadezhda
Androsov, Dmytro
Генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник
title Генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник
title_alt Generative time series model based on encoder-decoder architecture
Генеративная модель для прогнозирования временных рядов на основе архитектуры кодировщик-декодировщик
title_full Генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник
title_fullStr Генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник
title_full_unstemmed Генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник
title_short Генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник
title_sort генеративна модель для прогнозування часових рядів на основі архітектури кодувальник-декодувальник
topic прогнозування
варіаційний автокодувальник
рекурентна нейронна мережа типу GRU
нейронне звичайне диференціальне рівняння
латентний простір
нестаціонарний часовий ряд
topic_facet прогнозування
варіаційний автокодувальник
рекурентна нейронна мережа типу GRU
нейронне звичайне диференціальне рівняння
латентний простір
нестаціонарний часовий ряд
прогнозирование
вариационный автокодировщик
рекуррентная нейронная сеть типа GRU
нейронное обыкновенное дифференциальное уравнение
латентное пространство
нестационарный временной ряд
prediction
variational autoencoder
GRU recurrent neural network
neural ordinary differential equation
latent space
nonstationary time series
url https://journal.iasa.kpi.ua/article/view/259236
work_keys_str_mv AT nedashkovskayanadezhda generativetimeseriesmodelbasedonencoderdecoderarchitecture
AT androsovdmytro generativetimeseriesmodelbasedonencoderdecoderarchitecture
AT nedashkovskayanadezhda generativnaâmodelʹdlâprognozirovaniâvremennyhrâdovnaosnovearhitekturykodirovŝikdekodirovŝik
AT androsovdmytro generativnaâmodelʹdlâprognozirovaniâvremennyhrâdovnaosnovearhitekturykodirovŝikdekodirovŝik
AT nedashkovskayanadezhda generativnamodelʹdlâprognozuvannâčasovihrâdívnaosnovíarhítekturikoduvalʹnikdekoduvalʹnik
AT androsovdmytro generativnamodelʹdlâprognozuvannâčasovihrâdívnaosnovíarhítekturikoduvalʹnikdekoduvalʹnik