Multi-step prediction in linearized latent state spaces for representation learning

In this paper, we derive a novel method as a generalization over LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without dras...

Bibliographic Details
Date: 2022
Main author: Tytarenko, Andrii
Format: Article
Language: English
Published: The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2022
Online access: https://journal.iasa.kpi.ua/article/view/269583
Journal title: System research and information technologies

Institution

System research and information technologies
_version_ 1867334430586044416
author Tytarenko, Andrii
author_facet Tytarenko, Andrii
author_institution_txt_mv [ { "author": "Andrii Tytarenko", "institution": "Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv" } ]
author_sort Tytarenko, Andrii
baseUrl_str http://journal.iasa.kpi.ua/oai
collection OJS
datestamp_date 2022-12-21T22:15:21Z
description In this paper, we derive a novel method as a generalization over LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without drastic model changes which come with other works, such as PCC and P3C. We discuss the relation between E2C and the presented method and derive update equations. We provide empirical evidence, which suggests that by considering the multi-step prediction, our method – ms-E2C – allows learning much better latent state spaces in terms of curvature and next state predictability. Finally, we also discuss certain stability challenges we encounter with multi-step predictions and how to mitigate them.
doi_str_mv 10.20535/SRIT.2308-8893.2022.3.09
first_indexed 2025-07-17T10:28:03Z
format Article
fulltext © A. Tytarenko, 2022 Системні дослідження та інформаційні технології, 2022, № 3 139 UDC 004.852 DOI: 10.20535/SRIT.2308-8893.2022.3.09

MULTI-STEP PREDICTION IN LINEARIZED LATENT STATE SPACES FOR REPRESENTATION LEARNING

A. TYTARENKO

Abstract. In this paper, we derive a novel method as a generalization over LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without the drastic model changes that come with other works, such as PCC and P3C. We discuss the relation between E2C and the presented method and derive update equations. We provide empirical evidence which suggests that, by considering the multi-step prediction, our method, ms-E2C, learns much better latent state spaces in terms of curvature and next-state predictability. Finally, we also discuss certain stability challenges we encounter with multi-step predictions and how to mitigate them.

Keywords: representation learning, learning controllable embedding, reinforcement learning, latent state space.

INTRODUCTION

One of the most challenging problems in reinforcement learning is training autonomous agents capable of control in Markov Decision Processes (MDPs) with complex state and action spaces. For instance, complications may arise from large action spaces [1], limited ability to interact with an environment [2], partial observability (POMDP) [3, 4], etc. Optimizing a decent policy takes a lot of samples and usually requires online interactive learning and neural networks with a large number of trainable parameters that can process high-dimensional observations [5, 6]. Various algorithms try to deal with the problem of sample inefficiency, or a limited amount of data.
Model-based reinforcement learning algorithms [7–9] try to achieve sample efficiency by approximating the transition dynamics of an MDP in online or offline mode. Offline reinforcement learning methods [2, 10] strive to extract as much useful information as possible from limited offline data in order to learn a policy that is also applicable in online regimes.

Another algorithmic framework, Learning Controllable Embedding (LCE), approaches this problem by learning a lower-dimensional latent state space and using simpler control algorithms, like iLQR [11], to perform control in this latent space. The challenge here is to make sure that the learned latent space has a simpler structure (i.e., next states are easier to predict). Some particular instances of this framework are described in [9, 12–14]. The idea of E2C [12] is to learn a locally linear latent space, so that algorithms like LQG can be used for goal-reaching tasks. PCC [13] tries to fix some of the issues encountered in E2C by deriving losses that allow explicit minimization of the latent space's curvature. P3C [14] improves upon PCC mainly by replacing the reconstruction loss, which is needed to make sure the learned state space carries enough information to generate (i.e., decode) observations from latent states. P3C uses predictive coding instead.

A. Tytarenko ISSN 1681–6048 System Research & Information Technologies, 2022, № 3

In this paper, we seek an alternative approach to enforce lower curvature and better predictability of the latent space. We generalize E2C by considering multiple transitions at a time, making sure that local linearity is preserved not only between neighbouring states. We inherit the idea of optimizing the joint log-likelihood of a transition, generalize it to multiple transitions, and derive a variational bound to optimize instead. We then compare the results with other LCE approaches and visualize the learned latent state spaces for a benchmark common among LCE papers.
PRELIMINARIES

We denote a Markov Decision Process (MDP) $M$ as a tuple $(S, A, r, T)$, where $S$ is the state space; $A$ is the action space; $r: S \times A \to \mathbb{R}$ is the reward function; and $T = P(s_{t+1} \mid s_t, a_t)$ is the probability of state $s_{t+1}$ given the current state $s_t$ and the action taken $a_t$. A state of an MDP is a sufficient statistic for the transition kernel, i.e., it possesses the Markov property.

The task of a reinforcement learning algorithm is, for a given MDP $M$, to find a policy $\pi$ that maximizes the expected return. We are interested in the discounted return objective:

$$\pi^* = \arg\max_{\pi} E_{\tau \sim p_\pi(\tau)} \sum_{t=0}^{\infty} \gamma^t r(s_t, a_t),$$

where $\gamma$ is the discount factor and $\tau$ denotes a trajectory $(s_0, a_0, s_1, a_1, \ldots)$ obtained by sampling actions using a stochastic policy $\pi$.

In particular, we consider a specific class of reinforcement learning algorithms: model-based reinforcement learning [7, 8]. Algorithms of this kind usually possess higher sample efficiency, but they involve some sort of approximation of the transition kernel. LCE algorithms use parametric models (i.e., neural networks) to learn a good model with desirable properties, such as linearity and predictability. This makes it possible to use even the simplest model-based control (and RL) algorithms, like iLQR [11].

THE MULTI-STEP EMBED TO CONTROL MODEL

Consider the internal transition dynamics of an MDP $M = (S, A, r, T)$:

$$s_{t+1} = f(s_t, a_t) + \omega, \quad \omega \sim P_\omega.$$

As discussed previously, the function $f$ may be highly nonlinear and is therefore tricky to optimize with model-based RL or control algorithms. LCE approaches therefore try to learn a mapping from the state space $S$ to some latent space $Z$ such that the latent dynamics

$$\hat{z}_{t+1} = \hat{f}(z_t, a_t) + \hat{\omega}$$

have desired properties such as local linearity, low curvature, and predictability.
In order to learn the mapping $Q_\phi: S \to Z$, the Variational Inference framework is employed to derive a tractable algorithm for maximizing the likelihood of known data points under the mapping we want to learn.

Optimization problem

As follows from Fig. 1, we consider a dataset

$$D = \{(s_t, a_t, s_{t+1}, a_{t+1}, s_{t+2}, \ldots, a_{t+K-1}, s_{t+K})_i \mid i = 1, \ldots, N\}$$

containing samples from real trajectories gathered before training.

To follow and generalize [12], we define $Q_\phi = P(z_t \mid s_t)$ as a generative model which, for a given state $s_t$, specifies a distribution over the latent space $Z$. Basically, it plays the role of the mapping from $S$ to $Z$ and is parametrized by a parameter vector $\phi$. We define $Q_\psi = P(\hat{z}_{t+1} \mid \hat{z}_t, a_t)$ as a generative model which, for a given latent state $z_t$ and an action $a_t$, predicts the distribution of the next latent state $\hat{z}_{t+1}$. This model is parametrized by a parameter vector $\psi$. We also denote $Q_\psi^j = P(\hat{z}_{t+j} \mid \hat{z}_{t+j-1}, a_{t+j-1})$.

In order to find $\phi$ and $\psi$, we maximize the likelihood of a dataset of trajectory samples of length $K$ with respect to these parameter vectors:

$$\phi^*, \psi^* = \arg\max_{\phi, \psi} \prod_{i=1}^{N} P(s_t^i, a_t^i, s_{t+1}^i, a_{t+1}^i, s_{t+2}^i, \ldots, a_{t+K-1}^i, s_{t+K}^i).$$

For the sake of readability, we denote $s_t, \ldots, s_{t+K}$ as $s_{t:t+K}$ and $a_t, \ldots, a_{t+K}$ as $a_{t:t+K}$. Thus our objective is

$$\phi^*, \psi^* = \arg\max_{\phi, \psi} \prod_{i=1}^{N} P(s_{t:t+K}^i, a_{t:t+K-1}^i).$$

Fig. 1. Graphical models for E2C and ms-E2C(K); dashed lines denote the state reconstruction process.

A corresponding graphical model is depicted in Fig. 1.

Optimization objective

The objective defined in the previous subsection is known to be intractable and difficult to optimize. Therefore, LCE approaches employ Variational Inference to find a lower bound on the log-likelihood objective.
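As an illustration of how a dataset $D$ of $K$-step sub-trajectories of this form could be assembled, the sketch below slices one recorded trajectory into overlapping windows. It is a minimal sketch under stated assumptions: the function name `k_step_windows` and the list-based trajectory layout are hypothetical and do not come from the paper, which only states that samples are gathered before training.

```python
def k_step_windows(states, actions, K):
    """Slice one trajectory into K-step samples (s_t, a_t, ..., a_{t+K-1}, s_{t+K}).

    Assumed layout (hypothetical): `states` holds s_0..s_T and `actions`
    holds a_0..a_{T-1}, so there is exactly one more state than actions.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    if K < 1:
        raise ValueError("K must be at least 1")
    windows = []
    # A window starting at t needs the actions a_t..a_{t+K-1}.
    for t in range(len(actions) - K + 1):
        windows.append((states[t:t + K + 1], actions[t:t + K]))
    return windows


# Toy trajectory with 5 states and 4 actions.
toy_states = ["s0", "s1", "s2", "s3", "s4"]
toy_actions = ["a0", "a1", "a2", "a3"]
two_step = k_step_windows(toy_states, toy_actions, K=2)
one_step = k_step_windows(toy_states, toy_actions, K=1)
```

With `K=1` every window is a single transition `(s_t, a_t, s_{t+1})`, which is the sample format used by E2C.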
In this section, we derive this bound for the proposed probabilistic model.

Variational lower bound:

$$\log P(s_{t:t+K}, a_{t:t+K-1}) \ge E_{\substack{z_t \sim Q_\phi \\ \hat{z}_{t+j} \sim Q_\psi^j,\ j = 1, \ldots, K}} \left[ \sum_{j=1,\ldots,K} \log P(s_{t+j} \mid \hat{z}_{t+j}) + \log P(s_t \mid z_t) \right] - D_{KL}[Q_\phi \,\|\, P(Z)].$$

Here $D_{KL}[P \,\|\, Q]$ denotes the Kullback–Leibler divergence functional:

$$D_{KL}[P \,\|\, Q] = E_{x \sim P} \log \frac{P(x)}{Q(x)}.$$

Proof. Below, $E$ denotes the expectation over $z_t \sim Q_\phi$ and $\hat{z}_{t+j} \sim Q_\psi^j$, $j = 1, \ldots, K$, and all integrals are taken over $z_t$ and $\hat{z}_{t+1:t+K}$.

$$\begin{aligned}
\log P(s_{t:t+K}, a_{t:t+K-1})
&= \log \int P(s_{t:t+K}, a_{t:t+K-1}, z_t, \hat{z}_{t+1:t+K}) \, dz_t \, d\hat{z}_{t+1:t+K} \\
&= \log \int P(s_{t:t+K} \mid a_{t:t+K-1}, z_t, \hat{z}_{t+1:t+K}) \, P(z_t, \hat{z}_{t+1:t+K} \mid a_{t:t+K-1}) \, P(a_{t:t+K-1}) \, dz_t \, d\hat{z}_{t+1:t+K} \\
&= \log \int P(s_{t:t+K} \mid a_{t:t+K-1}, z_t, \hat{z}_{t+1:t+K}) \, P(z_t, \hat{z}_{t+1:t+K} \mid a_{t:t+K-1}) \, P(a_{t:t+K-1}) \, \frac{Q_\phi}{Q_\phi} \, dz_t \, d\hat{z}_{t+1:t+K} \\
&= \log \int P(s_{t:t+K} \mid a_{t:t+K-1}, z_t, \hat{z}_{t+1:t+K}) \Big( \prod_{j=1}^{K} Q_\psi^j \Big) P(z_t) \, P(a_{t:t+K-1}) \, \frac{Q_\phi}{Q_\phi} \, dz_t \, d\hat{z}_{t+1:t+K} \\
&= \log \int P(s_t \mid z_t) \Big( \prod_{j=1}^{K} Q_\psi^j \, P(s_{t+j} \mid \hat{z}_{t+j}) \Big) P(z_t) \, P(a_{t:t+K-1}) \, \frac{Q_\phi}{Q_\phi} \, dz_t \, d\hat{z}_{t+1:t+K} \\
&= \log E \left[ P(s_t \mid z_t) \Big( \prod_{j=1}^{K} P(s_{t+j} \mid \hat{z}_{t+j}) \Big) P(a_{t:t+K-1}) \, \frac{P(z_t)}{Q_\phi} \right] \\
&\ge E \left[ \sum_{j=1,\ldots,K} \log P(s_{t+j} \mid \hat{z}_{t+j}) + \log P(s_t \mid z_t) + \log \frac{P(z_t)}{Q_\phi} \right] \\
&= E \left[ \sum_{j=1,\ldots,K} \log P(s_{t+j} \mid \hat{z}_{t+j}) + \log P(s_t \mid z_t) \right] - D_{KL}[Q_\phi \,\|\, P(Z)].
\end{aligned}$$

The inequality follows from Jensen's inequality. In the last two lines the term $\log P(a_{t:t+K-1})$ is omitted because it does not depend on $\phi$ and $\psi$; strictly speaking, the bound holds up to this additive constant.

Multi-step embed-to-control model (ms-E2C)

In this section, we instantiate a model for learning a locally linear latent state space. We use the previously derived bound, taken with the opposite sign, as an upper bound on the negative log-likelihood of the multi-step trajectory samples. Graphical models for both E2C and ms-E2C(K) are shown in Fig. 1.
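The KL term of the bound has a closed form when the encoder distribution is Gaussian. The sketch below evaluates it under two assumptions that are not stated in this section: the prior $P(Z)$ is taken to be a standard normal $N(0, I)$, as is common in variational auto-encoders, and the encoder covariance is diagonal. The function name is hypothetical.

```python
import math


def kl_diag_gaussian_to_standard_normal(mu, var):
    """KL[N(mu, diag(var)) || N(0, I)] in nats.

    Assumptions (not from the paper): standard normal prior and a
    diagonal covariance given by its per-dimension variances `var`.
    """
    if len(mu) != len(var):
        raise ValueError("mu and var must have the same length")
    if any(v <= 0.0 for v in var):
        raise ValueError("variances must be positive")
    # Per dimension: 0.5 * (var + mu^2 - 1 - log var).
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))


# The divergence vanishes when the encoder output equals the prior.
kl_same = kl_diag_gaussian_to_standard_normal([0.0, 0.0], [1.0, 1.0])
# Shifting one mean component by 1 adds 0.5 nats.
kl_shifted = kl_diag_gaussian_to_standard_normal([1.0, 0.0], [1.0, 1.0])
```

In a training loop this quantity would be computed from the encoder outputs for $s_t$ and added to the one-sample estimate of the reconstruction terms.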
First, we instantiate parametric models for the encoding and dynamics functions as follows.

Encoder:

$$Q_\phi = P(z_t \mid s_t) = N(\mu_\phi(s_t), \Sigma_\phi(s_t)); \qquad \mu_\phi(s_t), \Sigma_\phi(s_t) = NeuralNet(s_t; \phi).$$

Latent dynamics:

$$A_t, B_t, o_t = NeuralNet(z_t; \psi).$$

Dynamics:

$$Q_\psi^j = P(\hat{z}_{t+j} \mid \hat{z}_{t+j-1}, a_{t+j-1}) = N(\mu^j, \Sigma^j);$$

$$\mu^j = A_t \mu^{j-1} + B_t a_{t+j-1} + o_t \quad \text{for } j = 2, \ldots, K;$$

$$\mu^j = A_t \mu_t + B_t a_t + o_t \quad \text{for } j = 1;$$

$$\Sigma^j = A_t \Sigma^{j-1} A_t^T + \Sigma_W \quad \text{for } j = 2, \ldots, K;$$

$$\Sigma^j = A_t \Sigma_t A_t^T + \Sigma_W \quad \text{for } j = 1,$$

where $\mu_t$ and $\Sigma_t$ denote the encoder outputs $\mu_\phi(s_t)$ and $\Sigma_\phi(s_t)$.

Thus, an optimal model implies a locally linear latent space whose curvature (i.e., degree of linearity) is explicitly controlled by the number of steps per sample. Choosing a large $K$ would recover a globally linear model, and setting $K = 1$ recovers an E2C model. As follows from the figure, ms-E2C is a generalization of E2C.

We also have to specify a parametrized decoding model, which is needed to compute the upper bound and to enforce the "reconstruction" constraint introduced in [12] and generalized for our multi-step model:

$$P_\theta = P(s_t \mid z_t) = Bernoulli(p_\theta(z_t)); \qquad p_\theta(z_t) = NeuralNet(z_t; \theta);$$

$$P_\theta^j = P(s_{t+j} \mid \hat{z}_{t+j}) = Bernoulli(p_\theta(\hat{z}_{t+j})); \qquad p_\theta(\hat{z}_{t+j}) = NeuralNet(\hat{z}_{t+j}; \theta).$$

The Bernoulli distribution is chosen for comparability with the E2C model on shared benchmark MDPs, i.e., Planar, where the original state space consists of black-and-white images of a grid world with white obstacles and a white circle denoting the position of the agent.

Loss function. In order to complete the model's specification, we have to provide a loss function optimizable via stochastic gradient descent. In ms-E2C it consists of three terms: an estimate of the derived upper bound, a consistency term, and a stability term.
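The recursion for the predicted means and covariances can be sketched in plain Python as follows. This is a minimal illustration, not the author's implementation: the helper names are hypothetical, matrices are nested lists to keep the example dependency-free, and $A_t$, $B_t$, $o_t$ are passed in as fixed arrays, whereas in the model they are produced by a neural network from $z_t$.

```python
def matvec(M, v):
    """Matrix-vector product for a nested-list matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(X, Y):
    """Matrix-matrix product for nested-list matrices."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(M):
    return [list(col) for col in zip(*M)]


def rollout_moments(A, B, o, mu_t, sigma_t, actions, sigma_w):
    """Return [mu^1..mu^K] and [Sigma^1..Sigma^K] for K = len(actions).

    The same A, B, o are reused at every step, so the prediction stays
    linear around z_t over the whole K-step horizon.
    """
    mus, sigmas = [], []
    mu, sigma = mu_t, sigma_t
    for a in actions:
        # mu^j = A mu^{j-1} + B a_{t+j-1} + o
        mu = [x + y + z for x, y, z in zip(matvec(A, mu), matvec(B, a), o)]
        # Sigma^j = A Sigma^{j-1} A^T + Sigma_W
        propagated = matmul(matmul(A, sigma), transpose(A))
        sigma = [[p + w for p, w in zip(rp, rw)]
                 for rp, rw in zip(propagated, sigma_w)]
        mus.append(mu)
        sigmas.append(sigma)
    return mus, sigmas


# Toy 2-D example with K = 3 and hand-picked (hypothetical) matrices.
A = [[0.5, 0.0], [0.0, 0.5]]
B = [[1.0, 0.0], [0.0, 1.0]]
o = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
noise = [[0.1, 0.0], [0.0, 0.1]]
mus, sigmas = rollout_moments(A, B, o, [0.0, 0.0], identity,
                              [[1.0, 0.0]] * 3, noise)
```

Passing a single action gives the one-step prediction, which corresponds to the $K = 1$ (E2C) case.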
The upper-bound term is

$$L_{upper}(D_i; \phi, \psi, \theta) = -E_{\substack{z_t \sim Q_\phi \\ \hat{z}_{t+j} \sim Q_\psi^j,\ j = 1, \ldots, K}} \left[ \sum_{j=1,\ldots,K} \log P(s_{t+j} \mid \hat{z}_{t+j}) + \log P(s_t \mid z_t) \right] + D_{KL}[Q_\phi \,\|\, P(Z)].$$

The expectation is estimated using a one-sample estimate and the reparametrization trick widely used in variational auto-encoders. The consistency and stability terms are

$$L_{consistency}(D_i; \phi, \psi) = \sum_{j=1}^{K} D_{KL}[Q_\psi^j \,\|\, Q_\phi];$$

$$L_{stability}(D_i; \phi, \psi) = Gersh(A_t(z_t)) + Gersh(B_t(z_t)),$$

where, following [12], $Q_\phi$ in the consistency term is understood as the encoder distribution for the observed state $s_{t+j}$. Here $Gersh(X)$ denotes the Gershgorin loss [15, 16]:

$$Gersh(X) = \sum_{i=1}^{n} \max\Big(0,\ X_{i,i} + \sum_{j \ne i} |X_{i,j}| + \varepsilon\Big),$$

where $|X_{i,j}|$ denotes the absolute value of the entry $X_{i,j}$ of the matrix $X$, and $\varepsilon$ is a small positive constant. According to Theorem 1 from [15], if the loss value is non-positive (i.e., zero), all eigenvalues of the matrix $X$ are guaranteed to have a negative real part, thus ensuring stability of the dynamical system. Using the Gershgorin loss in the composite loss function is mandatory, as ms-E2C(K) diverges for larger $K$.

Algorithm

We now summarize an algorithm for fitting the instance of the ms-E2C model described earlier.

1. Sample a dataset of sub-trajectories using a pretrained or random policy: $D = \{(s_t, a_t, s_{t+1}, a_{t+1}, s_{t+2}, \ldots, a_{t+K-1}, s_{t+K})_i \mid i = 1, \ldots, N\}$.
2. Initialize the weights of the neural networks $\phi, \psi, \theta$.
3. Repeat for $c$ epochs:
a) Retrieve a sample $D_i$ from the dataset $D$.
b) Compute updated weights using a stochastic gradient descent step:
$\phi' = \phi - \alpha \nabla_\phi (L_{upper}(D_i) + \lambda_1 L_{consistency}(D_i) + \lambda_2 L_{stability}(D_i))$;
$\psi' = \psi - \alpha \nabla_\psi (L_{upper}(D_i) + \lambda_1 L_{consistency}(D_i) + \lambda_2 L_{stability}(D_i))$;
$\theta' = \theta - \alpha \nabla_\theta L_{upper}(D_i)$.
c) Update the neural networks' parameters: $\phi \leftarrow \phi'$, $\psi \leftarrow \psi'$, $\theta \leftarrow \theta'$.

Here $\lambda_1, \lambda_2$ are tunable hyperparameters and $\alpha$ denotes the step size. One might notice that, unlike [15], we do not introduce an inner optimization loop to ensure stability of the internal latent space dynamics. Instead, we add the stability loss to the composite loss function. We found that although the difference is apparent during the first few epochs, it becomes negligible after a while.
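The Gershgorin penalty and the weighted sum used in step 3b can be sketched as follows. This is a minimal sketch, assuming a square matrix given as nested lists; the function names and the default value of `eps` are illustrative choices, not taken from the paper or from [15].

```python
def gershgorin_loss(X, eps=1e-3):
    """Sum over rows of max(0, X[i][i] + sum_{j != i} |X[i][j]| + eps).

    A zero value means every Gershgorin disc lies strictly in the left
    half-plane, so all eigenvalues of X have a negative real part.
    Assumes X is square (hypothetical nested-list representation).
    """
    total = 0.0
    for i, row in enumerate(X):
        off_diagonal = sum(abs(x) for j, x in enumerate(row) if j != i)
        total += max(0.0, row[i] + off_diagonal + eps)
    return total


def composite_loss(l_upper, l_consistency, l_stability, lam1, lam2):
    """Weighted sum differentiated with respect to phi and psi in step 3b."""
    return l_upper + lam1 * l_consistency + lam2 * l_stability


# Strictly diagonally dominant with a negative diagonal: no penalty.
stable = [[-2.0, 0.5], [0.3, -1.0]]
# The first row violates the condition and is penalised.
unstable = [[0.5, 0.2], [0.0, -1.0]]
stable_penalty = gershgorin_loss(stable)
unstable_penalty = gershgorin_loss(unstable)
```

Because the penalty is zero whenever the condition already holds, it only affects the gradients for matrices that are close to or past the stability boundary.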
With the stability loss included in the composite objective, the stability condition is not violated, and the overall results are almost the same.

EXPERIMENTAL VALIDATION

Planar system

Following [12–14], we use the Planar benchmark to compare the performance of the algorithms. In it, a state is represented as a black-and-white image of a grid world with obstacles. In order to collect a dataset, we sample a random initial state and perform a series of random actions to obtain a trajectory of length $K$.

As in [12, 13], we use a deconvolutional network architecture [17] for image reconstruction from the latent state. For the sake of comparability, we chose the same architecture as in other papers on the topic.

The visualizations of the obtained latent state spaces are provided in Fig. 2. The numerical results are summarized in the Table.

Table. Comparison of reconstruction and prediction losses

Method            State loss, $-\log P(s_t \mid z_t)$    Next-state loss, $-\log P(s_{t+1} \mid s_t, a_t)$
Non-linear E2C    9.2 ± 4.5                              11.7 ± 8.8
Global E2C        7.6 ± 5.7                              10.6 ± 5.2
E2C               7.6 ± 2.3                              10.1 ± 2.7
ms-E2C(3)         7.3 ± 1.7                              8.7 ± 1.9
ms-E2C(5)         7.6 ± 2.1                              7.5 ± 1.6
ms-E2C(7)         7.7 ± 2.0                              6.3 ± 0.9

State loss is a regular reconstruction loss. As we observe, ms-E2C(K) gives only slight average improvements on it, which is entirely expected. The introduced method does not change the architecture of the decoding network, nor does it add any improvements to the algorithm in this regard. An important thing to notice, though, is that our generalization does not make the reconstruction performance much worse, which might be expected since the representation is influenced by the additional prediction constraints.

The next state is computed by encoding the state, $s_t \xrightarrow{Q_\phi} z_t$, predicting the next latent state, $z_t \xrightarrow{Q_\psi^1} \hat{z}_{t+1}$, and decoding the predicted regular state, $\hat{z}_{t+1} \xrightarrow{P_\theta} s_{t+1}$.

The results for the previous methods were reproduced with slight perturbations, as we used our own codebase for them.
It is worth noting that our E2C results coincide with those of other papers that reproduced E2C [13, 14], while the original paper provides better visuals. A visualization is obtained by transforming all possible environment states with the network $Q_\phi$. See the scheme on the left for details.

Fig. 2. A comparison of latent state spaces learned by the E2C and ms-E2C methods (panels: E2C, ms-E2C(3), ms-E2C(5), ms-E2C(7)).

CONCLUSION

In this paper, a novel method has been derived as a generalization of previous works on LCEs. We demonstrate how the method improves upon E2C without the drastic model changes that come with other works, such as PCC and P3C. We empirically show that, by considering multi-step prediction, ms-E2C learns much better latent state spaces in terms of curvature and predictability, and that it adds a simple yet efficient way to explicitly control the desired curvature of the resulting space. An implementation is available at [18]. Moreover, our work introduces a new dimension to the LCE family of algorithms.

Our future work will focus on using approaches from state-of-the-art LCE methods, like predictive coding, to make LCEs applicable to higher-dimensional real-world MDPs with a limited amount of data from which to learn a dynamics embedding. We will also explore the intriguing possibility of encoding not only the state space but also the action space, which sometimes has a complex structure. Lastly, we would like to study various extensions of the method to imitation learning and model-based reinforcement learning.

REFERENCES

1. G. Dulac-Arnold et al., “Deep reinforcement learning in large discrete action spaces,” arXiv preprint arXiv:1512.07679, 2015. doi: 10.48550/arXiv.1512.07679.
2. S. Levine, A. Kumar, G. Tucker, and J.
Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,” arXiv preprint arXiv:2005.01643, 2020. doi: 10.48550/arXiv.2005.01643.
3. E.A. Feinberg, P.O. Kasyanov, and M.Z. Zgurovsky, “Partially observable total-cost markov decision processes with weakly continuous transition probabilities,” Mathematics of Operations Research, vol. 41, no. 2, pp. 656–681, 2016. doi: 10.1287/moor.2015.0746.
4. E.A. Feinberg, P.O. Kasyanov, and M.Z. Zgurovsky, “Convergence of probability measures and markov decision models with incomplete information,” Proceedings of the Steklov Institute of Mathematics, vol. 287, no. 1, pp. 96–117, 2014. doi: 10.1134/S0081543814080069.
5. O. Vinyals et al., “Grandmaster level in starcraft ii using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019. doi: 10.1038/s41586-019-1724-z.
6. S. Reed et al., “A generalist agent,” arXiv preprint arXiv:2205.06175, 2022.
7. D. Ha and J. Schmidhuber, “World models,” arXiv preprint arXiv:1803.10122, 2018. doi: 10.48550/arXiv.1803.10122.
8. T.M. Moerland, J. Broekens, and C.M. Jonker, “Model-based reinforcement learning: A survey,” arXiv preprint arXiv:2006.16712, 2020. doi: 10.48550/arXiv.2006.16712.
9. D. Hafner et al., “Learning latent dynamics for planning from pixels,” in International conference on machine learning, PMLR, 2019, pp. 2555–2565. doi: 10.48550/arXiv.1811.04551.
10. R.F. Prudencio, M.R. Maximo, and E.L. Colombini, “A survey on offline reinforcement learning: Taxonomy, review, and open problems,” arXiv preprint arXiv:2203.01387, 2022. doi: 10.48550/arXiv.2203.01387.
11. W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems,” in ICINCO (1), Citeseer, 2004, pp. 222–229. doi: 10.5220/0001143902220229.
12. M. Watter, J. Springenberg, J. Boedecker, and M.
Riedmiller, “Embed to control: A locally linear latent dynamics model for control from raw images,” Advances in neural information processing systems, vol. 28, 2015.
13. N. Levine, Y. Chow, R. Shu, A. Li, M. Ghavamzadeh, and H. Bui, “Prediction, consistency, curvature: Representation learning for locally-linear control,” arXiv preprint arXiv:1909.01506, 2019.
14. R. Shu et al., “Predictive coding for locally-linear control,” in International Conference on Machine Learning, PMLR, 2020, pp. 8862–8871. doi: 10.5555/3524938.3525760.
15. M. Lechner, R. Hasani, D. Rus, and R. Grosu, “Gershgorin loss stabilizes the recurrent neural network compartment of an end-to-end robot learning scheme,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020, pp. 5446–5452. doi: 10.1109/ICRA40945.2020.9196608.
16. R.A. Horn and C.R. Johnson, Matrix analysis. Cambridge university press, 2012. doi: 10.5555/2422911.
17. M.D. Zeiler, D. Krishnan, G.W. Taylor, and R. Fergus, “Deconvolutional networks,” in 2010 IEEE Computer Society Conference on computer vision and pattern recognition, pp. 2528–2535. doi: 10.1109/CVPR.2010.5539957.
18. A. Tytarenko, Rl-research. Available: https://github.com/titardrew/rl-research, 2022.

Received 31.08.2022

INFORMATION ON THE ARTICLE

Andrii M. Tytarenko, ORCID: 0000-0002-8265-642X, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: titarenkoan@gmail.com

MULTI-STEP PREDICTION IN LINEARIZED LATENT STATE SPACES FOR REPRESENTATION LEARNING / A.M. Tytarenko

Abstract. A new method is proposed that generalizes LCE approaches such as E2C. The method develops the idea of learning a locally linear state space by considering multi-step prediction, which allows clearer control over the curvature of the sought space.
It is demonstrated that the method outperforms E2C without substantial changes to the overall model, unlike other works such as PCC and P3C. The relation between E2C and the proposed method, and between their respective update equations, is considered. Empirical evidence is presented which indicates that ms-E2C learns much better latent state spaces in terms of curvature and next-state predictability. In addition, certain stability problems associated with multi-step predictions, and ways of resolving them, are highlighted.

Keywords: representation learning, learning controllable embedding, reinforcement learning, latent state space.
id journaliasakpiua-article-269583
institution System research and information technologies
keywords_txt_mv keywords
language English
last_indexed 2025-07-17T10:28:03Z
publishDate 2022
publisher The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
record_format ojs
resource_txt_mv journaliasakpiua/92/b4cabdfa65798843a79f555aecf91292.pdf
spelling journaliasakpiua-article-2695832022-12-21T22:15:21Z Multi-step prediction in linearized latent state spaces for representation learning Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій Tytarenko, Andrii representation learning learning controllable embedding reinforcement learning latent state space навчання репрезентацій навчання керованих просторів навчання з підкріпленням латентний простір станів In this paper, we derive a novel method as a generalization over LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without drastic model changes which come with other works, such as PCC and P3C. We discuss the relation between E2C and the presented method and derive update equations. We provide empirical evidence, which suggests that by considering the multi-step prediction, our method – ms-E2C – allows learning much better latent state spaces in terms of curvature and next state predictability. Finally, we also discuss certain stability challenges we encounter with multi-step predictions and how to mitigate them. Запропоновано новий метод, що узагальнює підходи LCE, такі як E2C. Метод розвиває ідею вивчення локально-лінійного простору станів шляхом розглядання багатокрокового прогнозування, що дає змогу чіткіше контролювати кривизну шуканого простору. Продемонстровано, що метод перевершує E2C без суттєвих змін загальної моделі, на відміну від інших робіт, таких як PCC і P3C. Розглянуто зв’язок між E2C і запропонованим методом та між їх відповідними рівняннями оновлень. Подано емпіричні докази, які свідчать, що ms-E2C дозволяє набагато краще вивчати простори прихованих станів з точки зору кривизни та прогнозованості наступних станів. Крім того, висвітлено певні проблеми стабільності, пов’язані з багатокроковими прогнозами, та способи їх вирішення. 
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2022-10-30 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/269583 10.20535/SRIT.2308-8893.2022.3.09 System research and information technologies; No. 3 (2022); 139-148 Системные исследования и информационные технологии; № 3 (2022); 139-148 Системні дослідження та інформаційні технології; № 3 (2022); 139-148 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/269583/265053
spellingShingle навчання репрезентацій
навчання керованих просторів
навчання з підкріпленням
латентний простір станів
Tytarenko, Andrii
Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій
title Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій
title_alt Multi-step prediction in linearized latent state spaces for representation learning
title_full Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій
title_fullStr Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій
title_full_unstemmed Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій
title_short Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій
title_sort багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій
topic навчання репрезентацій
навчання керованих просторів
навчання з підкріпленням
латентний простір станів
topic_facet representation learning
learning controllable embedding
reinforcement learning
latent state space
навчання репрезентацій
навчання керованих просторів
навчання з підкріпленням
латентний простір станів
url https://journal.iasa.kpi.ua/article/view/269583
work_keys_str_mv AT tytarenkoandrii multisteppredictioninlinearizedlatentstatespacesforrepresentationlearning
AT tytarenkoandrii bagatokrokoveprognozuvannâvlínearizovanihlatentnihprostorahdlânavčannâreprezintacíj