Multi-step prediction in linearized latent state spaces for representation learning
In this paper, we derive a novel method as a generalization over LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without dras...
| Date: | 2022 |
|---|---|
| Main author: | |
| Format: | Article |
| Language: | English |
| Published: |
The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
2022
|
| Subjects: | |
| Online access: | https://journal.iasa.kpi.ua/article/view/269583 |
| Journal title: | System research and information technologies |
| Download file: | |
Institution
System research and information technologies| _version_ | 1867334430586044416 |
|---|---|
| author | Tytarenko, Andrii |
| author_facet | Tytarenko, Andrii |
| author_institution_txt_mv | [
{
"author": "Andrii Tytarenko",
"institution": "Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine \"Igor Sikorsky Kyiv Polytechnic Institute\", Kyiv"
}
] |
| author_sort | Tytarenko, Andrii |
| baseUrl_str | http://journal.iasa.kpi.ua/oai |
| collection | OJS |
| datestamp_date | 2022-12-21T22:15:21Z |
| description | In this paper, we derive a novel method as a generalization over LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without drastic model changes which come with other works, such as PCC and P3C. We discuss the relation between E2C and the presented method and derive update equations. We provide empirical evidence, which suggests that by considering the multi-step prediction, our method – ms-E2C – allows learning much better latent state spaces in terms of curvature and next state predictability. Finally, we also discuss certain stability challenges we encounter with multi-step predictions and how to mitigate them. |
| doi_str_mv | 10.20535/SRIT.2308-8893.2022.3.09 |
| first_indexed | 2025-07-17T10:28:03Z |
| format | Article |
| fulltext |
A. Tytarenko, 2022
Системні дослідження та інформаційні технології, 2022, № 3 139
UDC 004.852
DOI: 10.20535/SRIT.2308-8893.2022.3.09
MULTI-STEP PREDICTION IN LINEARIZED LATENT STATE
SPACES FOR REPRESENTATION LEARNING
A. TYTARENKO
Abstract. In this paper, we derive a novel method as a generalization over LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without drastic model changes which come with other works, such as PCC and P3C. We discuss the relation between E2C and the presented method and derive update equations. We provide empirical evidence, which suggests that by considering the multi-step prediction, our method – ms-E2C – allows learning much better latent state spaces in terms of curvature and next state predictability. Finally, we also discuss certain stability challenges we encounter with multi-step predictions and how to mitigate them.
Keywords: representation learning, learning controllable embedding, reinforcement learning, latent state space.
INTRODUCTION
One of the most challenging problems that the field of reinforcement learning faces is learning autonomous agents capable of control in Markov Decision Processes (MDPs) with complex state and action spaces. For instance, complications may arise from large action spaces [1], limited ability to interact with an environment [2], partial observability (POMDP) [3, 4], etc. Optimizing a decent policy takes a lot of samples and usually requires online interactive learning and neural networks with a large number of trainable parameters, capable of processing high-dimensional observations [5, 6].
There are various algorithms which try to deal with the problem of sample
inefficiency, or limited amount of data. Model-based reinforcement learning
algorithms [7–9] try to achieve sample efficiency by approximating transition
dynamics of an MDP in online or offline mode. Offline reinforcement learning
methods [2, 10] strive to extract as much useful information from limited offline
data as possible, in order to learn a policy applicable to online regimes as well.
Another algorithmic framework – Learning Controllable Embedding (LCE)
– approaches this problem by learning a lower dimensional latent state space and
using simpler control algorithms, like iLQR [11], to perform control in this latent
space. The challenge here is to make sure that the learned latent space has simpler
structure (i.e. next states are easier to predict).
Some particular instances of this framework are described in [9, 12–14]. The idea of E2C [12] is to learn a locally-linear latent space, so that algorithms like LQG could be used for goal-reaching tasks. PCC [13] tries to fix some of the issues encountered in E2C by deriving losses which allow for explicit minimization of the latent space's curvature. P3C [14] improves upon PCC mainly by replacing the reconstruction loss, which is needed to make sure the learned state space carries enough information to generate (i.e. decode) observations from latent states, with predictive coding.
A. Tytarenko
ISSN 1681–6048 System Research & Information Technologies, 2022, № 3 140
In this paper, we seek an alternative approach to enforce lower curvature and higher predictability of the latent space. We generalize E2C by considering multiple transitions at a time, making sure that local linearity is preserved not just between neighbouring states. We inherit the idea of maximizing the joint log-likelihood of a transition, generalize it to multiple transitions, and derive a variational bound for optimization. We then compare the results with LCE approaches and visualize the learned latent state spaces for a benchmark common among LCE papers.
PRELIMINARIES
We denote a Markov Decision Process (MDP) $M$ as a tuple $(S, A, r, T)$, where $S$ is the state space; $A$ is the action space; $r: S \times A \to \mathbb{R}$ is the reward function; $T = P(s_{t+1} \mid s_t, a_t)$ is the probability of state $s_{t+1}$ given the current state $s_t$ and the action taken $a_t$.
A state of an MDP is a sufficient statistic for the transition kernel and possesses the Markov property.
The task of a Reinforcement Learning algorithm is, for a given MDP $M$, to find a policy $\pi$ that maximizes the expected return. We are interested in the discounted return objective:
$$\pi^* = \arg\max_{\pi} \; \mathbb{E}_{\tau \sim p_\pi(\tau)} \left[ \sum_{t \ge 0} \gamma^t r(s_t, a_t) \right],$$
where $\gamma$ is the discount factor and $\tau$ denotes a trajectory $(s_0, a_0, s_1, a_1, \dots)$ obtained by sampling actions using a stochastic policy $\pi$.
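As a small illustration of the discounted return inside the expectation above, the sketch below sums the rewards of one sampled, finite trajectory. The function name `discounted_return` and the argument `gamma` (the discount factor) are illustrative names chosen here, not identifiers from the paper.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for one finite trajectory."""
    total = 0.0
    # Iterate backwards so that each step is a single multiply-add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

Averaging this quantity over many trajectories sampled with a policy gives a Monte Carlo estimate of the expected return that the policy is meant to maximize.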
In particular, we consider a specific class of Reinforcement Learning algorithms – Model-Based Reinforcement Learning [7, 8]. Algorithms of this kind usually possess higher sample efficiency, but they involve some sort of approximation of the transition kernel. LCE algorithms use parametric models (i.e. neural nets) to learn a good model with desirable properties, like linearity and predictability. This allows the use of even the simplest model-based control (and RL) algorithms like iLQR [11].
THE MULTI-STEP EMBED TO CONTROL MODEL
Consider the internal transition dynamics of an MDP $M = (S, A, r, T)$:
$$s_{t+1} = f(s_t, a_t) + \varepsilon, \quad \varepsilon \sim P_\varepsilon.$$
As we discussed previously, the function $f$ may be highly nonlinear, and thus tricky to optimize with model-based RL or control algorithms. LCE approaches therefore try to learn a mapping from the state space $S$ to some latent space $Z$ such that its latent dynamics
$$z_{t+1} = \hat f(z_t, a_t) + \hat\varepsilon$$
has some desired properties like local linearity, low curvature, predictability, etc.
In order to learn the mapping $Q: S \to Z$, the Variational Inference framework is employed to derive a tractable algorithm for maximizing the likelihood of known data points under the mapping we want to learn.
Optimization problem
As follows from Fig. 1, we consider a dataset
$$D = \{(s_t^i, a_t^i, s_{t+1}^i, a_{t+1}^i, s_{t+2}^i, \dots, a_{t+K-1}^i, s_{t+K}^i) \mid i = 1, \dots, N\}$$
containing samples from real trajectories gathered before training. To follow and generalize [12], we define
$$Q_\varphi = P(z_t \mid s_t)$$
as a generative model which, for a given state $s_t$, specifies a distribution over the latent space $Z$. Basically, it plays the role of the mapping from $S$ to $Z$, parametrized with a parameter vector $\varphi$. We also define
$$Q_\psi = P(\hat z_{t+1} \mid \hat z_t, a_t)$$
as a generative model which, for a given latent state $z_t$ and an action $a_t$, predicts the distribution of the next latent state $z_{t+1}$. This model is parametrized with a parameter vector $\psi$. Also, we denote
$$Q_\psi^j = P(\hat z_{t+j} \mid \hat z_{t+j-1}, a_{t+j-1}).$$
In order to find $\varphi$ and $\psi$, we maximize the likelihood of a dataset of trajectory samples of length $K$ with respect to the aforementioned parameter vectors:
$$\varphi^*, \psi^* = \arg\max_{\varphi, \psi} \prod_{i=1}^{N} P(s_t^i, a_t^i, s_{t+1}^i, a_{t+1}^i, s_{t+2}^i, \dots, a_{t+K-1}^i, s_{t+K}^i).$$
For the sake of readability, we denote $s_t, \dots, s_{t+K}$ as $s_{t:t+K}$ and $a_t, \dots, a_{t+K}$ as $a_{t:t+K}$. Thus our objective is:
$$\varphi^*, \psi^* = \arg\max_{\varphi, \psi} \prod_{i=1}^{N} P(s_{t:t+K}^i, a_{t:t+K-1}^i).$$
A corresponding graphical model is depicted in Fig. 1.
Fig. 1. Graphical models for E2C and ms-E2C(K): dashed lines – state reconstruction process. Panels: E2C, ms-E2C(K).
Optimization objective
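The objective we defined in the previous subsection is known to be intractable and difficult to optimize. Therefore, LCE approaches employ Variational Inference to find a lower bound on the log-likelihood objective. In this section, we derive this bound for the proposed probabilistic model. Variational lower bound (up to the additive term $\log P(a_{t:t+K-1})$, which does not depend on the model parameters):
$$\log P(s_{t:t+K}, a_{t:t+K-1}) \ge \mathbb{E}_{\substack{z_t \sim Q_\varphi \\ \hat z_{t+j} \sim Q_\psi^j,\; j = 1, \dots, K}} \left[ \sum_{j=1}^{K} \log P(s_{t+j} \mid \hat z_{t+j}) + \log P(s_t \mid z_t) \right] - D_{KL}[Q_\varphi \,\|\, P(Z)].$$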
Here $D_{KL}[P \,\|\, Q]$ denotes the Kullback–Leibler divergence functional:
$$D_{KL}[P \,\|\, Q] = \mathbb{E}_{x \sim P} \log \frac{P(x)}{Q(x)}.$$
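For Gaussian distributions the Kullback–Leibler divergence has a closed form, so it does not need to be estimated by sampling. The sketch below is a minimal NumPy version under two assumptions that are not stated in the text: both distributions have diagonal covariances, and the prior over the latent space is a standard normal, as is common in variational auto-encoders. The function name `kl_diag_gaussians` is hypothetical.

```python
import numpy as np


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL[N(mu_q, diag(var_q)) || N(mu_p, diag(var_p))], summed over dimensions."""
    mu_q, var_q = np.asarray(mu_q, float), np.asarray(var_q, float)
    mu_p, var_p = np.asarray(mu_p, float), np.asarray(var_p, float)
    # Closed form for diagonal Gaussians, applied per dimension and summed.
    return float(0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    ))


# Divergence of an encoder output from an assumed standard normal prior.
kl_to_prior = kl_diag_gaussians([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

The same function can compare two predicted latent distributions, provided both are represented by a mean and a diagonal variance.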
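Proof.
$$\begin{aligned}
\log P(s_{t:t+K}, a_{t:t+K-1})
&= \log \int_{z_t, \hat z_{t+1:t+K}} P(s_{t:t+K}, a_{t:t+K-1}, z_t, \hat z_{t+1:t+K}) \, dz_t \, d\hat z_{t+1:t+K} \\
&= \log \int_{z_t, \hat z_{t+1:t+K}} P(s_{t:t+K} \mid a_{t:t+K-1}, z_t, \hat z_{t+1:t+K}) \, P(z_t, \hat z_{t+1:t+K} \mid a_{t:t+K-1}) \, P(a_{t:t+K-1}) \, dz_t \, d\hat z_{t+1:t+K} \\
&= \log \int_{z_t, \hat z_{t+1:t+K}} P(s_{t:t+K} \mid a_{t:t+K-1}, z_t, \hat z_{t+1:t+K}) \, P(z_t, \hat z_{t+1:t+K} \mid a_{t:t+K-1}) \, P(a_{t:t+K-1}) \, \frac{Q_\varphi}{Q_\varphi} \, dz_t \, d\hat z_{t+1:t+K} \\
&= \log \int_{z_t, \hat z_{t+1:t+K}} P(s_{t:t+K} \mid a_{t:t+K-1}, z_t, \hat z_{t+1:t+K}) \Big( \prod_{j=1}^{K} Q_\psi^j \Big) P(z_t) \, P(a_{t:t+K-1}) \, \frac{Q_\varphi}{Q_\varphi} \, dz_t \, d\hat z_{t+1:t+K} \\
&= \log \int_{z_t, \hat z_{t+1:t+K}} P(s_t \mid z_t) \prod_{j=1}^{K} \Big( Q_\psi^j \, P(s_{t+j} \mid \hat z_{t+j}) \Big) P(z_t) \, P(a_{t:t+K-1}) \, \frac{Q_\varphi}{Q_\varphi} \, dz_t \, d\hat z_{t+1:t+K} \\
&\ge \mathbb{E}_{\substack{z_t \sim Q_\varphi \\ \hat z_{t+j} \sim Q_\psi^j,\; j = 1, \dots, K}} \log \left[ P(s_t \mid z_t) \Big( \prod_{j=1}^{K} P(s_{t+j} \mid \hat z_{t+j}) \Big) P(a_{t:t+K-1}) \, \frac{P(z_t)}{Q_\varphi} \right] \\
&= \mathbb{E}_{\substack{z_t \sim Q_\varphi \\ \hat z_{t+j} \sim Q_\psi^j,\; j = 1, \dots, K}} \left[ \sum_{j=1}^{K} \log P(s_{t+j} \mid \hat z_{t+j}) + \log P(s_t \mid z_t) + \log \frac{P(z_t)}{Q_\varphi} \right] + \log P(a_{t:t+K-1}) \\
&= \mathbb{E}_{\substack{z_t \sim Q_\varphi \\ \hat z_{t+j} \sim Q_\psi^j,\; j = 1, \dots, K}} \left[ \sum_{j=1}^{K} \log P(s_{t+j} \mid \hat z_{t+j}) + \log P(s_t \mid z_t) \right] - D_{KL}[Q_\varphi \,\|\, P(Z)] + \log P(a_{t:t+K-1}).
\end{aligned}$$
The inequality follows from Jensen's inequality. The term $\log P(a_{t:t+K-1})$ does not depend on the model parameters and is omitted from the bound.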
Multi-step embed-to-control model (ms-E2C)
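In this section we instantiate a model for learning a latent locally-linear state space. We use the previously derived upper bound on the negative log-likelihood over the multi-step trajectory samples. Graphical models for both E2C and ms-E2C($K$) are shown in Fig. 1.
First, we instantiate parametric models for the encoding and dynamics functions as:
$$Q_\varphi = P(z_t \mid s_t) = N(\mu(s_t), \Sigma(s_t)) \quad \text{(encoder)};$$
$$\mu(s_t), \Sigma(s_t) = \mathrm{NeuralNet}(s_t; \varphi);$$
$$A_t, B_t, o_t = \mathrm{NeuralNet}(z_t; \psi) \quad \text{(latent dynamics)};$$
$$Q_\psi^j = P(\hat z_{t+j} \mid \hat z_{t+j-1}, a_{t+j-1}) = N(\mu^j, \Sigma^j) \quad \text{(dynamics)};$$
$$\mu^j = A_t \mu^{j-1} + B_t a_{t+j-1} + o_t \quad \text{for } j = 2, \dots, K;$$
$$\mu^j = A_t \mu(s_t) + B_t a_t + o_t \quad \text{for } j = 1;$$
$$\Sigma^j = A_t \Sigma^{j-1} A_t^T + \Sigma_W \quad \text{for } j = 2, \dots, K;$$
$$\Sigma^j = A_t \Sigma(s_t) A_t^T + \Sigma_W \quad \text{for } j = 1.$$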
Thus, an optimal model implies a locally linear latent space in which the curvature (i.e. the degree of linearity) is explicitly controlled by changing the number of steps per sample. Choosing a large $K$ would recover a globally linear model, while setting $K = 1$ recovers the E2C model, so ms-E2C is a generalization of E2C, as also follows from Fig. 1.
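The multi-step recursion for the predicted means and covariances can be sketched in NumPy as follows. This is a minimal illustration that assumes the recursion reads mu^j = A_t mu^(j-1) + B_t a_(t+j-1) + o_t and Sigma^j = A_t Sigma^(j-1) A_t^T + Sigma_W, with A_t, B_t, o_t computed once from z_t and reused for all K steps. The function name `rollout_latent` and its argument names are hypothetical.

```python
import numpy as np


def rollout_latent(mu_t, sigma_t, A, B, o, actions, sigma_w):
    """Propagate a Gaussian latent belief through len(actions) linear steps.

    A, B, o are produced once (from z_t) and reused for every step j = 1..K,
    which is what ties the K predictions to a single local linearization.
    """
    mus, sigmas = [], []
    mu, sigma = np.asarray(mu_t, float), np.asarray(sigma_t, float)
    for a in actions:  # actions = [a_t, a_{t+1}, ..., a_{t+K-1}]
        mu = A @ mu + B @ np.asarray(a, float) + o
        sigma = A @ sigma @ A.T + sigma_w
        mus.append(mu)
        sigmas.append(sigma)
    return mus, sigmas


# Toy two-dimensional example with K = 2 steps.
A = 0.5 * np.eye(2)
B = np.eye(2)
o = np.zeros(2)
mus, sigmas = rollout_latent(
    mu_t=[2.0, 0.0], sigma_t=np.eye(2), A=A, B=B, o=o,
    actions=[[1.0, 1.0], [1.0, 1.0]], sigma_w=np.zeros((2, 2)),
)
```

Passing a single action reproduces the one-step prediction, which corresponds to the $K = 1$ case.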
We also have to specify a parametrized decoding model, which is needed to compute the upper bound and to enforce the "reconstruction" constraint introduced in [12] and generalized for our multi-step model:
$$P_\theta = P(s_t \mid z_t) = \mathrm{Bernoulli}(p(z_t));$$
$$p(z_t) = \mathrm{NeuralNet}(z_t; \theta);$$
$$P_\theta^j = P(s_{t+j} \mid \hat z_{t+j}) = \mathrm{Bernoulli}(p(\hat z_{t+j}));$$
$$p(\hat z_{t+j}) = \mathrm{NeuralNet}(\hat z_{t+j}; \theta).$$
The Bernoulli distribution is chosen for comparability with the E2C model on shared benchmark MDPs such as Planar, where the original state space consists of black-and-white images of a grid world with white obstacles and a white circle denoting the position of the agent.
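With a Bernoulli decoder, the reconstruction term log P(s | z) for a binary image reduces to a sum of per-pixel log-probabilities. The sketch below assumes independent pixels and clips the predicted probabilities for numerical safety; the function name `bernoulli_log_likelihood` and the clipping constant `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np


def bernoulli_log_likelihood(image, probs, eps=1e-7):
    """log P(s | z) for a binary image s under independent per-pixel Bernoullis."""
    s = np.asarray(image, float)
    # Clip to keep both logarithms finite when a probability is exactly 0 or 1.
    p = np.clip(np.asarray(probs, float), eps, 1.0 - eps)
    return float(np.sum(s * np.log(p) + (1.0 - s) * np.log(1.0 - p)))
```

The negative of this value is the familiar binary cross-entropy summed over pixels.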
Loss function. In order to complete the model's specification, we have to provide a loss function optimizable via stochastic gradient descent. In ms-E2C it consists of three terms: an estimate of the derived upper bound, a consistency term, and a stability term.
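$$L_{upper}(D_i; \varphi, \psi, \theta) = -\mathbb{E}_{\substack{z_t \sim Q_\varphi \\ \hat z_{t+j} \sim Q_\psi^j,\; j = 1, \dots, K}} \left[ \sum_{j=1}^{K} \log P(s_{t+j} \mid \hat z_{t+j}) + \log P(s_t \mid z_t) \right] + D_{KL}[Q_\varphi \,\|\, P(Z)].$$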
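The expectation is estimated using a one-sample estimate and the reparametrization trick widely used in variational auto-encoders. The consistency term, which as in [12] compares each predicted latent distribution with the encoding of the corresponding observed state $s_{t+j}$, and the stability term are:
$$L_{consistency}(D_i; \varphi, \psi) = \sum_{j=1}^{K} D_{KL}[Q_\psi^j \,\|\, Q_\varphi];$$
$$L_{stability}(D_i; \varphi, \psi) = \mathrm{Gersh}(A_t(z_t)) + \mathrm{Gersh}(B_t(z_t)).$$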
Here $\mathrm{Gersh}(X)$ denotes the Gershgorin loss [15, 16]:
$$\mathrm{Gersh}(X) = \sum_{i=1}^{n} \max\Big(0,\; X_{i,i} + \sum_{j \ne i} |X_{i,j}| + \epsilon\Big),$$
where $|X_{i,j}|$ denotes the absolute value of the entry $X_{i,j}$ of the matrix $X$, and $\epsilon > 0$ is a small constant.
According to Theorem 1 from [15], if the loss value is non-positive, all eigenvalues of the matrix $X$ are guaranteed to have a negative real part, thus ensuring stability of the dynamical system. Using the Gershgorin loss in the composite loss function is mandatory, as ms-E2C($K$) diverges for larger $K$.
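A Gershgorin-style penalty can be sketched in a few lines of NumPy. The version below assumes the loss sums, over the rows of a square matrix, the positive part of the diagonal entry plus the absolute off-diagonal row sum plus a small margin; the function name `gershgorin_loss` and the default `eps` are illustrative.

```python
import numpy as np


def gershgorin_loss(X, eps=1e-3):
    """Sum over rows of max(0, X_ii + sum_{j != i} |X_ij| + eps) for square X."""
    X = np.asarray(X, float)
    diag = np.diag(X)
    # Radius of each Gershgorin disc: absolute row sum without the diagonal.
    off_diag_radius = np.sum(np.abs(X), axis=1) - np.abs(diag)
    return float(np.sum(np.maximum(0.0, diag + off_diag_radius + eps)))


stable = np.array([[-2.0, 0.5], [0.3, -1.0]])
loss_stable = gershgorin_loss(stable)       # every disc lies in the left half-plane
loss_identity = gershgorin_loss(np.eye(2))  # discs centred at +1 are penalized
```

When the penalty is zero, every Gershgorin disc lies strictly to the left of the imaginary axis, so by the Gershgorin circle theorem all eigenvalues have a negative real part.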
Algorithm
Now, we summarize the algorithm for fitting the instance of the ms-E2C model described earlier.
1. Sample a dataset of sub-trajectories using a pretrained or random policy:
$$D = \{(s_t^i, a_t^i, s_{t+1}^i, a_{t+1}^i, s_{t+2}^i, \dots, a_{t+K-1}^i, s_{t+K}^i) \mid i = 1, \dots, N\}.$$
2. Initialize the weights of the neural nets $\varphi, \psi, \theta$.
3. Repeat for $c$ epochs:
a) Retrieve a sample $D_i$ from the dataset $D$.
b) Compute updated weights using a stochastic gradient descent step:
$$\varphi' = \varphi - \alpha \nabla_\varphi \big( L_{upper}(D_i) + \lambda_1 L_{consistency}(D_i) + \lambda_2 L_{stability}(D_i) \big);$$
$$\psi' = \psi - \alpha \nabla_\psi \big( L_{upper}(D_i) + \lambda_1 L_{consistency}(D_i) + \lambda_2 L_{stability}(D_i) \big);$$
$$\theta' = \theta - \alpha \nabla_\theta L_{upper}(D_i);$$
c) Update the neural networks' parameters:
$$\varphi \leftarrow \varphi', \quad \psi \leftarrow \psi', \quad \theta \leftarrow \theta'.$$
Here $\alpha$ is the step size and $\lambda_1, \lambda_2$ are tunable hyperparameters.
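The parameter update in step 3 can be sketched as a small helper that combines per-loss gradients. This is an illustration only: it assumes plain gradient descent with a step size `lr`, takes the gradients as given (in practice they would come from automatic differentiation), and uses hypothetical names `sgd_step`, `lam1`, and `lam2` for the update and the two loss weights.

```python
def sgd_step(params, grads, lr, lam1, lam2):
    """One update of the encoder (phi), dynamics (psi) and decoder (theta).

    params: {"phi": value, "psi": value, "theta": value}
    grads:  {"upper": {...}, "consistency": {...}, "stability": {...}},
            each inner dict keyed by the same parameter-group names.
    Values may be floats or NumPy arrays.
    """
    new_params = {}
    for name, value in params.items():
        g = grads["upper"][name]
        if name != "theta":  # the decoder only receives the upper-bound gradient
            g = g + lam1 * grads["consistency"][name] + lam2 * grads["stability"][name]
        new_params[name] = value - lr * g
    return new_params


params = {"phi": 1.0, "psi": 1.0, "theta": 1.0}
grads = {
    "upper": {"phi": 1.0, "psi": 1.0, "theta": 1.0},
    "consistency": {"phi": 2.0, "psi": 2.0, "theta": 0.0},
    "stability": {"phi": 3.0, "psi": 3.0, "theta": 0.0},
}
updated = sgd_step(params, grads, lr=0.1, lam1=0.5, lam2=1.0)
```

Keeping the three gradient sources separate makes it explicit that the decoder parameters are not affected by the consistency and stability terms.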
One might notice that, unlike [15], we do not introduce an inner optimization loop to ensure stability of the internal latent space dynamics. Instead, we add the stability loss to the composite loss function. We found that although the difference is apparent during the first few epochs, it becomes negligible after a while: the stability condition is not violated and the overall results are almost the same.
EXPERIMENTAL VALIDATION
Planar system
Following [12–14], we use a Planar benchmark to compare the performance of
the algorithms. In it, a state space is represented as a black-and-white image of a
grid world with obstacles. In order to collect a dataset, we sample a random initial
state and perform a series of random actions to obtain a trajectory of length K .
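The data collection procedure can be sketched as follows. The environment interface used here (`reset`, `step`, `sample_action`) is hypothetical and stands in for whatever simulator provides the Planar images; only the structure of each sample, K actions interleaved with K + 1 states, follows the description above.

```python
import random


def collect_subtrajectories(reset, step, sample_action, num_samples, K, seed=0):
    """Collect num_samples tuples (s_t, a_t, s_{t+1}, ..., a_{t+K-1}, s_{t+K})."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_samples):
        state = reset(rng)          # random initial state
        sample = [state]
        for _ in range(K):
            action = sample_action(rng)   # random action
            state = step(state, action)
            sample.extend([action, state])
        dataset.append(tuple(sample))
    return dataset


# Toy one-dimensional environment used only to exercise the function.
data = collect_subtrajectories(
    reset=lambda rng: rng.randint(0, 9),
    step=lambda s, a: s + a,
    sample_action=lambda rng: rng.choice([-1, 1]),
    num_samples=5,
    K=3,
)
```

Each element of `data` has length 2K + 1, matching one multi-step training sample.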
As in [12, 13], we use a deconvolutional network architecture [17] for image
reconstruction from the latent state. For the sake of comparability, we chose the
same architecture as in other papers on the topic.
The visualizations of the obtained latent state spaces are provided in Fig. 2. The numerical results are summarized in the table below.
Comparison of reconstruction and prediction losses
| Method | State loss, $\log P(s_t \mid z_t)$ | Next state loss, $\log P(s_{t+1} \mid s_t, a_t)$ |
|---|---|---|
| Non-linear E2C | 9.2 ± 4.5 | 11.7 ± 8.8 |
| Global E2C | 7.6 ± 5.7 | 10.6 ± 5.2 |
| E2C | 7.6 ± 2.3 | 10.1 ± 2.7 |
| ms-E2C(3) | 7.3 ± 1.7 | 8.7 ± 1.9 |
| ms-E2C(5) | 7.6 ± 2.1 | 7.5 ± 1.6 |
| ms-E2C(7) | 7.7 ± 2.0 | 6.3 ± 0.9 |
State loss is a regular reconstruction loss. As we observe, ms-E2C($K$) gives only slight average improvements on it, which is entirely expected: the introduced method neither changes the architecture of the decoding network nor adds any improvements to the algorithm in this respect. An important thing to notice, though, is that our generalization does not make the reconstruction performance much worse, which might have been expected, as the representation is influenced by additional prediction constraints. The next state is computed by encoding the state, $s_t \xrightarrow{Q_\varphi} z_t$, predicting the next latent state, $z_t \xrightarrow{Q_\psi^1} z_{t+1}$, and decoding the predicted regular state, $z_{t+1} \xrightarrow{P_\theta} s_{t+1}$. The results for previous methods were reproduced with slight deviations, as we used our own codebase.
It is worth noting that our E2C results coincide with other papers which involved reproduction of E2C [13, 14], while the original paper provides better visuals. A visualization is obtained by transforming all possible environment states with the network $Q_\varphi$. See the scheme on the left for details.
CONCLUSION
In this paper, a novel method has been derived as a generalization over the previous works on LCEs. We demonstrate how the method improves upon E2C without the drastic model changes that come with other works, such as PCC and P3C. We empirically show that, by considering multi-step prediction, ms-E2C allows learning much better latent state spaces in terms of curvature and predictability, by adding a simple yet efficient way to explicitly control the desired curvature of the resulting space. An implementation is available at [18].
Moreover, our work introduces a new dimension to the LCE family of algorithms. Our future work will focus on using approaches from state-of-the-art LCE methods, like predictive coding, to make LCEs applicable to higher-dimensional real-world MDPs with a limited amount of data to learn a dynamics embedding from. We will also explore the intriguing possibility of encoding not only the state but also the action space, which sometimes has a complex structure. Lastly, we would like to study various extensions of the method to imitation learning and model-based reinforcement learning.
Fig. 2. A comparison of latent state spaces learned by E2C and ms-E2C methods. Panels: E2C, ms-E2C(3), ms-E2C(5), ms-E2C(7).
REFERENCES
1. G. Dulac-Arnold et al., “Deep reinforcement learning in large discrete action spaces,” arXiv preprint arXiv:1512.07679, 2015. doi: 10.48550/arXiv.1512.07679.
2. S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,” arXiv preprint arXiv:2005.01643, 2020. doi: 10.48550/arXiv.2005.01643.
3. E.A. Feinberg, P.O. Kasyanov, and M.Z. Zgurovsky, “Partially observable total-cost Markov decision processes with weakly continuous transition probabilities,” Mathematics of Operations Research, vol. 41, no. 2, pp. 656–681, 2016. doi: 10.1287/moor.2015.0746.
4. E.A. Feinberg, P.O. Kasyanov, and M.Z. Zgurovsky, “Convergence of probability measures and Markov decision models with incomplete information,” Proceedings of the Steklov Institute of Mathematics, vol. 287, no. 1, pp. 96–117, 2014. doi: 10.1134/S0081543814080069.
5. O. Vinyals et al., “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019. doi: 10.1038/s41586-019-1724-z.
6. S. Reed et al., “A generalist agent,” arXiv preprint arXiv:2205.06175, 2022.
7. D. Ha and J. Schmidhuber, “World models,” arXiv preprint arXiv:1803.10122, 2018. doi: 10.48550/arXiv.1803.10122.
8. T.M. Moerland, J. Broekens, and C.M. Jonker, “Model-based reinforcement learning: A survey,” arXiv preprint arXiv:2006.16712, 2020. doi: 10.48550/arXiv.2006.16712.
9. D. Hafner et al., “Learning latent dynamics for planning from pixels,” in International Conference on Machine Learning, PMLR, 2019, pp. 2555–2565. doi: 10.48550/arXiv.1811.04551.
10. R.F. Prudencio, M.R. Maximo, and E.L. Colombini, “A survey on offline reinforcement learning: Taxonomy, review, and open problems,” arXiv preprint arXiv:2203.01387, 2022. doi: 10.48550/arXiv.2203.01387.
11. W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems,” in ICINCO (1), Citeseer, 2004, pp. 222–229. doi: 10.5220/0001143902220229.
12. M. Watter, J. Springenberg, J. Boedecker, and M. Riedmiller, “Embed to control: A locally linear latent dynamics model for control from raw images,” Advances in Neural Information Processing Systems, vol. 28, 2015.
13. N. Levine, Y. Chow, R. Shu, A. Li, M. Ghavamzadeh, and H. Bui, “Prediction, consistency, curvature: Representation learning for locally-linear control,” arXiv preprint arXiv:1909.01506, 2019.
14. R. Shu et al., “Predictive coding for locally-linear control,” in International Conference on Machine Learning, PMLR, 2020, pp. 8862–8871. doi: 10.5555/3524938.3525760.
15. M. Lechner, R. Hasani, D. Rus, and R. Grosu, “Gershgorin loss stabilizes the recurrent neural network compartment of an end-to-end robot learning scheme,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020, pp. 5446–5452. doi: 10.1109/ICRA40945.2020.9196608.
16. R.A. Horn and C.R. Johnson, Matrix Analysis. Cambridge University Press, 2012. doi: 10.5555/2422911.
17. M.D. Zeiler, D. Krishnan, G.W. Taylor, and R. Fergus, “Deconvolutional networks,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535. doi: 10.1109/CVPR.2010.5539957.
18. A. Tytarenko, Rl-research. Available: https://github.com/titardrew/rl-research, 2022.
Received 31.08.2022
INFORMATION ON THE ARTICLE
Andrii M. Tytarenko, ORCID: 0000-0002-8265-642X, Educational and Research Institute for Applied System Analysis of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Ukraine, e-mail: titarenkoan@gmail.com
MULTI-STEP PREDICTION IN LINEARIZED LATENT SPACES FOR REPRESENTATION LEARNING / A.M. Tytarenko
Abstract. A new method is proposed that generalizes LCE approaches such as E2C. The method develops the idea of learning a locally linear state space by considering multi-step prediction, which makes it possible to control the curvature of the sought space more explicitly. It is demonstrated that the method outperforms E2C without substantial changes to the overall model, unlike other works such as PCC and P3C. The relation between E2C and the proposed method, and between their respective update equations, is discussed. Empirical evidence is provided which indicates that ms-E2C learns much better latent state spaces in terms of curvature and predictability of next states. In addition, certain stability problems associated with multi-step predictions, and ways of resolving them, are highlighted.
Keywords: representation learning, learning controllable embedding, reinforcement learning, latent state space.
|
| id | journaliasakpiua-article-269583 |
| institution | System research and information technologies |
| keywords_txt_mv | keywords |
| language | English |
| last_indexed | 2025-07-17T10:28:03Z |
| publishDate | 2022 |
| publisher | The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" |
| record_format | ojs |
| resource_txt_mv | journaliasakpiua/92/b4cabdfa65798843a79f555aecf91292.pdf |
| spelling | journaliasakpiua-article-2695832022-12-21T22:15:21Z Multi-step prediction in linearized latent state spaces for representation learning Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій Tytarenko, Andrii representation learning learning controllable embedding reinforcement learning latent state space навчання репрезентацій навчання керованих просторів навчання з підкріпленням латентний простір станів In this paper, we derive a novel method as a generalization over LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without drastic model changes which come with other works, such as PCC and P3C. We discuss the relation between E2C and the presented method and derive update equations. We provide empirical evidence, which suggests that by considering the multi-step prediction, our method – ms-E2C – allows learning much better latent state spaces in terms of curvature and next state predictability. Finally, we also discuss certain stability challenges we encounter with multi-step predictions and how to mitigate them. Запропоновано новий метод, що узагальнює підходи LCE, такі як E2C. Метод розвиває ідею вивчення локально-лінійного простору станів шляхом розглядання багатокрокового прогнозування, що дає змогу чіткіше контролювати кривизну шуканого простору. Продемонстровано, що метод перевершує E2C без суттєвих змін загальної моделі, на відміну від інших робіт, таких як PCC і P3C. Розглянуто зв’язок між E2C і запропонованим методом та між їх відповідними рівняннями оновлень. Подано емпіричні докази, які свідчать, що ms-E2C дозволяє набагато краще вивчати простори прихованих станів з точки зору кривизни та прогнозованості наступних станів. 
Крім того, висвітлено певні проблеми стабільності, пов’язані з багатокроковими прогнозами, та способи їх вирішення. The National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" 2022-10-30 Article Article application/pdf https://journal.iasa.kpi.ua/article/view/269583 10.20535/SRIT.2308-8893.2022.3.09 System research and information technologies; No. 3 (2022); 139-148 Системные исследования и информационные технологии; № 3 (2022); 139-148 Системні дослідження та інформаційні технології; № 3 (2022); 139-148 2308-8893 1681-6048 en https://journal.iasa.kpi.ua/article/view/269583/265053 |
| spellingShingle | навчання репрезентацій навчання керованих просторів навчання з підкріпленням латентний простір станів Tytarenko, Andrii Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій |
| title | Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій |
| title_alt | Multi-step prediction in linearized latent state spaces for representation learning |
| title_full | Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій |
| title_fullStr | Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій |
| title_full_unstemmed | Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій |
| title_short | Багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій |
| title_sort | багатокрокове прогнозування в лінеаризованих латентних просторах для навчання репрезинтацій |
| topic | навчання репрезентацій навчання керованих просторів навчання з підкріпленням латентний простір станів |
| topic_facet | representation learning learning controllable embedding reinforcement learning latent state space навчання репрезентацій навчання керованих просторів навчання з підкріпленням латентний простір станів |
| url | https://journal.iasa.kpi.ua/article/view/269583 |
| work_keys_str_mv | AT tytarenkoandrii multisteppredictioninlinearizedlatentstatespacesforrepresentationlearning AT tytarenkoandrii bagatokrokoveprognozuvannâvlínearizovanihlatentnihprostorahdlânavčannâreprezintacíj |