Contiguity and Dynamic Programming / E. A. Galperin // Электронное моделирование. — 2007. — Т. 29, № 6. — С. 23-36. — Bibliography: 11 titles. — In English.
Online access: https://nasplib.isofts.kiev.ua/handle/123456789/101822
E. A. Galperin
Department of Mathematics, Universite du Quebec a Montreal
(C.P. 8888, Succ. Centre Ville, Montreal, Quebec H3C 3P8, Canada;
E-mail: galperin.efim@uqam.ca)
Contiguity and Dynamic Programming
(Recommended by Prof. A. Martynyuk)
Some aspects of the principle of optimality [1, p. 83] are considered, and a modification is proposed for the derivation of the main functional equations of dynamic programming to demonstrate that those equations are valid also in the case of non-optimal remaining trajectories under a certain contiguity condition that is defined and analyzed in the paper. Control systems with incomplete information or structural limitations on controls do not, in general, satisfy the contiguity condition. Control problems for such systems may have optimal solutions which, however, cannot be obtained by dynamic programming. This fact is shown on the example of a widely used engineering system whose optimal trajectories have all their remaining parts non-optimal and non-contiguous to the optimal trajectories. The paper presents a theoretical justification of dynamic programming for contiguous systems without the principle of optimality, and a discussion of some problems important for applications.
Keywords: dynamic programming, total optimality, contiguity.
1. Introduction. In the ten-line section from [1, § 3, p. 83] we read: «In each process, the functional equation governing the process was obtained by an application of the following intuitive:
Principle of optimality. An optimal policy has the property that whatever
the initial state and initial decision are, the remaining decisions must constitute
an optimal policy with regard to the state resulting from the first decision.
The mathematical transliteration of this simple principle will yield all the
functional equations we shall encounter throughout the remainder of the book.
A proof by contradiction is immediate».
In the control literature, the mathematical transliteration of the above principle for deterministic processes is related to certain types of functional equations, see, e.g., [1-8]. Now we reproduce some of them in the author's notation from [1, Ch. IX, § 13], where a new formalization of the calculus of variations is proposed for non-autonomous problems:

$$f(a,c) = \max_u J(u) = \max_u \int_a^T F(x,u,t)\,dt, \qquad \frac{dx}{dt} = G(x,u,t), \quad 0 \le a \le t \le T, \quad x(a) = c, \tag{1}$$

parameterized by $a$, $c$ with variable $a \in [0,T)$. Splitting the integral over the two segments $[a, a+S]$ and $[a+S, T]$, we have, according to the principle of optimality,
$$f(a,c) = \max_{u[a,a+S]} \left\{ \int_a^{a+S} F(x,u,t)\,dt + f(a+S,\, c(a+S)) \right\} \tag{2}$$

for all $a$ and all $S$ such that $0 \le a < a+S \le T$. For small $S$ this yields
$$f(a,c) = \max_{u[a,a+S]} [\,F(c,u(a),a)\,S + f(a,c) + Sf_a + Sf_c\,(dc/dS) + o(S)\,]. \tag{3}$$

In (3) the terms $f(a,c)$ cancel out and, dividing by $S$, we get, to the first order as $S \to 0$,
$$0 = \max_{u(a)} [\,F(c,u(a),a) + f_a + f_c\,G(c,u(a),a)\,], \tag{4}$$

$$-f_a = \max_v [\,F(c,v,a) + f_c\,G(c,v,a)\,], \quad v = u(a), \tag{5}$$

and for optimal $v = v^o$ in the interior of the region we get, cf. [1, Ch. IX, § 13, (8)]:

$$-f_a = F(c,v^o,a) + f_c\,G(c,v^o,a), \tag{6}$$

$$0 = F_v(c,v^o,a) + f_c\,G_v(c,v^o,a). \tag{7}$$

Solving this system for $f_a$, $f_c$ and equating $f_{ac} = f_{ca}$ yields a partial differential equation for $v^o(a,c)$, which is the extremal control $u^o(t, x(t))$.
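As a quick illustration of (5)-(7) (a standard stationary example chosen here for brevity, not taken from [1]): let $G = u$, $F = -(x^2 + u^2)$, $T = \infty$, so that maximizing $J$ amounts to minimizing $\int (x^2+u^2)\,dt$ and $f_a \equiv 0$. Then (7) gives $0 = -2v^o + f_c$, i.e., $v^o = f_c/2$; substituting into (6) gives $0 = -c^2 + f_c^2/4$, hence $f_c = -2c$ on the branch with $f \le 0$, so $f = -c^2$ and the extremal control is $u^o(t,x) = -x$.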
Remark 1. If the extremal (supposed to be optimal) $v^o$ of (5) is not in the interior of the admissible region, then PDEs (6), (7) must be replaced by extremality (stationarity) conditions of the Karush-Kuhn-Tucker type. Moreover, in this case the classical theorem on mixed partial derivatives used to find $u^o$, $v^o$ may be inapplicable. In engineering and economics, optimal policies (controls) usually belong to boundaries of admissible regions for controls and states (trajectories), which necessitates the replacement of those PDEs (but not of the Bellman equation (5)). This extension is beyond the scope of the paper.
There is a vast literature on the subject, see, e.g., [1-8] and references therein, mainly concerned with the application of the optimality principle to various kinds of problems, derivation of the corresponding functional equations, their solution by different methods under certain assumptions and conditions, investigation of the existence of solutions and stability properties, numerical studies and computational experiments, etc.
However, the principle of optimality is an important statement in its own right, and it is interesting to study it as such, without emphasis on its numerous applications, particular functional equations, technical details, or connections with other theories.
The statement of the principle of optimality is very general. It does not specify a notion of optimality to which the principle refers. In [1], it is applied to max (min) optimality for certain classes of functionals, and to optimality in games with a saddle point. The statement does not specify a set of possible decisions (admissible controls), nor a class of problems to which it is applicable, nor allowable restrictions, etc., leaving one to believe that it is a universally valid principle, i.e., «a comprehensive and fundamental law, doctrine, or assumption» (Webster's dictionary), under the sole hypothesis that an optimal policy exists. It also does not require convexity, monotonicity, differentiability, etc., though such properties may be imposed on a case-by-case basis in order to obtain or solve functional equations resulting from the principle and to assure that an extremal (stationary) solution is, in fact, optimal.
2. Observations. The formulation of the principle as presented in [1, p. 83], see Section 1 above, contains some explicit and implicit assumptions, such as the existence of an optimal policy which includes optimal remaining decisions as its parts, the existence of «the state resulting from the first decision», and the existence of non-optimal policies and corresponding trajectories emanating from the same initial or intermediate state along an optimal trajectory. Among the most important suppositions, we can list the following properties.
2.1. The congruence property. For simplicity, suppose first that an optimal policy is unique and defined by a sequence of decisions $\{u_1, ..., u_n, ..., u_N\}$ or by a continuous function $u(t)$, $0 \le t \le T$, $T \le \infty$.

Then, «the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision». Denote this subsequent optimal policy by the sequence $\{v_n, ..., v_N\}$, $1 < n < N$, or, respectively, by the function $v(t)$, $0 < t_1 \le t \le T$, with the understanding that $n$ or $t_1$ corresponds to the state resulting from the decisions $\{u_1, ..., u_{n-1}\}$, or $u(t)$, $0 \le t < t_1$. The congruence property is the requirement of the principle of optimality that $v_i = u_i$, $i = n, ..., N$, or, respectively, $v(t) \equiv u(t)$, $t_1 \le t \le T$. Indeed, otherwise $\{u_1, ..., u_N\}$, or $u(t)$, $0 \le t \le T$, would not constitute an optimal policy as specified in the principle, if an optimal policy is unique. This can be taken as an immediate proof by contradiction which, indeed, appeared in the literature, see Section 4.

If an optimal policy is not unique, then $u_i$ ($i = 1, ..., N$) and $u(t)$ above may be set-valued. In this case, the congruence property means that $v_i \in u_i$, $i = n, ..., N$, or $v(t) \in u(t)$, $t_1 \le t < T$. For this and other reasons, see Section 5, we prefer to consider congruence as a separate assumption meaning the inclusion of «remaining decisions» in the remaining part of the original policy, without the requirement of their optimality. There are different terms expressing this notion in the literature, e.g., «joining controls» in [6, p. 83], or «concatenated» pieces of admissible trajectory in [7, p. 87].
2.2. Additivity of costs in the optimality criterion (cost function). This assumption can be seen from the functional equation (2) and other similar equations derived from the principle of optimality. In some sources, e.g., in [6, pp. 84, 85], it is postulated separately in order to prove the principle, see Section 3. In integral criteria, it follows from the additivity of the integral with respect to intervals of integration. This assumption represents a restriction on the class of problems to which the principle of optimality is applied.
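For instance, for the integral criterion in (1) additivity is immediate: $J[t_1,t_3] = \int_{t_1}^{t_3} F\,dt = \int_{t_1}^{t_2} F\,dt + \int_{t_2}^{t_3} F\,dt = J[t_1,t_2] + J[t_2,t_3]$, which is relation (8) below.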
2.3. Feedback structure of optimal policies. This property is implied by the statement that «the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision». It can be seen from various equations in [1-4] that define the optimal policy, from (6), (7) for $v^o(a,c) \equiv u^o(t, x(t))$, and from other similar equations in the literature.
2.4. Independence of preceding data and (or) decisions. The optimality of remaining decisions «with regard to the state resulting from the first decision» must not explicitly depend on the initial state nor on the initial decision. This requirement is manifested by the word «whatever» in the formulation of the principle. In practice, it means that feedback controls obtained from equations implied by the principle explicitly depend on the current state, and not on the initial state, nor on preceding states, nor on preceding decisions.
2.5. Invariance of the Bellman function $f(\cdot)$. This hypothesis is not mentioned in the principle itself as formulated in [1, p. 83], see Section 1, and it does not follow from the principle nor from any of the preceding observations. However, it is used in the derivation of the functional equations and PDEs of dynamic programming, see (2), (3) at left and in the bracket, and it is tacitly assumed in applications and research works that followed in subsequent publications. This hypothesis is supported by a number of solved examples, though without any theoretical justification. The invariance of the function $f(\cdot)$ may be violated with modification of parameters or arguments at later steps (subsequent parts) of the controlled trajectory.
3. Proof by contradiction. Geometrically, the principle of optimality can be illustrated as follows. Consider a period of time $[t_0, T]$, $T \le \infty$, an admissible region $X$ (state space), and denote by $x(t) \in X$ a point on a trajectory which may depend on a policy (control) $u(x,t)$ applied to the process to assure the optimality of some criterion $J(u(\cdot), \cdot)$, say, $\max J$, see [1, 2]. A trajectory $x[t_0, T)$ can be discrete or continuous. Admissible controls are piecewise continuous functions, and the cost function (optimality criterion) $J(u(\cdot), \cdot)$ is assumed to be additive [1-8], that is,

$$J[t_1, t_3] = J[t_1, t_2] + J[t_2, t_3], \quad t_0 \le t_1 < t_2 < t_3 \le T. \tag{8}$$

The policy (control)

$$u^o[t_1, t_3] = u(x,t) = \arg\max J(u, \cdot), \quad t \in [t_1, t_3], \tag{9}$$

and the corresponding piece of trajectory $x[t_1, t_3]$ are called optimal. The statement of the principle is simple: if $u[t_0, T)$ is optimal for the whole trajectory $x[t_0, T)$, then the remaining parts $u[t, T)$, $x[t, T)$ are also optimal for any $t \in (t_0, T)$, provided that $x(t) \in x[t_0, T)$. In short: remaining parts of an optimal policy and trajectory are themselves optimal with respect to the same criterion.

In some sources, see, e.g., [8, footnote to (4.2.16)], this condition is qualified as a sufficient condition. Other sources present a proof of the principle as a necessary condition of optimality. In [6, pp. 86, 87], or [7, p. 87], a simple proof of the principle is given which can be summarized as follows.

Proof. Setting $t_1 = t_0$, $t_3 = T$ in (8) and assuming the existence of optimal policies (trajectories) starting at any point $x(t) \in x[t_0, T)$, one can see that if $J[t_0, T] = J^o$ is optimal, say, $J^o = \max J[t_0, T]$ for some $u^o(t)$, $t \in (t_0, T)$, then it implies

$$\max J[t_0, T] = J[t_0, t_2] + \max J[t_2, T]. \tag{10}$$

Indeed, if $J[t_2, T]$ in (10) is not maximal, then there is a better policy $u[t_2, T]$; thus, $J[t_0, T]$ at the left in (10) is not optimal, contradicting the assumption of its optimality.
4. Total optimality. Theorem 1. If for an optimal trajectory all remaining decisions are optimal, then every part of that trajectory is itself optimal.

Proof. Consider relation (8) again with $t_3 = T$, not fixing $t_1$ nor $t_2$. By assumption, $J[t_1, T]$, $J[t_2, T]$ are optimal, say, maximal, as corresponding to optimal remaining decisions; thus, instead of (8), we can write

$$\max J[t_1, T] = \max J[t_1, t_2] + \max J[t_2, T], \quad t_0 \le t_1 < t_2 \le T. \tag{11}$$

Now, if $J[t_1, t_2]$ in (11) is not maximal, there is a better policy $u[t_1, t_2]$; thus, $J[t_1, T]$ at the left in (11) is not optimal, contradicting the assumption of its optimality.
Corollary. If two remaining parts $x[t_1, T]$, $x[t_2, T]$ of a trajectory (not necessarily optimal) are optimal, then the piece $x[t_1, t_2]$ of that trajectory is also optimal.

Definition. If all parts of a trajectory, including the whole trajectory itself, are optimal, then such a trajectory (process) is called totally optimal.

We see that, according to the proof by contradiction, the principle of optimality under assumptions 2.1-2.5 implies total optimality. It is well known that certain optimal autonomous systems are totally optimal. Certain optimal non-autonomous systems may also be totally optimal. Clearly, every totally optimal system satisfies the principle of optimality.
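To see the contradiction argument of Sections 3 and 4 at work, consider a finite deterministic decision process with additive cost, where the Bellman recursion can be carried out exactly. The following sketch is illustrative only (the random stage costs and all function names are ours, not from the paper): it computes the value function by backward induction, then checks that every tail of the optimal path is optimal (the principle of optimality) and that every piece of it between two of its own points is optimal among paths with the same endpoints (total optimality, Theorem 1 and the Corollary).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
T, M = 6, 5                                   # stages 0..T, states 0..M-1
cost = rng.integers(1, 10, size=(T, M, M)).astype(float)   # c[t, s, s']

# Backward induction (Bellman recursion): V[t, s] = min_{s'} (c[t,s,s'] + V[t+1, s']).
V = np.zeros((T + 1, M))
for t in range(T - 1, -1, -1):
    V[t] = (cost[t] + V[t + 1]).min(axis=1)

def best_path(t0, s0):
    """Roll out the optimal policy from (t0, s0)."""
    path = [s0]
    for t in range(t0, T):
        path.append(int(np.argmin(cost[t, path[-1]] + V[t + 1])))
    return path

def path_cost(t0, path):
    """Additive cost (8) of a path starting at stage t0."""
    return sum(cost[t0 + k, path[k], path[k + 1]] for k in range(len(path) - 1))

p = best_path(0, 0)
# Principle of optimality: every tail of p attains the value function.
for t in range(T):
    assert np.isclose(path_cost(t, p[t:]), V[t, p[t]])

# Total optimality (Theorem 1): every piece p[t1..t2] is optimal among all
# paths with the same endpoints (brute force over intermediate states).
for t1 in range(T):
    for t2 in range(t1 + 1, T + 1):
        best = min(path_cost(t1, (p[t1],) + mid + (p[t2],))
                   for mid in itertools.product(range(M), repeat=t2 - t1 - 1))
        assert np.isclose(path_cost(t1, p[t1:t2 + 1]), best)
print("principle of optimality and total optimality verified")
```

Total optimality holds here because the cost is additive and decisions are unconstrained at every state, in line with Theorem 1.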
5. Contiguity and dynamic programming. If an optimal trajectory (policy) is not totally optimal, then the principle of optimality is not valid for such a trajectory (policy). However, a modification of the derivation of the functional equations for the optimal cost and optimal policy allows one to conclude that the same resulting equations (5)-(7) hold under a certain supplementary condition, irrespective of the principle of optimality.

As in [1], we make a blanket assumption that all functions have continuous partial derivatives up to the order we may need, and that the optimal trajectories we may consider do exist. In the course of the derivation, we shall indicate all additional conditions that are required and must be verified when solving practical problems.
If the principle of optimality does not apply to the problem, then the transition from (1) to (2) is not valid. Moreover, an optimal cost $f(a,c)$ in (1) for the remaining policies on $[a,T)$, $a > 0$, need not be preserved as the same function for all $a$, because the optimal policy $u_t^o(\cdot)$ in the integrand may not be congruent with $u_a^o(\cdot)$, that is, $u_t^o(t) \ne u_a^o(t)$ for $t > a$, where $u_t^o(\cdot)$, $u_a^o(\cdot)$ are curves, not partial derivatives. With this in mind, denote

$$f^a(a,c) = \max_u J(u) = \max_u \int_a^T F(x,u,t)\,dt, \tag{12}$$

where $c = x(a)$ is an intermediate point for $t = a \in [0,T)$. Now, we have
$$f^a(a,c) = \max_{u[a,T)} \left\{ \int_a^{a+S} F(x,u,t)\,dt + \int_{a+S}^T F(x,u,t)\,dt \right\}; \tag{13}$$

$$f^a(a,c) = \max_{u[a,a+S]} \left\{ \int_a^{a+S} F(x,u,t)\,dt + f^{a+S}(a+S,\, x(a+S)) + g^{a+S}(\cdot) \right\}, \tag{14}$$

where $g^{a+S}(\cdot)$ is the difference between the value of the second integral in (13) with the optimal control $u^o[a,T)$, and the optimal value $f^{a+S}(\cdot)$ of the same integral maximized separately by $u^*[a+S,T)$, which may not be the restriction of $u^o[a,T)$ onto $[a+S,T)$, that is, $u^*(t) \ne u^o(t)$ for $t \in [a+S,T)$ (non-congruence), if the trajectory is not totally optimal.
The weak invariance hypothesis: assume that $\lim f^{a+S}(\cdot) = f^a(\cdot)$ as $S \to 0$ for every $a \in [0,T)$, and the same for its partial derivatives of the first order.
Now, expanding the first two terms of (14) into a Taylor series with respect to $S$ and denoting $v = u(a)$, we obtain

$$f^a(a,c) = \max_{u[a,a+S]} [\,F(c,v,a)\,S + f^a(a,c) + Sf_a^a + Sf_c^a\,(dc/dS) + o(S) + g^{a+S}(\cdot)\,]. \tag{15}$$

In (15), we cancel out the term $f^a(a,c)$, then divide the equation by $S > 0$, take $f_a^a(a,c)$ to the left-hand side, and consider $S \to 0$. Suppose that the following assumption holds:

$$\lim_{S \to 0}\, g^{a+S}(\cdot)/S = 0 \quad \text{for all } a \in (0,T), \tag{16}$$

which we call the contiguity condition. Assume also that the Taylor series in (15) is convergent, and that all terms in the bracket in (14), with their respective partial derivatives, are continuous in all arguments. Then, noting that $u(a) = v$, $x(a) = c$, we get the equation
$$-f_a^a(a,c) = \max_v [\,F(c,v,a) + f_c^a(a,c)\,G(c,v,a)\,], \quad a \in [0,T), \tag{17}$$

and the superscript $a$ is redundant because of (16); thus, (17) is identical to (5). Since by notation $a = t$, $c = x(t)$, $v = u(a) = u(t)$, relation (17) can be written in the usual form

$$-f_t(t,x) = \max_{u(t)} [\,F(x,u,t) + f_x(t,x)\,G(x,u,t)\,], \quad t \in [0,T). \tag{18}$$
If the optimal $u^o(t)$ is in the interior of the admissible region, then we have the equations

$$-f_t(t,x) = F(x,u^o,t) + f_x(t,x)\,G(x,u^o,t), \quad t \in [0,T), \tag{19}$$

$$0 = F_u(x,u^o,t) + f_x(t,x)\,G_u(x,u^o,t), \quad t \in [0,T), \tag{20}$$

which are identical to (6), (7). This means that the contiguity condition (16) is sufficient for the derivation of those equations without the optimality of remaining trajectories. Vice versa, it is easy to see that, if the limit in (16) is nonzero or does not exist, then equation (17) does not follow from (15); thus, equations (19), (20) are invalid despite sufficient smoothness and convergence of the Taylor series in (15). Hence, under those regularity conditions, the following statement is proved:
Theorem 2. Under the Weak Invariance Hypothesis and the assumptions cited in observations 2.2, 2.4, the derivation of the functional equations and PDEs of dynamic programming is valid if and only if the contiguity condition (16) is satisfied, irrespective of the principle of optimality.

Remark 2. It is worth noting that for totally optimal systems (e.g., certain autonomous systems), we have $g^{a+S}(\cdot) \equiv 0$ for all $a$, $S$, which means precisely the optimality of all remaining trajectories. Also, the validity of the derivation may not imply the existence of solutions (as indicated in [1] on many occasions).
Contiguity condition (16) can be simplified. To make it independent of the concrete problem above and of the special notations in (12)-(16), we return to the considerations of Sections 3, 4, see (8), (10), (11). In our notations of (11), the limit as $S \to 0$ [1, Ch. IX, § 13] corresponds to the limit as $t_2 \to t_1$, $t_1 \in [t_0, T)$, in (11). Without optimality of remaining decisions, we have to consider two different trajectories and, instead of (11), the following relations:
$$x[t_1,T] = \lim_{t_2 \to t_1} \big( x[t_1,t_2] \cup x^*[t_2,T] \big), \quad x^*(t_2) = x(t_2), \quad t_0 \le t_1 < t_2 \le T, \tag{21}$$

$$J(x[t_1,T]) = \lim_{t_2 \to t_1} \{\, J(x[t_1,t_2]) + \max J(x^*[t_2,T]) \,\}. \tag{22}$$

(The set-wise limit in (21) means simple coincidence at $t_2 = t_1$, $t_1 \in [t_0,T)$, without consideration of various definitions of such limits.)
Here $x[t_1,T]$ designates a remaining (not necessarily optimal on $[t_1,T]$) part of an entire optimal trajectory $x^o[t_0,T]$, on which part the value $J(x[t_1,T])$ is attained under the original optimal control $u^o[t_0,T)$. If we remove the equality signs at left and the limits in (21), (22), then $x[t_1,t_2]$ designates the initial part of the semi-trajectory $x[t_1,T]$ with the value $J(x[t_1,t_2])$ attained upon $x[t_1,t_2]$. The notation $x^*[t_2,T]$ means a certain optimal semi-trajectory starting at $x(t_2)$ with the value $\max J(x^*[t_2,T])$ attained on it. This semi-trajectory $x^*[t_2,T]$ may not coincide with the remaining part $x[t_2,T]$ of the trajectory $x[t_1,T]$.

In cases when the principle of optimality is valid for the system under consideration, we have identically $x^*[t_2,T] \equiv x[t_2,T]$, the equalities in (21), (22) are valid without the limit operation, relation (22) coincides with (11), and relation (21) becomes a trivial tautology omitted in the proof of Theorem 1.

If the principle of optimality is invalid, then $x^*(t) \ne x(t)$, $t \in (t_2,T]$, but relations (21), (22) may still hold in the limit for any $t_1 \in [t_0,T)$. A careful reader can see that it is equalities of the type (21), (22) that are actually necessary and sufficient for the derivation of the functional equations and PDEs of dynamic programming in [1, Ch. IX, § 3, 6, 13, 22], as presented above in (12) to (20).
Denote by $J(x^o[t_2,T])$ the (non-optimal) value of $J(\cdot)$ over the remaining part $x^o[t_2,T]$ of the semi-trajectory $x[t_1,T]$, $t_2 > t_1 = a \ge t_0$, under the original optimal control $u^o(t) \in u^o[t_0,T]$. Let $\operatorname{extr} J = J(x^*[t_2,T])$ be the extremal (max or min) value of $J(\cdot)$ over a different semi-trajectory $x^*[t_2,T]$ starting at the same point $x^*(t_2) = x^o(t_2)$ but optimized separately by some control $u^*(t) \ne u^o(t)$, $t \in [t_2,T)$. Instead of $g^{a+S}(\cdot)$ in (14)-(16), consider the functional

$$C[t_1,t_2] := C(u^o[t_1,T],\, u^*[t_2,T]) = J(t_1, x^o[t_2,T]) - J(x^*[t_2,T]), \quad u^o[t_1,T] \subset u^o[t_0,T], \tag{23}$$

where $t_1$, $t_2$ are variable markers indicating the two starting points of two non-congruent remaining trajectories. The quantity $C[t_1,t_2]$ of (23) is identical to $g^{a+S}(\cdot) > 0$ for the min-problem, and to $g^{a+S}(\cdot) < 0$ for the max-problem, as in (14), (15). The value $J(t_1, x^o[t_2,T])$ is equal to the second integral in (13), and $J(x^*[t_2,T])$ equals the term $f^{a+S}(\cdot)$ in (14).
The functional (23) is a measure of the $J$-proximity of the semi-trajectories $x^o[t_2,T]$, $x^*[t_2,T]$, of which $x^*[t_2,T]$ is optimal with respect to the same criterion $J(\cdot)$ but does not coincide with the (non-optimal) remaining part $x^o[t_2,T]$. For totally optimal systems (where the principle of optimality is valid), we have $C[\cdot] \equiv 0$. Otherwise, $C[\cdot]$ should be of higher order than the time difference $t_2 - t_1 \to 0$ for all $t_1 \in [t_0,T)$, in accordance with (16), to assure the applicability of dynamic programming methods.
Denote $S = t_2 - t_1$ (the same $S$ as in [1, Ch. IX, § 3, 6, 13, 22]); then (23) can be rewritten as follows:

$$C[t_1, t_1+S] := C(u^o[t_1,T],\, u^*[t_1+S,T]) = J(t_1, x^o[t_1+S,T]) - J(x^*[t_1+S,T]), \tag{24}$$

where $t_1 = a$ in (12) to (16), and we get an equivalent representation of (16):

$$\lim_{S \to 0}\, \{\, C[t_1, t_1+S]/S \,\} = 0 \quad \text{for all } t_1 \in [t_0,T). \tag{25}$$
Under the assumptions of continuous partial derivatives adopted in [1], the functional $C[t_1, t_1+S]$ is continuously differentiable with respect to $S$. Consider the limit

$$C(t_1) = \lim_{S \to 0} C[t_1, t_1+S] = \lim_{S \to 0} \{\, J(t_1, x^o[t_1+S,T]) - J(x^*[t_1+S,T]) \,\}. \tag{26}$$

If $C(t_1) \ne 0$, then in (25) we have $\pm\infty$, and non-contiguity follows. If $C(t_1) = 0$, then l'Hôpital's rule applies, and (25) is equivalent to the condition

$$\lim_{S \to 0}\, \{\, dC[t_1, t_1+S]/dS \,\} = 0 \quad \text{for all } t_1 \in [t_0,T), \tag{27}$$

which is the contiguity condition for general systems and functionals, not necessarily of the type considered in [1, 2], as in (12).
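Condition (25) can be probed numerically whenever the two costs entering (24) are computable. A minimal sketch (the function names J_fixed, J_opt and the toy problem are our assumptions, not from the paper): estimate the ratio $C[t_1, t_1+S]/S$ for a decreasing sequence of $S$ and check that it tends to zero.

```python
import numpy as np

# J_fixed(t1, t2): cost J(t1, x^o[t2, T]) of the remaining part under the
#                  original optimal control u^o fixed at t1.
# J_opt(t2):       cost extr J(x*[t2, T]) re-optimized from the state x(t2).

def contiguity_ratios(J_fixed, J_opt, t1, steps=(0.1, 0.05, 0.025, 0.0125)):
    """Estimate C[t1, t1+S]/S of (25) for a decreasing sequence of S."""
    return [(S, (J_fixed(t1, t1 + S) - J_opt(t1 + S)) / S) for S in steps]

# Toy totally optimal case: dx/dt = u, J = int_0^inf (x^2 + u^2) dt,
# optimal u = -x, value f(x) = x^2. Here C == 0, so the ratios vanish.
x0 = 1.0
x_of = lambda t: x0 * np.exp(-t)               # trajectory under u = -x
J_fixed = lambda t1, t2: x_of(t2) ** 2         # remaining cost under u^o
J_opt = lambda t2: x_of(t2) ** 2               # re-optimized cost (same here)
print(contiguity_ratios(J_fixed, J_opt, t1=0.5))   # all ratios ~ 0
```

For the totally optimal scalar problem used here the ratio is identically zero; for a non-contiguous system, such as Example 2 below, the gap $C(t_1)$ of (26) is nonzero and the ratio diverges.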
6. Non-contiguous optimal dynamical systems. It is tempting to think that dynamic programming, applicable even without the principle of optimality, might be universally applicable to almost all systems, which most likely satisfy the Invariance Hypothesis and the contiguity condition, with the exception of, maybe, some pathological examples. Unfortunately, this is not the case.

Non-contiguity (i.e., non-satisfaction of the contiguity condition) in optimal dynamical systems is a rather fine property that invalidates the extension (application) of an optimal control over some part of a trajectory for use over another part of the same trajectory. Strange as it may seem, non-contiguous systems are common in practice. In such cases, dynamic programming is inapplicable, and other methods can be employed, see [5-11].

Systems with incomplete information and/or structural constraints are generally non-contiguous, even if they are autonomous. To illustrate the existence of such systems, we consider an engineering problem, see [10; 11, p. 4-7], of optimal stabilization of a stationary linear oscillator (engines, turbines, vibratory machines, etc.), which we study in two distinct settings to demonstrate the fundamental difference in regard to optimality that may occur with modifications in the structure of a controller.
Example 1 [10]. In all textbooks on undergraduate mathematics, mechanics, or control theory, the following linear oscillator equation is considered:

$$d^2x/dt^2 + x = u, \quad t \ge 0, \quad x(0) = x_0, \quad dx(0)/dt = v_0. \tag{28}$$

Denoting $dx/dt = v$ (velocity), we convert (28) into the normal form:

$$dx/dt = v, \quad dv/dt = -x + u, \quad t \ge 0, \quad x(0) = x_0, \quad v(0) = v_0. \tag{29}$$

The oscillator (28) or (29) should be stabilized to minimize the functional

$$f(x_0, v_0) = \min_u J(u) = \min_u \int_0^\infty (ax^2 + bv^2 + cu^2)\,dt, \quad a, b, c = \text{const} > 0. \tag{30}$$

The system is autonomous, its optimal trajectories are totally optimal, and the principle of optimality applies; thus, denoting $f(x,v) = \min J(u)$ for $t > 0$, the Bellman equation follows with $f_T \equiv 0$ for $T = \infty$ (fixed), cf. [1, Ch. IX, § 3, (12)]:

$$f_T(x,v) = \min_u\, [\,ax^2 + bv^2 + cu^2 + v\,\partial f/\partial x + (-x+u)\,\partial f/\partial v\,] = 0. \tag{31}$$

With the control space open, it implies for the optimal $u^o$

$$2cu^o + \partial f/\partial v = 0. \tag{32}$$
Substituting the optimal control $u^o$ from (32) into (31) (which removes the min operation) and solving the nonlinear PDE, we obtain the minimal cost ($J(u)$ of (30) is strictly convex) and the unique optimal feedback:

$$f(x,v) = (c+a)^{1/2}(b - 2cp)^{1/2}\,x^2 - 2cp\,xv + (bc - 2c^2p)^{1/2}\,v^2, \tag{33}$$

$$u^o = px - (b/c - 2p)^{1/2}\,v, \quad p = 1 - (1 + a/c)^{1/2}. \tag{34}$$

We see that the feedback (34) does not depend on the initial conditions in (29). It means that every remaining part of an optimal trajectory is itself optimal. Also, for fixed $T$, equation (31) does not depend on the boundary conditions $x(t_0)$, $x(T)$, $0 \le t_0 < T < \infty$, which means that every intermediate part $x[t_0, T]$ of the optimal trajectory is itself optimal (total optimality) with the same control (34) (congruence), cf. Theorem 1.
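Formulae (33), (34) are straightforward to check numerically: problem (29), (30) is a standard LQR problem, so the feedback can be recovered from the algebraic Riccati equation and compared with the closed form. A minimal sketch (the weights a, b, c below are arbitrary sample values, and the use of scipy is our choice, not part of the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

a, b, c = 2.0, 3.0, 1.0                      # sample weights in (30), all > 0
A = np.array([[0.0, 1.0], [-1.0, 0.0]])      # oscillator (29): z' = A z + B u
B = np.array([[0.0], [1.0]])
Q = np.diag([a, b])
R = np.array([[c]])

# LQR: f(z) = z^T P z with P from the Riccati equation; u = -K z, K = R^{-1} B^T P.
P = solve_continuous_are(A, B, Q, R)
K = (B.T @ P) / c

# Closed form (34): u^o = p x - sqrt(b/c - 2p) v, i.e. K = [-p, sqrt(b/c - 2p)].
p = 1.0 - np.sqrt(1.0 + a / c)
K_closed = np.array([[-p, np.sqrt(b / c - 2.0 * p)]])
print(np.allclose(K, K_closed))              # True

# Coefficients of the value function (33): f = f_xx x^2 + f_xv x v + f_vv v^2.
f_xx = np.sqrt((c + a) * (b - 2.0 * c * p))
f_xv = -2.0 * c * p
f_vv = np.sqrt(b * c - 2.0 * c**2 * p)
print(np.allclose([P[0, 0], 2.0 * P[0, 1], P[1, 1]], [f_xx, f_xv, f_vv]))  # True
```

With $P$ solving the Riccati equation, the value function is $f(x,v) = z^{\mathsf T} P z$, $z = (x,v)$, whose coefficients reproduce (33).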
Example 2 [10]. Suppose that the space coordinate $x(t)$ cannot be measured, so that only velocity feedback is available (fluid friction, which is common in engineering); thus, the control must be of the form $u = u(v)$. If $a = 0$ in (30), then $p = 0$, and $u^o$ of (34) is feasible. However, in (30) we have $a > 0$; thus, in (34) $p < 0$ and $u^o$ is infeasible. In the class of controls $u(v)$ depending on velocity, equation (31) does not have a solution, though «approximate» solutions with small $|p|$ do exist, all non-optimal since they actually correspond to small $a > 0$ in (30).

In the spirit of dynamic programming, and noting the result (34), let us look for the optimal feedback in the class of controls

$$u = -2qv, \quad q > 0, \tag{35}$$

which represents a nonsingular Lagrange problem of the calculus of variations with the two constraints (29) if we eliminate $u$ from (30) by the substitution $u = x + \dot v$ from (29), and then substitute (35) into (29).

For this class of controls, one can also write the Bellman equation by simply substituting (35) into (31), and then considering $q(\cdot)$ as a control. Denoting $u = -2q(\cdot)v = u^o(\cdot)$, one gets the same equations (31), (32) with the solution (34) — nothing new.
However, the optimal regulator of the form (35) with $q = \text{const}$ (optimal damping) does exist. For the system (29), (30), (35), the characteristic equation is $r^2 + 2qr + 1 = 0$, so that $\operatorname{Re} r_{1,2} < 0$, the integral (30) is convergent and, for $q \ne 1$, it has the value

$$J(x_0, v_0, q) = [\,(a+c)x_0^2 + cv_0^2\,]\,q + \tfrac14 (a+b)(x_0^2 + v_0^2)\,q^{-1} + ax_0v_0. \tag{36}$$

From the equation $\partial J/\partial q = 0$, we find the extremal value

$$q_0^2 = \tfrac14 (a+b)(x_0^2 + v_0^2)\,/\,[\,(a+c)x_0^2 + cv_0^2\,], \quad q_0 > 0, \tag{37}$$

for which $\partial^2 J/\partial q^2 > 0$, yielding the minimum in (36), see [10, formulae (5.7)-(5.12)]. Since $J(\cdot)$ of (36) is continuously differentiable at $q = 1$ for all values of the parameters in (36), formulae (37), (36) are valid also for $q = 1$, covering the whole range of possible fluid friction densities.
Comparing (34) with (35), (37), one can see that:

1) $u^o$ of (34) does not depend on the initial data, meaning that $u^o$ is optimal not only for any remaining part of an optimal trajectory (Bellman's principle of optimality) but also for all trajectories in the feasible space $X$. It means that equations (33), (34) establish a property which may be called field optimality, in the sense that for the system (28)-(30) there exists one single optimal control (34) such that every trajectory starting from any point $x \in X = R^2$ is optimal. Moreover, there is a unique surface (33) which renders the optimal value $f(x)$ of the functional (30) on the trajectory starting from that point $x$.

2) $u = -2q_0v$ of (35), (37) does depend on the initial data $x_0$, $v_0$, meaning that, being optimal for the whole trajectory corresponding to those data, it is not optimal for any remaining part thereof; hence, Bellman's principle of optimality is invalid in the class of controls (35) if $a > 0$ in (30). A numerical illustration is given after this list.

3) If $a = 0$ in (30), we have $u \equiv u^o$, see (34), (35), (37); thus, for this class of functionals in (30), Bellman's principle of optimality is valid in the class of controls (35) for the particular system (29). Moreover, field optimality is preserved.
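Point 2) is easy to see numerically from (36), (37) alone. A minimal sketch (the sample weights and times are ours; scipy is used only for the matrix exponential): fix $q_0$ from the initial state, propagate (29) with $u = -2q_0v$, and compare the remaining cost under the original $q_0$ with the cost re-optimized from the current state. The gap is exactly $C(t_1)$ of (26); since it is nonzero, the system is non-contiguous in the class (35).

```python
import numpy as np
from scipy.linalg import expm

a, b, c = 2.0, 3.0, 1.0                     # sample weights in (30)

def J(x0, v0, q):
    """Cost (36) of the regulator u = -2qv from the state (x0, v0)."""
    return ((a + c) * x0**2 + c * v0**2) * q \
        + 0.25 * (a + b) * (x0**2 + v0**2) / q + a * x0 * v0

def q_opt(x0, v0):
    """Optimal constant damping (37) for the initial state (x0, v0)."""
    return np.sqrt(0.25 * (a + b) * (x0**2 + v0**2)
                   / ((a + c) * x0**2 + c * v0**2))

x0, v0 = 1.0, 0.0
q0 = q_opt(x0, v0)                          # optimal for the whole trajectory
A_cl = np.array([[0.0, 1.0], [-1.0, -2.0 * q0]])    # (29) closed by u = -2*q0*v

for t1 in (0.5, 1.0, 2.0):
    x1, v1 = expm(A_cl * t1) @ np.array([x0, v0])   # state on the trajectory
    gap = J(x1, v1, q0) - J(x1, v1, q_opt(x1, v1))  # C(t1) of (26)
    print(t1, q_opt(x1, v1), gap)           # q_opt drifts, the gap is positive
```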
The nonexistence of solutions of the Bellman equation (31) for the linear system (29), (30), (35), which equation can be formally written as proposed in [1], means that this autonomous system with incomplete information is non-contiguous. Its non-contiguity invalidates the derivation of Bellman's equations presented in [1]. In contrast, this derivation is valid for contiguous systems even without optimality of the remaining semi-trajectories, because the limit operation as $S \to 0$ removes the necessity of exact coincidence (the principle of optimality) of two (maybe different) semi-trajectories starting at the same point $x(t)$, $t > t_0 = 0$. Those two semi-trajectories need not coincide. They must be touching at $x(t+S)$ as $S \to 0$ to the second or higher order, i.e., lie in a weak neighborhood of one another (cf. the curvature in the Weierstrass form of the Euler-Lagrange equations).
7. Concluding remarks and open problems. In this paper, the notions of total optimality and contiguity are defined, and a study of contiguity of dynamical systems is presented in application to dynamic programming. It is demonstrated that, according to the currently available proof [6, 7], the principle of optimality is a partial statement about total optimality. With free variations of controls, optimal autonomous systems are usually totally optimal. There are optimal non-autonomous systems that are also totally optimal. Whether or not every such system can be transformed (by a non-autonomous transformation) into a totally optimal system, or reduced to such a system, is an open question of much practical importance.
The dynamic programming approach does not require optimality of remaining semi-trajectories: a much weaker contiguity condition suffices. Many practical problems with structural constraints, incomplete information, and other specific requirements do not satisfy the contiguity condition, even if such systems are autonomous. In some cases, optimal problems for such systems can be reformulated to allow for a formal setting of the dynamic programming equations. Those equations may have solutions which may not respect the constraints. It is interesting to find out how much the costs on real trajectories (with constraints) would exceed the costs of the formal dynamic programming solution without constraints, or with constraints partially satisfied.
Among other open questions, we can list the following problems:

1. Find an alternative proof of Bellman's principle of optimality, different from that presented in Section 3, which would not imply total optimality, Section 4.

2. Prove or disprove the equivalence of the Weak Invariance Hypothesis (Section 5) and Bellman's Invariance Hypothesis (Section 2, observation 2.5).

3. Find an example of a contiguous but not totally optimal dynamical system, or prove that contiguity implies total optimality.

4. Extend dynamic programming to the case when optimal trajectories are on the boundary, replacing gradient conditions by conditions of the Karush-Kuhn-Tucker type.

5. Example 1 presents an instance where Bellman's equations imply field optimality. Study the conditions for this phenomenon.
1. Bellman R. Dynamic Programming. — Princeton, N.J.: Princeton University Press, 1957.
2. Bellman R. E., Glicksberg I., Gross O. A. Some Aspects of the Mathematical Theory of Control Processes. — Santa Monica, California: RAND Corporation, 1958.
3. Bellman R. Adaptive Control Processes: A Guided Tour. — Princeton, N.J.: Princeton University Press, 1961.
4. Bellman R. E., Dreyfus S. E. Applied Dynamic Programming. — Princeton, N.J.: Princeton University Press, 1962.
5. Lee E., Marcus L. Foundations of Optimal Control Theory. — N.Y.: Wiley, 1967.
6. Leitmann G. The Calculus of Variations and Optimal Control. — N.Y.: Plenum Press, 1981.
7. Knowles G. An Introduction to Applied Optimal Control. — Academic Press, 1981.
8. Bryson A. E., Jr., Ho Yu-Chi. Applied Optimal Control. — Waltham, Massachusetts: Blaisdell Publ. Co., a Division of Ginn & Co., 1969.
9. Galperin E. A. Translational symmetry of certain integrals with application to the calculus of variations and optimal control // Mathematical and Computer Modelling. — 2002. — 36. — P. 717—728.
10. Galperin E. A., Dergacheva E. I. Optimum stabilization of a linear system with incomplete information // Automation and Remote Control. — 1968. — 8. — P. 1193—1202.
11. Galperin E. A., Zheng Q. Global Solutions in Optimal Control and Games. — Montreal: NP Research Publ., 1991.
Received 24.11.06