Contiguity and Dynamic Programming

Citation: Contiguity and Dynamic Programming / E.A. Galperin // Электронное моделирование. — 2007. — Т. 29, № 6. — С. 23–36. — Bibliography: 11 titles. — In English.

E. A. Galperin
Department of Mathematics, Université du Québec à Montréal
C.P. 8888, Succ. Centre-Ville, Montréal, Québec H3C 3P8, Canada
E-mail: galperin.efim@uqam.ca

(Recommended by Prof. A. Martynyuk)

Some aspects of the principle of optimality [1, p. 83] are considered, and a modification is proposed for the derivation of the main functional equations of dynamic programming, to demonstrate that those equations remain valid in the case of non-optimal remaining trajectories under a certain contiguity condition that is defined and analyzed in the paper. Control systems with incomplete information or structural limitations on controls do not, in general, satisfy the contiguity condition. Control problems for such systems may have optimal solutions which, however, cannot be obtained by dynamic programming. This fact is shown on the example of a widely used engineering system whose optimal trajectories have all their remaining parts non-optimal and non-contiguous to the optimal trajectories. The paper presents a theoretical justification of dynamic programming for contiguous systems without the principle of optimality, and a discussion of some problems important for applications.

K e y w o r d s: dynamic programming, total optimality, contiguity.

1. Introduction. In the ten-line section from [1, § 3, p. 83] we read: «In each process, the functional equation governing the process was obtained by an application of the following intuitive: Principle of optimality. An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

The mathematical transliteration of this simple principle will yield all the functional equations we shall encounter throughout the remainder of the book. A proof by contradiction is immediate».

In the control literature, the mathematical transliteration of the above principle for deterministic processes is related to certain types of functional equations, see, e.g., [1—8]. We now reproduce some of them, in the authors' notations from [1, Ch. IX, § 13], where a new formalization of the calculus of variations is proposed for non-autonomous problems:

$$f(a,c) = \max_u J(u) = \max_u \int_a^T F(x,u,t)\,dt, \qquad \frac{dx}{dt} = G(x,u,t), \quad 0 \le a \le t \le T, \quad x(a) = c, \tag{1}$$

parameterized by $a$, $c$ with variable $a \in [0, T)$.
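To fix ideas before the derivation, problem (1) may be sketched in discrete time, where the functional equation (2) below becomes backward induction over a grid. The following is a minimal sketch; the particular $F$, $G$, the grids, and all names are illustrative assumptions, not data from [1] or from this paper.

```python
import numpy as np

# Discrete-time sketch of problem (1): maximize the sum of F(x, u, t) subject
# to x_{n+1} = x_n + dt*G(x_n, u_n, t_n), by backward induction on a state grid.
# F, G and all grids are illustrative assumptions, not taken from the paper.

T, N = 1.0, 50
dt = T / N
xs = np.linspace(-2.0, 2.0, 81)          # state grid (c = x(a))
us = np.linspace(-1.0, 1.0, 21)          # control grid

F = lambda x, u, t: -(x**2 + u**2)       # running reward (max of a negative cost)
G = lambda x, u, t: u                    # simple dynamics dx/dt = u

f = np.zeros((N + 1, xs.size))           # f[n, i] ~ f(a, c) at a = n*dt, c = xs[i]
policy = np.zeros((N, xs.size))
for n in range(N - 1, -1, -1):           # backward in time
    t = n * dt
    for i, x in enumerate(xs):
        xn = x + dt * G(x, us, t)                 # one next state per control
        vals = dt * F(x, us, t) + np.interp(xn, xs, f[n + 1])
        k = int(np.argmax(vals))
        f[n, i], policy[n, i] = vals[k], us[k]

print("f(0, 1.0) ~", np.interp(1.0, xs, f[0]))
print("u(0, 1.0) ~", policy[0, np.abs(xs - 1.0).argmin()])
```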
Splitting the integral over the two segments $[a, a+S]$ and $[a+S, T]$, we have, according to the principle of optimality,

$$f(a,c) = \max_{u[a,a+S]} \left[ \int_a^{a+S} F(x,u,t)\,dt + f(a+S,\, c(a+S)) \right] \tag{2}$$

for all $a$ and all $S$ such that $0 \le a \le a+S \le T$. For small $S$ this yields

$$f(a,c) = \max_{u[a,a+S]} \big[ F(c, u(a), a)\,S + f(a,c) + S f_a + S f_c\,(dc/dS) + o(S) \big]. \tag{3}$$

In (3) the terms $f(a,c)$ cancel out and, dividing by $S$, we get, to the first order as $S \to 0$,

$$0 = \max_{u(a)} \big[ F(c, u(a), a) + f_a + f_c\, G(c, u(a), a) \big], \tag{4}$$

$$-f_a = \max_v \big[ F(c,v,a) + f_c\, G(c,v,a) \big], \qquad v = u(a), \tag{5}$$

and for optimal $v = v^o$ in the interior of the region we get, cf. [1, Ch. IX, § 13, (8)]:

$$-f_a = F(c, v^o, a) + f_c\, G(c, v^o, a), \tag{6}$$

$$0 = F_v(c, v^o, a) + f_c\, G_v(c, v^o, a). \tag{7}$$

Solving this system for $f_a$, $f_c$ and equating $f_{ac} = f_{ca}$ yields a partial differential equation for $v^o(a,c)$, which is the extremal control $u^o(t, x(t))$.

Remark 1. If the extremal (supposed to be optimal) $v^o$ of (5) is not in the interior of the admissible region, then PDEs (6), (7) must be replaced by extremality (stationarity) conditions of the Karush—Kuhn—Tucker type. Moreover, in this case the classical theorem on mixed partial derivatives used to find $u^o$, $v^o$ may be inapplicable. In engineering and economics, optimal policies (controls) usually belong to boundaries of admissible regions for controls and states (trajectories), which necessitates the replacement of those PDEs (not of the Bellman equation (5)). This extension is beyond the scope of the paper.

There is a vast literature on the subject, see, e.g., [1—8] and references therein, mainly concerned with the application of the optimality principle to various kinds of problems, derivation of the corresponding functional equations, their solution by different methods under certain assumptions and conditions, investigation of the existence of solutions and stability properties, numerical studies and computational experiments, etc. However, the principle of optimality is an important statement in its own right, and it is interesting to study it as such, without an accent on its numerous applications, particular functional equations, technical details, or connections with other theories.

The statement of the principle of optimality is very general. It does not specify a notion of optimality to which the principle refers. In [1], it is applied to max (min) optimality for certain classes of functionals, and to optimality in games with a saddle point. The statement does not specify a set of possible decisions (admissible controls), nor a class of problems to which it is applicable, nor allowable restrictions, etc., leaving one to believe that it is a universally valid principle, i.e., «a comprehensive and fundamental law, doctrine, or assumption» (Webster's dictionary), under the sole hypothesis that an optimal policy exists. It also does not require convexity, monotonicity, differentiability, etc., though such properties may be imposed on a case-by-case basis in order to obtain or solve functional equations resulting from the principle and to assure that an extremal (stationary) solution is, in fact, optimal.
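Equations (5)—(7) can be verified directly on a scalar linear-quadratic instance where the Bellman function is available in closed form. A small sketch under assumed data ($F = -(x^2+u^2)$, $G = u$, $T = 1$, an illustrative choice, not an example from the paper), for which $f(a,c) = -\tanh(T-a)\,c^2$ satisfies (5) with $v^o = -\tanh(T-a)\,c$; the residuals of (6), (7) are evaluated by central differences.

```python
import numpy as np

# Closed-form check of (5)-(7) on a scalar LQ instance (illustrative assumption):
# maximize J = int_a^T -(x^2 + u^2) dt with dx/dt = G = u. Then
# f(a, c) = -P(a) c^2 with P(a) = tanh(T - a) solves the Bellman equation (5).

T = 1.0
P  = lambda a: np.tanh(T - a)
f  = lambda a, c: -P(a) * c**2
vo = lambda a, c: -P(a) * c            # extremal control, from (7)

def residuals(a, c, h=1e-6):
    fa = (f(a + h, c) - f(a - h, c)) / (2 * h)     # numerical f_a
    fc = (f(a, c + h) - f(a, c - h)) / (2 * h)     # numerical f_c
    v = vo(a, c)
    r6 = fa + (-(c**2 + v**2) + fc * v)            # (6): -f_a = F + f_c G
    r7 = -2.0 * v + fc                             # (7): 0 = F_v + f_c G_v
    return r6, r7

for a, c in [(0.0, 1.0), (0.3, -0.5), (0.7, 2.0)]:
    print(a, c, residuals(a, c))       # residuals vanish up to rounding error
```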
2. Observations. The formulation of the principle as presented in [1, p. 83], see Section 1 above, contains some explicit and implicit assumptions, such as the existence of an optimal policy which includes optimal remaining decisions as its parts, the existence of «the state resulting from the first decision», and the existence of non-optimal policies and corresponding trajectories emanating from the same initial or intermediate state along an optimal trajectory. Among the most important suppositions, we can list the following properties.

2.1. The congruence property. For simplicity, suppose first that an optimal policy is unique and defined by a sequence of decisions $\{u_1, \dots, u_n, \dots, u_N\}$ or by a continuous function $u(t)$, $0 \le t \le T$, $T \le \infty$.

Then «the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision». Denote this subsequent optimal policy by the sequence $\{v_n, \dots, v_N\}$, $1 < n < N$, or, respectively, by the function $v(t)$, $0 \le t_1 \le t \le T$, with the understanding that $n$ or $t_1$ corresponds to the state resulting from the decisions $\{u_1, \dots, u_{n-1}\}$, or $u(t)$, $0 \le t \le t_1$. The congruence property is the requirement of the principle of optimality that $v_i = u_i$, $i = n, \dots, N$, or, respectively, $v(t) \equiv u(t)$, $t_1 \le t \le T$. Indeed, otherwise $\{u_1, \dots, u_N\}$, or $u(t)$, $0 \le t \le T$, would not constitute an optimal policy as specified in the principle, if an optimal policy is unique. This can be taken as an immediate proof by contradiction which, indeed, appeared in the literature, see Section 4.

If an optimal policy is not unique, then $u_i$ ($i = 1, \dots, N$) and $u(t)$ above may be set-valued. In this case, the congruence property means that $v_i \in u_i$, $i = n, \dots, N$, or $v(t) \in u(t)$, $t_1 \le t < T$. For this and other reasons, see Section 5, we prefer to consider congruence as a separate assumption, meaning the inclusion of «remaining decisions» in the remaining part of the original policy, without the requirement of their optimality. There are different terms expressing this notion in the literature, e.g., «joining controls» in [6, p. 83], or «concatenated» pieces of admissible trajectory in [7, p. 87].

2.2. Additivity of costs in the optimality criterion (cost function). This assumption can be seen from the functional equation (2) and other similar equations derived from the principle of optimality. In some sources, e.g., in [6, pp. 84, 85], it is postulated separately in order to prove the principle, see Section 3. In integral criteria, it follows from the additivity of the integral with respect to intervals of integration. This assumption represents a restriction on the class of problems to which the principle of optimality is applied.

2.3. Feedback structure of optimal policies. This property is implied by the statement that «the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision». It can be seen from various equations in [1—4] that define the optimal policy, from (6), (7) for $v^o(a,c) \equiv u^o(t, x(t))$, and from other similar equations in the literature.

2.4. Independence of the preceding data and (or) decisions. The optimality of the remaining decisions «with regard to the state resulting from the first decision» must not explicitly depend on the initial state nor on the initial decision. This requirement is manifested by the word «whatever» in the formulation of the principle. In practice, it means that feedback controls obtained from equations implied by the principle explicitly depend on the current state, and not on the initial state, nor preceding states, nor preceding decisions.
2.5. Invariance of the Bellman function $f(\cdot)$. This hypothesis is not mentioned in the principle itself as formulated in [1, p. 83], see Section 1, and it does not follow from the principle nor from any of the preceding observations. However, it is used in the derivation of the functional equations and PDEs of dynamic programming, see (2), (3) at left and in the bracket, and it is tacitly assumed in the applications and research works that followed in subsequent publications. This hypothesis is supported by a number of solved examples, though without any theoretical justification. The invariance of the function $f(\cdot)$ may be violated by modification of parameters or arguments at later steps (subsequent parts) of the controlled trajectory.

3. Proof by contradiction. Geometrically, the principle of optimality can be illustrated as follows. Consider a period of time $[t_0, T]$, $T \le \infty$, an admissible region $X$ (state space), and denote by $x(t) \in X$ a point on a trajectory which may depend on a policy (control) $u(x,t)$ applied to the process to assure the optimality of some criterion $J(u(\cdot), \cdot)$, say, $\max J$, see [1, 2]. A trajectory $x[t_0, T)$ can be discrete or continuous. Admissible controls are piecewise continuous functions, and the cost function (optimality criterion) $J(u(\cdot), \cdot)$ is assumed to be additive [1—8], that is,

$$J[t_1, t_3] = J[t_1, t_2] + J[t_2, t_3], \qquad t_0 \le t_1 \le t_2 \le t_3 \le T. \tag{8}$$

The policy (control)

$$u^o[t_1, t_3] = u(x,t) = \arg\max J(u, \cdot), \qquad t \in [t_1, t_3], \tag{9}$$

and the corresponding piece of trajectory $x[t_1, t_3]$ are called optimal. The statement of the principle is simple: if $u[t_0, T)$ is optimal for the whole trajectory $x[t_0, T)$, then the remaining parts $u[t, T)$, $x[t, T)$ are also optimal for any $t \in (t_0, T)$, provided that $x(t) \in x[t_0, T)$. Shortly: remaining parts of an optimal policy and trajectory are themselves optimal with respect to the same criterion.

In some sources, see, e.g., [8, footnote to (4.2.16)], this condition is qualified as a sufficient condition. Other sources present a proof of the principle as a necessary condition of optimality. In [6, pp. 86, 87], or [7, p. 87], a simple proof of the principle is given, which can be summarized as follows.

P r o o f. Setting $t_1 = t_0$, $t_3 = T$ in (8) and assuming the existence of optimal policies (trajectories) starting at any point $x(t) \in x[t_0, T)$, one can see that if $J[t_0, T] = J^o$ is optimal, say, $J^o = \max J[t_0, T]$ for some $u^o(t)$, $t \in (t_0, T)$, then it implies

$$\max J[t_0, T] = J[t_0, t_2] + \max J[t_2, T]. \tag{10}$$

Indeed, if $J[t_2, T]$ in (10) is not maximal, then there is a better policy $u[t_2, T]$; thus, $J[t_0, T]$ at the left in (10) is not optimal, contradicting the assumption of its optimality.
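Relation (10) can be confirmed by brute force on a toy finite-horizon problem, enumerating every control sequence. The instance below (a five-state line with random additive rewards) is an illustrative assumption, chosen only to make exhaustive enumeration cheap.

```python
import itertools
import numpy as np

# Brute-force check of (10): max J[t0,T] = J[t0,t2] + max J[t2,T] along the
# optimizer. Toy instance: states {0..4}, controls {-1,0,+1}, additive random
# rewards as in (8); all names and sizes are illustrative assumptions.

rng = np.random.default_rng(0)
N, S, U = 4, 5, (-1, 0, 1)
R = rng.standard_normal((N, S, len(U)))          # reward R[t, x, control index]
step = lambda x, u: min(max(x + u, 0), S - 1)

def total(x, t0, seq):
    """Additive reward of a control-index sequence started at x at time t0."""
    J = 0.0
    for t, k in enumerate(seq, start=t0):
        J += R[t, x, k]
        x = step(x, U[k])
    return J

def best(x, t0, t1):
    """max J[t0,t1] from x, by enumerating all sequences; returns (value, seq)."""
    return max((total(x, t0, seq), seq)
               for seq in itertools.product(range(len(U)), repeat=t1 - t0))

Jo, seq = best(2, 0, N)                          # optimal value from x=2, t=0
t2, x = 2, 2
for k in seq[:t2]:
    x = step(x, U[k])                            # state reached at t2
J_tail, _ = best(x, t2, N)
print(np.isclose(Jo, total(2, 0, seq[:t2]) + J_tail))   # True, as in (10)
```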
4. Total optimality. Theorem 1. If, for an optimal trajectory, all remaining decisions are optimal, then every part of that trajectory is itself optimal.

P r o o f. Consider relation (8) again with $t_3 = T$, not fixing $t_1$ nor $t_2$. By assumption, $J[t_1, T]$, $J[t_2, T]$ are optimal, say, maximal, as corresponding to optimal remaining decisions; thus, instead of (8), we can write

$$\max J[t_1, T] = \max J[t_1, t_2] + \max J[t_2, T], \qquad t_0 \le t_1 \le t_2 \le T. \tag{11}$$

Now, if $J[t_1, t_2]$ in (11) is not maximal, there is a better policy $u[t_1, t_2]$; thus, $J[t_1, T]$ at the left in (11) is not optimal, contradicting the assumption of its optimality.

C o r o l l a r y. If two remaining parts $x[t_1, T]$, $x[t_2, T]$ of a trajectory (not necessarily optimal) are optimal, then the piece $x[t_1, t_2]$ of that trajectory is also optimal.

D e f i n i t i o n. If all parts of a trajectory, including the whole trajectory itself, are optimal, then such a trajectory (process) is called totally optimal.

We see that, according to the proof by contradiction, the principle of optimality under assumptions 2.1—2.5 implies total optimality. It is well known that certain optimal autonomous systems are totally optimal. Certain optimal non-autonomous systems may also be totally optimal. Clearly, every totally optimal system satisfies the principle of optimality.
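For finite additive problems solved by backward induction, the hypothesis of Theorem 1 (all remaining decisions optimal) holds by construction and can be checked directly: every tail of the computed optimal sequence attains the dynamic-programming value. A self-contained sketch with assumed toy data:

```python
import numpy as np

# Tails of an optimal policy are optimal for a finite additive problem solved
# by backward induction; the random-reward toy data is an illustrative assumption.

rng = np.random.default_rng(1)
N, S, U = 6, 5, (-1, 0, 1)
R = rng.standard_normal((N, S, len(U)))
step = lambda x, k: min(max(x + U[k], 0), S - 1)

f = np.zeros((N + 1, S))                     # DP value, terminal value f[N] = 0
pol = np.zeros((N, S), dtype=int)
for t in range(N - 1, -1, -1):
    for x in range(S):
        vals = [R[t, x, k] + f[t + 1, step(x, k)] for k in range(len(U))]
        pol[t, x], f[t, x] = int(np.argmax(vals)), max(vals)

x, traj, ks = 2, [2], []                     # roll out the optimal sequence
for t in range(N):
    ks.append(int(pol[t, x]))
    x = step(x, ks[t])
    traj.append(x)

for t1 in range(N):                          # every remaining part is optimal:
    x, Jt = traj[t1], 0.0
    for t in range(t1, N):
        Jt += R[t, x, ks[t]]
        x = step(x, ks[t])
    assert np.isclose(Jt, f[t1, traj[t1]])   # tail reward equals the DP value
print("all remaining parts of the optimal policy are optimal")
```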
5. Contiguity and dynamic programming. If an optimal trajectory (policy) is not totally optimal, then the principle of optimality is not valid for such a trajectory (policy). However, a modification of the derivation of the functional equations for the optimal cost and optimal policy allows one to conclude that the same resulting equations (5)—(7) hold under a certain supplementary condition, irrespective of the principle of optimality.

As in [1], we make a blanket assumption that all functions have continuous partial derivatives up to the order we may need, and that the optimal trajectories we may consider do exist. In the course of the derivation, we shall indicate all additional conditions that are required and must be verified when solving practical problems.

If the principle of optimality does not apply to the problem, then the transition from (1) to (2) is not valid. Moreover, an optimal cost $f(a,c)$ in (1) for the remaining policies on $[a, T)$, $a > 0$, need not be preserved as the same function for all $a$, because the optimal policy $u_t^o(\cdot)$ in the integrand may not be congruent with $u_a^o(\cdot)$, that is, $u_t^o(t) \ne u_a^o(t)$ for $t > a$, where $u_t^o(\cdot)$, $u_a^o(\cdot)$ are curves, not partial derivatives. With this in mind, denote

$$f^a(a,c) = \max_u J(u) = \max_u \int_a^T F(x,u,t)\,dt, \tag{12}$$

where $c = x(a)$ is an intermediate point for $t = a \in [0, T)$. Now we have

$$f^a(a,c) = \max_{u[a,T]} \left[ \int_a^{a+S} F(x,u,t)\,dt + \int_{a+S}^{T} F(x,u,t)\,dt \right]; \tag{13}$$

$$f^a(a,c) = \max_{u[a,a+S]} \left[ \int_a^{a+S} F(x,u,t)\,dt + f^{a+S}(a+S,\, x(a+S)) + g^{a+S}(\cdot) \right], \tag{14}$$

where $g^{a+S}(\cdot)$ is the difference between the value of the second integral in (13) with the optimal control $u^o[a, T)$, and the optimal value $f^{a+S}(\cdot)$ of the same integral maximized separately by $u^*[a+S, T)$, which may not be the restriction of $u^o[a, T)$ onto $[a+S, T)$, that is, $u^*(t) \ne u^o(t)$ for $t \in [a+S, T)$ (non-congruence), if the trajectory is not totally optimal.

The weak invariance hypothesis: assume that $\lim f^{a+S}(\cdot) = f^a(\cdot)$ as $S \to 0$ for every $a \in [0, T)$, and the same for its partial derivatives of the first order. Now, expanding the first two terms of (14) into a Taylor series with respect to $S$ and denoting $v = u(a)$, we obtain

$$f^a(a,c) = \max_{u[a,a+S]} \big[ F(c,v,a)\,S + f^a(a,c) + S f_a^{\,a}(\cdot) + S f_c^{\,a}(\cdot)\,(dc/dS) + o(S) + g^{a+S}(\cdot) \big]. \tag{15}$$

In (15), we cancel out the term $f^a(a,c)$, then divide the equation by $S > 0$, take $f_a^{\,a}(a,c)$ to the left-hand side, and consider $S \to 0$. Suppose that the following assumption holds:

$$\lim_{S \to 0}\, g^{a+S}/S = 0 \quad \text{for all } a \in (0, T), \tag{16}$$

which we call the contiguity condition. Assume also that the Taylor series in (15) is convergent, and all terms in the bracket of (14) with their respective partial derivatives are continuous in all arguments. Then, noting that $u(a) = v$, $x(a) = c$, we get the equation

$$-f_a^{\,a}(a,c) = \max_v \big[ F(c,v,a) + f_c^{\,a}(a,c)\, G(c,v,a) \big], \qquad a \in [0, T), \tag{17}$$

and the superscript $a$ is redundant because of (16); thus, (17) is identical to (5). Since by notation $a = t$, $c = x(t)$, $v = u(a) = u(t)$, relation (17) can be written in the usual form

$$-f_t(t,x) = \max_{u(t)} \big[ F(x,u,t) + f_x(t,x)\, G(x,u,t) \big], \qquad t \in [0, T). \tag{18}$$

If the optimal $u^o(t)$ is in the interior of the admissible region, then we have the equations

$$-f_t(t,x) = F(x,u^o,t) + f_x(t,x)\, G(x,u^o,t), \qquad t \in [0, T), \tag{19}$$

$$0 = F_u(x,u^o,t) + f_x(t,x)\, G_u(x,u^o,t), \qquad t \in [0, T), \tag{20}$$

which are identical to (6), (7). This means that the contiguity condition (16) is sufficient for the derivation of those equations without the optimality of remaining trajectories. Vice versa, it is easy to see that, if the limit in (16) is nonzero or does not exist, equation (17) does not follow from (15); thus, equations (19), (20) are invalid despite sufficient smoothness and convergence of the Taylor series in (15). Hence, under those regularity conditions, the following statement is proved.

Theorem 2. Under the weak invariance hypothesis and the assumptions cited in observations 2.2, 2.4, the derivation of the functional equations and PDEs of dynamic programming is valid if and only if the contiguity condition (16) is satisfied, irrespective of the principle of optimality.

Remark 2. It is worth noting that for totally optimal systems (e.g., certain autonomous systems), we have $g^{a+S}(\cdot) \equiv 0$ for all $a$, $S$, which means precisely the optimality of all remaining trajectories. Also, the validity of the derivation may not imply the existence of solutions (as indicated in [1] on many occasions).
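Condition (16) can be probed numerically. A sketch on the scalar LQ toy used earlier (an illustrative assumption): that system is totally optimal, so $g^{a+S} \equiv 0$ by Remark 2, and the computed ratio $g^{a+S}/S$ vanishes up to solver tolerance.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Numerical probe of the contiguity condition (16) on the scalar LQ toy
# (dx/dt = u, reward -(x^2 + u^2), T = 1, optimal feedback u = -tanh(T-t) x,
# optimal tail value f(t, x) = -tanh(T-t) x^2). The system is totally optimal,
# so g/S should vanish; instance and tolerances are illustrative assumptions.

T = 1.0
P = lambda t: np.tanh(T - t)
f = lambda t, x: -P(t) * x**2                 # separately optimized tail value

def rhs(t, y):
    x, _ = y
    u = -P(t) * x                             # the ORIGINAL optimal feedback
    return [u, -(x**2 + u**2)]                # dynamics plus running reward

def state_and_tail(a, c, S):
    mid = solve_ivp(rhs, (a, a + S), [c, 0.0], rtol=1e-11, atol=1e-13)
    xS = mid.y[0, -1]                         # x(a+S) under the original control
    rest = solve_ivp(rhs, (a + S, T), [xS, 0.0], rtol=1e-11, atol=1e-13)
    return xS, rest.y[1, -1]                  # tail reward under original control

a, c = 0.2, 1.0
for S in (0.1, 0.01, 0.001):
    xS, J_tail = state_and_tail(a, c, S)
    g = f(a + S, xS) - J_tail                 # g^{a+S} up to sign, cf. (14)
    print(f"S={S:g}   g/S = {g/S:.2e}")       # -> 0: condition (16) holds here
```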
The contiguity condition (16) can be simplified. To make it independent of the concrete problem above and of the special notations in (12)—(16), we return to the considerations of Sections 3, 4, see (8), (10), (11). In our notations of (11), the limit as $S \to 0$ [1, Ch. IX, § 13] corresponds to the limit as $t_2 \to t_1$, $t_1 \in [t_0, T)$ in (11). Without optimality of remaining decisions, we have to consider two different trajectories and, instead of (11), the following relations:

$$x[t_1, T] = \lim_{t_2 \to t_1} \big( x[t_1, t_2] \cup x^*[t_2, T] \big), \qquad x^*(t_2) = x(t_2), \quad t_0 \le t_1 \le t_2 \le T, \tag{21}$$

$$J(x[t_1, T]) = \lim_{t_2 \to t_1} \big\{ J(x[t_1, t_2]) + \max J(x^*[t_2, T]) \big\}. \tag{22}$$

(The set-wise limit in (21) means simple coincidence as $t_2 = t_1$, $t_1 \in [t_0, T)$, without consideration of various definitions of such limits.)

Here $x[t_1, T]$ designates a remaining (not necessarily optimal on $[t_1, T]$) part of an entire optimal trajectory $x^o[t_0, T]$, on which part the value $J(x[t_1, T])$ is attained under the original optimal control $u^o[t_0, T)$. If we remove the equality signs at left and the limits in (21), (22), then $x[t_1, t_2]$ designates the initial part of the semi-trajectory $x[t_1, T]$ with the value $J(x[t_1, t_2])$ upon $x[t_1, t_2]$. The notation $x^*[t_2, T]$ means a certain optimal semi-trajectory starting at $x(t_2)$ with the value $\max J(x^*[t_2, T])$ attained on it. This semi-trajectory $x^*[t_2, T]$ may not coincide with the remaining part $x[t_2, T]$ of the trajectory $x[t_1, T]$.

In cases when the principle of optimality is valid for the system under consideration, we have identically $x^*[t_2, T] \equiv x[t_2, T]$, the equalities in (21), (22) are valid without the limit operation, relation (22) coincides with (11), and relation (21) becomes a trivial tautology, omitted in the proof of Theorem 1. If the principle of optimality is invalid, then $x^*(t) \ne x(t)$, $t \in (t_2, T]$, but relations (21), (22) may still hold in the limit for any $t_1 \in [t_0, T)$. A careful reader can see that it is the equalities of the type (21), (22) that are actually necessary and sufficient for the derivation of the functional equations and PDEs of dynamic programming in [1, Ch. IX, § 3, 6, 13, 22], as presented above in (12) to (20).

Denote by $J(x^o[t_2, T])$ the (non-optimal) value of $J(\cdot)$ over the remaining part $x^o[t_2, T]$ of the semi-trajectory $x[t_1, T]$, $t_2 > t_1 = a \ge t_0$, under the original optimal control $u^o(t) \in u^o[t_0, T]$. Let $\operatorname{extr} J = J(x^*[t_2, T])$ be the extremal (max or min) value of $J(\cdot)$ over a different semi-trajectory $x^*[t_2, T]$ starting at the same point $x^*(t_2) = x^o(t_2)$ but optimized separately by some control $u^*(t) \ne u^o(t)$, $t \in [t_2, T)$. Instead of $g^{a+S}(\cdot)$ in (14)—(16), consider the functional

$$C[t_1, t_2] := C(u^o[t_1, T],\, u^*[t_2, T]) = J(t_1, x^o[t_2, T]) - J(x^*[t_2, T]), \qquad u^o[t_1, T] \subset u^o[t_0, T], \tag{23}$$

where $t_1$, $t_2$ are variable markers indicating the two starting points of two non-congruent remaining trajectories. The quantity $C[t_1, t_2]$ of (23) is identical to $g^{a+S}(\cdot) > 0$ for a min-problem, and to $g^{a+S}(\cdot) < 0$ for a max-problem as in (14), (15). The value $J(t_1, x^o[t_2, T])$ is equal to the second integral in (13), and $J(x^*[t_2, T])$ equals the term $f^{a+S}(\cdot)$ in (14).

The functional (23) is a measure of $J$-proximity of the semi-trajectories $x^o[t_2, T]$, $x^*[t_2, T]$, of which $x^*[t_2, T]$ is optimal with respect to the same criterion $J(\cdot)$ but does not coincide with the (non-optimal) remaining part $x^o[t_2, T]$. For totally optimal systems (where the principle of optimality is valid), we have $C[\cdot] \equiv 0$. Otherwise, $C[\cdot]$ should be of higher order than the time difference $t_2 - t_1 > 0$ for all $t_1 \in [t_0, T)$, in accordance with (16), to assure the applicability of dynamic programming methods.

Denote $S = t_2 - t_1$ (the same $S$ as in [1, Ch. IX, § 3, 6, 13, 22]); then (23) can be rewritten as follows:

$$C[t_1, t_1+S] := C(u^o[t_1, T],\, u^*[t_1+S, T]) = J(t_1, x^o[t_1+S, T]) - J(x^*[t_1+S, T]), \tag{24}$$

where $t_1 = a$ in (12) to (16), and we get an equivalent representation for (16):

$$\lim_{S \to 0} \big\{ C[t_1, t_1+S]/S \big\} = 0 \quad \text{for all } t_1 \in [t_0, T). \tag{25}$$

Under the assumptions of continuous partial derivatives adopted in [1], the functional $C[t_1, t_1+S]$ is continuously differentiable with respect to $S$. Consider the limit

$$C(t_1) = \lim_{S \to 0} C[t_1, t_1+S] = \lim_{S \to 0} \big\{ J(t_1, x^o[t_1+S, T]) - J(x^*[t_1+S, T]) \big\}. \tag{26}$$

If $C(t_1) \ne 0$, then in (25) we have $\pm\infty$, and non-contiguity follows.
If $C(t_1) = 0$, the l'Hôpital rule applies, and (25) is equivalent to the condition

$$\lim_{S \to 0} \big\{ dC[t_1, t_1+S]/dS \big\} = 0 \quad \text{for all } t_1 \in [t_0, T), \tag{27}$$

which is the contiguity condition for general systems and functionals, not necessarily of the type considered in [1, 2], as in (12).

6. Non-contiguous optimal dynamical systems. It is tempting to think that dynamic programming, applicable even without the principle of optimality, might be universally applicable to almost all systems, which most likely satisfy the invariance hypothesis and the contiguity condition, with the exception of, maybe, some pathological examples. Unfortunately, this is not the case.

Non-contiguity (i.e., non-satisfaction of the contiguity condition) in optimal dynamical systems is a rather fine property that invalidates the extension (application) of an optimal control over some part of a trajectory for use over another part of the same trajectory. Strange as it may seem, non-contiguous systems are common in practice. In such cases, dynamic programming is inapplicable, and other methods can be employed, see [5—11].

Systems with incomplete information and/or structural constraints are generally non-contiguous, even if they are autonomous. To illustrate the existence of such systems, we consider an engineering problem, see [10; 11, p. 4—7], of optimal stabilization of a stationary linear oscillator (engines, turbines, vibratory machines, etc.), which we study in two distinct settings to demonstrate the fundamental difference in regard to optimality that may occur with modifications in the structure of a controller.

Example 1 [10]. In all textbooks on undergraduate mathematics, mechanics, or control theory, the following linear oscillator equation is considered:

$$d^2x/dt^2 + x = u, \qquad t \ge 0, \quad x(0) = x_0, \quad dx(0)/dt = v_0. \tag{28}$$

Denoting $dx/dt = v$ (velocity), we convert (28) into the normal form:

$$dx/dt = v, \quad dv/dt = -x + u, \qquad t \ge 0, \quad x(0) = x_0, \quad v(0) = v_0. \tag{29}$$

The oscillator (28) or (29) should be stabilized so as to minimize the functional

$$f(x_0, v_0) = \min J(u) = \int_0^{\infty} (a x^2 + b v^2 + c u^2)\,dt, \qquad a, b, c = \text{const} > 0. \tag{30}$$

The system is autonomous, its optimal trajectories are totally optimal, and the principle of optimality applies; thus, denoting $f(x,v) = \min J(u)$ for $t > 0$, the Bellman equation follows, with $f_T \equiv 0$ for $T = \infty$ (fixed), cf. [1, Ch. IX, § 3, (12)]:

$$f_T(x,v) = \min_u \big[ a x^2 + b v^2 + c u^2 + v\,\partial f/\partial x + (-x+u)\,\partial f/\partial v \big] = 0. \tag{31}$$

With the control space open, it implies, for optimal $u^o$,

$$2 c u^o + \partial f/\partial v = 0. \tag{32}$$

Substituting the optimal control $u^o$ from (32) into (31) (which removes the min operation) and solving the nonlinear PDE, we obtain the minimal cost ($J(u)$ of (30) is strictly convex) and the unique optimal feedback:

$$f(x,v) = (c+a)^{1/2}(b - 2cp)^{1/2}\,x^2 - 2cp\,xv + (bc - 2c^2 p)^{1/2}\,v^2, \tag{33}$$

$$u^o = px - (b/c - 2p)^{1/2}\,v, \qquad p = 1 - (1 + a/c)^{1/2}. \tag{34}$$

We see that the feedback (34) does not depend on the initial conditions in (29). It means that every remaining part of an optimal trajectory is itself optimal. Also, for fixed $T$, equation (31) does not depend on the boundary conditions $x(t_0)$, $x(T)$, $0 \le t_0 < T < \infty$, which means that every intermediate part $x[t_0, T]$ of the optimal trajectory is itself optimal (total optimality) with the same control (34) (congruence), cf. Theorem 1.
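Example 1 in numbers: the sketch below simulates (29) under the feedback (34) and compares the accumulated cost (30), truncated at a long horizon, with the closed-form value (33). The parameter values and the initial state are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Example 1: simulate the oscillator (29) under the optimal feedback (34) and
# compare the cost (30) with the value function (33). Parameters a, b, c and
# (x0, v0) are illustrative choices; the horizon 60 approximates T = infinity.

a, b, c = 1.0, 2.0, 0.5
p = 1.0 - np.sqrt(1.0 + a / c)                       # from (34); p < 0 for a > 0
kv = np.sqrt(b / c - 2.0 * p)                        # velocity gain in (34)

def rhs(t, y):
    x, v, _ = y
    u = p * x - kv * v                               # feedback (34)
    return [v, -x + u, a * x**2 + b * v**2 + c * u**2]   # (29) plus running cost

x0, v0 = 1.0, -0.5
sol = solve_ivp(rhs, (0.0, 60.0), [x0, v0, 0.0], rtol=1e-10, atol=1e-12)
J_sim = sol.y[2, -1]                                 # accumulated cost (30)

f_closed = (np.sqrt(c + a) * np.sqrt(b - 2*c*p) * x0**2
            - 2*c*p * x0 * v0
            + np.sqrt(b*c - 2*c**2 * p) * v0**2)     # value function (33)
print(J_sim, f_closed)                               # agree to solver tolerance
```

Restarting the simulation from any intermediate state of the optimal trajectory reproduces the same agreement, which is the field optimality discussed in item 1) below.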
Example 2 [10]. Suppose that the space coordinate $x(t)$ cannot be measured (fluid friction, which is common in engineering); thus, the control must be of the form $u = u(v)$. If $a = 0$ in (30), then $p = 0$, and $u^o$ of (34) is feasible. However, in (30) we have $a > 0$; thus, in (34) $p < 0$ and $u^o$ is infeasible. In the class of controls $u(v)$ depending on velocity, equation (31) does not have a solution, though «approximate» solutions with small $|p|$ do exist, all non-optimal, since they actually correspond to small $a > 0$ in (30).

In the spirit of dynamic programming, and noting the result (34), let us look for the optimal feedback in the class of controls

$$u = -2qv, \qquad q > 0, \tag{35}$$

which represents a nonsingular Lagrange problem of the calculus of variations with two constraints (29), if we eliminate $u$ from (30) by the substitution $u = x + \dot v$ from (29), and then substitute (35) into (29). For this class of controls, one can also write the Bellman equation by simply substituting (35) into (31) and then considering $q(\cdot)$ as a control. Denoting $u = -2q(\cdot)v = u^o(\cdot)$, one gets the same equations (31), (32) with the solution (34) — nothing new. However, the optimal regulator of the form (35) with $q = \text{const}$ (optimal dampening) does exist.

For the system (29), (30), (35), the characteristic equation is $r^2 + 2qr + 1 = 0$, so that $\operatorname{Re} r_{1,2} < 0$, the integral (30) is convergent and, for $q \ne 1$, it has the value

$$J(x_0, v_0, q) = \big[(a+c)x_0^2 + c v_0^2\big]\,q + \tfrac{1}{4}(a+b)\big(x_0^2 + v_0^2\big)\,q^{-1} + a x_0 v_0. \tag{36}$$

From the equation $\partial J/\partial q = 0$, we find the extremal value

$$q_0^2 = \tfrac{1}{4}(a+b)\big(x_0^2 + v_0^2\big)\,\big/\,\big[(a+c)x_0^2 + c v_0^2\big], \qquad q_0 > 0, \tag{37}$$

for which $\partial^2 J/\partial q^2 > 0$, yielding the minimum in (36), see [10, formulae (5.7)—(5.12)]. Since $J(\cdot)$ of (36) is continuously differentiable at $q = 1$ for all values of the parameters in (36), formulae (37), (36) are valid also for $q = 1$, covering the whole range of possible fluid friction densities.

Comparing (34) and (35), (37), one can see the following.

1) $u^o$ of (34) does not depend on initial data, meaning that $u^o$ is optimal not only for any remaining part of an optimal trajectory (Bellman's principle of optimality) but also for all trajectories in the feasible space $X$. It means that equations (33), (34) establish a property which may be called field optimality, in the sense that for the system (28)—(30) there exists one single optimal control (34) such that every trajectory starting from any point $x \in X = R^2$ is optimal. Moreover, there is a unique surface (33) which renders the optimal value $f(x)$ of the functional (30) on the trajectory starting from that point $x$.

2) $u = -2q_0 v$ of (35), (37) does depend on the initial data $x_0$, $v_0$, meaning that, being optimal for the whole trajectory corresponding to those data, it is not optimal for any remaining part thereof; hence, Bellman's principle of optimality is invalid in the class of controls (35), if $a > 0$ in (30).

3) If $a = 0$ in (30), we have $u \equiv u^o$, see (34), (35), (37); thus, for this class of functionals in (30), Bellman's principle of optimality is valid in the class of controls (35) for the particular system (29). Moreover, field optimality is preserved.

The nonexistence of solutions of the Bellman equation (31) for the linear system (29), (30), (35), which equation can be formally written as proposed in [1], means that this autonomous system with incomplete information is non-contiguous. Its non-contiguity invalidates the derivation of Bellman's equations presented in [1]. In contrast, this derivation is valid for contiguous systems even without optimality of remaining semi-trajectories, because the limit operation as $S \to 0$ removes the necessity of exact coincidence (principle of optimality) of two (maybe different) semi-trajectories starting at the same point $x(t)$, $t > t_0 = 0$. Those two semi-trajectories need not coincide. They must be touching at $x(t+S)$ as $S \to 0$ to the second or higher order, i.e., lie in a weak neighborhood of one another (cf. the curvature in the Weierstrass form of the Euler—Lagrange equations).
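Example 2 in numbers: within the class (35), the best constant damping (37) depends on the initial state, so the control that is optimal for the whole trajectory is no longer optimal for its remaining part. A sketch with illustrative parameters; (36) is used as the exact cost of the constant-$q$ regulator.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Example 2: in the class u = -2qv of (35), the optimal q0 of (37) depends on
# (x0, v0); re-optimizing from a later state gives a different q, so the tail
# of the q0-trajectory is not optimal in the class. Parameters are illustrative.

a, b, c = 1.0, 2.0, 0.5
J = lambda x0, v0, q: (q * ((a + c) * x0**2 + c * v0**2)
                       + 0.25 * (a + b) * (x0**2 + v0**2) / q
                       + a * x0 * v0)                            # cost (36)
q_opt = lambda x0, v0: np.sqrt(0.25 * (a + b) * (x0**2 + v0**2)
                               / ((a + c) * x0**2 + c * v0**2))  # from (37)

x0, v0 = 1.0, 0.0
q0 = q_opt(x0, v0)

rhs = lambda t, y, q: [y[1], -y[0] - 2.0 * q * y[1]]             # (29) with (35)
mid = solve_ivp(rhs, (0.0, 1.0), [x0, v0], args=(q0,), rtol=1e-10, atol=1e-12)
x1, v1 = mid.y[:, -1]                                            # state at t1 = 1

q1 = q_opt(x1, v1)                       # re-optimized damping from (x1, v1)
print("q0 =", q0, "  q1 =", q1)          # q1 != q0: optimum depends on the data
print("tail cost with original q0:    ", J(x1, v1, q0))
print("tail cost with re-optimized q1:", J(x1, v1, q1))          # strictly smaller
```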
7. Concluding remarks and open problems. In this paper, the notions of total optimality and contiguity are defined, and a study of contiguity of dynamical systems is presented in application to dynamic programming. It is demonstrated that, according to the currently available proof [6, 7], the principle of optimality is a partial statement about total optimality. With free variations of controls, optimal autonomous systems are usually totally optimal. There are optimal non-autonomous systems that are also totally optimal. Whether or not every such system can be transformed (by a non-autonomous transformation) into a totally optimal system, or reduced to such a system, is an open question of much practical importance.

The dynamic programming approach does not require optimality of remaining semi-trajectories: a much weaker contiguity condition would suffice. Many practical problems with structural constraints, incomplete information, and other specific requirements do not satisfy the contiguity condition, even if such systems are autonomous. In some cases, optimal problems for such systems can be reformulated to allow for a formal setting of dynamic programming equations. Those equations may have solutions which may not respect the constraints. It is interesting to find out how much the costs on real trajectories (with constraints) would exceed the costs of the formal dynamic programming solution without constraints, or with constraints partially satisfied.

Among other open questions, we can list the following problems.

1. Find an alternative proof of Bellman's principle of optimality, different from that presented in Section 3, which would not imply total optimality (Section 4).

2. Prove or disprove the equivalence of the weak invariance hypothesis (Section 5) and Bellman's invariance hypothesis (Section 2, observation 2.5).

3. Find an example of a contiguous but not totally optimal dynamical system, or prove that contiguity implies total optimality.

4. Extend dynamic programming to the case when optimal trajectories are on the boundary, replacing gradient conditions by conditions of the Karush—Kuhn—Tucker type.

5. Example 1 presents an instance where Bellman's equations imply field optimality. Study the conditions for this phenomenon.
1. Bellman R. Dynamic Programming. — Princeton, N.J.: Princeton University Press, 1957.
2. Bellman R. E., Glicksberg I., Gross O. A. Some Aspects of the Mathematical Theory of Control Processes. — Santa Monica, California: RAND Corporation, 1958.
3. Bellman R. Adaptive Control Processes: A Guided Tour. — Princeton, N.J.: Princeton University Press, 1961.
4. Bellman R. E., Dreyfus S. E. Applied Dynamic Programming. — Princeton, N.J.: Princeton University Press, 1962.
5. Lee E., Marcus L. Foundations of Optimal Control Theory. — N.Y.: Wiley, 1967.
6. Leitmann G. The Calculus of Variations and Optimal Control. — N.Y.: Plenum Press, 1981.
7. Knowles G. An Introduction to Applied Optimal Control. — Academic Press, 1981.
8. Bryson A. E., Jr., Ho Yu-Chi. Applied Optimal Control. — Waltham, Massachusetts: Blaisdell Publ. Co., a Division of Ginn & Co., 1969.
9. Galperin E. A. Translational symmetry of certain integrals with application to the calculus of variations and optimal control // Mathematical and Computer Modelling. — 2002. — 36. — P. 717—728.
10. Galperin E. A., Dergacheva E. I. Optimum stabilization of a linear system with incomplete information // Automation and Remote Control. — 1968. — 8. — P. 1193—1202.
11. Galperin E. A., Zheng Q. Global Solutions in Optimal Control and Games. — Montreal: NP Research Publ., 1991.

Received 24.11.06