Convergence of Sequential Gradient Learning Algorithms in Neural Networks for Online Identification of Nonlinear Systems: a Special Case

The paper deals with the asymptotic properties of an online learning procedure for identifying nonlinear systems via neural network models of these systems. The probabilistic convergence conditions of this procedure are presented for the special case where a nonlinearity can be exactly approxima...

Bibliographic Details
Published in: Індуктивне моделювання складних систем
Date: 2015
Main Authors: Zhiteckii, L.S., Nikolaienko, S.A.
Format: Article
Language: English
Published: Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України 2015
Online Access: https://nasplib.isofts.kiev.ua/handle/123456789/125021
Journal Title: Digital Library of Periodicals of National Academy of Sciences of Ukraine
Cite this: Convergence of Sequential Gradient Learning Algorithms in Neural Networks for Online Identification of Nonlinear Systems: a Special Case / L.S. Zhiteckii, S.A. Nikolaienko // Індуктивне моделювання складних систем: collection of scientific papers. – Kyiv: МННЦ ІТС НАН та МОН України, 2015. – Issue 7. – P. 46-58. – Bibliography: 27 titles. – In English.

Institution

Digital Library of Periodicals of National Academy of Sciences of Ukraine
_version_ 1859638705772298240
author Zhiteckii, L.S.
Nikolaienko, S.A.
author_facet Zhiteckii, L.S.
Nikolaienko, S.A.
collection DSpace DC
container_title Індуктивне моделювання складних систем
description The paper deals with the asymptotic properties of an online learning procedure for identifying nonlinear systems via neural network models of these systems. The probabilistic convergence conditions of this procedure are presented for the special case where a nonlinearity can be exactly approximated by a suitable neural network. Keywords: identification, nonlinear system, neural network, learning algorithm, stochastic environment, convergence.
first_indexed 2025-12-07T13:19:29Z
format Article
fulltext Convergence of sequential gradient. Індуктивне моделювання складних систем, issue 7, 2015, p. 46

UDC 681.5

CONVERGENCE OF SEQUENTIAL GRADIENT LEARNING ALGORITHMS IN NEURAL NETWORKS FOR ONLINE IDENTIFICATION OF NONLINEAR SYSTEMS: A SPECIAL CASE

L.S. Zhiteckii, S.A. Nikolaienko
International Research and Training Center for Information Technologies and Systems
leonid_zhiteckii@i.ua

The paper deals with the asymptotic properties of an online learning procedure for identifying nonlinear systems via neural network models of these systems. The probabilistic convergence conditions of this procedure are presented for the special case where a nonlinearity can be exactly approximated by a suitable neural network.

Keywords: identification, nonlinear system, neural network, learning algorithm, stochastic environment, convergence.

Introduction. The problem of identifying complex unknown systems in the presence of noise remains important from both the theoretical and the practical point of view.
Significant progress in this research area was achieved within the framework of the well-known group method of data handling (GMDH), advanced by A. G. Ivakhnenko in the late 1960s to deal with a finite set of training examples used for deriving mathematical models of unknown systems [1]. Over the past decades, interest has been increasing in the use of multilayer neural networks as models for the adaptive identification of nonlinearly parameterized dynamic systems [2–5]. Several learning methods for updating the weights of neural networks have been advanced in the literature. Most of these methods rely on the gradient concept [5, 6]. Although this concept has been successfully used in many empirical studies, there are very few fundamental results dealing with the convergence of gradient algorithms for learning neural networks. One of these results is based on the Lyapunov stability theory [3, 6].

The asymptotic behavior of online adaptive gradient algorithms for network learning has been studied by many authors [7–22]. In particular, the convergence of the learning process for the so-called feedforward network models with a single hidden layer is investigated in [7] using stochastic approximation theory. Convergence results have been derived in [9–15], among many others, provided that the input signals have a probabilistic nature. In this stochastic approach, the learning rate goes to zero as the number of learning steps tends to infinity. Unfortunately, this means that learning goes faster in the beginning and slows down in the late stage.

The convergence analysis of learning algorithms of deterministic (non-stochastic) nature has been given in [16–21]. In contrast to the stochastic approach, several of these results allow a constant learning rate to be employed [18, 22].
However, they assume that the learning set must be finite, whereas in online identification schemes this set is theoretically infinite. To the best of the authors' knowledge, there are no general results in the literature concerning the global convergence properties of training procedures with a fixed learning rate applicable to the case of an infinite learning set.

The distinguishing feature of multilayer neural networks is that they describe nonlinearly parameterized models that need to be identified. This leads to difficulties in deriving their convergence properties in the general case. To avoid these difficulties in the non-stochastic case, the assumption that similar nonlinear functions need to be convex (concave) is introduced in [23]. However, such an assumption is not appropriate for a neural network description of the nonlinearity.

A popular approach to analyzing the asymptotic behavior of online gradient algorithms in the stochastic case is based on martingale convergence theory [24]. This approach has been exploited in [25, 26] to derive some local convergence results in the stochastic framework for standard online gradient algorithms with a constant learning rate.

This paper is an extension of [25, 26]. The main effort is focused on establishing sufficient conditions under which the global convergence of the gradient algorithm for learning neural network models in stochastic environments is achieved. The key idea in deriving these convergence results is based on the use of the Lyapunov methodology [27].

1. System identification using a neural network model

Let
$$y(n) = F(x(n)) + \xi(n) \tag{1}$$
be the nonlinear equation, in compact form, describing a complex system to be identified. In this equation, $y(n) \in \mathbb{R}$ and $x(n) \in \mathbb{R}^N$ are the scalar output and the so-called state vector, respectively, available for measurement at each $n$th time instant, $\xi(n)$ is the noise at the time instant $n$ $(n = 1, 2, \ldots)$, and $F: \mathbb{R}^N \to \mathbb{R}$ represents some unknown nonlinear mapping.
(Note that $x(n)$ may include the current inputs of this system and possibly its past inputs and outputs; see [6, sect. 5.15].) Without loss of generality, one supposes that the nonlinearity $F(x)$ is a continuous and smooth function on a bounded set $X \subset \mathbb{R}^N$ $(\operatorname{diam} X < \infty)$.

To approximate $F(x)$ by a suitable nonlinearly parameterized function, the two-layer neural network model containing $M$ $(M \ge 1)$ neurons in its hidden layer is employed. The inputs to each $j$th neuron of this layer at the time instant $n$ are the components of $x(n)$. Its output signal at the $n$th time instant is specified as
$$y_j^{(1)}(n) = \sigma\Big(b_j^{(1)} + \sum_{i=1}^{N} w_{ij}^{(1)} x_i(n)\Big), \quad j = 1, \ldots, M, \tag{2}$$
where $x_i(n)$ denotes the $i$th component of $x(n)$, and $w_{ij}^{(1)}$ and $b_j^{(1)}$ are the weight coefficients and the bias of this $j$th neuron, respectively. $\sigma(\cdot)$ denotes the so-called activation function, usually defined as the sigmoid function
$$\sigma(s) = \frac{1}{1 + \exp(-s)} \tag{3}$$
or
$$\sigma(s) = \tanh(s). \tag{4}$$

There is only one neuron in the output (second) layer, whose inputs are the outputs of the hidden layer's neurons. The output signal of the second layer, $y^{(2)}(n)$, at the time instant $n$ is determined by
$$y^{(2)}(n) = \sum_{j=1}^{M} w_j^{(2)} y_j^{(1)}(n) + b^{(2)}, \tag{5}$$
where $w_1^{(2)}, \ldots, w_M^{(2)}$ are the weights of this neuron and $b^{(2)}$ is its bias.
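The forward pass defined by (2), (3) and (5) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the nested-list layout `W1[j][i]` (neuron index first, the reverse of the index order in $w_{ij}^{(1)}$) and the numerical weights in the usage lines are assumptions made here, not part of the paper.

```python
import math


def sigmoid(s):
    """Logistic activation (3), written to avoid overflow for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def hidden_outputs(x, W1, b1, activation=sigmoid):
    """Eq. (2): outputs of the M hidden neurons for an input vector x of length N.

    W1[j][i] is the weight from input i to hidden neuron j; b1[j] is its bias.
    """
    return [activation(b1[j] + sum(W1[j][i] * x[i] for i in range(len(x))))
            for j in range(len(b1))]


def network_output(x, W1, b1, w2, b2, activation=sigmoid):
    """Eq. (5): weighted sum of the hidden outputs plus the output bias."""
    y1 = hidden_outputs(x, W1, b1, activation)
    return sum(w2[j] * y1[j] for j in range(len(w2))) + b2


# Usage: N = 2 inputs, M = 3 hidden neurons, arbitrary illustrative weights.
W1 = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b1 = [0.0, -0.5, 0.1]
w2 = [1.0, -2.0, 0.5]
b2 = 0.25
y_logistic = network_output([0.2, -0.4], W1, b1, w2, b2)
y_tanh = network_output([0.2, -0.4], W1, b1, w2, b2, activation=math.tanh)
```

Passing `math.tanh` as `activation` switches from (3) to (4) without touching the rest of the computation.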
Since $\sigma(s)$ defined by (3) and (4) is nonlinear, it follows from (2), (5) that $y^{(2)}(n)$ is a nonlinear function of $x(n)$ and also of the $(M(N+2)+1)$-dimensional parameter vector
$$w = [w_{11}^{(1)}, \ldots, w_{N1}^{(1)}, b_1^{(1)}, \ldots, w_{1M}^{(1)}, \ldots, w_{NM}^{(1)}, b_M^{(1)}, w_1^{(2)}, \ldots, w_M^{(2)}, b^{(2)}]^T. \tag{6}$$
To emphasize this fact, define the output signal of the neural network in the form
$$y^{(2)}(n) = \mathrm{NN}(x(n), w) \tag{7}$$
using the notation $\mathrm{NN}: \mathbb{R}^N \times \mathbb{R}^{M(N+2)+1} \to \mathbb{R}$. Taking into account that the neural network plays the role of a model of the nonlinearity $F(x)$, rewrite (7) as follows:
$$y_{\mathrm{mod}}(n) = \mathrm{NN}(x(n), w). \tag{8}$$

The optimal value $w = w^*$ specified by the least modulus criterion
$$w^* = \arg\min_{w} \max_{x \in X} |F(x) - \mathrm{NN}(x, w)| \tag{9}$$
and also the discrepancy
$$e(x) = F(x) - \mathrm{NN}(x, w^*)$$
between $F(x)$ and the output of its neural network model for the fixed $w = w^*$ corresponding to (8) are unknown.

To adapt the neural network model to the uncertain system (1), the standard online gradient learning algorithm
$$w(n) = w(n-1) - \eta(n) \nabla_w Q(x(n), w(n-1)) \tag{10}$$
taken, for example, from [5, 6] is usually utilized. In this algorithm, $\nabla_w Q(x(n), w(n-1))$ represents the gradient of the quadratic loss function
$$Q(x, w) = \frac{1}{2} [y - \mathrm{NN}(x, w)]^2 \tag{11}$$
with respect to $w$ at $w = w(n-1)$ for given $x = x(n)$, and $\eta(n)$ is the learning rate (step size) of (10). Due to (11) we have
$$Q(x(n), w(n-1)) = \frac{1}{2} [y(n) - \mathrm{NN}(x(n), w(n-1))]^2 \tag{12}$$
with the variable
$$e(n, w(n-1)) = y(n) - \mathrm{NN}(x(n), w(n-1)) \tag{13}$$
representing the current model error, which can be measured at the $n$th time instant.

Now, using (11) – (13), rewrite the learning algorithm (10) as follows:
$$w(n) = w(n-1) + \eta(n)\, e(n, w(n-1))\, \nabla_w \mathrm{NN}(x(n), w(n-1)). \tag{14}$$

Thus, (2), (5), (7) and (14) describe the learning system necessary for the adaptive identification of (1). For a better understanding of its performance, the structure of this system is depicted in Fig. 1.

[Block diagram: the Unknown Nonlinear System with input $x(n)$, additive noise $\xi(n)$ and output $y(n)$; the Neural Network Model with output $y_{\mathrm{mod}}(n)$; the Learning Algorithm driven by the error $e(n)$ and producing $w(n)$.]

Fig. 1.
Configuration of online learning system

2. Statement of the problem

Consider a special case where $F(x)$ can be exactly approximated by a neural network representation for all $x \in X$, implying
$$F(x) \equiv \mathrm{NN}(x, w^*). \tag{15}$$
In this case, called the ideal case in [5, p. 304], one has $e(n, w^*) \equiv 0$ with $w^*$ given by (9), provided that $\xi(n)$ is absent. Note that this special case is similar to the so-called hypothesis of representation [6, p. 81] advanced by M.A. Aizerman, E.M. Braverman and L.I. Rozonoer in machine learning theory at the beginning of the 1960s.

Suppose $\{x(n)\}$ is an infinite sequence of vectors belonging to the bounded set $X$. The aim of this paper is to study the asymptotic properties of the learning procedure (14) caused by this $\{x(n)\}$. More precisely, the following problem is stated. It is required to derive the conditions under which $\{w(n)\}$ will converge in the sense that
$$\lim_{n \to \infty} w(n) = w_\infty \quad \text{with} \quad \|w_\infty\| < \infty. \tag{16}$$

3. Preliminaries

First, recall that the condition
$$\eta(n)\, e(n, w(n-1))\, \nabla_w \mathrm{NN}(x(n), w(n-1)) \to 0 \quad \text{as } n \to \infty, \tag{17}$$
which follows from (14), is necessary to achieve the limit (16) for a given $\{x(n)\}$ [6, sect. 3.13]. Since $\nabla_w \mathrm{NN}(x(n), w(n)) \not\equiv 0$, it can be observed that either condition 1 of the form
$$\eta(n) \equiv \eta = \mathrm{const}, \quad e(n, w(n)) \to 0 \quad \text{as } n \to \infty$$
or condition 2 of the form
$$\eta(n) \to 0, \quad e(n, w(n)) \not\to 0 \quad \text{as } n \to \infty$$
is required to satisfy (17). Note that condition 1 cannot take place if the noise $\xi(n)$ is present, because $e(n, w^*) = \xi(n)$ (due to (1), (13), (15)).

It turned out that in the special case, the set $W^*$ containing these $w^*$s is not a single point [25, 26]. To show this, put $N = 1$, $M = 1$. Due to (6), this implies $w \in \mathbb{R}^4$. Let $w^* = [w_1^*, w_2^*, w_3^*, w_4^*]^T$ be a vector satisfying (15). Then, since the sigmoid (3) satisfies $\sigma(-s) = 1 - \sigma(s)$, (2) and (5) together with (3) give that another vector $w^{*\prime} = [-w_1^*, -w_2^*, -w_3^*, w_3^* + w_4^*]^T$ will also satisfy the equality (15).

Introduce the scalar variable $\|w - w^*\|^2$ representing the square of the Euclidean distance between $w$ and a $w^*$, and define
$$V(w) = \inf_{w^* \in W^*} \|w - w^*\|^2. \tag{18}$$
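The non-uniqueness for $N = 1$, $M = 1$ and the distance function (18) can be checked numerically. The sketch below is illustrative only: the vector `w_star` is a hypothetical solution chosen here, and treating $W^*$ as a finite list of known solutions is an assumption made to keep the infimum in (18) computable.

```python
import math


def sigmoid(s):
    """Logistic activation (3)."""
    return 1.0 / (1.0 + math.exp(-s))


def nn(x, w):
    """Scalar network for N = M = 1 with w = [w11, b1, w2, b2], ordered as in (6)."""
    return w[2] * sigmoid(w[0] * x + w[1]) + w[3]


def mirrored(w):
    """Second parameter vector giving the same output, via sigma(-s) = 1 - sigma(s)."""
    return [-w[0], -w[1], -w[2], w[2] + w[3]]


def V(w, solutions):
    """Eq. (18) when W* is a finite list: squared distance to the nearest solution."""
    return min(sum((a - b) ** 2 for a, b in zip(w, ws)) for ws in solutions)


w_star = [2.0, -0.5, 1.5, 0.3]      # hypothetical solution, for illustration only
w_star_alt = mirrored(w_star)

# Largest output difference between the two vectors on a grid over [-1, 1].
max_gap = max(abs(nn(k / 10.0, w_star) - nn(k / 10.0, w_star_alt))
              for k in range(-10, 11))

# V vanishes on W* and is positive elsewhere.
v_on_set = V(w_star_alt, [w_star, w_star_alt])
v_off_set = V([0.0, 0.0, 0.0, 0.0], [w_star, w_star_alt])
```

Here `max_gap` is zero up to rounding error, so both vectors represent the same input-output map although they are far apart in parameter space.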
Denote $V_n := V(w(n))$. Since $V_n \ge 0$ (due to (18)), it is clear that if
$$V_n \le V_{n-1} \tag{19}$$
then the sequence $\{V_n\} := V_0, \ldots, V_n, \ldots$ always has a limit, $V_\infty$, as $n$ tends to infinity, i.e.,
$$\lim_{n \to \infty} V_n = V_\infty, \tag{20}$$
meaning that the algorithm (14) converges. On the other hand, the fact that $\{V_n\}$ is a monotonically non-increasing sequence is, in principle, not necessary to achieve (20).

Note that the existence of the limit (20) does not imply that $V_\infty = 0$ even when the condition (15) is satisfied. Moreover, this limit may not exist if $\{x(n)\}$ is an arbitrary sequence leading to the violation of (19) [25]. Nevertheless, if the asymptotic property (16) takes place, then $\{w(n)\}$ converges to some $w_\infty \in \liminf W_n$, where
$$\liminf W_n := \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} W_k \tag{21}$$
denotes the so-called limit set introduced in [24, sect. 1.3], in which
$$W_n := \{w : y(n) - \mathrm{NN}(x(n), w) = 0\}.$$

Note that the limit set $\liminf W_n$ given by (21) represents a nonlinear manifold in $\mathbb{R}^{M(N+2)+1}$ whose dimension satisfies
$$0 \le \dim \liminf W_n \le M(N+2).$$
It can be understood that the algorithm (14) “attempts” to solve the infinite set of equations
$$y(n) - \mathrm{NN}(x(n), w) = 0, \quad n = 1, 2, \ldots \tag{22}$$
with respect to the unknown $w \in \mathbb{R}^{M(N+2)+1}$. In fact, this algorithm may give the solution $w = w_\infty$ of the remainder of (22), which is determined as the limit set (21) but not as $W^*$.

It was observed that the condition (19), meaning that $\{V_n\}$ is a monotonically non-increasing sequence, may in general not be satisfied if the neural network model contains a hidden layer.

To demonstrate some asymptotic properties of (14), two simulation experiments with the scalar nonlinear system (1) having the nonlinearity
$$F(x) = \frac{3.75 + 0.05 \exp(-7.15x)}{1 + 0.19 \exp(-7.15x)}$$
were conducted. It can be shown that this nonlinearity can be explicitly approximated by the two-layer neural network model described by (2), (3), (5) and (7) with the components of the two vectors $w^* = w^{*(1)}$ and $w^* = w^{*(2)}$ summarized in Table 1.
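A noise-free run of the algorithm (14) on this nonlinearity can be sketched as follows. The sketch assumes the reading $F(x) = (3.75 + 0.05 e^{-7.15x})/(1 + 0.19 e^{-7.15x})$ with the rounded coefficients as printed, and uses the settings reported for the experiments (constant step size 0.01, inputs uniform on $[-1, 1]$, 40 000 steps, the two parameter vectors and the first initial estimate from Table 1). The function names, the random seed and the evaluation grid are choices made here, so individual trajectories need not match the published figures.

```python
import math
import random


def sigmoid(s):
    """Logistic activation (3), written to avoid overflow for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def F(x):
    """Nonlinearity of the simulation experiments (coefficients as printed)."""
    t = math.exp(-7.15 * x)
    return (3.75 + 0.05 * t) / (1.0 + 0.19 * t)


def nn(x, w):
    """Network (2), (3), (5) for N = M = 1 with w = [w11, b1, w2, b2]."""
    return w[2] * sigmoid(w[0] * x + w[1]) + w[3]


def grad_nn(x, w):
    """Gradient of nn(x, w) with respect to w."""
    h = sigmoid(w[0] * x + w[1])
    dh = h * (1.0 - h)                      # derivative of the logistic function
    return [w[2] * dh * x, w[2] * dh, h, 1.0]


def mean_loss(w, grid):
    """Grid estimate of E{Q(x, w)} for x spread uniformly over the grid."""
    return sum(0.5 * (F(x) - nn(x, w)) ** 2 for x in grid) / len(grid)


def run(w0, eta=0.01, steps=40000, seed=0):
    """Algorithm (14) with a constant step size and noise-free measurements."""
    rng = random.Random(seed)
    w = list(w0)
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        e = F(x) - nn(x, w)                 # current model error (13)
        g = grad_nn(x, w)
        w = [wi + eta * e * gi for wi, gi in zip(w, g)]
    return w


grid = [k / 100.0 for k in range(-100, 101)]
w_a = [7.15, 1.65, 3.45, 0.3]               # components of the first vector in Table 1
w_b = [-7.15, -1.65, -3.45, 3.75]           # components of the second vector in Table 1
fit_gap = max(abs(F(x) - nn(x, w_a)) for x in grid)
twin_gap = max(abs(nn(x, w_a) - nn(x, w_b)) for x in grid)

w0 = [0.53, -0.50, -0.92, 1.04]             # initial estimate of experiment 1
w_end = run(w0)
loss_start, loss_end = mean_loss(w0, grid), mean_loss(w_end, grid)
```

With the rounded coefficients, `fit_gap` is small but not zero, while `twin_gap` vanishes up to rounding error, which illustrates that the two tabulated vectors define the same network output.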
Table 1. Parameters of neural network model

Exp. No | Parameter | $w_{11}^{(1)}$ | $b_1^{(1)}$ | $w_1^{(2)}$ | $b^{(2)}$
1, 2 | Components of $w^{*(1)}$ | 7.15 | 1.65 | 3.45 | 0.3
1, 2 | Components of $w^{*(2)}$ | -7.15 | -1.65 | -3.45 | 3.75
1 | Initial estimate | 0.53 | -0.50 | -0.92 | 1.04
1 | Final estimate | 5.41 | 1.32 | 3.82 | -0.05
2 | Initial estimate | 0.38 | -0.57 | -0.98 | 1.14
2 | Final estimate | -5.13 | -1.52 | -4.20 | 3.78

In all of the experiments, $\eta(n)$ was taken as $\eta(n) \equiv 0.01$. In these experiments, $\{x(n)\}$ was generated as a sequence of independent identically distributed (i.i.d.) pseudorandom numbers on $X = [-1.0, 1.0]$. The duration of the learning processes was always equal to 40 000 steps.

Simulation results of the first and second experiments are presented in Fig. 2, left and right, respectively. The initial estimate $w(0)$ in both examples was chosen so that the distance between $w(0)$ and $W^*$ was large enough, and the condition $V^{(1)}(w(0))$ […] $V^{(2)}(w(0))$ was satisfied. It was observed that at an initial stage of the learning process, $\{V_n^{(1)}\}$ was increasing and $V_n^{(1)}$ […] $V_n^{(2)}$ for several $n = 1, 2, \ldots$, as shown in Fig. 2, left. Further, $\{V_n^{(1)}\}$ became decreasing. Such a behavior of these sequences led to the feature that $V_n^{(1)} < V_n^{(2)}$ for all sufficiently large $n$.

In the second example, the initial $w(0)$ was chosen to be close to that in the first example. One can observe that in this case, $V_n$ […] $V_n^{(1)}$ (see Fig. 2, right).

Fig. 2. Behavior of gradient learning algorithm (14) in Examples 1 (left) and 2 (right) in the absence of noise

It turned out that in these simulation examples, the condition (19) is not satisfied, whereas the learning algorithm (14) nevertheless remains convergent. Thus, additional assumptions with respect to $\{x(n)\}$ are required to guarantee the convergence of $\{w(n)\}$.

4. Local and global convergence results

Assumption 1.
$\{x(n)\}$ is a sequence of vectors appearing randomly in accordance with some probability density function $p(x)$ such that
$$\int_X p(x)\, dx = 1.$$
Furthermore, $p(x)$ has the following properties:
$$P\{x(n) \in X'\} := \int_{X'} p(x)\, dx > 0$$
for any subset $X' \subset X$ whose dimension is $N$, and
$$P\{x(n) \in X'\} := \int_{X'} p(x)\, dx = 0$$
if $\dim X' < N$, where $P\{\cdot\}$ denotes the probability of the corresponding random event.

Assumption 2. It is assumed that $p(x)$ represents a continuous function which may become zero only at some isolated points of $X$.

Assumption 3. The noise is absent, i.e., $\xi(n) \equiv 0$.

In this case, $Q(x, w)$ defined in (11) becomes
$$Q(x, w) = \frac{1}{2} [F(x) - \mathrm{NN}(x, w)]^2. \tag{23}$$
Introduce the performance index
$$J(w) = E\{Q(x, w)\} \tag{24}$$
which evaluates the quality of the learning process with $Q(x, w)$ given by (23). In this expression,
$$E\{Q(x, w)\} := \frac{1}{2} \int_X [F(x) - \mathrm{NN}(x, w)]^2 p(x)\, dx$$
denotes the expectation of $Q(x, w)$ with respect to the random $x$s.

Let $W_\varepsilon(w^*)$ denote an $\varepsilon$-neighborhood of some $w^* \in W^*$ defined as
$$W_\varepsilon(w^*) := \{w : \|w - w^*\| < \varepsilon\},$$
which does not contain other points of $W^*$. Suppose

a) the assumptions 1 – 3 are valid;

b) the condition
$$\int_X [\mathrm{NN}(x, w) - \mathrm{NN}(x, w^*)]\, \nabla_w^T \mathrm{NN}(x, w)\, (w - w^*)\, p(x)\, dx \ge \int_X [\mathrm{NN}(x, w) - \mathrm{NN}(x, w^*)]^2\, \|\nabla_w \mathrm{NN}(x, w)\|^2\, p(x)\, dx \tag{25}$$
meaning
$$E\{[\mathrm{NN}(x, w) - \mathrm{NN}(x, w^*)]\, \nabla_w^T \mathrm{NN}(x, w)\, (w - w^*)\} \ge E\{[\mathrm{NN}(x, w) - \mathrm{NN}(x, w^*)]^2\, \|\nabla_w \mathrm{NN}(x, w)\|^2\}$$
is satisfied for all $x \in X$ and for any $w, w^*$ from $\mathbb{R}^{M(N+2)+1}$;

c) an initial $w(0)$ satisfies $w(0) \in W_\varepsilon(w^*)$.

In the work [25] it has been established that, under the conditions a) – c), the limit of $\{w(n)\}$ exists almost surely (a.s.) as $n$ approaches infinity, i.e.,
$$\lim_{n \to \infty} w(n) = w_\infty \quad \text{a.s.} \tag{26}$$
with some $w_\infty \in W^*$ if the step size $\eta(n)$ is chosen as $\eta(n) \equiv \eta$, where
$$0 < \eta < 2. \tag{27}$$
By virtue of (15), the property (26) yields $\lim_{n \to \infty} J(w(n)) = 0$ a.s.
(28)

The proof of the probabilistic convergence of $\{w(n)\}$ caused by the learning algorithm (14) with a constant $\eta$ satisfying (27) essentially utilizes Doob's martingale convergence theorem [24] after establishing the fact that, under the condition (25), the random $\{V_n\}$ is a supermartingale defined as
$$E\{V_n \mid w(n-1), \ldots, w(0)\} \le V_{n-1} \tag{29}$$
with
$$V_n = \inf_{w^* \in W^*} \|w(n) - w^*\|^2, \tag{30}$$
and also takes into account the Borel – Cantelli lemma [24, sect. 15.3]. (In expression (29), $E\{V_n \mid \cdot\}$ denotes the conditional expectation of $V_n$.)

The conditions given above are only sufficient conditions guaranteeing the local convergence of (14) with probability 1. Since the condition b) requires some computational effort for its verification, whereas the condition c) cannot be verified before starting the learning algorithm, these local convergence results are mainly of mathematical interest.

Comparing (29) and (19), we notice that the variable $V_n$ given by (30) is a peculiar stochastic counterpart of the Lyapunov function of (14), provided that $w(n) \in W_\varepsilon(w^*)$ is guaranteed.

At first sight, it seems that $V(w)$ defined in (18) might be exploited as a Lyapunov function for analyzing the asymptotic behavior of (14) in a stochastic framework for any $w(0) \in \mathbb{R}^{M(N+2)+1}$. In fact, by definition, $V(w)$ has the property
$$V(w) > 0 \ \text{if} \ w \notin W^* \quad \text{and} \quad V(w) = 0 \ \text{if} \ w \in W^*. \tag{31}$$
However, the requirement
$$\|\nabla V(w') - \nabla V(w'')\| \le L \|w' - w''\| \tag{32}$$
with the Lipschitz constant $L > 0$ advanced in [27] is not satisfied for any $w'$, $w''$ from $\mathbb{R}^{M(N+2)+1}$. Thus, $V(w)$ having the form (18) is not admissible for studying the global convergence properties of (14) based on the results of [27].

In [26] it has been derived that the limit (26) will be achieved for an arbitrary initial $w(0)$ if the assumptions 1 – 3 made above hold and, instead of (27), the learning rate $\eta(n) \equiv \eta$ is chosen as
$$0 < \eta < \frac{2\mu}{L\lambda} \tag{33}$$
with
$$\mu := \inf_{w \notin W^*} \frac{\|\nabla_w E\{Q(x, w)\}\|^2}{E\{Q(x, w)\}} > 0,$$
$$\lambda := \sup_{w \notin W^*} \frac{E\{\|\nabla_w Q(x, w)\|^2\}}{E\{Q(x, w)\}}.$$

The proof of this result, establishing the conditions for the global probabilistic convergence of the learning algorithm (14), utilizes Theorem 3′ of [27] after replacing $V(w)$ of the form (18) by
$$V(w) = E\{Q(x, w)\}. \tag{34}$$

Now, let $\xi(n)$ be present. Then, the requirement (33) needs to be replaced by another requirement under which $\eta(n) \to 0$ as $n$ tends to $\infty$. It can be shown that, using the same Lyapunov function as in (34) and exploiting Theorem 3 of [27], the convergence properties (26), (28) will be ensured if $\{\eta(n)\}$ satisfies the standard requirements
$$\sum_{n=0}^{\infty} \eta(n) = \infty, \qquad \sum_{n=0}^{\infty} \eta^2(n) < \infty \tag{35}$$
arising first in [6]. (Notice that (35) is satisfied if $\eta(n) = n^{-\alpha}$ with $1/2 < \alpha \le 1$.)

To illustrate the asymptotic behavior of the algorithm (14) in the presence of noise, we conducted the same simulation experiment 1 but with $\xi(n) \not\equiv 0$. Namely, $\{\xi(n)\}$ was chosen as a pseudorandom i.i.d. sequence in the range $[-0.05, 0.05]$. Two separate simulations were conducted. In the first simulation, $\eta(n)$ was chosen as $\eta(n) \equiv 0.01$, whereas in the second simulation, $\eta(n) = n^{-0.51}$ was taken. Results of these simulation experiments are presented in Fig. 3.

[Plots of $E\{Q(x, w(n))\}$ versus $n$ for $n$ from 0 to 5000, panels a) and b).]

Fig. 3. Behavior of learning algorithm (14) with noise in Example 1: a) $\eta(n) \equiv 0.01$; b) $\eta(n) = n^{-0.51}$

It is seen that in the first case, where the learning rate remains constant, oscillations of $E\{Q(x, w(n))\}$ are observed (Fig. 3, a). In the second case, by contrast, this variable converges to zero as $n$ becomes large enough; see Fig. 3, b.

Conclusion.
The main contribution of this paper is a theoretical and experimental study of the asymptotic properties of standard online gradient algorithms applicable to learning neural networks in the stochastic framework. Namely, sufficient conditions for the global convergence of these algorithms have been established. It was shown that adding a penalty term to the current error function is not necessary to guarantee their convergence properties.

References

1. Ivakhnenko A.G., Stepashko V.S. Pomekhoustoichivost' modelirovaniya [Noise immunity of modeling]. – Kiev: Naukova Dumka, 1985. – 216 p. (in Russian).
2. Suykens J., Moor B.D. Nonlinear system identification using multilayer neural networks: some ideas for initial weights, number of hidden neurons and error criteria // Proc. 12th IFAC World Congress (Sydney, Australia, July 1993). – 1993. – vol. 3. – P. 49–52.
3. Kosmatopoulos E.S., Polycarpou M.M., Christodoulou M.A., Ioannou P.A. High-order neural network structures for identification of dynamical systems // IEEE Trans. on Neural Networks. – 1995. – vol. 6. – P. 422–431.
4. Levin A.U., Narendra K.S. Recursive identification using feedforward neural networks // Int. J. Contr. – 1995. – vol. 61. – P. 533–547.
5. Tsypkin Ya.Z., Mason J.D., Avedyan E.D., Warwick K., Levin I.K. Neural networks for identification of nonlinear systems under random piecewise polynomial disturbances // IEEE Trans. on Neural Networks. – 1999. – vol. 10. – P. 303–311.
6. Tsypkin Ya.Z. Adaptation and learning in automatic systems. – New York: Academic Press. – 1971. – 291 p.
7. White H. Some asymptotic results for learning in single hidden-layer neural network models // J. Amer. Statist. Assoc. – 1987. – vol. 84. – P. 117–134.
8. Behera L., Kumar S., Patnaik A. On adaptive learning rate that guarantees convergence in feedforward networks // IEEE Trans. on Neural Networks. – 2006. – vol. 17. – P. 1116–1125.
9. Kuan C.M., Hornik K.
Convergence of learning algorithms with constant learning rates // Ibid. – 1991. – vol. 2. – P. 484–489.
10. Luo Z. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks // Neural Comput. – 1991. – vol. 3. – P. 226–245.
11. Finnoff W. Diffusion approximations for the constant learning rate backpropagation algorithm and resistance to local minima // Ibid. – 1994. – vol. 6. – P. 285–295.
12. Gaivoronski A.A. Convergence properties of backpropagation for neural nets via theory of stochastic gradient methods // Optim. Methods Software. – 1994. – vol. 4. – P. 117–134.
13. Fine T.L., Mukherjee S. Parameter convergence and learning curves for neural networks // Neural Comput. – 1999. – vol. 11. – P. 749–769.
14. Tadic V., Stankovic S. Learning in neural networks by normalized stochastic gradient algorithm: Local convergence // Proc. 5th Seminar Neural Netw. Appl. Electr. Eng. (Yugoslavia, Sept. 2000). – 2000. – P. 11–17.
15. Zhang H., Wu W., Liu F., Yao M. Boundedness and convergence of online gradient method with penalty for feedforward neural networks // IEEE Trans. on Neural Networks. – 2009. – vol. 20. – P. 1050–1054.
16. Mangasarian O.L., Solodov M.V. Serial and parallel backpropagation convergence via nonmonotone perturbed minimization // Optim. Methods Software. – 1994. – P. 103–106.
17. Wu W., Feng G., Li X. Training multilayer perceptrons via minimization of ridge functions // Advances in Comput. Mathematics. – 2002. – vol. 17. – P. 331–347.
18. Zhang N., Wu W., Zheng G. Convergence of gradient method with momentum for two-layer feedforward neural networks // IEEE Trans. on Neural Networks. – 2006. – vol. 17. – P. 522–525.
19. Wu W., Feng G., Li X., Xu Y. Deterministic convergence of an online gradient method for BP neural networks // Ibid. – 2005. – vol. 16. – P. 1–9.
20. Xu Z.B., Zhang R., Jing W.F.
When does online BP training converge? // Ibid. – 2009. – vol. 20. – P. 1529–1539.
21. Shao H., Wu W., Liu L. Convergence and monotonicity of an online gradient method with penalty for neural networks // WSEAS Trans. Math. – 2007. – vol. 6. – P. 469–476.
22. Ellacott S.W. The numerical analysis approach // Mathematical Approaches to Neural Networks (J.G. Taylor, ed; B.V.: Elsevier Science Publisher). – 1993. – P. 103–137.
23. Skantze F.P., Kojic A., Loh A.P., Annaswamy A.M. Adaptive estimation of discrete time systems with nonlinear parameterization // Automatica. – 2000. – vol. 36. – P. 1879–1887.
24. Loeve M. Probability theory. – New York: Springer-Verlag. – 1963. – 425 p.
25. Zhiteckii L.S., Azarskov V.N., Nikolaienko S.A. Convergence of learning algorithms in neural networks for adaptive identification of nonlinearly parameterized systems // in Proc. 16th IFAC Symposium on System Identification (Brussels, Belgium). – 2012. – P. 1593–1598.
26. Azarskov V.N., Kucherov D.P., Nikolaienko S.A., Zhiteckii L.S. Asymptotic behaviour of gradient learning algorithms in neural network models for the identification of nonlinear systems // American Journal of Neural Networks and Applications. – 2015. – No 1(1). – P. 1–10.
27. Polyak B.T. Convergence and convergence rate of iterative stochastic algorithms, I: General case // Autom. Remote Control. – 1976. – vol. 12. – P. 1858–1868.
id nasplib_isofts_kiev_ua-123456789-125021
institution Digital Library of Periodicals of National Academy of Sciences of Ukraine
issn XXXX-0044
language English
last_indexed 2025-12-07T13:19:29Z
publishDate 2015
publisher Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України
record_format dspace
spelling Zhiteckii, L.S.
Nikolaienko, S.A.
2017-10-13T16:18:25Z
2017-10-13T16:18:25Z
2015
XXXX-0044
https://nasplib.isofts.kiev.ua/handle/123456789/125021
681.5
en
Міжнародний науково-навчальний центр інформаційних технологій і систем НАН та МОН України
Індуктивне моделювання складних систем
Article
published earlier