Convergence of Sequential Gradient Learning Algorithms in Neural Networks for Online Identification of Nonlinear Systems: a Special Case / L.S. Zhiteckii, S.A. Nikolaienko // Індуктивне моделювання складних систем (Inductive Modelling of Complex Systems): Collected scientific papers. – Kyiv: IRTC ITS of NAS and MES of Ukraine, 2015. – Issue 7. – P. 46–58. – Bibliogr.: 27 titles. – In English.
Online access: https://nasplib.isofts.kiev.ua/handle/123456789/125021
UDC 681.5
CONVERGENCE OF SEQUENTIAL GRADIENT
LEARNING ALGORITHMS IN NEURAL NETWORKS FOR ONLINE
IDENTIFICATION OF NONLINEAR SYSTEMS: A SPECIAL CASE
L.S. Zhiteckii, S.A. Nikolaienko
International Research and Training Center for Information Technologies and Systems
leonid_zhiteckii@i.ua
The paper deals with the asymptotic properties of an online learning procedure for identifying nonlinear systems via neural network models of these systems. Probabilistic convergence conditions for this procedure are presented for the special case where the nonlinearity can be approximated exactly by a suitable neural network.
Keywords: identification, nonlinear system, neural network, learning algorithm, stochastic environment, convergence.
Introduction. The problem of identifying complex unknown systems in the presence of noise remains important from both the theoretical and the practical point of view. Significant progress in this research area was achieved within the framework of the well-known group method of data handling (GMDH), advanced by A. G. Ivakhnenko in the late 1960s to deal with a finite set of training examples used for deriving mathematical models of unknown systems [1]. Over the past decades, interest has been increasing in the use of multilayer neural networks as models for the adaptive identification of nonlinearly parameterized dynamic systems [2–5]. Several learning methods for updating the weights of neural networks have been proposed in the literature. Most of these methods rely on the gradient concept [5, 6]. Although this concept has been successfully used in many empirical studies, there are very few fundamental results dealing with the convergence of gradient algorithms for training neural networks. One of these results is based on utilizing Lyapunov stability theory [3, 6].
The asymptotic behavior of online adaptive gradient algorithms for network learning has been studied by many authors [7–22]. In particular, the convergence of the learning process for so-called feedforward network models with a single hidden layer is investigated in [7] by using stochastic approximation theory. Convergence results have been derived in [9–15], among many others, under the assumption that the input signals are of a probabilistic nature. In this stochastic approach, the learning rate goes to zero as the number of learning steps tends to infinity. Unfortunately, this means that learning is fast at the beginning and slows down at later stages.
A convergence analysis of learning algorithms of a deterministic (non-stochastic) nature has been given in [16–21]. In contrast to the stochastic approach, several of these results allow one to employ a constant learning rate [18, 22]. However, they assume that the learning set is finite, whereas in online identification schemes this set is theoretically infinite. To the best of the authors' knowledge, there are no general results in the literature concerning the global convergence properties of training procedures with a fixed learning rate applicable to the case of an infinite learning set.
The distinguishing feature of multilayer neural networks is that they describe nonlinearly parameterized models to be identified. This leads to difficulties in deriving their convergence properties in the general case. To avoid these difficulties in the non-stochastic case, the assumption that the nonlinear functions involved are convex (concave) is introduced in [23]. However, such an assumption is not appropriate for a neural network description of a nonlinearity.
A popular approach to analyzing the asymptotic behavior of online gradient algorithms in the stochastic case is based on martingale convergence theory [24]. This approach has been exploited in [25, 26] to derive some local convergence results in a stochastic framework for standard online gradient algorithms with a constant learning rate.
This paper is an extension of [25, 26]. The main effort is focused on establishing sufficient conditions under which the global convergence of gradient algorithms for learning neural network models in stochastic environments is achieved. The key idea in deriving these convergence results is the use of the Lyapunov methodology [27].
1. System identification using a neural network model

Let

$$y(n) = F(x(n)) + \xi(n) \qquad (1)$$

be the nonlinear equation, in compact form, describing a complex system to be identified. In this equation, $y(n) \in \mathbb{R}$ and $x(n) \in \mathbb{R}^N$ are the scalar output and the so-called state vector, respectively, available for measurement at each $n$th time instant; $\xi(n)$ is the noise at the time instant $n$ $(n = 1, 2, \dots)$; and $F : \mathbb{R}^N \to \mathbb{R}$ represents an unknown nonlinear mapping. (Note that $x(n)$ may include the current inputs of this system and possibly its past inputs and also outputs; see [6, sect. 5.15].) Without loss of generality, one supposes that the nonlinearity $F(x)$ is a continuous and smooth function on a bounded set $X \subset \mathbb{R}^N$ $(\operatorname{diam} X < \infty)$.
To approximate $F(x)$ by a suitable nonlinearly parameterized function, a two-layer neural network model containing $M$ $(M \geq 1)$ neurons in its hidden layer is employed. The inputs to each $j$th neuron of this layer at the time instant $n$ are the components of $x(n)$. Its output signal at the $n$th time instant is specified as

$$y_j^{(1)}(n) = \sigma\Bigl(b_j^{(1)} + \sum_{i=1}^{N} w_{ij}^{(1)} x_i(n)\Bigr), \qquad j = 1, \dots, M, \qquad (2)$$

where $x_i(n)$ denotes the $i$th component of $x(n)$, and $w_{ij}^{(1)}$ and $b_j^{(1)}$ are the weight coefficients and the bias of this $j$th neuron, respectively. $\sigma(\cdot)$ denotes the so-called activation function, usually defined as one of the sigmoid functions

$$\sigma(s) = \frac{1}{1 + \exp(-s)} \qquad (3)$$

or

$$\sigma(s) = \tanh(s). \qquad (4)$$

There is only one neuron in the output (second) layer, whose inputs are the outputs of the hidden layer's neurons. The output signal of the second layer, $y^{(2)}(n)$, at the time instant $n$ is determined by

$$y^{(2)}(n) = b^{(2)} + \sum_{j=1}^{M} w_j^{(2)} y_j^{(1)}(n), \qquad (5)$$

where $w_1^{(2)}, \dots, w_M^{(2)}$ are the weights of this neuron and $b^{(2)}$ is its bias.
Since $\sigma(s)$ defined by (3) and (4) is nonlinear, it follows from (2), (5) that $y^{(2)}(n)$ is a nonlinear function depending on $x(n)$ and also on the $(M(N+2)+1)$-dimensional parameter vector

$$w = [\,w_{11}^{(1)}, \dots, w_{N1}^{(1)}, b_1^{(1)}, \dots, w_{1M}^{(1)}, \dots, w_{NM}^{(1)}, b_M^{(1)}, w_1^{(2)}, \dots, w_M^{(2)}, b^{(2)}\,]^{\mathrm{T}}. \qquad (6)$$

To emphasize this fact, define the output signal of the neural network in the form

$$y^{(2)}(n) = \mathrm{NN}(x(n), w), \qquad (7)$$

using the notation $\mathrm{NN} : \mathbb{R}^N \times \mathbb{R}^{M(N+2)+1} \to \mathbb{R}$. Taking into account that the neural network plays the role of a model of the nonlinearity $F(x)$, rewrite (7) as follows:

$$y_{\mathrm{mod}}(n) = \mathrm{NN}(x(n), w). \qquad (8)$$

The optimal value $w = w^{\circ}$ specified by the least-modulus criterion

$$w^{\circ} = \arg\min_{w}\, \max_{x \in X}\, |F(x) - \mathrm{NN}(x, w)| \qquad (9)$$

and also the discrepancy

$$e_{\mathrm{NN}}(x) = F(x) - \mathrm{NN}(x, w^{\circ})$$

between $F(x)$ and the output of its neural network model for the fixed $w = w^{\circ}$ corresponding to (8) are unknown.
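As a numerical illustration of the forward pass (2), (3), (5), a minimal sketch follows (this is not the authors' code; the array-shape conventions and function names are my own):

```python
import numpy as np

def sigmoid(s):
    # logistic activation function, eq. (3)
    return 1.0 / (1.0 + np.exp(-s))

def nn_output(x, W1, b1, w2, b2):
    """Two-layer network of eqs. (2) and (5): N inputs, M hidden
    sigmoid neurons, one linear output neuron.
    x: (N,) input vector; W1: (M, N) hidden weights; b1: (M,) hidden
    biases; w2: (M,) output weights; b2: scalar output bias."""
    y1 = sigmoid(W1 @ x + b1)   # hidden-layer outputs y_j^(1), eq. (2)
    return w2 @ y1 + b2         # output y^(2), eq. (5)
```

For instance, with $M = N = 1$ and zero hidden weight and bias, the hidden output is $\sigma(0) = 0.5$, so `nn_output` returns `0.5 * w2[0] + b2`.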
To adapt the neural network model to the uncertain system (1), the standard online gradient learning algorithm

$$w(n) = w(n-1) - \nu(n)\, \nabla_w Q(x(n), w(n-1)), \qquad (10)$$

taken, for example, from [5, 6], is usually utilized. In this algorithm, $\nabla_w Q(x(n), w(n-1))$ represents the gradient of the quadratic loss function

$$Q(x, w) = \tfrac{1}{2}\,[\,y - \mathrm{NN}(x, w)\,]^2 \qquad (11)$$

with respect to $w$ at $w = w(n-1)$ for given $x = x(n)$, and $\nu(n)$ is the learning rate (step size) of (10). Due to (11) we have

$$Q(x(n), w(n-1)) = \tfrac{1}{2}\,[\,y(n) - \mathrm{NN}(x(n), w(n-1))\,]^2 \qquad (12)$$

with the variable

$$e(n, w(n-1)) = y(n) - \mathrm{NN}(x(n), w(n-1)) \qquad (13)$$

representing the current model error, which can be measured at the $n$th time instant. Now, using (11)–(13), rewrite the learning algorithm (10) as follows:

$$w(n) = w(n-1) + \nu(n)\, e(n, w(n-1))\, \nabla_w \mathrm{NN}(x(n), w(n-1)). \qquad (14)$$

Thus, (2), (5), (7) and (14) describe the learning system necessary for the adaptive identification of (1). For a better understanding of its performance, the structure of this system is depicted in Fig. 1.
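One iteration of (14) for the scalar case $N = M = 1$ can be sketched as follows (a hedged illustration: the closed-form gradient of $\mathrm{NN}$ is written out by hand for the sigmoid (3), and all function names are my own):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def nn(x, w):
    # NN(x, w) for N = M = 1: w = [w11, b1, w1_out, b_out]
    return w[3] + w[2] * sigmoid(w[0] * x + w[1])

def grad_nn(x, w):
    # gradient of NN(x, w) with respect to w, using sigma' = sigma(1 - sigma)
    h = sigmoid(w[0] * x + w[1])
    dh = h * (1.0 - h)
    return np.array([w[2] * dh * x, w[2] * dh, h, 1.0])

def learn_step(w, x, y, nu):
    # one iteration of the gradient algorithm (14)
    e = y - nn(x, w)                 # model error e(n, w(n-1)), eq. (13)
    return w + nu * e * grad_nn(x, w)
```

For a small enough step size, one such step moves the model output toward the measured $y(n)$, i.e. it reduces the magnitude of the current model error.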
[Fig. 1 shows the configuration of the online learning system: the unknown nonlinear system (1) driven by $x(n)$ with additive noise $\xi(n)$ produces $y(n)$; the neural network model produces $y_{\mathrm{mod}}(n)$; their difference gives the error $e(n)$, which the learning algorithm uses to update $w(n)$.]

Fig. 1. Configuration of online learning system
2. Statement of the problem

Consider the special case where $F(x)$ can be approximated exactly by a neural network representation for all $x \in X$, implying

$$F(x) = \mathrm{NN}(x, w^{\circ}). \qquad (15)$$

In this case, called in [5, p. 304] the ideal case, one has $e(n, w^{\circ}) \equiv 0$ with $w^{\circ}$ given by (9) if only the noise $\xi(n)$ is absent. Note that this special case is similar to the so-called hypothesis of representation [6, p. 81] advanced by M. A. Aizerman, E. M. Braverman and L. I. Rozonoer in machine learning theory at the beginning of the 1960s.

Suppose $\{x(n)\}$ is an infinite sequence of vectors belonging to the bounded set $X$. The aim of this paper is to study the asymptotic properties of the learning procedure (14) caused by this $\{x(n)\}$. More precisely, the following problem is stated: it is required to derive the conditions under which $\{w(n)\}$ will converge in the sense that

$$\lim_{n \to \infty} w(n) = w^{\infty} \quad \text{with} \quad \|w^{\infty}\| < \infty. \qquad (16)$$
3. Preliminaries

First, recall that the condition

$$\nu(n)\, e(n, w(n-1))\, \nabla_w \mathrm{NN}(x(n), w(n-1)) \;\to\; 0 \quad \text{as } n \to \infty, \qquad (17)$$

which follows from (14), is necessary to achieve the limit (16) for a given $\{x(n)\}$ [6, sect. 3.13]. Since $\nabla_w \mathrm{NN}(x(n), w(n))$ is not identically zero, it can be observed that either condition 1 of the form

$$\nu(n) \equiv \mathrm{const}, \qquad e(n, w(n)) \to 0 \quad \text{as } n \to \infty,$$

or condition 2 of the form

$$\nu(n) \to 0 \quad \text{as } n \to \infty \quad \text{(while } e(n, w(n)) \text{ need not tend to zero)}$$

is required to satisfy (17). Note that condition 1 cannot take place if the noise $\xi(n)$ is present, because $e(n, w^{\circ}) = \xi(n)$ (due to (1), (13), (15)).
It turns out that in this special case the set $W^{\circ}$ containing these $w^{\circ}$'s is not a singleton [25, 26]. To show this, put $N = 1$, $M = 1$. Due to (6), this implies $w \in \mathbb{R}^4$. Let $w^{\circ} = [w_1^{\circ}, w_2^{\circ}, w_3^{\circ}, w_4^{\circ}]^{\mathrm{T}}$ be a vector satisfying (15). Then (2) and (5) together with (3) give that another vector $w^{\circ\circ} = [-w_1^{\circ}, -w_2^{\circ}, -w_3^{\circ}, w_3^{\circ} + w_4^{\circ}]^{\mathrm{T}}$ will also satisfy the equality (15); this follows from the identity $\sigma(-s) = 1 - \sigma(s)$ for the sigmoid (3).

Introduce the scalar variable $\|w - w^{\circ}\|^2$ representing the square of the Euclidean distance between $w$ and a $w^{\circ} \in W^{\circ}$, and define

$$V(w) = \inf_{w^{\circ} \in W^{\circ}} \|w - w^{\circ}\|^2. \qquad (18)$$
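The sign-flip symmetry behind $w^{\circ\circ}$ can be checked numerically with the two equivalent parameter vectors of Table 1 below (a sketch under my own naming conventions):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def nn(x, w):
    # NN(x, w) for N = M = 1: w = [w11, b1, w1_out, b_out]
    return w[3] + w[2] * sigmoid(w[0] * x + w[1])

# the two equivalent parameter vectors of Table 1
w_a = np.array([ 7.15,  1.65,  3.45, 0.30])
w_b = np.array([-7.15, -1.65, -3.45, 3.75])   # [-w1, -w2, -w3, w3 + w4]

# sigma(-s) = 1 - sigma(s) makes both vectors define the same function
xs = np.linspace(-1.0, 1.0, 201)
assert np.allclose(nn(xs, w_a), nn(xs, w_b))

def V(w):
    # V(w) of eq. (18) when W° consists of exactly these two points
    return min(np.sum((w - w_a) ** 2), np.sum((w - w_b) ** 2))

assert V(w_a) == 0.0 and V(w_b) == 0.0
```

The helper `V` makes concrete that the infimum in (18) is taken over a set with more than one element.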
Denote $V_n := V(w(n))$. Since $V_n \geq 0$ (due to (18)), it is clear that if

$$V_n \leq V_{n-1}, \qquad (19)$$

then the sequence $\{V_n\} := V_0, \dots, V_n, \dots$ always has a limit $V^{\infty}$ as $n$ tends to infinity, i.e.,

$$\lim_{n \to \infty} V_n = V^{\infty}, \qquad (20)$$

meaning that the algorithm (14) converges. On the other hand, the fact that $\{V_n\}$ is a monotonically non-increasing sequence is, in principle, not necessary to achieve (20). Note that the existence of the limit (20) does not imply that $V^{\infty} = 0$ even when the condition (15) is satisfied. Moreover, this limit may not exist if $\{x(n)\}$ is an arbitrary sequence leading to the violation of (19) [25]. Nevertheless, if the asymptotic property (16) takes place, then $\{w(n)\}$ converges to some $w^{\infty} \in \liminf W_n$, where

$$\liminf_{n \to \infty} W_n := \bigcup_{n \geq 1} \bigcap_{k \geq n} W_k \qquad (21)$$

denotes the so-called limit set introduced in [24, sect. 1.3], in which

$$W_n := \{\, w : y(n) - \mathrm{NN}(x(n), w) = 0 \,\}.$$

Note that the limit set $\liminf W_n$ given by (21) represents a nonlinear manifold in $\mathbb{R}^{M(N+2)+1}$ whose dimension satisfies $0 \leq \dim \liminf W_n \leq M(N+2)$.

It can be understood that the algorithm (14) "attempts" to solve the infinite set of equations

$$y(n) - \mathrm{NN}(x(n), w) = 0, \qquad n = 1, 2, \dots, \qquad (22)$$

with respect to the unknown $w \in \mathbb{R}^{M(N+2)+1}$. In fact, this algorithm may give a solution $w \neq w^{\circ}$ of the remainder of (22), determined as the limit set (21) but not as $W^{\circ}$.
It was observed that the condition (19), meaning that $\{V_n\}$ is a monotonically non-increasing sequence, may not be satisfied, in general, if the neural network model contains a hidden layer.

To demonstrate some asymptotic properties of (14), two simulation experiments with the scalar nonlinear system (1) having the nonlinearity

$$F(x) = \frac{3.75 + 0.05\,\exp(-7.15\,x)}{1 + 0.19\,\exp(-7.15\,x)}$$

were conducted. It can be shown that this nonlinearity can be approximated exactly by the two-layer neural network model described by (2), (3), (5) and (7), with the components of the two vectors $w^{\circ(1)}, w^{\circ(2)} \in W^{\circ}$ summarized in Table 1.
Table 1. Parameters of neural network model

| Exp. No | Parameter | $w_{11}^{(1)}$ | $b_1^{(1)}$ | $w_1^{(2)}$ | $b^{(2)}$ |
|---|---|---|---|---|---|
| 1, 2 | Components of $w^{\circ(1)}$ | 7.15 | 1.65 | 3.45 | 0.3 |
| 1, 2 | Components of $w^{\circ(2)}$ | −7.15 | −1.65 | −3.45 | 3.75 |
| 1 | Initial estimate | 0.53 | −0.50 | −0.92 | 1.04 |
| 1 | Final estimate | 5.41 | 1.32 | 3.82 | −0.05 |
| 2 | Initial estimate | 0.38 | −0.57 | −0.98 | 1.14 |
| 2 | Final estimate | −5.13 | −1.52 | −4.20 | 3.78 |
In all of the experiments, the learning rate was taken as $\nu(n) \equiv 0.01$. In these experiments, $\{x(n)\}$ was generated as a sequence of independent identically distributed (i.i.d.) pseudorandom numbers on $X = [-1.0, 1.0]$. The duration of the learning processes was always equal to 40 000 steps.

Simulation results of the first and second experiments are presented in Fig. 2, left and right, respectively. Here $V_n^{(1)}$ and $V_n^{(2)}$ denote the squared distances from $w(n)$ to $w^{\circ(1)}$ and $w^{\circ(2)}$, respectively. The initial estimate $w(0)$ in both examples was chosen so that the distance between $w(0)$ and $W^{\circ}$ was large enough and the two distances $V^{(1)}(w(0))$ and $V^{(2)}(w(0))$ were close to each other. It was observed that at an initial stage of the learning process $\{V_n^{(1)}\}$ was increasing, with $V_n^{(1)} > V_n^{(2)}$ for several $n = 1, 2, \dots$, as shown in Fig. 2, left. Further, $\{V_n^{(1)}\}$ became decreasing. Such a behavior of these sequences led to the feature that $V_n^{(1)} < V_n^{(2)}$ for all sufficiently large $n$.

In the second example, the initial $w(0)$ was chosen to be close to that of the first example. One can observe that in this case, on the contrary, $V_n^{(2)} < V_n^{(1)}$ for all sufficiently large $n$ (see Fig. 2, right).

[Fig. 2: evolution of $V_n^{(1)}$ and $V_n^{(2)}$ over the learning process in the two examples.]

Fig. 2. Behavior of gradient learning algorithm (14) in Examples 1 (left) and 2 (right) in the absence of noise
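Experiment 1 can be re-run approximately with the sketch below. The rational form of $F$ is my reconstruction from Table 1 (so it is only approximately representable by the network), and the random seed and bookkeeping are my own choices:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def nn(x, w):
    # NN(x, w) for N = M = 1: w = [w11, b1, w1_out, b_out]
    return w[3] + w[2] * sigmoid(w[0] * x + w[1])

def grad_nn(x, w):
    h = sigmoid(w[0] * x + w[1])
    dh = h * (1.0 - h)
    return np.array([w[2] * dh * x, w[2] * dh, h, 1.0])

def F(x):
    # nonlinearity of the experiments, as reconstructed here from Table 1
    return (3.75 + 0.05 * np.exp(-7.15 * x)) / (1.0 + 0.19 * np.exp(-7.15 * x))

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 201)

w = np.array([0.53, -0.50, -0.92, 1.04])          # initial estimate of Exp. 1
mse_init = np.mean((F(grid) - nn(grid, w)) ** 2)

nu = 0.01                                          # constant learning rate
for n in range(40_000):                            # 40 000 steps, as in the paper
    x = rng.uniform(-1.0, 1.0)                     # i.i.d. inputs on X = [-1, 1]
    e = F(x) - nn(x, w)                            # noise-free error, eq. (13)
    w += nu * e * grad_nn(x, w)                    # update (14)

mse_final = np.mean((F(grid) - nn(grid, w)) ** 2)
```

Comparing `mse_init` and `mse_final` tracks the learning progress; the squared distances to $w^{\circ(1)}$ and $w^{\circ(2)}$ can be logged per step in the same loop to reproduce curves like those in Fig. 2.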
It turned out that in these simulation examples the condition (19) is not satisfied, whereas the learning algorithm (14) nevertheless remains convergent. Thus, additional assumptions with respect to $\{x(n)\}$ are required to guarantee the convergence of $\{w(n)\}$.
4. Local and global convergence results

Assumption 1. $\{x(n)\}$ is a sequence of vectors appearing randomly in accordance with some probability density function $p(x)$ such that

$$\int_X p(x)\,dx = 1.$$

Furthermore, $p(x)$ has the following properties:

$$P\{x(n) \in X'\} := \int_{X'} p(x)\,dx > 0$$

for any subset $X' \subseteq X$ whose dimension is $N$, and

$$P\{x(n) \in X'\} := \int_{X'} p(x)\,dx = 0$$

if $\dim X' < N$, where $P\{\cdot\}$ denotes the probability of the corresponding random event.

Assumption 2. It is assumed that $p(x)$ represents a continuous function which may become zero only at some isolated points of $X$.

Assumption 3. The noise is absent, i.e., $\xi(n) \equiv 0$. In this case, $Q(x, w)$ defined in (11) becomes

$$Q(x, w) = \tfrac{1}{2}\,[\,F(x) - \mathrm{NN}(x, w)\,]^2. \qquad (23)$$
Introduce the performance index

$$J(w) = E\{Q(x, w)\}, \qquad (24)$$

which evaluates the quality of the learning process with $Q(x, w)$ given by (23). In this expression,

$$E\{Q(x, w)\} := \tfrac{1}{2} \int_X [\,F(x) - \mathrm{NN}(x, w)\,]^2\, p(x)\,dx$$

denotes the expectation of $Q(x, w)$ with respect to the random $x$'s.
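The performance index (24) can be estimated by Monte Carlo sampling when $p(x)$ is uniform (a sketch with my own function names; the uniform density is an assumption matching the experiments above):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def nn(x, w):
    return w[3] + w[2] * sigmoid(w[0] * x + w[1])

def J_estimate(F, w, rng, n_samples=100_000):
    """Monte Carlo estimate of J(w) = E{Q(x, w)}, eq. (24), for x
    uniform on X = [-1, 1] (i.e. p(x) = 1/2 on X, zero elsewhere)."""
    x = rng.uniform(-1.0, 1.0, n_samples)
    q = 0.5 * (F(x) - nn(x, w)) ** 2          # Q(x, w) of eq. (23)
    return q.mean()

rng = np.random.default_rng(1)
w_star = np.array([7.15, 1.65, 3.45, 0.30])
F = lambda x: nn(x, w_star)                    # exactly representable case (15)

assert J_estimate(F, w_star, rng) == 0.0       # J vanishes on W°
assert J_estimate(F, w_star + 0.5, rng) > 0.0  # and is positive off W°
```

The two assertions mirror the role of $J(w)$ as a quality measure: it vanishes precisely when the model reproduces $F$ on $X$.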
Let $W_{\varepsilon}(w^{\circ})$ denote an $\varepsilon$-neighborhood of some $w^{\circ} \in W^{\circ}$ defined as $W_{\varepsilon}(w^{\circ}) := \{\, w : \|w - w^{\circ}\| \leq \varepsilon \,\}$, which does not contain other points of $W^{\circ}$. Suppose that:

a) the assumptions 1–3 are valid;

b) the condition

$$\int_X [\,\mathrm{NN}(x, w) - \mathrm{NN}(x, w^{\circ})\,]\,[\nabla_w \mathrm{NN}(x, w)]^{\mathrm{T}} (w - w^{\circ})\, p(x)\,dx \;\geq\; \kappa \int_X [\,\mathrm{NN}(x, w) - \mathrm{NN}(x, w^{\circ})\,]^2\, \|\nabla_w \mathrm{NN}(x, w)\|^2\, p(x)\,dx, \qquad (25)$$

meaning

$$E\{[\,\mathrm{NN}(x, w) - \mathrm{NN}(x, w^{\circ})\,]\,[\nabla_w \mathrm{NN}(x, w)]^{\mathrm{T}} (w - w^{\circ})\} \;\geq\; \kappa\, E\{[\,\mathrm{NN}(x, w) - \mathrm{NN}(x, w^{\circ})\,]^2\, \|\nabla_w \mathrm{NN}(x, w)\|^2\},$$

is satisfied with some constant $\kappa > 0$ for any $w \in W_{\varepsilon}(w^{\circ})$;

c) the initial $w(0)$ satisfies $w(0) \in W_{\varepsilon}(w^{\circ})$.
In the work [25] it has been established that, under the conditions a)–c), the limit of $\{w(n)\}$ exists almost surely (a.s.) as $n$ approaches infinity, i.e.,

$$\lim_{n \to \infty} w(n) = w^{\circ} \quad \text{a.s.} \qquad (26)$$

with some $w^{\circ} \in W^{\circ}$, if the step size $\nu(n)$ is chosen as $\nu(n) \equiv \nu$, where

$$0 < \nu < 2\kappa. \qquad (27)$$

By virtue of (15), the property (26) yields

$$\lim_{n \to \infty} J(w(n)) = 0 \quad \text{a.s.} \qquad (28)$$

The proof of the probabilistic convergence of $\{w(n)\}$ caused by the learning algorithm (14) with a constant $\nu$ satisfying (27) essentially utilizes Doob's martingale convergence theorem [24], after establishing the fact that, under the condition (25), the random sequence $\{V_n\}$ is a supermartingale defined by

$$E\{V_n \mid w(n-1), \dots, w(0)\} \leq V_{n-1} \qquad (29)$$

with

$$V_n = \inf_{w^{\circ} \in W^{\circ}} \|w(n) - w^{\circ}\|^2, \qquad (30)$$

and also takes into account the Borel–Cantelli lemma [24, sect. 15.3]. (In expression (29), $E\{V_n \mid \cdot\}$ denotes the conditional expectation of $V_n$.)
The conditions given above are only sufficient conditions guaranteeing the local convergence of (14) with probability 1. Since the condition b) requires some computational effort for its verification, whereas the condition c) cannot be verified before starting the learning algorithm, these local convergence results are mainly of mathematical interest.
Comparing (29) and (19), we notice that the variable $V_n$ given by (30) is a peculiar stochastic counterpart of a Lyapunov function for (14), provided $w(n) \in W_{\varepsilon}(w^{\circ})$ is guaranteed.

At first sight, it seems that $V(w)$ defined in (18) might be exploited as a Lyapunov function for analyzing the asymptotic behavior of (14) in a stochastic framework for any $w(0) \in \mathbb{R}^{M(N+2)+1}$. In fact, by definition, $V(w)$ has the property

$$V(w) = 0 \ \text{ if } w \in W^{\circ} \quad \text{and} \quad V(w) > 0 \ \text{ if } w \notin W^{\circ}. \qquad (31)$$

However, the requirement

$$\|\nabla V(w) - \nabla V(w')\| \leq L\, \|w - w'\| \qquad (32)$$

with a Lipschitz constant $L > 0$, advanced in [27], is not satisfied for arbitrary $w, w'$ from $\mathbb{R}^{M(N+2)+1}$. Thus, $V(w)$ having the form (18) is indeed not admissible for studying the global convergence properties of (14) based on the results of [27].
In [26] it has been derived that the limit (26) will be achieved for an arbitrary initial $w(0)$ if the assumptions 1–3 made above hold and, instead of (27), the learning rate $\nu(n) \equiv \nu$ is chosen as

$$0 < \nu < 2\mu / L \qquad (33)$$

with

$$\mu := \inf_{w \notin W^{\circ}} \frac{\|\nabla_w E\{Q(x, w)\}\|^2}{E\{Q(x, w)\}} > 0, \qquad L := \sup_{w \notin W^{\circ}} \frac{E\{\|\nabla_w Q(x, w)\|^2\}}{E\{Q(x, w)\}}.$$

The proof of this result, establishing the conditions for the global probabilistic convergence of the learning algorithm (14), utilizes Theorem 3′ of [27] after replacing $V(w)$ of the form (18) by

$$V(w) = E\{Q(x, w)\}. \qquad (34)$$
Now, let the noise $\xi(n)$ be present. Then the requirement (33) needs to be replaced by another requirement under which $\nu(n) \to 0$ as $n$ tends to $\infty$. It can be shown that, using the same Lyapunov function as in (34) and exploiting Theorem 3 of [27], the convergence properties (26), (28) will be ensured if $\{\nu(n)\}$ satisfies the standard requirements

$$\sum_{n=0}^{\infty} \nu(n) = \infty, \qquad \sum_{n=0}^{\infty} \nu(n)^2 < \infty, \qquad (35)$$

arising first in [6]. (Notice that (35) is satisfied if $\nu(n) \propto n^{-\lambda}$ with $1/2 < \lambda \leq 1$.)
To illustrate the asymptotic behavior of the algorithm (14) in the presence of noise, we conducted the same simulation experiment 1 but with $\xi(n) \neq 0$. Namely, $\{\xi(n)\}$ was chosen as a pseudorandom i.i.d. sequence in the range $[-0.05, 0.05]$. Two separate simulations were conducted. In the first simulation, the learning rate was chosen as $\nu(n) \equiv 0.01$, whereas in the second simulation, $\nu(n) = n^{-0.51}$ was taken.

Results of these simulation experiments are presented in Fig. 3, which plots $E\{Q(x, w(n))\}$ against $n$.

[Fig. 3: $E\{Q(x, w(n))\}$ versus $n$ for $0 \leq n \leq 5000$ in the two simulations.]

Fig. 3. Behavior of learning algorithm (14) with noise in Example 1: a) $\nu \equiv 0.01$; b) $\nu(n) = n^{-0.51}$

It is seen that in the first case, where the learning rate remains constant, oscillations of $E\{Q(x, w(n))\}$ are observed (Fig. 3, a). In the second case, with the diminishing learning rate, this variable converges to zero as $n$ becomes large enough; see Fig. 3, b.
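The two noisy runs can be sketched as follows. Here the ideal case (15) is enforced by taking $F$ to be the network itself, and the step count is reduced for brevity; the seed and helper names are my own choices:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def nn(x, w):
    return w[3] + w[2] * sigmoid(w[0] * x + w[1])

def grad_nn(x, w):
    h = sigmoid(w[0] * x + w[1])
    dh = h * (1.0 - h)
    return np.array([w[2] * dh * x, w[2] * dh, h, 1.0])

w_star = np.array([7.15, 1.65, 3.45, 0.30])
F = lambda x: nn(x, w_star)                     # ideal case (15): F is the network itself

def run(nu_of_n, steps=5_000, seed=2):
    rng = np.random.default_rng(seed)
    w = np.array([0.53, -0.50, -0.92, 1.04])    # initial estimate of Exp. 1
    for n in range(1, steps + 1):
        x = rng.uniform(-1.0, 1.0)
        y = F(x) + rng.uniform(-0.05, 0.05)     # additive noise xi(n) in [-0.05, 0.05]
        e = y - nn(x, w)                        # model error, eq. (13)
        w = w + nu_of_n(n) * e * grad_nn(x, w)  # update (14)
    return w

w_const = run(lambda n: 0.01)                   # constant rate: residual fluctuations
w_decay = run(lambda n: n ** (-0.51))           # nu(n) = n^(-0.51), satisfying (35)
```

Logging $\tfrac{1}{2}[F(x) - \mathrm{NN}(x, w(n))]^2$ averaged over a grid at each step reproduces curves of the kind shown in Fig. 3.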
Conclusion. The main contribution of this paper consists in the theoretical and experimental study of the asymptotic properties of standard online gradient algorithms applicable to learning neural networks in a stochastic framework. Namely, sufficient conditions for the global convergence of these algorithms have been established. It was shown that adding a penalty term to the current error function is not necessary to guarantee their convergence properties.
References
1. Ivakhnenko A.G., Stepashko V.S. Noise immunity of modelling (Помехоустойчивость моделирования). – Kyiv: Naukova Dumka, 1985. – 216 p. (in Russian).
2. Suykens J., Moor B.D. Nonlinear system identification using multilayer neural networks: some ideas for initial weights, number of hidden neurons and error criteria // Proc. 12th IFAC World Congress (Sydney, Australia, July 1993). – 1993. – vol. 3. – P. 49–52.
3. Kosmatopoulos E.S., Polycarpou M.M., Christodoulou M.A., Ioannou P.A.
High-order neural network structures for identification of dynamical systems // IEEE
Trans. on Neural Networks. – 1995. – vol. 6. – P. 422–431.
4. Levin A.U., Narendra K.S. Recursive identification using feedforward neural networks // Int. J. Contr. – 1995. – vol. 61. – P. 533–547.
5. Tsypkin Ya.Z., Mason J.D., Avedyan E.D., Warwick K., Levin I. K. Neural
networks for identification of nonlinear systems under random piecewise polynomial
disturbances // IEEE Trans. on Neural Networks. – 1999. – vol. 10. – P. 303–311.
6. Tsypkin Ya.Z. Adaptation and learning in automatic systems. – New York: Academic Press. – 1971. – 291 p.
7. White H. Some asymptotic results for learning in single hidden-layer neural
network models // J. Amer. Statist. Assoc. – 1987. – vol. 84. – P. 117–134.
8. Behera L., Kumar S., Patnaik A. On adaptive learning rate that guarantees
convergence in feedforward networks // IEEE Trans. on Neural Networks. – 2006. –
vol. 17. – P. 1116–1125.
9. Kuan C. M., Hornik K. Convergence of learning algorithms with constant
learning rates // Ibid. – 1991. – vol. 2. – P. 484 – 489.
10. Luo Z. On the convergence of the LMS algorithm with adaptive learning rate
for linear feedforward networks // Neural Comput. – 1991. – vol. 3. – P. 226–245.
11. Finnoff W. Diffusion approximations for the constant learning rate backpropagation algorithm and resistance to local minima // Ibid. – 1994. – vol. 6. – P. 285–295.
12. Gaivoronski A.A. Convergence properties of backpropagation for neural nets
via theory of stochastic gradient methods // Optim. Methods Software. – 1994. – 4. –
P. 117–134.
13. Fine T.L., Mukherjee S. Parameter convergence and learning curves for neural
networks // Neural Comput. – 1999. – 11. – P. 749–769.
14. Tadic V., Stankovic S. Learning in neural networks by normalized stochastic gradient algorithm: Local convergence // Proc. 5th Seminar Neural Netw. Appl. Electr. Eng. (Yugoslavia, Sept. 2000). – 2000. – P. 11–17.
15. Zhang H., Wu W., Liu F., Yao M. Boundedness and convergence of online
gradient method with penalty for feedforward neural networks // IEEE Trans. on
Neural Networks. – 2009. – vol. 20. – P. 1050–1054.
16. Mangasarian O.L., Solodov M.V. Serial and parallel backpropagation convergence via nonmonotone perturbed minimization // Optim. Methods Software. – 1994. – P. 103–106.
17. Wu W., Feng G., Li X. Training multilayer perceptrons via minimization of
ridge functions // Advances in Comput. Mathematics. – 2002. – vol. 17. – P. 331–
347.
18. Zhang N., Wu W., Zheng G. Convergence of gradient method with momentum for two-layer feedforward neural networks // IEEE Trans. on Neural Networks. – 2006. – vol. 17. – P. 522–525.
19. Wu W., Feng G., Li X., Xu Y. Deterministic convergence of an online gradient method for BP neural networks // Ibid. – 2005. – vol. 16. – P. 1–9.
20. Xu Z.B., Zhang R., Jing W.F. When does online BP training converge? // Ibid.
– 2009. – vol. 20. – P. 1529–1539.
21. Shao H., Wu W., Liu L. Convergence and monotonicity of an online gradient
method with penalty for neural networks // WSEAS Trans. Math. – 2007. – vol. 6. –
P. 469–476.
22. Ellacott S.W. The numerical analysis approach // Mathematical Approaches to
Neural Networks (J.G. Taylor, ed; B.V.: Elsevier Science Publisher). – 1993. – P.
103–137.
23. Skantze F.P., Kojic A., Loh A.P., Annaswamy A.M. Adaptive estimation of
discrete time systems with nonlinear parameterization // Automatica. – 2000. –
vol. 36. – P. 1879–1887.
24. Loeve M. Probability theory. – New York: Springer-Verlag. – 1963. – 425 p.
25. Zhiteckii L.S., Azarskov V.N., Nikolaienko S.A. Convergence of learning algorithms in neural networks for adaptive identification of nonlinearly parameterized systems // Proc. 16th IFAC Symposium on System Identification (Brussels, Belgium). – 2012. – P. 1593–1598.
26. Azarskov V.N., Kucherov D.P., Nikolaienko S.A., Zhiteckii L.S. Asymptotic behaviour of gradient learning algorithms in neural network models for the identification of nonlinear systems // American Journal of Neural Networks and Applications. – 2015. – No 1(1). – P. 1–10.
27. Polyak B.T. Convergence and convergence rate of iterative stochastic algorithms, I: General case // Autom. Remote Control. – 1976. – vol. 12. – P. 1858–1868.