Symmetry, Integrability and Geometry: Methods and Applications Vol. 2 (2006), Paper 039, 21 pages
Combined Reduced-Rank Transform
Anatoli TOROKHTI and Phil HOWLETT
School of Mathematics and Statistics, University of South Australia, Australia
E-mail: anatoli.torokhti@unisa.edu.au, phil.howlett@unisa.edu.au
URL: http://people.unisa.edu.au/Anatoli.Torokhti
http://people.unisa.edu.au/Phil.Howlett
Received November 25, 2005, in final form March 22, 2006; Published online April 07, 2006
Original article is available at http://www.emis.de/journals/SIGMA/2006/Paper039/
Abstract. We propose and justify a new approach to constructing optimal nonlinear transforms of random vectors. We show that the proposed transform improves such characteristics of rank-reduced transforms as compression ratio and accuracy of decompression, and reduces the required computational work. The proposed transform $\mathcal{T}_p$ is presented in the form of a sum with $p$ terms, where each term is interpreted as a particular rank-reduced transform. Moreover, the terms in $\mathcal{T}_p$ are represented as a combination of three operations $\mathcal{F}_k$, $\mathcal{Q}_k$ and $\varphi_k$ with $k=1,\dots,p$. The prime idea is to determine $\mathcal{F}_k$ separately, for each $k=1,\dots,p$, from an associated rank-constrained minimization problem similar to that used in the Karhunen–Loève transform. The operations $\mathcal{Q}_k$ and $\varphi_k$ are auxiliary for finding $\mathcal{F}_k$. The contribution of each term in $\mathcal{T}_p$ improves the entire transform performance. A corresponding unconstrained nonlinear optimal transform is also considered. Such a transform is important in its own right because it is treated as an optimal filter without signal compression. A rigorous analysis of the errors associated with the proposed transforms is given.
Key words: best approximation; Fourier series in Hilbert space; matrix computation
2000 Mathematics Subject Classification: 41A29
1 Introduction
Methods of data dimensionality reduction [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24] have been applied successfully to many practical problems. The diversity of
applications has stimulated a considerable increase in the study of data dimensionality reduction
in recent decades. Significant recent results in this challenging research area are described, in
particular, in references [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24].
The known methods concern both a probabilistic setting (as in [5, 6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]) and a deterministic setting (as in [12, 13, 14, 15]) of the dimensionality reduction. The associated techniques are often based on the use of reduced-rank
operators.
In this paper, a further advance in the development of reduced-rank transforms is presented.
We study a new approach to data dimensionality reduction in a probabilistic setting based on
the development of ideas presented in [5, 6, 7, 26, 27, 28, 29].
Motivation for the proposed approach arises from the following observation. In general, the reduced-rank transform consists of three companion operations: filtering, compression and reconstruction [5, 6, 7, 16, 26]. Filtering and compression are performed simultaneously to estimate a reference signal $x$ with $m$ components from noisy observable data $y$, and to filter and reduce the data to a shorter vector $\hat{x}$ with $\eta$ components, $\eta < m$. Components of $\hat{x}$ are often called principal components [4]. The quotient $\eta/m$ is called the compression ratio. Reconstruction returns a vector $\tilde{x}$ with $m$ components, so that $\tilde{x}$ should be close to the original $x$. It is natural to perform these three operations so that the reconstruction error and the related computational burden are minimal.
As a result, the performance of the reduced-rank transform is characterized by three issues
which are (i) associated accuracy, (ii) compression ratio, and (iii) computational work.
For a given compression ratio, the Karhunen–Loève transform (KLT) [5, 6, 7] minimizes the
reconstruction error over the class of all linear reduced-rank transforms. Nevertheless, it may
happen that the accuracy and compression ratio associated with the KLT are still not satisfac-
tory. In such a case, an improvement in the accuracy and compression ratio can be achieved by
a transform with a more general structure than that of the KLT. Special non-linear transforms
have been studied in [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]
using transform structures developed from the generalised Volterra polynomials. Nevertheless,
the transforms [16, 26, 27, 28, 29] imply a substantial computational burden associated with the
large number N of terms required by the underlying Volterra polynomial structure.
Our objective is to justify a new transform that may have both accuracy and compression
ratio better than those of the known transforms [5, 6, 7, 26, 27, 28, 29]. A related objective is
to find a way to reduce the associated computational work compared with that implied by the
transforms [26, 27, 28, 29]. The analysis of these issues is given in Sections 4, 5.2.2 (Remark 4),
5.2.3 and 5.2.4.
In Section 5.2.5, we show that the proposed approach generalizes the Fourier series in Hilbert
space, the Wiener filter, the Karhunen–Loève transform and the transforms given in [26, 27, 29].
2 Method description
We use the following notation:
$(\Omega,\Sigma,\mu)$ is a probability space, where $\Omega=\{\omega\}$ is the set of outcomes, $\Sigma$ a $\sigma$-field of measurable subsets of $\Omega$ and $\mu:\Sigma\to[0,1]$ an associated probability measure on $\Sigma$ with $\mu(\Omega)=1$; $x\in L^2(\Omega,\mathbb{R}^m)$ and $y\in L^2(\Omega,\mathbb{R}^n)$ are random vectors with realizations $x=x(\omega)\in\mathbb{R}^m$ and $y=y(\omega)\in\mathbb{R}^n$, respectively.
Each matrix $M\in\mathbb{R}^{m\times n}$ defines a bounded linear transformation $\mathcal{M}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ via the formula $[\mathcal{M}y](\omega)=My(\omega)$ for each $\omega\in\Omega$. We note that there are many bounded linear transformations from $L^2(\Omega,\mathbb{R}^n)$ into $L^2(\Omega,\mathbb{R}^m)$ that cannot be written in the form $[\mathcal{M}y](\omega)=My(\omega)$ for each $\omega\in\Omega$. A trivial example is $\mathcal{A}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ given by $\mathcal{A}(y)=\int_\Omega y(\omega)\,d\mu(\omega)$.
Throughout the paper, calligraphic letters denote operators defined similarly to $\mathcal{M}$.
Let $g=[g_1\ \dots\ g_m]^T\in L^2(\Omega,\mathbb{R}^m)$ and $h=[h_1\ \dots\ h_n]^T\in L^2(\Omega,\mathbb{R}^n)$ be random vectors with $g_i,h_k\in L^2(\Omega,\mathbb{R})$ for $i=1,\dots,m$ and $k=1,\dots,n$. For all $i=1,\dots,m$ and $k=1,\dots,n$, we set
$$E[g_i]=\int_\Omega g_i(\omega)\,d\mu(\omega),\qquad E[g_ih_k]=\int_\Omega g_i(\omega)h_k(\omega)\,d\mu(\omega), \tag{1}$$
$$E_{gh}=E\big[gh^T\big]=\{E[g_ih_k]\}\in\mathbb{R}^{m\times n}\qquad\text{and}\qquad E_g=E[g]=\{E[g_i]\}\in\mathbb{R}^m. \tag{2}$$
For the centred vectors we also write
$$E_{gh}=E\big[(g-E_g)(h-E_h)^T\big]=E\big[gh^T\big]-E[g]E\big[h^T\big].$$
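In practice the moments (1)–(2) are estimated from samples. As a minimal illustration (not part of the paper's formal development), the following numpy sketch computes the sample analogues of $E_{gh}$ and of the centred covariance; the function name and the columnwise sample layout are our own assumptions.

```python
import numpy as np

def sample_moments(G, H):
    """Sample estimates of E_gh = E[g h^T] of (2) and of the centred
    covariance E_gh - E[g] E[h^T], from q realizations stored columnwise:
    G is m x q (realizations of g), H is n x q (realizations of h)."""
    q = G.shape[1]
    E_gh = G @ H.T / q                    # estimate of E[g h^T]
    mg = G.mean(axis=1, keepdims=True)    # estimate of E[g]
    mh = H.mean(axis=1, keepdims=True)    # estimate of E[h]
    return E_gh, E_gh - mg @ mh.T         # raw moment, centred covariance
```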
Achievement of the above objectives is based on the presentation of the proposed transform in the form of a sum with $p$ terms (3), where each term is interpreted as a particular rank-reduced transform. Moreover, the terms in (3) are represented as a combination of three operations $\mathcal{F}_k$, $\mathcal{Q}_k$ and $\varphi_k$ for each $k=1,\dots,p$, where $\varphi_k$ is nonlinear. The prime idea is to determine $\mathcal{F}_k$ separately, for each $k=1,\dots,p$, from an associated rank-constrained minimization problem similar to that in the KLT. The operations $\mathcal{Q}_k$ and $\varphi_k$ are auxiliary for finding $\mathcal{F}_k$. It is natural to expect that a contribution of each term in (3) will improve the entire transform performance.

To realize such a scheme, we choose the $\mathcal{Q}_k$ as orthogonal/orthonormal operators (see Section 3). Then each $\mathcal{F}_k$ can be determined independently from the individual problem (33) or (56) below. Next, the operators $\varphi_k$ are used to reduce the number of terms from $N$ (as in [16, 26, 27, 28, 29]) to $p$ with $p\ll N$. For example, this can be done when we choose $\varphi_k$ in the form presented in Section 5.2.4. Moreover, the composition of the operators $\mathcal{Q}_k$ and $\varphi_k$ allows us to reduce the related covariance matrices to the identity matrix or to a block-diagonal form with small blocks. Remark 4 in Section 5.2.2 gives more details in this regard. The computational work associated with such blocks is much less than that for the large covariance matrices in [16, 26, 27, 28, 29].
To regulate accuracy associated with the proposed transform and its compression ratio, we
formulate the problem in the form (6)–(7) where (7) consists of p constraints. It is shown
in Remark 2 of Section 4, and in Sections 5.2.1, 5.2.2 and 5.2.4 that such a combination of
constraints allows us to equip the proposed transforms with several degrees of freedom.
The structure of our transform is presented in Section 3 and the formal statement of the
problem in Section 4. In Section 5, we determine operators Qk and Fk (Lemmata 1 and 3, and
Theorems 1 and 2, respectively).
3 Structure of the proposed transform
3.1 Generic form
The proposed transform $\mathcal{T}_p$ is presented in the form
$$\mathcal{T}_p(y)=f+\sum_{k=1}^{p}\mathcal{F}_k\mathcal{Q}_k\varphi_k(y)=f+\mathcal{F}_1\mathcal{Q}_1\varphi_1(y)+\cdots+\mathcal{F}_p\mathcal{Q}_p\varphi_p(y), \tag{3}$$
where $f\in\mathbb{R}^m$, $\varphi_k:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$, $\mathcal{Q}_1,\dots,\mathcal{Q}_p:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ and $\mathcal{F}_k:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$.
In general, one can put $x\in L^2(\Omega,H_X)$, $y\in L^2(\Omega,H_Y)$, $\varphi_k:L^2(\Omega,H_Y)\to L^2(\Omega,H_k)$, $\mathcal{Q}_k:L^2(\Omega,H_k)\to L^2(\Omega,\tilde{H}_k)$ and $\mathcal{F}_k:L^2(\Omega,\tilde{H}_k)\to L^2(\Omega,H_X)$ with $H_X$, $H_Y$, $H_k$ and $\tilde{H}_k$ separable Hilbert spaces, and $k=1,\dots,p$.
In (3), the vector $f$ and the operators $\mathcal{F}_1,\dots,\mathcal{F}_p$ are determined from the minimization problem (6)–(7) given in Section 4. The operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ in (3) are orthogonal (orthonormal) in the sense of Definition 1 in Section 4 (in this regard, see also Remark 3 in Section 5.1).
To demonstrate and justify the flexibility of the transform $\mathcal{T}_p$ with respect to the choice of $\varphi_1,\dots,\varphi_p$ in (3), we mainly study the case where $\varphi_1,\dots,\varphi_p$ are arbitrary. Specifications of $\varphi_1,\dots,\varphi_p$ are presented in Sections 3.2, 5.2.4 and 5.2.5, where we also discuss the benefits associated with some particular forms of $\varphi_1,\dots,\varphi_p$.
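For orientation, the generic form (3) amounts to the following evaluation scheme for a single realization $y\in\mathbb{R}^n$. This is a hedged sketch: `f`, the matrices in `F_list` and `Q_list` and the callables in `phi_list` are placeholders to be constructed by the methods of Sections 4 and 5, not objects defined in the paper.

```python
import numpy as np

def apply_Tp(y, f, F_list, Q_list, phi_list):
    """Evaluate T_p(y) = f + sum_k F_k Q_k phi_k(y), cf. (3), for one
    realization y.  F_list[k] and Q_list[k] are matrices, phi_list[k]
    is a callable R^n -> R^n; all are supplied by the user."""
    x_est = np.asarray(f, dtype=float).copy()
    for F, Q, phi in zip(F_list, Q_list, phi_list):
        x_est += F @ (Q @ phi(y))   # one rank-reduced term of the sum (3)
    return x_est
```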
3.2 Some particular cases
Particular cases of the model Tp are associated with specific choices of ϕk, Qk and Fk. Some
examples are given below.
(i) If $H_X=H_Y=\mathbb{R}^n$ and $H_k=\tilde{H}_k=\mathbb{R}^{n^k}$, where $\mathbb{R}^{n^k}$ is the $k$th degree of $\mathbb{R}^n$, then (3) generalises the known transform structures [16, 26, 27, 28, 29]. The models [16, 26, 27, 28, 29] follow from (3) if $\varphi_k(y)=y^k$, where $y^k=(y,\dots,y)\in L^2(\Omega,\mathbb{R}^{n^k})$, $\mathcal{Q}_k=\mathcal{I}$, where $\mathcal{I}$ is the identity operator, and if $\mathcal{F}_k$ is a $k$-linear operator. It has been shown in [16, 26, 27, 28, 29] that such a form of $\varphi_k$ leads to a significant improvement in the associated accuracy. See Section 5.2.5 for more details.
(ii) If $\varphi_k:L^2(\Omega,H_Y)\to L^2(\Omega,H_X)$ and $\{u_1,u_2,\dots\}$ is a basis in $L^2(\Omega,H_X)$, then $\varphi_k$ and $\mathcal{Q}_k$ can be chosen so that $\varphi_k(y)=u_k$ and $\mathcal{Q}_k=\mathcal{I}$, respectively. As a result, in this particular case,
$$\mathcal{T}_p(y)=f+\sum_{k=1}^{p}\mathcal{F}_k(u_k).$$
(iii) A similar case follows if $\varphi_k:L^2(\Omega,H_Y)\to L^2(\Omega,H_k)$ is arbitrary but $\mathcal{Q}_k:L^2(\Omega,H_k)\to L^2(\Omega,\tilde{H}_k)$ is defined so that $\mathcal{Q}_k[\varphi_k(y)]=v_k$ with $k=1,\dots,p$, where $\{v_1,v_2,\dots\}$ is a basis in $L^2(\Omega,\tilde{H}_k)$. Then
$$\mathcal{T}_p(y)=f+\sum_{k=1}^{p}\mathcal{F}_k(v_k).$$
(iv) Let $\tilde{x}^{(1)},\dots,\tilde{x}^{(p)}$ be estimates of $x$ by the known transforms [7, 25, 30]. Then we can put $\varphi_1(y)=\tilde{x}^{(1)},\ \dots,\ \varphi_p(y)=\tilde{x}^{(p)}$. In particular, one could choose $\varphi_1(y)=y$. In such a way, the vector $x$ is pre-estimated from $y$, and therefore the overall estimate of $x$ by $\mathcal{T}_p$ will be improved. A new recursive method for finding $\tilde{x}^{(1)},\dots,\tilde{x}^{(p)}$ is given in Section 5.2.4 below.
Other particular cases of the proposed transform are considered in Sections 5.2.4 and 5.2.5.
Remark 1. The particular case of Tp considered in the item (iii) above can be interpreted as
an operator form of the Fourier polynomial in Hilbert space [35]. The benefits associated with
the Fourier polynomials are well known. In item (ii) of Section 5.2.5, this case is considered in
more detail.
4 Statement of the problem
First, we define orthogonal and orthonormal operators as follows.
Definition 1. Let $u_k\in L^2(\Omega,\mathbb{R}^n)$ and $v_k=\mathcal{Q}_k(u_k)$. The operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are called pairwise orthonormal if
$$E_{v_iv_j}=\begin{cases}O, & i\neq j,\\ I, & i=j,\end{cases}$$
for any $i,j=1,\dots,p$. Here, $O$ and $I$ are the zero matrix and the identity matrix, respectively. If $E_{v_iv_j}=O$ for $i\neq j$ with $i,j=1,\dots,p$, and if $E_{v_iv_j}$ is not necessarily equal to $I$ for $i=j$, then $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are called pairwise orthogonal.
Hereinafter, we suppose that $\mathcal{F}_k$ is linear for all $k=1,\dots,p$ and that the Hilbert spaces are the finite-dimensional Euclidean spaces, $H_X=\mathbb{R}^m$ and $H_Y=H_k=\tilde{H}_k=\mathbb{R}^n$. For any vector $g\in L^2(\Omega,\mathbb{R}^m)$, we set
$$E[\|g\|^2]=\int_\Omega\|g(\omega)\|^2\,d\mu(\omega)<\infty, \tag{4}$$
where $\|g(\omega)\|$ is the Euclidean norm of $g(\omega)$.

Let us denote
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=E\big[\|x-\mathcal{T}_p(y)\|^2\big]. \tag{5}$$
The problem is

(i) to find operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ satisfying Definition 1, and

(ii) to determine the vector $f^0$ and operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ such that
$$J(f^0,\mathcal{F}_1^0,\dots,\mathcal{F}_p^0)=\min_{f,\mathcal{F}_1,\dots,\mathcal{F}_p}J(f,\mathcal{F}_1,\dots,\mathcal{F}_p) \tag{6}$$
subject to
$$\operatorname{rank}\mathcal{F}_1=\eta_1,\quad\dots,\quad\operatorname{rank}\mathcal{F}_p=\eta_p, \tag{7}$$
where $\eta_1+\cdots+\eta_p=\eta\leq\min\{m,n\}$.
Here, for $k=1,\dots,p$ (see, for example, [44]),
$$\operatorname{rank}(\mathcal{F}_k)=\dim\mathcal{F}_k\big(L^2(\Omega,\mathbb{R}^n)\big).$$
We write
$$\mathcal{T}_p^0(y)=f^0+\sum_{k=1}^{p}\mathcal{F}_k^0(v_k) \tag{8}$$
with $v_k$ defined by Definition 1.
It is supposed that the covariance matrices formed from the vectors $\mathcal{Q}_1\varphi_1(y),\dots,\mathcal{Q}_p\varphi_p(y)$ in (3) are known or can be estimated. Various estimation methods can be found in [36, 37, 38, 39, 40, 41]. We note that such an assumption is traditional [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] in the study of optimal transforms. The effective estimation of covariance matrices represents a specific task [36, 37, 38, 39, 40, 41] which is not considered in this paper.
Remark 2. Unlike known rank-constrained problems, we consider $p$ constraints (7). The number $p$ of the constraints and the ranks $\eta_1,\dots,\eta_p$ form the degrees of freedom for $\mathcal{T}_p^0$. Variation of $p$ and $\eta_1,\dots,\eta_p$ allows us to regulate the accuracy associated with the transform $\mathcal{T}_p^0$ (see (21) in Section 5.2.1 and (46) in Section 5.2.2) and its compression ratio (see (67) in Section 5.2.3). It follows from (21) and (46) that the accuracy increases if $p$ and $\eta_1,\dots,\eta_p$ increase. Conversely, by (67), the compression ratio is improved if $\eta_1,\dots,\eta_p$ decrease.
5 Solution of the problem
The problem (6)–(7) generalises the known rank-constrained problems where only one constraint has been considered. Our plan for the solution is as follows. First, in Section 5.1, we will determine the operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$. Then, in Section 5.2, we will obtain $f^0$ and $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6) and (7).
5.1 Determination of orthogonalizing operators Q1, . . . , Qp
If $M$ is a square matrix, then we write $M^{1/2}$ for a matrix such that $M^{1/2}M^{1/2}=M$. We note that the matrix $M^{1/2}$ can be computed in various ways [42]. In this paper, $M^{1/2}$ is determined from the singular value decomposition (SVD) [43] of $M$.

For the case when the matrix $E_{v_kv_k}$ is invertible for any $k=1,\dots,p$, the orthonormalization procedure is as follows. For $u_k\in L^2(\Omega,\mathbb{R}^n)$, we write
$$[\mathcal{Q}_k(u_k)](\omega)=Q_ku_k(\omega), \tag{9}$$
where $Q_k\in\mathbb{R}^{n\times n}$. For $u_k,v_j,w_j\in L^2(\Omega,\mathbb{R}^n)$, we also define operators $\mathcal{E}_{u_kv_j},\mathcal{E}_{v_jv_j}^{-1}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ by the equations
$$[\mathcal{E}_{u_kv_j}(w_j)](\omega)=E_{u_kv_j}w_j(\omega)\qquad\text{and}\qquad[\mathcal{E}_{v_jv_j}^{-1}(w_j)](\omega)=E_{v_jv_j}^{-1}w_j(\omega), \tag{10}$$
respectively.
Lemma 1. Let
$$w_1=u_1\qquad\text{and}\qquad w_i=u_i-\sum_{k=1}^{i-1}\mathcal{E}_{u_iw_k}\mathcal{E}_{w_kw_k}^{-1}(w_k)\quad\text{for}\quad i=1,\dots,p, \tag{11}$$
where $E_{w_kw_k}^{-1}$ exists. Then

(i) the vectors $w_1,\dots,w_p$ are pairwise orthogonal, and

(ii) the vectors $v_1,\dots,v_p$, defined by
$$v_i=\mathcal{Q}_i(u_i) \tag{12}$$
with
$$\mathcal{Q}_i(u_i)=\big(\mathcal{E}_{w_iw_i}^{1/2}\big)^{-1}(w_i) \tag{13}$$
for $i=1,\dots,p$, are pairwise orthonormal.

Proof. The proof is given in the Appendix. □
For the case when the matrix $E_{v_kv_k}$ is singular for $k=1,\dots,p$, the orthogonalizing operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are determined by Lemma 3 below. Another difference from Lemma 1 is that the vectors $v_1,\dots,v_p$ in Lemma 3 are pairwise orthogonal but not orthonormal. An intermediate result is given in Lemma 2.

The symbol $\dagger$ is used to denote the pseudo-inverse operator [45]. It is supposed that the pseudo-inverse $M^\dagger$ of a matrix $M$ is determined from the SVD of $M$.
Lemma 2 ([26]). For any random vectors $g\in L^2(\Omega,\mathbb{R}^m)$ and $h\in L^2(\Omega,\mathbb{R}^n)$,
$$E_{gh}E_{hh}^\dagger E_{hh}=E_{gh}. \tag{14}$$

Lemma 3. Let $v_i=\mathcal{Q}_i(u_i)$ for $i=1,\dots,p$, where $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are such that
$$\mathcal{Q}_1(u_1)=u_1\qquad\text{and}\qquad\mathcal{Q}_i(u_i)=u_i-\sum_{k=1}^{i-1}\mathcal{Z}_{ik}(v_k)\quad\text{for}\quad i=2,\dots,p \tag{15}$$
with $\mathcal{Z}_{ik}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ defined by
$$Z_{ik}=E_{u_iv_k}E_{v_kv_k}^\dagger+A_{ik}\big(I-E_{v_kv_k}E_{v_kv_k}^\dagger\big) \tag{16}$$
with $A_{ik}\in\mathbb{R}^{n\times n}$ arbitrary. Then the vectors $v_1,\dots,v_p$ are pairwise orthogonal.

Proof. The proof is given in the Appendix. □
We note that Lemma 3 does not require invertibility of the matrix $E_{v_kv_k}$. At the same time, if $E_{v_kv_k}^{-1}$ exists, then the vectors $w_1,\dots,w_p$ and $v_1,\dots,v_p$, defined by (11) and by Lemma 3 respectively, coincide.
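A sample-based sketch of the orthogonalization of Lemma 3 (with $A_{ik}=O$; the orthonormalization of Lemma 1 would additionally multiply each $w_i$ by $(E_{w_iw_i}^{1/2})^{-1}$). The routine operates on matrices of realizations, as discussed later in Section 6.1; the helper names are ours.

```python
import numpy as np
from numpy.linalg import pinv

def cov(A, B):
    """Centred sample covariance E_AB from realizations stored columnwise."""
    q = A.shape[1]
    Ac = A - A.mean(axis=1, keepdims=True)
    Bc = B - B.mean(axis=1, keepdims=True)
    return Ac @ Bc.T / q

def orthogonalize(U_list):
    """Lemma 3 with A_ik = O:  v_1 = u_1 and
    v_i = u_i - sum_{k<i} E_{u_i v_k} E_{v_k v_k}^† v_k,
    applied to (n x q) matrices of realizations."""
    V_list = []
    for U in U_list:
        V = U.astype(float).copy()
        for Vk in V_list:
            V -= cov(U, Vk) @ pinv(cov(Vk, Vk)) @ Vk  # remove correlation with v_k
        V_list.append(V)
    return V_list
```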
Remark 3. Orthogonalization of random vectors is not, of course, a new idea. In particular,
generalizations of the Gram–Schmidt orthogonalization procedure have been considered in [46,
47]. The proposed orthogonalization procedures in Lemmata 1 and 3 are different from those
in [46, 47]. In particular, Lemma 3 establishes the vector orthogonalization in terms of pseudo-
inverse operators. A particular case of the practical implementation of the random vector
orthogonalization is considered in Section 6.
5.2 Determination of $f^0$, $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6)–(7)
5.2.1 The case when the matrix $E_{v_iv_i}$ is invertible for $i=1,\dots,p$

We consider the simpler case when $E_{v_iv_i}$ is invertible for all $i=1,\dots,p$. Then the vector $f^0$ and the operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6)–(7) are defined by Theorem 1 below. For each $i=1,\dots,p$, let $U_i\Sigma_iV_i^T$ be the SVD of $E_{xv_i}$,
$$U_i\Sigma_iV_i^T=E_{xv_i}, \tag{17}$$
where $U_i\in\mathbb{R}^{m\times n}$, $V_i\in\mathbb{R}^{n\times n}$ are orthogonal and $\Sigma_i\in\mathbb{R}^{n\times n}$ is diagonal,
$$U_i=[s_{i1},\dots,s_{in}],\qquad V_i=[d_{i1},\dots,d_{in}]\qquad\text{and}\qquad\Sigma_i=\operatorname{diag}(\alpha_{i1},\dots,\alpha_{in}) \tag{18}$$
with $\alpha_{i1}\geq\cdots\geq\alpha_{ir}>0$, $\alpha_{i,r+1}=\cdots=\alpha_{in}=0$ and $r=1,\dots,n$, where $r=r(i)$. We set
$$U_{i\eta_i}=[s_{i1},\dots,s_{i\eta_i}],\qquad V_{i\eta_i}=[d_{i1},\dots,d_{i\eta_i}]\qquad\text{and}\qquad\Sigma_{i\eta_i}=\operatorname{diag}(\alpha_{i1},\dots,\alpha_{i\eta_i}),$$
where $U_{i\eta_i}\in\mathbb{R}^{m\times\eta_i}$, $V_{i\eta_i}\in\mathbb{R}^{n\times\eta_i}$ and $\Sigma_{i\eta_i}\in\mathbb{R}^{\eta_i\times\eta_i}$. Now we define $K_{i\eta_i}\in\mathbb{R}^{m\times n}$ and $\mathcal{K}_{i\eta_i}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ by
$$K_{i\eta_i}=U_{i\eta_i}\Sigma_{i\eta_i}V_{i\eta_i}^T\qquad\text{and}\qquad[\mathcal{K}_{i\eta_i}(w_i)](\omega)=K_{i\eta_i}[w_i(\omega)], \tag{19}$$
respectively, for any $w_i\in L^2(\Omega,\mathbb{R}^n)$.
Theorem 1. Let $v_1,\dots,v_p$ be determined by Lemma 1. Then the vector $f^0$ and the operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6)–(7) are determined by
$$f^0=E[x]-\sum_{k=1}^{p}F_k^0E[v_k]\qquad\text{and}\qquad\mathcal{F}_1^0=\mathcal{K}_{1\eta_1},\ \dots,\ \mathcal{F}_p^0=\mathcal{K}_{p\eta_p}. \tag{20}$$
The accuracy associated with the transform $\mathcal{T}_p^0$, determined by (8) and (20), is given by
$$E[\|x-\mathcal{T}_p^0(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\sum_{j=1}^{\eta_k}\alpha_{kj}^2. \tag{21}$$
Proof. The functional $J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)$ is written as
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=\operatorname{tr}\Bigg[E_{xx}-E[x]f^T-\sum_{i=1}^{p}E_{xv_i}F_i^T-fE[x^T]+ff^T+f\sum_{i=1}^{p}E[v_i^T]F_i^T$$
$$\qquad-\sum_{i=1}^{p}F_iE_{v_ix}+\sum_{i=1}^{p}F_iE[v_i]f^T+E\Bigg(\sum_{i=1}^{p}\mathcal{F}_i(v_i)\Bigg[\sum_{k=1}^{p}\mathcal{F}_k(v_k)\Bigg]^T\Bigg)\Bigg]. \tag{22}$$
We remind (see Section 2) that here and below $\mathcal{F}_i$ is defined by $[\mathcal{F}_i(v_i)](\omega)=F_i[v_i(\omega)]$, so that, for example, $E[\mathcal{F}_k(v_k)x^T]=F_kE_{v_kx}$. In other words, the right-hand side in (22) is a function of $f$, $F_1,\dots,F_p$.

Let us show that $J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)$ can be represented as
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=J_0+J_1+J_2, \tag{23}$$
where
$$J_0=\|E_{xx}^{1/2}\|^2-\sum_{i=1}^{p}\|E_{xv_i}\|^2, \tag{24}$$
$$J_1=\Big\|f-E[x]+\sum_{i=1}^{p}F_iE[v_i]\Big\|^2\qquad\text{and}\qquad J_2=\sum_{i=1}^{p}\|F_i-E_{xv_i}\|^2. \tag{25}$$
Indeed, $J_1$ and $J_2$ are rewritten as follows:
$$J_1=\operatorname{tr}\Bigg(ff^T-fE[x^T]+\sum_{i=1}^{p}fE[v_i^T]F_i^T+E[x]E[x^T]-E[x]f^T-\sum_{i=1}^{p}E[x]E[v_i^T]F_i^T$$
$$\qquad+\sum_{i=1}^{p}F_iE[v_i]f^T-\sum_{i=1}^{p}F_iE[v_i]E[x^T]+\sum_{i=1}^{p}F_iE[v_i]\sum_{k=1}^{p}E[v_k^T]F_k^T\Bigg) \tag{26}$$
and
$$J_2=\sum_{i=1}^{p}\operatorname{tr}(F_i-E_{xv_i})(F_i^T-E_{v_ix})=\sum_{i=1}^{p}\operatorname{tr}\big(F_iF_i^T-F_iE_{v_ix}-E_{xv_i}F_i^T+E_{xv_i}E_{v_ix}\big). \tag{27}$$
In (27), $\sum_{i=1}^{p}\operatorname{tr}(F_iF_i^T)$ can be represented in the form
$$\sum_{i=1}^{p}\operatorname{tr}(F_iF_i^T)=\operatorname{tr}\Bigg[E\Bigg(\sum_{i=1}^{p}F_iv_i\sum_{k=1}^{p}v_k^TF_k^T\Bigg)\Bigg]-\operatorname{tr}\Bigg(\sum_{i=1}^{p}F_iE[v_i]\sum_{k=1}^{p}E[v_k^T]F_k^T\Bigg) \tag{28}$$
because
$$E[v_iv_k^T]-E[v_i]E[v_k^T]=\begin{cases}O, & i\neq k,\\ I, & i=k,\end{cases} \tag{29}$$
due to the orthonormality of the vectors $v_1,\dots,v_p$.
Then
$$J_0+J_1+J_2=\operatorname{tr}\big(E_{xx}-E[x]E[x^T]\big)-\sum_{i=1}^{p}\operatorname{tr}[E_{xv_i}E_{v_ix}] \tag{30}$$
$$+\operatorname{tr}\Bigg(ff^T-fE[x^T]+\sum_{i=1}^{p}fE[v_i^T]F_i^T+E[x]E[x^T]-E[x]f^T-\sum_{i=1}^{p}E[x]E[v_i^T]F_i^T$$
$$\qquad+\sum_{i=1}^{p}F_iE[v_i]f^T-\sum_{i=1}^{p}F_iE[v_i]E[x^T]+\sum_{i=1}^{p}F_iE[v_i]\sum_{k=1}^{p}E[v_k^T]F_k^T\Bigg)$$
$$+\operatorname{tr}\Bigg[E\Bigg(\sum_{i=1}^{p}F_iv_i\sum_{k=1}^{p}v_k^TF_k^T\Bigg)\Bigg]-\operatorname{tr}\Bigg(\sum_{i=1}^{p}F_iE[v_i]\sum_{k=1}^{p}E[v_k^T]F_k^T\Bigg)$$
$$-\sum_{i=1}^{p}\operatorname{tr}\big(F_iE_{v_ix}-F_iE[v_i]E[x^T]+E_{xv_i}F_i^T-E[x]E[v_i^T]F_i^T-E_{xv_i}E_{v_ix}\big)$$
$$=J(f,\mathcal{F}_1,\dots,\mathcal{F}_p). \tag{31}$$
Hence, (23) is true. Therefore,
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\|E_{xv_k}\|^2+\Big\|f-E[x]+\sum_{k=1}^{p}F_kE[v_k]\Big\|^2+\sum_{k=1}^{p}\|F_k-E_{xv_k}\|^2. \tag{32}$$
It follows from (32) that the constrained minimum (6)–(7) is achieved if $f=f^0$ with $f^0$ given by (20), and if $F_k^0$ is such that
$$J_k(F_k^0)=\min_{F_k}J_k(F_k)\quad\text{subject to}\quad\operatorname{rank}(F_k)=\eta_k, \tag{33}$$
where $J_k(F_k)=\|F_k-E_{xv_k}\|^2$. The solution to (33) is given [43] by
$$F_k^0=K_{k\eta_k}. \tag{34}$$
Then
$$E[\|x-\mathcal{T}_p^0(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\big(\|E_{xv_k}\|^2-\|K_{k\eta_k}-E_{xv_k}\|^2\big).$$
Here [43],
$$\|E_{xv_k}\|^2=\sum_{j=1}^{r}\alpha_{kj}^2\qquad\text{and}\qquad\|K_{k\eta_k}-E_{xv_k}\|^2=\sum_{j=\eta_k+1}^{r}\alpha_{kj}^2 \tag{35}$$
with $r=r(k)$. Thus, (21) is true. The theorem is proved. □
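Computationally, Theorem 1 reduces to $p$ truncated SVDs. A minimal numpy sketch, assuming the covariances $E_{xv_k}$, the means and the ranks $\eta_k$ are already available (the function names are ours):

```python
import numpy as np

def best_rank_term(E_xv, eta):
    """Rank-eta minimizer of ||F - E_xv||^2 in (33): the truncated SVD
    K_{k,eta_k} of (19), i.e. F_k^0 of (20)."""
    U, s, Vt = np.linalg.svd(E_xv, full_matrices=False)
    return U[:, :eta] @ np.diag(s[:eta]) @ Vt[:eta, :]

def theorem1_solution(E_x, Ev_list, Exv_list, etas):
    """f^0 and F_1^0, ..., F_p^0 of (20) for orthonormal v_1, ..., v_p.
    E_x is E[x]; Ev_list[k] is E[v_k]; Exv_list[k] is E_{x v_k}."""
    F_list = [best_rank_term(Exv, eta) for Exv, eta in zip(Exv_list, etas)]
    f0 = E_x - sum(F @ Ev for F, Ev in zip(F_list, Ev_list))
    return f0, F_list
```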
Corollary 1. Let $v_1,\dots,v_p$ be determined by Lemma 1. Then the vector $\hat f$ and the operators $\hat{\mathcal{F}}_1,\dots,\hat{\mathcal{F}}_p$ satisfying the unconstrained problem (6) are determined by
$$\hat f=E[x]-\sum_{k=1}^{p}\hat F_kE[v_k]\qquad\text{and}\qquad\hat F_1=E_{xv_1},\ \dots,\ \hat F_p=E_{xv_p} \tag{36}$$
with $\hat{\mathcal{F}}_k$ such that $[\hat{\mathcal{F}}_k(v_k)](\omega)=\hat F_kv_k(\omega)$, where $\hat F_k\in\mathbb{R}^{m\times n}$ and $k=1,\dots,p$.

The accuracy associated with the transform $\hat{\mathcal{T}}_p$ given by
$$\hat{\mathcal{T}}_p(y)=\hat f+\sum_{k=1}^{p}\hat{\mathcal{F}}_k(v_k) \tag{37}$$
is such that
$$E[\|x-\hat{\mathcal{T}}_p(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\|E_{xv_k}\|^2. \tag{38}$$

Proof. The proof follows directly from (32). □
5.2.2 The case when the matrix $E_{v_kv_k}$ is not invertible for $k=1,\dots,p$

We write $A_k\in\mathbb{R}^{m\times n}$ for an arbitrary matrix, and define operators $\mathcal{A}_k:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ and $\mathcal{E}_{v_kv_k},\mathcal{E}_{v_kv_k}^\dagger,(\mathcal{E}_{v_kv_k}^{1/2})^\dagger:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ similarly to those in (9) and (10).

For the case under consideration (the matrix $E_{v_kv_k}$ is not invertible), we introduce the SVD of $E_{xv_k}(E_{v_kv_k}^{1/2})^\dagger$,
$$U_k\Sigma_kV_k^T=E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger, \tag{39}$$
where, as above, $U_k\in\mathbb{R}^{m\times n}$, $V_k\in\mathbb{R}^{n\times n}$ are orthogonal and $\Sigma_k\in\mathbb{R}^{n\times n}$ is diagonal,
$$U_k=[s_{k1},\dots,s_{kn}],\qquad V_k=[d_{k1},\dots,d_{kn}]\qquad\text{and}\qquad\Sigma_k=\operatorname{diag}(\beta_{k1},\dots,\beta_{kn}) \tag{40}$$
with $\beta_{k1}\geq\cdots\geq\beta_{kr}>0$, $\beta_{k,r+1}=\cdots=\beta_{kn}=0$, $r=1,\dots,n$ and $r=r(k)$.

Let us set
$$U_{k\eta_k}=[s_{k1},\dots,s_{k\eta_k}],\qquad V_{k\eta_k}=[d_{k1},\dots,d_{k\eta_k}]\qquad\text{and}\qquad\Sigma_{k\eta_k}=\operatorname{diag}(\beta_{k1},\dots,\beta_{k\eta_k}), \tag{41}$$
where $U_{k\eta_k}\in\mathbb{R}^{m\times\eta_k}$, $V_{k\eta_k}\in\mathbb{R}^{n\times\eta_k}$ and $\Sigma_{k\eta_k}\in\mathbb{R}^{\eta_k\times\eta_k}$. Now we define $G_{k\eta_k}\in\mathbb{R}^{m\times n}$ and $\mathcal{G}_{k\eta_k}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ by
$$G_{k\eta_k}=U_{k\eta_k}\Sigma_{k\eta_k}V_{k\eta_k}^T\qquad\text{and}\qquad[\mathcal{G}_{k\eta_k}(w_k)](\omega)=G_{k\eta_k}[w_k(\omega)], \tag{42}$$
respectively, for any $w_k\in L^2(\Omega,\mathbb{R}^n)$.

As noted before, we write $\mathcal{I}$ for the identity operator.
Theorem 2. Let $v_1,\dots,v_p$ be determined by Lemma 3. Then $f^0$ and $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6)–(7) are determined by
$$f^0=E[x]-\sum_{k=1}^{p}F_k^0E[v_k] \tag{43}$$
and
$$\mathcal{F}_1^0=\mathcal{G}_{1\eta_1}\big(\mathcal{E}_{v_1v_1}^{1/2}\big)^\dagger+\mathcal{A}_1\big[\mathcal{I}-\mathcal{E}_{v_1v_1}^{1/2}\big(\mathcal{E}_{v_1v_1}^{1/2}\big)^\dagger\big], \tag{44}$$
$$\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots$$
$$\mathcal{F}_p^0=\mathcal{G}_{p\eta_p}\big(\mathcal{E}_{v_pv_p}^{1/2}\big)^\dagger+\mathcal{A}_p\big[\mathcal{I}-\mathcal{E}_{v_pv_p}^{1/2}\big(\mathcal{E}_{v_pv_p}^{1/2}\big)^\dagger\big], \tag{45}$$
where, for $k=1,\dots,p$, $\mathcal{A}_k$ is any linear operator such that $\operatorname{rank}\mathcal{F}_k^0\leq\eta_k$ (in particular, $\mathcal{A}_k$ can be chosen as the zero operator).

The accuracy associated with the transform $\mathcal{T}_p^0$ given by (8) and (43)–(45) is such that
$$E[\|x-\mathcal{T}_p^0(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\sum_{j=1}^{\eta_k}\beta_{kj}^2. \tag{46}$$
Proof. For $v_1,\dots,v_p$ determined by Lemma 3, $J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)$ is represented by (22) as well. Let us consider $J_0$, $J_1$ and $J_2$ given by
$$J_0=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\big\|E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2, \tag{47}$$
$$J_1=\Big\|f-E[x]+\sum_{k=1}^{p}F_kE[v_k]\Big\|^2\qquad\text{and}\qquad J_2=\sum_{k=1}^{p}\big\|F_kE_{v_kv_k}^{1/2}-E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2. \tag{48}$$
To show that
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=J_0+J_1+J_2 \tag{49}$$
with $J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)$ defined by (22), we use the relationships (see [26])
$$E_{xv_k}E_{v_kv_k}^\dagger E_{v_kv_k}=E_{xv_k}\qquad\text{and}\qquad E_{v_kv_k}^\dagger E_{v_kv_k}^{1/2}=\big(E_{v_kv_k}^{1/2}\big)^\dagger. \tag{50}$$
Then
$$J_1=\operatorname{tr}\Bigg(ff^T-fE[x^T]+\sum_{k=1}^{p}fE[v_k^T]F_k^T+E[x]E[x^T]-E[x]f^T-\sum_{k=1}^{p}E[x]E[v_k^T]F_k^T$$
$$\qquad+\sum_{k=1}^{p}F_kE[v_k]f^T-\sum_{k=1}^{p}F_kE[v_k]E[x^T]+\sum_{k=1}^{p}F_kE[v_k]\sum_{i=1}^{p}E[v_i^T]F_i^T\Bigg) \tag{51}$$
and
$$J_2=\sum_{k=1}^{p}\operatorname{tr}\big(F_k-E_{xv_k}E_{v_kv_k}^\dagger\big)E_{v_kv_k}\big(F_k^T-E_{v_kv_k}^\dagger E_{v_kx}\big)$$
$$\ \ =\sum_{k=1}^{p}\operatorname{tr}\big(F_kE_{v_kv_k}F_k^T-F_kE_{v_kx}-E_{xv_k}F_k^T+E_{xv_k}E_{v_kv_k}^\dagger E_{v_kx}\big), \tag{52}$$
where
$$\sum_{k=1}^{p}\operatorname{tr}\big(F_kE_{v_kv_k}F_k^T\big)=\operatorname{tr}\Bigg[E\Bigg(\sum_{k=1}^{p}F_kv_k\sum_{i=1}^{p}v_i^TF_i^T\Bigg)\Bigg]-\operatorname{tr}\Bigg(\sum_{k=1}^{p}F_kE[v_k]\sum_{i=1}^{p}E[v_i^T]F_i^T\Bigg) \tag{53}$$
because
$$E[v_iv_k^T]-E[v_i]E[v_k^T]=O\quad\text{for}\quad i\neq k \tag{54}$$
due to the orthogonality of the vectors $v_1,\dots,v_p$. On the basis of (50)–(53) and similarly to (30)–(31), we establish that (49) is true. Hence,
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\big\|E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2+\Big\|f-E[x]+\sum_{k=1}^{p}F_kE[v_k]\Big\|^2+\sum_{k=1}^{p}\big\|F_kE_{v_kv_k}^{1/2}-E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2. \tag{55}$$
It follows from the last two terms in (55) that the constrained minimum (6)–(7) is achieved if $f=f^0$ with $f^0$ given by (43), and $F_k^0$ is such that
$$J_k(F_k^0)=\min_{F_k}J_k(F_k)\quad\text{subject to}\quad\operatorname{rank}(F_k)=\eta_k, \tag{56}$$
where $J_k(F_k)=\|F_kE_{v_kv_k}^{1/2}-E_{xv_k}(E_{v_kv_k}^{1/2})^\dagger\|^2$. The constrained minimum (6)–(7) is achieved if $f=f^0$ is defined by (43), and if [43]
$$F_kE_{v_kv_k}^{1/2}=G_{k\eta_k}. \tag{57}$$
The matrix equation (57) has the general solution [45]
$$F_k=F_k^0=G_{k\eta_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger+A_k\big[I-E_{v_kv_k}^{1/2}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big] \tag{58}$$
if and only if
$$G_{k\eta_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger E_{v_kv_k}^{1/2}=G_{k\eta_k}. \tag{59}$$
The latter is satisfied on the basis of the following derivation. (Note that the matrix $I-E_{v_kv_k}^{1/2}(E_{v_kv_k}^{1/2})^\dagger$ is simply a projection onto the null space of $E_{v_kv_k}$ and can be replaced by $I-E_{v_kv_k}(E_{v_kv_k})^\dagger$.)

As an extension of the technique presented in proving Lemmata 1 and 2 in [26], it can be shown that, for any matrices $Q_1,Q_2\in\mathbb{R}^{m\times n}$,
$$\mathcal{N}(Q_1)\subseteq\mathcal{N}(Q_2)\ \Rightarrow\ Q_2\big(I-Q_1^\dagger Q_1\big)=O, \tag{60}$$
where $\mathcal{N}(Q_i)$ is the null space of $Q_i$ for $i=1,2$. With regard to the equation under consideration,
$$\mathcal{N}\big(\big[E_{v_kv_k}^{1/2}\big]^\dagger\big)\subseteq\mathcal{N}\big(E_{xv_k}\big[E_{v_kv_k}^{1/2}\big]^\dagger\big). \tag{61}$$
The definition of $G_{k\eta_k}$ implies that
$$\mathcal{N}\big(E_{xv_k}\big[E_{v_kv_k}^{1/2}\big]^\dagger\big)\subseteq\mathcal{N}(G_{k\eta_k})\qquad\text{and then}\qquad\mathcal{N}\big(\big[E_{v_kv_k}^{1/2}\big]^\dagger\big)\subseteq\mathcal{N}(G_{k\eta_k}).$$
On the basis of (60), the latter implies $G_{k\eta_k}\big[I-\big(E_{v_kv_k}^{1/2}\big)^\dagger E_{v_kv_k}^{1/2}\big]=O$, i.e. (59) is true. Hence, (58) and (44)–(45) are true as well.

Next, similarly to (35),
$$\big\|E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2-\big\|G_{k\eta_k}-E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2=\sum_{j=1}^{\eta_k}\beta_{kj}^2. \tag{62}$$
Then (46) follows from (55), (58), (43) and (62). □
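The operator (44)–(45) with $A_k=O$ can be computed by one symmetric square root, one pseudo-inverse and one truncated SVD per term. A sketch under the same assumptions as the sketch after Theorem 1 (the function names are ours):

```python
import numpy as np
from numpy.linalg import pinv, svd, eigh

def psd_sqrt(E):
    """E^{1/2} for a symmetric positive semidefinite matrix,
    computed via its eigendecomposition (cf. Section 5.1)."""
    w, Q = eigh(E)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

def theorem2_operator(E_xv, E_vv, eta):
    """F_k^0 of (44)-(45) with A_k = O:
    F_k^0 = G_{k,eta_k} (E_vv^{1/2})^†, where G_{k,eta_k} is the rank-eta
    truncation of the SVD (39)-(42) of E_xv (E_vv^{1/2})^†."""
    S = pinv(psd_sqrt(E_vv))                      # (E_vv^{1/2})^†
    U, s, Vt = svd(E_xv @ S, full_matrices=False)
    G = U[:, :eta] @ np.diag(s[:eta]) @ Vt[:eta, :]
    return G @ S
```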
Remark 4. The known reduced-rank transforms based on the Volterra polynomial structure [16, 27, 29] require the computation of a covariance matrix similar to $E_{vv}$, where $v=[v_1,\dots,v_p]^T$, but for $p=N$, where $N$ is large (see Sections 1 and 2). The relationships (30)–(33) and (51)–(56) illustrate the nature of the proposed method and its difference from the techniques in [16, 27, 29]: due to the structure (3) of the transform $\mathcal{T}_p$, the procedure for finding $f^0,\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ avoids direct computation of $E_{vv}$, which could be troublesome due to large $N$. If the operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are orthonormal, as in Theorem 1, then (29) is true and the covariance matrix $E_{vv}$ is reduced to the identity. If the operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are orthogonal, as in Theorem 2, then (54) holds and the covariance matrix $E_{vv}$ is reduced to a block-diagonal form with non-zero blocks $E_{v_1v_1},\dots,E_{v_pv_p}$, so that
$$E_{vv}=\begin{bmatrix}E_{v_1v_1} & O & \cdots & O\\ O & E_{v_2v_2} & \cdots & O\\ \vdots & \vdots & \ddots & \vdots\\ O & O & \cdots & E_{v_pv_p}\end{bmatrix}$$
with $O$ denoting the zero block. As a result, the procedure for finding $f^0,\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ is reduced to $p$ separate rank-constrained problems (33) or (56). Unlike the methods in [16, 27, 29], the operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ are determined with much smaller $m\times n$ and $n\times n$ matrices given by the simple formulae (20) and (43)–(45). This implies a reduction in computational work compared with that required by the approach in [27, 29, 34].
Corollary 2. Let $v_1,\dots,v_p$ be determined by Lemma 3. Then the vector $\bar f$ and the operators $\bar{\mathcal{F}}_1,\dots,\bar{\mathcal{F}}_p$ satisfying the unconstrained minimum (6) are determined by
$$\bar f=E[x]-\sum_{k=1}^{p}\bar F_kE[v_k] \tag{63}$$
and
$$\bar{\mathcal{F}}_1=\mathcal{E}_{xv_1}\mathcal{E}_{v_1v_1}^\dagger+\mathcal{A}_1\big[\mathcal{I}-\mathcal{E}_{v_1v_1}\mathcal{E}_{v_1v_1}^\dagger\big], \tag{64}$$
$$\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots$$
$$\bar{\mathcal{F}}_p=\mathcal{E}_{xv_p}\mathcal{E}_{v_pv_p}^\dagger+\mathcal{A}_p\big[\mathcal{I}-\mathcal{E}_{v_pv_p}\mathcal{E}_{v_pv_p}^\dagger\big]. \tag{65}$$
The associated accuracy for the transform $\bar{\mathcal{T}}_p$, defined by
$$\bar{\mathcal{T}}_p(y)=\bar f+\sum_{k=1}^{p}\bar{\mathcal{F}}_k(v_k),$$
is given by
$$E[\|x-\bar{\mathcal{T}}_p(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\big\|E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2. \tag{66}$$

Proof. It follows from (55) that the unconstrained minimum (6) is achieved if $f$ is defined by (63) and if $F_k$ satisfies the equation $F_kE_{v_kv_k}^{1/2}-E_{xv_k}(E_{v_kv_k}^{1/2})^\dagger=O$ for each $k=1,\dots,p$. Similarly to (57)–(58), its general solution is given by
$$F_k=\bar F_k=E_{xv_k}E_{v_kv_k}^\dagger+A_k\big[I-E_{v_kv_k}E_{v_kv_k}^\dagger\big]$$
because $E_{v_kv_k}^{1/2}(E_{v_kv_k}^{1/2})^\dagger=E_{v_kv_k}E_{v_kv_k}^\dagger$. We define $\bar{\mathcal{F}}_k$ by $[\bar{\mathcal{F}}_k(w_k)](\omega)=\bar F_k[w_k(\omega)]$ for all $k=1,\dots,p$, and then (64)–(65) are true. The relation (66) follows from (55) and (63)–(65). □
Remark 5. The difference between the transforms given by Theorems 1 and 2 is that $\mathcal{F}_k^0$ given by (20) (Theorem 1) does not contain a factor associated with $(E_{v_kv_k}^{1/2})^\dagger$ for all $k=1,\dots,p$. A similar observation is true for Corollaries 1 and 2.

Remark 6. The transforms given by Theorems 1 and 2 are not unique due to the arbitrary operators $\mathcal{A}_1,\dots,\mathcal{A}_p$. A natural particular choice is $\mathcal{A}_1=\cdots=\mathcal{A}_p=O$.
5.2.3 Compression procedure by $\mathcal{T}_p^0$

Let us consider the transform $\mathcal{T}_p^0$ given by (8), (43)–(45) with $A_k=O$ for $k=1,\dots,p$, where $A_k$ is the matrix given in (58). We write $[\mathcal{T}_p^0(y)](\omega)=T_p^0(y)$ with $T_p^0:\mathbb{R}^n\to\mathbb{R}^m$.

Let
$$B_k^{(1)}=U_{k\eta_k}\Sigma_{k\eta_k}\qquad\text{and}\qquad B_k^{(2)}=V_{k\eta_k}^T\big(E_{v_kv_k}^{1/2}\big)^\dagger,$$
so that $B_k^{(1)}\in\mathbb{R}^{m\times\eta_k}$, $B_k^{(2)}\in\mathbb{R}^{\eta_k\times n}$ and $F_k^0=B_k^{(1)}B_k^{(2)}$. Here, $\eta_1,\dots,\eta_p$ are determined by (7). Then
$$T_p^0(y)=f+\sum_{k=1}^{p}B_k^{(1)}B_k^{(2)}v_k,$$
where $v_k=v_k(\omega)$ and $B_k^{(2)}v_k\in\mathbb{R}^{\eta_k}$ for $k=1,\dots,p$ with $\eta_1+\cdots+\eta_p<m$. Hence, the matrices $B_1^{(2)},\dots,B_p^{(2)}$ perform compression of the data presented by $v_1,\dots,v_p$. The matrices $B_1^{(1)},\dots,B_p^{(1)}$ perform reconstruction of the reference signal from the compressed data.

The compression ratio of the transform $\mathcal{T}_p^0$ is given by
$$r_0=(\eta_1+\cdots+\eta_p)/m. \tag{67}$$
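In code, the compression stage stores only the $\eta_k$ numbers $B_k^{(2)}v_k$ per term, and reconstruction applies the matrices $B_k^{(1)}$. A hedged sketch (the function and argument names are ours):

```python
import numpy as np

def compress_decompress(v_list, B1_list, B2_list, f):
    """Compression and reconstruction of Section 5.2.3: v_list[k] is a
    realization of v_k in R^n, B2_list[k] (eta_k x n) compresses it,
    B1_list[k] (m x eta_k) reconstructs from the compressed data."""
    codes = [B2 @ v for B2, v in zip(B2_list, v_list)]        # compressed data
    x_rec = f + sum(B1 @ c for B1, c in zip(B1_list, codes))  # reconstruction
    ratio = sum(c.size for c in codes) / x_rec.size           # r_0 of (67)
    return x_rec, ratio
```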
5.2.4 A special case of transform Tp
The results above have been derived for any operators ϕ1, . . . ,ϕp in the model Tp. Some
specializations for ϕ1, . . . ,ϕp were given in Section 3.2. Here and in Section 5.2.5, we consider
alternative forms for ϕ1, . . . ,ϕp.
(i) The operators $\varphi_1,\dots,\varphi_p$ can be determined by the recursive procedure given below (a sketch of this procedure follows item (ii)). The motivation follows from the observation that the performance of the transform $\mathcal{T}_p$ is improved if $y$ in (5) is replaced by an estimate of $x$.

First, we set $\varphi_1(y)=y$ and determine an estimate $x^{(1)}$ of $x$ from the solution of problem (6) (with no constraints (7)) by Corollary 1 or 2 with $p=1$. Next, we put $\varphi_1(y)=y$ and $\varphi_2(y)=x^{(1)}$, and find an estimate $x^{(2)}$ from the solution of the unconstrained problem (6) with $p=2$. In general, for $j=1,\dots,p$, we define $\varphi_j(y)=x^{(j-1)}$, where $x^{(j-1)}$ has been determined similarly to $x^{(2)}$ from the previous steps. In particular, $x^{(0)}=y$.
(ii) The operators $\varphi_1,\dots,\varphi_p$ can also be chosen as elementary functions. An example is given in item (i) of Section 3.2, where $\varphi_k(y)$ was constructed from the power functions. An alternative possibility is to choose trigonometric functions for constructing $\varphi_k(y)$. For instance, one can put
$$[\varphi_1(y)](\omega)=y\qquad\text{and}\qquad[\varphi_{k+1}(y)](\omega)=[\cos(ky_1),\dots,\cos(ky_n)]^T \tag{68}$$
with $y=[y_1,\dots,y_n]^T$ and $k=1,\dots,p-1$. In this paper, we do not analyse such a possible choice for $\varphi_1,\dots,\varphi_p$.
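The recursive choice of $\varphi_1,\dots,\varphi_p$ in item (i) can be organized as follows. Here `solve_unconstrained` stands for a user-supplied routine implementing Corollary 1 or 2 on the listed feature vectors; it is an assumption of this sketch, not an object defined in the paper.

```python
def recursive_phi_estimates(y, solve_unconstrained, p):
    """Section 5.2.4(i): x^{(0)} = y; at step j the features are
    phi_1(y) = y, phi_2(y) = x^{(1)}, ..., phi_j(y) = x^{(j-1)}, and
    x^{(j)} is the unconstrained estimate (6) built on them."""
    features = [y]               # phi_1(y) = y, i.e. x^{(0)} = y
    x_est = y
    for j in range(1, p + 1):
        x_est = solve_unconstrained(features)  # estimate x^{(j)}
        features.append(x_est)                 # becomes phi_{j+1}(y)
    return x_est
```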
5.2.5 Other particular cases of transform $\mathcal{T}_p$ and comparison with known transforms
(i) Optimal non-linear filtering. The transforms $\hat{\mathcal{T}}_p$ (36)–(37) and $\bar{\mathcal{T}}_p$ (63)–(65), which are particular cases of the transforms given in Theorems 1 and 2, represent optimal filters that perform pure filtering with no signal compression. Therefore they are important in their own right.
(ii) The Fourier series as a particular case of the transform $\bar{\mathcal{T}}_p$. For the case of the minimization problem (6) with no constraint (7), $\mathcal{F}_1,\dots,\mathcal{F}_p$ are determined by the expressions (36) and (63)–(65), which are similar to those for the Fourier coefficients [35]. The structure of the model $\mathcal{T}_p$ presented by (3) is different, of course, from that of the Fourier series and the Fourier polynomial (i.e. a truncated Fourier series) in Hilbert space [35]. The differences are that $\mathcal{T}_p$ transforms $y$ (not $x$, as the Fourier polynomial does) and that $\mathcal{T}_p$ consists of a combination of three operators $\varphi_k$, $\mathcal{Q}_k$ and $\mathcal{F}_k$, where $\mathcal{F}_k:L^2(\Omega,\tilde{H}_k)\to L^2(\Omega,H_X)$ is an operator, not a scalar as in the Fourier series [35]. The solutions (36) and (63)–(65) of the unconstrained problem (6) are given in terms of the observed vector $y$, not in terms of the basis of $x$ as in the Fourier series/polynomial. The special features of $\mathcal{T}_p$ require the special computation methods described in Section 5.
Here, we show that the Fourier series is a particular case of the transform $\mathcal{T}_p$.

Let $x\in L^2(\Omega,H)$ with $H$ a Hilbert space, and let $\{v_1,v_2,\dots\}$ be an orthonormal basis in $L^2(\Omega,H)$. For any $g,h\in L^2(\Omega,H)$, we define the scalar product $\langle\cdot,\cdot\rangle$ and the norm $\|\cdot\|_E$ in $L^2(\Omega,H)$ by
$$\langle g,h\rangle=\int_\Omega g(\omega)h(\omega)\,d\mu(\omega)\qquad\text{and}\qquad\|g\|_E=\langle g,g\rangle^{1/2}, \tag{69}$$
respectively. In particular, if $H=\mathbb{R}^m$, then
$$\|g\|_E^2=\int_\Omega[g(\omega)]^Tg(\omega)\,d\mu(\omega)=\int_\Omega\|g(\omega)\|^2\,d\mu(\omega)=E[\|g\|^2], \tag{70}$$
i.e. $E[\|g\|^2]$ is defined similarly to that in (4).

Let us consider the special case of the transform $\mathcal{T}_p$ presented in item (iii) of Section 3.2, and let us also consider the unconstrained problem (6) formulated in terms of such a $\mathcal{T}_p$, where we now assume that $x$ has zero mean, $f=O$, $p=\infty$, $\{v_1,v_2,\dots\}$ is an orthonormal basis in $L^2(\Omega,H)$ and $\mathcal{F}_k$ is a scalar, not an operator as before. We denote $\alpha_k=\mathcal{F}_k$ with $\alpha_k\in\mathbb{R}$. Then, similarly to (36) in Corollary 1, the solution to the unconstrained problem (6) is defined by $\hat\alpha_k$ such that
$$\hat\alpha_k=E_{xv_k}\qquad\text{with}\qquad k=1,2,\dots.$$
Here, $E_{xv_k}=E[xv_k]-E[x]E[v_k]=E[xv_k]=\langle x,v_k\rangle$ since $E[x]=0$ by the assumption. Hence, $\hat\alpha_k=E_{xv_k}$ is the Fourier coefficient, and the considered particular case of $\mathcal{T}_p(y)$ with $\mathcal{F}_k$ determined by $\hat\alpha_k$ is given by
$$\mathcal{T}_p(y)=\sum_{k=1}^{\infty}\langle x,v_k\rangle v_k. \tag{71}$$
Thus, the Fourier series (71) in Hilbert space follows from (3), (6) and (36) when $\mathcal{T}_p$ has the form given in item (iii) of Section 3.2 with $x$, $f$, $p$, $\{v_1,v_2,\dots\}$ and $\mathcal{F}_k$ as above.
(iii) The Wiener filter as a particular case of the transform $\bar{\mathcal{T}}_p$ (63)–(65). In the following Corollaries 3 and 4, we show that the filter $\bar{\mathcal{T}}_p$ guarantees better accuracy than that of the Wiener filter.

Corollary 3. Let $p=1$, $E[x]=0$, $E[y]=0$, $\varphi_1=\mathcal{I}$, $\mathcal{Q}_1=\mathcal{I}$ and $A_1=O$ or $A_1=E_{xy}E_{yy}^\dagger$. Then $\bar{\mathcal{T}}_p$ is reduced to the filter $\check{\mathcal{T}}$ such that
$$[\check{\mathcal{T}}(y)](\omega)=\check{T}[y(\omega)]$$
with
$$\check{T}=E_{xy}E_{yy}^\dagger. \tag{72}$$

Remark 7. The unconstrained linear filter given by (72) has been proposed in [7]. The filter (72) is treated as a generalisation of the Wiener filter.
Let $\tilde x,\tilde v_1,\dots,\tilde v_p$ be the zero-mean vectors. The transform $\bar{\mathcal{T}}_p$, applied to $\tilde x,\tilde v_1,\dots,\tilde v_p$, is denoted by $\bar{\mathcal{T}}_{W,p}$.

Corollary 4. The error $E[\|\tilde x-\bar{\mathcal{T}}_{W,p}(\tilde y)\|^2]$ associated with the transform $\bar{\mathcal{T}}_{W,p}$ is smaller than the error $E[\|\tilde x-\check{\mathcal{T}}(\tilde y)\|^2]$ associated with the Wiener filter [7] by $\sum_{k=2}^{p}\|E_{\tilde x\tilde v_k}(E_{\tilde v_k\tilde v_k}^{1/2})^\dagger\|^2$, i.e.
$$E[\|\tilde x-\bar{\mathcal{T}}_{W,p}(\tilde y)\|^2]=E[\|\tilde x-\check{\mathcal{T}}(\tilde y)\|^2]-\sum_{k=2}^{p}\big\|E_{\tilde x\tilde v_k}\big(E_{\tilde v_k\tilde v_k}^{1/2}\big)^\dagger\big\|^2. \tag{73}$$

Proof. It is easy to show that
$$E[\|\tilde x-\check{\mathcal{T}}(\tilde y)\|^2]=\|E_{\tilde x\tilde x}^{1/2}\|^2-\big\|E_{\tilde x\tilde v_1}\big(E_{\tilde v_1\tilde v_1}^{1/2}\big)^\dagger\big\|^2, \tag{74}$$
and then (73) follows from (66) and (74). □
(iv) The KLT as a particular case of the transform $\mathcal{T}_p^0$ (43)–(46). The KLT [7] follows from (43)–(46) as a particular case if $f=O$, $p=1$, $\varphi_1=\mathcal{I}$, $\mathcal{Q}_1=\mathcal{I}$ and $A_1=O$.

To compare the transform $\mathcal{T}_p^0$ with the KLT [7], we apply $\mathcal{T}_p^0$, represented by (43)–(46), to the zero-mean vectors $\tilde x,\tilde v_1,\dots,\tilde v_p$ as above. We write $\mathcal{T}_p^*$ for such a version of $\mathcal{T}_p^0$, and $\mathcal{T}_{KLT}$ for the KLT [7].
Corollary 5. The error $E[\|\tilde x-\mathcal{T}_p^*(\tilde y)\|^2]$ associated with the transform $\mathcal{T}_p^*$ is smaller than the error $E[\|\tilde x-\mathcal{T}_{KLT}(\tilde y)\|^2]$ associated with the KLT [7] by $\sum_{k=2}^{p}\sum_{j=1}^{\eta_k}\beta_{kj}^2$, i.e.
$$E[\|\tilde x-\mathcal{T}_p^*(\tilde y)\|^2]=E[\|\tilde x-\mathcal{T}_{KLT}(\tilde y)\|^2]-\sum_{k=2}^{p}\sum_{j=1}^{\eta_k}\beta_{kj}^2. \tag{75}$$

Proof. The error associated with $\mathcal{T}_{KLT}$ [7] is represented by (46) for $p=1$:
$$E[\|\tilde x-\mathcal{T}_{KLT}(\tilde y)\|^2]=\|E_{\tilde x\tilde x}^{1/2}\|^2-\sum_{j=1}^{\eta_1}\beta_{1j}^2. \tag{76}$$
Then (75) follows from (46) and (76). □
(v) The transform [26] as a particular case of the transform $\mathcal{T}_p^0$. The transform [26] follows from (3) as a particular case if $f=O$, $p=2$, $\varphi_1(y)=y$, $\varphi_2(y)=y^2$ and $\mathcal{Q}_1=\mathcal{Q}_2=\mathcal{I}$, where $y^2$ is defined by $y^2(\omega)=[y_1^2,\dots,y_n^2]^T$. We note that the transform [26] has been generalized in [27].
(vi) The transforms [27] as particular cases of the transform $\mathcal{T}_p$. The transform [27] follows from (3) if $\mathcal{Q}_k=\mathcal{I}$, $\varphi_k(y)=y^k$, where $y^k=(y,\dots,y)\in L^2(\Omega,\mathbb{R}^{n^k})$, $\mathbb{R}^{n^k}$ is the $k$th degree of $\mathbb{R}^n$, and if $\mathcal{F}_k$ is a $k$-linear operator.

To compare the transform $\mathcal{T}_p^0$ and the transform $\mathcal{T}_{[27]}$ [27] of rank $r$, we write $z_j=y_jy$, $z=[z_1,\dots,z_n]^T$, $s=[1\ y^T\ z^T]^T$ and denote by $\alpha_1,\dots,\alpha_r$ the non-zero singular values associated with the truncated SVD for the matrix $E_{xs}(E_{ss}^{1/2})^\dagger$. Such an SVD is constructed similarly to that in (39)–(41).

Corollary 6. Let $\Delta_p=\sum_{k=1}^{p}\sum_{j=1}^{\eta_k}\beta_{kj}^2-\sum_{j=1}^{r}\alpha_j^2$ and let $\Delta_p\geq 0$. The error $E[\|x-\mathcal{T}_p^0(y)\|^2]$ associated with the transform $\mathcal{T}_p^0$ is less than the error $E[\|x-\mathcal{T}_{[27]}(y)\|^2]$ associated with the transform $\mathcal{T}_{[27]}$ by $\Delta_p$, i.e.
$$E[\|x-\mathcal{T}_p^0(y)\|^2]=E[\|x-\mathcal{T}_{[27]}(y)\|^2]-\Delta_p. \tag{77}$$

Proof. It follows from [27] that
$$E[\|x-\mathcal{T}_{[27]}(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{j=1}^{r}\alpha_j^2. \tag{78}$$
Then (77) follows from (46) and (78). □
We note that, in general, a theoretical verification of the condition $\Delta_p\geq 0$ is not straightforward. At the same time, for any particular $x$ and $y$, $\Delta_p$ can be estimated numerically.

Although the transform $\mathcal{T}_p^0$ includes the transform $\mathcal{T}_{[27]}$, the accuracy of $\mathcal{T}_{[27]}$ is, in general, better than that of $\mathcal{T}_p^0$ for the same degrees of $\mathcal{T}_p^0$ and $\mathcal{T}_{[27]}$. This is because $\mathcal{T}_{[27]}$ implies more terms. For instance, $\mathcal{T}_{[27]}$ of degree two consists of $n+1$ terms, while $\mathcal{T}_p^0$, for $p=2$, consists of three terms only. If, for a given $p$, the condition $\Delta_p\geq 0$ is not fulfilled, then the accuracy $E[\|x-\mathcal{T}_p^0(y)\|^2]$ can be improved by increasing $p$ or by applying the iterative method presented in [28].
(vii) Unlike the techniques presented in [13, 14], our method implements simultaneous filtering and compression, and provides this data processing in a probabilistic setting. The idea of implicitly mapping the data into a high-dimensional feature space [8, 12, 15] could be extended to the transform presented in this paper. We intend to develop such an extension in the future.
6 Numerical realization
6.1. Orthogonalization. Numerical realization of transforms of random vectors implies a representation of the observed data and of the estimates of covariance matrices in the form of associated samples.

For the random vector $u_k$, we have $q$ realizations, which are concatenated into an $n\times q$ matrix $U_k$. A column of $U_k$ is a realization of $u_k$. Thus, a sequence of vectors $u_1,\dots,u_p$ is represented by a sequence of matrices $U_1,\dots,U_p$. Therefore the transformation of $u_1,\dots,u_p$ to orthonormal or orthogonal vectors $v_1,\dots,v_p$ (by Lemmata 1 and 3) is reduced to a procedure on the matrices $U_1,\dots,U_p$ and $V_1,\dots,V_p$. Here, $V_k\in\mathbb{R}^{n\times q}$ is a matrix formed from realizations of the random vector $v_k$ for each $k=1,\dots,p$.
Alternatively, the matrices $V_1,\dots,V_p$ can be determined from known procedures for matrix orthogonalization [43]. In particular, the QR decomposition [43] can be exploited in the following way. Let us form a matrix $U=[U_1^T\ \dots\ U_p^T]^T\in\mathbb{R}^{np\times q}$, where $p$ and $q$ are chosen such that $np=q$, i.e. $U$ is square. (The matrix $U$ can also be presented as $U=[U_1\ \dots\ U_p]$ with $p$ and $q$ such that $n=pq$.) Let
$$U=VR$$
be the QR decomposition of $U$ with $V\in\mathbb{R}^{np\times q}$ orthogonal and $R\in\mathbb{R}^{np\times q}$ upper triangular. Next, we write $V=[V_1^T\ \dots\ V_p^T]^T\in\mathbb{R}^{np\times q}$, where $V_k\in\mathbb{R}^{n\times q}$ for $k=1,\dots,p$. The submatrices $V_1,\dots,V_p$ of $V$ are orthogonal, i.e.
$$V_iV_j^T=\begin{cases}O, & i\neq j,\\ I, & i=j,\end{cases}$$
for $i,j=1,\dots,p$, as required.

Other known procedures for matrix orthogonalization can be applied to $U_1,\dots,U_p$ in a similar fashion.
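A minimal numpy sketch of this QR-based block orthogonalization, assuming the square case $np=q$ described above (the function name is ours):

```python
import numpy as np

def qr_orthogonalize(U_blocks):
    """Section 6.1: stack U = [U_1^T ... U_p^T]^T (each U_k is n x q,
    with n*p = q so that U is square), take the QR decomposition
    U = V R, and split V back into blocks V_1, ..., V_p satisfying
    V_i V_j^T = O (i != j) and V_i V_i^T = I."""
    n = U_blocks[0].shape[0]
    U = np.vstack(U_blocks)          # (n*p) x q, assumed square
    V, R = np.linalg.qr(U)           # V orthogonal, R upper triangular
    return [V[i * n:(i + 1) * n, :] for i in range(len(U_blocks))]
```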
Remark 8. For the cases when $v_1,\dots,v_p$ are orthonormal, or orthogonal but not orthonormal, the associated accuracies (21), (38), (46) and (66) differ by the factors depending on $(E_{v_kv_k}^{1/2})^\dagger$. In the case of orthonormal $v_1,\dots,v_p$, $(E_{v_kv_k}^{1/2})^\dagger=I$, and this circumstance can lead to an increase in the accuracy.
6.2. Covariance matrices. The expectations and covariance matrices in Lemmata 1–3
and Theorems 1–2 can be estimated, for example, by the techniques developed in [36, 37, 38,
39, 40, 41]. We note that such estimation procedures represent specific problems which are not
considered here.
6.3. $\mathcal{T}_p^0$, $\hat{\mathcal{T}}_p$ and $\bar{\mathcal{T}}_p$ for zero-mean vectors. The computational work for $\mathcal{T}_p^0$ (Theorems 1 and 2), $\hat{\mathcal{T}}_p$ and $\bar{\mathcal{T}}_p$ (Corollaries 1 and 2) can be reduced if $\mathcal{T}_p^0$, $\hat{\mathcal{T}}_p$ and $\bar{\mathcal{T}}_p$ are applied to the zero-mean vectors $\tilde x,\tilde v_1,\dots,\tilde v_p$ given by $\tilde x=x-E[x]$, $\tilde v_1=v_1-E[v_1],\ \dots,\ \tilde v_p=v_p-E[v_p]$. Then $f^0=O$ and $\bar f=O$. The estimates of the original $x$ are then given by
$$\check x=E[x]+\sum_{k=1}^{p}\mathcal{F}_k^0(\tilde v_k),\qquad\hat x=E[x]+\sum_{k=1}^{p}\hat{\mathcal{F}}_k(\tilde v_k)\qquad\text{and}\qquad\bar x=E[x]+\sum_{k=1}^{p}\bar{\mathcal{F}}_k(\tilde v_k),$$
respectively. Here, $\mathcal{F}_k^0$, $\hat{\mathcal{F}}_k$ and $\bar{\mathcal{F}}_k$ are defined similarly to (20), (36), (44), (45), (64) and (65).
7 Discussion
Some distinctive features of the proposed techniques are summarized as follows.
Remark 9. It follows from Theorems 1 and 2, and Corollaries 1 and 2, that the accuracy associated with the proposed transform improves when $p$ increases.
Remark 10. Unlike the approaches based on Volterra polynomials [27, 29, 34], our method does not require the computation of pseudo-inverses of large $N\times N$ matrices with $N=n+n^2+\cdots+n^{p-1}$. Instead, the proposed transforms use pseudo-inverses of the $n\times n$ matrices $E_{v_kv_k}$ (see Theorems 1 and 2). This leads to a substantial reduction in computational work.
Remark 11. The idea of the recurrent transform [28] can be extended to the proposed transform in a way similar to that considered in [28]. The authors intend to develop a theory for such an extension in the near future.
8 Conclusions
The new results obtained in the paper are summarized as follows.
We have proposed a new approach to constructing optimal nonlinear transforms for random vectors. The approach is based on a representation of the transform as a sum of $p$ reduced-rank transforms. Each particular transform is formed by the linear reduced-rank operator $\mathcal{F}_k$ and by the operators $\varphi_k$ and $\mathcal{Q}_k$ with $k=1,\dots,p$. Such a device allows us to improve the numerical characteristics (accuracy, compression ratio and computational work) of the known transforms based on the Volterra polynomial structure [27, 29, 34]. These objectives are achieved due to the special "intermediate" operators $\varphi_1,\dots,\varphi_p$ and $\mathcal{Q}_1,\dots,\mathcal{Q}_p$. In particular, we have proposed two types of orthogonalizing operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ (Lemmata 1 and 3) and a specific method for determining $\varphi_1,\dots,\varphi_p$ (Section 5.2.4). Such operators reduce the determination of the optimal linear reduced-rank operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ to the computation of a sequence of relatively small matrices (Theorems 1 and 2).
Particular cases of the proposed transform, which follow from the solution of the uncon-
strained minimization problem (6), have been presented in Corollaries 1 and 2. Such transforms
are treated as new optimal nonlinear filters and, therefore, are important in their own right.
The explicit representations of the accuracy associated with the proposed transforms have
been rigorously justified in Theorems 1 and 2, and Corollaries 1 and 2.
It has been shown that the proposed approach generalizes the Fourier series in Hilbert space
(Section 5.2.5), the Wiener filter, the Karhunen–Loève transform (KLT) and the known optimal
transforms [26, 27, 29]. See Corollaries 3, 4 and 5, and Section 5.2.5 in this regard. In particular,
it has been shown that the accuracies associated with the proposed transforms are better than
those of the Wiener filter (Corollary 4) and the KLT (Corollary 5).
A Appendix
Proof of Lemma 1. Let us write
$$w_1=u_1\qquad\text{and}\qquad w_i=u_i-\sum_{k=1}^{i-1}\mathcal{U}_{ik}(w_k)\quad\text{for}\quad i=1,\dots,p,$$
with $\mathcal{U}_{ik}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ chosen so that, for $k=1,\dots,i-1$,
$$E_{w_iw_k}=O\quad\text{if}\quad i\neq k. \tag{79}$$
We require (79) to be true for any such $k$, i.e.
$$E_{w_iw_k}=E[w_iw_k^T]-E[w_i]E[w_k^T]$$
$$=E\Bigg[\Bigg(u_i-\sum_{l=1}^{i-1}\mathcal{U}_{il}(w_l)\Bigg)w_k^T\Bigg]-E\Bigg[u_i-\sum_{l=1}^{i-1}\mathcal{U}_{il}(w_l)\Bigg]E[w_k^T]$$
$$=E_{u_iw_k}-U_{ik}E_{w_kw_k}-E[u_i]E[w_k^T]+E[w_k]E[w_k^T]$$
$$=E_{u_iw_k}-U_{ik}E_{w_kw_k}=O.$$
Thus, $U_{ik}=E_{u_iw_k}E_{w_kw_k}^{-1}$, and statement (i) is true.
It is clear that the vectors $v_1,\dots,v_p$, defined by (12), are orthogonal. For $\mathcal{Q}_k$ defined by (13), we have $Q_k=(E_{w_kw_k}^{1/2})^{-1}$ and
$$E_{v_kv_k}=E\big[\big(E_{w_kw_k}^{1/2}\big)^{-1}w_kw_k^T\big(E_{w_kw_k}^{1/2}\big)^{-1}\big]-E\big[\big(E_{w_kw_k}^{1/2}\big)^{-1}w_k\big]E\big[w_k^T\big(E_{w_kw_k}^{1/2}\big)^{-1}\big]$$
$$=\big(E_{w_kw_k}^{1/2}\big)^{-1}E_{w_kw_k}\big(E_{w_kw_k}^{1/2}\big)^{-1}=I.$$
Hence, $v_1,\dots,v_p$, defined by (12), are orthonormal. □
Proof of Lemma 3. We require that $E_{v_iv_k}=O$ for $i\neq k$. If $\mathcal{Z}_{ik}$ has been chosen so that this condition is true for all $k=1,\dots,i-1$, then we have
$$E\Bigg[\Bigg(u_i-\sum_{l=1}^{i-1}\mathcal{Z}_{il}(v_l)\Bigg)v_k^T\Bigg]=E_{u_iv_k}-\sum_{l=1}^{i-1}Z_{il}E_{v_lv_k}=E_{u_iv_k}-Z_{ik}E_{v_kv_k}=O. \tag{80}$$
Thus,
$$Z_{ik}E_{v_kv_k}=E_{u_iv_k}. \tag{81}$$
The necessary and sufficient condition [45] for the solvability of the matrix equation (81) is given by
$$E_{u_iv_k}E_{v_kv_k}^\dagger E_{v_kv_k}=E_{u_iv_k}. \tag{82}$$
By Lemma 2, (82) is true. Then, on the basis of [45], the general solution to (81) is given by (16). □
Acknowledgements
The first co-author is grateful to Oliver Cappé for useful discussions related to the structure of
the proposed transform.
References

[1] Hotelling H., Analysis of a complex of statistical variables into Principal Components, J. Educ. Psychol., 1933, V.24, 417–441, 498–520.
[2] Karhunen K., Über Lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann. Acad. Sci. Fennicae, Ser. A,
1947, V.137.
[3] Loève M., Fonctions aléatoires du second ordre, in P. Lévy, Processus Stochastiques et Mouvement Brownien, Paris, Hermann, 1948.
[4] Jolliffe I.T., Principal component analysis, New York, Springer Verlag, 1986 (2 ed., 2002).
[5] Scharf L.L., The SVD and reduced rank signal processing, Signal Processing, 1991, V.25, 113–133.
[6] Yamashita Y., Ogawa H., Relative Karhunen–Loève transform, IEEE Trans. on Signal Processing, 1996, V.44, 371–378.
[7] Hua Y., Liu W.Q., Generalized Karhunen–Loève transform, IEEE Signal Processing Letters, 1998, V.5,
141–143.
[8] Vapnik V., Statistical Learning Theory, Wiley, 1998.
[9] Ocaña F.A., Aguilera A.M., Valderrama M.J., Functional principal components analysis by choice of norm, J. Multivariate Anal., 1999, V.71, 262–276.
[10] Tipping M.E., Bishop C.M., Probabilistic principal component analysis, J. of the Royal Statistical Society,
Ser. A, 1999, V.61, 611–619.
[11] Tipping M.E., Bishop C.M., Mixtures of probabilistic principal component analysers, Neural Computation,
1999, V.11, 443–482.
[12] Schölkopf B., Smola A.J., Müller K.-R., Kernel principal component analysis, in Advances in Kernel Meth-
ods. Support Vector Learning, Editors B. Schölkopf, C.J.C. Burges and A.J. Smola, Cambridge, MIT Press,
1999, 327–352.
[13] Tenenbaum J.B., de Silva V., Langford J.C., A global geometric framework for nonlinear dimensionality
reduction, Science, 2000, V.290, Issue 5500, 2319–2323.
[14] Roweis S.T., Saul L.K., Nonlinear dimensionality reduction by locally linear embedding, Science, 2000, V.290, Issue 5500, 2323–2326.
[15] Cristianini N., Shawe-Taylor J., An introduction to support vector machines and other kernel-based learning
methods, Cambridge, Cambridge University Press, 2000.
[16] Yamada I., Sekiguchi T., Sakaniwa K., Reduced rank Volterra filter for robust identification of nonlinear
systems, in Proc. 2nd Int. Workshop on Multidimensional (ND) Systems – NDS2000, Poland, Czocha Castle,
2000, 171–175.
[17] Hua Y., Nikpour M., Stoica P., Optimal reduced-rank estimation and filtering, IEEE Trans. on Signal
Processing, 2001, V.49, 457–469.
[18] Kneip A., Utikal K.J., Inference for density families using functional principal component analysis, Journal
of the American Statistical Association, 2001, V.96, 519–542.
[19] Honig M.L., Xiao W., Performance of reduced-rank linear interference suppression, IEEE Trans. on Information Theory, 2001, V.47, 1928–1946.
[20] Chen W., Mitra U., Schniter P., On the equivalence of three reduced rank linear estimators with applications to DS-CDMA, IEEE Trans. on Information Theory, 2002, V.48, 2609–2614.
[21] Honig M.L., Goldstein J.S., Adaptive reduced-rank interference suppression based on multistage Wiener
filter, IEEE Trans. on Communications, 2002, V.50, 986–994.
[22] Stock J.H., Watson M.W., Forecasting using principal components from a large number of predictors, Journal
of the American Statistical Association, 2002, V.97, 1167–1179.
[23] Fukunaga K., Introduction to statistical pattern recognition, Boston, Academic Press, 1990.
[24] Kraut S., Anderson R.H., Krolik J.L., A generalized Karhunen–Loève basis for efficient estimation of tro-
pospheric refractivity using radar clutter, IEEE Trans. on Signal Processing, 2004, V.52, 48–60.
[25] Torokhti A., Howlett P., An optimal filter of the second order, IEEE Trans. on Signal Processing, 2001,
V.49, 1044–1048.
[26] Torokhti A., Howlett P., Optimal fixed rank transform of the second degree, IEEE Trans. on Circuits and
Systems. Part II, Analog & Digital Signal Processing, 2001, V.48, 309–315.
[27] Torokhti A., Howlett P., Pearce C., New perspectives on optimal transforms of random vectors, Optimization:
Theory and Applications, to appear.
[28] Torokhti A., Howlett P., Constructing fixed rank optimal estimators with method of recurrent best approx-
imations, J. Multivariate Analysis, 2002, V.86, 293–309.
[29] Torokhti A., Howlett P., Best operator approximation in modelling of nonlinear Systems, IEEE Trans. on
Circuits and Systems. Part I, Fundamental Theory and Applications, 2002, V.49, 1792–1798.
[30] Torokhti A., Howlett P., Method of recurrent best estimators of second degree for optimal filtering of random
signals, Signal Processing, 2003, V.83, 1013–1024.
[31] Torokhti A., Howlett P., Best causal mathematical models for a nonlinear system, IEEE Trans. on Circuits
and Systems. Part I, Fundamental Theory and Applications, to appear.
[32] Sontag E.D., Polynomial response maps, Lecture Notes in Control and Information Sciences, 1979. Vol. 13.
[33] Chen S., Billings S.A., Representation of non-linear systems: NARMAX model, Int. J. Control, 1989, V.49,
1013–1032.
[34] Howlett P.G., Torokhti A.P., Pearce C.E.M., A philosophy for the modelling of realistic non-linear systems,
Proc. of Amer. Math. Soc., 2003, V.132, 353–363.
[35] Cotlar M., Cignoli R., An introduction to functional analysis, Amsterdam – London, North-Holland Pub-
lishing Company, 1974, 114–116.
[36] Perlovsky L.I., Marzetta T.L., Estimating a covariance matrix from incomplete realizations of a random vector, IEEE Trans. on Signal Processing, 1992, V.40, 2097–2100.
[37] Kauermann G., Carroll R.J., A note on the efficiency of Sandwich covariance matrix estimation, Journal of
the American Statistical Association, 2001, V.96, 1387–1396.
[38] Schneider M.K., Willsky A.S., A Krylov subspace method for covariance approximation and simulation of
a random process and fields, Int. J. Multidim. Syst. & Signal Processing, 2003, V.14, 295–318.
[39] Kubokawa T., Srivastava M.S., Estimating the covariance matrix: a new approach, J. Multivariate Analysis,
2003, V.86, 28–47.
[40] Ledoit O., Wolf M., A well-conditioned estimator for large-dimensional covariance matrices, J. Multivariate
Analysis, 2004, V.88, 365–411.
[41] Leung P.L., Ng F.Y., Improved estimation of a covariance matrix in an elliptically contoured matrix distri-
bution, J. Multivariate Analysis, 2004, V.88, 131–137.
[42] Higham N.J., Stable iterations for the matrix square root, Numerical Algorithms, 1997, V.15, 227–241.
[43] Golub G.H., van Loan C.F., Matrix computations, Baltimore, Johns Hopkins University Press, 1996.
[44] Kowalski M.A., Sikorski K.A., Stenger F., Selected topics in approximation and computations, New York –
Oxford, Oxford University Press, 1995.
[45] Ben-Israel A., Greville T.N.E., Generalized inverses: theory and applications, New York, John Wiley &
Sons, 1974.
[46] Mathews V.J., Sicuranza G.L., Polynomial signal processing, J. Wiley & Sons, 2001.
[47] Goldstein J.S., Reed I., Scharf L.L., A multistage representation of the Wiener filter based on orthogonal
projections, IEEE Trans. on Information Theory, 1998, V.44, 2943–2959.