Symmetry, Integrability and Geometry: Methods and Applications Vol. 2 (2006), Paper 039, 21 pages
Combined Reduced-Rank Transform
Anatoli TOROKHTI and Phil HOWLETT
School of Mathematics and Statistics, University of South Australia, Australia
E-mail: anatoli.torokhti@unisa.edu.au, phil.howlett@unisa.edu.au
URL: http://people.unisa.edu.au/Anatoli.Torokhti
http://people.unisa.edu.au/Phil.Howlett
Received November 25, 2005, in final form March 22, 2006; Published online April 07, 2006
Original article is available at http://www.emis.de/journals/SIGMA/2006/Paper039/
Abstract. We propose and justify a new approach to constructing optimal nonlinear transforms of random vectors. We show that the proposed transform improves such characteristics of rank-reduced transforms as compression ratio and accuracy of decompression, and reduces the required computational work. The proposed transform $\mathcal{T}_p$ is presented in the form of a sum with $p$ terms, where each term is interpreted as a particular rank-reduced transform. Moreover, the terms in $\mathcal{T}_p$ are represented as a combination of three operations $\mathcal{F}_k$, $\mathcal{Q}_k$ and $\varphi_k$ with $k=1,\dots,p$. The prime idea is to determine $\mathcal{F}_k$ separately, for each $k=1,\dots,p$, from an associated rank-constrained minimization problem similar to that used in the Karhunen–Loève transform. The operations $\mathcal{Q}_k$ and $\varphi_k$ are auxiliary for finding $\mathcal{F}_k$. The contribution of each term in $\mathcal{T}_p$ improves the entire transform performance. A corresponding unconstrained nonlinear optimal transform is also considered. Such a transform is important in its own right because it is treated as an optimal filter without signal compression. A rigorous analysis of the errors associated with the proposed transforms is given.
Key words: best approximation; Fourier series in Hilbert space; matrix computation
2000 Mathematics Subject Classification: 41A29
1 Introduction
Methods of data dimensionality reduction [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24] have been applied successfully to many practical problems. The diversity of
applications has stimulated a considerable increase in the study of data dimensionality reduction
in recent decades. Significant recent results in this challenging research area are described, in
particular, in references [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24].
The known methods concern both a probabilistic setting (as in [5, 6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]) and a deterministic setting (as in [12, 13, 14, 15]) of the dimensionality reduction. The associated techniques are often based on the use of reduced-rank
operators.
In this paper, a further advance in the development of reduced-rank transforms is presented.
We study a new approach to data dimensionality reduction in a probabilistic setting based on
the development of ideas presented in [5, 6, 7, 26, 27, 28, 29].
Motivation for the proposed approach arises from the following observation. In general, the reduced-rank transform consists of three companion operations: filtering, compression and reconstruction [5, 6, 7, 16, 26]. Filtering and compression are performed simultaneously to estimate a reference signal $x$ with $m$ components from noisy observable data $y$, and to filter and reduce the data to a shorter vector $\hat{x}$ with $\eta$ components, $\eta < m$. Components of $\hat{x}$ are often called principal components [4]. The quotient $\eta/m$ is called the compression ratio. Reconstruction returns a vector $\tilde{x}$ with $m$ components, so that $\tilde{x}$ should be close to the original $x$. It is natural to perform these three operations so that the reconstruction error and the related computational burden are minimal.
As a result, the performance of the reduced-rank transform is characterized by three issues
which are (i) associated accuracy, (ii) compression ratio, and (iii) computational work.
For a given compression ratio, the Karhunen–Loève transform (KLT) [5, 6, 7] minimizes the
reconstruction error over the class of all linear reduced-rank transforms. Nevertheless, it may
happen that the accuracy and compression ratio associated with the KLT are still not satisfac-
tory. In such a case, an improvement in the accuracy and compression ratio can be achieved by
a transform with a more general structure than that of the KLT. Special non-linear transforms
have been studied in [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]
using transform structures developed from the generalised Volterra polynomials. Nevertheless,
the transforms [16, 26, 27, 28, 29] imply a substantial computational burden associated with the
large number N of terms required by the underlying Volterra polynomial structure.
Our objective is to justify a new transform that may have both accuracy and compression
ratio better than those of the known transforms [5, 6, 7, 26, 27, 28, 29]. A related objective is
to find a way to reduce the associated computational work compared with that implied by the
transforms [26, 27, 28, 29]. The analysis of these issues is given in Sections 4, 5.2.2 (Remark 4),
5.2.3 and 5.2.4.
In Section 5.2.5, we show that the proposed approach generalizes the Fourier series in Hilbert
space, the Wiener filter, the Karhunen–Loève transform and the transforms given in [26, 27, 29].
2 Method description
We use the following notation:
$(\Omega,\Sigma,\mu)$ is a probability space, where $\Omega=\{\omega\}$ is the set of outcomes, $\Sigma$ a $\sigma$-field of measurable subsets of $\Omega$ and $\mu:\Sigma\to[0,1]$ an associated probability measure on $\Sigma$ with $\mu(\Omega)=1$; $x\in L^2(\Omega,\mathbb{R}^m)$ and $y\in L^2(\Omega,\mathbb{R}^n)$ are random vectors with realizations $x=x(\omega)\in\mathbb{R}^m$ and $y=y(\omega)\in\mathbb{R}^n$, respectively.
Each matrix $M\in\mathbb{R}^{m\times n}$ defines a bounded linear transformation $\mathcal{M}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ via the formula $[\mathcal{M}y](\omega)=My(\omega)$ for each $\omega\in\Omega$. We note that there are many bounded linear transformations from $L^2(\Omega,\mathbb{R}^n)$ into $L^2(\Omega,\mathbb{R}^m)$ that cannot be written in the form $[\mathcal{M}y](\omega)=My(\omega)$ for each $\omega\in\Omega$. A trivial example is $\mathcal{A}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ given by $\mathcal{A}(y)=\int_\Omega y(\omega)\,d\mu(\omega)$.
Throughout the paper, calligraphic letters denote operators defined similarly to $\mathcal{M}$.
Let $g=[g_1\ \dots\ g_m]^T\in L^2(\Omega,\mathbb{R}^m)$ and $h=[h_1\ \dots\ h_n]^T\in L^2(\Omega,\mathbb{R}^n)$ be random vectors with $g_i,h_k\in L^2(\Omega,\mathbb{R})$ for $i=1,\dots,m$ and $k=1,\dots,n$. For all $i=1,\dots,m$ and $k=1,\dots,n$, we set
$$E[g_i]=\int_\Omega g_i(\omega)\,d\mu(\omega),\qquad E[g_ih_k]=\int_\Omega g_i(\omega)h_k(\omega)\,d\mu(\omega), \tag{1}$$
$$E_{gh}=E\big[gh^T\big]=\{E[g_ih_k]\}\in\mathbb{R}^{m\times n}\qquad\text{and}\qquad E_g=E[g]=\{E[g_i]\}\in\mathbb{R}^m. \tag{2}$$
For the centred vectors we also write
$$E_{gh}=E\big[(g-E_g)(h-E_h)^T\big]=E\big[gh^T\big]-E[g]E\big[h^T\big].$$
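In practice the moments (1)–(2) are estimated from samples. As a minimal illustration (not part of the paper's formal development), the following numpy sketch computes the sample analogues of $E_{gh}$ and of the centred covariance; the function name and the columnwise sample layout are our own assumptions.

```python
import numpy as np

def sample_moments(G, H):
    """Sample estimates of E_gh = E[g h^T] of (2) and of the centred
    covariance E_gh - E[g] E[h^T], from q realizations stored columnwise:
    G is m x q (realizations of g), H is n x q (realizations of h)."""
    q = G.shape[1]
    E_gh = G @ H.T / q                    # estimate of E[g h^T]
    mg = G.mean(axis=1, keepdims=True)    # estimate of E[g]
    mh = H.mean(axis=1, keepdims=True)    # estimate of E[h]
    return E_gh, E_gh - mg @ mh.T         # raw moment, centred covariance
```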
Achievement of the above objectives is based on the presentation of the proposed transform in the form of a sum with $p$ terms (3), where each term is interpreted as a particular rank-reduced transform. Moreover, the terms in (3) are represented as a combination of three operations $\mathcal{F}_k$, $\mathcal{Q}_k$ and $\varphi_k$ for each $k=1,\dots,p$, where $\varphi_k$ is nonlinear. The prime idea is to determine $\mathcal{F}_k$ separately, for each $k=1,\dots,p$, from an associated rank-constrained minimization problem similar to that in the KLT. The operations $\mathcal{Q}_k$ and $\varphi_k$ are auxiliary for finding $\mathcal{F}_k$. It is natural to expect that a contribution of each term in (3) will improve the entire transform performance.

To realize such a scheme, we choose the $\mathcal{Q}_k$ as orthogonal/orthonormal operators (see Section 3). Then each $\mathcal{F}_k$ can be determined independently from the individual problem (33) or (56) below. Next, the operators $\varphi_k$ are used to reduce the number of terms from $N$ (as in [16, 26, 27, 28, 29]) to $p$ with $p\ll N$. For example, this can be done when we choose $\varphi_k$ in the form presented in Section 5.2.4. Moreover, the composition of the operators $\mathcal{Q}_k$ and $\varphi_k$ allows us to reduce the related covariance matrices to the identity matrix or to a block-diagonal form with small blocks. Remark 4 in Section 5.2.2 gives more details in this regard. The computational work associated with such blocks is much less than that for the large covariance matrices in [16, 26, 27, 28, 29].
To regulate accuracy associated with the proposed transform and its compression ratio, we
formulate the problem in the form (6)–(7) where (7) consists of p constraints. It is shown
in Remark 2 of Section 4, and in Sections 5.2.1, 5.2.2 and 5.2.4 that such a combination of
constraints allows us to equip the proposed transforms with several degrees of freedom.
The structure of our transform is presented in Section 3 and the formal statement of the
problem in Section 4. In Section 5, we determine operators Qk and Fk (Lemmata 1 and 3, and
Theorems 1 and 2, respectively).
3 Structure of the proposed transform
3.1 Generic form
The proposed transform $\mathcal{T}_p$ is presented in the form
$$\mathcal{T}_p(y)=f+\sum_{k=1}^{p}\mathcal{F}_k\mathcal{Q}_k\varphi_k(y)=f+\mathcal{F}_1\mathcal{Q}_1\varphi_1(y)+\cdots+\mathcal{F}_p\mathcal{Q}_p\varphi_p(y), \tag{3}$$
where $f\in\mathbb{R}^m$, $\varphi_k:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$, $\mathcal{Q}_1,\dots,\mathcal{Q}_p:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ and $\mathcal{F}_k:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$.
In general, one can put $x\in L^2(\Omega,H_X)$, $y\in L^2(\Omega,H_Y)$, $\varphi_k:L^2(\Omega,H_Y)\to L^2(\Omega,H_k)$, $\mathcal{Q}_k:L^2(\Omega,H_k)\to L^2(\Omega,\tilde{H}_k)$ and $\mathcal{F}_k:L^2(\Omega,\tilde{H}_k)\to L^2(\Omega,H_X)$ with $H_X$, $H_Y$, $H_k$ and $\tilde{H}_k$ separable Hilbert spaces, and $k=1,\dots,p$.
In (3), the vector $f$ and the operators $\mathcal{F}_1,\dots,\mathcal{F}_p$ are determined from the minimization problem (6)–(7) given in Section 4. The operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ in (3) are orthogonal (orthonormal) in the sense of Definition 1 in Section 4 (in this regard, see also Remark 3 in Section 5.1).
To demonstrate and justify the flexibility of the transform $\mathcal{T}_p$ with respect to the choice of $\varphi_1,\dots,\varphi_p$ in (3), we mainly study the case where $\varphi_1,\dots,\varphi_p$ are arbitrary. Specifications of $\varphi_1,\dots,\varphi_p$ are presented in Sections 3.2, 5.2.4 and 5.2.5, where we also discuss the benefits associated with some particular forms of $\varphi_1,\dots,\varphi_p$.
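For orientation, the generic form (3) amounts to the following evaluation scheme for a single realization $y\in\mathbb{R}^n$. This is a hedged sketch: `f`, the matrices in `F_list` and `Q_list` and the callables in `phi_list` are placeholders to be constructed by the methods of Sections 4 and 5, not objects defined in the paper.

```python
import numpy as np

def apply_Tp(y, f, F_list, Q_list, phi_list):
    """Evaluate T_p(y) = f + sum_k F_k Q_k phi_k(y), cf. (3), for one
    realization y.  F_list[k] and Q_list[k] are matrices, phi_list[k]
    is a callable R^n -> R^n; all are supplied by the user."""
    x_est = np.asarray(f, dtype=float).copy()
    for F, Q, phi in zip(F_list, Q_list, phi_list):
        x_est += F @ (Q @ phi(y))   # one rank-reduced term of the sum (3)
    return x_est
```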
3.2 Some particular cases
Particular cases of the model Tp are associated with specific choices of ϕk, Qk and Fk. Some
examples are given below.
(i) If $H_X=H_Y=\mathbb{R}^n$ and $H_k=\tilde{H}_k=\mathbb{R}^{n^k}$, where $\mathbb{R}^{n^k}$ is the $k$th degree of $\mathbb{R}^n$, then (3) generalises the known transform structures [16, 26, 27, 28, 29]. The models [16, 26, 27, 28, 29] follow from (3) if $\varphi_k(y)=y^k$, where $y^k=(y,\dots,y)\in L^2(\Omega,\mathbb{R}^{n^k})$, $\mathcal{Q}_k=\mathcal{I}$, where $\mathcal{I}$ is the identity operator, and if $\mathcal{F}_k$ is a $k$-linear operator. It has been shown in [16, 26, 27, 28, 29] that such a form of $\varphi_k$ leads to a significant improvement in the associated accuracy. See Section 5.2.5 for more details.
(ii) If $\varphi_k:L^2(\Omega,H_Y)\to L^2(\Omega,H_X)$ and $\{u_1,u_2,\dots\}$ is a basis in $L^2(\Omega,H_X)$, then $\varphi_k$ and $\mathcal{Q}_k$ can be chosen so that $\varphi_k(y)=u_k$ and $\mathcal{Q}_k=\mathcal{I}$, respectively. As a result, in this particular case,
$$\mathcal{T}_p(y)=f+\sum_{k=1}^{p}\mathcal{F}_k(u_k).$$
(iii) A similar case follows if $\varphi_k:L^2(\Omega,H_Y)\to L^2(\Omega,H_k)$ is arbitrary but $\mathcal{Q}_k:L^2(\Omega,H_k)\to L^2(\Omega,\tilde{H}_k)$ is defined so that $\mathcal{Q}_k[\varphi_k(y)]=v_k$ with $k=1,\dots,p$, where $\{v_1,v_2,\dots\}$ is a basis in $L^2(\Omega,\tilde{H}_k)$. Then
$$\mathcal{T}_p(y)=f+\sum_{k=1}^{p}\mathcal{F}_k(v_k).$$
(iv) Let $\tilde{x}^{(1)},\dots,\tilde{x}^{(p)}$ be estimates of $x$ by the known transforms [7, 25, 30]. Then we can put $\varphi_1(y)=\tilde{x}^{(1)},\ \dots,\ \varphi_p(y)=\tilde{x}^{(p)}$. In particular, one could choose $\varphi_1(y)=y$. In such a way, the vector $x$ is pre-estimated from $y$, and therefore the overall estimate of $x$ by $\mathcal{T}_p$ will be improved. A new recursive method for finding $\tilde{x}^{(1)},\dots,\tilde{x}^{(p)}$ is given in Section 5.2.4 below.
Other particular cases of the proposed transform are considered in Sections 5.2.4 and 5.2.5.
Remark 1. The particular case of Tp considered in the item (iii) above can be interpreted as
an operator form of the Fourier polynomial in Hilbert space [35]. The benefits associated with
the Fourier polynomials are well known. In item (ii) of Section 5.2.5, this case is considered in
more detail.
4 Statement of the problem
First, we define orthogonal and orthonormal operators as follows.
Definition 1. Let $u_k\in L^2(\Omega,\mathbb{R}^n)$ and $v_k=\mathcal{Q}_k(u_k)$. The operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are called pairwise orthonormal if
$$E_{v_iv_j}=\begin{cases}O, & i\neq j,\\ I, & i=j,\end{cases}$$
for any $i,j=1,\dots,p$. Here, $O$ and $I$ are the zero matrix and the identity matrix, respectively. If $E_{v_iv_j}=O$ for $i\neq j$ with $i,j=1,\dots,p$, and if $E_{v_iv_j}$ is not necessarily equal to $I$ for $i=j$, then $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are called pairwise orthogonal.
Hereinafter, we suppose that $\mathcal{F}_k$ is linear for all $k=1,\dots,p$ and that the Hilbert spaces are the finite-dimensional Euclidean spaces, $H_X=\mathbb{R}^m$ and $H_Y=H_k=\tilde{H}_k=\mathbb{R}^n$. For any vector $g\in L^2(\Omega,\mathbb{R}^m)$, we set
$$E[\|g\|^2]=\int_\Omega\|g(\omega)\|^2\,d\mu(\omega)<\infty, \tag{4}$$
where $\|g(\omega)\|$ is the Euclidean norm of $g(\omega)$.

Let us denote
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=E\big[\|x-\mathcal{T}_p(y)\|^2\big]. \tag{5}$$
The problem is

(i) to find operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ satisfying Definition 1, and

(ii) to determine the vector $f^0$ and operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ such that
$$J(f^0,\mathcal{F}_1^0,\dots,\mathcal{F}_p^0)=\min_{f,\mathcal{F}_1,\dots,\mathcal{F}_p}J(f,\mathcal{F}_1,\dots,\mathcal{F}_p) \tag{6}$$
subject to
$$\operatorname{rank}\mathcal{F}_1=\eta_1,\quad\dots,\quad\operatorname{rank}\mathcal{F}_p=\eta_p, \tag{7}$$
where $\eta_1+\cdots+\eta_p=\eta\leq\min\{m,n\}$.
Here, for $k=1,\dots,p$ (see, for example, [44]),
$$\operatorname{rank}(\mathcal{F}_k)=\dim\mathcal{F}_k\big(L^2(\Omega,\mathbb{R}^n)\big).$$
We write
$$\mathcal{T}_p^0(y)=f^0+\sum_{k=1}^{p}\mathcal{F}_k^0(v_k) \tag{8}$$
with $v_k$ defined by Definition 1.
It is supposed that the covariance matrices formed from the vectors $\mathcal{Q}_1\varphi_1(y),\dots,\mathcal{Q}_p\varphi_p(y)$ in (3) are known or can be estimated. Various estimation methods can be found in [36, 37, 38, 39, 40, 41]. We note that such an assumption is traditional [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] in the study of optimal transforms. The effective estimation of covariance matrices represents a specific task [36, 37, 38, 39, 40, 41] which is not considered in this paper.
Remark 2. Unlike known rank-constrained problems, we consider $p$ constraints (7). The number $p$ of the constraints and the ranks $\eta_1,\dots,\eta_p$ form the degrees of freedom for $\mathcal{T}_p^0$. Variation of $p$ and $\eta_1,\dots,\eta_p$ allows us to regulate the accuracy associated with the transform $\mathcal{T}_p^0$ (see (21) in Section 5.2.1 and (46) in Section 5.2.2) and its compression ratio (see (67) in Section 5.2.3). It follows from (21) and (46) that the accuracy increases if $p$ and $\eta_1,\dots,\eta_p$ increase. Conversely, by (67), the compression ratio is improved if $\eta_1,\dots,\eta_p$ decrease.
5 Solution of the problem
The problem (6)–(7) generalises the known rank-constrained problems where only one constraint has been considered. Our plan for the solution is as follows. First, in Section 5.1, we will determine the operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$. Then, in Section 5.2, we will obtain $f^0$ and $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6) and (7).
5.1 Determination of orthogonalizing operators Q1, . . . , Qp
If $M$ is a square matrix, then we write $M^{1/2}$ for a matrix such that $M^{1/2}M^{1/2}=M$. We note that the matrix $M^{1/2}$ can be computed in various ways [42]. In this paper, $M^{1/2}$ is determined from the singular value decomposition (SVD) [43] of $M$.

For the case when the matrix $E_{v_kv_k}$ is invertible for any $k=1,\dots,p$, the orthonormalization procedure is as follows. For $u_k\in L^2(\Omega,\mathbb{R}^n)$, we write
$$[\mathcal{Q}_k(u_k)](\omega)=Q_ku_k(\omega), \tag{9}$$
where $Q_k\in\mathbb{R}^{n\times n}$. For $u_k,v_j,w_j\in L^2(\Omega,\mathbb{R}^n)$, we also define operators $\mathcal{E}_{u_kv_j},\mathcal{E}_{v_jv_j}^{-1}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ by the equations
$$[\mathcal{E}_{u_kv_j}(w_j)](\omega)=E_{u_kv_j}w_j(\omega)\qquad\text{and}\qquad[\mathcal{E}_{v_jv_j}^{-1}(w_j)](\omega)=E_{v_jv_j}^{-1}w_j(\omega), \tag{10}$$
respectively.
Lemma 1. Let
$$w_1=u_1\qquad\text{and}\qquad w_i=u_i-\sum_{k=1}^{i-1}\mathcal{E}_{u_iw_k}\mathcal{E}_{w_kw_k}^{-1}(w_k)\quad\text{for}\quad i=1,\dots,p, \tag{11}$$
where $E_{w_kw_k}^{-1}$ exists. Then

(i) the vectors $w_1,\dots,w_p$ are pairwise orthogonal, and

(ii) the vectors $v_1,\dots,v_p$, defined by
$$v_i=\mathcal{Q}_i(u_i) \tag{12}$$
with
$$\mathcal{Q}_i(u_i)=\big(\mathcal{E}_{w_iw_i}^{1/2}\big)^{-1}(w_i) \tag{13}$$
for $i=1,\dots,p$, are pairwise orthonormal.

Proof. The proof is given in the Appendix. □
For the case when the matrix $E_{v_kv_k}$ is singular for $k=1,\dots,p$, the orthogonalizing operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are determined by Lemma 3 below. Another difference from Lemma 1 is that the vectors $v_1,\dots,v_p$ in Lemma 3 are pairwise orthogonal but not orthonormal. An intermediate result is given in Lemma 2.

The symbol $\dagger$ is used to denote the pseudo-inverse operator [45]. It is supposed that the pseudo-inverse $M^\dagger$ of a matrix $M$ is determined from the SVD of $M$.
Lemma 2 ([26]). For any random vectors $g\in L^2(\Omega,\mathbb{R}^m)$ and $h\in L^2(\Omega,\mathbb{R}^n)$,
$$E_{gh}E_{hh}^\dagger E_{hh}=E_{gh}. \tag{14}$$

Lemma 3. Let $v_i=\mathcal{Q}_i(u_i)$ for $i=1,\dots,p$, where $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are such that
$$\mathcal{Q}_1(u_1)=u_1\qquad\text{and}\qquad\mathcal{Q}_i(u_i)=u_i-\sum_{k=1}^{i-1}\mathcal{Z}_{ik}(v_k)\quad\text{for}\quad i=2,\dots,p \tag{15}$$
with $\mathcal{Z}_{ik}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ defined by
$$Z_{ik}=E_{u_iv_k}E_{v_kv_k}^\dagger+A_{ik}\big(I-E_{v_kv_k}E_{v_kv_k}^\dagger\big) \tag{16}$$
with $A_{ik}\in\mathbb{R}^{n\times n}$ arbitrary. Then the vectors $v_1,\dots,v_p$ are pairwise orthogonal.

Proof. The proof is given in the Appendix. □
We note that Lemma 3 does not require invertibility of the matrix $E_{v_kv_k}$. At the same time, if $E_{v_kv_k}^{-1}$ exists, then the vectors $w_1,\dots,w_p$ and $v_1,\dots,v_p$, defined by (11) and by Lemma 3 respectively, coincide.
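A sample-based sketch of the orthogonalization of Lemma 3 (with $A_{ik}=O$; the orthonormalization of Lemma 1 would additionally multiply each $w_i$ by $(E_{w_iw_i}^{1/2})^{-1}$). The routine operates on matrices of realizations, as discussed later in Section 6.1; the helper names are ours.

```python
import numpy as np
from numpy.linalg import pinv

def cov(A, B):
    """Centred sample covariance E_AB from realizations stored columnwise."""
    q = A.shape[1]
    Ac = A - A.mean(axis=1, keepdims=True)
    Bc = B - B.mean(axis=1, keepdims=True)
    return Ac @ Bc.T / q

def orthogonalize(U_list):
    """Lemma 3 with A_ik = O:  v_1 = u_1 and
    v_i = u_i - sum_{k<i} E_{u_i v_k} E_{v_k v_k}^† v_k,
    applied to (n x q) matrices of realizations."""
    V_list = []
    for U in U_list:
        V = U.astype(float).copy()
        for Vk in V_list:
            V -= cov(U, Vk) @ pinv(cov(Vk, Vk)) @ Vk  # remove correlation with v_k
        V_list.append(V)
    return V_list
```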
Remark 3. Orthogonalization of random vectors is not, of course, a new idea. In particular,
generalizations of the Gram–Schmidt orthogonalization procedure have been considered in [46,
47]. The proposed orthogonalization procedures in Lemmata 1 and 3 are different from those
in [46, 47]. In particular, Lemma 3 establishes the vector orthogonalization in terms of pseudo-
inverse operators. A particular case of the practical implementation of the random vector
orthogonalization is considered in Section 6.
5.2 Determination of $f^0$, $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6)–(7)
5.2.1 The case when the matrix $E_{v_iv_i}$ is invertible for $i=1,\dots,p$

We consider the simpler case when $E_{v_iv_i}$ is invertible for all $i=1,\dots,p$. Then the vector $f^0$ and the operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6)–(7) are defined by Theorem 1 below. For each $i=1,\dots,p$, let $U_i\Sigma_iV_i^T$ be the SVD of $E_{xv_i}$,
$$U_i\Sigma_iV_i^T=E_{xv_i}, \tag{17}$$
where $U_i\in\mathbb{R}^{m\times n}$, $V_i\in\mathbb{R}^{n\times n}$ are orthogonal and $\Sigma_i\in\mathbb{R}^{n\times n}$ is diagonal,
$$U_i=[s_{i1},\dots,s_{in}],\qquad V_i=[d_{i1},\dots,d_{in}]\qquad\text{and}\qquad\Sigma_i=\operatorname{diag}(\alpha_{i1},\dots,\alpha_{in}) \tag{18}$$
with $\alpha_{i1}\geq\cdots\geq\alpha_{ir}>0$, $\alpha_{i,r+1}=\cdots=\alpha_{in}=0$ and $r=1,\dots,n$, where $r=r(i)$. We set
$$U_{i\eta_i}=[s_{i1},\dots,s_{i\eta_i}],\qquad V_{i\eta_i}=[d_{i1},\dots,d_{i\eta_i}]\qquad\text{and}\qquad\Sigma_{i\eta_i}=\operatorname{diag}(\alpha_{i1},\dots,\alpha_{i\eta_i}),$$
where $U_{i\eta_i}\in\mathbb{R}^{m\times\eta_i}$, $V_{i\eta_i}\in\mathbb{R}^{n\times\eta_i}$ and $\Sigma_{i\eta_i}\in\mathbb{R}^{\eta_i\times\eta_i}$. Now we define $K_{i\eta_i}\in\mathbb{R}^{m\times n}$ and $\mathcal{K}_{i\eta_i}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ by
$$K_{i\eta_i}=U_{i\eta_i}\Sigma_{i\eta_i}V_{i\eta_i}^T\qquad\text{and}\qquad[\mathcal{K}_{i\eta_i}(w_i)](\omega)=K_{i\eta_i}[w_i(\omega)], \tag{19}$$
respectively, for any $w_i\in L^2(\Omega,\mathbb{R}^n)$.
Theorem 1. Let $v_1,\dots,v_p$ be determined by Lemma 1. Then the vector $f^0$ and the operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6)–(7) are determined by
$$f^0=E[x]-\sum_{k=1}^{p}F_k^0E[v_k]\qquad\text{and}\qquad\mathcal{F}_1^0=\mathcal{K}_{1\eta_1},\ \dots,\ \mathcal{F}_p^0=\mathcal{K}_{p\eta_p}. \tag{20}$$
The accuracy associated with the transform $\mathcal{T}_p^0$, determined by (8) and (20), is given by
$$E[\|x-\mathcal{T}_p^0(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\sum_{j=1}^{\eta_k}\alpha_{kj}^2. \tag{21}$$
Proof. The functional $J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)$ is written as
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=\operatorname{tr}\Bigg[E_{xx}-E[x]f^T-\sum_{i=1}^{p}E_{xv_i}F_i^T-fE[x^T]+ff^T+f\sum_{i=1}^{p}E[v_i^T]F_i^T$$
$$\qquad-\sum_{i=1}^{p}F_iE_{v_ix}+\sum_{i=1}^{p}F_iE[v_i]f^T+E\Bigg(\sum_{i=1}^{p}\mathcal{F}_i(v_i)\Bigg[\sum_{k=1}^{p}\mathcal{F}_k(v_k)\Bigg]^T\Bigg)\Bigg]. \tag{22}$$
We remind (see Section 2) that here and below $\mathcal{F}_i$ is defined by $[\mathcal{F}_i(v_i)](\omega)=F_i[v_i(\omega)]$, so that, for example, $E[\mathcal{F}_k(v_k)x^T]=F_kE_{v_kx}$. In other words, the right-hand side in (22) is a function of $f$, $F_1,\dots,F_p$.

Let us show that $J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)$ can be represented as
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=J_0+J_1+J_2, \tag{23}$$
where
$$J_0=\|E_{xx}^{1/2}\|^2-\sum_{i=1}^{p}\|E_{xv_i}\|^2, \tag{24}$$
$$J_1=\Big\|f-E[x]+\sum_{i=1}^{p}F_iE[v_i]\Big\|^2\qquad\text{and}\qquad J_2=\sum_{i=1}^{p}\|F_i-E_{xv_i}\|^2. \tag{25}$$
Indeed, $J_1$ and $J_2$ are rewritten as follows:
$$J_1=\operatorname{tr}\Bigg(ff^T-fE[x^T]+\sum_{i=1}^{p}fE[v_i^T]F_i^T+E[x]E[x^T]-E[x]f^T-\sum_{i=1}^{p}E[x]E[v_i^T]F_i^T$$
$$\qquad+\sum_{i=1}^{p}F_iE[v_i]f^T-\sum_{i=1}^{p}F_iE[v_i]E[x^T]+\sum_{i=1}^{p}F_iE[v_i]\sum_{k=1}^{p}E[v_k^T]F_k^T\Bigg) \tag{26}$$
and
$$J_2=\sum_{i=1}^{p}\operatorname{tr}(F_i-E_{xv_i})(F_i^T-E_{v_ix})=\sum_{i=1}^{p}\operatorname{tr}\big(F_iF_i^T-F_iE_{v_ix}-E_{xv_i}F_i^T+E_{xv_i}E_{v_ix}\big). \tag{27}$$
In (27), $\sum_{i=1}^{p}\operatorname{tr}(F_iF_i^T)$ can be represented in the form
$$\sum_{i=1}^{p}\operatorname{tr}(F_iF_i^T)=\operatorname{tr}\Bigg[E\Bigg(\sum_{i=1}^{p}F_iv_i\sum_{k=1}^{p}v_k^TF_k^T\Bigg)\Bigg]-\operatorname{tr}\Bigg(\sum_{i=1}^{p}F_iE[v_i]\sum_{k=1}^{p}E[v_k^T]F_k^T\Bigg) \tag{28}$$
because
$$E[v_iv_k^T]-E[v_i]E[v_k^T]=\begin{cases}O, & i\neq k,\\ I, & i=k,\end{cases} \tag{29}$$
due to the orthonormality of the vectors $v_1,\dots,v_p$.
Then
$$J_0+J_1+J_2=\operatorname{tr}\big(E_{xx}-E[x]E[x^T]\big)-\sum_{i=1}^{p}\operatorname{tr}[E_{xv_i}E_{v_ix}] \tag{30}$$
$$+\operatorname{tr}\Bigg(ff^T-fE[x^T]+\sum_{i=1}^{p}fE[v_i^T]F_i^T+E[x]E[x^T]-E[x]f^T-\sum_{i=1}^{p}E[x]E[v_i^T]F_i^T$$
$$\qquad+\sum_{i=1}^{p}F_iE[v_i]f^T-\sum_{i=1}^{p}F_iE[v_i]E[x^T]+\sum_{i=1}^{p}F_iE[v_i]\sum_{k=1}^{p}E[v_k^T]F_k^T\Bigg)$$
$$+\operatorname{tr}\Bigg[E\Bigg(\sum_{i=1}^{p}F_iv_i\sum_{k=1}^{p}v_k^TF_k^T\Bigg)\Bigg]-\operatorname{tr}\Bigg(\sum_{i=1}^{p}F_iE[v_i]\sum_{k=1}^{p}E[v_k^T]F_k^T\Bigg)$$
$$-\sum_{i=1}^{p}\operatorname{tr}\big(F_iE_{v_ix}-F_iE[v_i]E[x^T]+E_{xv_i}F_i^T-E[x]E[v_i^T]F_i^T-E_{xv_i}E_{v_ix}\big)$$
$$=J(f,\mathcal{F}_1,\dots,\mathcal{F}_p). \tag{31}$$
Hence, (23) is true. Therefore,
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\|E_{xv_k}\|^2+\Big\|f-E[x]+\sum_{k=1}^{p}F_kE[v_k]\Big\|^2+\sum_{k=1}^{p}\|F_k-E_{xv_k}\|^2. \tag{32}$$
It follows from (32) that the constrained minimum (6)–(7) is achieved if $f=f^0$ with $f^0$ given by (20), and if $F_k^0$ is such that
$$J_k(F_k^0)=\min_{F_k}J_k(F_k)\quad\text{subject to}\quad\operatorname{rank}(F_k)=\eta_k, \tag{33}$$
where $J_k(F_k)=\|F_k-E_{xv_k}\|^2$. The solution to (33) is given [43] by
$$F_k^0=K_{k\eta_k}. \tag{34}$$
Then
$$E[\|x-\mathcal{T}_p^0(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\big(\|E_{xv_k}\|^2-\|K_{k\eta_k}-E_{xv_k}\|^2\big).$$
Here [43],
$$\|E_{xv_k}\|^2=\sum_{j=1}^{r}\alpha_{kj}^2\qquad\text{and}\qquad\|K_{k\eta_k}-E_{xv_k}\|^2=\sum_{j=\eta_k+1}^{r}\alpha_{kj}^2 \tag{35}$$
with $r=r(k)$. Thus, (21) is true. The theorem is proved. □
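Computationally, Theorem 1 reduces to $p$ truncated SVDs. A minimal numpy sketch, assuming the covariances $E_{xv_k}$, the means and the ranks $\eta_k$ are already available (the function names are ours):

```python
import numpy as np

def best_rank_term(E_xv, eta):
    """Rank-eta minimizer of ||F - E_xv||^2 in (33): the truncated SVD
    K_{k,eta_k} of (19), i.e. F_k^0 of (20)."""
    U, s, Vt = np.linalg.svd(E_xv, full_matrices=False)
    return U[:, :eta] @ np.diag(s[:eta]) @ Vt[:eta, :]

def theorem1_solution(E_x, Ev_list, Exv_list, etas):
    """f^0 and F_1^0, ..., F_p^0 of (20) for orthonormal v_1, ..., v_p.
    E_x is E[x]; Ev_list[k] is E[v_k]; Exv_list[k] is E_{x v_k}."""
    F_list = [best_rank_term(Exv, eta) for Exv, eta in zip(Exv_list, etas)]
    f0 = E_x - sum(F @ Ev for F, Ev in zip(F_list, Ev_list))
    return f0, F_list
```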
Corollary 1. Let $v_1,\dots,v_p$ be determined by Lemma 1. Then the vector $\hat f$ and the operators $\hat{\mathcal{F}}_1,\dots,\hat{\mathcal{F}}_p$ satisfying the unconstrained problem (6) are determined by
$$\hat f=E[x]-\sum_{k=1}^{p}\hat F_kE[v_k]\qquad\text{and}\qquad\hat F_1=E_{xv_1},\ \dots,\ \hat F_p=E_{xv_p} \tag{36}$$
with $\hat{\mathcal{F}}_k$ such that $[\hat{\mathcal{F}}_k(v_k)](\omega)=\hat F_kv_k(\omega)$, where $\hat F_k\in\mathbb{R}^{m\times n}$ and $k=1,\dots,p$.

The accuracy associated with the transform $\hat{\mathcal{T}}_p$ given by
$$\hat{\mathcal{T}}_p(y)=\hat f+\sum_{k=1}^{p}\hat{\mathcal{F}}_k(v_k) \tag{37}$$
is such that
$$E[\|x-\hat{\mathcal{T}}_p(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\|E_{xv_k}\|^2. \tag{38}$$

Proof. The proof follows directly from (32). □
5.2.2 The case when the matrix $E_{v_kv_k}$ is not invertible for $k=1,\dots,p$

We write $A_k\in\mathbb{R}^{m\times n}$ for an arbitrary matrix, and define operators $\mathcal{A}_k:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ and $\mathcal{E}_{v_kv_k},\mathcal{E}_{v_kv_k}^\dagger,(\mathcal{E}_{v_kv_k}^{1/2})^\dagger:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ similarly to those in (9) and (10).

For the case under consideration (the matrix $E_{v_kv_k}$ is not invertible), we introduce the SVD of $E_{xv_k}(E_{v_kv_k}^{1/2})^\dagger$,
$$U_k\Sigma_kV_k^T=E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger, \tag{39}$$
where, as above, $U_k\in\mathbb{R}^{m\times n}$, $V_k\in\mathbb{R}^{n\times n}$ are orthogonal and $\Sigma_k\in\mathbb{R}^{n\times n}$ is diagonal,
$$U_k=[s_{k1},\dots,s_{kn}],\qquad V_k=[d_{k1},\dots,d_{kn}]\qquad\text{and}\qquad\Sigma_k=\operatorname{diag}(\beta_{k1},\dots,\beta_{kn}) \tag{40}$$
with $\beta_{k1}\geq\cdots\geq\beta_{kr}>0$, $\beta_{k,r+1}=\cdots=\beta_{kn}=0$, $r=1,\dots,n$ and $r=r(k)$.

Let us set
$$U_{k\eta_k}=[s_{k1},\dots,s_{k\eta_k}],\qquad V_{k\eta_k}=[d_{k1},\dots,d_{k\eta_k}]\qquad\text{and}\qquad\Sigma_{k\eta_k}=\operatorname{diag}(\beta_{k1},\dots,\beta_{k\eta_k}), \tag{41}$$
where $U_{k\eta_k}\in\mathbb{R}^{m\times\eta_k}$, $V_{k\eta_k}\in\mathbb{R}^{n\times\eta_k}$ and $\Sigma_{k\eta_k}\in\mathbb{R}^{\eta_k\times\eta_k}$. Now we define $G_{k\eta_k}\in\mathbb{R}^{m\times n}$ and $\mathcal{G}_{k\eta_k}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^m)$ by
$$G_{k\eta_k}=U_{k\eta_k}\Sigma_{k\eta_k}V_{k\eta_k}^T\qquad\text{and}\qquad[\mathcal{G}_{k\eta_k}(w_k)](\omega)=G_{k\eta_k}[w_k(\omega)], \tag{42}$$
respectively, for any $w_k\in L^2(\Omega,\mathbb{R}^n)$.

As noted before, we write $\mathcal{I}$ for the identity operator.
Theorem 2. Let $v_1,\dots,v_p$ be determined by Lemma 3. Then $f^0$ and $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ satisfying (6)–(7) are determined by
$$f^0=E[x]-\sum_{k=1}^{p}F_k^0E[v_k] \tag{43}$$
and
$$\mathcal{F}_1^0=\mathcal{G}_{1\eta_1}\big(\mathcal{E}_{v_1v_1}^{1/2}\big)^\dagger+\mathcal{A}_1\big[\mathcal{I}-\mathcal{E}_{v_1v_1}^{1/2}\big(\mathcal{E}_{v_1v_1}^{1/2}\big)^\dagger\big], \tag{44}$$
$$\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots$$
$$\mathcal{F}_p^0=\mathcal{G}_{p\eta_p}\big(\mathcal{E}_{v_pv_p}^{1/2}\big)^\dagger+\mathcal{A}_p\big[\mathcal{I}-\mathcal{E}_{v_pv_p}^{1/2}\big(\mathcal{E}_{v_pv_p}^{1/2}\big)^\dagger\big], \tag{45}$$
where, for $k=1,\dots,p$, $\mathcal{A}_k$ is any linear operator such that $\operatorname{rank}\mathcal{F}_k^0\leq\eta_k$ (in particular, $\mathcal{A}_k$ can be chosen as the zero operator).

The accuracy associated with the transform $\mathcal{T}_p^0$ given by (8) and (43)–(45) is such that
$$E[\|x-\mathcal{T}_p^0(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\sum_{j=1}^{\eta_k}\beta_{kj}^2. \tag{46}$$
Proof. For $v_1,\dots,v_p$ determined by Lemma 3, $J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)$ is represented by (22) as well. Let us consider $J_0$, $J_1$ and $J_2$ given by
$$J_0=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\big\|E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2, \tag{47}$$
$$J_1=\Big\|f-E[x]+\sum_{k=1}^{p}F_kE[v_k]\Big\|^2\qquad\text{and}\qquad J_2=\sum_{k=1}^{p}\big\|F_kE_{v_kv_k}^{1/2}-E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2. \tag{48}$$
To show that
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=J_0+J_1+J_2 \tag{49}$$
with $J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)$ defined by (22), we use the relationships (see [26])
$$E_{xv_k}E_{v_kv_k}^\dagger E_{v_kv_k}=E_{xv_k}\qquad\text{and}\qquad E_{v_kv_k}^\dagger E_{v_kv_k}^{1/2}=\big(E_{v_kv_k}^{1/2}\big)^\dagger. \tag{50}$$
Then
$$J_1=\operatorname{tr}\Bigg(ff^T-fE[x^T]+\sum_{k=1}^{p}fE[v_k^T]F_k^T+E[x]E[x^T]-E[x]f^T-\sum_{k=1}^{p}E[x]E[v_k^T]F_k^T$$
$$\qquad+\sum_{k=1}^{p}F_kE[v_k]f^T-\sum_{k=1}^{p}F_kE[v_k]E[x^T]+\sum_{k=1}^{p}F_kE[v_k]\sum_{i=1}^{p}E[v_i^T]F_i^T\Bigg) \tag{51}$$
and
$$J_2=\sum_{k=1}^{p}\operatorname{tr}\big(F_k-E_{xv_k}E_{v_kv_k}^\dagger\big)E_{v_kv_k}\big(F_k^T-E_{v_kv_k}^\dagger E_{v_kx}\big)$$
$$\ \ =\sum_{k=1}^{p}\operatorname{tr}\big(F_kE_{v_kv_k}F_k^T-F_kE_{v_kx}-E_{xv_k}F_k^T+E_{xv_k}E_{v_kv_k}^\dagger E_{v_kx}\big), \tag{52}$$
where
$$\sum_{k=1}^{p}\operatorname{tr}\big(F_kE_{v_kv_k}F_k^T\big)=\operatorname{tr}\Bigg[E\Bigg(\sum_{k=1}^{p}F_kv_k\sum_{i=1}^{p}v_i^TF_i^T\Bigg)\Bigg]-\operatorname{tr}\Bigg(\sum_{k=1}^{p}F_kE[v_k]\sum_{i=1}^{p}E[v_i^T]F_i^T\Bigg) \tag{53}$$
because
$$E[v_iv_k^T]-E[v_i]E[v_k^T]=O\quad\text{for}\quad i\neq k \tag{54}$$
due to the orthogonality of the vectors $v_1,\dots,v_p$. On the basis of (50)–(53) and similarly to (30)–(31), we establish that (49) is true. Hence,
$$J(f,\mathcal{F}_1,\dots,\mathcal{F}_p)=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\big\|E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2+\Big\|f-E[x]+\sum_{k=1}^{p}F_kE[v_k]\Big\|^2+\sum_{k=1}^{p}\big\|F_kE_{v_kv_k}^{1/2}-E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2. \tag{55}$$
It follows from the last two terms in (55) that the constrained minimum (6)–(7) is achieved if $f=f^0$ with $f^0$ given by (43), and $F_k^0$ is such that
$$J_k(F_k^0)=\min_{F_k}J_k(F_k)\quad\text{subject to}\quad\operatorname{rank}(F_k)=\eta_k, \tag{56}$$
where $J_k(F_k)=\|F_kE_{v_kv_k}^{1/2}-E_{xv_k}(E_{v_kv_k}^{1/2})^\dagger\|^2$. The constrained minimum (6)–(7) is achieved if $f=f^0$ is defined by (43), and if [43]
$$F_kE_{v_kv_k}^{1/2}=G_{k\eta_k}. \tag{57}$$
The matrix equation (57) has the general solution [45]
$$F_k=F_k^0=G_{k\eta_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger+A_k\big[I-E_{v_kv_k}^{1/2}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big] \tag{58}$$
if and only if
$$G_{k\eta_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger E_{v_kv_k}^{1/2}=G_{k\eta_k}. \tag{59}$$
The latter is satisfied on the basis of the following derivation. (Note that the matrix $I-E_{v_kv_k}^{1/2}(E_{v_kv_k}^{1/2})^\dagger$ is simply a projection onto the null space of $E_{v_kv_k}$ and can be replaced by $I-E_{v_kv_k}(E_{v_kv_k})^\dagger$.)

As an extension of the technique presented in proving Lemmata 1 and 2 in [26], it can be shown that, for any matrices $Q_1,Q_2\in\mathbb{R}^{m\times n}$,
$$\mathcal{N}(Q_1)\subseteq\mathcal{N}(Q_2)\ \Rightarrow\ Q_2\big(I-Q_1^\dagger Q_1\big)=O, \tag{60}$$
where $\mathcal{N}(Q_i)$ is the null space of $Q_i$ for $i=1,2$. With regard to the equation under consideration,
$$\mathcal{N}\big(\big[E_{v_kv_k}^{1/2}\big]^\dagger\big)\subseteq\mathcal{N}\big(E_{xv_k}\big[E_{v_kv_k}^{1/2}\big]^\dagger\big). \tag{61}$$
The definition of $G_{k\eta_k}$ implies that
$$\mathcal{N}\big(E_{xv_k}\big[E_{v_kv_k}^{1/2}\big]^\dagger\big)\subseteq\mathcal{N}(G_{k\eta_k})\qquad\text{and then}\qquad\mathcal{N}\big(\big[E_{v_kv_k}^{1/2}\big]^\dagger\big)\subseteq\mathcal{N}(G_{k\eta_k}).$$
On the basis of (60), the latter implies $G_{k\eta_k}\big[I-\big(E_{v_kv_k}^{1/2}\big)^\dagger E_{v_kv_k}^{1/2}\big]=O$, i.e. (59) is true. Hence, (58) and (44)–(45) are true as well.

Next, similarly to (35),
$$\big\|E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2-\big\|G_{k\eta_k}-E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2=\sum_{j=1}^{\eta_k}\beta_{kj}^2. \tag{62}$$
Then (46) follows from (55), (58), (43) and (62). □
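The operator (44)–(45) with $A_k=O$ can be computed by one symmetric square root, one pseudo-inverse and one truncated SVD per term. A sketch under the same assumptions as the sketch after Theorem 1 (the function names are ours):

```python
import numpy as np
from numpy.linalg import pinv, svd, eigh

def psd_sqrt(E):
    """E^{1/2} for a symmetric positive semidefinite matrix,
    computed via its eigendecomposition (cf. Section 5.1)."""
    w, Q = eigh(E)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

def theorem2_operator(E_xv, E_vv, eta):
    """F_k^0 of (44)-(45) with A_k = O:
    F_k^0 = G_{k,eta_k} (E_vv^{1/2})^†, where G_{k,eta_k} is the rank-eta
    truncation of the SVD (39)-(42) of E_xv (E_vv^{1/2})^†."""
    S = pinv(psd_sqrt(E_vv))                      # (E_vv^{1/2})^†
    U, s, Vt = svd(E_xv @ S, full_matrices=False)
    G = U[:, :eta] @ np.diag(s[:eta]) @ Vt[:eta, :]
    return G @ S
```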
Remark 4. The known reduced-rank transforms based on the Volterra polynomial structure [16, 27, 29] require the computation of a covariance matrix similar to $E_{vv}$, where $v=[v_1,\dots,v_p]^T$, but for $p=N$, where $N$ is large (see Sections 1 and 2). The relationships (30)–(33) and (51)–(56) illustrate the nature of the proposed method and its difference from the techniques in [16, 27, 29]: due to the structure (3) of the transform $\mathcal{T}_p$, the procedure for finding $f^0,\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ avoids direct computation of $E_{vv}$, which could be troublesome due to large $N$. If the operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are orthonormal, as in Theorem 1, then (29) is true and the covariance matrix $E_{vv}$ is reduced to the identity. If the operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ are orthogonal, as in Theorem 2, then (54) holds and the covariance matrix $E_{vv}$ is reduced to a block-diagonal form with non-zero blocks $E_{v_1v_1},\dots,E_{v_pv_p}$, so that
$$E_{vv}=\begin{bmatrix}E_{v_1v_1} & O & \cdots & O\\ O & E_{v_2v_2} & \cdots & O\\ \vdots & \vdots & \ddots & \vdots\\ O & O & \cdots & E_{v_pv_p}\end{bmatrix}$$
with $O$ denoting the zero block. As a result, the procedure for finding $f^0,\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ is reduced to $p$ separate rank-constrained problems (33) or (56). Unlike the methods in [16, 27, 29], the operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ are determined with much smaller $m\times n$ and $n\times n$ matrices given by the simple formulae (20) and (43)–(45). This implies a reduction in computational work compared with that required by the approach in [27, 29, 34].
Corollary 2. Let $v_1,\dots,v_p$ be determined by Lemma 3. Then the vector $\bar f$ and the operators $\bar{\mathcal{F}}_1,\dots,\bar{\mathcal{F}}_p$ satisfying the unconstrained minimum (6) are determined by
$$\bar f=E[x]-\sum_{k=1}^{p}\bar F_kE[v_k] \tag{63}$$
and
$$\bar{\mathcal{F}}_1=\mathcal{E}_{xv_1}\mathcal{E}_{v_1v_1}^\dagger+\mathcal{A}_1\big[\mathcal{I}-\mathcal{E}_{v_1v_1}\mathcal{E}_{v_1v_1}^\dagger\big], \tag{64}$$
$$\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots$$
$$\bar{\mathcal{F}}_p=\mathcal{E}_{xv_p}\mathcal{E}_{v_pv_p}^\dagger+\mathcal{A}_p\big[\mathcal{I}-\mathcal{E}_{v_pv_p}\mathcal{E}_{v_pv_p}^\dagger\big]. \tag{65}$$
The associated accuracy for the transform $\bar{\mathcal{T}}_p$, defined by
$$\bar{\mathcal{T}}_p(y)=\bar f+\sum_{k=1}^{p}\bar{\mathcal{F}}_k(v_k),$$
is given by
$$E[\|x-\bar{\mathcal{T}}_p(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{k=1}^{p}\big\|E_{xv_k}\big(E_{v_kv_k}^{1/2}\big)^\dagger\big\|^2. \tag{66}$$

Proof. It follows from (55) that the unconstrained minimum (6) is achieved if $f$ is defined by (63) and if $F_k$ satisfies the equation $F_kE_{v_kv_k}^{1/2}-E_{xv_k}(E_{v_kv_k}^{1/2})^\dagger=O$ for each $k=1,\dots,p$. Similarly to (57)–(58), its general solution is given by
$$F_k=\bar F_k=E_{xv_k}E_{v_kv_k}^\dagger+A_k\big[I-E_{v_kv_k}E_{v_kv_k}^\dagger\big]$$
because $E_{v_kv_k}^{1/2}(E_{v_kv_k}^{1/2})^\dagger=E_{v_kv_k}E_{v_kv_k}^\dagger$. We define $\bar{\mathcal{F}}_k$ by $[\bar{\mathcal{F}}_k(w_k)](\omega)=\bar F_k[w_k(\omega)]$ for all $k=1,\dots,p$, and then (64)–(65) are true. The relation (66) follows from (55) and (63)–(65). □
Remark 5. The difference between the transforms given by Theorems 1 and 2 is that $\mathcal{F}_k^0$ given by (20) (Theorem 1) does not contain a factor associated with $(E_{v_kv_k}^{1/2})^\dagger$ for all $k=1,\dots,p$. A similar observation is true for Corollaries 1 and 2.

Remark 6. The transforms given by Theorems 1 and 2 are not unique due to the arbitrary operators $\mathcal{A}_1,\dots,\mathcal{A}_p$. A natural particular choice is $\mathcal{A}_1=\cdots=\mathcal{A}_p=O$.
5.2.3 Compression procedure by $\mathcal{T}_p^0$

Let us consider the transform $\mathcal{T}_p^0$ given by (8), (43)–(45) with $A_k=O$ for $k=1,\dots,p$, where $A_k$ is the matrix given in (58). We write $[\mathcal{T}_p^0(y)](\omega)=T_p^0(y)$ with $T_p^0:\mathbb{R}^n\to\mathbb{R}^m$.

Let
$$B_k^{(1)}=U_{k\eta_k}\Sigma_{k\eta_k}\qquad\text{and}\qquad B_k^{(2)}=V_{k\eta_k}^T\big(E_{v_kv_k}^{1/2}\big)^\dagger,$$
so that $B_k^{(1)}\in\mathbb{R}^{m\times\eta_k}$, $B_k^{(2)}\in\mathbb{R}^{\eta_k\times n}$ and $F_k^0=B_k^{(1)}B_k^{(2)}$. Here, $\eta_1,\dots,\eta_p$ are determined by (7). Then
$$T_p^0(y)=f+\sum_{k=1}^{p}B_k^{(1)}B_k^{(2)}v_k,$$
where $v_k=v_k(\omega)$ and $B_k^{(2)}v_k\in\mathbb{R}^{\eta_k}$ for $k=1,\dots,p$ with $\eta_1+\cdots+\eta_p<m$. Hence, the matrices $B_1^{(2)},\dots,B_p^{(2)}$ perform compression of the data presented by $v_1,\dots,v_p$. The matrices $B_1^{(1)},\dots,B_p^{(1)}$ perform reconstruction of the reference signal from the compressed data.

The compression ratio of the transform $\mathcal{T}_p^0$ is given by
$$r_0=(\eta_1+\cdots+\eta_p)/m. \tag{67}$$
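In code, the compression stage stores only the $\eta_k$ numbers $B_k^{(2)}v_k$ per term, and reconstruction applies the matrices $B_k^{(1)}$. A hedged sketch (the function and argument names are ours):

```python
import numpy as np

def compress_decompress(v_list, B1_list, B2_list, f):
    """Compression and reconstruction of Section 5.2.3: v_list[k] is a
    realization of v_k in R^n, B2_list[k] (eta_k x n) compresses it,
    B1_list[k] (m x eta_k) reconstructs from the compressed data."""
    codes = [B2 @ v for B2, v in zip(B2_list, v_list)]        # compressed data
    x_rec = f + sum(B1 @ c for B1, c in zip(B1_list, codes))  # reconstruction
    ratio = sum(c.size for c in codes) / x_rec.size           # r_0 of (67)
    return x_rec, ratio
```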
5.2.4 A special case of transform Tp
The results above have been derived for any operators ϕ1, . . . ,ϕp in the model Tp. Some
specializations for ϕ1, . . . ,ϕp were given in Section 3.2. Here and in Section 5.2.5, we consider
alternative forms for ϕ1, . . . ,ϕp.
(i) The operators $\varphi_1,\dots,\varphi_p$ can be determined by the recursive procedure given below (a sketch of this procedure follows item (ii)). The motivation follows from the observation that the performance of the transform $\mathcal{T}_p$ is improved if $y$ in (5) is replaced by an estimate of $x$.

First, we set $\varphi_1(y)=y$ and determine an estimate $x^{(1)}$ of $x$ from the solution of problem (6) (with no constraints (7)) by Corollary 1 or 2 with $p=1$. Next, we put $\varphi_1(y)=y$ and $\varphi_2(y)=x^{(1)}$, and find an estimate $x^{(2)}$ from the solution of the unconstrained problem (6) with $p=2$. In general, for $j=1,\dots,p$, we define $\varphi_j(y)=x^{(j-1)}$, where $x^{(j-1)}$ has been determined similarly to $x^{(2)}$ from the previous steps. In particular, $x^{(0)}=y$.
(ii) The operators $\varphi_1,\dots,\varphi_p$ can also be chosen as elementary functions. An example is given in item (i) of Section 3.2, where $\varphi_k(y)$ was constructed from the power functions. An alternative possibility is to choose trigonometric functions for constructing $\varphi_k(y)$. For instance, one can put
$$[\varphi_1(y)](\omega)=y\qquad\text{and}\qquad[\varphi_{k+1}(y)](\omega)=[\cos(ky_1),\dots,\cos(ky_n)]^T \tag{68}$$
with $y=[y_1,\dots,y_n]^T$ and $k=1,\dots,p-1$. In this paper, we do not analyse such a possible choice for $\varphi_1,\dots,\varphi_p$.
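The recursive choice of $\varphi_1,\dots,\varphi_p$ in item (i) can be organized as follows. Here `solve_unconstrained` stands for a user-supplied routine implementing Corollary 1 or 2 on the listed feature vectors; it is an assumption of this sketch, not an object defined in the paper.

```python
def recursive_phi_estimates(y, solve_unconstrained, p):
    """Section 5.2.4(i): x^{(0)} = y; at step j the features are
    phi_1(y) = y, phi_2(y) = x^{(1)}, ..., phi_j(y) = x^{(j-1)}, and
    x^{(j)} is the unconstrained estimate (6) built on them."""
    features = [y]               # phi_1(y) = y, i.e. x^{(0)} = y
    x_est = y
    for j in range(1, p + 1):
        x_est = solve_unconstrained(features)  # estimate x^{(j)}
        features.append(x_est)                 # becomes phi_{j+1}(y)
    return x_est
```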
5.2.5 Other particular cases of transform $\mathcal{T}_p$ and comparison with known transforms
(i) Optimal non-linear filtering. The transforms $\hat{\mathcal{T}}_p$ (36)–(37) and $\bar{\mathcal{T}}_p$ (63)–(65), which are particular cases of the transforms given in Theorems 1 and 2, represent optimal filters that perform pure filtering with no signal compression. Therefore they are important in their own right.
(ii) The Fourier series as a particular case of the transform $\bar{\mathcal{T}}_p$. For the case of the minimization problem (6) with no constraint (7), $\mathcal{F}_1,\dots,\mathcal{F}_p$ are determined by the expressions (36) and (63)–(65), which are similar to those for the Fourier coefficients [35]. The structure of the model $\mathcal{T}_p$ presented by (3) is different, of course, from that of the Fourier series and the Fourier polynomial (i.e. a truncated Fourier series) in Hilbert space [35]. The differences are that $\mathcal{T}_p$ transforms $y$ (not $x$, as the Fourier polynomial does) and that $\mathcal{T}_p$ consists of a combination of three operators $\varphi_k$, $\mathcal{Q}_k$ and $\mathcal{F}_k$, where $\mathcal{F}_k:L^2(\Omega,\tilde{H}_k)\to L^2(\Omega,H_X)$ is an operator, not a scalar as in the Fourier series [35]. The solutions (36) and (63)–(65) of the unconstrained problem (6) are given in terms of the observed vector $y$, not in terms of the basis of $x$ as in the Fourier series/polynomial. The special features of $\mathcal{T}_p$ require the special computation methods described in Section 5.
Here, we show that the Fourier series is a particular case of the transform $\mathcal{T}_p$.

Let $x\in L^2(\Omega,H)$ with $H$ a Hilbert space, and let $\{v_1,v_2,\dots\}$ be an orthonormal basis in $L^2(\Omega,H)$. For any $g,h\in L^2(\Omega,H)$, we define the scalar product $\langle\cdot,\cdot\rangle$ and the norm $\|\cdot\|_E$ in $L^2(\Omega,H)$ by
$$\langle g,h\rangle=\int_\Omega g(\omega)h(\omega)\,d\mu(\omega)\qquad\text{and}\qquad\|g\|_E=\langle g,g\rangle^{1/2}, \tag{69}$$
respectively. In particular, if $H=\mathbb{R}^m$, then
$$\|g\|_E^2=\int_\Omega[g(\omega)]^Tg(\omega)\,d\mu(\omega)=\int_\Omega\|g(\omega)\|^2\,d\mu(\omega)=E[\|g\|^2], \tag{70}$$
i.e. $E[\|g\|^2]$ is defined similarly to that in (4).

Let us consider the special case of the transform $\mathcal{T}_p$ presented in item (iii) of Section 3.2, and let us also consider the unconstrained problem (6) formulated in terms of such a $\mathcal{T}_p$, where we now assume that $x$ has zero mean, $f=O$, $p=\infty$, $\{v_1,v_2,\dots\}$ is an orthonormal basis in $L^2(\Omega,H)$ and $\mathcal{F}_k$ is a scalar, not an operator as before. We denote $\alpha_k=\mathcal{F}_k$ with $\alpha_k\in\mathbb{R}$. Then, similarly to (36) in Corollary 1, the solution to the unconstrained problem (6) is defined by $\hat\alpha_k$ such that
$$\hat\alpha_k=E_{xv_k}\qquad\text{with}\qquad k=1,2,\dots.$$
Here, $E_{xv_k}=E[xv_k]-E[x]E[v_k]=E[xv_k]=\langle x,v_k\rangle$ since $E[x]=0$ by the assumption. Hence, $\hat\alpha_k=E_{xv_k}$ is the Fourier coefficient, and the considered particular case of $\mathcal{T}_p(y)$ with $\mathcal{F}_k$ determined by $\hat\alpha_k$ is given by
$$\mathcal{T}_p(y)=\sum_{k=1}^{\infty}\langle x,v_k\rangle v_k. \tag{71}$$
Thus, the Fourier series (71) in Hilbert space follows from (3), (6) and (36) when $\mathcal{T}_p$ has the form given in item (iii) of Section 3.2 with $x$, $f$, $p$, $\{v_1,v_2,\dots\}$ and $\mathcal{F}_k$ as above.
(iii) The Wiener filter as a particular case of the transform $\bar{\mathcal{T}}_p$ (63)–(65). In the following Corollaries 3 and 4, we show that the filter $\bar{\mathcal{T}}_p$ guarantees better accuracy than that of the Wiener filter.

Corollary 3. Let $p=1$, $E[x]=0$, $E[y]=0$, $\varphi_1=\mathcal{I}$, $\mathcal{Q}_1=\mathcal{I}$ and $A_1=O$ or $A_1=E_{xy}E_{yy}^\dagger$. Then $\bar{\mathcal{T}}_p$ is reduced to the filter $\check{\mathcal{T}}$ such that
$$[\check{\mathcal{T}}(y)](\omega)=\check{T}[y(\omega)]$$
with
$$\check{T}=E_{xy}E_{yy}^\dagger. \tag{72}$$

Remark 7. The unconstrained linear filter given by (72) has been proposed in [7]. The filter (72) is treated as a generalisation of the Wiener filter.
Let $\tilde x,\tilde v_1,\dots,\tilde v_p$ be the zero-mean vectors. The transform $\bar{\mathcal{T}}_p$, applied to $\tilde x,\tilde v_1,\dots,\tilde v_p$, is denoted by $\bar{\mathcal{T}}_{W,p}$.

Corollary 4. The error $E[\|\tilde x-\bar{\mathcal{T}}_{W,p}(\tilde y)\|^2]$ associated with the transform $\bar{\mathcal{T}}_{W,p}$ is smaller than the error $E[\|\tilde x-\check{\mathcal{T}}(\tilde y)\|^2]$ associated with the Wiener filter [7] by $\sum_{k=2}^{p}\|E_{\tilde x\tilde v_k}(E_{\tilde v_k\tilde v_k}^{1/2})^\dagger\|^2$, i.e.
$$E[\|\tilde x-\bar{\mathcal{T}}_{W,p}(\tilde y)\|^2]=E[\|\tilde x-\check{\mathcal{T}}(\tilde y)\|^2]-\sum_{k=2}^{p}\big\|E_{\tilde x\tilde v_k}\big(E_{\tilde v_k\tilde v_k}^{1/2}\big)^\dagger\big\|^2. \tag{73}$$

Proof. It is easy to show that
$$E[\|\tilde x-\check{\mathcal{T}}(\tilde y)\|^2]=\|E_{\tilde x\tilde x}^{1/2}\|^2-\big\|E_{\tilde x\tilde v_1}\big(E_{\tilde v_1\tilde v_1}^{1/2}\big)^\dagger\big\|^2, \tag{74}$$
and then (73) follows from (66) and (74). □
(iv) The KLT as a particular case of the transform $\mathcal{T}_p^0$ (43)–(46). The KLT [7] follows from (43)–(46) as a particular case if $f=O$, $p=1$, $\varphi_1=\mathcal{I}$, $\mathcal{Q}_1=\mathcal{I}$ and $A_1=O$.

To compare the transform $\mathcal{T}_p^0$ with the KLT [7], we apply $\mathcal{T}_p^0$, represented by (43)–(46), to the zero-mean vectors $\tilde x,\tilde v_1,\dots,\tilde v_p$ as above. We write $\mathcal{T}_p^*$ for such a version of $\mathcal{T}_p^0$, and $\mathcal{T}_{KLT}$ for the KLT [7].
Corollary 5. The error $E[\|\tilde x-\mathcal{T}_p^*(\tilde y)\|^2]$ associated with the transform $\mathcal{T}_p^*$ is smaller than the error $E[\|\tilde x-\mathcal{T}_{KLT}(\tilde y)\|^2]$ associated with the KLT [7] by $\sum_{k=2}^{p}\sum_{j=1}^{\eta_k}\beta_{kj}^2$, i.e.
$$E[\|\tilde x-\mathcal{T}_p^*(\tilde y)\|^2]=E[\|\tilde x-\mathcal{T}_{KLT}(\tilde y)\|^2]-\sum_{k=2}^{p}\sum_{j=1}^{\eta_k}\beta_{kj}^2. \tag{75}$$

Proof. The error associated with $\mathcal{T}_{KLT}$ [7] is represented by (46) for $p=1$:
$$E[\|\tilde x-\mathcal{T}_{KLT}(\tilde y)\|^2]=\|E_{\tilde x\tilde x}^{1/2}\|^2-\sum_{j=1}^{\eta_1}\beta_{1j}^2. \tag{76}$$
Then (75) follows from (46) and (76). □
(v) The transform [26] as a particular case of the transform $\mathcal{T}_p^0$. The transform [26] follows from (3) as a particular case if $f=O$, $p=2$, $\varphi_1(y)=y$, $\varphi_2(y)=y^2$ and $\mathcal{Q}_1=\mathcal{Q}_2=\mathcal{I}$, where $y^2$ is defined by $y^2(\omega)=[y_1^2,\dots,y_n^2]^T$. We note that the transform [26] has been generalized in [27].
(vi) The transforms [27] as particular cases of the transform $\mathcal{T}_p$. The transform [27] follows from (3) if $\mathcal{Q}_k=\mathcal{I}$, $\varphi_k(y)=y^k$, where $y^k=(y,\dots,y)\in L^2(\Omega,\mathbb{R}^{n^k})$, $\mathbb{R}^{n^k}$ is the $k$th degree of $\mathbb{R}^n$, and if $\mathcal{F}_k$ is a $k$-linear operator.

To compare the transform $\mathcal{T}_p^0$ and the transform $\mathcal{T}_{[27]}$ [27] of rank $r$, we write $z_j=y_jy$, $z=[z_1,\dots,z_n]^T$, $s=[1\ y^T\ z^T]^T$ and denote by $\alpha_1,\dots,\alpha_r$ the non-zero singular values associated with the truncated SVD for the matrix $E_{xs}(E_{ss}^{1/2})^\dagger$. Such an SVD is constructed similarly to that in (39)–(41).

Corollary 6. Let $\Delta_p=\sum_{k=1}^{p}\sum_{j=1}^{\eta_k}\beta_{kj}^2-\sum_{j=1}^{r}\alpha_j^2$ and let $\Delta_p\geq 0$. The error $E[\|x-\mathcal{T}_p^0(y)\|^2]$ associated with the transform $\mathcal{T}_p^0$ is less than the error $E[\|x-\mathcal{T}_{[27]}(y)\|^2]$ associated with the transform $\mathcal{T}_{[27]}$ by $\Delta_p$, i.e.
$$E[\|x-\mathcal{T}_p^0(y)\|^2]=E[\|x-\mathcal{T}_{[27]}(y)\|^2]-\Delta_p. \tag{77}$$

Proof. It follows from [27] that
$$E[\|x-\mathcal{T}_{[27]}(y)\|^2]=\|E_{xx}^{1/2}\|^2-\sum_{j=1}^{r}\alpha_j^2. \tag{78}$$
Then (77) follows from (46) and (78). □
We note that, in general, a theoretical verification of the condition $\Delta_p\geq 0$ is not straightforward. At the same time, for any particular $x$ and $y$, $\Delta_p$ can be estimated numerically.

Although the transform $\mathcal{T}_p^0$ includes the transform $\mathcal{T}_{[27]}$, the accuracy of $\mathcal{T}_{[27]}$ is, in general, better than that of $\mathcal{T}_p^0$ for the same degrees of $\mathcal{T}_p^0$ and $\mathcal{T}_{[27]}$. This is because $\mathcal{T}_{[27]}$ implies more terms. For instance, $\mathcal{T}_{[27]}$ of degree two consists of $n+1$ terms, while $\mathcal{T}_p^0$, for $p=2$, consists of three terms only. If, for a given $p$, the condition $\Delta_p\geq 0$ is not fulfilled, then the accuracy $E[\|x-\mathcal{T}_p^0(y)\|^2]$ can be improved by increasing $p$ or by applying the iterative method presented in [28].
(vii) Unlike the techniques presented in [13, 14], our method implements simultaneous filtering and compression, and provides this data processing in a probabilistic setting. The idea of implicitly mapping the data into a high-dimensional feature space [8, 12, 15] could be extended to the transform presented in this paper. We intend to develop such an extension in the future.
6 Numerical realization
6.1. Orthogonalization. Numerical realization of transforms of random vectors implies a representation of the observed data and of the estimates of covariance matrices in the form of associated samples.

For the random vector $u_k$, we have $q$ realizations, which are concatenated into an $n\times q$ matrix $U_k$. A column of $U_k$ is a realization of $u_k$. Thus, a sequence of vectors $u_1,\dots,u_p$ is represented by a sequence of matrices $U_1,\dots,U_p$. Therefore the transformation of $u_1,\dots,u_p$ to orthonormal or orthogonal vectors $v_1,\dots,v_p$ (by Lemmata 1 and 3) is reduced to a procedure on the matrices $U_1,\dots,U_p$ and $V_1,\dots,V_p$. Here, $V_k\in\mathbb{R}^{n\times q}$ is a matrix formed from realizations of the random vector $v_k$ for each $k=1,\dots,p$.
Alternatively, the matrices $V_1,\dots,V_p$ can be determined from known procedures for matrix orthogonalization [43]. In particular, the QR decomposition [43] can be exploited in the following way. Let us form a matrix $U=[U_1^T\ \dots\ U_p^T]^T\in\mathbb{R}^{np\times q}$, where $p$ and $q$ are chosen such that $np=q$, i.e. $U$ is square. (The matrix $U$ can also be presented as $U=[U_1\ \dots\ U_p]$ with $p$ and $q$ such that $n=pq$.) Let
$$U=VR$$
be the QR decomposition of $U$ with $V\in\mathbb{R}^{np\times q}$ orthogonal and $R\in\mathbb{R}^{np\times q}$ upper triangular. Next, we write $V=[V_1^T\ \dots\ V_p^T]^T\in\mathbb{R}^{np\times q}$, where $V_k\in\mathbb{R}^{n\times q}$ for $k=1,\dots,p$. The submatrices $V_1,\dots,V_p$ of $V$ are orthogonal, i.e.
$$V_iV_j^T=\begin{cases}O, & i\neq j,\\ I, & i=j,\end{cases}$$
for $i,j=1,\dots,p$, as required.

Other known procedures for matrix orthogonalization can be applied to $U_1,\dots,U_p$ in a similar fashion.
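A minimal numpy sketch of this QR-based block orthogonalization, assuming the square case $np=q$ described above (the function name is ours):

```python
import numpy as np

def qr_orthogonalize(U_blocks):
    """Section 6.1: stack U = [U_1^T ... U_p^T]^T (each U_k is n x q,
    with n*p = q so that U is square), take the QR decomposition
    U = V R, and split V back into blocks V_1, ..., V_p satisfying
    V_i V_j^T = O (i != j) and V_i V_i^T = I."""
    n = U_blocks[0].shape[0]
    U = np.vstack(U_blocks)          # (n*p) x q, assumed square
    V, R = np.linalg.qr(U)           # V orthogonal, R upper triangular
    return [V[i * n:(i + 1) * n, :] for i in range(len(U_blocks))]
```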
Remark 8. For the cases when $v_1,\dots,v_p$ are orthonormal, or orthogonal but not orthonormal, the associated accuracies (21), (38), (46) and (66) differ by the factors depending on $(E_{v_kv_k}^{1/2})^\dagger$. In the case of orthonormal $v_1,\dots,v_p$, $(E_{v_kv_k}^{1/2})^\dagger=I$, and this circumstance can lead to an increase in the accuracy.
6.2. Covariance matrices. The expectations and covariance matrices in Lemmata 1–3
and Theorems 1–2 can be estimated, for example, by the techniques developed in [36, 37, 38,
39, 40, 41]. We note that such estimation procedures represent specific problems which are not
considered here.
6.3. $\mathcal{T}_p^0$, $\hat{\mathcal{T}}_p$ and $\bar{\mathcal{T}}_p$ for zero-mean vectors. The computational work for $\mathcal{T}_p^0$ (Theorems 1 and 2), $\hat{\mathcal{T}}_p$ and $\bar{\mathcal{T}}_p$ (Corollaries 1 and 2) can be reduced if $\mathcal{T}_p^0$, $\hat{\mathcal{T}}_p$ and $\bar{\mathcal{T}}_p$ are applied to the zero-mean vectors $\tilde x,\tilde v_1,\dots,\tilde v_p$ given by $\tilde x=x-E[x]$, $\tilde v_1=v_1-E[v_1],\ \dots,\ \tilde v_p=v_p-E[v_p]$. Then $f^0=O$ and $\bar f=O$. The estimates of the original $x$ are then given by
$$\check x=E[x]+\sum_{k=1}^{p}\mathcal{F}_k^0(\tilde v_k),\qquad\hat x=E[x]+\sum_{k=1}^{p}\hat{\mathcal{F}}_k(\tilde v_k)\qquad\text{and}\qquad\bar x=E[x]+\sum_{k=1}^{p}\bar{\mathcal{F}}_k(\tilde v_k),$$
respectively. Here, $\mathcal{F}_k^0$, $\hat{\mathcal{F}}_k$ and $\bar{\mathcal{F}}_k$ are defined similarly to (20), (36), (44), (45), (64) and (65).
7 Discussion
Some distinctive features of the proposed techniques are summarized as follows.
Remark 9. It follows from Theorems 1 and 2, and Corollaries 1 and 2, that the accuracy associated with the proposed transform improves when $p$ increases.
Remark 10. Unlike the approaches based on Volterra polynomials [27, 29, 34], our method does not require the computation of pseudo-inverses of large $N\times N$ matrices with $N=n+n^2+\cdots+n^{p-1}$. Instead, the proposed transforms use pseudo-inverses of the $n\times n$ matrices $E_{v_kv_k}$ (see Theorems 1 and 2). This leads to a substantial reduction in computational work.
Remark 11. The idea of the recurrent transform [28] can be extended to the proposed transform in a way similar to that considered in [28]. The authors intend to develop a theory for such an extension in the near future.
8 Conclusions
The new results obtained in the paper are summarized as follows.
We have proposed a new approach to constructing optimal nonlinear transforms for random vectors. The approach is based on a representation of the transform as a sum of $p$ reduced-rank transforms. Each particular transform is formed by the linear reduced-rank operator $\mathcal{F}_k$ and by the operators $\varphi_k$ and $\mathcal{Q}_k$ with $k=1,\dots,p$. Such a device allows us to improve the numerical characteristics (accuracy, compression ratio and computational work) of the known transforms based on the Volterra polynomial structure [27, 29, 34]. These objectives are achieved due to the special "intermediate" operators $\varphi_1,\dots,\varphi_p$ and $\mathcal{Q}_1,\dots,\mathcal{Q}_p$. In particular, we have proposed two types of orthogonalizing operators $\mathcal{Q}_1,\dots,\mathcal{Q}_p$ (Lemmata 1 and 3) and a specific method for determining $\varphi_1,\dots,\varphi_p$ (Section 5.2.4). Such operators reduce the determination of the optimal linear reduced-rank operators $\mathcal{F}_1^0,\dots,\mathcal{F}_p^0$ to the computation of a sequence of relatively small matrices (Theorems 1 and 2).
Particular cases of the proposed transform, which follow from the solution of the uncon-
strained minimization problem (6), have been presented in Corollaries 1 and 2. Such transforms
are treated as new optimal nonlinear filters and, therefore, are important in their own right.
The explicit representations of the accuracy associated with the proposed transforms have
been rigorously justified in Theorems 1 and 2, and Corollaries 1 and 2.
It has been shown that the proposed approach generalizes the Fourier series in Hilbert space
(Section 5.2.5), the Wiener filter, the Karhunen–Loève transform (KLT) and the known optimal
transforms [26, 27, 29]. See Corollaries 3, 4 and 5, and Section 5.2.5 in this regard. In particular,
it has been shown that the accuracies associated with the proposed transforms are better than
those of the Wiener filter (Corollary 4) and the KLT (Corollary 5).
A Appendix
Proof of Lemma 1. Let us write
$$w_1=u_1\qquad\text{and}\qquad w_i=u_i-\sum_{k=1}^{i-1}\mathcal{U}_{ik}(w_k)\quad\text{for}\quad i=1,\dots,p,$$
with $\mathcal{U}_{ik}:L^2(\Omega,\mathbb{R}^n)\to L^2(\Omega,\mathbb{R}^n)$ chosen so that, for $k=1,\dots,i-1$,
$$E_{w_iw_k}=O\quad\text{if}\quad i\neq k. \tag{79}$$
We require (79) to be true for any such $k$, i.e.
$$E_{w_iw_k}=E[w_iw_k^T]-E[w_i]E[w_k^T]$$
$$=E\Bigg[\Bigg(u_i-\sum_{l=1}^{i-1}\mathcal{U}_{il}(w_l)\Bigg)w_k^T\Bigg]-E\Bigg[u_i-\sum_{l=1}^{i-1}\mathcal{U}_{il}(w_l)\Bigg]E[w_k^T]$$
$$=E_{u_iw_k}-U_{ik}E_{w_kw_k}-E[u_i]E[w_k^T]+E[w_k]E[w_k^T]$$
$$=E_{u_iw_k}-U_{ik}E_{w_kw_k}=O.$$
Thus, $U_{ik}=E_{u_iw_k}E_{w_kw_k}^{-1}$, and statement (i) is true.
It is clear that the vectors $v_1,\dots,v_p$, defined by (12), are orthogonal. For $\mathcal{Q}_k$ defined by (13), we have $Q_k=(E_{w_kw_k}^{1/2})^{-1}$ and
$$E_{v_kv_k}=E\big[\big(E_{w_kw_k}^{1/2}\big)^{-1}w_kw_k^T\big(E_{w_kw_k}^{1/2}\big)^{-1}\big]-E\big[\big(E_{w_kw_k}^{1/2}\big)^{-1}w_k\big]E\big[w_k^T\big(E_{w_kw_k}^{1/2}\big)^{-1}\big]$$
$$=\big(E_{w_kw_k}^{1/2}\big)^{-1}E_{w_kw_k}\big(E_{w_kw_k}^{1/2}\big)^{-1}=I.$$
Hence, $v_1,\dots,v_p$, defined by (12), are orthonormal. □
Proof of Lemma 3. We require that $E_{v_iv_k}=O$ for $i\neq k$. If $\mathcal{Z}_{ik}$ has been chosen so that this condition is true for all $k=1,\dots,i-1$, then we have
$$E\Bigg[\Bigg(u_i-\sum_{l=1}^{i-1}\mathcal{Z}_{il}(v_l)\Bigg)v_k^T\Bigg]=E_{u_iv_k}-\sum_{l=1}^{i-1}Z_{il}E_{v_lv_k}=E_{u_iv_k}-Z_{ik}E_{v_kv_k}=O. \tag{80}$$
Thus,
$$Z_{ik}E_{v_kv_k}=E_{u_iv_k}. \tag{81}$$
The necessary and sufficient condition [45] for the solvability of the matrix equation (81) is given by
$$E_{u_iv_k}E_{v_kv_k}^\dagger E_{v_kv_k}=E_{u_iv_k}. \tag{82}$$
By Lemma 2, (82) is true. Then, on the basis of [45], the general solution to (81) is given by (16). □
Acknowledgements
The first co-author is grateful to Oliver Cappé for useful discussions related to the structure of
the proposed transform.
References

[1] Hotelling H., Analysis of a complex of statistical variables into Principal Components, J. Educ. Psychol., 1933, V.24, 417–441, 498–520.
[2] Karhunen K., Über Lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann. Acad. Sci. Fennicae, Ser. A,
1947, V.137.
[3] Loève M., Fonctions aléatoires du second ordre, in P. Lévy, Processus Stochastiques et Mouvement Brownien, Paris, Hermann, 1948.
[4] Jolliffe I.T., Principal component analysis, New York, Springer Verlag, 1986 (2 ed., 2002).
[5] Scharf L.L., The SVD and reduced rank signal processing, Signal Processing, 1991, V.25, 113–133.
[6] Yamashita Y., Ogawa H., Relative Karhunen–Loève transform, IEEE Trans. on Signal Processing, 1996, V.44, 371–378.
[7] Hua Y., Liu W.Q., Generalized Karhunen–Loève transform, IEEE Signal Processing Letters, 1998, V.5,
141–143.
[8] Vapnik V., Statistical Learning Theory, Wiley, 1998.
[9] Ocaña F.A., Aguilera A.M., Valderrama M.J., Functional principal components analysis by choice of norm, J. Multivariate Anal., 1999, V.71, 262–276.
[10] Tipping M.E., Bishop C.M., Probabilistic principal component analysis, J. of the Royal Statistical Society,
Ser. A, 1999, V.61, 611–619.
[11] Tipping M.E., Bishop C.M., Mixtures of probabilistic principal component analysers, Neural Computation,
1999, V.11, 443–482.
[12] Schölkopf B., Smola A.J., Müller K.-R., Kernel principal component analysis, in Advances in Kernel Meth-
ods. Support Vector Learning, Editors B. Schölkopf, C.J.C. Burges and A.J. Smola, Cambridge, MIT Press,
1999, 327–352.
[13] Tenenbaum J.B., de Silva V., Langford J.C., A global geometric framework for nonlinear dimensionality
reduction, Science, 2000, V.290, Issue 5500, 2319–2323.
[14] Roweis S.T., Saul L.K., Nonlinear dimensionality reduction by locally linear embedding, Science, 2000, V.290, Issue 5500, 2323–2326.
[15] Cristianini N., Shawe-Taylor J., An introduction to support vector machines and other kernel-based learning
methods, Cambridge, Cambridge University Press, 2000.
[16] Yamada I., Sekiguchi T., Sakaniwa K., Reduced rank Volterra filter for robust identification of nonlinear
systems, in Proc. 2nd Int. Workshop on Multidimensional (ND) Systems – NDS2000, Poland, Czocha Castle,
2000, 171–175.
[17] Hua Y., Nikpour M., Stoica P., Optimal reduced-rank estimation and filtering, IEEE Trans. on Signal
Processing, 2001, V.49, 457–469.
[18] Kneip A., Utikal K.J., Inference for density families using functional principal component analysis, Journal
of the American Statistical Association, 2001, V.96, 519–542.
[19] Honig M.L., Xiao W., Performance of reduced-rank linear interference suppression, IEEE Trans. on Information Theory, 2001, V.47, 1928–1946.
[20] Chen W., Mitra U., Schniter P., On the equivalence of three reduced rank linear estimators with applications to DS-CDMA, IEEE Trans. on Information Theory, 2002, V.48, 2609–2614.
[21] Honig M.L., Goldstein J.S., Adaptive reduced-rank interference suppression based on multistage Wiener
filter, IEEE Trans. on Communications, 2002, V.50, 986–994.
[22] Stock J.H., Watson M.W., Forecasting using principal components from a large number of predictors, Journal
of the American Statistical Association, 2002, V.97, 1167–1179.
[23] Fukunaga K., Introduction to statistical pattern recognition, Boston, Academic Press, 1990.
[24] Kraut S., Anderson R.H., Krolik J.L., A generalized Karhunen–Loève basis for efficient estimation of tro-
pospheric refractivity using radar clutter, IEEE Trans. on Signal Processing, 2004, V.52, 48–60.
[25] Torokhti A., Howlett P., An optimal filter of the second order, IEEE Trans. on Signal Processing, 2001,
V.49, 1044–1048.
[26] Torokhti A., Howlett P., Optimal fixed rank transform of the second degree, IEEE Trans. on Circuits and
Systems. Part II, Analog & Digital Signal Processing, 2001, V.48, 309–315.
[27] Torokhti A., Howlett P., Pearce C., New perspectives on optimal transforms of random vectors, Optimization:
Theory and Applications, to appear.
[28] Torokhti A., Howlett P., Constructing fixed rank optimal estimators with method of recurrent best approx-
imations, J. Multivariate Analysis, 2002, V.86, 293–309.
[29] Torokhti A., Howlett P., Best operator approximation in modelling of nonlinear Systems, IEEE Trans. on
Circuits and Systems. Part I, Fundamental Theory and Applications, 2002, V.49, 1792–1798.
[30] Torokhti A., Howlett P., Method of recurrent best estimators of second degree for optimal filtering of random
signals, Signal Processing, 2003, V.83, 1013–1024.
[31] Torokhti A., Howlett P., Best causal mathematical models for a nonlinear system, IEEE Trans. on Circuits
and Systems. Part I, Fundamental Theory and Applications, to appear.
[32] Sontag E.D., Polynomial response maps, Lecture Notes in Control and Information Sciences, 1979. Vol. 13.
[33] Chen S., Billings S.A., Representation of non-linear systems: NARMAX model, Int. J. Control, 1989, V.49,
1013–1032.
[34] Howlett P.G., Torokhti A.P., Pearce C.E.M., A philosophy for the modelling of realistic non-linear systems,
Proc. of Amer. Math. Soc., 2003, V.132, 353–363.
[35] Cotlar M., Cignoli R., An introduction to functional analysis, Amsterdam – London, North-Holland Pub-
lishing Company, 1974, 114–116.
[36] Perlovsky L.I., Marzetta T.L., Estimating a covariance matrix from incomplete realizations of a random vector, IEEE Trans. on Signal Processing, 1992, V.40, 2097–2100.
[37] Kauermann G., Carroll R.J., A note on the efficiency of Sandwich covariance matrix estimation, Journal of
the American Statistical Association, 2001, V.96, 1387–1396.
[38] Schneider M.K., Willsky A.S., A Krylov subspace method for covariance approximation and simulation of
a random process and fields, Int. J. Multidim. Syst. & Signal Processing, 2003, V.14, 295–318.
[39] Kubokawa T., Srivastava M.S., Estimating the covariance matrix: a new approach, J. Multivariate Analysis,
2003, V.86, 28–47.
[40] Ledoit O., Wolf M., A well-conditioned estimator for large-dimensional covariance matrices, J. Multivariate
Analysis, 2004, V.88, 365–411.
[41] Leung P.L., Ng F.Y., Improved estimation of a covariance matrix in an elliptically contoured matrix distri-
bution, J. Multivariate Analysis, 2004, V.88, 131–137.
[42] Higham N.J., Stable iterations for the matrix square root, Numerical Algorithms, 1997, V.15, 227–241.
[43] Golub G.H., van Loan C.F., Matrix computations, Baltimore, Johns Hopkins University Press, 1996.
[44] Kowalski M.A., Sikorski K.A., Stenger F., Selected topics in approximation and computations, New York –
Oxford, Oxford University Press, 1995.
[45] Ben-Israel A., Greville T.N.E., Generalized inverses: theory and applications, New York, John Wiley &
Sons, 1974.
[46] Mathews V.J., Sicuranza G.L., Polynomial signal processing, J. Wiley & Sons, 2001.
[47] Goldstein J.S., Reed I., Scharf L.L., A multistage representation of the Wiener filter based on orthogonal
projections, IEEE Trans. on Information Theory, 1998, V.44, 2943–2959.