LABORATOIRE INFORMATIQUE, SIGNAUX ET SYSTÈMES DE SOPHIA ANTIPOLIS (UMR 6070)

TENSOR DIAGONALIZATION BY ORTHOGONAL TRANSFORMS

Pierre Comon, Mikael Sorensen
Projet ASTRE
Research report ISRN I3S/RR–2007-06–FR
February 2007

Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis - UNSA-CNRS
2000, rte des Lucioles – Les Algorithmes – Bât. Euclide B – B.P. 121 – 06903 Sophia-Antipolis Cedex – France
Tel.: 33 (0)4 92 94 27 01 – Fax: 33 (0)4 92 94 28 98 – www.i3s.unice.fr – UMR 6070




Tensor Diagonalization by Orthogonal Transforms
Report I3S-RR-2007-06

Pierre Comon, Mikael Sorensen
www.i3s.unice.fr/~pcomon/Astre/equipe.htm

February 28, 2007

Abstract

Tensor techniques are increasingly used in Signal Processing and Factor Analysis. In particular, it is often of interest to transform a tensor into another that is as diagonal as possible. We propose in this paper an algebraic algorithm that yields 3 orthogonal matrices, each acting on one of the three modes of a third order tensor, so that its trace is maximized. Another algorithm is proposed, which maximizes the sum of squares of diagonal entries. Such algorithms have already been proposed in the case of symmetric tensors, and used in the frame of cumulant-based Independent Component Analysis. Our contribution extends existing algorithms to the non-symmetric case. It is proved that the solution can be obtained within a finite number of low-dimensional eigenvalue decompositions, and that no exhaustive search is necessary.

Keywords: tensor, canonical decomposition, Parafac, tensor rank, congruent diagonalization.

1 Introduction

Tensors have been used in Signal Processing for more than a decade, first more or less implicitly through High-Order Statistics [18] [11], in particular for Blind Techniques. Second, orthogonal Tensor Diagonalization has been required in Independent Component Analysis [3]; such tensors were built of cumulants and were symmetric. More recently, a deterministic Blind Identification technique has been proposed which decomposes the data tensor [19]; the decomposition was run iteratively with the help of an Alternating Least Squares algorithm.


Tensors have thus become de facto useful tools in various application areas including signal processing and data analysis, even if a reliable theoretical framework and associated numerical algorithms are still lacking.

After general statements related to tensors, including notation and terminology, we focus our discussion on the reduction to diagonal arrays by orthogonal change of bases. In particular, our contribution concerns non-symmetric tensors, defined on the tensor product of three or more different Euclidean spaces.

Outer product Let A_{i..j} and B_{k..ℓ} be two arrays of any dimensions. The outer product of these two arrays is the array C whose entries are defined as:

C_{i..j k..ℓ} = A_{i..j} B_{k..ℓ}

If arrays A and B have r_A and r_B indices, respectively, then C has r_A + r_B indices. One denotes this outer product as:

C = A ∘ B    (1)
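For concreteness, this operation can be sketched in NumPy (an illustration added here, not part of the original report):

```python
import numpy as np

# Outer product, as in eq. (1): C has r_A + r_B indices.
A = np.arange(6.0).reshape(2, 3)   # r_A = 2 indices
B = np.arange(4.0)                 # r_B = 1 index
C = np.multiply.outer(A, B)        # C[i, j, k] = A[i, j] * B[k]
```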

Now let T be an L-way array of dimensions N_ℓ, 1 ≤ ℓ ≤ L. This array always admits a decomposition into a sum of outer products:

T = ∑_{r=1}^{R} u_r^{(1)} ∘ u_r^{(2)} ∘ … ∘ u_r^{(L)}    (2)

where u_r^{(ℓ)} is an N_ℓ × 1 array, ∀r. This writing is not unique, especially if nothing is imposed to limit the value of the integer R.

Tensors If T takes its values in a field K, which can be the real or the complex field, the arrays u_r^{(ℓ)} may be considered as vectors of the linear space K^{N_ℓ}. Thus, as a combination of tensor products of vectors, T may be considered as a tensor. Under a linear change of coordinate system in each space K^{N_ℓ}, defined by a matrix A^{(ℓ)}, the tensor is represented by another array, obtained by the multi-linear transform (A^{(1)}, A^{(2)}, …, A^{(L)}). Since this is legitimate once a basis has been defined in the space, no distinction will be made in the remainder between the tensor and its array representation.

Similarly, as soon as a canonical basis is fixed in the linear space K^{N_1 N_2 .. N_L}, no distinction needs to be made between the tensor product of the vectors u_r^{(ℓ)} appearing in (2) and the array obtained by taking the Kronecker product of the u_r^{(ℓ)}'s considered as vectors of coordinates.


The number of indices necessary to describe the tensor coordinates is called the order of the tensor. Thus, a tensor of order L has L dimensions N_ℓ.

The outer product between two tensors A and B is often referred to as the tensor product, and often denoted A ⊗ B; we shall denote it A ∘ B to avoid a possible confusion. In fact, an array associated with a tensor can always be stored in matrix format. Following this practice, the matrix representation of the outer product A ∘ B is given by the Kronecker product of the corresponding matrix representations, denoted A ⊗ B. We find it less confusing to use different notations, especially when tensors are of order larger than 2.

Consistency of terminology In physics, the rank of an array sometimes refers to the number of indices minus 1. This is very confusing, since matrices are particular tensors, and with this terminology they would always have rank 1. Yet, the rank of a matrix refers to a totally different object, namely the number of non-zero singular values. Hence, it is mandatory to avoid this inappropriate wording, and to use order to refer to the number of indices.

Valence In some disciplines (including physics), a distinction is made between covariant and contravariant indices in array representations of tensors. This is relevant when tensors are mappings from one Euclidean space to another, and when their entries are not changed in the same way by a change of basis. The scalar product between vectors may be seen as the image of a vector of the primal space under a linear form of the dual space. A contraction is always represented as a scalar product, hence by the action of a linear form of the dual on a vector of the primal. Consequently, it may be relevant to distinguish between indices corresponding to the primal (covariant indices, appearing as subscripts) and those corresponding to the dual (contravariant indices, appearing as superscripts).

In statistics, tensors of order L are generally symmetric, and are obtained by taking Lth derivatives of some scalar function, like a characteristic function. All indices are thus of the same nature. For computational purposes, putting some indices as subscripts and others as superscripts may ease the calculus [15].

In Data Analysis, it is somewhat less obvious, but it appears that there is no general rule that could apply and tell us that some indices should be contravariant. On the contrary, there may be more than two Euclidean spaces under consideration (e.g. unsymmetric tensors with non-equal dimensions) [19] [20]. Thus again, all indices should be of the same nature in a given array, unless reliably justified.

There is however one exception where the distinction might be relevant in statistics, data analysis or signal processing. This occurs if variables take their values in the complex field. In fact, complex conjugation may be seen as a duality operation, which transforms a vector into a linear form in the dual. More generally, for a complex tensor, covariant indices would be transformed into contravariant ones under complex conjugation. Therefore, it may be relevant and meaningful to make this distinction [14].

In order to simplify the presentation, mainly third order real tensors will be subsequently considered. Let T be a three-way tensor with entries T_{ijk}, 1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ k ≤ K. Such a 3-way tensor is of dimensions I × J × K.

Change of basis For our purposes, tensors will merely denote arrays that enjoy the so-called multi-linearity property under linear changes of coordinates. More precisely, let A, B and C be three matrices of size I′ × I, J′ × J, and K′ × K, respectively. Then a tensor T is transformed into a tensor T′ given by:

T′_{ijk} = ∑_{ℓmn} A_{iℓ} B_{jm} C_{kn} T_{ℓmn}    (3)
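In NumPy, the multi-linear transform (3) is a single einsum call (an illustrative sketch; the matrices below are random placeholders):

```python
import numpy as np

# Change of basis, eq. (3): Tp[i,j,k] = sum_{l,m,n} A[i,l] B[j,m] C[k,n] T[l,m,n].
rng = np.random.default_rng(1)
T = rng.standard_normal((2, 3, 4))
A = rng.standard_normal((5, 2))    # I' x I
B = rng.standard_normal((6, 3))    # J' x J
C = rng.standard_normal((7, 4))    # K' x K
Tp = np.einsum('il,jm,kn,lmn->ijk', A, B, C, T)
```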

Contraction Contraction is the operation that consists of summing over one of the indices in an expression. For instance, for given tensors A and B of orders α and β, having a common kth dimension, one can define the tensor C = A •_k B of order α + β − 2 as:

C_{i1..iα, j1..jβ} = ∑_{n_k=1}^{N_k} A_{i1..n_k..iα} B_{j1..n_k..jβ}

Contraction makes it possible to define the inner product between two tensors of the same order and dimensions. For instance, for two third order tensors of dimensions I × J × K, we have:

⟨A, B⟩ = A •₁ •₂ •₃ B

This inner product induces the Frobenius norm:

||A||² = ⟨A, A⟩ = ∑_{ij..k} |A_{ij..k}|²
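A sketch of the full contraction and of the induced Frobenius norm (our NumPy illustration, not part of the report):

```python
import numpy as np

# Inner product of two tensors of the same order and dimensions,
# obtained by contracting all three indices, and the induced norm.
rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3, 4))
B = rng.standard_normal((2, 3, 4))
inner = np.tensordot(A, B, axes=3)   # <A, B> = sum_{ijk} A[ijk] B[ijk]
norm2 = np.tensordot(A, A, axes=3)   # ||A||^2
```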


Note that attention must be paid when several tensors are involved in a series of contractions, like:

A •₁ B •₂ C

In fact, contraction denoted this way is not associative. As an illustration, the multi-linearity property (3) is often written as:

T′ = T •₁ A •₂ B •₃ C

but it means that the index k appearing in the contraction operation •_k corresponds to the kth index of tensor T, it being understood that the summation is always performed on the second index of the matrices A, B and C. This is just a matter of convention. The contraction notation is pleasant because it is compact, but it must be redefined every time it is used in order to avoid any ambiguity.

Tensor rank Carroll and Chang [1] and Harshman [13] independently proposed a decomposition that they named Candecomp and Parafac, respectively. More precisely, given a third order tensor T of size I × J × K, this decomposition consists of writing (2) with a minimal number R(T) of terms:

T = ∑_{r=1}^{R(T)} a_r ∘ b_r ∘ c_r    (4)

In other words, there exist three matrices A, B, C of size I × R, J × R, and K × R, respectively, such that

T_{ijk} = ∑_{r=1}^{R(T)} A_{ir} B_{jr} C_{kr}    (5)

The rank of a given tensor T (and by extension, of the array defining its coordinates in a given basis) is the minimal integer R(T) such that decomposition (4)-(5) (or, more generally, decomposition (2)) is exactly satisfied. This minimal decomposition is referred to as the tensor Canonical Decomposition (CanD).
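A rank-R tensor in the sense of (4)-(5) can be synthesized from its three factor matrices; as a sanity check, any matrix unfolding of such a tensor has rank at most R (an illustrative sketch, not the authors' code):

```python
import numpy as np

# Build T[i,j,k] = sum_r A[i,r] B[j,r] C[k,r], eq. (5).
rng = np.random.default_rng(3)
I, J, K, R = 4, 5, 6, 3
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Mode-1 unfolding: an I x (J*K) matrix of rank at most R.
T1 = T.reshape(I, J * K)
```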

Symmetry A tensor is said to be symmetric if the value of its entries does not change under any permutation of its indices: T_{ij..k} = T_{σ(ij..k)}. It is still an open problem to prove that the rank of a symmetric tensor is the same whether or not the constraint of symmetry is imposed on every rank-one tensor in the CanD.


  N     2   3   4   5   6   7   8   9  10
L = 3   2   5   7  10  14  19  24  30  36
L = 4   4   9  20  37  62  97

Table 1: Generic rank of unconstrained arrays of dimension N and order L.

Generic rank An important fact to emphasize is that, contrary to matrices, the rank of a tensor can exceed its dimensions. In order to demonstrate this, one can consider random arrays generated by drawing their entries independently according to a continuous distribution. We shall call such arrays generic arrays. In the complex field, such arrays always have the same rank with probability one [6].

For instance, a generic matrix of size I × J has rank min(I, J). Moreover, the generic rank of matrices is maximal. These statements do not hold true anymore for higher order tensors.

As an example, an 8 × 8 × 8 tensor generically has rank 24 in the complex field (cf. Table 1). This generic rank is obtained by computing the CanD of a tensor whose entries are randomly drawn according to a continuous probability distribution [9] [7]. For symmetric tensors, the number of degrees of freedom is smaller, and so is the generic rank: an 8 × 8 × 8 symmetric tensor generically has rank 15 in the complex field.

If the CanD is computed in the real field, then it may happen that random tensors do not always have the same rank: the generic rank does not exist, and we must talk about typical ranks. Typical ranks are the collection of ranks that can be obtained with non-zero probability. The smallest typical rank computed in the real field is equal to the generic rank computed in the complex field [6] [8]. Tables 2 and 1 report generic ranks of tensors with equal dimensions, in the symmetric [7] and unconstrained [9] cases, respectively. These values cannot be computed with the help of simple arithmetic relations, unfortunately.

Another striking fact is that the maximal value of the tensor rank is generally larger than the generic rank; it is however unknown for most values of order and dimensions.

Uniqueness Uniqueness of the CanD is to be understood up to a scaling and a permutation of the columns of each mode matrix. By counting the number of degrees of freedom on both sides of (5), one may naively think


  N     2   3   4   5   6   7   8   9  10
L = 3   2   4   5   8  10  12  15  19  22
L = 4   3   6  10  15  21  30  42  55  72

Table 2: Generic rank of symmetric arrays of dimension N and order L.

that this would give generic uniqueness conditions. Unfortunately, this rule is often true, but there are exceptions.

On the other hand, from (5), one can tell that the decomposition cannot be unique if IJK < R(I + J + K − 2). More generally, the number of degrees of freedom of a rank-1 tensor of order r and dimensions N_i is:

F(r, N) = (∑_{i=1}^{r} N_i) − r + 1    (6)

because of scale ambiguities. A rule solely based on counting the number of degrees of freedom would tell us that uniqueness of the CanD of a tensor T is reached if and only if

R(T) [∑_{i=1}^{r} N_i − r + 1] ≤ ∏_{i=1}^{r} N_i

It turns out that this condition is sufficient. However, it does not give the generic rank of tensors, as can be checked by comparing the quantity below (which is actually a lower bound) with the values reported in Tables 2 or 1 for r ∈ {3, 4}:

R ≥ ⌈ ∏_{i=1}^{r} N_i / (∑_{i=1}^{r} N_i − r + 1) ⌉

Congruent diagonalization If relation (5) is invertible, that is, if matrices A, B, and C are square and admit inverses A′, B′, and C′, then the multi-linear transform (A′, B′, C′) brings tensor T into a diagonal tensor with ones on the diagonal.

We have thus a congruent transformation that diagonalizes T. It is clear that such a transformation may exist only if the rank of T is at most equal to its smallest dimension. When this is not the case, it is still possible to define a multi-linear invertible transform that minimizes all non-diagonal entries of the resulting tensor T′:

(A′, B′, C′) = Arg min_{A,B,C} ||T′ − Diag(T′)||²    (7)


The resulting tensor T′ is not diagonal, but as diagonal as possible, according to some norm.

Orthogonal diagonalization We now particularize the congruent diagonalization to norm-preserving multi-linear transforms. Let U, V, and W be three orthogonal real matrices (or unitary matrices in the complex field), of size I × I, J × J, and K × K, respectively.

Under the orthogonal change of bases defined by the triplet (U, V, W), tensor G is transformed into a tensor T with entries:

T_{ijk} = ∑_{pqr} U_{ip} V_{jq} W_{kr} G_{pqr}    (8)

Under orthogonal transforms, the tensor Frobenius norm is invariant, so that maximizing the sum of squares of the diagonal entries is equivalent to minimizing the non-diagonal ones. Therefore, the criterion below is appropriate in order to obtain a tensor as diagonal as possible [3]:

Υ₂(U, V, W) = ∑_i T_{iii}²    (9)

Again, note that contrary to matrices, for which the Singular Value Decomposition (SVD) yields an exact diagonal form via a maximization of diagonal entries, criterion (9) does not in general lead to a diagonal tensor. The reason is not due to the criterion, but to the fact that tensors generically have a rank larger than their smallest dimension [5]. Hence, tensor diagonalization by change of bases (general congruent, or unitary) can only be an approximation.
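The norm invariance underlying criterion (9) is easy to check numerically (a sketch we add for concreteness; Q-factors of random matrices serve as the orthogonal transforms):

```python
import numpy as np

# Orthogonal change of bases (8) and diagonality criterion (9).
rng = np.random.default_rng(4)
n = 3
G = rng.standard_normal((n, n, n))
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
W, _ = np.linalg.qr(rng.standard_normal((n, n)))
T = np.einsum('ip,jq,kr,pqr->ijk', U, V, W, G)

upsilon2 = sum(T[i, i, i] ** 2 for i in range(n))   # criterion (9)
```

Since the Frobenius norm is conserved, upsilon2 can never exceed the total energy of G, which is why maximizing it pushes the off-diagonal energy down.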

2 Jacobi sweeping

Finding the absolute maximum of criterion (9) is a complicated problem, since this criterion is a trigonometric function in many variables. However, we shall show in this paper that it is possible to solve several much simpler problems in cascade instead.

The first step is to decompose each orthogonal matrix into a product of plane rotations, the so-called Givens rotations, which is possible up to a multiplicative diagonal matrix with unit modulus entries. By doing this, we are left with a single unknown characterizing every Givens rotation. If all Givens rotations are kept fixed except in one given plane in each mode, then the optimization criterion reduces to a rational function in three variables, ψ(x, y, z), where we can decide that x, y and z denote the tangents of the three angles. For instance, for the first mode:

Q[α] = (  c   s )  =  (1/√(1 + x²)) (  1   x )
       ( −s   c )                   ( −x   1 )

where c and s denote cos(α) and sin(α), respectively. Of course, this procedure is iterative, even if it is not a relaxation in the strict sense, because the optimization is executed over successive elements of a multiplicative group.

In order to be able to carry out such an optimization, it is necessary to solve the problem in the 2-dimensional case, as for matrices. This sweeping strategy, well known for matrices, has already been utilized for symmetric tensors [12] [3], giving birth to the so-called CoM (Contrast Maximization) algorithms. Stationary points of such Jacobi sweeping algorithms are addressed in the Appendix.
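The Givens rotation and its tangent parametrization can be sketched as follows (our illustration; names are ours):

```python
import numpy as np

def givens(n, p, q, alpha):
    """Plane (Givens) rotation acting on coordinates (p, q) of R^n."""
    Q = np.eye(n)
    c, s = np.cos(alpha), np.sin(alpha)
    Q[p, p], Q[p, q] = c, s
    Q[q, p], Q[q, q] = -s, c
    return Q

# Tangent parametrization used in the text, valid for |alpha| < pi/2.
def plane_from_tangent(x):
    return np.array([[1.0, x], [-x, 1.0]]) / np.sqrt(1.0 + x ** 2)
```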

3 Symmetric tensors

We recall in this section the main results that have been obtained so far for the orthogonal diagonalization of symmetric tensors; most of them are reported in [3].

3.1 Maximization of the sum of squares

Invariance property First of all, it can be noticed that the Frobenius norm of a tensor does not change under the action of orthogonal transforms. For the sake of simplicity, let us prove it in the case of real 3rd order tensors, it being understood that exactly the same proof can be derived for tensors of any order (possibly complex, under unitary transforms).

Let Q be an orthogonal matrix, that is, ∑_j Q_{ji} Q_{jk} = δ_{ik}, where δ_{ij} is null except when the two indices are equal, in which case δ_{ii} = 1. Then a symmetric tensor G is transformed into a tensor T whose entries can be written, according to the multi-linearity property, as:

T_{ijk} = ∑_{ℓmn} Q_{iℓ} Q_{jm} Q_{kn} G_{ℓmn}

Next, the calculation of ||T||² yields

∑_{ijk} T_{ijk}² = ∑_{ℓmn} ∑_{ℓ′m′n′} ( ∑_i Q_{iℓ} Q_{iℓ′} ) ( ∑_j Q_{jm} Q_{jm′} ) ( ∑_k Q_{kn} Q_{kn′} ) G_{ℓmn} G_{ℓ′m′n′}


which eventually leads to:

∑_{ijk} T_{ijk}² = ∑_{ℓmn} ∑_{ℓ′m′n′} δ_{ℓℓ′} δ_{mm′} δ_{nn′} G_{ℓmn} G_{ℓ′m′n′}

by using the fact that Q is orthogonal. This is nothing but ∑_{ℓmn} G_{ℓmn}² = ||G||².

The consequence is that, as for matrices, minimizing the sum of squares of the non-diagonal terms is equivalent to maximizing the sum of squares of the diagonal ones, hence the optimization criterion:

Υ₂(Q) = ∑_i |T_{ii..i}|²    (10)

Symmetry property Now consider the 2-dimensional case. First observe that

Q[α − π/2] = Q[α] ( 0  −1 )  =  ( 0  −1 ) Q[α]
                  ( 1   0 )     ( 1   0 )

In other words, when changing the rotation angle α into α − π/2, the tangent x is transformed into −1/x, so that (T_{1..1}, T_{2..2}) is transformed into (−T_{2..2}, T_{1..1}). Define the optimization criterion ψ₂(x) = Υ₂(Q[α]), with x = tan(α). Then the above symmetry property means:

ψ₂(−1/x) = ψ₂(x)

This allows not only to reduce the domain in which to search for stationary points, but also to reduce the degree of the polynomial to be rooted. In fact, it can be seen that ψ₂(x) is a rational function in x of the form

ψ₂(x) = ρ(x) / (1 + x²)^r

where ρ(x) is a polynomial of degree 2r in x, if symmetric tensors of order r are considered. The equation defining the stationary points is ω(x) = (1 + x²) ρ′(x) − 2r x ρ(x) = 0. This polynomial is of degree 2r, and not 2r + 1 as one might think at first glance. For r ∈ {3, 4}, such polynomials (of degree 6 or 8) are generally not solvable algebraically. But it turns out that they are in the present case, because of the particular symmetry property that we have just pointed out.


Algebraic solution Since ω(x) = 0 must yield the same roots as x^{2r} ω(−1/x) = 0, the roots x_k can be paired:

ω(x) = ω_{2r} ∏_{k=1}^{r} (x − x_k)(x + 1/x_k) = ω_{2r} ∏_{k=1}^{r} (x² − ξ_k x − 1)

if we define ξ_k = x_k − 1/x_k, where ω_{2r} denotes the leading coefficient. So let ξ = x − 1/x be the new variable. Then, besides possible roots at the origin, which can easily be checked, polynomial ω(x) vanishes if and only if the polynomial

Ω(ξ) = ∏_{k=1}^{r} (ξ − ξ_k)

vanishes. This polynomial is of degree r only, and can be solved algebraically. Once the roots ξ_k have been calculated, the roots (x_k, −1/x_k) can be deduced by rooting the polynomial x² − ξ_k x − 1 = 0.

For symmetric tensors of order r = 3, one obtains a polynomial of degree 2 (and not 3 as expected) [3]:

Ω₃(ξ; g) = d₂ ξ² + d₁ ξ − 4 d₂,    (11)

with

a₃ = G_{111}² + G_{222}²,
a₂ = 6 (G_{122} G_{222} − G_{111} G_{112}),
a₁ = 9 (G_{122}² + G_{112}²) + 6 (G_{112} G_{222} + G_{111} G_{122});
d₂ = a₂/6 = G_{122} G_{222} − G_{111} G_{112},
d₁ = a₁/3 − a₃.
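The whole order-3 procedure fits in a few lines. The sketch below (our own numerical check, not the authors' code) forms Ω₃ from a symmetric 2 × 2 × 2 tensor, roots it, maps each ξ back to x through x² − ξx − 1 = 0, and keeps the candidate maximizing the criterion; indices are 0-based, so G_{111} reads G[0,0,0]:

```python
import numpy as np

def diag_sq(G, x):
    """Sum of squared diagonal entries after the rotation of tangent x."""
    Q = np.array([[1.0, x], [-x, 1.0]]) / np.sqrt(1.0 + x ** 2)
    T = np.einsum('il,jm,kn,lmn->ijk', Q, Q, Q, G)
    return T[0, 0, 0] ** 2 + T[1, 1, 1] ** 2

rng = np.random.default_rng(5)
G = rng.standard_normal((2, 2, 2))
G = sum(np.transpose(G, p) for p in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6.0

# Coefficients of eq. (11), with G_{111} -> G[0,0,0], etc.
a3 = G[0, 0, 0] ** 2 + G[1, 1, 1] ** 2
a1 = 9.0 * (G[0, 1, 1] ** 2 + G[0, 0, 1] ** 2) \
     + 6.0 * (G[0, 0, 1] * G[1, 1, 1] + G[0, 0, 0] * G[0, 1, 1])
d2 = G[0, 1, 1] * G[1, 1, 1] - G[0, 0, 0] * G[0, 0, 1]
d1 = a1 / 3.0 - a3

candidates = []
for xi in np.roots([d2, d1, -4.0 * d2]):                  # Omega_3(xi) = 0
    candidates.extend(np.roots([1.0, -xi.real, -1.0]))    # x^2 - xi x - 1 = 0
best = max((x.real for x in candidates), key=lambda x: diag_sq(G, x))
```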

For symmetric tensors of order r = 4, one obtains the polynomial of degree 4 [3] [2]:

Ω₄(ξ; g) = ∑_{i=0}^{4} c_i ξ^i    (12)

with

b₄ = G_{1111}² + G_{2222}²,
b₃ = −8 (G_{1111} G_{1112} − G_{1222} G_{2222}),
b₂ = 4 b₄ + t + 2w,
b₁ = 4 b₃ − 2 u v,
b₀ = 2 (b₄ + t + 2w + 36 G_{1122}² + 2 G_{1111} G_{2222} + 32 G_{1112} G_{1222});
c₄ = −b₃/8 = G_{1111} G_{1112} − G_{2222} G_{1222},
c₃ = 2 b₄ − b₂/4 = b₄ − (t + 2w)/4,
c₂ = 3 b₃/2 − 3 b₁/8 = 3 u v / 4,
c₁ = b₂ − b₀/2,
c₀ = b₁/2 = 2 b₃ − u v,

where

t = 16 (G_{1112}² + G_{1222}²),
u = G_{1111} + G_{2222} − 6 G_{1122},
v = 4 (G_{1222} − G_{1112}),
w = 6 G_{1122} (G_{1111} + G_{2222}).

As a conclusion, thanks to these symmetry properties, we have been able to solve algebraically the search for the absolute extrema of Υ_r, r = 3, 4.

3.2 Maximization of the trace

Under some assumptions involving the signs of the rank-one terms [16], it is legitimate to drop the squares. This can easily be shown under the following assumptions: (i) tensor T is of even order; (ii) it is exactly diagonalizable by congruent transform; (iii) its diagonal form contains entries all having the same sign ε.

Proof. Without restricting the generality, let us derive the proof in the case of a 4th order tensor. Define the optimization criterion

Υ₁(U) = ∑_i |T_{iiii}|    (13)

Then, by assumption (ii) and using the multi-linearity property, there exist a diagonal tensor D and an invertible matrix A such that

T_{iiii} = ∑_j A_{ij}⁴ D_{jjjj}


Yet, by assumption (iii), D_{jjjj} = ε |D_{jjjj}|. As a consequence, T_{iiii} = ε ∑_j A_{ij}⁴ |D_{jjjj}|, which shows that |T_{iiii}| = ε T_{iiii}. Hence, Υ₁(U) = ε trace(T), which completes the proof. If ε > 0, one can thus maximize the trace in order to diagonalize T (and minimize its trace if ε < 0).

This result does not a priori hold true for 3rd order tensors. We now describe an algebraic approach allowing the computation of the absolute extrema of the trace of a fourth order tensor.

Consider a symmetric tensor G of dimension 2 × 2 × 2 × 2, and an orthogonal transform defined by the matrix:

Q = (1/√(1 + x²)) (  1   x )
                  ( −x   1 )

Since the expressions are quite simple in the present case, let us give them explicitly. Tensor G is transformed into a tensor T whose diagonal entries are:

T_{1111} = [G_{1111} + 4x G_{1112} + 6x² G_{1122} + 4x³ G_{1222} + x⁴ G_{2222}] / (1 + x²)²
T_{2222} = [x⁴ G_{1111} − 4x³ G_{1112} + 6x² G_{1122} − 4x G_{1222} + G_{2222}] / (1 + x²)²

The criterion to maximize (or minimize) is ψ(x) = T_{1111} + T_{2222}, which we can write ψ(x) = ρ(x)/(1 + x²)². This time, ρ(x) is of degree 4. The stationary points of ψ(x) are given by the roots of ω(x) = (1 + x²) ρ′(x) − 4x ρ(x), which is indeed of degree 4. Thus its roots can be computed algebraically, by resorting to Ferrari's method for instance.

This presentation of the solution hides an important property, so that it is not clear whether the maximization is still feasible algebraically in the complex case, where we have two unknowns. It turns out that it is indeed feasible, as demonstrated in [4]. Let us then present another solution in the real case along the same lines.

Notice that cos² 2α = (1 + x⁴ − 2x²)(1 + x²)⁻², sin² 2α = 4x²(1 + x²)⁻², and that sin 2α cos 2α = 2(x − x³)(1 + x²)⁻². By plugging these expressions back into that of ψ(x), one can notice that the criterion may be viewed as a quadratic form:

ψ(x) = vᵀ ( G_{1111} + G_{2222}      G_{1222} − G_{1112}                ) v
          ( G_{1222} − G_{1112}      3 G_{1122} + (G_{1111} + G_{2222})/2 )

where vᵀ = [cos 2α, sin 2α]. The angle α of the rotation can thus be found by computing the eigenvalue decomposition of this 2 × 2 matrix, and then solving a simple trigonometric equation.
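A numeric sketch of this eigenvalue route (ours, not the authors' code); since the sign of the off-diagonal term depends on the rotation convention, both ±α are tried and the better one is kept:

```python
import numpy as np
from itertools import permutations

def trace4(G, alpha):
    """T1111 + T2222 after the rotation Q[alpha] applied to all four modes."""
    c, s = np.cos(alpha), np.sin(alpha)
    Q = np.array([[c, s], [-s, c]])
    T = np.einsum('ia,jb,kc,ld,abcd->ijkl', Q, Q, Q, Q, G)
    return T[0, 0, 0, 0] + T[1, 1, 1, 1]

rng = np.random.default_rng(6)
G = rng.standard_normal((2, 2, 2, 2))
G = sum(np.transpose(G, p) for p in permutations(range(4))) / 24.0  # symmetrize

S = G[0, 0, 0, 0] + G[1, 1, 1, 1]
off = G[0, 1, 1, 1] - G[0, 0, 0, 1]          # G_{1222} - G_{1112}
M = np.array([[S, off], [off, 3.0 * G[0, 0, 1, 1] + S / 2.0]])
w, V = np.linalg.eigh(M)
v = V[:, np.argmax(w)]                       # direction of [cos 2a, sin 2a]
alpha = 0.5 * np.arctan2(v[1], v[0])
alpha = max((alpha, -alpha), key=lambda a: trace4(G, a))
```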


In fact, as in section 3.1, we have rooted two lower-degree polynomials instead of rooting ω(x) directly. Bear in mind that the computational complexity is an important issue, since in dimension N the elementary maximization will be executed N(N − 1)/2 times per sweep, and there will in general be several sweeps (experimental results suggest O(√N) sweeps as a rule of thumb).

4 Unsymmetric tensors

In this section, we try to derive similar results for tensors of more general form. The Jacobi sweeping technique clearly still holds, and it suffices to address the 2-dimensional problem. For the sake of convenience, and without restricting the generality too much, we shall limit our attention mainly to 2 × 2 × 2 tensors.

Even if, as far as diagonalization is concerned, maximizing the trace is not meaningful for 3rd order unsymmetric tensors (except perhaps for a quite restricted subset), we start with the least difficult problem: the maximization of the trace. For this purpose, consider the criterion:

Υ₁(U, V, W) = ∑_i T_{iii} = trace(T)    (14)

Remember that the goal is to devise a routine able to solve the 2-dimensional problem with a computational complexity as small as possible (the routine is called every time a pair is processed, that is, many times within a sweep). Therefore, it is desirable to find the absolute maximum in a non-iterative manner.

4.1 Maximization of the trace

By using the multi-linear relation (8), and after some simple manipulations, the trace of a two-dimensional tensor after an orthogonal change of bases can be written as:

√((1 + x²)(1 + y²)(1 + z²)) ψ₁(x, y, z) = gᵀ ζ    (15)

where ζ = [xyz, xy, xz, yz, x, y, z, 1]ᵀ and

gᵀ = [G_{222} − G_{111}, G_{221} + G_{112}, G_{212} + G_{121}, G_{122} + G_{211},
      G_{211} − G_{122}, G_{121} − G_{212}, G_{112} − G_{221}, G_{111} + G_{222}]
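Relation (15) is easy to verify numerically (our check; indices are 0-based, so G_{111} is G[0,0,0]):

```python
import numpy as np

def plane(x):
    return np.array([[1.0, x], [-x, 1.0]]) / np.sqrt(1.0 + x ** 2)

rng = np.random.default_rng(7)
G = rng.standard_normal((2, 2, 2))
x, y, z = 0.3, -1.2, 0.7

T = np.einsum('ip,jq,kr,pqr->ijk', plane(x), plane(y), plane(z), G)
lhs = np.sqrt((1 + x**2) * (1 + y**2) * (1 + z**2)) * (T[0, 0, 0] + T[1, 1, 1])

g = np.array([G[1,1,1] - G[0,0,0], G[1,1,0] + G[0,0,1],
              G[1,0,1] + G[0,1,0], G[0,1,1] + G[1,0,0],
              G[1,0,0] - G[0,1,1], G[0,1,0] - G[1,0,1],
              G[0,0,1] - G[1,1,0], G[0,0,0] + G[1,1,1]])
zeta = np.array([x*y*z, x*y, x*z, y*z, x, y, z, 1.0])
```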


The stationary points of the criterion ψ₁(x, y, z) are thus solutions of the polynomial system below:

(g₄yz + g₆y + g₇z + g₈)x − (g₁yz + g₂y + g₃z + g₅) = 0    (16)
(g₃xz + g₅x + g₇z + g₈)y − (g₁xz + g₂x + g₄z + g₆) = 0    (17)
(g₂xy + g₅x + g₆y + g₈)z − (g₁xy + g₃x + g₄y + g₇) = 0    (18)

where the g_i denote the entries of the vector g. This system can alternatively be written as

g_xᵀ ζ = 0,   g_yᵀ ζ = 0,   g_zᵀ ζ = 0    (19)

with obvious notation for the vectors g_x, g_y and g_z. This system could be solved by resorting to standard elimination tools [10], and would yield 27 solutions for the triplet (x, y, z). This would be computationally intensive, and would ignore the particular structure of the system, namely its sparsity.

4.1.1 Algebraic solution via resultants

The idea consists of remarking that the system (19) is multi-linear in the variables x, y, z, so that one variable can easily be eliminated. In other words, gx^T ζ can be written as ax(x, y) z + bx(x, y), where

ax(x, y) = g4 xy + g7 x − g1 y − g3    (20)
bx(x, y) = g6 xy + g8 x − g2 y − g5    (21)

We have similar expressions for gy and gz. For instance, the elimination of z yields a system of 2 polynomial equations of degree 4 in two unknowns:

ay bx − ax by = 0,  az bx − ax bz = 0    (22)

hence having at most 16 distinct solutions. This is significant progress, and this system can be solved with the help of resultants. To be more concrete, this resultant is a determinant of the form

| A(y)   0     0    D(y) |
| B(y)  A(y)  D(y)  E(y) |
| C(y)  B(y)  E(y)  F(y) |  =  0
|  0    C(y)  F(y)   0   |

where A(y), B(y), C(y), D(y), E(y), F(y) are polynomials of degree 2 in y, such that ay bx − ax by = A(y) x² + B(y) x + C(y) and az bx − ax bz = D(y) x² + E(y) x + F(y). Note that the system (22) is particular, since only nine monomials of degree at most 4 in two variables are present, out of the fifteen possible ones. The system is indeed sparse, which is what makes such a small resultant possible. The rooting of this resultant of degree 8 yields the values of y. By plugging them back into (22), one gets two possible solutions for x for every value of y, and hence 16 solutions for (x, y). The corresponding value of z is eventually obtained by using (18).

The last step is to find one of the two absolute maxima. In order to do this, we just need to compute Υ1(x, y, z) at each of the 16 solutions and pick the one that yields the maximum.
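As a sketch of this elimination chain (not the report's implementation), the two resultants can be computed with a computer algebra system; the vector g below is an arbitrary integer placeholder for the tensor-derived coefficients of (15):

```python
import sympy as sp

# g is an arbitrary integer placeholder for g1..g8 (0-indexed below).
x, y, z = sp.symbols('x y z')
g = [3, -1, 2, 4, -2, 1, 5, -3]

gx = (g[3]*y*z + g[5]*y + g[6]*z + g[7])*x - (g[0]*y*z + g[1]*y + g[2]*z + g[4])
gy = (g[2]*x*z + g[4]*x + g[6]*z + g[7])*y - (g[0]*x*z + g[1]*x + g[3]*z + g[5])
gz = (g[1]*x*y + g[4]*x + g[5]*y + g[7])*z - (g[0]*x*y + g[2]*x + g[3]*y + g[6])

# Each equation is linear in z, so eliminating z by resultants yields the
# two degree-4 bivariate polynomials of (22).
r1 = sp.expand(sp.resultant(gx, gy, z))
r2 = sp.expand(sp.resultant(gx, gz, z))

# Eliminating x in turn produces the (at most) degree-8 polynomial in y,
# i.e. the determinant displayed above.
r3 = sp.Poly(sp.resultant(r1, r2, x), y)
print(r3.degree())
```

Rooting r3 then gives the candidate values of y, from which x and z follow by back-substitution as described above.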

4.1.2 Algebraic solution via heuristic manipulations

An algebraic solution can also be obtained by first eliminating one of the variables, say x, just as in the resultant-based approach; the stationary points then have to satisfy the equations

A(z) + B(z) y + C(z) y² = 0    (23)
D(z) + E(z) y + F(z) y² = 0

where A, B, C, D, E, F are polynomials of degree 2 in z. This system is equivalent to the equation

F(z)(A(z) + B(z) y + C(z) y²) − C(z)(D(z) + E(z) y + F(z) y²) = 0
⇔ (F(z)A(z) − C(z)D(z)) + (F(z)B(z) − C(z)E(z)) y = 0
⇔ G(z) + H(z) y = 0

where G and H are polynomials of degree 4 in z. Assuming that H(z) ≠ 0,

y = −G(z)/H(z).    (24)

Substituting (24) into (23), we get

A(z) + B(z)(−G(z)/H(z)) + C(z)(−G(z)/H(z))² = 0
⇔ A(z)H(z)² − B(z)G(z)H(z) + C(z)G(z)² = 0

which is a polynomial of degree 10 in z.

In summary, to find the maximum we need to root one degree-10 polynomial in z, then evaluate the two degree-4 polynomials G and H ten times in order to find the corresponding y; finally, the corresponding x can be found from (16). This approach reduces the number of candidate roots to 10.
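Assuming the six degree-2 polynomials A, ..., F in z are already available (random placeholder coefficients below stand in for the tensor-dependent ones), the whole chain fits in a few lines of numpy:

```python
import numpy as np

# A..F are the six degree-2 polynomials in z obtained after eliminating x;
# random placeholder coefficients here (highest degree first).
rng = np.random.default_rng(0)
A, B, C, D, E, F = (np.poly1d(rng.standard_normal(3)) for _ in range(6))

G = F * A - C * D                      # degree 4 in z
H = F * B - C * E                      # degree 4 in z
P = A * H**2 - B * G * H + C * G**2    # degree 10 in z

z_candidates = P.roots                               # 10 candidates for z
y_candidates = -G(z_candidates) / H(z_candidates)    # y = -G(z)/H(z), valid when H(z) != 0

# Each real pair (y, z) would then be substituted into (16) to recover x.
print(P.order, len(z_candidates))
```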


4.1.3 Eigenvector approach

This approach, sometimes attributed to Macaulay, is based on results borrowed from algebraic geometry. In order to introduce them, first define the concept of an ideal in the ring R of polynomials in several variables.

Ideal  An ideal I is a subring of R such that ∀q ∈ I and ∀p ∈ R, the product pq ∈ I.

Consider a system P of polynomial equations in several variables, q_n(x) = 0, 1 ≤ n ≤ N, where each polynomial q_n(x) is of global degree d_n. Here x stands for the set of unknowns {x_n}_{n=1}^{N}. Now denote by I the ideal generated by ⟨q_n⟩_{n=1}^{N}. The problem is to find the variety associated with this ideal, assuming that it is zero-dimensional (which means that the variety consists of a finite number of points). We know from a Bezout theorem that if the number of solutions of P is finite, then it is at most equal to the product of the degrees, ∏_n d_n.

The first key result is that any ideal of R is finitely generated, which means that there exists a family of polynomials q_n, 1 ≤ n ≤ N, such that ∀q ∈ I, ∃p_n ∈ R, q = ∑_n p_n q_n. In other words, any ideal is entirely characterized by a finite generating family ⟨q_i⟩_{i=1}^{n}, often called a basis of the ideal. This is known as Hilbert's basis theorem. An ideal may have many different bases.

Quotient  Then we define the quotient ring modulo I, denoted A = R/I, as the set of equivalence classes defined as follows: two polynomials p1 and p2 belong to the same class, which we write p1 ≡ p2, iff p1 − p2 ∈ I.

Dual  Next, R, and hence A, are also linear spaces. We can then define the dual space A*. For this purpose, let p1 and p2 be two polynomials of the same class p; we have by definition p1 − p2 ∈ I. Let ℓ be a linear form of A*. This form maps p to a number ℓ(p), and hence p1 and p2 to the same number. By linearity, ℓ(p1) = ℓ(p2) yields ℓ(p1 − p2) = 0. Thus, we see that A* can be identified with the subspace of linear forms on R vanishing on I.

In particular, let 1_α be the linear form that maps a polynomial p to its value p(α) at a point α. Because of what we have just seen, 1_α is in A* iff α is a common root of P, that is:

1_α ∈ A* ⇔ q_n(α) = 0, ∀n ∈ {1, …, N}    (25)


Multiplication operator  For any fixed polynomial a of A, define now the multiplication operator M_a, which maps any polynomial p of A to the product M_a(p) = pa.

The transpose operator M_a^T, by definition of transposition, maps any linear form ℓ of A* to the linear form M_a^T ℓ defined by:

∀q ∈ A,  M_a^T ℓ(q) = ℓ(M_a(q)) = ℓ(qa).

Eigenvalue decomposition  The key result on which eigenvalue techniques are based is the following: the eigenvectors of M_a^T are the linear forms 1_α, where α is any root of P. Let us prove this result. By definition of M_a^T, we have, ∀q ∈ R:

M_a^T 1_α(q) = 1_α(M_a(q)) = 1_α(qa) = a(α) 1_α(q)

As a consequence, M_a^T 1_α = a(α) 1_α, which indeed shows that the forms 1_α are eigenvectors of M_a^T associated with the eigenvalues a(α). Note that from (25), α needs to be a root of P in order for 1_α to belong to A*.

Basis  To summarize, in order to solve the polynomial system P, it is sufficient to build a matrix representing the operator M_a^T in an appropriate basis of A, and then to compute all its eigenvectors. If the eigenvalues are distinct (which occurs generically), each eigenvector yields a solution.

This approach is attractive if the construction of this matrix is not too time consuming. In particular, if we have a series of similar polynomial systems to solve, in which most of the work can be done once and for all symbolically, the numerical computations will be limited to a minimum, that is, mainly to the calculation of the eigenvalue decomposition.
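In the univariate case A = R[x]/⟨q⟩ with the monomial basis {1, x, …, x^(d−1)}, the matrix of multiplication by x is the companion matrix of q, so its eigenvalues are exactly the roots of q. The following minimal sketch illustrates the mechanism (it is a one-variable toy, not the trivariate construction used here):

```python
import numpy as np

# q(x) = (x-1)(x-2)(x-3) = x^3 - 6x^2 + 11x - 6, monic, highest degree first
q = np.array([1.0, -6.0, 11.0, -6.0])
d = len(q) - 1

# Matrix of multiplication by x in the monomial basis {1, x, ..., x^(d-1)}
# of R[x]/<q>: x * x^k = x^(k+1), and x^d is reduced modulo q.
Mx = np.zeros((d, d))
Mx[1:, :-1] = np.eye(d - 1)        # shift x^k -> x^(k+1)
Mx[:, -1] = -q[1:][::-1]           # reduction of x^d modulo q

roots = np.sort(np.linalg.eigvals(Mx).real)
print(roots)   # approximately [1. 2. 3.]
```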

One possibility for building the basis is suggested in [17]. It is formed of all monomials of the form ∏_k x_k^{β_k}, where 0 ≤ β_k ≤ d_k − 1. Other bases could be thought of, but independently of the basis chosen, we know that the matrix obtained will be of size at most (and generically equal to) ∏_n d_n. The matrix representing the multiplication operator M_a^T will always have that size. The conditioning of the EVD calculation, and also the computational effort necessary to build the matrix itself, will depend on the choice of the basis.

Algebraic maximization of the trace  Let us now turn to our practical application, and assume the basis

B = {1, x, x², y, xy, x²y, z, xz, x²z, yz, xyz, x²yz}

The elimination of the monomial xyz is indeed possible in two of the three equations of system (16)-(18), so that ∏_n d_n = 12.

4.2 Maximization of the absolute trace

Let the contrast function be

Υ1(U, V, W) = ∑_i |K_iii| = trace |K|.    (26)

Assume that r(K) ≥ 2. Then

∇Υ1 = ∑_{i=1}^{2} sign(K_iii) ∇K_iii = ∑_{i=1}^{2} s_i ∇K_i =
  ±(∇K_1 + ∇K_2),  if s_1 = s_2
  ±(∇K_1 − ∇K_2),  if s_1 = −s_2

The maximizer of (26) can thus be found by evaluating the stationary points of K_1 + K_2 and K_1 − K_2.

When r(K) = 1, ∇Υ1 = s_1 ∇K_1 = 0 ⇔ ∇K_1 = 0, so the stationary points of Υ1 coincide with those of the plain trace. Hence max ∑_i |K_iii| = max ∑_i K_iii when r(K) = 1.

4.3 Maximization of the sum of squares

Now consider the criterion Υ2(U, V, W) defined in (9). Its numerator is no longer multi-linear in the unknowns, and neither are the equations defining its stationary points.

Symmetry properties  As in section 3.1, we can try to make use of symmetry properties. As before, observe that

Q[α_i − π/2] = Q[α_i] ( 0 −1 ; 1 0 ) = ( 0 −1 ; 1 0 ) Q[α_i]

In other words, when changing the rotation angles α_i in each of the three modes into α_i − π/2, (x, y, z) is transformed into (−1/x, −1/y, −1/z), so that (T_111, T_222) is transformed into (−T_222, T_111). The consequence is that

ψ2(−1/x, −1/y, −1/z) = ψ2(x, y, z)


which allows one to reduce the search for stationary points. In fact, if (x_o, y_o, z_o) is a stationary point, then so is (−1/x_o, −1/y_o, −1/z_o).

Unfortunately, contrary to the symmetric case, it is not obvious how to reduce the degree of the polynomial system by using this symmetry property, even at even orders. Only the domain of search is halved.

Algebraic solution  When U, V and W are restricted to be plane rotations, the functional (9) can be expressed as the rational function

Υ2(x, y, z) = ∑_{i,j,k=0}^{2} α_ijk x^i y^j z^k / ((1 + x²)(1 + y²)(1 + z²)),  α_ijk ∈ R    (27)

The stationary points of (27) are the solutions to

∇Υ2(x, y, z) = [ ∂Υ2/∂x ; ∂Υ2/∂y ; ∂Υ2/∂z ] = [ p1 ; p2 ; p3 ] = 0

where the numerator polynomials are

p1 = ∑_{i,j,k=0}^{2} µ_ijk x^i y^j z^k,  p2 = ∑_{i,j,k=0}^{2} β_ijk x^i y^j z^k,  p3 = ∑_{i,j,k=0}^{2} γ_ijk x^i y^j z^k

with µ_ijk, β_ijk, γ_ijk ∈ R. Let I = ⟨p1, p2, p3⟩ ⊆ R[x, y, z] denote the ideal generated by p1, p2 and p3, where R[x, y, z] denotes the set of polynomials in x, y and z with coefficients in R.

Furthermore, let G = {g_i}_{i=1}^{s} ⊂ R[x, y, z] be a Gröbner basis of I with respect to the lexicographic ordering x > y > z. If I_yz = I ∩ R[y, z] and I_z = I ∩ R[z], then according to the elimination theorem, G_yz = G ∩ R[y, z] and G_z = G ∩ R[z] are Gröbner bases of the ideals I_yz and I_z respectively.

Since I = ⟨p1, p2, p3⟩ = ⟨g_1, …, g_s⟩ implies that V(p1, p2, p3) = V(g_1, …, g_s), where V(I) = {(x, y, z) ∈ R³ | f(x, y, z) = 0, ∀f ∈ I}, the problem of solving p1 = p2 = p3 = 0, given the Gröbner basis G of I, can be handled by back-substitution, since the new system g_1 = · · · = g_s = 0 can be put into triangular form.
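As a toy illustration of this elimination property (the actual p1, p2, p3 depend on the tensor entries and are not reproduced here; the system below is an assumed stand-in), a lexicographic Gröbner basis exposes the triangular structure directly:

```python
import sympy as sp

# Toy system standing in for p1 = p2 = p3 = 0.
x, y, z = sp.symbols('x y z')
p = [x**2 - y, y**2 - z, z**2 - 1]

G = sp.groebner(p, x, y, z, order='lex')   # lexicographic order x > y > z

# By the elimination theorem, G ∩ R[z] and G ∩ R[y, z] are Groebner bases
# of the elimination ideals I_z and I_yz: solve from z upward by
# back-substitution.
Gz  = [q for q in G.exprs if q.free_symbols <= {z}]
Gyz = [q for q in G.exprs if q.free_symbols <= {y, z}]
print(Gz, Gyz)
```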

Numerical solutions  Due to the complexity of optimizing the functional (9), some suboptimal solutions will be considered next. First, an algorithm called ALS1 will be presented. The ALS1 algorithm estimates U, V and W in the successive manner:

(U^(i), V^(i−1), W^(i−1)) → (U^(i), V^(i), W^(i−1)) → (U^(i), V^(i), W^(i))

where the superscript denotes the iteration number.


We have the equivalence

∂Υ2(x, y, z)/∂x |_{y=ȳ, z=z̄} = 0 ⇔ ∑_{i=0}^{2} β_i x^i = 0,  β_i ∈ R,    (28)

and hence finding the stationary points amounts to solving a second-degree polynomial. Due to the symmetry of (27), similar expressions are obtained when setting the partial derivative of (27) with respect to y or z equal to zero.
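A sketch of one such inner update, for a toy quadratic numerator standing in for the restricted criterion (the coefficients c0, c1, c2 below are placeholders for the tensor-dependent ones):

```python
import numpy as np

def als1_step(c0, c1, c2):
    # Stationarity of (c0 + c1 x + c2 x^2) / (1 + x^2):
    # n'(x)(1 + x^2) - 2x n(x) = -c1 x^2 + 2(c2 - c0) x + c1 = 0
    beta = np.array([-c1, 2.0 * (c2 - c0), c1])   # highest degree first
    cand = np.roots(beta)
    cand = cand[np.isreal(cand)].real
    vals = (c0 + c1 * cand + c2 * cand**2) / (1.0 + cand**2)
    return cand[np.argmax(vals)]   # keep the maximizing stationary point

x_opt = als1_step(1.0, 2.0, 0.5)
```

Note that the cubic terms of the stationarity condition cancel, which is why the update only requires rooting a quadratic.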

Another possibility is to estimate two out of the three transform matrices simultaneously; this approach will be called ALS2. In this scheme the transform matrices are estimated in the successive manner

(U^(i−1), V^(i), W^(i)) → (U^(i), V^(i), W^(i+1)) → (U^(i+1), V^(i+1), W^(i+1))

Assume that x is fixed at x̄; then the candidates for the pair (y, z) can be found via the equivalences

∂Υ2(x, y, z)/∂y |_{x=x̄} = 0 ⇔ ∑_{i,j,k=0}^{2} β_ijk x̄^i y^j z^k = ∑_{i=0}^{2} a_i(z) y^i = 0    (29)

∂Υ2(x, y, z)/∂z |_{x=x̄} = 0 ⇔ ∑_{i,j,k=0}^{2} γ_ijk x̄^i y^j z^k = ∑_{i=0}^{2} b_i(z) y^i = 0    (30)

where a_i(z) and b_i(z) are second-degree polynomials in z. Applying the method of resultants, we end up with the polynomial

Res(Υ2y, Υ2z, y) = ∑_{i=0}^{8} γ_i z^i

The roots of this eighth-degree polynomial can be estimated through an EVD of its corresponding companion matrix. After the roots have been estimated, they are plugged back into (29), so that estimating the corresponding y reduces to finding the roots of a second-degree polynomial. Again, due to the symmetry of (27), similar expressions are obtained when fixing y or z.
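As an illustration of this rooting step, the companion matrix can be built directly from the coefficients γ_i; the vector below holds random placeholders for the actual resultant coefficients:

```python
import numpy as np

# gamma holds placeholder coefficients of the degree-8 resultant,
# gamma[i] multiplying z^i.
rng = np.random.default_rng(2)
gamma = rng.standard_normal(9)

monic = gamma / gamma[-1]       # normalize the leading coefficient
n = len(monic) - 1
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)      # shift z^k -> z^(k+1)
C[:, -1] = -monic[:-1]          # reduction of z^8 by the monic polynomial

z_roots = np.linalg.eigvals(C)  # the 8 candidate values of z
```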

Other ways to find the roots will be reported in a future report.

5 Computer results

To diagonalize a tensor T ∈ R^{N×N×N} with N > 2, a cyclic Jacobi diagonalization approach is taken. In each step, a 2 × 2 × 2 subtensor, denoted T̃, is approximately diagonalized; the 2 × 2 transform matrices that approximately diagonalize T̃ are denoted Û, V̂ and Ŵ, and they are estimated in an iterative manner as described earlier.

T̃ is extracted from T as follows:

T̃(:, :, 1) = [ T_p,p,p  T_p,q,p ; T_q,p,p  T_q,q,p ],   T̃(:, :, 2) = [ T_p,p,q  T_p,q,q ; T_q,p,q  T_q,q,q ]

Let K ∈ R^{N×N×N} be the transformed tensor after each step. Then

K = T •1 U •2 V •3 W

where

where

U = δi,nδn,j + δi,pδp,j(U1,1 − 1) + δi,qδq,j(U2,2 − 1) + δi,pδq,jU1,2 + δi,qδp,jU2,1

V = δi,nδn,j + δi,pδp,j(V1,1 − 1) + δi,qδq,j(V2,2 − 1) + δi,pδq,jV1,2 + δi,qδp,jV2,1

W = δi,nδn,j + δi,pδp,j(W1,1 − 1) + δi,qδq,j(W2,2 − 1) + δi,pδq,jW1,2 + δi,qδp,jW2,1

It should be noted that this algorithm is monotonic, since ‖K‖² = ‖T‖², K²_ppp + K²_qqq ≥ T²_ppp + T²_qqq, and K²_iii = T²_iii for i ≠ p, q.
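The first of these invariants, conservation of the Frobenius norm under orthogonal mode products, can be checked numerically; the mode_product helper below is an illustrative name, not taken from the report:

```python
import numpy as np

def mode_product(T, M, mode):
    # multiply tensor T along the given mode by the matrix M
    T = np.moveaxis(T, mode, 0)
    out = np.tensordot(M, T, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(3)
T = rng.standard_normal((5, 5, 5))
# three random orthogonal matrices obtained from QR factorizations
Q = [np.linalg.qr(rng.standard_normal((5, 5)))[0] for _ in range(3)]

K = T
for mode, M in enumerate(Q):
    K = mode_product(K, M, mode)

print(np.allclose((K**2).sum(), (T**2).sum()))   # True
```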

The outline of the cyclic Jacobi diagonalization algorithm is given in Algorithm 1, where the function [Û, V̂, Ŵ] = best(T̃) tries to diagonalize the subtensor based on either the ALS1 algorithm or the ALS2 algorithm.

To see whether the cyclic Jacobi diagonalization algorithm is capable of approximately diagonalizing an arbitrary tensor, some simulations will be conducted. A measure of how diagonal a tensor is, is the following:

γ = ∑_{n=1}^{N} T²_nnn / ∑_{i,j,k=1}^{N} T²_ijk
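This measure is one line of numpy (γ = 1 for an exactly diagonal tensor, and small for a tensor whose energy is spread off the diagonal):

```python
import numpy as np

def diagonality(T):
    # gamma: squared diagonal energy over the total squared Frobenius norm
    d = np.einsum('iii->i', T)
    return (d**2).sum() / (T**2).sum()

I3 = np.zeros((4, 4, 4))
for n in range(4):
    I3[n, n, n] = 1.0
print(diagonality(I3))   # 1.0
```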

In the first simulation, a tensor T ∈ R^{8×8×8} with elements randomly drawn from a uniform distribution is diagonalized: in the first case a tensor with T_ijk ∈ U(−100, 100), and in the second case a tensor with T_ijk ∈ U(0, 100). The results can be seen in figures 1(a) and 1(b) respectively.

In the second simulation, the objective is to diagonalize the tensor T = ∑_{r=1}^{8} u_r ⊗ v_r ⊗ w_r, where the entries of u_i, v_i, w_i are drawn from U(−100, 100) in the first case and from U(0, 100) in the second case. The results can be seen in figures 2(a) and 2(b) respectively.

Finally, a diagonalization of an orthogonally diagonalizable tensor has been carried out, where U, V and W consist of orthonormal bases of the subspaces spanned by the columns of random matrices in R^{8×8}, whose entries are drawn from U(−100, 100) in the first case and from U(0, 100) in the second case. The results can be seen in figures 3(a) and 3(b) respectively.

Algorithm 1 Cyclic Jacobi Diagonalization
  for p = 1 to N − 1 do
    for q = p + 1 to N do
      T̃(:, :, 1) = [ T_p,p,p  T_p,q,p ; T_q,p,p  T_q,q,p ],   T̃(:, :, 2) = [ T_p,p,q  T_p,q,q ; T_q,p,q  T_q,q,q ]
      [Û, V̂, Ŵ] = best(T̃)
      U_i,j = δ_i,j + δ_i,p δ_p,j (Û_1,1 − 1) + δ_i,q δ_q,j (Û_2,2 − 1) + δ_i,p δ_q,j Û_1,2 + δ_i,q δ_p,j Û_2,1
      V_i,j = δ_i,j + δ_i,p δ_p,j (V̂_1,1 − 1) + δ_i,q δ_q,j (V̂_2,2 − 1) + δ_i,p δ_q,j V̂_1,2 + δ_i,q δ_p,j V̂_2,1
      W_i,j = δ_i,j + δ_i,p δ_p,j (Ŵ_1,1 − 1) + δ_i,q δ_q,j (Ŵ_2,2 − 1) + δ_i,p δ_q,j Ŵ_1,2 + δ_i,q δ_p,j Ŵ_2,1
      T = T •1 U •2 V •3 W
      X = U X
      Y = V Y
      Z = W Z
    end for
  end for
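The sweep structure of Algorithm 1 can be sketched in a few lines; the inner 2 × 2 × 2 solver below is a coarse grid search over the three rotation angles, a deliberately crude stand-in for ALS1/ALS2 used only to keep the sketch self-contained:

```python
import numpy as np

def rot(t):
    # 2x2 plane rotation
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s], [s, c]])

def best(T2):
    # Placeholder inner solver: coarse grid search over the three angles,
    # keeping the rotation triple that maximizes the squared diagonal of
    # the 2x2x2 block. The grid includes the identity (angle 0), so a
    # step can never decrease the criterion.
    angles = np.linspace(0.0, np.pi, 16, endpoint=False)
    score, arg = -1.0, None
    for a in angles:
        for b in angles:
            for c in angles:
                K = np.einsum('ip,jq,kr,pqr->ijk', rot(a), rot(b), rot(c), T2)
                s = K[0, 0, 0]**2 + K[1, 1, 1]**2
                if s > score:
                    score, arg = s, (rot(a), rot(b), rot(c))
    return arg

def sweep(T):
    # one cyclic sweep over all index pairs (p, q)
    N = T.shape[0]
    for p in range(N - 1):
        for q in range(p + 1, N):
            idx = [p, q]
            U2, V2, W2 = best(T[np.ix_(idx, idx, idx)])
            U = np.eye(N)
            V = np.eye(N)
            W = np.eye(N)
            U[np.ix_(idx, idx)] = U2   # embed the 2x2 blocks
            V[np.ix_(idx, idx)] = V2
            W[np.ix_(idx, idx)] = W2
            T = np.einsum('ip,jq,kr,pqr->ijk', U, V, W, T)
    return T

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4, 4))
K = sweep(T)
g_before = (np.einsum('iii->i', T)**2).sum() / (T**2).sum()
g_after = (np.einsum('iii->i', K)**2).sum() / (K**2).sum()
print(g_before, g_after)   # the diagonality measure should not decrease
```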

Figure 1: Diagonalization of a random tensor. (a) T_ijk ∈ U(−100, 100); (b) T_ijk ∈ U(0, 100).


Figure 2: Diagonalization of the random tensor of size 8 × 8 × 8, T = ∑_{r=1}^{8} u_r ⊗ v_r ⊗ w_r. (a) u_i, v_i, w_i ∈ U(−100, 100); (b) u_i, v_i, w_i ∈ U(0, 100).

Figure 3: Diagonalization of an orthogonally diagonalizable tensor. (a) Real-valued orthogonally diagonalizable tensor; (b) nonnegative-valued orthogonally diagonalizable tensor.

Another, equivalent, diagonality measure is the following:

off(T) = ( ∑_{i,j,k=1}^{N} T²_ijk − ∑_{n=1}^{N} T²_nnn ) / ∑_{i,j,k=1}^{N} T²_ijk .

Furthermore, the performance of the algorithms will be measured through their complexity, evaluated as the number of multiplications used by the given algorithm.

To compare the different 2 × 2 × 2 tensor diagonalization algorithms, the performance will be measured on tensors T = D + E ∈ R^{5×5×5}, where D is an orthogonally diagonalizable tensor and E ∈ R^{5×5×5} is a random tensor with E_ijk ∈ ρ U(−100, 100), the real positive parameter ρ controlling the noise level. D is generated as D = ∑_{r=1}^{5} u_r ⊗ v_r ⊗ w_r, where U, V and W consist of the orthonormal bases of the subspaces spanned by the columns of random matrices in R^{5×5}, whose entries are randomly drawn from U(−100, 100).

In the first simulation, the algorithms perform 5 sweeps. The performance of the algorithms as a function of ‖E‖²/‖D‖² can be seen in figure 4, where the plotted curves are the average values over 5 runs.

ALS1 refers to the sum-of-squares tensor diagonalization algorithm where only one plane rotation is estimated at a time, whereas ALS2res and ALS2heu refer to the sum-of-squares ALS2 tensor diagonalization algorithms where the resultant and the heuristic solution have been used respectively; in both cases two plane rotations are estimated at a time. Similarly, TraceRes and TraceHeu refer to the absolute-trace tensor diagonalization algorithms where the resultant and the heuristic solution have been used respectively.

Figure 4: The average diagonalization of random tensors T = D + E as a function of ‖E‖²/‖D‖².

The computational complexity of the various algorithms will be studied in a forthcoming report, as well as the stationary values of the Jacobi CoM algorithm in the unsymmetric case (the symmetric case has already been reported in [3]).

References

[1] J. D. CARROLL, J. J. CHANG, "Analysis of individual differences in multidimensional scaling via n-way generalization of Eckart-Young decomposition", Psychometrika, vol. 35, no. 3, pp. 283–319, Sept. 1970.

[2] P. COMON, "Independent Component Analysis, a new concept?", Signal Processing, Elsevier, vol. 36, no. 3, pp. 287–314, Apr. 1994, special issue on Higher-Order Statistics.

[3] P. COMON, "Tensor diagonalization, a useful tool in signal processing", in IFAC-SYSID, 10th IFAC Symposium on System Identification, M. Blanke, T. Soderstrom, Eds., Copenhagen, Denmark, July 4-6 1994, vol. 1, pp. 77–82, invited session.

[4] P. COMON, "From source separation to blind equalization, contrast-based approaches", in Int. Conf. on Image and Signal Processing (ICISP'01), Agadir, Morocco, May 3-5, 2001, invited plenary.

[5] P. COMON, "Tensor decompositions", in Mathematics in Signal Processing V, J. G. McWhirter, I. K. Proudler, Eds., pp. 1–24, Clarendon Press, Oxford, UK, 2002.

[6] P. COMON, G. GOLUB, L.-H. LIM, B. MOURRAIN, "Symmetric tensors and symmetric tensor rank", SIAM J. Matrix Anal. Appl., 2007, to appear.

[7] P. COMON, B. MOURRAIN, "Decomposition of quantics in sums of powers of linear forms", Signal Processing, Elsevier, vol. 53, no. 2, pp. 93–107, Sept. 1996, special issue on High-Order Statistics.

[8] P. COMON, B. MOURRAIN, L.-H. LIM, G. GOLUB, "Genericity and rank deficiency of high order symmetric tensors", in ICASSP'06, Toulouse, May 14-19 2006.

[9] P. COMON, J. ten BERGE, "Generic and typical ranks of three-way arrays", Research Report ISRN I3S/RR-2006-29-FR, I3S, Sophia-Antipolis, France, Sept. 4 2006, submitted for publication.

[10] D. COX, J. LITTLE, D. O'SHEA, Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, Undergraduate Texts in Mathematics, Springer Verlag, New York, 1992; 2nd ed. 1996.

[11] L. de LATHAUWER, B. de MOOR, J. VANDEWALLE, "A multilinear singular value decomposition", SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1253–1278, 2000.

[12] L. de LATHAUWER, B. de MOOR, J. VANDEWALLE, "Independent Component Analysis and (simultaneous) third-order tensor diagonalization", IEEE Trans. Sig. Proc., pp. 2262–2271, Oct. 2001.

[13] R. A. HARSHMAN, "Foundations of the Parafac procedure: Models and conditions for an explanatory multimodal factor analysis", UCLA Working Papers in Phonetics, vol. 16, pp. 1–84, 1970, http://publish.uwo.ca/~harshman.

[14] J. L. LACOUME, P. O. AMBLARD, P. COMON, Statistiques d'ordre supérieur pour le traitement du signal, Collection Sciences de l'Ingénieur, Masson, 1997, freely downloadable from http://www.i3s.unice.fr/~comon/livreSOS.html.

[15] P. McCULLAGH, Tensor Methods in Statistics, Monographs on Statistics and Applied Probability, Chapman and Hall, 1987.

[16] E. MOREAU, O. MACCHI, "High order contrasts for self-adaptive source separation", Int. J. of Adaptive Control and Signal Processing, vol. 10, no. 1, pp. 19–46, Jan. 1996.

[17] B. MOURRAIN, P. TREBUCHET, "Solving projective complete intersection faster", in Proc. Intern. Symp. on Symbolic and Algebraic Computation, C. Traverso, Ed., 2000, pp. 231–238, New York, ACM Press.

[18] C. L. NIKIAS, A. P. PETROPULU, Higher-Order Spectra Analysis, Signal Processing Series, Prentice-Hall, Englewood Cliffs, 1993.

[19] N. D. SIDIROPOULOS, R. BRO, G. B. GIANNAKIS, "Parallel factor analysis in sensor array processing", IEEE Trans. Sig. Proc., vol. 48, no. 8, pp. 2377–2388, Aug. 2000.

[20] A. SMILDE, R. BRO, P. GELADI, Multi-Way Analysis, Wiley, 2004.
