By Zhenyu LIAO, at CentraleSupelec.

Based on the lectures on RMT of Prof. Romain Couillet.

@19/Feb/2016

### Part 1: Motivation

Suppose we have $x_1,x_2,\cdots,x_n\in \mathbb{R}^p$, with $x_i$ i.i.d. and $x_i \sim \mathcal{N}(0,C_p)$.

From the law of large numbers (LLN), the sample covariance matrix (SCM) satisfies:

$$\hat{C}_p = \frac{1}{n}\sum_{i=1}^{n}{x_ix_i^*}\rightarrow C_p$$

when $n\rightarrow \infty$, with almost sure (a.s.) convergence.

But in fact, strictly speaking, the condition is not really $n\rightarrow\infty$, but should be $n \gg p$: $n$ must be much larger than $p$! (In fact, we will see later that this "much larger" means at least 100 times!)

We can write this SCM in a different way: with $X = [x_1,x_2,\cdots,x_n]$, we have

$$\hat{C}_p = \frac{1}{n}XX^*$$

Now something interesting: if $C_p = I_p$ (identity), even with $p>n$, we still have the entry-wise convergence

$$[\hat{C}_p]_{i,j}\rightarrow \delta_{ij}$$

i.e. $1$ if $i=j$ and $0$ otherwise.

But we can also see that, while the matrix $\hat{C}_p$ is of size $p\times p$, it is built from only $n$ independent vectors $x_i$, so:

$$\mathrm{rank}(\hat{C}_p)\le n<p$$

which indicates that $\hat{C}_p$ has at least $p-n$ zero eigenvalues.

This means that if we evaluate $\|\hat{C}_p-C_p\|$ (in spectral norm), the result may not go to zero! (In fact this is due to the slow convergence rate of $\frac{1}{\sqrt{n}}$.)

Conclusion: our classical statistics break down when the condition $n \gg p$ is not satisfied. In other words, when $p$ grows, we cannot use the traditional strategy to approximate matrices such as the covariance matrix.
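A quick numerical illustration of this breakdown (a minimal sketch in Python; the case $C_p = I_p$ and the sizes chosen are arbitrary, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def scm_error(p, n):
    """Spectral-norm error of the SCM when C_p = I_p."""
    X = rng.standard_normal((p, n))      # columns are x_i ~ N(0, I_p)
    C_hat = X @ X.T / n
    return np.linalg.norm(C_hat - np.eye(p), 2)

print(scm_error(p=10, n=10000))    # n >> p : small error (classical regime)
print(scm_error(p=1000, n=2000))   # n ~ p  : the error stays O(1), it does not vanish
```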

But with RMT, we know what happens in the regime where $p,n\rightarrow \infty$ with $\frac pn \rightarrow c\in (0,+\infty)$: this is the Marchenko-Pastur law, presented next.

### Part 2: The Marchenko-Pastur law and the Stieltjes Transform

#### The MP law

**DEF**: The empirical spectral distribution (ESD) $\mu_A$ of a Hermitian matrix $A\in \mathbb{R}^{p\times p}$ is defined as the following measure:

$$\mu_A = \frac 1p \sum_{i=1}^{p}\delta_{\lambda_i(A)}$$

Notice that for $a\le b \in \mathbb{R}$, we have:

$$\mu_A([a,b])=\int_{a}^{b}\frac 1p \sum_{i=1}^{p}\delta_{\lambda_i(A)}(x)\,dx = \frac 1p \#\{\lambda_i(A)\in [a,b]\}$$

(so it is indeed a kind of probability distribution, isn't it?)

**THEO** (the Marchenko-Pastur distribution for large-dimensional covariance matrices): let $X = (X_{i,j})_{1\le i\le p,1\le j \le n}$ have i.i.d. entries with zero mean and unit variance. Then:

$$\mu_{\frac 1n XX^*}\rightarrow \mu_c$$

as $p,n\rightarrow \infty$ with $\frac pn \rightarrow c\in (0,+\infty)$, where:

$$\mu_c(\{0\}) = \left(1-\frac 1c\right)^+,\qquad \mu_c(dx) = f(x)\,dx$$

with

$$f(x) = \frac{1}{2\pi cx}\sqrt{\left[x-(1-\sqrt{c})^2\right]^+\cdot \left[(1+\sqrt{c})^2-x\right]^+}$$

for all $x \neq 0$.

Clearly this density is supported on $[(1-\sqrt{c})^2,(1+\sqrt{c})^2]$, a support that shrinks to the single point $1$ when $c\rightarrow 0$, i.e. when $n \gg p$.
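This is easy to observe numerically (a minimal sketch; the dimensions and the number of histogram bins are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
p, n = 1000, 2000
c = p / n   # c = 0.5 < 1 here, so no atom at 0

X = rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(X @ X.T / n)

a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
x = np.linspace(a, b, 500)
f = np.sqrt(np.maximum(x - a, 0) * np.maximum(b - x, 0)) / (2 * np.pi * c * x)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="eigenvalues of $XX^*/n$")
plt.plot(x, f, "r", label="Marchenko-Pastur density")
plt.legend()
plt.show()
```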

##### Historical recall

The first RMT result dates from the 1950s (Wigner's semicircle law): let $X \in \mathbb{R}^{p\times p}$ be Hermitian with i.i.d. entries of zero mean and unit variance. Then:

$$\mu_{\frac{1}{\sqrt{p}}X}\rightarrow \mu_{s.c.} \quad\text{where}\quad \mu_{s.c.}(dx) = \frac{1}{2\pi}\sqrt{(4-x^2)^+}\,dx$$

**Method**: by calculating its moments (more details later).

**Remark**: this convergence is a little subtle: it is a convergence in law for the measure $\mu$, holding almost surely (a.s.) with respect to the random matrix $X$.

#### The Stieltjes Transform

**DEF**: let $\mu$ be a finite measure with $\mathrm{Supp}(\mu)\subset \mathbb{R}$. Then the **Stieltjes transform** $m_\mu$ of $\mu$ is defined as:

$$m_\mu(z) = \int \frac{\mu(dt)}{t-z} \quad \text{for all }z \notin \mathrm{Supp}(\mu)$$

**Important remark**: if $\mu= \mu_{A}$ (see its definition in the previous section), then, just by exchanging integral and sum,

$$m_{\mu_A}(z) = \int \frac{1}{t-z}\,\frac 1p \sum_{i=1}^{p}\delta_{\lambda_i(A)}(t)\,dt = \frac 1p \sum_{i=1}^{p}\frac{1}{\lambda_i(A) - z} = \frac 1p \operatorname{tr}(\Lambda - zI_p)^{-1}$$

with $\Lambda = \operatorname{diag}(\lambda_1(A),\ldots,\lambda_p(A))$.

Moreover, for any unitary $U$ ($UU^* = U^*U = I_p$), using ==a trick here==, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, we have:

$$\frac 1p \operatorname{tr}[(\Lambda - zI_p)^{-1}UU^*] = \frac 1p \operatorname{tr}[U^*(\Lambda - zI_p)^{-1}U] = \frac 1p \operatorname{tr}(U^*\Lambda U - zI_p)^{-1}$$

and in particular, if we take $U$ to be the eigenvector matrix of $A$, we recover $\frac 1p \operatorname{tr}(A - zI_p)^{-1}$. We call this important matrix $(A - zI_p)^{-1}$ the **resolvent** of $A$.

**Idea**: reduce the computation of eigenvalues to that of a matrix inverse, for which many more results are known.

**PROP**: let $m_\mu(z)$ be the ST of a real-supported finite measure $\mu$. Then for all $a,b$ continuity points of $\mu$, we have:

$$\mu([a,b]) = \lim_{y\rightarrow 0}\frac 1\pi \int_{a}^{b}\Im[m_\mu(x+iy)]\,dx$$

**Proof**: by the little trick $\frac{d}{dx}{\arctan(x)} = \frac{1}{x^2+1}$, a change of variable, and then taking $\lim_{y\to 0}$.

**Attention**! Use with care: $m_\mu(z)$ is not defined on $\mathrm{Supp}(\mu)$, but this limit can exist anyway!

**Remark**: where $\mu$ has a density $f$, it is retrieved as

$$f(x) = \lim_{y\rightarrow 0}\frac 1\pi \Im[m_\mu(x+iy)]$$

and in the "discrete case" (an atom at $x$), a direct result is:

$$\mu(\{x\}) = \lim_{y\rightarrow 0}-iy\cdot m_\mu(x+iy)$$

**Proof**: use the definition directly with the DCT (dominated convergence theorem) and the fact that $\lim_{y\to 0}\frac{-iy}{(t-x)-iy}=\mathbf{1}_{\{t=x\}}$.
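A small numerical illustration of these inversion formulas on a toy ESD (a minimal sketch; the eigenvalues below are arbitrary choices):

```python
import numpy as np

lam = np.array([0.0, 0.0, 1.0, 2.0])   # eigenvalues of some Hermitian A (p = 4)

def m(z):
    """Stieltjes transform of the ESD of A."""
    return np.mean(1.0 / (lam - z))

# the ESD has an atom at x = 0 of mass 2/4 = 0.5
for y in (1e-2, 1e-4, 1e-6):
    print(y, np.real(-1j * y * m(0.0 + 1j * y)))   # -> 0.5 as y -> 0
```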

#### Proof of the MP theorem

Take $X = (X_{i,j})_{1\le i\le p,1\le j \le n}$ with i.i.d. entries of zero mean and unit variance. Then:

$$\mu_{\frac 1n XX^*} = \frac 1p \sum_{i=1}^{p}{\delta_{\lambda_i(\frac 1n XX^*)}}$$

and

$$m_{\mu_{\frac 1n XX^*}}(z) = \frac 1p \operatorname{tr}\left(\frac 1n XX^*-zI_p\right)^{-1},\quad \text{with}\ z \in \mathbb{C}^+ = \{z\,|\,\Im(z)>0\}$$

Let us start with the first entry of the resolvent, $[(\frac 1n XX^*-zI_p)^{-1}]_{11}$. Write $X = \begin{bmatrix} x^* \\ \tilde{X}\end{bmatrix}$ with $x^* \in \mathbb{R}^{1\times n}$ the first row and $\tilde{X}\in \mathbb{R}^{(p-1)\times n}$, so that we can decompose our matrix in the following form:

$$\frac 1n XX^*-zI_p = \begin{bmatrix}\frac 1n x^*x-z & \frac 1n x^*\tilde{X}^* \\ \frac 1n \tilde{X}x & \frac 1n \tilde{X}\tilde{X}^*-zI_{p-1}\end{bmatrix}$$

With the known result on matrix inversion by blocks:

$$\begin{bmatrix}A & B\\ C& D\end{bmatrix}^{-1} = \begin{bmatrix}[A-BD^{-1}C]^{-1} & \cdots \\\cdots & \cdots \end{bmatrix}$$

the first entry reads:

$$\left[\left(\tfrac 1n XX^*-zI_p\right)^{-1}\right]_{11} = \frac{1}{-z-\frac zn x^*(\frac 1n\tilde{X}^*\tilde{X}-zI_n)^{-1}x}$$

where ==a useful trick==, $A(I_m+BA)^{-1} = (I_n+AB)^{-1}A$, was used to simplify the huge inverse term.

==An interesting trick!== When the matrix $A$ we work with has bounded spectral norm ($\max_i|\lambda_{i}(A)|<K$, with $K$ independent of $n$), and $x$ is as always the random vector, we can estimate $\frac 1n x^*Ax$ by $\frac 1n \operatorname{tr}(A)$, with a.s. convergence. This comes from the following computation:

$$E[x^*Ax] = E[\operatorname{tr}(x^*Ax)] = E[\operatorname{tr}(xx^*A)] = \operatorname{tr}[E(xx^*)A] = \operatorname{tr}(A)$$

(known as the ==trace lemma==).

(known as ==trace lemma==)

behind this formula: with a

rank 1

perturbation, and after the mean by

\frac 1n

, the behaviour of matrix(e。x。 trace) will not change。

With the ==lemma of rank 1 perturbation==, we can say that:

\frac 1n tr(X^*X)^{-1} \simeq \frac 1n tr(\tilde{X}^*\tilde{X})^{-1}
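Both lemmas are easy to observe numerically (a minimal sketch; the point $z=-1$ is an arbitrary choice lying safely outside the positive spectrum, and the normalization is by the dimension of the resolvent):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, z = 1000, 2000, -1.0   # z = -1 is outside the spectrum of XX^*/n >= 0

X = rng.standard_normal((p, n))
x = rng.standard_normal(p)
Q = np.linalg.inv(X @ X.T / n - z * np.eye(p))   # resolvent, bounded spectral norm

# trace lemma: (1/p) x^* Q x ~ (1/p) tr Q
print(x @ Q @ x / p, np.trace(Q) / p)

# rank-1 perturbation: dropping one column of X (a rank-1 change of XX^*/n)
# barely moves the normalized trace of the resolvent
X1 = X[:, 1:]
Q1 = np.linalg.inv(X1 @ X1.T / n - z * np.eye(p))
print(np.trace(Q) / p, np.trace(Q1) / p)
```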

We have just treated the first entry, but the same works for all the others, so:

$$m_{\mu_{\frac 1n XX^*}}(z) = \frac 1p \operatorname{tr}\left(\frac 1n XX^*-zI_p\right)^{-1}\simeq \frac{1}{-z-z\frac 1n \operatorname{tr}(\frac 1n X^*X-zI_n)^{-1}}$$

Then, with the elementary fact that $\frac 1n XX^*$ and $\frac 1n X^*X$ share the same non-zero eigenvalues, we can ==connect to the eigenvalues matrix== (written here in the regime $p > n$):

$$\operatorname{tr}\left(\frac 1n XX^*-zI_p\right)^{-1} = \operatorname{tr}(\Lambda-zI_p)^{-1} = \sum_{i=1}^{p}\frac{1}{\lambda_i(\frac 1n XX^*)-z} =\sum_{i=1}^{n}\frac{1}{\lambda_i(\frac 1n X^*X)-z} + (p-n)\frac{1}{0-z}$$

In the limit we thus obtain:

$$m_\mu(z) = \frac{1}{-z-zc\,m_\mu(z)-c +1}$$

which means $m_\mu$ is a ==solution of equation==

$$-zc\,m^2(z)+(1-c-z)m(z)-1=0$$

Obviously there are two possible roots, and we should choose the one of finite norm, i.e. the one that is a valid Stieltjes transform ($\Im\, m(z)>0$ for $z\in\mathbb{C}^+$). Finally, with $\lim_{y \to 0}$ and the inversion formula above, we obtain the MP distribution!

QED

@26/Feb/2016

### Part 3: The sample covariance matrix

**THEO**: let $X\in \mathbb{R}^{p\times n}$ have i.i.d. $(0,1)$ entries, and let $R\in \mathbb{R}^{p\times p}$ be deterministic, non-negative definite Hermitian, such that:

$$\nu_p \overset{\Delta}{=} \frac 1p \sum_{i=1}^{p}{\delta_{\lambda_i(R)}}\rightarrow \nu$$

Then as $p,n\to \infty$ with $\frac pn\to c\in (0,\infty)$, we have:

$$\mu_p \overset{\Delta}{=} \frac 1n \sum_{i=1}^{n}{\delta_{\lambda_i(\frac 1n X^*RX)}}\to \mu$$

where $\mu$ is a probability measure whose ST $m_\mu(z)$ is the unique solution in $\mathbb{C}^+$, for $z \in \mathbb{C}^+ \overset{\Delta}{=} \{z\in \mathbb{C},\Im(z)>0\}$, of:

$$m_\mu(z) = \left(-z+c\int\frac{t\cdot\nu(dt)}{1+t\cdot m_\mu(z)}\right)^{-1}$$

(an implicit relation: in general there may be NO analytical solution).

**E.X.**: take

$$R = \begin{bmatrix}I_{\frac p3} & 0 & 0 \\0 & 3I_{\frac p3} & 0 \\0 & 0 & 7I_{\frac p3}\end{bmatrix}$$

so that $\nu = \frac 13(\delta_1+\delta_3+\delta_7)$, which means 3 peaks at eigenvalues 1, 3 and 7.

The problem is how to plot $\mu$, the asymptotic version of $\mu_p$. The idea: compute $\frac 1\pi\Im[m_\mu(x+i\epsilon)]$ with $\epsilon$ rather small, say $10^{-6}$, by taking $m_\mu(x+i\epsilon) = \lim_{k\to \infty}m^{(k)}_\mu(x+i\epsilon)$, with $(k)$ the iteration index. We are obviously solving the implicit equation iteratively, or, in a more formal way:

$$m^{(k)}_\mu(z) = \left(-z+c\int\frac{t\cdot\nu(dt)}{1+t\cdot m^{(k-1)}_\mu(z)}\right)^{-1}$$

initialized with $m^{(0)}_\mu(z) = 0$.

The approximated density of $\mu$ at the point $x$ is then given by $\frac 1\pi\Im[m_\mu(x+i\epsilon)]$, for every $x \in \mathbb{R}$.
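A minimal sketch of this recipe in Python, for the 3-peak example above ($c$, $\epsilon$, the grid and the damping are arbitrary numerical choices; a little damping helps the fixed point converge):

```python
import numpy as np

c = 0.1
atoms = np.array([1.0, 3.0, 7.0])   # support of nu
wts = np.ones(3) / 3                # nu = (1/3)(delta_1 + delta_3 + delta_7)
eps = 1e-6

def m_mu(x, n_iter=2000, damp=0.5):
    """Damped fixed-point iteration for m_mu(x + i*eps), initialized at 0."""
    z = x + 1j * eps
    m = 0.0j
    for _ in range(n_iter):
        m_new = 1.0 / (-z + c * np.sum(wts * atoms / (1.0 + atoms * m)))
        m = damp * m + (1 - damp) * m_new
    return m

xs = np.linspace(0.05, 12.0, 600)
density = [m_mu(x).imag / np.pi for x in xs]   # approximate density of mu on the grid
```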

**Remark** (relation to the SCM, sample covariance matrix): let $y_1,\cdots,y_n$ be independent vectors in $\mathbb{R}^p$ such that $y_i = R^{\frac 12}x_i$, with $x_i\in \mathbb{R}^p$ having i.i.d. $(0,1)$ entries.

Then $E[y_i] = 0$ and $E[y_iy^*_i] = R$, and the SCM estimator of $R$ is given by:

$$\hat{R} \overset{\Delta}{=} \frac 1n \sum_{i=1}^{n}{y_iy^*_i} = \frac 1n R^{\frac 12}{XX^*}R^{\frac 12}$$

with $X = [x_1,x_2,\cdots,x_n]$.

(The idea behind is to use the eigenvalues of $\hat{R}$ to estimate/say something about $R$.) Since the $p\times p$ matrix $\frac 1n R^{\frac 12}XX^*R^{\frac 12}$ and the $n\times n$ matrix $\frac 1n X^*RX$ share the same non-zero eigenvalues, we have:

$$\tilde{\mu}_p \overset{\Delta}{=} \frac 1p \sum_{i=1}^{p}{\delta_{\lambda_i(\frac 1n R^{\frac 12}XX^*R^{\frac 12})}} = \frac np \mu_p + \frac{p-n}{p}\delta_0$$

and in terms of Stieltjes transforms:

$$m_{\tilde{\mu}_p}(z) = \frac np m_{\mu_p}(z) + \frac{n-p}{p}\frac 1z$$

With our notation $c = \frac pn$, we can rewrite this, in the limit, in the following form:

$$\tilde{\mu}_p \to \tilde\mu = \frac 1c \mu + \left(1-\frac 1c\right)\delta_0$$

$$m_{\tilde{\mu}_p}(z) \to m_{\tilde\mu}(z) = \frac 1c m_{\mu}(z) + \left(\frac 1c -1\right)\frac 1z$$

**Application** in wireless communication: in a MIMO system with $n$ transmit and $p$ receive antennas, when trying to calculate the ergodic capacity.

### Part 4: More elaborate models and ==deterministic equivalents==

**THEO**: let $R\in \mathbb{R}^{p\times p}$ and $T\in \mathbb{R}^{n\times n}$ be non-negative definite, and $X\in \mathbb{R}^{p\times n}$ with i.i.d. $(0,1)$ entries, such that:

$$\begin{cases}\limsup_{p}\|R\|<\infty \\ \limsup_{n}\|T\| < \infty\end{cases}$$

(with $\|\cdot\|$ the spectral norm here).

Then as $n,p\to \infty$ with $0<\liminf\frac pn \le \limsup \frac pn <\infty$,

$$\mu_p\overset{\Delta}{=}\frac 1p \sum_{i = 1}^{p}\delta_{\lambda_i (\frac 1n R^{\frac 12}XTX^*R^{\frac 12})}$$

is such that

$$\mu_p - \bar{\mu}_p \overset{a.s.}{\rightarrow}0$$

where $\bar{\mu}_p$ is a (deterministic) probability measure, with ST

$$m_{\bar{\mu}_p}(z) = \frac 1p \operatorname{tr}\left(-z[\tilde{e}_p(z)R+I_p]\right)^{-1}$$

with $(\tilde{e}_p(z),{e}_p(z))$ the unique solution pair in $\mathbb{C}^+$ for $z\in\mathbb{C}^+$ of:

$$\begin{cases}\tilde{e}_p(z) = \frac 1n \operatorname{tr} T\left(-z[{e}_p(z)T+I_n]\right)^{-1}\\{e}_p(z) = \frac 1n \operatorname{tr} R\left(-z[\tilde{e}_p(z)R+I_p]\right)^{-1}\end{cases}$$

which are also called the ==fundamental equations==.
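A minimal sketch of solving these fundamental equations by (damped) fixed-point iteration, checking the deterministic equivalent against one random realization; the diagonal $R$ and $T$, the point $z$ and all sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 300, 600
r = rng.uniform(0.5, 2.0, p)   # eigenvalues of a diagonal R
t = rng.uniform(0.5, 2.0, n)   # eigenvalues of a diagonal T

def fundamental(z, n_iter=5000, damp=0.5):
    """Fixed point of the fundamental equations at z (diagonal R, T)."""
    e, et = 0.0j, 0.0j   # e_p(z) and tilde{e}_p(z)
    for _ in range(n_iter):
        et_new = np.sum(t / (-z * (e * t + 1.0))) / n
        e_new = np.sum(r / (-z * (et * r + 1.0))) / n
        e = damp * e + (1 - damp) * e_new
        et = damp * et + (1 - damp) * et_new
    return e, et

z = 2.0 + 0.1j
e, et = fundamental(z)
m_bar = np.sum(1.0 / (-z * (et * r + 1.0))) / p   # m_{bar mu_p}(z)

# empirical m_{mu_p}(z) for one realization of X
X = rng.standard_normal((p, n))
A = np.sqrt(r)[:, None] * X                       # R^{1/2} X
B = (A * t) @ A.T / n                             # (1/n) R^{1/2} X T X^* R^{1/2}
m_emp = np.trace(np.linalg.inv(B - z * np.eye(p))) / p

print(m_bar, m_emp)   # the two values are close
```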

**Remark**:

- the value of $m_{\mu_p}(z)$ itself never converges, but its difference with the deterministic function $m_{\bar{\mu}_p}(z)$ tends to 0;
- this is the same kind of theorem as before, but more general, and with equations whose solution is not explicit.

**Proof**: the Bai-Silverstein approach: the idea of guessing the solution iteratively.

A constructive, iterative way of finding the solution: with

$$m_{\mu_p}(z) = \frac 1p \operatorname{tr}\left(-zI_p +\frac 1n R^{\frac 12}XTX^*R^{\frac 12}\right)^{-1}$$

we start from (writing $B = \frac 1n R^{\frac 12}XTX^*R^{\frac 12}$, and $D$ a deterministic matrix to be designed):

$$\frac 1p \operatorname{tr}(-zI_p +B)^{-1} - \frac 1p \operatorname{tr} D^{-1} =\frac 1p \operatorname{tr}\left[(-zI_p+B)^{-1}(D+zI_p-B)D^{-1}\right]$$

**Remark** (==resolvent identity==):

$$A^{-1}-B^{-1} = -A^{-1}(A-B)B^{-1}$$

**Proof**: multiply by $A$ from the left and by $B$ from the right.

So, with a good design of the matrix $D$, we can solve this problem iteratively. First guess for $D$: $D = -zI_p+ R^{\frac 12}FR^{\frac 12}$, so that the difference above becomes:

$$\frac 1p \operatorname{tr}(-zI_p+B)^{-1}R^{\frac 12}\left(F-\frac 1n XTX^*\right)R^{\frac 12}D^{-1}$$

Since $X$ has random i.i.d. $(0,1)$ entries, $\frac 1n XTX^*$ tends to average out to something *uniform*. So let us try $F = f(z)I_p$; we get:

$$\frac 1p \operatorname{tr}(-zI_p+B)^{-1}R^{\frac 12}\left(f(z)I_p-\frac 1n XTX^*\right)R^{\frac 12}D^{-1}\\= f(z)\frac 1p \operatorname{tr}\left[(-zI_p+B)^{-1}RD^{-1}\right]-\frac 1p \operatorname{tr}\left[(-zI_p+B)^{-1}R^{\frac 12}\tfrac 1n XTX^* R^{\frac 12}D^{-1}\right]$$

Let us assume $T$ *diagonal*; this condition simplifies our proof but is not necessary with more advanced techniques. Then we can decompose $\frac 1n XTX^* = \frac 1n\sum_{i=1}^{n}t_i x_ix_i^*$, so that:

$$\frac 1p \operatorname{tr}(-zI_p +B)^{-1} - \frac 1p \operatorname{tr} D^{-1}\\ = f(z)\frac 1p \operatorname{tr}[(-zI_p+B)^{-1}RD^{-1}]-\frac 1n \sum_{i=1}^{n}t_i \frac 1p \operatorname{tr}[(-zI_p+B)^{-1}R^{\frac 12}x_ix_i^* R^{\frac 12}D^{-1}]\\ =f(z)\frac 1p \operatorname{tr}[(-zI_p+B)^{-1}RD^{-1}]-\frac 1n \sum_{i=1}^{n}t_i \frac 1p x_i^*R^{\frac 12}D^{-1}(-zI_p+B)^{-1}R^{\frac 12}x_i$$

(using $\operatorname{tr}(AB)=\operatorname{tr}(BA)$ and the fact that the trace of a scalar is the scalar itself).

Here

$$B = \frac 1n R^{\frac 12}XTX^*R^{\frac 12}=\frac 1n \sum_{j=1}^{n}t_jR^{\frac 12}x_jx_j^*R^{\frac 12}$$

depends on $x_i$, but only through one term; all the other terms carry different indices! So we can of course separate this sum into the two parts $j=i$ and $j\neq i$.

**Lemma** (==useful trick==):

$$(A+tvv^*)^{-1}v = \frac{A^{-1}v}{1+tv^*A^{-1}v}$$

**Proof**: multiply from the left by the matrix $(A+tvv^*)$.

So, applying it with

$$\begin{cases}A = -zI_p + \frac 1n\sum_{j\neq i}t_jR^{\frac 12}x_jx_j^*R^{\frac 12}\\ t = \frac 1n t_{i}\\ v = R^{\frac 12}x_{i}\end{cases}$$

we get:

$$= f(z)\frac 1p \operatorname{tr}[(-zI_p+B)^{-1}RD^{-1}]-\frac 1n \sum_{i=1}^{n}t_i \frac 1p \frac{x_{i}^*R^{\frac 12}D^{-1}\left(-zI_p + \frac 1n\sum_{j\neq i}t_jR^{\frac 12}x_jx_j^*R^{\frac 12}\right)^{-1}R^{\frac 12}x_{i}}{1+\frac 1n t_{i}\,x_{i}^*R^{\frac 12}A^{-1}R^{\frac 12}x_{i}}$$

With this $x_i^*Mx_i$ form in both the numerator and the denominator, we are led to use the ==trace lemma== to get rid of the random part ($M$ having bounded spectral norm). So we will have:

$$\approx f(z)\frac 1p \operatorname{tr}[(-zI_p+B)^{-1}RD^{-1}]-\frac 1n \sum_{i=1}^{n}t_i \frac 1p \frac{\operatorname{tr}\left[\left(-zI_p + \frac 1n\sum_{j\neq i}t_jR^{\frac 12}x_jx_j^*R^{\frac 12}\right)^{-1}RD^{-1}\right]}{1+\frac 1n t_{i}\operatorname{tr}\left[R\left(-zI_p + \frac 1n\sum_{j\neq i}t_jR^{\frac 12}x_jx_j^*R^{\frac 12}\right)^{-1}\right]}$$

Once more, this form calls for the ==rank 1 perturbation lemma==,

$$\left|\frac 1n \operatorname{tr}[A(B+vv^*-zI)^{-1}] - \frac 1n \operatorname{tr}[A(B-zI)^{-1}]\right|\le \frac{\|A\|}{n\,\Im[z]}$$

so we have:

$$\approx f(z)\frac 1p \operatorname{tr}[(-zI_p+B)^{-1}RD^{-1}]-\frac 1n \sum_{i=1}^{n}t_i \frac 1p \frac{\operatorname{tr}[(-zI_p + B)^{-1}RD^{-1}]}{1+\frac 1n t_{i}\operatorname{tr}[R(-zI_p + B)^{-1}]}$$

We are happy now, because the same factor $\frac 1p \operatorname{tr}[(-zI_p+B)^{-1}RD^{-1}]$ appears on the left and on the right. So finally, for the difference to vanish, we can take

$$f(z)\approx \frac 1n \sum_{i=1}^{n}t_i \frac{1}{1+\frac 1n t_{i}\operatorname{tr}[R(-zI_p + B)^{-1}]}$$

and, setting $\tilde{f}(z) = \frac 1n \operatorname{tr}[R(-zI_p + B)^{-1}]$, we get:

$$f(z) = \frac 1n \operatorname{tr}[T(I_n+T\tilde{f}(z))^{-1}]$$

**Conclusion**:

$$\frac 1p \operatorname{tr}(-zI_p+B)^{-1} - \frac 1p \operatorname{tr}(-zI_p+Rf(z))^{-1}\approx 0$$

with $D = -zI_p+Rf(z)$ and

$$\begin{cases}f(z) = \frac 1n \operatorname{tr}[T(I_n+T\tilde{f}(z))^{-1}]\\ \tilde{f}(z) = \frac 1n \operatorname{tr}[R(-zI_p + B)^{-1}]\end{cases}$$

but obviously this loop is not closed: the random matrix $B$ still appears in $\tilde f$!

So, by doing the same for $\frac 1n \operatorname{tr}[R(-zI_p + B)^{-1}]$, we can get:

$$\frac 1n \operatorname{tr}[R(-zI_p + B)^{-1}]\approx \frac 1n \operatorname{tr}[R(-zI_p + Rf(z))^{-1}]$$

and we are happy to see this loop closed.

**Final conclusion**:

$$m_{\mu_p}(z)\approx \frac 1p \operatorname{tr}(-zI_p+f(z)R)^{-1}$$

where

$$\begin{cases}f(z) = \frac 1n \operatorname{tr}[T(I_n+T\tilde{f}(z))^{-1}]\\ \tilde{f}(z)= \frac 1n \operatorname{tr}[R(-zI_p + Rf(z))^{-1}]\end{cases}$$

and simply by substituting

$$\begin{cases}\tilde{e}_p(z) = -\frac{f(z)}{z}\\ e_p(z) = \tilde{f}(z)\end{cases}$$

we get exactly the statement of the theorem.

@1/Mar/2016

With $X\in\mathbb{R}^{p\times n}$ and $X_{i,j}\sim (0,1)$ i.i.d. entries, we have come across:

- the white noise model $\Rightarrow$ MP law: $\frac 1n XX^*$; application in signal processing: detection;
- the sample covariance matrix: $\frac 1n R^{\frac 12}XX^*R^{\frac 12}$, or $\frac 1n \sum_{i=1}^{n}y_iy_i^*$ with $y_i = R^{\frac 12}x_i$; application in signal processing: subspace methods, e.g. MUSIC;
- the time-dependent SCM / wireless communication (MIMO channel model): $H = \frac{1}{\sqrt{n}}R^{\frac 12}XT^{\frac 12}$, with $R$ the Rx covariance matrix and $T$ the Tx covariance matrix.

### Part 5: Application to wireless communication

#### MIMO ergodic capacity

A MIMO system is naturally connected to RMT. The ergodic capacity is given by

$$C(\sigma^2) = \sup_{P\ge 0,\ \operatorname{tr}(P)\le 1,\ P\in \mathbb{R}^{n\times n}} \mathbb{E}\left[\log\det\left(I_p+\frac{1}{\sigma^2}HPH^*\right)\right]$$

where the expectation is over the random channel $H$.

In order to solve this problem, let us first recall our ==fundamental equations==: with $\bar{\mu}_p$ the deterministic equivalent for

$$m_{\mu_p}(z) = \frac 1p \operatorname{tr}\left(-zI_p +\frac 1n R^{\frac 12}XTX^*R^{\frac 12}\right)^{-1}$$

its ST is

$$m_{\bar{\mu}_p}(z) = \frac 1p \operatorname{tr}(-z[\tilde{e}_p(z)R+I_p])^{-1}$$

with $(\tilde{e}_p(z),{e}_p(z))$ the unique solution pair in $\mathbb{C}^+$ for $z\in\mathbb{C}^+$:

$$\begin{cases}\tilde{e}_p(z) = \frac 1n \operatorname{tr} T(-z[{e}_p(z)T+I_n])^{-1}\\{e}_p(z) = \frac 1n \operatorname{tr} R(-z[\tilde{e}_p(z)R+I_p])^{-1}\end{cases}$$

We want to evaluate $\int\log(1+\frac{1}{\sigma^2}t)\,\mu_{\frac 1n R^{\frac 12}XTX^*R^{\frac 12}}(dt)$, with the following lemma (the ==Shannon transform== of $\mu$):

**Lemma**: for $\mu$ a probability measure with support in $\mathbb{R}^+$, we have:

$$\int\log\left(1+\frac 1x t\right)\mu(dt) = \int_{w=x}^{\infty}\left(\frac 1w - m_{\mu}(-w)\right)dw$$

**Proof**:

- change variables with $y = \frac 1x$;
- exchange the integrals;
- try to make the $m_{\mu}$ term appear;
- pay attention when changing variables within integrals.
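A quick numerical check of the lemma on the discrete measure $\nu = \frac 13(\delta_1+\delta_3+\delta_7)$ from Part 3 (a minimal sketch; the grid and cutoff of the numerical integration are arbitrary):

```python
import numpy as np

atoms, wts, x = np.array([1.0, 3.0, 7.0]), np.ones(3) / 3, 2.0

# left-hand side: integral of log(1 + t/x) against mu
lhs = np.sum(wts * np.log(1.0 + atoms / x))

# right-hand side: integral over [x, inf) of 1/w - m_mu(-w)
w = np.logspace(np.log10(x), 8, 200001)
m = np.sum(wts[None, :] / (atoms[None, :] + w[:, None]), axis=1)   # m_mu(-w)
rhs = np.trapz(1.0 / w - m, w)

print(lhs, rhs)   # the two values agree
```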

**THEO**: for $H = \frac{1}{\sqrt{n}}R^{\frac 12}XT^{\frac 12}$, we have:

$$C(\sigma^2,I_p)-\bar{C}(\sigma^2,I_p) \overset{a.s.}{\rightarrow}0$$

where

$$\bar{C}(\sigma^2,I_p)=\frac 1p \log\det(I_p + \tilde{e}_p(-\sigma^2)R) + \frac 1p \log\det(I_n + e_p(-\sigma^2)T) - \frac np \sigma^2 e_p(-\sigma^2)\tilde{e}_p(-\sigma^2)$$

**Proof**:

1. differentiate $\bar{C}(\sigma^2,I_p)$ using the chain rule, i.e.

$$\frac{d\bar{C}}{d\sigma^2} = \frac{\partial\bar{C}}{\partial e_p}\cdot\frac{de_p}{d\sigma^2} + \frac{\partial\bar{C}}{\partial\tilde{e}_p}\cdot\frac{d\tilde{e}_p}{d\sigma^2}+ \frac{\partial\bar{C}}{\partial\sigma^2}$$

2. it is important to notice that

$$\frac{\partial}{\partial e_p}\frac 1p \log\det(I+e_p T) = \frac{\partial}{\partial e_p}\frac 1p \sum_{i = 1}^{n}\log[1+e_p\lambda_i(T)]=\frac 1p \operatorname{tr}[T(I+e_pT)^{-1}]$$

which later makes the $\frac{\partial\bar{C}}{\partial e_p}$ term vanish; the same holds for the $\frac{\partial\bar{C}}{\partial \tilde{e}_p}$ term;

3. so finally only $\frac{\partial\bar{C}}{\partial \sigma^2}$ remains, which is equal to $m_{\bar{\mu}}(-\sigma^2)-\frac{1}{\sigma^2}$ (with a little trick: ==a constant getting into the trace== and a make-up/elimination step);

4. now work on the original $C(\sigma^2)$: also by differentiation, using the lemma above:

$$\frac{d}{d\sigma^2} \int \log\left(1+\frac{t}{\sigma^2}\right)\bar{\mu}(dt) = \frac{d}{d\sigma^2}\int_{w=\sigma^2}^{\infty}\left(\frac 1w -m_{\bar{\mu}}(-w)\right)dw = -\frac{1}{\sigma^2} + m_{\bar{\mu}}(-\sigma^2)$$

5. from steps 3 and 4, the derivatives of $C$ and $\bar{C}$ coincide asymptotically; it remains to find the possible constant difference, which turns out to be 0.
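A Monte-Carlo sanity check of this theorem for the uniform power allocation (a minimal sketch; the diagonal $R$, $T$, the noise level $\sigma^2$ and all sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, s2 = 200, 400, 0.5
r = rng.uniform(0.5, 2.0, p)   # eigenvalues of a diagonal R
t = rng.uniform(0.5, 2.0, n)   # eigenvalues of a diagonal T

# fundamental equations at z = -sigma^2 (real fixed point, damped iteration)
e, et = 0.0, 0.0
for _ in range(2000):
    et = 0.5 * et + 0.5 * np.sum(t / (s2 * (e * t + 1.0))) / n
    e = 0.5 * e + 0.5 * np.sum(r / (s2 * (et * r + 1.0))) / n

C_bar = (np.sum(np.log(1.0 + et * r)) / p
         + np.sum(np.log(1.0 + e * t)) / p
         - (n / p) * s2 * e * et)

# Monte-Carlo estimate of (1/p) E[log det(I_p + H H^*/sigma^2)]
vals = []
for _ in range(20):
    X = rng.standard_normal((p, n))
    H = np.sqrt(r)[:, None] * X * np.sqrt(t)[None, :] / np.sqrt(n)   # R^{1/2} X T^{1/2}/sqrt(n)
    vals.append(np.linalg.slogdet(np.eye(p) + H @ H.T / s2)[1] / p)

print(C_bar, np.mean(vals))   # the two values are close for large p, n
```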

Then replace $T$ by $T^{\frac 12}PT^{\frac 12}$, in order to get the variable $P$ involved.

Our ==fundamental equations== then become more complex, but we always work with the same idea, differentiating with respect to $P$ and hoping to get $0$.

Things then turn out to be a well-known problem, the **waterfilling problem**, which can be solved by the **iterative waterfilling algorithm** (see more details here).

### Part 6: Spiked models

**DEF**: consider again the model $\frac 1n R^{\frac 12} X X^* R^{\frac 12}$, and take the particular case where $R= I_p$ except that the last diagonal element equals $\alpha$.

In this case, the measure

$$\nu = \lim_{p\to\infty}\frac 1p\sum_{i=1}^{p}\delta_{\lambda_i (R)} = \lim_{p\to\infty}\frac 1p[(p-1)\delta_1 + \delta_{\alpha}] = \delta_1$$

so the limiting measure $\mu$ will look like the M-P law, but with an outlier caused by $\alpha$.

**Eigenvalues of a finite-rank perturbation**

Multiple versions of spiked models:

- $\frac 1n(I+V\Omega V^*)^{\frac 12}XX^*(I+V\Omega V^*)^{\frac 12}$, with $V\in \mathbb{R}^{p\times k}$ and $\Omega\in \mathbb{R}^{k\times k}$ diagonal, in the regime where $P = V\Omega V^*$ is of small rank, i.e. $n,p\to\infty$ while $k$ stays unchanged;
- $\frac 1n XX^* + V\Omega V^*$;
- $\frac 1n (X+U\Omega U^*)(X+U \Omega U^*)^*$, with $U\in \mathbb{R}^{n\times k}$ and unitary ($U^*U = I_k$).

The goal is to identify the existence and position of the **isolated eigenvalues**. With the notation $P = V\Omega V^*$, we solve the following equation:

$$\det\left( \frac 1n (I_p+P)^{\frac 12}XX^*(I_p+P)^{\frac 12} - \lambda I_p\right) = 0$$

for $\lambda\notin [(1-\sqrt{c})^2,(1+\sqrt{c})^2]$, with of course $c = \lim\frac pn$.

With the useful ==determinant property== $\det(AB) = \det(A)\det(B)$ and the ==resolvent identity==, and noting that $\det(I_p+P)\neq 0$, we get:

$$0 = \det(I_p + P)\cdot\det\left(\frac 1n XX^* - \lambda(I_p +P)^{-1}\right) = \det\left(\frac 1n XX^*-\lambda I_p\right)\det\left[I_p + \left(\frac 1n XX^* - \lambda I_p\right)^{-1}\lambda P(I_p+P)^{-1}\right]$$

For $P(I_p+P)^{-1}$, we have (using the push-through identity and $V^*V = I_k$):

$$P(I_p+P)^{-1} = V\Omega V^*(I_p + V\Omega V^*)^{-1} = V\Omega(I_k + V^*V\Omega)^{-1}V^* = V\Omega (I_k + \Omega )^{-1} V^*$$

So finally, with ==Sylvester's determinant identity== $\det(I_p + AB) = \det(I_k + BA)$, we get:

$$0 = \det\left[I_p + \left(\frac 1n XX^* - \lambda I_p \right)^{-1} \lambda V\Omega (I_k + \Omega)^{-1}V^*\right] = \det\left[I_k + \lambda V^* \left(\frac 1n XX^* - \lambda I_p\right)^{-1}V\Omega(I_k+\Omega)^{-1}\right]$$

**Lemma**: for $a,b\in \mathbb{R}^{p}$ deterministic with $\|a\|,\|b\|=O(1)$, and $X$ with i.i.d. $(0,1)$ entries:

$$a^*\left(\frac 1n XX^* - zI_p\right)^{-1}b - a^*b\, m(z)\overset{a.s.}{\rightarrow}0$$

with $m(z)$ the S.T. of the M-P law and $z\notin [(1-\sqrt{c})^2,(1+\sqrt{c})^2]$.

(Remark: with $a=b$ random with i.i.d. entries, this is the ==trace lemma==.)

Applying the lemma we get:

$$0 = \det\left(I_k + \lambda m(\lambda)\cdot \Omega(I_k+\Omega)^{-1}\right)+ o(1)$$

Asymptotically, since

$$0 = \prod_{i=1}^{k}\left(1+\lambda\cdot m(\lambda)\frac{w_i}{1+w_i }\right)+o(1)$$

the solutions $\rho$ are such that, for some $i$:

$$1+\rho\cdot m(\rho)\frac{w_i}{1+w_i}=0 \iff \rho \cdot m(\rho)=-\frac{1+w_i}{w_i}$$

Since we know from the MP equation that

$$z\cdot m(z) = \frac{z}{-z+1-c-cz\cdot m(z)}, \quad z\in \mathbb{C}\setminus[(1-\sqrt{c})^2,(1+\sqrt{c})^2]$$

we should first check where such solutions can be located relative to the support. Note that

$$\rho\, m(\rho)= \int \frac{\rho}{t-\rho}\mu(dt) = -1 +\int\frac{t}{t-\rho}\mu(dt)$$

which has positive derivative in $\rho$ outside the support.

And we can see that the two limits are:

$$\rho\cdot m(\rho)\overset{\rho\to-\infty}{\longrightarrow}-1, \qquad \rho\cdot m(\rho)\overset{\rho\to+\infty}{\longrightarrow}-1$$

So it is not difficult to imagine that $\rho\cdot m(\rho)$, as a function of $\rho$, looks like a hyperbola outside the support $[(1-\sqrt{c})^2,(1+\sqrt{c})^2]$.

For $\rho \to (1+\sqrt{c})^2$ (from outside the support), we have $m(\rho)\to m^+$, with

$$m^+ = \frac{1}{-(1+\sqrt{c})^2+1-c-c(1+\sqrt{c})^2 m^+}=-\frac{1}{\sqrt{c}(1+\sqrt{c})}$$

In this limit case, we have:

$$\rho\cdot m(\rho)\to-\frac{1+\sqrt{c}}{\sqrt{c}}= -1-\frac{1}{\sqrt{c}}$$

So, beyond the right edge, $\rho\cdot m(\rho)$ only takes values in $\left(-1-\frac{1}{\sqrt{c}},\ -1 \right)$, which means a solution exists only when

$$\rho\cdot m(\rho)=-1-\frac{1}{w_i}>-1-\frac{1}{\sqrt{c}}\iff w_i > \sqrt{c}$$

This implies that only when the perturbation has enough energy, $w_i>\sqrt{c}$, do these isolated eigenvalues appear. This phenomenon is often referred to as a **phase transition** in the literature.

Supposing now that $w_i > \sqrt{c}$, we can go on with the calculation:

$$-1-\frac{1}{w_i} = \rho\cdot m(\rho)=\frac{\rho}{-\rho +1-c-c\left(-1-\frac{1}{w_i}\right)}\implies \rho = \frac{c(1+w_i)}{w_i}+1+w_i$$
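A numerical check of this spike location (a minimal sketch; $p$, $n$ and $w$ are arbitrary choices, with $w>\sqrt{c}$ so that the eigenvalue isolates):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, w = 500, 1000, 2.0
c = p / n                       # c = 0.5 and w = 2 > sqrt(c)

d = np.ones(p)
d[-1] = 1.0 + w                 # I_p + P with a single spike of energy w
X = rng.standard_normal((p, n))
A = np.sqrt(d)[:, None] * X     # (I_p + P)^{1/2} X
eigs = np.linalg.eigvalsh(A @ A.T / n)

rho = 1.0 + w + c * (1.0 + w) / w
print(eigs[-1], rho, (1 + np.sqrt(c)) ** 2)
# the largest eigenvalue ~ rho = 3.75, clearly outside the bulk edge ~ 2.91
```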

We consider two limit cases:

- $\frac 1n(I+V\Omega V^*)^{\frac 12}XX^*(I+V\Omega V^*)^{\frac 12}\overset{c\to 0}{\longrightarrow}I+V\Omega V^*$, whose eigenvalues are of course $1+w_i$ (and indeed $\rho\to 1+w_i$);
- when $w_i \to \sqrt{c}$, $\rho \to (1+\sqrt{c})^2$: the isolated eigenvalue reaches the bulk edge, which signifies a kind of continuous behavior.

**Study of the eigenvectors**

What can we say about the matrix $V$ (whose columns are commonly used in array processing)? Idea: we can estimate quantities of the type $a^*v_iv_i^*b$, with $V = [v_1,\ldots,v_k]$, for $a,b$ deterministic vectors.

==Results==: let $w_1>w_2>\ldots>w_{\tilde{k}}>\sqrt{c}\ge w_{\tilde{k}+1}\ge \ldots \ge w_k$, and let $\hat{v}_1,\ldots,\hat{v}_{\tilde{k}}$ be the eigenvectors of $\frac 1n (I+P)^{1/2}XX^*(I+P)^{1/2}$ associated with the $\tilde{k}$ largest eigenvalues. Then, as $n,p\to\infty$ with $\frac pn\to c\in(0,\infty)$, we have:

$$a^*\hat{v}_i\hat{v}_i^*b - \frac{1-cw_i^{-2}}{1+cw_i^{-1}}a^*v_iv_i^*b\overset{a.s.}{\to}0,\ \forall i\le \tilde{k}$$

and also:

$$a^*\hat{v}_i\hat{v}_i^*b\overset{a.s.}{\to}0,\ \forall i>\tilde{k}$$

In particular, we have:

$$\left|\hat{v}_i^*v_i\right|^2 \to\begin{cases}\frac{1-cw_i^{-2}}{1+cw_i^{-1}}, & \forall \ i \le \tilde{k} \\0 & \text{otherwise}\end{cases}$$
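An empirical check of the alignment $|\hat{v}_i^*v_i|^2$ in the single-spike case (a minimal sketch, continuing the setting of the previous snippet; $v$ is taken as the last canonical basis vector, and $p$, $n$, $w$ are arbitrary with $w>\sqrt{c}$):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, w = 1000, 2000, 2.0
c = p / n

d = np.ones(p)
d[-1] = 1.0 + w                     # spike along v = e_p, the last canonical vector
X = rng.standard_normal((p, n))
A = np.sqrt(d)[:, None] * X
vals, vecs = np.linalg.eigh(A @ A.T / n)

v_hat = vecs[:, -1]                 # eigenvector of the largest eigenvalue
print(v_hat[-1] ** 2, (1 - c / w**2) / (1 + c / w))   # ~ 0.70 for these values
```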

The idea of the proof comes from the fact that $a^*\hat{v}_i\hat{v}_i^*b$ is a function of the ==resolvent==, combined with the Cauchy integral formula.