Talk:Eigenvalue perturbation

A graph should have been displayed here but graphs are temporarily disabled. Until they are enabled again, visit the interactive graph at pageviews.wmcloud.org

Some improvements[edit]

I have added two examples to explain that the perturbation of eigenvectors is complex in the case of repeated eigenvalues. I have added that the generalized eigenvalue problem arises in mechanical vibrations. BRousselet (talk) 09:23, 3 August 2022 (UTC) The derivation is simplified although the original one is to be found in a second fork. A theoretical basis is presented using the implicit function theorem. I have suppresed the issues. BRousselet (talk) 08:26, 29 August 2022 (UTC)[reply]

Not Exactly Correct[edit]

This theorem is not exactly correct. You cannot assume that you can eliminate "higher order terms" such as $\delta \lambda _{i}[M_{0}]\delta \mathbf {x} _{i}$ , because this assumes that $\mathbf {x} _{i}$ is an analytic function of the matrix pair $[K],[M]$ .

This is not necessarily the case. Suppose $[K_{0}]$ is the 2x2 identity matrix. Then $[1,1]^{T}/{\sqrt {2}}$ is an eigenvector. But if one makes a small perturbation, such as $[K]=[K_{0}]+{\begin{bmatrix}\epsilon &0\\0&0\end{bmatrix}}$

Then the eigenvectors are $[1,0]^{T}$ and $[0,1]^{T}$ which shows that a small perturbation of a matrix can destroy the structure of the eigenspace, thus one cannot eliminate the higher order terms.

I will work on fixing this article if I have time.

- Bryan Smith, Ph.D. student in computational mathematics, University of Colorado Denver — Preceding unsigned comment added by Smithb32 (talk • contribs) 17:12, 22 April 2013 (UTC)[reply]

The "proof" assumes that the that the perturbed eigenvalues and eigenvectors are "close" to the original; I believe it follows that the functions are (real)-analytic. It can be shown that, if the eigenvalues are distinct, then, in a neighborhood of (K, M), then the functions are (real)-analytic. — Arthur Rubin (talk) 09:57, 17 April 2015 (UTC)[reply]

Arthur, the point is that the results which are stated for eigenvectors perturbation are only valid for a simple eigenvalue.
For the multiple eigenvalue case, @Smithb32, provides a counter example. In this case usually, we cannot bound ||x_i-x_0i || with ||δ K || and || δ M || BRousselet (talk) 19:23, 16 July 2022 (UTC)[reply]

Reference[edit]

I don't have time to edit this article, but one can derive a formula for the perturbations of eigenvalue/eigenspaces to all orders using Theorem V.3.3 of Bhatia's matrix analysis book. Note that if the distinct eigenvalues are {lamda_i} then the projection onto the eigenspace is f_i(A), where f_i is a function which is 1 in a neighborhood of lamda_i and 0 in neighborhoods of lamda_j, for j not equal to i. Then you can apply the formula in that book and get the derivative of the projection onto the eigenspace and consequently the derivative of the eigenvalue using the formula lamda_i = Tr(A x the projection onto the lamda_i eigenspace.) You can repeat the proceedure and get the derivatives to all orders if one likes. It would be better to simply find a reference that does this and cite it. — Preceding unsigned comment added by 99.235.250.152 (talk) 03:29, 3 May 2013 (UTC)[reply]

You can find this theorem in Wilkinson, J.H., The algebraic eigenvalue problem, p. 68-70 — Preceding unsigned comment added by 128.179.158.43 (talk) 14:20, 18 April 2016 (UTC)[reply]

Help wanted[edit]

I don't dispute the accuracy of this article per se. I wrote it based on my old notes from a course. It's an interesting topic but I don't have a book to reference. I'd love it if someone could edit this for accuracy. —Ben FrantzDale 18:36, 15 April 2007 (UTC)[reply]

this can be made more readable if all the terms are defined before the equations are introduced. also, what is the difference between [K] and [M], which matrix do the eigenvalues come from?? —Preceding unsigned comment added by Essap (talk • contribs)

The eigenvalues come from both in that this page is focusing on the generalized eigenvalue problem, solving

Ax=\lambda Bx

. —Ben FrantzDale 15:53, 10 May 2007 (UTC)[reply]

Would be nice to have the result for the simple eigenvalue problem, M=I, GPWagner —Preceding unsigned comment added by 68.107.245.38 (talk) 03:39, 28 September 2007 (UTC)[reply]

Yes, this page needs some cleanup and verification. Overall, it seems OK. For the simple eigenvalue problem, it is easy to put zeros and identity matrices here and there, so there's no need to add it. Tony (talk) 17:13, 24 July 2008 (UTC)[reply]

Unfortunately, there are major mathematical mistakes and therefore the conclusion of the article are completely wrong! Moreover, I wasn't able to find any of this calculation on the cited reference. — Preceding unsigned comment added by Riemannzeta81 (talk • contribs) 16:34, 24 February 2011 (UTC)[reply]

As you can see above, I'm the first to admit this page is totally sub-par. I don't have access to the full text of that reference. (I think I was given it by the professor who showed me this technique.) Can you point out some of the mistakes you mention? I recall this all being pretty straightforward, and I know I've done a number of simulations based on this math that worked correctly... —Ben FrantzDale (talk) 17:57, 24 February 2011 (UTC)[reply]

In the phrase: "When the matrix is symmetric, the unperturbed eigenvectors are orthogonal and so we use them as a basis for the perturbed eigenvectors", the first sentence is false because x0_i are generalized eigenvectors not the eigenvectors of a symmetric matrix. Eq (4), still holds because x0_1,x0_2,...,x0_N are still a complete base of eigenvectors (but not orthogonal!). However, the equation after the sentence "Because the eigenvectors are orthogonal, we can remove the summations by left multiplying by x0_i: ..." is false for the non-orthogonality of eigenvectors (and therefore, you cannot drop the summation). I think this proof can be fixed, but I still need to figure out how...

Proof is flawed[edit]

The proof in this article first assumes that the changes will be small, then eliminates the higher order terms. The proof never justifies the assume that the changes will be small. This is not a correct way of proving the theorem. Taha (talk) 21:19, 16 April 2015 (UTC)[reply]

Consider it from another viewpoint: the author is initially assuming that the change in the eigenvalues and eigenvectors is small (this is actually a tentative solution) and solves for it; after doing this, next step is to verify if the assumption is correct. For example, from the given formulas, it is clear that if the eigenvalues are not well separated then the change in the eigenvectors is not small: consequently, in this case, the tentative solution is not actually a solution. A different problem, as you have noted, is to determine under which conditions we can be sure that the changes are small. This question is addressed in some of the comments below. — Preceding unsigned comment added by 80.31.177.16 (talk) 11:18, 21 January 2016 (UTC)[reply]

Well, the theorem is false if the changes aren't small. The article isn't particularly clear about this, but the theorem isn't actually an equality. It's an approximation. The theorem ought to say that λ_i is equal to that formula plus some error term, and similarly for x_i. That error term will depend on the size of the perturbation. If the perturbation is too large, then the error term will overwhelm the main term and the theorem yields no information.

It should be easy to get the error estimates straight. However, we really need a reliable source, not our own derivation. The references cited in the article should have precise statements. Ozob (talk) 03:11, 17 April 2015 (UTC)[reply]

@Ozob: I have checked "Ren-Cang Li (2014). "Matrix Perturbation Theory". In Hogben, Leslie. Handbook of linear algebra (Second edition. ed.). ISBN 1466507284." and "Stewart, G. and Sun, J.-G. (1990). Matrix Perturbation Theory. Academic Press", there is no result like this. We need to first prove that if the matrix perturbations are small, the changes in eigenvalues and eigenvectors will be small too. The proof assumes this! It is some sort of using conclusion itself to prove the conclusion.

Sadly, I used this proof idea in my paper to prove similar claim, my advisor said it is simply unacceptable. That is why I restarted this topic. Taha (talk) 04:07, 17 April 2015 (UTC)[reply]

The article is sloppy and incomplete, but the Bauer–Fike theorem provides a bound on the change in eigenvalue that is linear in the norm of the perturbation matrix. The constant is the condition number of the matrix, as one might expect. This is just my OR, but if the matrix perturbation is infinitesimal, then the change in eigenvalue is also infinitesimal. If the matrix change is small enough that you could consider it infinitesimal, then the math holds. --Mark viking (talk) 04:42, 17 April 2015 (UTC)[reply]

@Mark viking: Do we have Bauer–Fike theorem for generalized eigenvalue problems? Because changes in the M matrix can even make the problem ill-defined, See chapter VI in Stewart, G. and Sun, J.-G. (1990). Matrix Perturbation Theory. I think there are some ways to put conditions on the perturbations, but this article fails to list those conditions. Taha (talk) 05:11, 17 April 2015 (UTC)[reply]

There are generalizations, e.g., Elser (1982), p.349, also see Chu (1987) and Chu (2006), an extension to matrix polynomials. --Mark viking (talk) 06:04, 17 April 2015 (UTC)[reply]

Changes in M cannot make the problem ill-defined, as long as M remains positive-definite, and it is relatively easy to show that the eigenvalues are analytic as long as the eigenvalues remain distinct. It's possible that, in the real symmetric case, there is a choice of listing of eigenvalues which are analytic even near where the eigenvalues are equal, but I do not know the proof. It is fairly clear that, when the eigenvalues are distinct, the eigenvectors are also analytic. I don't have any of the relevant references with me, however. — Arthur Rubin (talk) 10:23, 17 April 2015 (UTC)[reply]

@Arthur Rubin and Mark viking: Guys, if you know under what conditions the theorem in this article holds, please add them to the article. Also, I guess the proof has to significantly modified, right? Taha (talk) 16:53, 17 April 2015 (UTC)[reply]

Theorem and outline of proof[edit]

Theorem: λ_i and x_i, the generalized eigenvalues and eigenvectors of K over M, are real-analytic functions of K and M over the domain where

K and M are symmetric matrices
M is positive definite
The eigenvalues λ_i are distinct.

Some of these lemmas may be described in our article holomorphic functional calculus.

Lemma 1: If F is a real-analytic function defined on an open subset U of the reals, then F(X) (as defined in matrix function) is a real-analytic function of X, over the domain where all the eigenvalues of X are in U.

Lemma 2: The function taking the coefficients of an nth degree polynomial to its n roots, in order, is real-analytic over the domain where (1) the xⁿ coefficient is non-zero, and the n roots are distinct.

Lemma 3: Let S be the unit n-sphere, considered as column vectors. Then the function taking x in S to x x^T is analytic, and it has a locally analytic inverse function.

Proof of theorem: We apply Lemma 1 to construct an analytic function M^−1/2. (We already knew that a positive semi-definite matrix has a unique principal square root. The lemma is only needed to demonstrate the map is analytic.) The generalized eigenvalues of K over M are the eigenvalues of A = M^−1/2 K M^−1/2, and the generalized eigenvectors x_i of K over M are related to the eigenvectors of A by x_i = M^−1/2 y_i. (This reduction is only necessary because I don't know how to describe generalized Frobenius covariants.)

The eigenvalues λ_i are the roots of

\left|\mathbf {K} -\lambda \mathbf {M} \right|=0,

or

\left|\mathbf {A} -\lambda I\right|=0,

so, by Lemma 2, the eigenvalues are analytic.

Noting the representation of the Frobenius covariants of a matrix A as a polynomial in A with coefficients rational functions of the eigenvalues, the Frobenius covariants of A are analytic in A.

Finally, we use Lemma 3 to demonstrate that the eigenvectors are locally analytic.

QED.

Of course, this would need to be in a reliable source to be included. It seems to be the case that, even where the eigenvalues are equal, the generalized eigenvalues can be reordered to have directional derivatives in all directions (in 2n²-space), but that doesn't help with eigenvectors. — Arthur Rubin (talk) 11:58, 18 April 2015 (UTC)[reply]

@Arthur Rubin: Thanks for the effort. My understanding is that we need to show that the eigenvalues

\lambda _{i}(\mathbf {A} ,\mathbf {M} )

are Lipschitz continuous, right? (which I believe they are, at least based on similarities with what Terry Tao has pointed out.) Are you trying to show this fact? If we show this in your proof? Also, can you take a look at Matrix Perturbation Theory, page 15--10, Fact 1? It seems that is what we want. Taha (talk) 04:36, 19 April 2015 (UTC)[reply]

I'm not sure Lipschitz continuity of the eigenvectors is adequate to allow the "existing" "proof" to be supported. However, the rest of that "fact 1", seems to be the desired statement about generalized eigenvalues. — Arthur Rubin (talk) 06:26, 19 April 2015 (UTC)[reply]

Why not start with the non-generalised version?[edit]

The page currently deals with the generalised eigenvalue problem. Usually, I'm not interested in the generalised version, but just the usual eigenvalue problem. Of course the generalised version is a generalised version and thus the usual eigenvalue problem is merely a special case - but I can't help thinking that both for my own reference and for new users, the article would be much more useful if it presented the results for the usual eigenvalue problem first, and then had a section on the generalised problem below it. 131.112.112.6 (talk) 08:51, 14 March 2016 (UTC)[reply]

Hilbert spaces?[edit]

The proof applies to the obvious case where the perturbed Hilbert space and the original Hilbert space are the same. This is kind of obvious, but there is a physics example which kind of demonstrates this and it follows below.

My argument is based on physics reasoning. Consider the free motion Hamiltonian,

H = 1/(2*m) P^2 .

and make the quadratic potential as perturbation,

dH = 1/2 * m * omega * X^2.

Any non-zero omega would change the spectrum from continuous to discrete.

Is it relevant that these are operators on a continuum (improper) basis?

The spectrum of the quantum harmonic oscillator is (semi-)infinite and discrete. — Preceding unsigned comment added by 70.27.100.205 (talk) 01:07, 25 March 2016 (UTC)[reply]

Results section....[edit]

... does not make sense to me. There are 2 pairs of equations. The first pair contains 2 almost identical equations, one having a coefficient of λ in the right hand term and the other not. Either this is not true or I haven't understood something. — Preceding unsigned comment added by 195.213.180.112 (talk) 14:26, 20 May 2022 (UTC)[reply]