Proofs involving the MooreβPenrose inverse
Let A be an m-by-n matrix over a field π, where π, is either the field β, of real numbers or the field β, of complex numbers. Then there is a unique n-by-m matrix A+ over π, such that:
- AA+Aβ=βA,
- A+AA+β=βA+,
- (AA+)*β=βAA+,
- (A+A)*β=βA+A.
A+ is called the Moore-Penrose inverse of A. Notice that A is also the Moore-Penrose inverse of A+ . That is, (A+)+β=βA.
Useful lemmas
These results are used in the proofs below. In the following lemmas, A is a matrix with complex elements and n columns, B is a matrix with complex elements and n rows.
===Lemma 1: A*A = 0 β A = 0=== The assumption says that all elements of A*A are zero. Therefore,
$$0 = \operatorname{Tr}(A^*A) = \sum_{j=1}^n (A^*A)_{jj} = \sum_{j=1}^n \sum_{i=1}^m (A^*)_{ji}A_{ij}=\sum_{i=1}^m \sum_{j=1}^n |A_{ij}|^2$$.
Therefore, all Aij equal 0 i.e. Aβ=β0.
Lemma 2: A*AB = 0 β AB = 0
$\begin{alignat}{3} & 0\,&& = A^*AB &&&\\ \Rightarrow\,& 0\,&& = B^*A^*AB &&& \\ \Rightarrow\,& 0\,&& = (AB)^*(AB) &&& \\ \Rightarrow\,& 0\,&& = AB &&& (\text{by Lemma 1}) \end{alignat}$
Lemma 3: ABB* = 0 β AB = 0
This is proved in a manner similar to the argument of Lemma 2 (or by simply taking the Hermitian conjugate).
Existence and uniqueness
Proof of uniqueness
Suppose that B and C are two n-by-m matrices over π satisfying the Moore-Penrose criteria. Observe then thatΒ Β
ABβ=βACABβ=β(AC)*(AB)*β=βC*(ABA)*β=βC*A*β=β(AC)*β=βAC.
Analogously we conclude that BAβ=βCA. The proof is completed by observing that then
Bβ=βBABβ=βBACβ=βCACβ=βC.
Proof of existence
The proof proceeds in stages.
1-by-1 matrices
For any xβββπ, we define $x^+ := \begin{cases} x^{-1}, & \mbox{if }x \neq 0 \\ 0, & \mbox{if }x = 0 \end{cases}$
It is easy to see that x+ is a pseudoinverse of x (interpreted as a 1-by-1 matrix).
Square diagonal matrices
Let D be an n-by-n matrix over π with zeros off the diagonal. We define D+ as an n-by-n matrix over π with (D+)ijβ:=β(Dij)+ as defined above. We write simply Dij+ for (D+)ijβ=β(Dij)+.
Notice that D+ is also a matrix with zeros off the diagonal.
We now show that D+ is a pseudoinverse of D:
- (DD+D)ijβ=βDijDij+Dijβ=βDijβββDD+Dβ=βD
- (D+DD+)ijβ=βDij+DijDij+β=βDij+βββD+DD+β=βD+
- $(DD^+)^*_{ij} = \overline{(DD^+)_{ji}} = \overline{D_{ji}D^+_{ji}} = (D_{ji}D^+_{ji})^* = D_{ji}D^+_{ji} = D_{ij}D^+_{ij} \Rightarrow (DD^+)^* = DD^+$
- $(D^+D)^*_{ij} = \overline{(D^+D)_{ji}} = \overline{D^+_{ji}D_{ji}} = (D^+_{ji}D_{ji})^* = D^+_{ji}D_{ji} = D^+_{ij}D_{ij} \Rightarrow (D^+D)^* = D^+D$
General non-square diagonal matrices
Let D be an m-by-n matrix over π with zeros off the main diagonal, where m and n are unequal. That is, Dijβ=βdi for some diβββπ when iβ=βj and Dijβ=β0 otherwise.
Consider the case where nβ>βm. Then we can rewrite Dβ=β[D0ββ0mβ Γβ (nβm)] by stacking where D0 is a square diagonal m-by-m matrix, and 0mβ Γβ (nβm) is the m-by-(n-m) zero matrix. We define $D^+\equiv\begin{bmatrix}D_0^+\\\mathbf{0}_{(n-m)\times m}\end{bmatrix}$ as an n-by-m matrix over π, with D0+ the pseudoinverse of D0 defined above, and 0(nβm)β Γβ m the (n-m)-by-m zero matrix. We now show that D+ is a pseudoinverse of D:
- By multiplication of block matrices, DD+β=βD0D0+β +β 0mβ Γβ (nβm)0(nβm)β Γβ mβ=βD0D0+, so by property 1 for square diagonal matrices D0 proven in the previous section,DD+Dβ=βD0D0+[D0ββ0mβ Γβ (nβm)]β=β[D0D0+D0ββ0mβ Γβ (nβm)]β=β[D0ββ0mβ Γβ (nβm)]β=βD.
- Similarly, $D^+D=\begin{bmatrix}D_0^+D_0 & \mathbf{0}_{m\times(n-m)}\\\mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)}\end{bmatrix}$, so $D^+DD^+=\begin{bmatrix}D_0^+D_0 & \mathbf{0}_{m\times(n-m)}\\ \mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)}\end{bmatrix}\begin{bmatrix}D_0^+\\\mathbf{0}_{(n-m)\times m}\end{bmatrix}=\begin{bmatrix}D_0^+D_0D_0^+\\\mathbf{0}_{(n-m)\times m}\end{bmatrix}=D^+.$
- By 1 and property 3 for square diagonal matrices, (DD+)*β=β(D0D0+)*β=βD0D0+β=βDD+.
- By 2 and property 4 for square diagonal matrices, $(D^+D)^*=\begin{bmatrix}(D_0^+D_0)^* & \mathbf{0}_{m\times(n-m)}\\\mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)}\end{bmatrix}=\begin{bmatrix}D_0^+D_0 & \mathbf{0}_{m\times(n-m)}\\\mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)}\end{bmatrix}=D^+D.$
Existence for D such that mβ>βn follows by swapping the roles of D and D+ in the nβ>βm case and using the fact that (D+)+β=βD.
Arbitrary matrices
The singular value decomposition theorem states that there exists a factorization of the form
Aβ=βUΞ£V* where:
U is an m-by-m unitary matrix over π.
Ξ£ is an m-by-n matrix over π with nonnegative real numbers on the diagonal and zeros off the diagonal.
V is an n-by-n unitary matrix over π.
Define A+ as VΞ£+U*.
We now show that A+ is a pseudoinverse of A:
- AA+Aβ=βUΞ£V*VΞ£+U*UΞ£V*β=βUΣΣ+Ξ£V*β=βUΞ£V*β=βA
- A+AA+β=βVΞ£+U*UΞ£V*VΞ£+U*β=βVΞ£+ΣΣ+U*β=βVΞ£+U*β=βA+
- (AA+)*β=β(UΞ£V*VΞ£+U*)*β=β(UΣΣ+U*)*β=βU(ΣΣ+)*U*β=βU(ΣΣ+)U*β=βUΞ£V*VΞ£+U*β=βAA+
- (A+A)*β=β(VΞ£+U*UΞ£V*)*β=β(VΞ£+Ξ£V*)*β=βV(Ξ£+Ξ£)*V*β=βV(Ξ£+Ξ£)V*β=βVΞ£+U*UΞ£V*β=βA+A
Basic properties
A*+β=βA+*
The proof works by showing that A+* satisfies the four criteria for the pseudoinverse of A*. Since this amounts to just substitution, it is not shown here.
The proof of this relation is given as Exercise 1.18c in.
Identities
==== A+ = A+ A+* A* ==== A+β=βA+AA+ and AA+β=β(AA+)* imply that A+β=βA+(AA+)*β=βA+A+*A*.
====A+ = A* A+* A+==== A+β=βA+AA+ and A+Aβ=β(A+A)* imply that A+β=β(A+A)*A+β=βA*A+*A+.
A = A+* A* A
Aβ=βAA+A and AA+β=β(AA+)* imply that Aβ=β(AA+)*Aβ=βA+*A*A.
A = A A* A+*
Aβ=βAA+A and A+Aβ=β(A+A)* imply that Aβ=βA(A+A)*β=βAA*A+*.
A* = A* A A+
This is the conjugate transpose of Aβ=βA+*A*A above.
A* = A+ A A*
This is the conjugate transpose of Aβ=βAA*A+* above.
Reduction to the Hermitian case
The results of this section show that the computation of the pseudoinverse is reducible to its construction in the Hermitian case. It suffices to show that the putative constructions satisfy the defining criteria.
===A+ = A* (A A*)+=== This relation is given as exercise 18(d) in, for the reader to prove, "for every matrix A". Write Dβ=βA*(AA*)+. Observe that
$\begin{alignat}{3} & AA^*\, &&= AA^*(A A^*)^+ AA^* &&&\\ \Leftrightarrow\,& AA^*\,&&= ADAA^* &&& \\ \Leftrightarrow\,& \quad 0\,&&=(AD-I)AA^* &&& \\ \Leftrightarrow\,& \quad 0\,&&=ADA-A &&& (\text{by Lemma 3}) \\ \Leftrightarrow\,& \quad \!A\,&& = ADA &&& \end{alignat}$
Similarly, (AA*)+AA*(AA*)+β=β(AA*)+ implies that A*(AA*)+AA*(AA*)+β=βA*(AA*)+ i.e. DADβ=βD.
Additionally, ADβ=βAA*(AA*)+ so ADβ=β(AD)*.
Finally, DAβ=βA*(AA*)+A implies that (DA)*β=βA*((AA*)+)*Aβ=βA*((AA*)+)Aβ=βDA.
Therefore Dβ=βA+.
===A+ = (A* A)+A*=== This is proved in an analogous manner to the case above, using Lemma 2 instead of Lemma 3.
Products
For the first three proofs, we consider products C = AB.
A has orthonormal columns
If A has orthonormal columns i.e. A*Aβ=βI then A+β=βA*. Write Dβ=βB+A+β=βB+A*. We show that D satisfies the Moore-Penrose criteria.
CDCβ=βABB+A*ABβ=βABB+Bβ=βABβ=βC,
DCDβ=βB+A*ABB+A*β=βB+BB+A*β=βB+A*β=βD,
(CD)*β=βD*B*A*β=βA(B+)*B*A*β=βA(BB+)*A*β=βABB+A*β=βCD,
(DC)*β=βB*A*D*β=βB*A*A(B+)*β=β(B+B)*β=βB+Bβ=βB+A*ABβ=βDC.
Therefore Dβ=βC+.
B has orthonormal rows
If B has orthonormal rows i.e. BB*β=βI then B+β=βB*. Write Dβ=βB+A+β=βB*A+. We show that D satisfies the Moore-Penrose criteria.
CDCβ=βABB*A+ABβ=βAA+ABβ=βABβ=βC,
DCDβ=βB*A+ABB*A+β=βB*A+AA+β=βB*A+β=βD,
(CD)*β=βD*B*A*β=β(A+)*BB*A*β=β(A+)*A*β=β(AA+)*β=βAA+β=βABB*A+β=βCD,
(DC)*β=βB*A*D*β=βB*A*(A+)*Bβ=βB*(A+A)*Bβ=βB*A+ABβ=βDC.
Therefore Dβ=βC+.
A has full column rank and B has full row rank
Since A has full column rank, A*A is invertible so (A*A)+β=β(A*A)β1. Similarly, since AB has full row rank, BB* is invertible so (BB*)+β=β(BB*)β1.
Write Dβ=βB+A+β=βB*(BB*)β1(A*A)β1A*. We show that D satisfies the Moore-Penrose criteria.
CDCβ=βABB*(BB*)β1(A*A)β1A*ABβ=βABβ=βC,
DCDβ=βB*(BB*)β1(A*A)β1A*ABB*(BB*)β1(A*A)β1A*β=βB*(BB*)β1(A*A)β1A*β=βD,
CDβ=βABB*(BB*)β1(A*A)β1A*β=βA(A*A)β1A*β=β(A(A*A)β1A*)*βββ(CD)*β=βCD,
DCβ=βB*(BB*)β1(A*A)β1A*ABβ=βB*(BB*)β1Bβ=β(B*(BB*)β1B)*βββ(DC)*β=βDC.
Therefore Dβ=βC+.
Conjugate transpose
Here, Bβ=βA*, and thus Cβ=βAA* and Dβ=βA+*A+. We show that indeed D satisfies the four Moore-Penrose criteria.
CDCβ=βAA*A+*A+AA*β=βA(A+A)*A+AA*β=βAA+AA+AA*β=βAA+AA*β=βAA*β=βC
DCDβ=βA+*A+AA*A+*A+β=βA+*A+A(A+A)*A+β=βA+*A+AA+AA+β=βA+*A+AA+β=βA+*A+AA+β=βA+*A+β=βD
(CD)*β=β(AA*A+*A+)*β=βA+*A+AA*β=βA+*(A+A)*A*β=βA+*A*A+*A*β=β(AA+)*(AA+)*β=βAA+AA+β=βA(A+A)*A+=
β=βAA*A+*A+β=βCD
(DC)*β=β(A+*A+AA*)*β=βAA*A+*A+β=βA(A+A)*A+β=βAA+AA+β=β(AA+)*(AA+)*β=βA+*A*A+*A*β=βA+*(A+A)*A*=
β=βA+*A+AA*β=βDC
Therefore Dβ=βC+. In other words:
(AA*)+β=βA+*A+ and, since (A*)*β=βA
(A*A)+β=βA+A+*
Projectors and subspaces
Define Pβ=βAA+ and Qβ=βA+A. Observe that P2β=βAA+AA+β=βAA+β=βP. Similarly Q2β=βQ, and finally, Pβ=βP* and Qβ=βQ*. Thus P and Q are orthogonal projection operators. Orthogonality follows from the relations Pβ=βP* and Qβ=βQ*. Indeed, consider the operator P: any vector decomposes as
Β Β xβ=βPxβ +β (IβP)x
and for all vectors x and y satisfying Pxβ=βx and (IβP)yβ=βy, we have
Β Β x*yβ=β(Px)*(IβP)yβ=βx*P*(IβP)yβ=βx*P(IβP)yβ=β0.
It follows that PAβ=βAA+Aβ=βA and A+Pβ=βA+AA+β=βA+. Similarly, QA+β=βA+ and AQβ=βA. The orthogonal components are now readily identified.
If y belongs to the range of A then for some x, yβ=βAx and Pyβ=βPAxβ=βAxβ=βy. Conversely, if Pyβ=βy then yβ=βAA+y so that y belongs to the range of A. It follows that P is the orthogonal projector onto the range of A. Iβ ββ P is then the orthogonal projector onto the orthogonal complement of the range of A, which equals the kernel of A*.
A similar argument using the relation QA*β=βA* establishes that Q is the orthogonal projector onto the range of A* and (IβQ) is the orthogonal projector onto the kernel of A.
Using the relations P(A+)*β=βP*(A+)*β=β(A+P)*β=β(A+)* and Pβ=βP*β=β(A+)*A* it follows that the range of P equals the range of (A+)*, which in turn implies that the range of Iβ ββ P equals the kernel of A+. Similarly QA+β=βA+ implies that the range of Q equals the range of A+. Therefore, we find,
$$\begin{alignat}{2} \operatorname{Ker}(A^+) &= \operatorname{Ker}(A^*). \\ \operatorname{Im}(A^+) &= \operatorname{Im}(A^*). \\ \end{alignat}$$
Additional properties
Least-squares minimization
In the general case, it is shown here for any mβ Γβ n matrix A that β₯Axβ ββ bβ₯2ββ₯ββ₯Azβ ββ bβ₯2 where zβ=βA+b. This lower bound need not be zero as the system Axβ=βb may not have a solution (e.g. when the matrix A does not have full rank or the system is overdetermined).
To prove this, we first note that (stating the complex case), using the fact that Pβ=βAA+ satisfies PAβ=βA and Pβ=βP*, we have
$$\begin{alignat}{2} A^*(Az - b) & = A^*(A A^+ b - b)\\ & = A^*(P b - b) \\ & = A^*P^* b - A^*b \\ & = (PA)^* b - A^*b \\ & = 0 \end{alignat}$$ so that (c.c. stands for the complex conjugate of the previous term in the following)
$$\begin{alignat}{2} \|Ax -b\|_2^2 &= \|Az -b\|_2^2 + (A(x-z))^*(Az-b) + \text{c.c.} + \|A(x - z)\|_2^2 \\ &= \|Az -b\|_2^2 + (x-z)^*A^*(Az-b) + \text{c.c.} + \|A(x - z)\|_2^2 \\ &= \|Az -b\|_2^2 + \|A(x - z)\|_2^2\\ & \ge \|Az -b\|_2^2 \end{alignat}$$ as claimed.
If A is injective i.e. one-to-one (which implies mββ₯βn), then the bound is attained uniquely at z.
Minimum-norm solution to a linear system
The proof above also shows that if the system Axβ=βb is satisfiable i.e. has a solution, then necessarily zβ=βA+b is a solution (not necessarily unique). We show here that z is the smallest such solution (its Euclidean norm is uniquely minimum).
To see this, note first, with Qβ=βA+A, that Qzβ=βA+AA+bβ=βA+bβ=βz and that Q*β=βQ. Therefore, assuming that Axβ=βb, we have
$$\begin{alignat}{2} z^*(x-z) & = (Qz)^*(x-z)\\ &=z^*Q(x-z)\\ &=z^*(A^+ A x - z) \\ &=z^*(A^+ b - z) \\ &=0. \end{alignat}$$
Thus
$$\begin{alignat}{2} \|x\|_2^2 &= \|z\|_2^2 + 2z^*(x-z) + \|x-z\|_2^2 \\ &= \|z\|_2^2 + \|x-z\|_2^2 \\ &\ge \|z\|_2^2 \end{alignat}$$ with equality if and only if xβ=βz, as was to be shown.