A matrix is a rectangular array of numbers. The dimensions of a matrix are specified as rows by columns, typically written as $m \times n$. In the context of [[linear algebra]], a matrix can be used to efficiently encode and solve a [[linear system]].

## coefficient matrix

The coefficient matrix tracks the coefficients of the variables in a [[linear system]]. For example, the system of equations

$\begin{align} 60x + 30y = 3000\\ 12x + 2y = 400 \end{align}$

has the coefficient matrix

$\left[\begin{array}{rr} 60 & 30 \\ 12 & 2 \end{array}\right]$

## augmented matrix

The augmented matrix is useful when solving a [[linear system]], as it combines the coefficient matrix and the constant matrix. For example, the system of equations above can be written as

$\left[\begin{array}{rr|r} 60 & 30 & 3000 \\ 12 & 2 & 400\end{array}\right]$

## row-equivalent matrices

An augmented matrix is transformed into a row-equivalent matrix by performing any of the following row operations.

- Interchange any two rows
- Multiply any row by a nonzero constant
- Add a constant multiple of one row to another row

To solve a system using augmented matrix methods, the objective is to transform the matrix into the form

$\left[\begin{array}{rr|r} 1 & 0 & m \\ 0 & 1 & n \end{array}\right]$

For a system with two equations and two variables, the process is

1. Get a 1 in the upper left
2. Get a 0 in the lower left
3. Get a 1 in the lower right
4. Get a 0 in the upper right

Consider our augmented matrix above. Let's start by simplifying: divide $R_1$ by $30$ and $R_2$ by $2$.

$\left[\begin{array}{rr|r} 2 & 1 & 100 \\ 6 & 1 & 200\end{array}\right]$

To get a $1$ in the upper left position, multiply $R_1$ by $1/2$.

$\left[\begin{array}{rr|r} 1 & 1/2 & 50 \\ 6 & 1 & 200\end{array}\right]$

To get a $0$ in the lower left position, add $-6 \cdot R_1$ to $R_2$.

$\left[\begin{array}{rr|r} 1 & 1/2 & 50 \\ 0 & -2 & -100\end{array}\right]$

To get a $1$ in the lower right position, multiply $R_2$ by $-1/2$.

$\left[\begin{array}{rr|r} 1 & 1/2 & 50 \\ 0 & 1 & 50\end{array}\right]$

To get a $0$ in the upper right position, add $-1/2 \cdot R_2$ to $R_1$.

$\left[\begin{array}{rr|r} 1 & 0 & 25 \\ 0 & 1 & 50\end{array}\right]$

Therefore the solution is $x=25$ and $y=50$. (Note this is the same result as using elimination by addition in the example in [[linear system|linear systems]].)

### dependent systems

If the result is a matrix with a row of all zeros, like

$\left[\begin{array}{rr|r} 1 & -2 & 4 \\ 0 & 0 & 0\end{array}\right]$

the system has an infinite number of solutions and corresponds to a linear system like

$\begin{align} x_1 - 2x_2 = 4 \\ 0x_1 + 0x_2 = 0 \end{align}$

We set the free variable $x_2 = t$ and note that $x_1 = 2t + 4$, therefore the solution is the set $(2t + 4, t)$ for any real number $t$. These systems are said to be consistent and dependent.

### inconsistent systems

If the result is a matrix with a row that cannot be satisfied by any ordered pair of real numbers, like

$\left[\begin{array}{rr|r} 1 & -2 & 4 \\ 0 & 0 & -7\end{array}\right]$

the system is inconsistent (no solutions), since the last row corresponds to the false statement $0x_1 + 0x_2 = -7$. However, keep in mind that for matrices that are taller than they are wide, a row of all zeros does not necessarily indicate infinitely many solutions.
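As a quick check of the worked example above, the entire reduction can be reproduced with `sympy`'s `rref()` (a minimal sketch, assuming `sympy` is available):

```python
from sympy import Matrix

# augmented matrix for 60x + 30y = 3000, 12x + 2y = 400
M = Matrix([[60, 30, 3000],
            [12, 2, 400]])

# rref() returns the reduced row echelon form and the pivot columns
reduced, pivots = M.rref()
print(reduced)  # Matrix([[1, 0, 25], [0, 1, 50]]) -> x = 25, y = 50
```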
## reduced form

A matrix is said to be in **reduced row echelon form** (reduced form) if

- each row consisting entirely of zeros is below any row having at least one nonzero element
- the leftmost nonzero element in each row is $1$
- all other elements in the column containing the leftmost $1$ of a given row are zeros
- the leftmost $1$ in any row is to the right of the leftmost $1$ in the row above

The following matrix is in reduced form.

$\left[\begin{array}{rrrr|r} 1 & 4 & 0 & 0 & -3 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 6\end{array}\right]$

Notice the "triangle" created by the area below and to the left of the leftmost $1$s is all zeros, and each column containing a leftmost $1$ is otherwise all zeros.

## Gauss-Jordan elimination

The Gauss-Jordan elimination method is named after [[Carl Friedrich Gauss]] and the German geodesist Wilhelm Jordan. Gauss-Jordan elimination is a technique for solving systems of linear equations. The process is

1. Choose the leftmost nonzero column and use appropriate row operations to get a $1$ at the top
2. Use multiples of the row containing the $1$ from step 1 to get zeros in all remaining places in the column containing this $1$
3. Repeat step 1 with the submatrix formed by (mentally) deleting the row used in step 2 and all rows above it
4. Repeat step 2 with the entire matrix, including the rows deleted mentally
5. Continue this process until the entire matrix is in reduced form

If the number of leftmost $1$s in a reduced augmented coefficient matrix is less than the number of variables in the system and there are no contradictions, then the system is dependent and has infinitely many solutions.

## inverse

The inverse matrix $A^{-1}$ of matrix $A$ is a matrix that, when multiplied by $A$, results in the identity matrix. Only square matrices can have an inverse.

$A A^{-1} = A^{-1} A = I$

To find the inverse of a matrix, augment the matrix with the identity matrix, then solve using Gauss-Jordan elimination. When the left half becomes the identity matrix, what was the identity matrix on the right has become the inverse.

$\left[\begin{array}{rr|rr} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right] \Rightarrow \left[\begin{array}{rr|rr} 1 & 0 & e & f \\ 0 & 1 & g & h \end{array}\right]$

The inverse of a $2 \times 2$ matrix is $1$ over the determinant multiplied by the matrix where $a$ and $d$ are swapped and $b$ and $c$ are negated. Given a matrix $P$ of the form

$P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

the inverse is

$P^{-1} = \frac{1}{|P|} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$

where $|P|$ is the determinant of $P$. If the determinant is zero, the matrix has no inverse.

## determinant

Given a $2 \times 2$ matrix $P$ of the form

$P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

the determinant is

$|P| = ad - bc$

Given a $3 \times 3$ matrix (or greater dimensions) of the form

$P = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$

the determinant can be calculated by cofactor expansion, following this algorithm

1. Initialize the determinant's value as $0$
2. Pick any row, say row $i$
3. For the current element (in row $i$, column $j$), find the submatrix resulting from dropping row $i$ and column $j$, then calculate its determinant (the minor)
4. Multiply the minor from step 3 by the element itself and by the sign $(-1)^{i+j}$, so the sign alternates along the row
5. Add the value from step 4 to the determinant's value
6. Move on to the next element in the row and repeat steps 3 through 5, continuing for all elements in the row (the remaining rows can be ignored)
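This expansion translates directly into a short recursive function. A minimal Python sketch (expanding along the first row; illustrative, not efficient for large matrices):

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    if n == 2:
        return M[0][0] * M[1][1] - M[0][1] * M[1][0]
    total = 0
    for j in range(n):
        # minor: drop row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        # sign alternates along the row: (-1)^(0 + j)
        total += (-1) ** j * M[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 1], [1, 3, 2], [0, 1, 4]]))  # 21
```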
In practice, calculate determinants for matrices in [[R]].

## diagonal matrix

A diagonal matrix is one where the values of interest lie on the diagonal and all other values are $0$. Given a diagonal matrix $D$

$D = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$

the square of the matrix will be

$D^2 = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} = \begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix}$

In general, a diagonal matrix to the $k$th power will be

$D^k = \begin{bmatrix} a^k & 0 & \dots\\ 0 & b^k & \dots \\ \vdots & \vdots & z^k \end{bmatrix}$

## raise a matrix to a power

We can use diagonalization to simplify taking a matrix to a power. Given some matrix $A$, we decompose it as

$A = P \, D \, P^{-1}$

where $D$ is a diagonal matrix and $P^{-1}$ is the inverse of $P$. Notice that

$A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}$

This pattern will continue such that $A^k = PD^kP^{-1}$.

## matrix multiplication

To get the element in row $i$, column $j$ of the product matrix $C$, sum the products of corresponding elements from row $i$ of the first matrix and column $j$ of the second matrix:

$c_{ij} = \sum_k a_{ik} b_{kj}$

For example, the first column of $C$ comes from $R_1 C_1$, $R_2 C_1$, $R_3 C_1$, where $R_i$ refers to a row of the first matrix and $C_j$ to a column of the second (e.g., $R_1 C_1$ is row 1 of the first matrix times column 1 of the second). In other words, you always take row-wise from the first matrix and column-wise from the second matrix.

## identity matrix

Multiplying by the identity matrix is analogous to multiplying any number by $1$. The identity matrix is always a square matrix with $1$s along the diagonal and $0$s everywhere else.

## eigenvalue and eigenvector

Given a matrix $A$, an eigenvector $x$ satisfies the equation

$Ax = \lambda x$

where $\lambda$ is the associated eigenvalue. To find an eigenvector given an eigenvalue $\lambda$, we can rewrite the definition as

$(A - \lambda I)x = 0$

where the identity matrix $I$ gives $\lambda$ the dimensions of $A$, and solve for $x$ (using the augmented matrix and Gauss-Jordan elimination). Nontrivial solutions exist only where the determinant of $A - \lambda I$ equals $0$; to find the eigenvalues given just a matrix, solve this characteristic polynomial, $\det(A - \lambda I) = 0$.

When $A$ is symmetric, $A = V \Sigma V^T$ where $V$ (and hence $V^T$) is orthogonal. When $A$ is not symmetric, $A = V \Sigma V^{-1}$, but note $V$ and $V^{-1}$ are not orthogonal.

## eigenvalue decomposition

Collecting the eigenvectors as the columns of $V$ and the eigenvalues on the diagonal of $\Sigma$, the per-vector relation $Av = \lambda v$ becomes the decomposition

$A = V \Sigma V^{-1}$

where $\Sigma$ is a diagonal matrix.

## transpose

Simply flip the rows and columns of a matrix. The first column becomes the first row and so on.

## matrix factorization

Matrix factorization entails factoring an $m \times n$ matrix into two smaller matrices by choosing an inner dimension $d$, such that the shapes satisfy

$m \times n = (m \times d) \times (d \times n)$

This is called 2-factor matrix factorization. A third $d \times d$ factor, often a [[diagonal matrix]], can also be introduced, such that

$m \times n = (m \times d) \times (d \times d) \times (d \times n)$

In this case, the diagonal of the $d \times d$ matrix gives the "strengths", or importance, of each factor pair. This is called 3-factor matrix factorization.

The paper *Learning the parts of objects by non-negative matrix factorization* by Lee and Seung, published in Nature in 1999, popularized matrix factorization. Matrix factorization is also used on multi-spectral images, for example to identify specific layers (e.g., road, grass, trees) in satellite imagery. Topic modeling in [[natural language processing]] also uses matrix factorization to extract topics from documents based on their vocabularies. Other applications include audio signal separation, analytical chemistry, gene expression analysis, and recommendation systems.
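Note that the diagonalization $A = PDP^{-1}$ described above is itself a 3-factor factorization with a diagonal middle factor. A minimal numpy sketch verifying the $A^k = PD^kP^{-1}$ shortcut (the matrix and exponent are arbitrary, illustrative choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, so guaranteed diagonalizable

# eigendecomposition: w holds the eigenvalues, columns of P the eigenvectors
w, P = np.linalg.eig(A)

k = 5
# A^k via the decomposition: raising D to a power is just
# elementwise powers of the diagonal, which is the point of the shortcut
A_k = P @ np.diag(w ** k) @ np.linalg.inv(P)

print(np.allclose(A_k, np.linalg.matrix_power(A, k)))  # True
```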
## singular value decomposition

Singular value decomposition (SVD) uses the power iteration method to numerically estimate a decomposition like the eigenvalue decomposition, which requires square matrices, for a matrix of any dimensions. In SVD we solve for

$A = U \Sigma V^T$

where the diagonal of $\Sigma$ holds the singular values, by initializing $v$ with some value like

$v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

and iterating for $k$ iterations until the solution converges.

```python
import numpy as np

# A is any matrix; here an illustrative m x n example with m = 2, n = 3
A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])
m = A.shape[0]

U, Sig, Vt = np.linalg.svd(A)
```

Note that the output `Vt` will be the full $n \times n$ matrix; select its first $m$ rows (where $m$ is the number of rows of $A$, assuming $m \le n$) to recover the matrix $A$.

```python
A_rec = np.matmul(U, np.matmul(np.diag(Sig), Vt[:m]))
```

For dimensionality reduction, use `sklearn`.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

tsvd = TruncatedSVD(n_components=m).fit(A)
X = tsvd.transform(A)          # U scaled by the singular values
Vt = tsvd.components_
Sig = tsvd.singular_values_
U = np.matmul(A, np.matmul(Vt.T, np.diag(1 / Sig)))
```

## non-negative matrix factorization

Non-negative matrix factorization (NMF) is a matrix factorization technique for matrices that have no negative values. Modeling choices include:

- Latent dimension: the natural number of topics or genres
- Loss (objective) function: derived from a maximum likelihood estimator
- Additional constraints, such as requiring $W$ and $H$ to be orthogonal
- Regularization: L2 if modeled as Gaussian, or L1 if modeled as a Laplace distribution

Squared (L2) loss is not suitable when the data has many zero or near-zero values; use KL loss in that case (it assumes a Poisson distribution). Itakura-Saito (IS) loss is used when the model is multiplicative, $\hat X = WHN$, which is common for audio signal data.

```python
from sklearn.decomposition import NMF
```
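A minimal usage sketch with scikit-learn (the data, shapes, and component count here are illustrative assumptions, not from the notes):

```python
import numpy as np
from sklearn.decomposition import NMF

# illustrative nonnegative data: 100 "documents" x 20 "terms"
rng = np.random.default_rng(0)
X = rng.random((100, 20))

# Frobenius (L2) loss by default; for sparse count data the notes
# suggest KL loss, available via beta_loss="kullback-leibler"
# (which requires solver="mu")
model = NMF(n_components=5, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(X)  # 100 x 5: weight of each latent factor per row
H = model.components_       # 5 x 20: makeup of each latent factor
X_hat = W @ H               # low-rank reconstruction of X
```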