# Mathematics for Machine Learning - Day 6

Technically it takes the exact same steps... But we're not splitting hairs here. Also, I'm late :D

## Row-Echelon Form (REF)

To know REF, a few definitions need to be established.

- A leading coefficient / pivot: The first non-zero number in a row starting from the left
- Staircase structure: Each following pivot is strictly on the right of the matrix. Meaning, if row 1 pivot is on a12, row 2 pivot cannot be at 21 or at a22, but it can skip a column, so a23 is fine.

### Rule

- All rows that contain only zeros are at the bottom.
- All rows containing at least one non-zero will form a staircase structure.

### Example

#### Not row echelon form:

Why? Because on the fourth and fifth row, the pivot is on the same column.

$\begin{pmatrix} -2 & 4 & -2 & -1 & 4 \\\ 0 & -8 & 3 & -3 & 1 \\\ 0 & 0 & 6 & -1 & 1 \\\ 0 & 0 & 0 & -3 & 4 \\\ 0 & 0 & 0 & -3 & 1 \end{pmatrix}$#### Row echelon form:

I'm happy to announce that these are in fact both row echelon form and upper triangle matrix!

The first example might be a non-brainer, but the second one is interesting because it's not the typical style of a `sound`

matrix. But the pivots are on different and subsequent columns while having the zero row at the bottom, so both example 1 and 2 are row-echelon form!

### Regarding Day 5

I was right!, REF is a upper triangle like in LU decomposition method Though there is a caveat when searching further and a few discussion forums online.

Upper triangle matrix can on be defined on square matrices, while row-echelon form does not, meaning, REF is only a upper triangle and vice versa on square matrices, not on rectangular (though there was a forum saying that they can be...)

## Reduced Row Echelon Form (RREF) / Row Canonical Form

### Rule

- It's in REF (Meaning it needs to retain the rule of REF)
- Every pivot is 1

### Explanation and difference between REF

It's... a simplified version of an REF, what's the main difference? After you've obtain a REF matrix, on each row you divide using elementary transformation based on the pivot.

Take the previous matrix as an example:

$\text{This is a REF} = \begin{array}{ccccc|c} -2 & 4 & -2 & -1 & 4 & 0 \\\ 0 & -8 & 3 & -3 & 1 & -2 \\\ 0 & 0 & 2 & -1 & 1 & 1 \\\ 0 & 0 & 0 & -3 & 4 & -2 \\\ 0 & 0 & 0 & 0 & 1 & -2 \end{array}$But if you divide each row by it's pivot, it becomes:

$\text{This is a RREF} = \begin{array}{ccccc|c} 1 & -2 & 1 & \frac{1}{2} & -2 & 0 \\\ 0 & 1 & \frac{3}{8} & -\frac{3}{8} & \frac{1}{8} & \frac{1}{4} \\\ 0 & 0 & 1 & -\frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\\ 0 & 0 & 0 & 1 & -\frac{4}{3} & \frac{2}{3} \\\ 0 & 0 & 0 & 0 & 1 & -2 \end{array}$Please keep in mind, the reason I use augmented matrix is to show that the result also change based on the division and not just the matrix itself!

### Now this is where the previous day comes in as well! Remember this matrix?

This is an example of a reduced row echelon matrix form in augmented matrix, with the non-transformed matrix being complicated, what we did was transforming it into an RREF.

$\begin{array}{ccccc|c} 1 & -2 & 1 & -1 & 1 & 0 \\\ 0 & 0 & 1 & -1 & 3 & -2 \\\ 0 & 0 & 0 & 1 & -2 & 1 \\\ 0 & 0 & 0 & 0 & 0 & a + 1 \end{array}$## Calculating Inverse

This is where I spent a lot of time on today, because I couldn't wrap my head around the concept, but fear not! I think I understand it correctly.

$\left[\begin{array}{c|c} A & I \end{array}\right] = \left[\begin{array}{c|c} I & A^{-1} \end{array}\right]$If you're confused, I feel you, basically what I'm going to show you and provide proof (from a non mathematician) regarding this concept.

### Let's start with an example!, then the proof.

#### Example:

Given:

$A = \begin{pmatrix} 4 & 2 & 2 \\\ 3 & 1 & 2 \\\ 2 & 1 & 2 \end{pmatrix}$We create an augmented matrix with A on the left side and an identity matrix on the right.

$\left[\begin{array}{c|c} A & I \end{array}\right] = \left[\begin{array}{ccc|ccc} 4 & 2 & 2 & 1 & 0 & 0 \\\ 3 & 1 & 2 & 0 & 1 & 0 \\\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right]$##### Step 1: Subtract R1 with R2

$\overset{R_1 - R_2 \rightarrow R_1} { \left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & -1 & 0 \\\ 3 & 1 & 2 & 0 & 1 & 0 \\\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right] }$##### Step 2: Subtract R2 with R3

$\overset{R_2 - R_3 \rightarrow R_2} { \left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & -1 & 0 \\\ 3 & 1 & 2 & 0 & 1 & 0 \\\ 1 & 0 & 0 & 0 & 1 & -1 \end{array}\right] }$##### Step 3: Subtract R1 with R2

$\overset{R_1 - R_2 \rightarrow R_1} { \left[\begin{array}{ccc|ccc} -2 & 0 & -2 & 1 & -2 & 0 \\\ 3 & 1 & 2 & 0 & 1 & 0 \\\ 1 & 0 & 0 & 0 & 1 & -1 \end{array}\right] }$##### Step 4: Swap R1 with R2

$\overset{R_1 \leftrightarrow R_2} { \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & -1 \\\ 0 & 1 & 0 & 1 & -2 & 1 \\\ 2 & 1 & 2 & 0 & 0 & 1 \end{array}\right] }$##### Step 5: Subtract R3 with R2 and 2R1

$\overset{R_3 - R_2 - 2R_1 \rightarrow R_3} { \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & -1 \\\ 0 & 1 & 0 & 1 & -2 & 1 \\\ 0 & 0 & 2 & -1 & 0 & 2 \end{array}\right] }$##### Step 6: Divide R3 by 2

$\overset{\frac{R3}{2} \rightarrow R_3} { \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & -1 \\\ 0 & 1 & 0 & 1 & -2 & 1 \\\ 0 & 0 & 1 & -\frac{1}{2} & -1 & 1 \end{array}\right] }$#### Conclusion

$\left[\begin{array}{c|c} I & A^{-1} \end{array}\right] = \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & -1 \\\ 0 & 1 & 0 & 1 & -2 & 1 \\\ 0 & 0 & 1 & -\frac{1}{2} & -1 & 1 \end{array}\right]$#### Proof

For those who are really curious on why this works, I've got you covered!. Let's dive even deeper inside this math for your curiosity and to justify myself spending hours on this that isn't even about machine learning anymore.

##### So, like before we have this augmented matrix

$\left[\begin{array}{c|c} A & I \end{array}\right]$##### Our goal is to use elementary transformations to change A to I.

So it should look something like this:

$E_k E_{k-1} \cdots E_2 E_1 A = I$With Ek, ..., E2, E1 being the transformations we did, so with the example I showed, we did 6 transformations that made A into I

P.S. That's why just for today I added steps :D

##### Then, on both sides, we multiply by inverse A

$E_k E_{k-1} \cdots E_2 E_1 AA^{-1} = IA^{-1}$##### Remember the property of matrix and its inverse? We'll apply them here!

$AA^{-1} = I \\\ IA^{-1} = A^{-1}$Making the equation into:

$E_k E_{k-1} \cdots E_2 E_1 I = A^{-1}$##### Finally, we get the full formula

$E_k E_{k-1} \cdots E_2 E_1 \left[\begin{array}{c|c} A & I \end{array}\right] = \left[\begin{array}{c|c} I & A^{-1} \end{array}\right]$Meaning that with elementary transformation we can inverse the matrix!

##### Side note:

###### I skipped a section called Minus-1 Trick, I didn't feel it gave anything to the solution given we can already answer the problem being given on the example with just the normal method, but it's should be called out :D

## Acknowledgement

I can't overstate this: I'm truly grateful for this book being open-sourced for everyone. Many people will be able to learn and understand machine learning on a fundamental level. Whether changing careers, demystifying AI, or just learning in general, this book offers immense value even for *fledgling composer* such as myself. So, Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, thank you for this book.

Source: Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge: Cambridge University Press. https://mml-book.com