Learn with me: Linear Algebra for Data Science — Part 2: Matrices

Matthew Macias
8 min read · Jul 18, 2022

Welcome to the continuation of the linear algebra series. If you haven't already, it would be helpful to cover everything from Part 1 of the series before proceeding, but if you are already a linear algebra whizz, then carry on!

Introduction to Matrices

We started things off with vectors, so now it's time to meet another pivotal form in linear algebra: the matrix. For those working with data on a regular basis, you will be no stranger to the matrix form. Basically, any table of data has a matrix representation. See the example below:

[Figure: a d x n matrix, with d rows and n columns]

Sorry if you were expecting something a little more exciting, but this is it! The good news is that it's the operations you can perform on matrices that make them so incredibly useful. Creating a matrix in Python is very similar to how we previously created vectors.

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

Indexing

The image of the matrix above gives an indication of how we would go about indexing it. To do so, we first select the respective row, then the column. It's important to note that, as with all Python indexing, it begins at 0 rather than 1. So to extract the number 8 from the matrix A we created earlier, we would select row index 2, column index 1 (the third row, second column).

# returns a single value - 8 from matrix A
A[2, 1]
# returns multiple values - 5 and 6 from matrix A
A[1, 1:]

Size

The amazing thing about matrices is their ability to scale to high dimensions. For the purposes of visualisation (and to prevent my brain from exploding trying to comprehend 4+ dimensions) we will stick with 2- or 3-dimensional arrays. A single matrix has size (rows, columns); when several matrices are stacked into one array, the most common way to write the size is (samples, rows, columns).

To check the size of a matrix in Python you can simply call .shape directly on the array. For instance, to get the shape of the matrix A that we created earlier, we would run:

A.shape

This would return that our matrix is (3, 3). To give an example of an array with multiple samples, we would need to create a new array B.

B = np.array([[[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]],
              [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]])
B.shape

What you will see if you run this code is (2, 3, 3). This can be interpreted as there being two matrices that are 3 x 3, which makes sense because B was built from two copies of matrix A.

Rank

The rank of a matrix is the number of linearly independent columns within the matrix. This is best explained with an example:

Let's say we have a matrix C whose columns are [1 1 1], [2 1 2], [3 3 3] and [4 2 4]. The first thing we would do is take each column and view it as its own vector. Then the question becomes: is the current vector a linear combination of the vectors that came before it?
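In NumPy, that matrix looks like this (each bracketed vector above becomes a column of the array):

C = np.array([[1, 2, 3, 4],
              [1, 1, 3, 2],
              [1, 2, 3, 4]])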

Let's start with the first column, the vector [1 1 1]. It's the first vector and nothing comes before it, so it is NOT a linear combination of any other vectors. Moving to the next vector, [2 1 2]: is it a linear combination of [1 1 1]? No, it's not; it doesn't matter what we multiply [1 1 1] by, it will never result in [2 1 2]. Moving to the third column, we can see that [3 3 3] is just 3 times the first vector [1 1 1]. We have encountered our first column that is a linear combination of another! And lucky last, [4 2 4] is just 2 times the second vector [2 1 2], so again it is a column that is a linear combination of another. Whilst the example above is extremely simple, linear dependence can involve any combination of the previous vectors, whether that means multiplying them by decimals or adding hundreds of other vectors together.

So in the above example, the hard work is done. The rank is simply the number of columns (vectors) that are linearly independent; that is, they are NOT a linear combination of the vectors before them. In our case, the first and second vectors were linearly independent, while the final two columns were not. Therefore the rank of matrix C is 2.

To calculate the rank of a matrix in Python, we would again use NumPy.

np.linalg.matrix_rank(C)

Some of you may be wondering what application this has for Data Science. If we refer back to our example using matrix C, which has a rank of 2, we need only keep 2 columns (in practice slightly more, as you also need to store a small coefficient matrix), because the remaining two columns can be recreated from what was kept. This has major applications in big data, where information can be compressed whilst maintaining the majority of the original matrix's information.
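To make that concrete, here is a rough sketch of the compression idea; the coefficient values below are simply the multiples we found while working out the rank:

# keep the two linearly independent columns of C
basis = C[:, :2]

# coefficients that rebuild every column from the basis:
# col 2 = 3 * basis col 0, col 3 = 2 * basis col 1
coeffs = np.array([[1, 0, 3, 0],
                   [0, 1, 0, 2]])

np.array_equal(np.dot(basis, coeffs), C)   # True - C fully recovered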

Matrix Multiplication

If you read Part 1 of this series, you would have noticed that I mentioned how important dot products would become as we delved deeper into linear algebra. Well, you didn't have to wait too long to see more of their applications. Matrix multiplication is one of the most important applications of linear algebra for Data Science. The most common forms of matrix multiplication are between a matrix and a vector, and between two matrices. Let's explore the first situation below:

Matrix and Vector Dot Products

Let's say we have a matrix A and a vector x; how would we go about taking their dot product?
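As a small worked example using the matrix A from earlier (the vector x is made up for illustration), each entry of the result is the dot product of one row of A with x:

x = np.array([1, 0, 1])

# row [1, 2, 3] . x = 4, row [4, 5, 6] . x = 10, row [7, 8, 9] . x = 16
np.dot(A, x)   # array([ 4, 10, 16])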

As you can see, multiplying a matrix by a vector results in a vector with as many rows as the input matrix. It's good to know how to do this conceptually, but again NumPy does all the heavy lifting for us.

np.dot(A, x)

Matrix Dot Products

Matrix multiplication builds on what we just saw. However, it is important to note that in order to take the dot product between two matrices, the second matrix needs to have the same number of rows as the first matrix has columns (this is also true for matrix-vector dot products, where the vector needs to have as many entries as the matrix has columns).

Whilst this can look really confusing at first, it's essentially just multiplying each column of matrix A with the corresponding row of matrix B, where each column-row pair produces its own matrix (an outer product).

With two columns, this results in two matrices that you can then add together to get the final matrix.

If we do simple matrix addition, we will end up with the same output as the dot product of the two original matrices. This is by no means the only way to perform matrix multiplication, but it is probably one of the simplest to conceptualise. Again, NumPy can do all of the above for us:

np.dot(A,B)
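Here is a minimal sketch of that column-times-row view, using two hypothetical 2 x 2 matrices so there are exactly two pieces to add:

A2 = np.array([[1, 2],
               [3, 4]])
B2 = np.array([[5, 6],
               [7, 8]])

# one outer product per column of A2 / matching row of B2
pieces = [np.outer(A2[:, k], B2[k, :]) for k in range(A2.shape[1])]

sum(pieces)      # array([[19, 22], [43, 50]])
np.dot(A2, B2)   # the same result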

Further Matrix Operations

There are many more matrix operations beyond the ones that we have discussed so far. We will cover off two more that are pivotal in Data Science.

Inversion

If we begin to think of matrices as having the potential to perform a linear transformation, that is, taking an input and mapping it to an output, then we are better placed to understand inversion. Let's see the example below:

In this case the matrix A transforms the vector x into the vector b. But what would we do if we did not know the vector x, but we had the vector b? What could we do to reverse the process and see what the input vector was? That is where matrix inversion comes in! We can invert matrix A (note: not all matrices are invertible) and take the dot product with the output b, and it will return the original input vector x.

To perform the above operation in NumPy we would do the below:

np.dot(np.linalg.inv(A), b)
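One caveat: the 3 x 3 matrix A we built at the start is actually singular (its rank is 2), so it has no inverse. Here is a sketch with a small invertible matrix instead; M and x are made up for illustration:

M = np.array([[2, 0],
              [1, 3]])
x = np.array([1, 2])

b = np.dot(M, x)              # forward: M maps x to b = [2, 7]
np.dot(np.linalg.inv(M), b)   # reverse: recovers x -> array([1., 2.])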

The keen-eyed among you may have noticed that the Ax = b scenario looks quite similar to how we could summarise a linear regression model, where A is the input data matrix, x is the coefficient vector and b is the output vector we are trying to predict. During training we have the input data matrix A and the output labels vector b, but no idea about the coefficient vector x. That is where inversion plays a major part!

Transpose

As we acknowledged earlier, in order to take the dot product between two matrices, the second matrix must have the same number of rows as the first matrix has columns. Often in Data Science this will not be the case, and the transpose allows us to rotate a matrix so that we can meet this condition and proceed. We can think of transposing as flipping the matrix along its diagonal. See the example below:

To transpose a matrix in NumPy we can access the T attribute on the array directly.

A.T 
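The transpose is also what makes the regression idea from the inversion section workable: a real data matrix is rarely square (and so rarely invertible), so one standard approach is to invert A.T dot A instead, known as the normal equations. A minimal sketch, assuming some made-up training data:

# hypothetical data: 4 samples, 2 features (the first column is an intercept)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 4.]])
b = np.array([3., 5., 7., 9.])   # generated with coefficients x = [1, 2]

# normal equations: x = (A^T A)^-1 A^T b
x = np.dot(np.linalg.inv(np.dot(A.T, A)), np.dot(A.T, b))
x   # array([1., 2.])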

Laws for Matrix Operations

It's important to note that, as with any branch of mathematics, the order of operations matters. That is also the case in linear algebra, in particular for matrix operations. There are three main laws, and two of them hold for matrix multiplication (see the quick check after the list).

  1. The commutative law, AB = BA, does not hold for matrix multiplication.
  2. The distributive law, A(B + C) = AB + AC, holds for matrix operations.
  3. The associative law, A(BC) = (AB)C, also holds for matrix operations.
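A quick check of all three laws in NumPy, using arbitrary 2 x 2 matrices made up for illustration:

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[2, 0],
              [0, 2]])

np.array_equal(np.dot(A, B), np.dot(B, A))                        # False - commutativity fails
np.array_equal(np.dot(A, B + C), np.dot(A, B) + np.dot(A, C))     # True - distributive law holds
np.array_equal(np.dot(A, np.dot(B, C)), np.dot(np.dot(A, B), C))  # True - associative law holds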

Conclusion

Hopefully you learnt a thing or two about matrices and their important applications within Data Science. The significance of everything covered in Part 1 and Part 2 of this series will become clear in the following articles, as it underpins all of the more complex applications of linear algebra in Data Science, so stay tuned for more!
