Learn with me: Linear Algebra for Data Science — Part 1: Vectors

Matthew Macias
5 min read · Jun 30, 2022

Have you ever wondered what’s going on under the hood of those machine learning packages? Me too! So I decided it was time that I brushed off those math textbooks and rediscovered the importance of Linear Algebra for its applications in data science. What better way to remember it, than by writing about it. So feel free to follow along with my journey, hopefully some of you will find it helpful!

Vectors

Generally speaking, vectors can be used to represent almost any dataset. Let's say, for instance, we are presented with the table below:

This data is completely made up, so if any geneticists are viewing this, close your eyes

If we imagine that every row in this table is a person and their respective scores for the different genes, then we can represent each person as a unique vector. Written out, this would look like the following:
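For example, using the Person 1 and Person 2 values that also appear in the code examples later in this article (Person 3 is omitted here):

Person 1 → [0.88, 0.26]
Person 2 → [0.97, 0.15]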

More intuitively, we can plot each of these vectors on a 2D plane to get a better feel for what a vector representation means.

As you can see, each person becomes a unique vector in 2D space, with the two values in the vector serving as their respective x and y coordinates.
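If you want to reproduce a plot like this yourself, here is a minimal sketch using Matplotlib (the axis labels gene_1 and gene_2 are placeholders for the column names in the table):

import numpy as np
import matplotlib.pyplot as plt

# Each person is a 2D vector of gene scores (placeholder column names).
p_1 = np.array([0.88, 0.26])
p_2 = np.array([0.97, 0.15])

# Draw each vector as an arrow starting from the origin.
plt.quiver([0, 0], [0, 0], [p_1[0], p_2[0]], [p_1[1], p_2[1]],
           angles='xy', scale_units='xy', scale=1)
plt.xlim(0, 1.1)
plt.ylim(0, 1.1)
plt.xlabel('gene_1')
plt.ylabel('gene_2')
plt.title('Each person as a vector in 2D space')
plt.show()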

So the question arises: what is the actual benefit of using vectors to represent data? It allows us to leverage linear algebra and all of its associated formulas to find patterns and relationships in our data. A HUGE benefit of this is its ability to scale to n dimensions. For example, we could continue to rely on those formulas even if the above example had 1000+ genes.
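As a quick illustration (with random numbers standing in for 1,000 made-up gene scores), the dot product operation we cover below works unchanged in 1,000 dimensions:

import numpy as np

# Two hypothetical people, each described by 1,000 gene scores.
rng = np.random.default_rng(42)
person_a = rng.random(1000)
person_b = rng.random(1000)

# Exactly the same operation as in the 2D examples below.
person_a @ person_b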

Vector Operations

There is a seemingly endless number of vector operations, so I will try to stick to the ones that I think are most relevant to data science.

Vector Dot Products

Dot products are a massively important concept in linear algebra and will continue to be important even in the more advanced topics. Think of a dot product as the sum of the element-wise multiplication of two vectors. In the case of the dot product of Person 1 and Person 2:
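Using the Person 1 and Person 2 values from the table:

Person 1 · Person 2 = (0.88 × 0.97) + (0.26 × 0.15) = 0.8536 + 0.039 = 0.8926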

Python, and in particular NumPy, makes this process super simple; with just a few lines of code you can compute the dot product.

import numpy as np

p_1 = np.array([0.88, 0.26])
p_2 = np.array([0.97, 0.15])

# Can also use the np.dot() function to achieve the same output.
p_1 @ p_2
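Running this returns 0.8926, matching the hand calculation above.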

At this stage, all the dot product tells us is something about the angle between the two vectors. We can interpret the sign of the dot product as follows:

  • The angle between the vectors is obtuse if the dot product is < 0
  • The angle between the vectors is acute if the dot product is > 0
  • The vectors are orthogonal (at right angles) if the dot product = 0

The result of the vector dot product is quite vague on its own; however, we will see later on that it is used to calculate the actual angle between two vectors. Before we get to that, it's worth mentioning another handy trick the dot product has: the dot product of a vector with itself returns the squared length of the vector. To calculate the length of the Person 1 vector:
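Again using Person 1's values:

Person 1 · Person 1 = (0.88 × 0.88) + (0.26 × 0.26) = 0.7744 + 0.0676 = 0.842
length of Person 1 = √0.842 ≈ 0.918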

To do this in Python, we would execute the following:

import numpy as np

p_1 = np.array([0.88, 0.26])

# The length of a vector is the square root of its dot product with itself.
np.sqrt(p_1 @ p_1)

Cosine Similarity

Cosine similarity builds on vector dot products and allows us to calculate the exact angle between vectors. It is given by the formula:
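cos(θ) = (A · B) / (||A|| × ||B||)

where A and B are the two vectors and ||A|| denotes the length (magnitude) of A, which is exactly the quantity we just computed with the dot product trick.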

Hopefully at this point the cosine formula makes complete sense to you! Put simply, the numerator is the dot product between the two vectors of interest and the denominator is the product of the lengths of the two vectors. Building on our previous example, let's implement cosine similarity between Person 1 and Person 2.

import numpy as np

p_1 = np.array([0.88, 0.26])
p_2 = np.array([0.97, 0.15])

# np.linalg.norm() is a replacement for np.sqrt(A @ A)
(p_1 @ p_2) / (np.linalg.norm(p_1) * np.linalg.norm(p_2))

The only new addition here is the np.linalg.norm function, which returns the magnitude (length) of a vector, so we don't have to take the square root of the dot product ourselves. If we have done this correctly, we should see that Person 1 and Person 2 are the most similar!
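For Person 1 and Person 2, the code above returns roughly 0.99, which is very close to 1 and indicates the two vectors point in almost the same direction.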

The higher the number, the more similar the vectors are

We can confirm that these numbers make sense by looking at the plot of the vectors above: the angle between Person 1 and Person 2 is much smaller than the angle between Person 1 and Person 3.

Cosine similarity has many applications within machine learning, particularly in areas that involve sparse matrices (we will touch on those in later articles!). Some examples are document similarity within the field of natural language processing, recommendation systems (think Netflix) and even image similarity in computer vision.

Be sure to be on the lookout for the following parts of this series so we can master Linear Algebra together.
