Learn with me: Linear Algebra for Data Science— Part 1: Vectors
Have you ever wondered what’s going on under the hood of those machine learning packages? Me too! So I decided it was time to brush off those math textbooks and rediscover the importance of Linear Algebra for its applications in data science. What better way to remember it than by writing about it? Feel free to follow along with my journey; hopefully some of you will find it helpful!
Vectors
Generally speaking, vectors can be used to represent all kinds of datasets. Let’s say, for instance, we are presented with the below table:
If we imagine that every row within this table is a person and their respective results for the different genes, then we can represent each person as a unique vector. This would look like the below:
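For reference, using the values that appear in the code snippets later in this article, Person 1 and Person 2 would be written as the vectors $p_1 = [0.88,\ 0.26]$ and $p_2 = [0.97,\ 0.15]$, with each remaining row (e.g. Person 3) represented in exactly the same way.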
More intuitively, we can plot each of these vectors on a 2D plane to get a better understanding of what the vector representation means.
As you can see, each person becomes a unique vector in 2D space, with the two values in the vector acting as their respective x and y coordinates.
So the question arises: what is the actual benefit of using vectors to represent data? Well, it allows us to leverage linear algebra and all of its associated formulas to find patterns and relationships in our data. A HUGE benefit of this is its ability to scale to n dimensions. For example, we could continue to rely on those formulas even if the above example had 1,000+ genes.
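As a tiny illustration of that scaling (the 1,000 gene values here are just randomly generated numbers, not real data), a person measured across 1,000 genes is still just a single NumPy array:

import numpy as np

# A hypothetical person measured across 1,000 genes rather than 2 —
# still just one vector, and all the same formulas apply.
rng = np.random.default_rng(seed=42)
person = rng.random(1000)

print(person.shape)  # (1000,)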
Vector Operations
There is a seemingly endless number of vector operations, so I will stick to the ones that I think are most relevant for data science.
Vector Dot Products
Dot products are a massively important concept in linear algebra and will keep reappearing in the more advanced topics. Think of a dot product as the sum of the element-wise multiplication of two vectors. In the case of the dot product of Person 1 and Person 2:
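Using the values that appear in the code snippet below, this works out to:

$$p_1 \cdot p_2 = (0.88 \times 0.97) + (0.26 \times 0.15) = 0.8536 + 0.039 = 0.8926$$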
Python, and in particular NumPy, makes this process super simple: with just a few lines of code you can compute the dot product.
import numpy as np

p_1 = np.array([0.88, 0.26])
p_2 = np.array([0.97, 0.15])

# Can also use the np.dot() function to achieve the same output.
p_1 @ p_2
At this stage, the dot product only tells us something about the angle between the two vectors. We can interpret its output as follows (a quick sanity check with made-up vectors is shown after this list):
- The angle between the vectors is obtuse if the dot product is < 0
- The angle between the vectors is acute if the dot product is > 0
- The vectors are orthogonal (at right angles) if the dot product = 0
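A quick sanity check with some made-up 2D vectors (nothing to do with our gene data) shows each of these cases:

import numpy as np

a = np.array([1.0, 0.0])

print(a @ np.array([1.0, 1.0]))   # 1.0  -> positive, so the angle is acute
print(a @ np.array([-1.0, 1.0]))  # -1.0 -> negative, so the angle is obtuse
print(a @ np.array([0.0, 1.0]))   # 0.0  -> the vectors are orthogonal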
The result of the vector dot product on its own is quite vague; however, we will see later on that it is used to calculate the actual angle between two vectors. Before we get to that, it’s worth mentioning another handy trick the dot product has: the dot product of a vector with itself returns the squared length of that vector. To calculate the length of the vector for Person 1:
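Using Person 1’s values from the code snippet below, that is:

$$\lVert p_1 \rVert = \sqrt{p_1 \cdot p_1} = \sqrt{0.88^2 + 0.26^2} = \sqrt{0.842} \approx 0.918$$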
To do this in Python we would execute the below:
import numpy as np

p_1 = np.array([0.88, 0.26])
np.sqrt(p_1 @ p_1)
Cosine Similarity
Cosine similarity builds on vector dot products and allows us to calculate the exact angle between two vectors. It is given by the formula:
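$$\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

where a and b are the two vectors of interest and ‖·‖ denotes a vector’s length.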
Hopefully at this point the cosine formula makes complete sense to you! Put simply, the numerator is the dot product between the two vectors of interest and the denominator is the product of the two vectors’ lengths. Building on our previous example, let’s implement cosine similarity between Person 1 and Person 2.
p_1 = np.array([0.88, 0.26])
p_2 = np.array([0.97, 0.15])

# np.linalg.norm() is a replacement for np.sqrt(A @ A)
(p_1 @ p_2) / (np.linalg.norm(p_1) * np.linalg.norm(p_2))
The only new addition here is the np.linalg.norm() function, which returns the magnitude (length) of a vector, so we don’t have to take the square root of the dot product ourselves. If we have done this correctly, we should see that Person 1 and Person 2 are the most similar pair!
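For the values used above, this should come out at roughly 0.991, which corresponds to an angle of only about 7.7° between Person 1 and Person 2.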
We can confirm that these numbers make sense by looking at the plot of the vectors above: the angle between Person 1 and Person 2 is much smaller than the angle between Person 1 and Person 3.
Cosine similarity has many applications within machine learning, particularly in areas that involve sparse matrices (we will touch on those in later articles!). Some examples are document similarity within the field of natural language processing, recommendation systems (think Netflix) and even image similarity in computer vision.
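As a rough sketch of the document-similarity idea (the documents and word counts below are entirely made up), each document can be represented as a vector of word counts over a shared vocabulary, and cosine similarity then measures how similar their wording is:

import numpy as np

def cosine_similarity(u, v):
    # Same formula as above: dot product divided by the product of the lengths.
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Word counts over the vocabulary ["movie", "great", "boring"] — made-up numbers.
doc_a = np.array([2, 3, 0])
doc_b = np.array([1, 2, 0])
doc_c = np.array([2, 0, 4])

print(cosine_similarity(doc_a, doc_b))  # close to 1 -> very similar wording
print(cosine_similarity(doc_a, doc_c))  # much lower -> quite different wording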
Be sure to keep an eye out for the next parts of this series so we can master Linear Algebra together.