Whether you look at a video on YouTube, a movie on Netflix or a product on Amazon, you’re going to get recommendations for more things to view, like or buy. You can thank the advent of machine learning algorithms and recommender systems for this development.
Recommender systems are far-reaching in scope, so we’re going to zero in on an important approach called collaborative filtering.
A Quick Primer on Recommender Systems
A recommender system is a type of information filtering system that seeks to predict the “rating” or “preference” a user will give an item, such as a product, movie or song.
Recommender systems provide personalized information by learning the user’s interests through traces of interaction with that user. Much like machine learning algorithms, a recommender system makes a prediction based on a user’s past behaviors. Specifically, it’s designed to predict user preference for a set of items based on experience.
Mathematically, a recommendation task can be framed as follows:
- A set of users (U).
- A set of items (I) to be recommended to the users in U.
- Learn a function, based on users’ past interaction data, that predicts how likely each user in U is to prefer each item in I.
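Written as a formula, and assuming preferences are expressed as real-valued scores (an assumption for illustration), the task is to learn a scoring function:

$$\hat{r}_{u,i} = f(u, i), \qquad f : U \times I \to \mathbb{R}$$

where $\hat{r}_{u,i}$ is the predicted preference of user $u \in U$ for item $i \in I$. The items with the highest predicted scores for a user become that user’s recommendations.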
Recommender systems are broadly classified into two types based on the data being used to make inferences:
- Content-based filtering, which uses item attributes.
- Collaborative filtering, which uses user behavior (interactions) rather than item attributes.
Some key examples of recommender systems at work include:
- Product recommendations on Amazon and other shopping sites.
- Movie and TV show recommendations on Netflix.
- Article recommendations on news sites.
What Is Collaborative Filtering?
Collaborative filtering filters information by using the interactions and data collected by the system from other users. It’s based on the idea that people who agreed in their evaluation of certain items are likely to agree again in the future.
The concept is simple: when we want to find a new movie to watch we’ll often ask our friends for recommendations. Naturally, we have greater trust in the recommendations from friends who share tastes similar to our own.
Most collaborative filtering systems apply a similarity-based technique known as the neighborhood-based approach. In this approach, a number of users are selected based on their similarity to the active user, and a prediction for the active user is made by calculating a weighted average of the selected users’ ratings.
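The weighted-average step can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `predict_rating`, the similarity scores and the ratings are all hypothetical, and it assumes the neighbors and their similarity scores have already been selected.

```python
def predict_rating(neighbors, item):
    """Predict the active user's rating for `item` as a weighted average.

    `neighbors` is a list of (similarity, ratings) pairs for the users already
    selected as most similar to the active user; `ratings` maps item -> rating.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for similarity, ratings in neighbors:
        if item in ratings:  # only neighbors who rated the item contribute
            weighted_sum += similarity * ratings[item]
            weight_total += abs(similarity)
    if weight_total == 0:
        return None  # none of the selected neighbors rated the item
    return weighted_sum / weight_total


# Hypothetical neighbors of the active user, with made-up similarity scores.
neighbors = [
    (0.9, {"Up": 5, "Coco": 4}),
    (0.5, {"Up": 3}),
    (0.2, {"Coco": 2}),
]
print(predict_rating(neighbors, "Up"))    # (0.9*5 + 0.5*3) / 1.4, about 4.29
print(predict_rating(neighbors, "Cars"))  # None: no neighbor rated it
```

Neighbors with higher similarity pull the prediction toward their own ratings, which is exactly the “trusted friend” intuition described above.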
Collaborative filtering systems focus on the relationship between users and items. The similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.
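To make the item-similarity idea concrete, here is a minimal sketch that compares two items using only the users who rated both. The ratings and the `item_similarity` helper are hypothetical, and cosine similarity is used as the comparison metric for illustration.

```python
import math


def item_similarity(ratings, item_a, item_b):
    """Cosine similarity of two items, using only users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not common:
        return 0.0  # no overlap, so no evidence of similarity
    a = [ratings[u][item_a] for u in common]
    b = [ratings[u][item_b] for u in common]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical ratings: user -> {movie: rating}.
ratings = {
    "ann": {"Up": 5, "Coco": 4, "Cars": 1},
    "ben": {"Up": 4, "Coco": 5},
    "cat": {"Up": 1, "Cars": 5},
}
print(round(item_similarity(ratings, "Up", "Coco"), 3))  # 0.976
print(round(item_similarity(ratings, "Up", "Cars"), 3))  # 0.385
```

Note that “Coco” and “Cars” share only one co-rater (ann), so their similarity comes out as exactly 1.0 from a single data point; in practice a minimum overlap between raters is often required before trusting such scores.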
There are two classes of collaborative filtering:
- User-based, which measures the similarity between target users and other users.
- Item-based, which measures the similarity between the items that target users rate or interact with and other items.
How Collaborative Filtering Works
Collaborative filtering uses a matrix to track user behavior for different items, converting matrix values into data points that can be mapped onto a vector space. Various metrics can then calculate the similarity between users (or between items).
User-Item Matrix
A user-item matrix helps find similarities between users by breaking all users down into smaller groups of users who demonstrate similar behavior when interacting with different items. In this matrix, users may be represented in rows and items in columns. The value of each user-item cell can be binary (for example, ‘yes/no’ product ratings) or continuous (a product rating along a numerical range).
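As a rough sketch of how such a matrix might be built, the Python below turns a hypothetical log of (user, item, rating) triples into both a continuous and a binary user-item matrix. The data and variable names are illustrative only; real systems usually store this in a sparse data structure rather than nested lists.

```python
# Hypothetical interaction log: (user, item, rating) triples.
interactions = [
    ("ann", "Up", 5), ("ann", "Coco", 4),
    ("ben", "Coco", 5), ("ben", "Cars", 2),
    ("cat", "Up", 1),
]

# Users become rows and items become columns.
users = sorted({u for u, _, _ in interactions})
items = sorted({i for _, i, _ in interactions})
row_index = {u: k for k, u in enumerate(users)}
col_index = {i: k for k, i in enumerate(items)}

# Continuous matrix: the rating value, or None where no interaction was recorded.
matrix = [[None] * len(items) for _ in users]
for user, item, rating in interactions:
    matrix[row_index[user]][col_index[item]] = rating

# Binary matrix: 1 if the user interacted with the item at all, else 0.
binary = [[0 if value is None else 1 for value in row] for row in matrix]

print(items)   # ['Cars', 'Coco', 'Up']
print(matrix)  # [[None, 4, 5], [2, 5, None], [None, None, 1]]
print(binary)  # [[0, 1, 1], [1, 1, 0], [0, 0, 1]]
```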
A filtering algorithm analyzes these data points and identifies users with similar tastes, preferences and other behaviors. It then groups users into clusters of similar users to predict what products or recommendations will likely resonate with each cluster.
Similarity Score
To determine whether two users are similar, collaborative filtering algorithms rely on the assumption that similar data points lie close to each other in a vector space. There are many metrics for calculating whether similarity exists, but two of the most popular ones are cosine similarity and the Pearson correlation coefficient (PCC).
- Cosine similarity: Measures similarity as the cosine of the angle between two vectors. Values range from -1 to 1, with a higher score indicating a higher degree of similarity between the two vectors.
- Pearson correlation coefficient (PCC): Measures similarity by calculating the correlation between users’ ratings. Like cosine similarity, its value ranges from -1 to 1, with a higher score indicating a stronger correlation. Unlike cosine similarity, PCC first subtracts each user’s average rating from their ratings, so it accounts for users who consistently rate higher or lower than others.
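The difference between the two metrics shows up clearly with two hypothetical users who share the same relative tastes but use different parts of the rating scale. This sketch assumes both users rated the same items; the helper names are illustrative, and it relies on the fact that Pearson correlation equals the cosine similarity of mean-centered vectors.

```python
import math


def cosine(p, q):
    """Cosine of the angle between rating vectors p and q."""
    dot = sum(x * y for x, y in zip(p, q))
    norms = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norms if norms else 0.0


def pearson(p, q):
    """Pearson correlation: cosine similarity of mean-centered vectors."""
    mean_p = sum(p) / len(p)
    mean_q = sum(q) / len(q)
    return cosine([x - mean_p for x in p], [y - mean_q for y in q])


# Hypothetical users with the same relative tastes on different scales.
generous = [5, 4, 5, 4]
harsh = [3, 1, 3, 1]
print(round(cosine(generous, harsh), 3))   # 0.938
print(round(pearson(generous, harsh), 3))  # 1.0
```

Pearson treats the two users as perfectly correlated because it removes each user’s average first. Also note that with non-negative ratings, raw cosine similarity cannot drop below 0, so the full -1 to 1 range only appears once ratings are centered.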
Collaborative Filtering Using Python
Collaborative methods are typically worked out using a utility matrix. The task of the recommender model is to learn a function that predicts the utility, or fit, of each item for each user. The utility matrix is typically huge, very sparse and full of missing values.
In the following matrices, each row represents a user, while the columns correspond to different films by Pixar, except the last one, which records the similarity between that user and the target user. Each cell represents the rating that the user gives to that movie. Cosine similarity is one of the simplest ways to measure how similar these rating vectors are. The last matrix, the utility matrix that follows the first one, contains only partial data, and the task is to predict the ratings the target user would likely give to the movies they haven’t rated.
$$\text{cosine\_similarity}(p, q) = \frac{p \cdot q}{\lVert p \rVert \, \lVert q \rVert}$$
cosine_similarity(joe, beck) =
To recommend movies to a user, the simplest algorithm computes the cosine or correlation similarity between rows (users) or columns (movies) and recommends the items favored by the user’s k nearest neighbors.
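A minimal user-based k-nearest-neighbors sketch is shown below. The utility matrix, movie list and the `recommend` and `cosine_on_common` helpers are hypothetical; similarity is computed only over movies both users rated, and each unrated movie is scored with a similarity-weighted average of the neighbors’ ratings. An item-based variant would apply the same idea to columns instead of rows.

```python
import math

# Hypothetical utility matrix: one row per user, one column per movie.
# None marks a rating the user has not given.
MOVIES = ["Up", "Coco", "Cars", "Brave"]
UTILITY = {
    "target": [5, 4, None, None],
    "u1": [5, 5, 1, 4],
    "u2": [4, 3, 2, 5],
    "u3": [1, 2, 5, 1],
}


def cosine_on_common(a, b):
    """Cosine similarity over the movies both users have rated."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, k=2):
    """Rank the user's unrated movies using their k nearest neighbors."""
    neighbors = sorted(
        ((cosine_on_common(UTILITY[user], UTILITY[v]), v) for v in UTILITY if v != user),
        reverse=True,
    )[:k]
    scores = {}
    for j, movie in enumerate(MOVIES):
        if UTILITY[user][j] is not None:
            continue  # already rated, nothing to recommend
        rated = [(s, UTILITY[v][j]) for s, v in neighbors if UTILITY[v][j] is not None]
        weight = sum(abs(s) for s, _ in rated)
        if weight:
            scores[movie] = sum(s * r for s, r in rated) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(recommend("target"))  # "Brave" ranks ahead of "Cars"
```

Here the two nearest neighbors (u2 and u1) both rated “Brave” highly and “Cars” poorly, so “Brave” comes out on top. The choice of k is a tuning parameter: too small and the prediction rests on very few users, too large and less similar users dilute it.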
Other measures that can be used to quantify similarity include:
- Pearson similarity
- Jaccard similarity
- Spearman rank correlation
- Mean squared differences
- Proximity–impact–popularity similarity
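Two of these measures are simple enough to sketch directly. Below, Jaccard similarity compares only *which* items two users rated, while mean squared differences (MSD) compares the rating values on co-rated items. Converting MSD into a similarity as 1 / (1 + MSD) is one common convention, assumed here; the helper names and ratings are hypothetical. (Spearman rank correlation, for its part, is Pearson correlation applied to rank positions instead of raw ratings.)

```python
def jaccard(items_a, items_b):
    """Share of items both users interacted with, ignoring rating values."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def msd_similarity(ratings_a, ratings_b):
    """1 / (1 + mean squared difference) over items both users rated."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0  # no co-rated items to compare
    msd = sum((ratings_a[i] - ratings_b[i]) ** 2 for i in common) / len(common)
    return 1.0 / (1.0 + msd)


# Hypothetical ratings for two users.
ann = {"Up": 5, "Coco": 4, "Cars": 1}
ben = {"Up": 4, "Coco": 5, "Brave": 3}
print(jaccard(ann, ben))         # 2 shared of 4 distinct movies -> 0.5
print(msd_similarity(ann, ben))  # squared diffs 1 and 1, mean 1 -> 0.5
```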
Advantages of Collaborative Filtering
Collaborative filtering offers users a number of benefits, including more personalized recommendations for products and services.
Recommendations Become More Personalized Over Time
Collaborative filtering algorithms provide users with recommendations that are relevant to their preferences. As these algorithms gather more data on user behavior, they can improve their accuracy and offer users even more personalized recommendations. This can lead to a better user experience over time.
Users Are Exposed to New Products
If a group of users gives high ratings for certain products, a collaborative filtering system can recommend those products to a user who demonstrates similar behavior but hasn’t viewed these products yet. This process enables users to discover new items that they wouldn’t have found otherwise, expanding their options.
Domain Knowledge Is Not Required
Because collaborative filtering only needs user behavior data to function, domain knowledge isn’t necessary for this method. That means filtering algorithms don’t need to understand the ins and outs of specific industries, so they can be easily applied across sectors like e-commerce and entertainment services.
Performance Is Independent of Product Details
Collaborative filtering algorithms also don’t need to compile in-depth data on a product’s features. They simply track users’ interactions to predict users’ preferences and make informed recommendations. As a result, collaborative filtering doesn’t depend on contextual information, adding to its convenience.
Disadvantages of Collaborative Filtering
While collaborative filtering can connect users to even more useful recommendations, there are some downsides to consider.
Filtering Algorithms Are Susceptible to the ‘Cold Start’ Problem
New users who enter the system have no historical data or user interactions tied to them. Without any data to go off of, collaborative filtering algorithms will fail to offer users personalized recommendations. This is what’s known as the “cold start” problem, and it’s an issue that filtering algorithms are susceptible to with every new user.
Data Sparsity Undermines Algorithmic Accuracy
Filtering algorithms depend on users interacting with items, especially sharing product ratings. But users may not always choose to rate a product, leaving a limited amount of data for algorithms to work with. Known as data sparsity, this problem impacts the accuracy of filtering algorithms and leads to more random recommendations.
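Sparsity is easy to quantify as the fraction of user-item cells with no recorded rating. The sketch below assumes the nested-list matrix layout with None for missing ratings; the `sparsity` helper and the toy matrix are hypothetical, and real-world matrices are usually much sparser than this example.

```python
def sparsity(matrix):
    """Fraction of user-item cells with no recorded rating (None)."""
    total = sum(len(row) for row in matrix)
    missing = sum(1 for row in matrix for value in row if value is None)
    return missing / total if total else 0.0


# Hypothetical 3 users x 4 items; only 4 of the 12 cells are rated.
m = [
    [5, None, None, None],
    [None, 3, None, 4],
    [None, None, 2, None],
]
print(sparsity(m))  # about 0.667 (8 of 12 cells missing)
```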
Filtering Algorithms Experience Scaling Issues
Collaborative filtering systems often struggle to handle massive volumes of data. Neighborhood-based methods, for example, compare every pair of users (or items), so the number of similarity computations grows quadratically as new users and products are added, straining the system’s computational resources. Because of this, filtering algorithms can only handle so many users before their performance drops.
Popular Items Tend to Get More Attention
Since collaborative filtering uses historical user data to group similar users and make recommendations, products with fewer interactions or ratings are ignored and popular items with more recorded interactions are recommended more often. This creates a vicious cycle where just a few popular items are suggested to all users, resulting in less diverse recommendations.
Frequently Asked Questions
What is collaborative filtering in simple terms?
Collaborative filtering is a method that recommends items to a user by analyzing how users with similar preferences have interacted with those items. The idea is that users who have similar preferences for one item will likely have similar preferences for other items.
What is a real-life example of collaborative filtering?
A common example of collaborative filtering is Netflix’s recommendation engine. It compiles data on a user’s viewing habits and compares them with the habits of similar users. Based on those users’ behaviors, the engine recommends shows and movies that the user hasn’t seen yet but that users with related viewing habits prefer.
What are the benefits of collaborative filtering?
Collaborative filtering can deliver more personalized recommendations to users and introduce them to relevant products they haven’t discovered yet. It also doesn’t depend on domain knowledge or in-depth product details, making it a versatile method that can be applied to industries ranging from e-commerce to streaming services.