What Is Collaborative Filtering: A Simple Introduction

Collaborative filtering is a method for recommending items to users based on how users with similar preferences have interacted with those same items. Here’s how it works, a helpful example and some pros and cons to consider.

Written by Vihar Kurama
Updated by Matthew Urwin | Jan 22, 2025

Whether you look at a video on YouTube, a movie on Netflix or a product on Amazon, you’re going to get recommendations for more things to view, like or buy. You can thank the advent of machine learning algorithms and recommender systems for this development. 

Recommender systems are far-reaching in scope, so we’re going to zero in on one important approach: collaborative filtering.

 

A Quick Primer on Recommender Systems

A recommender system is a type of information filtering system that seeks to predict the “rating” or “preference” a user will give an item, such as a product, movie or song.

Recommender systems provide personalized information by learning a user’s interests from the traces of that user’s interactions. Like other machine learning applications, a recommender system makes predictions based on past behavior. Specifically, it’s designed to predict a user’s preference for a set of items based on past experience.

Formally, a recommendation task consists of:

  • A set of users (U).
  • A set of items (I) that can be recommended to the users in U.
  • A function, learned from users’ past interaction data, that predicts how likely a given user in U is to prefer a given item in I.

Recommender systems are broadly classified into two types based on the data being used to make inferences:

  1. Content-based filtering, which uses item attributes.
  2. Collaborative filtering, which uses user behavior (interactions) rather than item attributes.

Some key examples of recommender systems at work include:

  • Product recommendations on Amazon and other shopping sites.
  • Movie and TV show recommendations on Netflix.
  • Article recommendations on news sites.

 

What Is Collaborative Filtering?

Collaborative filtering filters information by using the interactions and data collected by the system from other users. It’s based on the idea that people who agreed in their evaluation of certain items are likely to agree again in the future.

The concept is simple: when we want to find a new movie to watch we’ll often ask our friends for recommendations. Naturally, we have greater trust in the recommendations from friends who share tastes similar to our own.

Most collaborative filtering systems apply a similarity-based technique, often called the neighborhood-based approach. In this approach, a number of users are selected based on their similarity to the active user. A prediction for the active user is then made by calculating a weighted average of the selected users’ ratings, with each rating weighted by that user’s similarity to the active user.
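The weighted average described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the function name `predict_rating` and the sample similarity scores and ratings are assumptions made up for the example.

```python
def predict_rating(similarities, neighbor_ratings):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    weighted_sum = sum(s * r for s, r in zip(similarities, neighbor_ratings))
    total_weight = sum(abs(s) for s in similarities)
    if total_weight == 0:
        return None  # no usable neighbors, so no prediction
    return weighted_sum / total_weight

# Hypothetical data: three selected users, their similarity to the
# active user, and the rating each of them gave to one movie.
sims = [0.9, 0.5, 0.1]
ratings = [5, 3, 1]

prediction = predict_rating(sims, ratings)
print(round(prediction, 2))
```

The most similar neighbor dominates the result, so the prediction lands much closer to that neighbor’s rating than a plain average would.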

Collaborative filtering systems focus on the relationship between users and items. The similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

There are two classes of collaborative filtering:

  • User-based, which measures the similarity between target users and other users.
  • Item-based, which measures the similarity between the items that target users rate or interact with and other items.

 

How Collaborative Filtering Works

Collaborative filtering uses a matrix to track user behavior for different items. Each user’s row of values (or each item’s column) can be treated as a data point in a vector space, and several metrics can then measure how similar two of these vectors are.

User-Item Matrix 

A user-item matrix records how each user has interacted with each item, which makes it possible to find smaller groups of users who behave similarly across different items. In this matrix, users may be represented in rows and items in columns. The value that corresponds to each user-item interaction can be binary (‘yes/no’ product ratings) or continuous (product rating along a numerical range).
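As a rough sketch, a user-item matrix can be built from a log of interactions like this. The user names, movie titles and ratings below are invented for illustration, and `None` is used here as an assumed marker for a user-item pair with no recorded interaction.

```python
# Hypothetical interaction log: (user, item, rating)
interactions = [
    ("ana", "Up", 5),
    ("ana", "Coco", 4),
    ("ben", "Up", 3),
    ("cal", "Coco", 2),
    ("cal", "Cars", 5),
]

users = sorted({user for user, _, _ in interactions})
items = sorted({item for _, item, _ in interactions})

# Rows are users, columns are items; None means "no interaction recorded".
matrix = {user: {item: None for item in items} for user in users}
for user, item, rating in interactions:
    matrix[user][item] = rating

print("user", items)
for user in users:
    print(user, [matrix[user][item] for item in items])
```

Most cells stay empty even in this tiny example, which is typical of real user-item matrices.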

A filtering algorithm analyzes these data points and identifies users with similar tastes, preferences and other behaviors. It then groups users into clusters of similar users to predict what products or recommendations will likely resonate with each cluster. 

Similarity Score

To determine whether two users are similar, collaborative filtering algorithms rely on the assumption that similar data points lie close to each other in a vector space. There are many metrics for calculating whether similarity exists, but two of the most popular ones are cosine similarity and the Pearson correlation coefficient (PCC). 

  • Cosine similarity: Measures similarity as the cosine of the angle between two vectors. Values range from -1 to 1, with a higher score indicating a higher degree of similarity between two vectors. When all ratings are non-negative, the score cannot fall below 0.
  • Pearson correlation coefficient (PCC): Measures similarity by calculating the linear correlation between two users’ ratings. Like cosine similarity, the value ranges from -1 to 1, with a higher score indicating a stronger positive correlation. Unlike cosine similarity, PCC subtracts each user’s average rating first, so it compensates for users who rate consistently higher or lower than others.
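Both metrics can be sketched with the Python standard library alone. The function names and the sample rating vectors are assumptions for illustration, and the sketch assumes both users have rated the same items. Pearson correlation is written here as cosine similarity applied to mean-centered ratings, which is one common way to express it.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # similarity is undefined for an all-zero vector
    return dot / (norm_p * norm_q)

def pearson_correlation(p, q):
    """Cosine similarity after subtracting each user's mean rating."""
    mean_p = sum(p) / len(p)
    mean_q = sum(q) / len(q)
    return cosine_similarity([a - mean_p for a in p],
                             [b - mean_q for b in q])

# Hypothetical ratings of the same four items by three users.
user_a = [5, 4, 3, 2]
user_b = [4, 3, 2, 1]  # same taste as user_a, but rates one point lower
user_c = [2, 3, 4, 5]  # opposite taste to user_a

print(cosine_similarity(user_a, user_b), pearson_correlation(user_a, user_b))
print(cosine_similarity(user_a, user_c), pearson_correlation(user_a, user_c))
```

For users with opposite tastes, the cosine score stays positive because every rating is positive, while the Pearson score comes out close to -1. That difference is the practical effect of mean-centering.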

 

Collaborative Filtering Using Python 

Collaborative methods are typically worked out using a utility matrix. The task of the recommender model is to learn a function that predicts the utility, or fit, of each item for each user. The utility matrix is typically huge and very sparse, with many missing values.

In the following matrices, each row represents a user, while the columns correspond to different films by Pixar, except the last one, which records the similarity between that user and the target user. Each cell represents the rating that the user gives to that movie. Cosine similarity is one of the simplest ways to measure the similarity of two rating vectors. The second matrix, the utility matrix, contains only partial data; its missing entries are the ratings that a user could be expected to give, and these are what the model needs to predict.

$$\text{cosine\_similarity}(p, q) = \frac{p \cdot q}{\lVert p \rVert \, \lVert q \rVert}$$

[Table: users’ ratings of Pixar films, with each user’s similarity to the target user in the last column]

[Figure: worked calculation of cosine_similarity(joe, beck)]

When a new user joins the platform, we can apply the simplest algorithm: compute the cosine or correlation similarity between rows (users) or columns (movies), then recommend items based on the k-nearest neighbors.
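A user-based version of this k-nearest-neighbors step might look like the sketch below. All ratings are invented for illustration and are not taken from the tables in this article. The helper `recommend`, the choice of k = 2 and the convention that 0 means “not rated” are assumptions. Treating unrated items as 0 is a simplification that also affects the cosine score.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

# Hypothetical ratings; each row is a user, 0 means "not rated".
movies = ["Toy Story", "Up", "Coco", "Cars"]
ratings = {
    "joe":      [5, 4, 0, 1],
    "beck":     [4, 5, 5, 1],
    "ann":      [1, 2, 1, 5],
    "new_user": [5, 5, 0, 0],
}

def recommend(target, ratings, movies, k=2):
    """Rank the target's unrated movies using the k most similar users."""
    target_row = ratings[target]
    ranked = sorted(
        ((cosine_similarity(target_row, row), name)
         for name, row in ratings.items() if name != target),
        reverse=True,
    )
    neighbors = ranked[:k]
    scores = {}
    for idx, own_rating in enumerate(target_row):
        if own_rating != 0:
            continue  # only score movies the target has not rated
        rated = [(sim, ratings[name][idx]) for sim, name in neighbors
                 if ratings[name][idx] != 0]
        total_sim = sum(sim for sim, _ in rated)
        if total_sim > 0:
            # similarity-weighted average of the neighbors' ratings
            scores[movies[idx]] = sum(sim * r for sim, r in rated) / total_sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("new_user", ratings, movies))
```

The same function works for item-based filtering if the matrix is transposed so that each row holds one movie’s ratings across users.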

[Table: utility matrix with partial ratings]

Other measures that can be used to quantify similarity include:

  • Pearson similarity
  • Jaccard similarity
  • Spearman rank correlation
  • Mean squared differences
  • Proximity–impact–popularity similarity
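Two of the measures listed above are simple enough to sketch directly. The function names and sample data are assumptions for illustration. Jaccard similarity is shown on sets of items each user has interacted with, which suits binary data. Mean squared difference is a distance, so lower values mean more similar users.

```python
def jaccard_similarity(items_a, items_b):
    """Size of the overlap divided by the size of the union of two item sets."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def mean_squared_difference(p, q):
    """Average squared gap between two users' ratings of the same items."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

# Hypothetical data: movies each user has watched.
watched_a = {"Up", "Coco", "Cars"}
watched_b = {"Up", "Coco", "Brave"}
print(jaccard_similarity(watched_a, watched_b))  # 2 shared out of 4 total

# Hypothetical ratings of the same three movies by two users.
print(mean_squared_difference([5, 4, 3], [4, 4, 1]))
```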

 

Advantages of Collaborative Filtering

Collaborative filtering offers users a number of benefits, including more personalized recommendations for products and services.  

Recommendations Become More Personalized Over Time

Collaborative filtering algorithms provide users with recommendations that are relevant to their preferences. As these algorithms gather more data on user behavior, they can improve their accuracy and offer users even more personalized recommendations. This can lead to a better user experience over time.

Users Are Exposed to New Products 

If a group of users gives high ratings for certain products, a collaborative filtering system can recommend those products to a user who demonstrates similar behavior but hasn’t viewed these products yet. This process enables users to discover new items that they wouldn’t have found otherwise, expanding their options.    

Domain Knowledge Is Not Required

Because collaborative filtering only needs user behavior data to function, domain knowledge isn’t necessary for this method. That means filtering algorithms don’t need to understand the ins and outs of specific industries, so they can be easily applied across sectors like e-commerce and entertainment services. 

Performance Is Independent of Product Details

Collaborative filtering algorithms also don’t need to compile in-depth data on a product’s features. They simply track users’ interactions to predict users’ preferences and make informed recommendations. As a result, collaborative filtering doesn’t depend on contextual information, adding to its convenience.  

 

Disadvantages of Collaborative Filtering

While collaborative filtering can connect users to even more useful recommendations, there are some downsides to consider. 

Filtering Algorithms Are Susceptible to the ‘Cold Start’ Problem

New users who enter the system have no historical data or user interactions tied to them. Without any data to go off of, collaborative filtering algorithms will fail to offer users personalized recommendations. This is what’s known as the “cold start” problem, and it’s an issue that filtering algorithms are susceptible to with every new user.   

Data Sparsity Undermines Algorithmic Accuracy

Filtering algorithms depend on users interacting with items, especially sharing product ratings. But users may not always choose to rate a product, leaving a limited amount of data for algorithms to work with. Known as data sparsity, this problem impacts the accuracy of filtering algorithms and leads to more random recommendations. 
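One rough way to quantify data sparsity is the share of user-item cells with no rating. The sketch below assumes a small made-up matrix in which `None` marks a missing rating; the function name `sparsity` is also an assumption.

```python
def sparsity(matrix):
    """Fraction of user-item cells that have no recorded rating (None)."""
    cells = [value for row in matrix for value in row]
    if not cells:
        return 0.0
    return sum(value is None for value in cells) / len(cells)

# Hypothetical ratings: 3 users (rows) by 4 items (columns).
matrix = [
    [5, None, None, 1],
    [None, 4, None, None],
    [None, None, 3, None],
]

print(f"{sparsity(matrix):.0%} of cells are empty")
```

In this example no two users have rated the same item, so no rating-based similarity between them can be computed at all.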

Filtering Algorithms Experience Scaling Issues

Collaborative filtering systems often struggle to handle massive volumes of data. That’s because adding new users and products to a collaborative filtering system strains its computational resources. Because of their inability to scale effectively, filtering algorithms can only handle so many users before dropping in performance.  

Popular Items Tend to Get More Attention

Since collaborative filtering uses historical user data to group similar users and make recommendations, products with fewer interactions or ratings are ignored and popular items with more recorded interactions are recommended more often. This creates a vicious cycle where just a few popular items are suggested to all users, resulting in less diverse recommendations.

Frequently Asked Questions

Collaborative filtering is a method that recommends items to a user by analyzing how users with similar preferences have interacted with those items. The idea is that users who have similar preferences for one item will likely have similar preferences for other items.

A common example of collaborative filtering is Netflix’s recommendation engine. It compiles data on a user’s viewing habits and compares them with the habits of similar users. Based on those users’ behaviors, the engine recommends shows and movies a user hasn’t seen yet but are preferred by users with related viewing habits.

Collaborative filtering can deliver more personalized recommendations to users and introduce them to relevant products they haven’t discovered yet. It also doesn’t depend on domain knowledge or in-depth product details, making it a versatile method that can be applied to industries ranging from e-commerce to streaming services.
