Movie Recommendation System

Overview

This project implements a movie recommendation system that uses several NLP techniques to match user preferences with movie descriptions. It supports TF-IDF, lemmatized TF-IDF, SVD-reduced TF-IDF, and SBERT embeddings to provide content-based recommendations from free-text user input.

Folder Structure

movie-recommendation-system/
├── cleaned/                  # Preprocessed data files (ignored in git)
│   └── filtered_df.pkl      # Filtered dataset
├── data/                    # Raw data 
│   └── tmdb_5000_movies.csv
├── models/                  # Model implementations
│   ├── sbert.py            # Sentence-BERT model
│   ├── tfidf.py            # Basic TF-IDF model
│   ├── tfidf_lemmatized.py # Lemmatized TF-IDF
│   └── tfidf_svd.py        # TF-IDF with SVD
├── outputs/                 # Recommendation outputs (ignored in git)
├── client.py               # CLI interface
├── preprocessing.py        # Data preprocessing scripts
├── README.md               # This file
├── demo.md                 # Link to video demo
└── requirements.txt        # Project dependencies

Dataset

We used a publicly available dataset from Kaggle called the TMDb 5000 Movie Dataset. It lists around 5,000 popular movies with plot overviews and other related data, collected around 7 years ago. The dataset was generated using the TMDb API.

Setup

Clone and navigate to root

git clone https://github.com/lous-e/movie-recommendation-system
cd movie-recommendation-system

Install dependencies

Create a new virtual environment using any tool you prefer. We use venv for this example.

Windows

python -m venv venv
venv\Scripts\activate

macOS

python3 -m venv venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt
python load_models.py

Usage

CLI Tool

python client.py --desc "I like action movies set in space" --topn 5 --model tfidf --out recommendations

Arguments

Argument	Description	Type	Default
--desc	User input describing preference	str	Required
--topn	Number of top recommendations to return	int	5 (Max 10)
--model	Model type for recommendations	str	tfidf
--out	Output file name (saved in outputs/)	str	output
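
For illustration, the arguments in the table above could be parsed with argparse roughly as follows. This is a minimal sketch, not the actual contents of client.py: the names build_parser, parse_args, and MAX_TOPN are hypothetical, and clamping --topn to the range 1 to 10 is an assumption about how the "Max 10" limit is enforced.

```python
import argparse

MAX_TOPN = 10  # hypothetical constant; mirrors the "Max 10" note in the table


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the four documented arguments and their defaults."""
    parser = argparse.ArgumentParser(
        description="Movie recommendation CLI (illustrative sketch)"
    )
    parser.add_argument("--desc", type=str, required=True,
                        help="User input describing preference")
    parser.add_argument("--topn", type=int, default=5,
                        help="Number of top recommendations to return (max 10)")
    parser.add_argument("--model", type=str, default="tfidf",
                        choices=["tfidf", "tfidf-lemmatized", "tfidf-svd", "sbert"],
                        help="Model type for recommendations")
    parser.add_argument("--out", type=str, default="output",
                        help="Output file name (saved in outputs/)")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv (or sys.argv when argv is None) and bound --topn."""
    args = build_parser().parse_args(argv)
    # Assumption: out-of-range values are clamped rather than rejected.
    args.topn = max(1, min(args.topn, MAX_TOPN))
    return args


# Passing an explicit list keeps the example independent of sys.argv.
args = parse_args(["--desc", "I like action movies set in space", "--topn", "25"])
print(args.model, args.topn, args.out)
```

Rejecting out-of-range values with parser.error would be an equally reasonable choice; clamping is used here only to keep the sketch short.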

Currently, the following models are supported:

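tfidf: Scores each movie by the similarity between the TF-IDF vector of the query and that of the movie description, and returns the top-n movies in descending order of similarity.
tfidf-lemmatized: Lemmatizes the words before computing TF-IDF.
tfidf-svd: Applies SVD to the TF-IDF matrix to reduce its dimensionality.
sbert: Uses SBERT embeddings for semantic similarity matching.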
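
To make the basic tfidf idea concrete, here is a small standard-library sketch of TF-IDF ranking. It is not the code in models/tfidf.py: it assumes cosine similarity as the similarity measure and a smoothed IDF formula, and the helper names (tokenize, fit_idf, tfidf_vector, recommend) and the three toy movies are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs_tokens):
    """Compute a smoothed IDF weight for every term in the corpus."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Return an L2-normalized sparse TF-IDF vector as a dict."""
    counts = Counter(t for t in tokens if t in idf)  # drop out-of-vocabulary terms
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def recommend(query, movies, topn=5):
    """Rank (title, overview) pairs by similarity to the query, best first."""
    docs = [tokenize(overview) for _, overview in movies]
    idf = fit_idf(docs)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(title, cosine(query_vec, tfidf_vector(doc, idf)))
              for (title, _), doc in zip(movies, docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy data invented for this example; not taken from the TMDb dataset.
movies = [
    ("Movie A", "A crew travels through space to find a new home"),
    ("Movie B", "A chef opens a restaurant in Paris"),
    ("Movie C", "Pirates fight a war in deep space"),
]
for title, score in recommend("I like space adventure films", movies, topn=2):
    print(f"{title} (Similarity: {score:.4f})")
```

Only the word "space" is shared between the query and the toy corpus, so the score of each movie reduces to how much weight "space" carries in its normalized vector. This also hints at why plain TF-IDF can surface odd matches: a short description that happens to contain one query word can outrank a longer, more relevant one.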

Outputs

The recommendations are saved in outputs/{out}.txt with details including movie title, similarity score, and overview.
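
As a rough sketch of this step, the snippet below writes a list of recommendations to {out}.txt inside an output directory. The function name save_recommendations, the dict keys, and the exact line layout are assumptions for illustration; the real file format produced by client.py may differ.

```python
import tempfile
from pathlib import Path


def save_recommendations(recs, out_name, out_dir="outputs"):
    """Write ranked recommendations to <out_dir>/<out_name>.txt and return the path.

    recs is a list of dicts with the (hypothetical) keys
    "title", "similarity", and "overview".
    """
    out_path = Path(out_dir) / f"{out_name}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create outputs/ if missing
    lines = []
    for rank, rec in enumerate(recs, start=1):
        lines.append(f"{rank}. {rec['title']} (Similarity: {rec['similarity']:.4f})")
        lines.append(f"   {rec['overview']}")
        lines.append("")  # blank line between entries
    out_path.write_text("\n".join(lines), encoding="utf-8")
    return out_path


# Demo with placeholder data, written to a temporary directory
# so that running the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_recommendations(
        [{"title": "Example Movie", "similarity": 0.5, "overview": "Placeholder overview."}],
        "recommendations",
        out_dir=tmp,
    )
    print(path.read_text(encoding="utf-8"))
```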

Results

Top-5 movie recommendations for the sample query:

"I like space adventure films"

tfidf
1. The Kentucky Fried Movie (Similarity: 0.4654)
2. Space Pirate Captain Harlock (Similarity: 0.2361)
3. A Haunted House (Similarity: 0.2192)
4. Metallica: Through the Never (Similarity: 0.1830)
5. Lifeforce (Similarity: 0.1696)
tfidf-lemmatized
1. The Kentucky Fried Movie (Similarity: 0.4654)
2. Space Pirate Captain Harlock (Similarity: 0.2361)
3. A Haunted House (Similarity: 0.2192)
4. Metallica: Through the Never (Similarity: 0.1830)
5. Lifeforce (Similarity: 0.1696)
tfidf-svd
1. Lost in Space (Similarity: 0.4337)
2. Space Pirate Captain Harlock (Similarity: 0.4144)
3. Moonraker (Similarity: 0.3778)
4. Deck the Halls (Similarity: 0.3716)
5. The Kentucky Fried Movie (Similarity: 0.3600)
sbert
1. Interstellar (Similarity: 0.4547)
2. You Only Live Twice (Similarity: 0.4534)
3. Sea Rex 3D: Journey to a Prehistoric World (Similarity: 0.4388)
4. My Big Fat Independent Movie (Similarity: 0.4096)
5. Galaxy Quest (Similarity: 0.4080)

Future Work

- Model Improvements
  - Add collaborative filtering based on user ratings
  - Incorporate more advanced transformer models
- Real-time movie data updates
- Deployment
  - Dockerization
  - Endpoints using FastAPI
  - Streamlit frontend
- Evaluation

Expectations

Commented in PR!
