- The paper “Organizing music, organizing gender: algorithmic culture and Spotify recommendations” focuses on how music is grouped by Spotify functions such as related artists, discover, and browse. After examining each of these features, the author found that grouping songs by how frequently a user interacts with the “related artists” and “discover” features helped most in determining which songs a user prefers.
- The paper "A Smart Spotify Assistance and Recommendation System" aligns significantly with our project's objective of constructing a machine-learning model for personalized Spotify recommendations. It uses Spotify's Web API and Spotify library to draw correlations between users' listening habits and song attributes.
- In the study "The Utilization of Content Based Filtering for Spotify Music Recommendation," Anthony et al. created a recommendation system for Spotify users using cosine similarity. They reached up to 80% similarity for songs and 50% similarity for artists. Songs were converted into vectors with one-hot encoding, in which each categorical column in the dataset is assigned a 0 or a 1.
- Dataset: "Spotify Tracked Dataset" from Hugging Face.
- Contains attributes like danceability, energy, artists, album name, popularity, loudness, etc.
- Link to Dataset
Our project addresses the prevalent dissatisfaction with Spotify's song recommendations, where users often find themselves skipping tracks, leading to inefficient music discovery. Leveraging the "Spotify Tracked Dataset," we aim to develop a machine-learning model to provide personalized song recommendations that resonate with individual users. This initiative stems from our personal frustrations with Spotify's recommendation system, motivating us to enhance user satisfaction and streamline music discovery.
- Feature Scaling:
  - Method: Min-Max scaling with MinMaxScaler (sklearn.preprocessing).
  - Reason: Rescales every numeric feature to the [0, 1] range so that no feature dominates because of its magnitude.
- One-Hot Encoding:
  - Method: OneHotEncoder (sklearn.preprocessing).
  - Reason: Converts categorical variables into a numerical format.
- Handling Missing Values:
  - Method: KNNImputer (sklearn.impute).
  - Reason: Fills in missing values using the values of the most similar rows.
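The three preprocessing steps listed above could be combined roughly as follows. This is a minimal sketch rather than the project's actual script: the toy rows and the `track_genre` column name are assumptions made for illustration, and scaling is applied before imputation here only because KNNImputer relies on distances between rows.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy stand-in for the dataset. The values and the "track_genre"
# column are assumptions, not taken from the real data.
df = pd.DataFrame({
    "danceability": [0.62, 0.48, np.nan, 0.91],
    "energy": [0.80, np.nan, 0.35, 0.77],
    "loudness": [-5.1, -9.8, -14.2, -4.3],
    "track_genre": ["pop", "rock", "jazz", "pop"],
})
numeric_cols = ["danceability", "energy", "loudness"]
categorical_cols = ["track_genre"]

# MinMaxScaler ignores NaNs when fitting and passes them through,
# so the imputer afterwards sees features on a common [0, 1] scale.
numeric_pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("impute", KNNImputer(n_neighbors=2)),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)  # 3 numeric columns + 3 one-hot genre columns
```

Wrapping the steps in a single ColumnTransformer keeps the fitted scaler, imputer, and encoder together, so the same transformation can later be applied to unseen songs with `preprocess.transform`.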
- Collaborative Filtering:
  - Algorithm: SVD.
  - Reason: Uses user-item interaction data.
- Content-Based Filtering:
  - Algorithm: TF-IDF vectorization with cosine similarity.
  - Reason: Recommends songs based on their characteristics.
- Matrix Factorization:
  - Algorithm: Non-negative Matrix Factorization.
  - Reason: Identifies latent features in the user-item interaction matrix.
- Supervised Learning Methods:
  - Random Forest:
    - Algorithm: RandomForestClassifier.
    - Reason: Handles non-linearity and high dimensionality.
  - Gradient Boosting Machines:
    - Algorithm: GradientBoostingClassifier.
    - Reason: Sequentially adds weak learners to minimize errors.
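To illustrate the SVD-based collaborative filtering entry in the list above, here is a small sketch using NumPy. The user-song play-count matrix is entirely hypothetical, and the `recommend` helper and its parameters are illustrative names, not part of the project code.

```python
import numpy as np

# Hypothetical play counts: rows are users, columns are songs,
# and 0 means the user has not listened to the song.
interactions = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 0],
], dtype=float)


def recommend(interactions, user, k=2, top_n=2):
    """Rank a user's unheard songs by a rank-k SVD reconstruction."""
    U, s, Vt = np.linalg.svd(interactions, full_matrices=False)
    # Keep only the k strongest latent factors.
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    scores = approx[user].copy()
    # Never recommend songs the user has already played.
    scores[interactions[user] > 0] = -np.inf
    ranked = np.argsort(scores)[::-1][:top_n]
    return [int(i) for i in ranked if np.isfinite(scores[i])]


print(recommend(interactions, user=0))
```

The low-rank reconstruction assigns a score to songs a user has not played by borrowing from users with similar listening patterns, which is the sense in which the method "uses user-item interaction data."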
Random Forest and Collaborative Filtering are chosen for the Spotify recommendation project due to their complementary strengths. Random Forest offers robustness against overfitting and noisy data, essential for capturing diverse user preferences in music. Its ability to reveal feature importance aids in understanding what aspects of songs influence recommendations. Meanwhile, Collaborative Filtering excels in leveraging user behavior data, such as listening history, to identify patterns and make personalized recommendations. By combining these approaches, the project can provide accurate and tailored song suggestions based on users' recent listening activities, enhancing their overall music discovery experience on Spotify.
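As a rough illustration of how a random forest exposes feature importance, the sketch below trains RandomForestClassifier on synthetic data. The feature names, the random data, and the rule that the "liked" label depends only on danceability are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["danceability", "energy", "loudness"]

# Synthetic features in [0, 1]; not real Spotify data.
X = rng.random((500, 3))
# Assumed labelling rule: a song is "liked" when danceability > 0.5.
y = (X[:, 0] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances are non-negative and sum to 1.
importances = dict(zip(feature_names, forest.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Because the synthetic label is driven by a single feature, that feature should dominate the ranking; on real data the importances are usually spread across several attributes and should be read as a rough guide only.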
We plan to evaluate our model's performance using precision, recall, and F1-score metrics. Our goal is to enhance user satisfaction by providing personalized song recommendations, thereby reducing track skipping and improving engagement. We expect our algorithm to significantly increase recommendation precision and recall, introducing users to a variety of songs that closely match their interests and enriching their music discovery experience on Spotify.
Random Forest:
Accuracy: 0.9999122807017544
Precision: 1.0
Recall: 0.8
F1 Score: 0.8888888888888888
Confusion Matrix:
[[22790 0]
[ 2 8]]
Content-based Filtering (Cosine Similarity):
Precision: 0.7
Recall: 0.0835
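The Random Forest metrics above can be recomputed directly from the reported confusion matrix. The sketch assumes the scikit-learn layout, in which rows are true classes and columns are predicted classes, so the matrix reads [[TN, FP], [FN, TP]].

```python
import numpy as np

# Confusion matrix as reported for the Random Forest model.
cm = np.array([[22790, 0],
               [2, 8]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a model that always predicts the negative class.
majority_baseline = (tn + fp) / cm.sum()

print(f"accuracy={accuracy:.6f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.4f} baseline={majority_baseline:.6f}")
```

Only 10 of the 22,800 test examples are positive, so always predicting the negative class would already reach about 99.96% accuracy. Precision, recall, and F1 are therefore the more informative numbers for this model.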
Analysis of Content-based Filtering:
One of the algorithms we implemented for song recommendation is content-based filtering. It combines textual features from song metadata (artist names, album titles, track names, and genres) with quantitative attributes such as popularity and duration. We first build a unified feature representation by applying TF-IDF vectorization to the textual data and normalizing the numerical attributes. We then reduce the dimensionality of this feature set with Truncated SVD, which makes the dataset more manageable and emphasizes the most informative aspects of the data. Recommendations are generated with cosine similarity by identifying the songs whose feature vectors are most similar to those of a given song; the similarity calculations are performed in chunks to accommodate large datasets.
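The pipeline described above might look roughly like the following sketch. The five-song catalogue, the column names, the number of SVD components, and the chunk size are all assumptions chosen to keep the example small; the project notebook may differ in each of these.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler, normalize

# Made-up catalogue; column names are assumed, not confirmed.
songs = pd.DataFrame({
    "track_name": ["Blue Sky", "Blue Night", "Iron Road", "Iron Sky", "Quiet Rain"],
    "artists": ["Ava Lee", "Ava Lee", "Steel Crew", "Steel Crew", "Mori"],
    "album_name": ["Daylight", "Daylight", "Forge", "Forge", "Stillness"],
    "track_genre": ["pop", "pop", "metal", "metal", "ambient"],
    "popularity": [70, 65, 40, 45, 20],
    "duration_ms": [210000, 200000, 260000, 250000, 300000],
})

# 1. One text document per song, vectorized with TF-IDF.
text = (songs["track_name"] + " " + songs["artists"] + " "
        + songs["album_name"] + " " + songs["track_genre"])
text_features = TfidfVectorizer().fit_transform(text)

# 2. Numerical attributes normalized to [0, 1] and appended.
numeric_features = MinMaxScaler().fit_transform(songs[["popularity", "duration_ms"]])
features = hstack([text_features, csr_matrix(numeric_features)]).tocsr()

# 3. Dimensionality reduction, then unit-length rows so that a
#    dot product between two rows equals their cosine similarity.
svd = TruncatedSVD(n_components=3, random_state=0)
reduced = normalize(svd.fit_transform(features))


def recommend(index, top_n=2, chunk_size=2):
    """Return the indices of the top_n songs most similar to songs[index]."""
    query = reduced[index]
    scores = np.empty(len(reduced))
    # 4. Score the catalogue one chunk at a time to bound memory use.
    for start in range(0, len(reduced), chunk_size):
        chunk = reduced[start:start + chunk_size]
        scores[start:start + chunk_size] = chunk @ query
    scores[index] = -np.inf  # exclude the query song itself
    return np.argsort(scores)[::-1][:top_n].tolist()


print(songs.loc[recommend(0), "track_name"].tolist())
```

Chunking matters mainly when similarities are needed for many query songs at once, since a full song-by-song similarity matrix grows quadratically with the size of the catalogue.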
As we continue to refine our model, we will clean our data further by handling any remaining missing values and ensuring that all features are properly scaled. We will also switch from random forests to gradient-boosting algorithms, which often achieve better results on tabular data. Additionally, we will explore more machine learning models alongside our collaborative filtering approach to further improve the quality of our recommendations. To track our progress, we will use metrics such as precision and recall, estimated with cross-validation. We will also incorporate feedback and adjust the system so that it consistently provides the best recommendations possible.
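One way the planned evaluation could be set up is sketched below: a GradientBoostingClassifier scored with stratified 5-fold cross-validation. The data comes from scikit-learn's `make_classification` with an assumed 90/10 class split, standing in for the real features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in data (about 10% positives).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

model = GradientBoostingClassifier(random_state=0)
# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = cross_validate(model, X, y, cv=cv, scoring=["precision", "recall", "f1"])
mean_scores = {
    metric: results[f"test_{metric}"].mean()
    for metric in ("precision", "recall", "f1")
}
for metric, value in mean_scores.items():
    print(f"{metric}: {value:.3f}")
```

Averaging over folds gives a steadier estimate than a single train/test split, which is especially useful when the positive class is rare.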
- cs4641proj-sp24/: The main directory containing all of our files for the project.
- cs4641proj-sp24/collaborative_filtering.ipynb: Jupyter notebook containing the collaborative filtering implementation.
- cs4641proj-sp24/dataset.csv: Dataset used for collaborative filtering.
- cs4641proj-sp24/featureScaling.py: Python script for feature scaling.
- cs4641proj-sp24/processed_dataset.csv: Processed dataset after feature scaling.
- cs4641proj-sp24/randomforest.ipynb: Jupyter notebook containing the random forest implementation.
- Werner, A. (2020). Organizing music, organizing gender: algorithmic culture and Spotify recommendations. Popular Communication, 18(1), 1–13. DOI
- Allawadi, K., & Vij, C. (2023). A Smart Spotify Assistance and Recommendation System. 2023 International Conference on Advancement in Computation & Computer Technologies (InCACCT), 286–291. DOI
- Anthony, J. T., Christian, G. E., Evanlim, V., Lucky, H., & Suhartono, D. (2022). The Utilization of Content Based Filtering for Spotify Music Recommendation. 2022 International Conference on Informatics Electrical and Electronics (ICIEE), 1–4. DOI