Disco Java

🔥 Recommendations for Java using collaborative filtering

  • Supports user-based and item-based recommendations
  • Works with explicit and implicit feedback
  • Uses high-performance matrix factorization

Installation

For Maven, add to pom.xml under <dependencies>:

<dependency>
    <groupId>org.ankane</groupId>
    <artifactId>disco</artifactId>
    <version>0.1.0</version>
</dependency>

For other build tools, see this page.

Getting Started

Prep your data in the format userId, itemId, value

import org.ankane.disco.Dataset;

Dataset<String, String> data = new Dataset<>();
data.add("user_a", "item_a", 5.0f);
data.add("user_a", "item_b", 3.5f);
data.add("user_b", "item_a", 4.0f);
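If your data starts out as text lines in the same userId, itemId, value order, a small parser can turn them into triples first. This is a minimal sketch using only the standard library; `RatingParser` and `Rating` are hypothetical names for illustration and are not part of Disco.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Disco: parses "userId,itemId,value" lines.
class RatingParser {
    // One (userId, itemId, value) triple
    static final class Rating {
        final String userId;
        final String itemId;
        final float value;

        Rating(String userId, String itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    static List<Rating> parse(List<String> lines) {
        List<Rating> ratings = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip empty lines
            }
            String[] parts = line.split(",", -1);
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            ratings.add(new Rating(
                parts[0].trim(),
                parts[1].trim(),
                Float.parseFloat(parts[2].trim())));
        }
        return ratings;
    }
}
```

Each parsed triple `r` can then be passed to `data.add(r.userId, r.itemId, r.value)`.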

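IDs can be integers, strings, or any other hashable data type. The type parameters of the Dataset must match the ID types you add

Dataset<Integer, String> data = new Dataset<>();
data.add(1, "item_a", 5.0f);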

If users rate items directly, this is known as explicit feedback. Fit the recommender with:

import org.ankane.disco.Recommender;

Recommender<String, String> recommender = Recommender.fitExplicit(data);

If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Use 1.0 or a value like number of purchases or page views for the dataset, and fit the recommender with:

Recommender<String, String> recommender = Recommender.fitImplicit(data);
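To use a value like number of purchases, you first need one count per user and item. This is a minimal sketch of that aggregation step using only the standard library; `ImplicitCounts` is a hypothetical helper and not part of Disco, and the event format (a two-element array of userId and itemId) is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Disco: turns raw events into implicit feedback values.
class ImplicitCounts {
    // Counts how many times each (userId, itemId) pair occurs.
    // Each event is a two-element array: {userId, itemId}.
    static Map<String, Map<String, Float>> count(List<String[]> events) {
        Map<String, Map<String, Float>> counts = new LinkedHashMap<>();
        for (String[] event : events) {
            counts.computeIfAbsent(event[0], k -> new LinkedHashMap<>())
                  .merge(event[1], 1.0f, Float::sum);
        }
        return counts;
    }
}
```

Each resulting (userId, itemId, count) entry can then be added to the dataset with `data.add` before calling `fitImplicit`.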

Get user-based recommendations - “users like you also liked”

recommender.userRecs(userId, 5);

Get item-based recommendations - “users who liked this item also liked”

recommender.itemRecs(itemId, 5);

Get predicted ratings for a specific user and item

recommender.predict(userId, itemId);

Get similar users

recommender.similarUsers(userId, 5);

Examples

MovieLens

Load the data

import org.ankane.disco.Data;

Dataset<Integer, String> data = Data.loadMovieLens();

Create a recommender

Recommender<Integer, String> recommender = Recommender
    .builder()
    .factors(20)
    .fitExplicit(data);

Get similar movies

recommender.itemRecs("Star Wars (1977)", 5);

Storing Recommendations

Save recommendations to your database.

Alternatively, you can store only the factors and use a library like pgvector-java. See an example.

Algorithms

Disco uses high-performance matrix factorization.

Specify the number of factors and iterations

Recommender<String, String> recommender = Recommender
    .builder()
    .factors(8)
    .iterations(20)
    .fitExplicit(data);
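For intuition about what the factors are: in matrix factorization generally, each user and each item is represented by a vector whose length is the number of factors, and a score is based on the dot product of the two vectors. The sketch below shows that generic calculation only. It is not Disco's internal code, and whether Disco adds bias terms or clips the result is not stated here; `FactorScore` is a hypothetical name.

```java
// Generic matrix factorization scoring, not Disco's internal code.
class FactorScore {
    // Dot product of a user factor vector and an item factor vector.
    static float dot(float[] userFactors, float[] itemFactors) {
        if (userFactors.length != itemFactors.length) {
            throw new IllegalArgumentException("Factor vectors must have the same length");
        }
        float sum = 0.0f;
        for (int i = 0; i < userFactors.length; i++) {
            sum += userFactors[i] * itemFactors[i];
        }
        return sum;
    }
}
```

More factors give the model more room to capture patterns, at the cost of more memory and training time.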

Progress

Pass a callback to show progress

Recommender<String, String> recommender = Recommender
    .builder()
    .callback((info) -> System.out.printf("%d: %f\n", info.iteration, info.trainLoss))
    .fitExplicit(data);

Note: trainLoss is not available for implicit feedback

Cold Start

Collaborative filtering suffers from the cold start problem. It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.

recommender.userRecs(newUserId, 5); // returns empty array

There are a number of ways to deal with this, but here are some common ones:

  • For user-based recommendations, show new users the most popular items
  • For item-based recommendations, make content-based recommendations
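The first option can be sketched with a simple popularity count that you fall back to when `userRecs` comes back empty. This is a minimal sketch using only the standard library; `PopularItems` is a hypothetical helper and not part of Disco, and it assumes string item IDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Disco: ranks items by how often they appear.
class PopularItems {
    // Returns up to `count` item IDs ordered by number of interactions, most popular first.
    // Ties are broken by item ID so the result is deterministic.
    static List<String> top(List<String> interactedItemIds, int count) {
        Map<String, Integer> freq = new HashMap<>();
        for (String id : interactedItemIds) {
            freq.merge(id, 1, Integer::sum);
        }
        List<String> ids = new ArrayList<>(freq.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(freq.get(b), freq.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ids.subList(0, Math.min(count, ids.size())));
    }
}
```

Pass in the item ID of every interaction in your data (one entry per interaction), and show the result to users who have no history yet.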

Reference

Get ids

recommender.userIds();
recommender.itemIds();

Get the global mean

recommender.globalMean();

Get factors

recommender.userFactors(userId);
recommender.itemFactors(itemId);
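If you store factors yourself, one common way to compare two factor vectors is cosine similarity. This is a minimal sketch that assumes the factors are available as `float[]` arrays, which is not confirmed by this README; `Cosine` is a hypothetical name and not part of Disco.

```java
// Hypothetical helper, not part of Disco: compares two factor vectors.
class Cosine {
    // Cosine similarity in [-1, 1]; returns 0 if either vector has zero length.
    static double similarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // avoid dividing by zero
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.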

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/disco-java.git
cd disco-java
mvn test
