Update README.md #124

Open: wants to merge 1 commit into base `main`.
`projects/twhin/README.md`: 25 changes (15 additions and 10 deletions)

# TwHIN in torchrec

This project contains code for pretraining dense vector embedding features for Twitter entities.
Within Twitter, these embeddings are used for candidate retrieval and as model features in a variety of recommender system models.

We obtain entity embeddings from a variety of graph data within Twitter, such as:
* "User follows User"
* "User favorites Tweet"
* "User clicks Advertisement"
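
As a rough illustration of how typed edges like these can shape embeddings and support candidate retrieval, the sketch below scores an edge with a translation-style operator, (lhs + rel) · rhs, and ranks candidate right-hand-side nodes by that score. The scoring form, the function names `translation_score` and `rank_candidates`, and the toy vectors are assumptions for illustration only; they are not taken from this project's code, and in training the vectors would be learned rather than hard-coded.

```python
from typing import Dict, List, Sequence


def translation_score(lhs: Sequence[float], rel: Sequence[float], rhs: Sequence[float]) -> float:
    """Hypothetical edge score: dot product of (lhs + rel) with rhs; higher = more plausible."""
    if not (len(lhs) == len(rel) == len(rhs)):
        raise ValueError("embedding dimensions must match")
    return sum((h + r) * t for h, r, t in zip(lhs, rel, rhs))


def rank_candidates(
    lhs: Sequence[float],
    rel: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[str]:
    """Hypothetical retrieval helper: order candidate names from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda name: translation_score(lhs, rel, candidates[name]),
        reverse=True,
    )


# Toy 2-d embeddings with made-up values (not trained TwHIN embeddings).
users = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [-1.0, 0.0]}
follows = [0.0, 2.0]

print(translation_score(users["alice"], follows, users["bob"]))  # 2.0
print(rank_candidates(users["alice"], follows, users))  # ['bob', 'alice', 'carol']
```

Ranking by a score against a query built from (lhs + rel) is what lets one embedding table serve several relation types; a production system would use approximate nearest-neighbor search instead of sorting every candidate.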

While we cannot release the graph data used to train TwHIN embeddings due to privacy restrictions, heavily subsampled, anonymized open-source graph data can be used instead:
* https://huggingface.co/datasets/Twitter/TwitterFollowGraph
* https://huggingface.co/datasets/Twitter/TwitterFaveGraph

The code expects Parquet files with three columns, each holding a vocab index that identifies one part of an edge in the graph:
* `lhs`: the left-hand-side node
* `rel`: the relation type
* `rhs`: the right-hand-side node
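
A minimal sketch of turning raw edges into these three index columns follows. It assumes that each entity type (for example users and tweets) has its own contiguous index space and that a relation's `rel` index is its position in a relation list; the relation set, `RELATIONS`, and `encode_edges` are hypothetical names for illustration and are not taken from this project's configs or code.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical relation list: (relation name, lhs entity type, rhs entity type).
# Assumption: a relation's position in this tuple is its `rel` vocab index.
RELATIONS: Tuple[Tuple[str, str, str], ...] = (
    ("follows", "user", "user"),
    ("favorites", "user", "tweet"),
)


def encode_edges(
    raw_edges: Sequence[Tuple[Hashable, str, Hashable]],
    vocabs: Dict[str, Dict[Hashable, int]],
    relations: Sequence[Tuple[str, str, str]] = RELATIONS,
) -> Dict[str, List[int]]:
    """Map (source ID, relation name, target ID) triples to lhs/rel/rhs index columns.

    `vocabs` maps each entity type to a raw-ID -> index dict. Unseen IDs get the
    next free index, so indices stay contiguous within each entity type.
    """
    rel_index = {name: i for i, (name, _, _) in enumerate(relations)}
    columns: Dict[str, List[int]] = {"lhs": [], "rel": [], "rhs": []}
    for src, rel_name, dst in raw_edges:
        rel = rel_index[rel_name]  # raises KeyError for an unknown relation name
        _, lhs_type, rhs_type = relations[rel]
        lhs_vocab = vocabs.setdefault(lhs_type, {})
        rhs_vocab = vocabs.setdefault(rhs_type, {})
        columns["lhs"].append(lhs_vocab.setdefault(src, len(lhs_vocab)))
        columns["rel"].append(rel)
        columns["rhs"].append(rhs_vocab.setdefault(dst, len(rhs_vocab)))
    return columns


vocabs: Dict[str, Dict[Hashable, int]] = {}
edges = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "alice"),
    ("alice", "favorites", "tweet_1"),
]
columns = encode_edges(edges, vocabs)
print(columns)  # {'lhs': [0, 1, 0], 'rel': [0, 0, 1], 'rhs': [1, 0, 0]}
```

The resulting columns could then be written to a Parquet file, for example with `pyarrow.parquet.write_table(pyarrow.table(columns), path)`.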

The location of the data must be specified in the YAML configuration files in `projects/twhin/configs`.


## Workflow