From 31a6d5125b8d14fa3003d44f1edab96c46423429 Mon Sep 17 00:00:00 2001
From: Anirudh NJ
Date: Mon, 3 Apr 2023 11:09:15 +0200
Subject: [PATCH] Update README.md

---
 projects/twhin/README.md | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/projects/twhin/README.md b/projects/twhin/README.md
index ec35507..dbda8ef 100644
--- a/projects/twhin/README.md
+++ b/projects/twhin/README.md
@@ -1,19 +1,24 @@
-Twhin in torchrec
+# Twhin in torchrec

-This project contains code for pretraining dense vector embedding features for Twitter entities. Within Twitter, these embeddings are used for candidate retrieval and as model features in a variety of recommender system models.
+This project contains code for pretraining dense vector embedding features for Twitter entities.
+Within Twitter, these embeddings are used for candidate retrieval and as model features in a variety of recommender system models.

 We obtain entity embeddings based on a variety of graph data within Twitter such as:
- "User follows User"
- "User favorites Tweet"
- "User clicks Advertisement"
+* "User follows User"
+* "User favorites Tweet"
+* "User clicks Advertisement"

-While we cannot release the graph data used to train TwHIN embeddings due to privacy restrictions, heavily subsampled, anonymized open-sourced graph data can used:
-https://huggingface.co/datasets/Twitter/TwitterFollowGraph
-https://huggingface.co/datasets/Twitter/TwitterFaveGraph
+While we cannot release the graph data used to train TwHIN embeddings due to privacy restrictions, heavily subsampled, anonymized open-sourced graph data can used:
+* https://huggingface.co/datasets/Twitter/TwitterFollowGraph
+* https://huggingface.co/datasets/Twitter/TwitterFaveGraph

-The code expects parquet files with three columns: lhs, rel, rhs that refer to the vocab index of the left-hand-side node, relation type, and right-hand-side node of each edge in a graph respectively.
+The code expects parquet files with three columns:
+* lhs
+* rel
+* rhs
+that refer to the vocab index of the left-hand-side node, relation type, and right-hand-side node of each edge in a graph respectively.

-The location of the data must be specified in the configuration yaml files in projects/twhin/configs.
+The location of the data must be specified in the configuration yaml files in `projects/twhin/configs`.

 Workflow