Commit 8e8b3b3

Merge pull request #17 from kay-wong/main
Add some info about TwHIN
2 parents 3385d20 + 035ea5a commit 8e8b3b3

1 file changed

+41
-0
lines changed

README.md

+41
@@ -188,8 +188,49 @@ If a user [opted out of](https://github.com/twitter/the-algorithm/blob/138bb5199
188188
### Hypothetical Architecture Diagram
189189
<img width="487" alt="Simclusters" src="https://user-images.githubusercontent.com/3837836/230660246-6ce676c5-204d-47fa-9909-013565bec142.png">
190190

191+
---
191192

192193
### TwHIN
194+
195+
[TwHIN Code](https://github.com/twitter/the-algorithm-ml/tree/main/projects/twhin), [TwHIN Paper](https://arxiv.org/pdf/2202.05387.pdf)
196+
197+
Twitter's Heterogeneous Information Network (HIN) is a graph whose nodes (vertices) represent multiple entity types and whose edges each represent one of several interaction types between those entities.
198+
199+
The following entity and relation types are represented:
200+
- **Entity Types (Nodes)**: User, Tweet, Advertiser, Ad
201+
- **Relation Types (Edges)**: Follow, Authors, Favourites, Replies, Retweets, Promotes, Clicks
202+
203+
The multi-type, multi-relation network enables the resultant TwHIN embeddings to capture signals such as:
204+
- social signals (follow-graph)
205+
- content engagement signals (tweet, image, and video engagements)
206+
- advertisement engagements
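
As a rough illustration of the multi-type, multi-relation structure described above, the sketch below stores a heterogeneous graph as typed `(source, relation, target)` triples. It is not taken from the TwHIN code: the `Node` and `HeterogeneousGraph` classes and the node IDs are hypothetical, and only the entity and relation type names come from the lists above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Entity and relation types as listed in this section.
ENTITY_TYPES = {"User", "Tweet", "Advertiser", "Ad"}
RELATION_TYPES = {
    "Follow", "Authors", "Favourites", "Replies",
    "Retweets", "Promotes", "Clicks",
}


@dataclass(frozen=True)
class Node:
    """Hypothetical typed node; not a class from the TwHIN repository."""
    entity_type: str  # one of ENTITY_TYPES
    node_id: str      # made-up identifier, e.g. "u1"


class HeterogeneousGraph:
    """Hypothetical container storing typed edges as triples."""

    def __init__(self):
        self.edges = []                       # (source, relation, target)
        self.by_relation = defaultdict(list)  # relation -> [(source, target)]

    def add_edge(self, source, relation, target):
        # Reject anything outside the known entity and relation types.
        for node in (source, target):
            if node.entity_type not in ENTITY_TYPES:
                raise ValueError(f"unknown entity type: {node.entity_type}")
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        self.edges.append((source, relation, target))
        self.by_relation[relation].append((source, target))


graph = HeterogeneousGraph()
alice, bob = Node("User", "u1"), Node("User", "u2")
tweet = Node("Tweet", "t1")
graph.add_edge(alice, "Follow", bob)        # social signal
graph.add_edge(bob, "Authors", tweet)       # authorship
graph.add_edge(alice, "Favourites", tweet)  # content engagement signal
```

Grouping edges by relation (`by_relation`) is one convenient choice here because each relation type carries a different kind of signal; other layouts, such as one adjacency list per relation, would work equally well.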
207+
208+
## TwHIN at Twitter
209+
The TwHIN approach is applied to form two different heterogeneous networks, each centered around a high-coverage relation:
210+
211+
### 1. TwHIN-Follow
212+
[kNN-Embed paper](https://arxiv.org/abs/2205.06205), [Huggingface Dataset](https://huggingface.co/datasets/Twitter/TwitterFollowGraph)
213+
214+
A Twitter HIN centered around a User-User follow graph.
215+
The graph connects User (consumer) nodes to Author (producer) nodes, where an edge represents a user following an author.
216+
- 261M edges and 15.5M vertices
217+
- Max-degree of 900K and min-degree of 5
218+
219+
### 2. TwHIN-Engagement
220+
[MiCRO paper](https://arxiv.org/abs/2210.16271), [Huggingface Dataset](https://huggingface.co/datasets/Twitter/TwitterFaveGraph)
221+
222+
A Twitter HIN centered around a User-Tweet engagement graph.
223+
The graph connects User nodes to Tweet nodes, where an edge represents a "fave" engagement.
224+
- 6.7M user nodes, 13M Tweet nodes, and 283M edges
225+
- For users: max-degree of 100 and min-degree of 1
226+
- For tweets: max-degree of 280K and min-degree of 5
227+
228+
Each of the TwHIN datasets released on Huggingface has been heavily subsampled and anonymized due to privacy restrictions.
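
The max-degree and min-degree figures quoted for these graphs are per-node edge counts. The following minimal sketch shows how such statistics could be computed for a bipartite user-tweet edge list. The `degree_stats` helper and the toy IDs are hypothetical, the actual column layout of the Huggingface datasets is not assumed here, and the edge list is assumed to contain no duplicate edges.

```python
from collections import Counter


def degree_stats(edges):
    """Return (min_degree, max_degree) for each side of a bipartite edge list.

    `edges` is an iterable of (user_id, tweet_id) pairs with no duplicates.
    """
    edges = list(edges)
    user_degree = Counter(user for user, _ in edges)
    tweet_degree = Counter(tweet for _, tweet in edges)
    return {
        "user": (min(user_degree.values()), max(user_degree.values())),
        "tweet": (min(tweet_degree.values()), max(tweet_degree.values())),
    }


# Toy (user, tweet) "fave" edges; the IDs are made up for illustration.
toy_edges = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u3", "t1")]
stats = degree_stats(toy_edges)
print(stats)
```

For the follow graph, where both endpoints are users, the same counting would apply to the follower and followee columns separately, or to their union if an undirected degree is wanted.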
229+
230+
## Training
231+
- [local.yaml](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/config/local.yaml) defines some training configurations
232+
- [machines.yaml](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/machines.yaml) defines the resources for training the TwHIN embeddings. Notably, it specifies 16x A100 GPUs and 1.4 TB of RAM.
233+
193234
---
194235

195236
### RealGraph
