Twitter's Heterogeneous Information Network (HIN) is a graph whose nodes (vertices) represent multiple entity types and whose edges each represent one of many interaction types between those entities.
The following entity and relation types are represented:
- **Entity Types (Nodes)**: User, Tweet, Advertiser, Ad
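As a rough illustration of the structure described above, a heterogeneous graph can be stored as typed nodes plus edges grouped by relation. This is a minimal sketch, not the TwHIN implementation: the `Node` and `HeterogeneousGraph` names are hypothetical, `"fave"` is the only relation name taken from this README, and `"clicks"` is an invented placeholder relation.

```python
from collections import defaultdict
from dataclasses import dataclass

# Entity types listed in this README.
ENTITY_TYPES = ("User", "Tweet", "Advertiser", "Ad")


@dataclass(frozen=True)
class Node:
    """A typed node; hypothetical helper, not part of the TwHIN code."""
    entity_type: str
    node_id: int


class HeterogeneousGraph:
    """Toy HIN: edges are stored per relation type as (source, target) pairs."""

    def __init__(self):
        self._edges = defaultdict(set)  # relation name -> {(src, dst), ...}

    def add_edge(self, src, relation, dst):
        # Reject nodes whose type is not one of the known entity types.
        for node in (src, dst):
            if node.entity_type not in ENTITY_TYPES:
                raise ValueError(f"unknown entity type: {node.entity_type}")
        self._edges[relation].add((src, dst))

    def neighbors(self, src, relation):
        """Return sorted target ids reachable from `src` via `relation`."""
        return sorted(d.node_id for s, d in self._edges[relation] if s == src)


graph = HeterogeneousGraph()
alice = Node("User", 1)
graph.add_edge(alice, "fave", Node("Tweet", 10))
graph.add_edge(alice, "fave", Node("Tweet", 11))
graph.add_edge(alice, "clicks", Node("Ad", 7))  # "clicks" is a made-up relation
print(graph.neighbors(alice, "fave"))
```

Keeping edges grouped by relation makes it cheap to treat each interaction type as its own subgraph, which is one common way to handle multiple relation types.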
A Twitter HIN centered around a User-Tweet engagement graph.
Graph of User to Tweet nodes, where an edge represents a "fave" engagement
- 6.7M user nodes, 13M Tweet nodes, and 283M edges
- For users: max-degree of 100 and min-degree of 1
- For tweets: max-degree of 280k and min-degree of 5.
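The degree figures above can be checked against an edge list with a small counting pass. The sketch below assumes the graph is available as an iterable of `(user_id, tweet_id)` pairs; that layout and the `degree_stats` helper are assumptions for illustration, not the format of the released dataset. For scale, 283M edges over 6.7M users works out to roughly 42 faves per user on average.

```python
from collections import Counter


def degree_stats(edges):
    """Return (min, max) degree for users and tweets in a bipartite edge list.

    `edges` is an iterable of (user_id, tweet_id) pairs; this layout is an
    assumption made for the example.
    """
    user_degree = Counter()
    tweet_degree = Counter()
    for user_id, tweet_id in edges:
        user_degree[user_id] += 1
        tweet_degree[tweet_id] += 1
    if not user_degree:
        raise ValueError("edge list is empty")
    return {
        "users": (min(user_degree.values()), max(user_degree.values())),
        "tweets": (min(tweet_degree.values()), max(tweet_degree.values())),
    }


# Toy data: user 1 faves two tweets, tweet 10 is faved by three users.
toy_edges = [(1, 10), (1, 11), (2, 10), (3, 10)]
stats = degree_stats(toy_edges)
print(stats)
```

On the full 283M-edge graph a single in-memory pass like this may be too slow or too large; a dataframe or streaming aggregation would be the more practical choice there.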
Each of the TwHIN datasets released on Hugging Face has been heavily subsampled and anonymized due to privacy restrictions.
## Training
- [local.yaml](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/config/local.yaml) defines some training configurations
- [machines.yaml](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/machines.yaml) defines the resources for training the TwHIN embeddings. Notably, it specifies 16x A100 GPUs and 1.4TB of RAM.