This repository contains code for the benchmark tasks on the RLKWiC dataset.
- In-context prediction
  Classify whether each event belongs to the current work context.
- KWA label prediction
  Classify what type of work the session represents (e.g., information retrieval, learning, administrative tasks).
- Relevant Entity estimation
  Estimate how related knowledge entities are to the user's interests based on the work history so far.
- Domain prediction
  Predict the next domain to access (e.g., web search, email, internal tools).
- Event prediction
  Predict the next event (title) the user is likely to look up.
- Application prediction
  Predict the next application (tool) to be used.
- Download the dataset:
  https://purl.org/RLKWiC
- Download the label data:
  https://purl.org/entity-recommendation-on-rlkwic

Place them under data/p1...p8 and data/label as shown below:
data
├── p1/
│   ├── contexts.csv
│   ├── sessions.csv
├── p2/
│   ...
└── label/
    ├── Recommendations.csv
    └── Scores.csv
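
Before running any conversion step, it can help to confirm that the files landed where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, it assumes the participants are exactly p1 through p8, and it only checks the files named in the tree above.

```python
from pathlib import Path

# Assumption: participants are p1..p8, as suggested by "data/p1...p8".
PARTICIPANTS = [f"p{i}" for i in range(1, 9)]
# Only the files named in the layout are checked; other files are ignored.
PARTICIPANT_FILES = ["contexts.csv", "sessions.csv"]
LABEL_FILES = ["Recommendations.csv", "Scores.csv"]


def missing_paths(data_root):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    expected = [root / p / name for p in PARTICIPANTS for name in PARTICIPANT_FILES]
    expected += [root / "label" / name for name in LABEL_FILES]
    return [path for path in expected if not path.exists()]
```

Calling `missing_paths("data")` from the repository root returns an empty list when every listed file is in place.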
Change directory to data.
- Convert the label file
  - Script:
    python comb_order_recom.py
  - Output:
    label/recommendations_ordered.json
- Convert CSV files to JSON
  - Script:
    bash comb_json.sh
  - Output:
    p*/json_files
- Generate metadata
  - Script:
    python make_metadata.py
  - Output:
    metadata.json
Change directory to data/label.
- For task 3, prepare the DBpedia abstracts.
  - Script:
    python dbpedia_abstract.py
  - Output:
    entity_abstract.json
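
The preprocessing steps above can be chained from a single driver so that they always run in the documented order and from the documented directory. This is only a sketch under assumptions: the function `run_preprocessing` and its `dry_run` flag are hypothetical, and it assumes the scripts live in the directories the steps say to change into.

```python
import subprocess
from pathlib import Path

# (working directory relative to the repository root, command), in README order.
STEPS = [
    ("data", ["python", "comb_order_recom.py"]),
    ("data", ["bash", "comb_json.sh"]),
    ("data", ["python", "make_metadata.py"]),
    ("data/label", ["python", "dbpedia_abstract.py"]),  # only needed for task 3
]


def run_preprocessing(repo_root=".", dry_run=False):
    """Run each step from its directory, stopping at the first failure."""
    planned = []
    for workdir, cmd in STEPS:
        cwd = Path(repo_root) / workdir
        planned.append((str(cwd), cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, cwd=cwd, check=True)
    return planned
```

Calling `run_preprocessing(dry_run=True)` returns the planned directories and commands without executing anything, which is a cheap way to review the order first.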
- Tasks 1 & 2
python run_classification_task.py
- Task 3
python run_score_dbpedia.py
- Tasks 4–6
bash run_seq_recom.py
@inproceedings{RLKWiC_benchmark,
author = {Yuuki Tachioka},
title = {Benchmarking Predictive Models for Knowledge Work Productivity on the {RLKWiC} Dataset},
booktitle = {Proceedings of the Fifth Workshop on Recommender Systems for Human Resources @ 19th ACM Conference on Recommender Systems (RecSys 2025)},
year = {2025},
month = {9},
}
- p7: the contents of terms.csv and stemterm.csv are swapped
- session 4 is missing for p2
- some spos are missing

| p  | spoid |
|----|-------|
| p1 | {1056, 1053, 1005} |
| p2 | {739, 718} |
| p3 | {832, 755} |
| p4 | {752, 743} |
| p6 | {743, 746, 810, 816, 721, 724, 853, 727} |
| p8 | {840, 753, 747, 785} |

- some clipboards are truncated
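
When loading the data, the known issues listed above can be encoded once and consulted wherever they matter. The sketch below is illustrative only: the names `MISSING_SPOS`, `SWAPPED`, `is_known_missing`, and `actual_filename` are hypothetical and not part of the repository, and the spo ids are copied from the list above.

```python
# Missing spo ids per participant, copied from the known-issues list.
MISSING_SPOS = {
    "p1": {1056, 1053, 1005},
    "p2": {739, 718},
    "p3": {832, 755},
    "p4": {752, 743},
    "p6": {743, 746, 810, 816, 721, 724, 853, 727},
    "p8": {840, 753, 747, 785},
}

# For p7, terms.csv holds the stemterm content and vice versa.
SWAPPED = {"p7": {"terms.csv": "stemterm.csv", "stemterm.csv": "terms.csv"}}


def is_known_missing(participant, spo_id):
    """True if this spo id is documented as missing for the participant."""
    return spo_id in MISSING_SPOS.get(participant, set())


def actual_filename(participant, wanted):
    """Return the file that actually holds the content of `wanted`."""
    return SWAPPED.get(participant, {}).get(wanted, wanted)
```

For example, `actual_filename("p7", "terms.csv")` returns `"stemterm.csv"`, while any other participant gets the requested name back unchanged.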