About

This repository contains the code for the benchmark tasks on the RLKWiC dataset.

Task Types

In-context prediction
Classify whether each event belongs to the current work context.
KWA label prediction
Classify what type of work the session represents (e.g., information retrieval, learning, administrative tasks).
Relevant entity estimation
Estimate how related knowledge entities are to the user’s interests based on the work history so far.
Domain prediction
Predict the next domain to access (e.g., web search, email, internal tools).
Event prediction
Predict the next event (title) the user is likely to look up.
Application prediction
Predict the next application (tool) to be used.

Dataset Preparation

Download the dataset:
https://purl.org/RLKWiC
Download the label data:
https://purl.org/entity-recommendation-on-rlkwic

Place them under data/p1...p8 and data/label as shown below:

data
├── p1/
│   ├── contexts.csv
│   ├── sessions.csv
├── p2/
│   ...
└── label/
    ├── Recommendations.csv
    └── Scores.csv
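
Before running the conversion scripts, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `missing_paths` is a hypothetical helper, and it assumes every participant folder p1 to p8 contains the same two CSV files listed for p1 (the tree only shows p1 in full).

```python
from pathlib import Path

# File names taken from the tree above. Only p1 is listed in full there,
# so requiring the same two CSV files for p2..p8 is an assumption.
PARTICIPANT_FILES = ("contexts.csv", "sessions.csv")
LABEL_FILES = ("Recommendations.csv", "Scores.csv")


def missing_paths(data_dir, participants=range(1, 9)):
    """Return the expected files that are absent under data_dir (hypothetical helper)."""
    root = Path(data_dir)
    expected = [
        root / f"p{i}" / name
        for i in participants
        for name in PARTICIPANT_FILES
    ]
    expected += [root / "label" / name for name in LABEL_FILES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_paths("data"):
        print(f"missing: {path}")
```

An empty result means all files named in the tree were found; the dataset may ship additional files that this check ignores.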

Change directory to data, then run the following steps.

Convert the label file
- Script: python comb_order_recom.py
- Output: label/recommendations_ordered.json
Convert CSV files to JSON
- Script: bash comb_json.sh
- Output: p*/json_files
Generate metadata
- Script: python make_metadata.py
- Output: metadata.json
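
The three steps above can also be driven from a single Python script. This is a sketch under assumptions: `run_steps` is a hypothetical wrapper, and it assumes the scripts are run in the listed order with data as the working directory.

```python
import subprocess
from pathlib import Path

# Commands and outputs as listed above. Running them in this order from
# the data directory is an assumption based on the order of the list.
STEPS = [
    (["python", "comb_order_recom.py"], "label/recommendations_ordered.json"),
    (["bash", "comb_json.sh"], "p*/json_files"),
    (["python", "make_metadata.py"], "metadata.json"),
]


def run_steps(data_dir="data", dry_run=False):
    """Run each preparation step inside data_dir, stopping at the first failure."""
    executed = []
    for command, output in STEPS:
        print(f"{' '.join(command)}  ->  {output}")
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(command, cwd=Path(data_dir), check=True)
        executed.append(command)
    return executed
```

Calling `run_steps("data", dry_run=True)` only prints the commands, which is a cheap way to review the plan before executing it.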

Change directory to data/label.

For Task 3, prepare the DBpedia abstracts.
- Script: python dbpedia_abstract.py
- Output: entity_abstract.json
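
Once the preparation steps have run, the expected outputs can be checked in one pass. The sketch below is illustrative only: `check_outputs` is a hypothetical helper, paths are relative to data, and entity_abstract.json is assumed to be written to data/label because the script is run from that directory.

```python
from pathlib import Path

# Output locations inferred from the steps above, relative to data/.
# entity_abstract.json is assumed to land in data/label.
EXPECTED_OUTPUTS = [
    "label/recommendations_ordered.json",
    "metadata.json",
    "label/entity_abstract.json",
]


def check_outputs(data_dir="data"):
    """Map each expected output to True if it exists under data_dir."""
    root = Path(data_dir)
    status = {rel: (root / rel).is_file() for rel in EXPECTED_OUTPUTS}
    # p*/json_files is a pattern; the steps above do not say whether it
    # names a file or a folder, so any match counts as present.
    status["p*/json_files"] = any(root.glob("p*/json_files"))
    return status


if __name__ == "__main__":
    for name, present in check_outputs("data").items():
        print(f"{'ok     ' if present else 'missing'} {name}")
```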

Running Experiments

Tasks 1 & 2
```
python run_classification_task.py
```
Task 3
```
python run_score_dbpedia.py
```
Tasks 4–6
```
bash run_seq_recom.sh
```
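
For scripted runs, the task-to-command mapping can be written down explicitly. This is a sketch, not the authors' tooling: `command_for` is a hypothetical helper, the task numbers are assumed to follow the order of the Task Types list (1 = in-context prediction through 6 = application prediction), and Tasks 4 to 6 are assumed to use the shell script run_seq_recom.sh.

```python
# Task numbering is assumed to follow the order of the Task Types list.
CLASSIFICATION = ["python", "run_classification_task.py"]
ENTITY_SCORING = ["python", "run_score_dbpedia.py"]
# Assumption: the sequential tasks are launched via the shell script.
SEQUENTIAL = ["bash", "run_seq_recom.sh"]

TASK_COMMANDS = {
    1: CLASSIFICATION,
    2: CLASSIFICATION,
    3: ENTITY_SCORING,
    4: SEQUENTIAL,
    5: SEQUENTIAL,
    6: SEQUENTIAL,
}


def command_for(task):
    """Return the command for a task number, or raise for unknown tasks."""
    if task not in TASK_COMMANDS:
        raise ValueError(f"unknown task: {task!r} (expected 1-6)")
    return list(TASK_COMMANDS[task])
```

The returned list can be passed directly to `subprocess.run(command_for(3), check=True)` from the repository root.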

Citation

@inproceedings{RLKWiC_benchmark,
author = {Yuuki Tachioka},
title = {Benchmarking Predictive Models for Knowledge Work Productivity on the {RLKWiC} Dataset},
booktitle = {Proceedings of the Fifth Workshop on Recommender Systems for Human Resources @ 19th ACM Conference on Recommender Systems (RecSys 2025)},
year = {2025},
month = {9},
}

Known issues in the RLKWiC dataset (unrelated to our experiments)

p7: the contents of terms.csv and stemterm.csv are swapped
Session 4 is missing for p2
Some spos are missing:

p	spoid
p1	{1056, 1053, 1005}
p2	{739, 718}
p3	{832, 755}
p4	{752, 743}
p6	{743, 746, 810, 816, 721, 724, 853, 727}
p8	{840, 753, 747, 785}

Some clipboards are truncated
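
When loading the dataset, the missing spo ids listed above can be kept in a lookup so that dangling references are skipped rather than treated as errors. This is a minimal sketch: `is_known_missing` is a hypothetical helper, and the ids are copied from the list above.

```python
# Missing spo ids per participant, copied from the known-issues list above.
MISSING_SPO_IDS = {
    "p1": {1056, 1053, 1005},
    "p2": {739, 718},
    "p3": {832, 755},
    "p4": {752, 743},
    "p6": {743, 746, 810, 816, 721, 724, 853, 727},
    "p8": {840, 753, 747, 785},
}


def is_known_missing(participant, spo_id):
    """Return True if spo_id is listed as missing for the given participant."""
    return spo_id in MISSING_SPO_IDS.get(participant, set())
```

For example, `is_known_missing("p2", 739)` is true, while the same id for a participant without an entry (such as p5) is not flagged.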

