DensoITLab/RLKWiC_benchmark

About

This repository contains the code for the benchmark tasks on the RLKWiC dataset.

Task Types

  1. In-context prediction
    Classify whether each event belongs to the current work context.
  2. KWA label prediction
    Classify what type of work the session represents (e.g., information retrieval, learning, administrative tasks).
  3. Relevant entity estimation
    Estimate how related knowledge entities are to the user’s interests based on the work history so far.
  4. Domain prediction
    Predict the next domain to access (e.g., web search, email, internal tools).
  5. Event prediction
    Predict the next event (title) the user is likely to look up.
  6. Application prediction
    Predict the next application (tool) to be used.

Dataset Preparation

Place the RLKWiC dataset files under data/p1...p8 and the label files under data/label, as shown below:

data
├── p1/
│   ├── contexts.csv
│   ├── sessions.csv
├── p2/
│   ...
└── label/
    ├── Recommendations.csv
    └── Scores.csv
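
Before running the conversion scripts, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_files is hypothetical, and it checks only the file names visible in the tree (it assumes every participant directory p1 to p8 holds contexts.csv and sessions.csv, which the tree shows explicitly only for p1).

```python
from pathlib import Path

# Assumption: only the file names visible in the tree above are checked;
# the "..." in the tree may hide further files that this sketch ignores.
PARTICIPANTS = [f"p{i}" for i in range(1, 9)]  # p1 ... p8
PARTICIPANT_FILES = ["contexts.csv", "sessions.csv"]
LABEL_FILES = ["Recommendations.csv", "Scores.csv"]


def missing_files(data_root):
    """Return the expected files that are absent under data_root, in check order."""
    root = Path(data_root)
    expected = [root / p / name for p in PARTICIPANTS for name in PARTICIPANT_FILES]
    expected += [root / "label" / name for name in LABEL_FILES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Run from the repository root so that "data" resolves correctly.
    for path in missing_files("data"):
        print(f"missing: {path}")
```

An empty result means every file named in the tree was found; it does not validate the file contents.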

Change directory to data and run the following steps:

  1. Convert the label file
    • Script: python comb_order_recom.py
    • Output: label/recommendations_ordered.json
  2. Convert CSV files to JSON
    • Script: bash comb_json.sh
    • Output: p*/json_files
  3. Generate metadata
    • Script: python make_metadata.py
    • Output: metadata.json

Change directory to data/label:

  • For Task 3, prepare the DBpedia abstracts.
    • Script: python dbpedia_abstract.py
    • Output: entity_abstract.json
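
The preparation steps above can also be driven from a single script. The sketch below is an illustration only and is not shipped with the repository: the names PREPARATION_STEPS and run_preparation are hypothetical, and it assumes it is launched from the repository root so that data and data/label resolve as working directories.

```python
import subprocess

# Each entry is (working directory, command), copied from the steps above.
# Assumption: paths are relative to the repository root.
PREPARATION_STEPS = [
    ("data", ["python", "comb_order_recom.py"]),
    ("data", ["bash", "comb_json.sh"]),
    ("data", ["python", "make_metadata.py"]),
    ("data/label", ["python", "dbpedia_abstract.py"]),  # needed for Task 3 only
]


def run_preparation(steps=PREPARATION_STEPS, dry_run=False):
    """Run each step in order and stop at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of (working directory, command) pairs handled.
    """
    handled = []
    for cwd, command in steps:
        print(f"[{cwd}] {' '.join(command)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, cwd=cwd, check=True)
        handled.append((cwd, command))
    return handled
```

Calling run_preparation(dry_run=True) first prints the commands without touching the data, which is a cheap way to confirm the order before a real run.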

Running Experiments

  • Tasks 1 & 2
    python run_classification_task.py
  • Task 3
    python run_score_dbpedia.py
  • Tasks 4–6
    bash run_seq_recom.py

Citation

@inproceedings{RLKWiC_benchmark,
author = {Yuuki Tachioka},
title = {Benchmarking Predictive Models for Knowledge Work Productivity on the {RLKWiC} Dataset},
booktitle = {Proceedings of the Fifth Workshop on Recommender Systems for Human Resources @ 19th ACM Conference on Recommender Systems (RecSys 2025)},
year = {2025},
month = {9},
}

The RLKWiC dataset has the following known issues; they are unrelated to our experiments.

  • p7: the contents of terms.csv and stemterm.csv are swapped
  • p2: session 4 is missing
  • some spos are missing
    p spoid
    p1 {1056, 1053, 1005}
    p2 {739, 718}
    p3 {832, 755}
    p4 {752, 743}
    p6 {743, 746, 810, 816, 721, 724, 853, 727}
    p8 {840, 753, 747, 785}
  • some clipboards are truncated
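
When loading the dataset, the missing spo IDs listed above can be kept in a lookup table so that known gaps are told apart from new, unexpected ones. This is a minimal sketch under that assumption; the names MISSING_SPO_IDS, is_known_missing, and unexpected_missing are hypothetical and do not come from the repository.

```python
# Missing spo IDs per participant, copied from the table above.
# Participants without an entry (p5, p7) have no missing IDs listed.
MISSING_SPO_IDS = {
    "p1": {1056, 1053, 1005},
    "p2": {739, 718},
    "p3": {832, 755},
    "p4": {752, 743},
    "p6": {743, 746, 810, 816, 721, 724, 853, 727},
    "p8": {840, 753, 747, 785},
}


def is_known_missing(participant, spo_id):
    """Return True if spo_id is listed as missing for this participant."""
    return spo_id in MISSING_SPO_IDS.get(participant, set())


def unexpected_missing(participant, unresolved_ids):
    """Return the unresolved IDs that are NOT covered by the known-issues table."""
    return set(unresolved_ids) - MISSING_SPO_IDS.get(participant, set())
```

For example, unexpected_missing("p2", [739, 999]) leaves only 999, since 739 is already a documented gap for p2.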
