Authors: Jekaterina Novikova, Ondřej Dušek and Verena Rieser
This repository contains the dataset and code released with our NAACL 2018 paper "RankME: Reliable Human Ratings for Natural Language Generation".
The crowdflower folder contains the instructions and the CML, CSS and JS code used in the CrowdFlower tasks.
The data folder contains data files with the human evaluation ratings collected via CrowdFlower.
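As a minimal sketch of how the collected ratings could be inspected (the file path data/setup_1_rankme.csv and the column names informativeness, naturalness and quality below are hypothetical placeholders, not the actual layout of the released files; check the data folder for the real file and column names):

    # Minimal sketch: load one ratings file and report per-criterion averages.
    # NOTE: the file path and column names are hypothetical placeholders --
    # adjust them to match the actual files shipped in the data folder.
    import pandas as pd

    ratings = pd.read_csv("data/setup_1_rankme.csv")

    # Average crowd rating for each evaluation criterion, if present.
    for criterion in ("informativeness", "naturalness", "quality"):
        if criterion in ratings.columns:
            print(f"{criterion}: {ratings[criterion].mean():.2f}")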
Setup 1 corresponds to the experimental setup in which the human evaluation ratings of informativeness, naturalness and quality are collected together. Following the paper, the folder crowdflower/setup_1 contains code for the three versions of Setup 1: Likert, PlainME and RankME. Screenshots of the corresponding CrowdFlower tasks are shown in Fig. 1:
Fig. 1. Screenshots of the three methods used with Setup 1 to collect human evaluation data. Left to right: Likert, PlainME and RankME.
Setup 2 corresponds to the experimental setup in which the human evaluation ratings of informativeness, naturalness and quality are collected separately. The folder crowdflower/setup_2 provides CrowdFlower code for the three collection methods (Likert, PlainME and RankME) for each rating criterion (informativeness, naturalness and quality). Screenshots of the RankME method in Setup 2 for each criterion are shown in Fig. 2:
Fig. 2. Screenshots of the RankME method in Setup 2 used to collect human evaluation data. Left to right: informativeness, naturalness, quality.
If you use this code or data in your work, please cite the following paper:
@inproceedings{novikova2018rankME,
  title={Rank{ME}: Reliable Human Ratings for Natural Language Generation},
  author={Novikova, Jekaterina and Du{\v{s}}ek, Ond{\v{r}}ej and Rieser, Verena},
  booktitle={Proceedings of the 16th Annual Conference of the North American Chapter
             of the Association for Computational Linguistics},
  address={New Orleans, Louisiana},
  pages={72--78},
  year={2018},
  url={http://aclweb.org/anthology/N18-2012},
}
Distributed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
This research received funding from the EPSRC projects DILiGENt (EP/M005429/1) and MaDrIgAL (EP/N017536/1).