Authors: Jekaterina Novikova, Ondřej Dušek and Verena Rieser
This repository contains the dataset and code released with our NAACL 2018 paper "RankME: Reliable Human Ratings for Natural Language Generation".
The crowdflower folder contains the instructions and the CML, CSS and JS code used in the CrowdFlower tasks.
The data folder contains data files with the human evaluation ratings collected via CrowdFlower.
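As a minimal sketch of how the collected ratings could be inspected (the file path data/setup_1_rankme.csv and the column names informativeness, naturalness and quality below are hypothetical placeholders, not the actual layout of the released files; check the data folder for the real file and column names):

    # Minimal sketch: load one ratings file and report per-criterion averages.
    # NOTE: the file path and column names are hypothetical placeholders --
    # adjust them to match the actual files shipped in the data folder.
    import pandas as pd

    ratings = pd.read_csv("data/setup_1_rankme.csv")

    # Average crowd rating for each evaluation criterion, if present.
    for criterion in ("informativeness", "naturalness", "quality"):
        if criterion in ratings.columns:
            print(f"{criterion}: {ratings[criterion].mean():.2f}")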
Setup 1 corresponds to the experimental setup in which the human evaluation ratings of informativeness, naturalness and quality are collected together. Following the paper, the folder crowdflower/setup_1 contains code for the three versions of Setup 1: Likert, PlainME and RankME. Screenshots of the corresponding CrowdFlower tasks are shown in Fig. 1:
Fig. 1. Screenshots of the three methods used with Setup 1 to collect human evaluation data. Left to right: Likert, PlainME and RankME.
Setup 2 corresponds to the experimental setup in which the human evaluation ratings of informativeness, naturalness and quality are collected separately. The folder crowdflower/setup_2 provides CrowdFlower code for the three collection methods (Likert, PlainME and RankME) for each rating criterion (informativeness, naturalness and quality). Screenshots of the RankME method in Setup 2 for each criterion are shown in Fig. 2:
Fig. 2. Screenshots of the RankME method in Setup 2 used to collect human evaluation data. Left to right: informativeness, naturalness, quality.
If you use this code or data in your work, please cite the following paper:
@inproceedings{novikova2018rankME,
  title={Rank{ME}: Reliable Human Ratings for Natural Language Generation},
  author={Novikova, Jekaterina and Du{\v{s}}ek, Ond{\v{r}}ej and Rieser, Verena},
  booktitle={Proceedings of the 16th Annual Conference of the North American Chapter
             of the Association for Computational Linguistics},
  address={New Orleans, Louisiana},
  pages={72--78},
  year={2018},
  url={http://aclweb.org/anthology/N18-2012},
}
Distributed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
This research received funding from the EPSRC projects DILiGENt (EP/M005429/1) and MaDrIgAL (EP/N017536/1).