
Code & Data for the paper "RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models"



Welcome to our GitHub! We're team Meta 1A, a group of five women studying computer science and hoping to make a difference in the world through machine learning. In this project, we developed a BERT model that detects bias in the language of Reddit comments. By building this model, we hope to help reduce bias in language-based models by assessing the quality of the datasets they are trained on. Below, you'll find the documentation from the original repository we forked from @umanlp, who collected the data for public use. Thank you, @umanlp!

To run our model yourself, follow these steps:

  1. Click on the file labeled "final_reddit_bias.ipynb". It contains our Google Colab notebook with the model.
  2. Click the "Download raw file" button in the top right of the notebook.
  3. Go to Google Colab or Kaggle and upload the file to a new notebook.
  4. Next, download the data we used for the model. You can download the entire dataset as the folder "data": right-click on it and choose "Download Linked File" from the options that appear, and the data files should show up in your recent downloads. Note: This may look slightly different on a Windows laptop.
  5. Upload the data files to your new notebook on Google Colab/Kaggle. On Google Colab, use the Files button in the left sidebar; on Kaggle, use the Input tab at the top.
  6. Run all the cells and watch the model work! Follow along with the descriptions to get a feel for how/why we're doing each step.

Feel free to look over the documentation from @umanlp for technical details about the data we used. Thanks for stopping by!


This repository contains the code and data for bias evaluation with RedditBias (to appear at ACL21). The code for the debiasing approaches and the conversational downstream evaluation can be found here:

Privacy & Ethics

RedditBias is created from real-world conversations. To protect the users whose comments are included in our data set, we have removed all identifying information (e.g., user names) and kept only the text needed for our analysis. However, if you find your text in our data set and feel misrepresented by its inclusion, please reach out to us with the following information: the comment to be removed and your Reddit username. Thank you!

How to Use this Code

For bias evaluation with RedditBias, please use Evaluation/. The rest of the code in this repository documents the data set creation and offers other useful functions.

Data Preparation

The data preparation code is in the DataPreparation directory.

The following scripts should be run sequentially to generate the data required to debias (fine-tune) models and evaluate them.

  • DataPreparation/ -> Retrieves raw Reddit comments using query matching (target group words and attribute words)
  • DataPreparation/reddit_data_process -> Processes the retrieved comments
  • DataPreparation/reddit_data_phrases -> Generates phrases from processed Reddit comments
  • Create manual bias annotations and generate file 'reddit_comments_gender_female_processed_phrase_annotated.csv'
  • DataPreparation/ -> Extracts biased phrases and creates counter target data
  • DataPreparation/ -> Creates train test split of biased phrases
  • evaluation/ -> Removes outliers from test set and creates reduced test set
  • DataPreparation/ -> Creates valid-test split of the reduced test set
  • DataPreparation/ -> Creates counter target augmented data
  • DataPreparation/ -> Creates counter attribute data
  • DataPreparation/ -> Creates test files of counter attribute augmented data
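The counter target step swaps each Target set #1 term in a phrase for its Target set #2 counterpart (for example Muslims -> Christians), a form of counterfactual data augmentation. The sketch below is illustrative only: it assumes whole-word, case-insensitive matching with the target pairs already loaded in memory, and `counter_target` / `build_swap_pattern` are hypothetical names, not functions from DataPreparation/.

```python
import re
from typing import Dict, List, Tuple


def build_swap_pattern(pairs: List[Tuple[str, str]]):
    """Compile one whole-word, case-insensitive regex over all Target set #1 terms."""
    mapping: Dict[str, str] = {src.lower(): dst.lower() for src, dst in pairs}
    # Longer terms first so a multi-word target wins over its own prefix.
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern, mapping


def counter_target(text: str, pairs: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: replace each Target set #1 term with its Target set #2 counterpart."""
    pattern, mapping = build_swap_pattern(pairs)

    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = mapping[word.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        return replacement[0].upper() + replacement[1:] if word[0].isupper() else replacement

    return pattern.sub(swap, text)
```

Sorting longest-first only matters for multi-word targets; single-word variants such as "muslim" and "muslims" are already kept apart by the `\b` boundaries.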

The data generated by these steps is in the data/demographic and text_files/demographic directories, where 'demographic' is gender, orientation, race, religion1, or religion2. The .txt files in text_files/ are used for training, validation, and evaluation when fine-tuning the DialoGPT model with the debiasing methods.

A brief description of the files in data/religion2:

  • religion2_muslims.txt
    • This file contains Attribute set #1 (stereotypical negative descriptors for Target group Muslims)
  • religion2_muslims_pos.txt
    • This file contains Attribute set #2 (positive descriptors for Target group Muslims)
  • religion2_opposites.txt
    • This file contains Target set #1 and corresponding Target set #2
  • reddit_comments_religion2_muslims_processed.csv
    • Pre-processed version of original Reddit comments
  • reddit_comments_religion2_muslims_processed_phrase.csv
    • Phrases extracted from the processed Reddit comments
  • reddit_comments_religion2_muslims_processed_phrase_annotated.csv
    • Manual bias annotations for Reddit comments and phrases
  • reddit_comments_religion2_christians_biased_test_reduced.csv and reddit_comments_religion2_muslims_biased_test_reduced.csv
    • These files are the test split of the annotated Reddit phrases, used for the bias evaluation measure (Language Model Bias).
  • reddit_comments_religion2_christians_biased_valid_reduced.csv and reddit_comments_religion2_muslims_biased_valid_reduced.csv
    • These files are the validation split of the annotated Reddit phrases, used for validation while training DialoGPT with the debiasing methods.
  • reddit_comments_religion2_muslims_processed_phrase_biased_testset_neg_attr_reduced.csv and reddit_comments_religion2_muslims_processed_phrase_biased_testset_pos_attr_reduced.csv
    • These files are the test split of Reddit phrases used for bias evaluation over contrasting attributes for the marginalised demographic.
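The attribute and target files above are plain text. The loader sketch below assumes one term per line in the attribute files (e.g. religion2_muslims.txt) and one comma-separated Target set #1 / Target set #2 pair per line in religion2_opposites.txt; the delimiter is an assumption, so check the actual files before relying on it. Both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def load_terms(lines: Iterable[str]) -> List[str]:
    """Read an attribute file, assuming one term per line; blank lines are skipped."""
    return [line.strip() for line in lines if line.strip()]


def load_opposites(lines: Iterable[str], sep: str = ",") -> List[Tuple[str, str]]:
    """Read a target-pair file, assuming each line is '<target #1><sep><target #2>'."""
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        if not line.strip():
            continue
        first, second = line.split(sep, 1)
        pairs.append((first.strip(), second.strip()))
    return pairs
```

Because both functions accept any iterable of lines, an open file works directly, e.g. `with open("data/religion2/religion2_opposites.txt") as f: pairs = load_opposites(f)`.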

Note: The unprocessed Reddit comment files could not be uploaded to GitHub due to size constraints. Find them on

Language Model Bias (Significance test Bias evaluation)

  • Evaluation/

    • This script performs a Student's t-test on the perplexity distributions of two sentence groups with contrasting targets. For example, it works on the files reddit_comments_religion2_christians_biased_test_reduced.csv and reddit_comments_religion2_muslims_biased_test_reduced.csv. Set the variable 'REDUCE_SET' to remove outliers from the target set; unset 'REDUCE_SET' if you are already using a reduced test set as input.
  • Evaluation/

    • This script performs a Student's t-test on the perplexity distributions of two sentence groups with contrasting attributes. For example, it works on the files reddit_comments_religion2_muslims_processed_phrase_unbiased_testset_pos_attr_reduced.csv and reddit_comments_religion2_muslims_processed_phrase_biased_testset_neg_attr_reduced.csv. Set the variable 'REDUCE_SET' to remove outliers from the target set; unset 'REDUCE_SET' if you are already using a reduced test set as input.
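Both evaluation scripts come down to three computations: per-sentence perplexity, optional outlier removal (what 'REDUCE_SET' toggles), and a t-test between the two groups. A pure-Python sketch follows, under stated assumptions: the helper names are hypothetical, per-token natural-log probabilities are taken as given (obtaining them from DialoGPT is outside this sketch), Tukey's IQR rule stands in for whatever outlier criterion the scripts actually use, and the pooled-variance two-sample form is one reading of "Student's t-test".

```python
import math
import statistics
from typing import List, Sequence, Tuple


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity of one sentence: exp of the negative mean natural-log token probability."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def remove_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Assumed criterion (Tukey): drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df
```

For a p-value, `scipy.stats.ttest_ind(a, b)` computes the same pooled-variance statistic and also returns a two-sided p-value. If each counter-target sentence mirrors one original sentence, a paired test such as `scipy.stats.ttest_rel` may be the closer fit; which variant the scripts use is not stated here.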

Generate response from models

  • Decoding/ -> Generates pre-trained model responses from a context
  • Decoding/ -> Creates token ids of attribute words
  • Decoding/ -> Creates token ids of target words
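Token ids for target and attribute words need some care because byte-level BPE tokenizers such as GPT-2's, which DialoGPT uses, encode a word differently with and without a leading space. This sketch collects both variants per word; `word_token_ids` is a hypothetical helper, and `encode` is any callable that maps a string to a list of token ids.

```python
from typing import Callable, Dict, Iterable, List


def word_token_ids(words: Iterable[str], encode: Callable[[str], List[int]]) -> Dict[str, List[List[int]]]:
    """Map each word to the token-id sequences of its bare and space-prefixed forms."""
    result: Dict[str, List[List[int]]] = {}
    for word in words:
        variants: List[List[int]] = []
        for form in (word, " " + word):
            ids = list(encode(form))
            # Skip duplicates in case the tokenizer treats both forms the same.
            if ids not in variants:
                variants.append(ids)
        result[word] = variants
    return result
```

With Hugging Face Transformers, `encode` could be `AutoTokenizer.from_pretrained("microsoft/DialoGPT-small").encode`; the model size here is an assumption, not necessarily the one used in Decoding/.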

