Skip to content

Code portion of my master's thesis on a hate speech detection method using ML and a telegram bot

Notifications You must be signed in to change notification settings

Nathan-Yorio/tgbot-hate-speech

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hate-speech-detection-method-for-telegram

Code portion of masters thesis on a machine-learning focused content detection method for Telegram

Dataset Citations

Dataset: The Pushshift Telegram Dataset
Author: Baumgartner, Jason; Zannettou, Savvas; Squire, Megan; Blackburn, Jeremy
Source: Zenodo
License: CC BY 4.0
Date of Publication: Jan 14, 2020
Version: v1
Date Accessed: 2024-01-07
Identifier: 10.5281/zenodo.3607496 @ https://zenodo.org/doi/10.5281/zenodo.3607496


Dataset: Labeled Hate Speech Detection Dataset
Author: Cooke, Shane (2022)
Source: Figshare
License: CC BY 4.0
Date of Publication: 2022-04-30, 03:25
Version: v1
Date Accessed: 2024-01-14
Identifier: https://doi.org/10.6084/m9.figshare.19686954.v1


Dataset: A Benchmark Dataset for Learning to Intervene in Online Hate Speech
Author: Qian et al.
Source: Github, Arxiv
License: CC BY 4.0
Date of Publication: Tue, 10 Sep 2019 03:00:58 UTC
Version: v1
Date Accessed: 2024-04
Identifier: https://doi.org/10.48550/arXiv.1909.04251 Github link


Dataset: A Curated Hate Speech Dataset
Author: Mody et al.
Source: Mendeley Data
License: CC BY 4.0
Date of Publication: 3 Oct 2022
Version: v1
Date Accessed: 2024-01 - 2024-04
Identifier: 10.17632/9sxpkmm8xn.1, https://data.mendeley.com/datasets/9sxpkmm8xn/1, Article link


Machine Learning Model Citations

Model: facebook/roberta-hate-speech-dynabench-r4-target
Author: Vidgen et al.
Source: Huggingface
Date of Publication: Thu, 31 Dec 2020 17:36:48 UTC
Version: v1 presumed
Analysis Software: Python, with PyTorch and Transformers libraries
Date Accessed: 2024-01 - 2024-04
Identifier: https://doi.org/10.48550/arXiv.2012.15761
License: Not including specific license attached on HuggingFace
Research article simply includes intended use-case and Arxiv citation request if model is used:
"The approach, dataset and models presented here are intended to support more accurate and robust detection and classification of online hate. We anticipate that the high-quality and fine-grained labels in the dataset will advance research in online hate in other ways, such as enabling multiclass classification of types and targets of online hate." (Vidgen et al., 2020)

@inproceedings{vidgen2021lftw,
  title={Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection},
  author={Bertie Vidgen and Tristan Thrush and Zeerak Waseem and Douwe Kiela},
  booktitle={ACL},
  year={2021}
}

Model: cardiffnlp/twitter-roberta-base-hate-latest
Author: Loureiro et al.
Source: Huggingface, v1
Date of Publication: Fri, 1 Apr 2022 15:01:53 UTC
Version: v2
Analysis Software: Python, with PyTorch and Transformers libraries
Date Accessed: 2024-01 - 2024-04
Identifier: https://doi.org/10.48550/arXiv.2202.03829 , https://arxiv.org/pdf/2202.03829
License: MIT

@inproceedings{antypas-camacho-collados-2023-robust,
    title = "Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation",
    author = "Antypas, Dimosthenis  and
      Camacho-Collados, Jose",
    booktitle = "The 7th Workshop on Online Abuse and Harms (WOAH)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.woah-1.25",
    pages = "231--242"
}

** This one appears to actually be based on two different papers, one for dataset processing and one for model construction


Model: Andrazp/multilingual-hate-speech-robacofi
Author: Barbieri et al.
Source: Huggingface
Date of Publication: Wed, 11 May 2022 08:06:39 UTC
Version: v2
Analysis Software: Python, with PyTorch and Transformers libraries
Date Accessed: 2024-01 - 2024-04
Identifier: https://doi.org/10.48550/arXiv.2104.12250
License: MIT


About

Code portion of my master's thesis on a hate speech detection method using ML and a telegram bot

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published