SNLI-Attack-Analysis

This repo implements the experiment described in "Unpacking the Resilience of SNLI Contradiction Examples to Attacks".

Pre-trained models excel on NLI benchmarks like SNLI and MultiNLI, but it is unclear how much genuine language understanding this reflects: models trained only on hypotheses and labels still achieve high accuracy, indicating reliance on dataset biases and spurious correlations. To probe these vulnerabilities, we applied universal adversarial trigger attacks to the model. Our analysis revealed substantial accuracy drops for the entailment and neutral classes, whereas the contradiction class declined far less. Fine-tuning the model on a dataset augmented with adversarial examples restored performance to near-baseline levels on both the standard and challenge sets. Our findings highlight the value of adversarial triggers for surfacing spurious correlations and improving robustness, while providing insight into the resilience of the contradiction class to adversarial attacks.

Setup for starters_code

  • Clone this repo and change into the starter-code directory
git clone https://github.com/ckvermaAI/NLP_project.git
cd NLP_project/starters_code
  • Install the requirements
pip install -r requirements.txt
  • Train and evaluate the model
# Training (checkpoints will be saved under output_dir)
python3 run.py --do_train --task nli --dataset snli --output_dir ./trained_model/

# Evaluation
python3 run.py --do_eval --task nli --dataset snli --model ./trained_model/ --output_dir ./eval_output/
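
The starter code trains a standard Hugging Face sequence-classification model, so the checkpoint saved under --output_dir can be sanity-checked directly with transformers. A minimal sketch, assuming the Hugging Face snli label order (0 = entailment, 1 = neutral, 2 = contradiction); check your checkpoint's config if unsure:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the checkpoint produced by the training command above.
tokenizer = AutoTokenizer.from_pretrained("./trained_model/")
model = AutoModelForSequenceClassification.from_pretrained("./trained_model/")
model.eval()

# Premise and hypothesis are encoded together as one sequence pair.
inputs = tokenizer("A man is playing a guitar on stage.",
                   "A man is performing music.",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()
print({0: "entailment", 1: "neutral", 2: "contradiction"}[pred])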

Setup for universal_triggers

  • Clone the AllenNLP fork and install it
git clone https://github.com/ckvermaAI/allennlp-fork.git
pip install allennlp-fork/
  • Run the script
python universal_triggers/triggers.py

Generating the triggers

# Use the HotFlip attack to generate universal triggers

# 1) Attack on entailment class, to flip the label to contradiction
python triggers.py --label_filter entailment --target_label 1 2>&1 | tee hotflip/entailment-contradiction.log
# 2) Attack on entailment class, to flip the label to neutral
python triggers.py --label_filter entailment --target_label 2 2>&1 | tee hotflip/entailment-neutral.log

# 3) Attack on contradiction class, to flip the label to entailment
python triggers.py --label_filter contradiction --target_label 0 2>&1 | tee hotflip/contradiction-entailment.log
# 4) Attack on contradiction class, to flip the label to neutral
python triggers.py --label_filter contradiction --target_label 2 2>&1 | tee hotflip/contradiction-neutral.log

# 5) Attack on neutral class, to flip the label to entailment
python triggers.py --label_filter neutral --target_label 0 2>&1 | tee hotflip/neutral-entailment.log
# 6) Attack on neutral class, to flip the label to contradiction
python triggers.py --label_filter neutral --target_label 1 2>&1 | tee hotflip/neutral-contradiction.log
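
The --target_label indices follow the model's label vocabulary, as the comments above indicate (0 = entailment, 1 = contradiction, 2 = neutral). Under the hood, triggers.py searches for trigger tokens with the HotFlip first-order approximation of Wallace et al. (2019). A schematic sketch of the candidate-scoring step, with illustrative names rather than the fork's exact API:

import torch

def hotflip_candidates(averaged_grad, embedding_matrix, num_candidates=40):
    """Rank replacement tokens for one trigger position.

    averaged_grad: (embed_dim,) gradient of the loss w.r.t. that trigger
        token's embedding, averaged over a batch of target-class examples.
    embedding_matrix: (vocab_size, embed_dim) input embedding table.
    """
    # First-order Taylor approximation: swapping in token i changes the
    # loss by roughly e_i . grad, so the best flips minimize this score.
    scores = embedding_matrix @ averaged_grad            # (vocab_size,)
    return torch.topk(-scores, num_candidates).indices   # lowest first

# The outer search loop initializes the trigger (e.g., "the the the"),
# proposes candidates per position with this step, and keeps whichever
# replacement drops accuracy on the targeted class the most.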

Creating the dataset with generated triggers

  1. Use ./universal_triggers/build_dataset.py to generate the augmented dataset
  2. Update the inputs to the create_dataset function as needed and run the script (a sketch of the augmentation step follows below)
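
In essence, the augmentation prepends a trigger string to the hypothesis of every example in the attacked class. A hypothetical sketch, assuming SNLI examples stored as JSONL records with premise, hypothesis, and label fields; the real inputs are defined in build_dataset.py:

import json

def create_dataset(src_path, dst_path, trigger, label_filter):
    # Illustrative stand-in for build_dataset.py's create_dataset:
    # prepend `trigger` to the hypothesis of every example whose gold
    # label matches `label_filter`, and write the result as JSONL.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            example = json.loads(line)
            if example["label"] == label_filter:
                example["hypothesis"] = trigger + " " + example["hypothesis"]
            dst.write(json.dumps(example) + "\n")

# Example: attack entailment examples with the trigger "nobody"
# (file names, trigger, and label index here are placeholders).
create_dataset("snli_train.jsonl", "snli_train_augmented.jsonl",
               trigger="nobody", label_filter=0)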

Source code

Starter code

Pulled from https://github.com/gregdurrett/fp-dataset-artifacts

Universal triggers

Pulled from https://github.com/Eric-Wallace/universal-triggers/
