Pre-trained models excel on NLI benchmarks such as SNLI and MultiNLI, but whether they genuinely understand language remains uncertain: models trained only on hypotheses and labels still achieve high accuracy, indicating reliance on dataset biases and spurious correlations. To probe this issue, we applied universal adversarial trigger attacks to expose the model's vulnerabilities. Our analysis revealed substantial drops in accuracy for the entailment and neutral classes, whereas the contradiction class exhibited a smaller decline. Fine-tuning the model on a dataset augmented with adversarial examples restored performance to near-baseline levels on both the standard and challenge sets. Our findings highlight the value of adversarial triggers for identifying spurious correlations and improving robustness, while offering insight into the resilience of the contradiction class to adversarial attacks.
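At attack time, a universal trigger is simply a short, input-agnostic token sequence prepended to every hypothesis. The minimal sketch below illustrates that step with a made-up trigger; real triggers are found by the HotFlip search described later, not hand-picked.

```python
def apply_trigger(hypothesis, trigger_tokens):
    """Prepend a universal trigger to a hypothesis before it is fed to the model."""
    return " ".join(list(trigger_tokens) + [hypothesis])

# Illustrative trigger tokens only; actual triggers come from the HotFlip search.
attacked = apply_trigger("A man is sleeping.", ["nobody", "never"])
# attacked == "nobody never A man is sleeping."
```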
- Clone this repo and switch the working directory

```shell
git clone https://github.com/ckvermaAI/NLP_project.git
cd ./starters_code
```

- Install the requirements

```shell
pip install -r requirements.txt
```

- Run the model

```shell
# Training (checkpoints will be saved under output_dir)
python3 run.py --do_train --task nli --dataset snli --output_dir ./trained_model/
# Evaluation
python3 run.py --do_eval --task nli --dataset snli --model ./trained_model/ --output_dir ./eval_output/
```

- Clone the allennlp fork and install the requirements for universal_triggers

```shell
git clone https://github.com/ckvermaAI/allennlp-fork.git
pip install allennlp-fork/
```

- Run the script
```shell
python universal_triggers/triggers.py
```

```shell
# Use the hotflip attack to generate universal triggers
# 1) Attack on entailment class, to flip the label to contradiction
python triggers.py --label_filter entailment --target_label 1 2>&1 | tee hotflip/entailment-contradiction.log
# 2) Attack on entailment class, to flip the label to neutral
python triggers.py --label_filter entailment --target_label 2 2>&1 | tee hotflip/entailment-neutral.log
# 3) Attack on contradiction class, to flip the label to entailment
python triggers.py --label_filter contradiction --target_label 0 2>&1 | tee hotflip/contradiction-entailment.log
# 4) Attack on contradiction class, to flip the label to neutral
python triggers.py --label_filter contradiction --target_label 2 2>&1 | tee hotflip/contradiction-neutral.log
# 5) Attack on neutral class, to flip the label to entailment
python triggers.py --label_filter neutral --target_label 0 2>&1 | tee hotflip/neutral-entailment.log
# 6) Attack on neutral class, to flip the label to contradiction
python triggers.py --label_filter neutral --target_label 1 2>&1 | tee hotflip/neutral-contradiction.log
```

- Use the ./universal_triggers/build_dataset.py to generate the dataset
- Update the inputs to the create_dataset function as required and run the script
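The augmentation step above can be sketched as follows. The numeric label ids are inferred from the hotflip commands (verify against the trained model's label order), and the field names, helper function, and trigger string are illustrative assumptions, not the actual create_dataset interface.

```python
# Label ids as implied by the --target_label values in the hotflip commands
# above (an assumption; check the model config before relying on it).
LABELS = {0: "entailment", 1: "contradiction", 2: "neutral"}

def augment_with_triggers(examples, triggers):
    """Append one copy of each example per trigger, with the trigger
    prepended to the hypothesis. The gold label is kept unchanged, so
    fine-tuning teaches the model to ignore the trigger tokens."""
    augmented = list(examples)
    for trig in triggers:
        for ex in examples:
            augmented.append({
                "premise": ex["premise"],
                "hypothesis": f"{trig} {ex['hypothesis']}",
                "label": ex["label"],
            })
    return augmented

# Illustrative usage with a placeholder trigger.
data = [{"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": 0}]
augmented = augment_with_triggers(data, ["nobody"])
```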
- Starter code pulled from https://github.com/gregdurrett/fp-dataset-artifacts
- Universal triggers code pulled from https://github.com/Eric-Wallace/universal-triggers/