This is the repository for our EMNLP 2023 main conference paper "Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus" [Arxiv] [Poster] [PPT].
Taking the llama-30b-SFT version as an example, to reproduce our results on the wiki_bio_gpt3_hallucination dataset, you can run:

```bash
python run_wikibiogpt3.py --model_path ausboss/llama-30b-supercot --low_cpu_mem_usage
```
To reproduce our results on the XSumFaith dataset, you can run:

```bash
python run_xsum.py --model_path ausboss/llama-30b-supercot --low_cpu_mem_usage
```
To reproduce our results on the FRANK dataset, you can run:

```bash
python run_frank.py --model_path ausboss/llama-30b-supercot --low_cpu_mem_usage
```
Here is an example of how to run the code on the wikibio task with your own data using LLaMA (v1) family models; you can modify the source code (e.g. the prompt) to fit your own scenario.

```bash
python inference.py --model_path ausboss/llama-30b-supercot --task wikibio --only_keyword --use_penalty --add_type --use_idf --use_entropy --gamma 0.9 --rho 0.01
```
- `task`: currently only supports `wikibio`.
- `only_keyword`: only consider keywords when calculating the hallucination score.
- `use_penalty`: whether to use penalty transmission.
- `add_type`: whether to provide entity type information before each named entity.
- `use_entropy`: whether to include token entropy in the hallucination score.
- `use_idf`: whether to weight tokens by their inverse document frequency.
- `model_path`: path to the proxy model, or the model name on Hugging Face.
- `gamma`: the discount factor used when accumulating the penalty; defaults to 0.9 (see the sketch below).
- `rho`: the threshold below which low-probability tokens are removed; defaults to 0.01.
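To build intuition for how `gamma` and `rho` interact, here is a minimal sketch of a discounted-penalty score. This is not the actual implementation in this repo; the function name, scoring formula, and example data are illustrative only.

```python
import math

def sentence_hallucination_score(token_probs, gamma=0.9, rho=0.01):
    """Illustrative only: fold per-token probabilities into one score.

    token_probs: list of (token, probability) pairs for one sentence.
    gamma: discount applied when carrying the penalty to later tokens.
    rho: tokens whose probability falls below rho are dropped.
    """
    penalty, scores = 0.0, []
    for _token, p in token_probs:
        if p < rho:
            continue  # discard extremely low-probability tokens
        surprisal = -math.log(p)               # higher = more suspicious
        penalty = gamma * penalty + surprisal  # discounted accumulation
        scores.append(penalty)
    return sum(scores) / len(scores) if scores else 0.0

# A confident sentence scores lower than an uncertain one:
print(sentence_hallucination_score([("the", 0.9), ("sky", 0.8)]))   # ~0.21
print(sentence_hallucination_score([("the", 0.9), ("moon", 0.05)])) # ~1.60
```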
We provide several example inputs in `inference.py`; you can replace them with your own data.
For the wikibio task you should provide:

- `concept`: the Wikipedia passage concept.
- `response`: the model output to be evaluated.
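For instance, an input pair might look like the following (a hypothetical example; the actual variable names and placement inside `inference.py` may differ):

```python
# Hypothetical wikibio-task input; replace with your own data in inference.py.
concept = "Marie Curie"  # the Wikipedia passage concept
response = (
    "Marie Curie was a Polish-born physicist and chemist. "
    "She was the first person to win two Nobel Prizes."
)
```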
The outputs are a list of sentence-level hallucination scores `[(sentence_id, score)]` and a passage-level hallucination score (a `float`), normalized to [0, 1].
When you run the command above, the following results are expected (using 4 NVIDIA A10 GPUs):

```
[(0, 0.24670052861789984), (1, 0.29810068306299564), (2, 0.2951899944143249), (3, 0.3141602069019362), (4, 0.30233789540108125)]
0.2904945467183812
[(0, 0.47496516642470016), (1, 0.6360287223580757), (2, 0.5964681300706745), (3, 0.6218195278510681), (4, 0.5845075986061048), (5, 0.6956721858189342), (6, 0.7153816822922959), (7, 0.8139274048758866)]
0.6488059482008401
```
Since the original wiki_bio_gpt3_hallucination dataset does not provide the corresponding concept information for each passage, we searched for the corresponding row in the wikibio dataset based on `wiki_bio_test_idx` and concatenated it into the following prompt:

```
f"This is a passage from Wikipedia about {concept}"
```
We have saved the results in `data/wikibio_gpt3_v3.pkl`.
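If you want to redo this lookup yourself, a rough sketch with the Hugging Face `datasets` library could look like the following; the split names and the `input_text`/`context` field layout are assumptions based on the public wiki_bio and wiki_bio_gpt3_hallucination dataset cards, not code from this repo:

```python
from datasets import load_dataset

# Split names and field layout assumed from the public dataset cards.
wiki_bio = load_dataset("wiki_bio", split="test")
halu = load_dataset("potsawee/wiki_bio_gpt3_hallucination", split="evaluation")

for row in halu:
    idx = row["wiki_bio_test_idx"]
    # The wiki_bio "context" field holds the article's concept/title.
    concept = wiki_bio[idx]["input_text"]["context"].strip()
    prompt = f"This is a passage from Wikipedia about {concept}"
```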
We provide a script `utils/count_token_frequency.py` to calculate the IDF of each token in the provided tokenizer's vocabulary. Taking RedPajama as an example, you can run:

```bash
python count_token_frequency.py --tokenizer togethercomputer/RedPajama-INCITE-7B-Base
```
This will save the calculated token IDF file in the `token_frequency_data` folder, where we have also provided the token IDF files used in our experiments.
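For reference, token IDF is conventionally computed as in the minimal sketch below; this follows the standard smoothed-IDF definition and is not necessarily the exact logic of `count_token_frequency.py`:

```python
import math
from collections import Counter

from transformers import AutoTokenizer

def token_idf(corpus, tokenizer_name="togethercomputer/RedPajama-INCITE-7B-Base"):
    """Sketch: smoothed IDF of each token id over a corpus of documents."""
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenizer.encode(doc)))  # count once per document
    n_docs = len(corpus)
    # Rare tokens get high weight, common tokens low weight.
    return {tok: math.log(n_docs / (1 + df)) for tok, df in doc_freq.items()}
```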
- python = 3.8
- pytorch >= 1.9.0
- transformers >= 4.28.1
- tokenizers >= 0.13.3
- spacy >= 3.5.1
- accelerate >= 0.18.0
*When running with llama-65b (float16), please ensure that at least 140 GiB of GPU memory is available (more is needed for longer input sequence lengths).
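One common way to spread a model of this size across several GPUs is the `device_map="auto"` loading path from `transformers`/`accelerate`; the sketch below assumes this is how you load the proxy model (the checkpoint name is the one used in the commands above), which the `--low_cpu_mem_usage` flag suggests but this repo's loading code may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Shards the float16 weights across all visible GPUs via accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "ausboss/llama-30b-supercot",  # or a llama-65b checkpoint
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("ausboss/llama-30b-supercot")
```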
```bibtex
@article{zhang2023enhancing,
  title={Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus},
  author={Zhang, Tianhang and Qiu, Lin and Guo, Qipeng and Deng, Cheng and Zhang, Yue and Zhang, Zheng and Zhou, Chenghu and Wang, Xinbing and Fu, Luoyi},
  journal={arXiv preprint arXiv:2311.13230},
  year={2023}
}
```