Code for "Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation" (arXiv)
Packages you might need:
simple-disk-queue: Used to store and run tasks.
persist_to_disk: Used to cache experiment results (i.e., the @ptd.persistf decorators and ptd.manual_cache calls).
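For reference, persist_to_disk caching works roughly as below: decorating a function with @ptd.persistf() memoizes its return value on disk, keyed by the arguments, so re-running an experiment with the same arguments loads the cached result instead of recomputing. The function name and arguments here are hypothetical, just to illustrate the pattern:

import persist_to_disk as ptd

@ptd.persistf()
def expensive_eval(model, dataset):
    # Runs once per (model, dataset); later calls with the same
    # arguments load the cached result from disk instead.
    result = {"model": model, "dataset": dataset}  # placeholder computation
    return result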
First, set the corresponding paths under "Step 1" in _settings.py.
Use llama2-13b, gemma-7b, or mistralai/Mistral-7B-v0.1 for the model, and coqa_new, triviaqa_new, or nq_open_new for the dataset below.
python -m pipeline.generate --model llama2-13b --dataset coqa_new
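To cover all model/dataset combinations listed above, you can loop over the same command; a minimal driver sketch (it simply shells out to pipeline.generate once per pair):

import itertools
import subprocess

MODELS = ["llama2-13b", "gemma-7b", "mistralai/Mistral-7B-v0.1"]
DATASETS = ["coqa_new", "triviaqa_new", "nq_open_new"]

for model, dataset in itertools.product(MODELS, DATASETS):
    # Equivalent to: python -m pipeline.generate --model <model> --dataset <dataset>
    subprocess.run(
        ["python", "-m", "pipeline.generate", "--model", model, "--dataset", dataset],
        check=True,  # abort the sweep if any run fails
    )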
Update GEN_PATHS in _settings.py for the next steps.
(You can find the exact generations we used in our paper in "output".)
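The exact schema of GEN_PATHS is defined by the repo; purely as an illustration (the structure and paths below are hypothetical assumptions), it maps each dataset and model to the corresponding generation file:

# _settings.py -- hypothetical sketch; match the structure the repo actually expects.
GEN_PATHS = {
    "coqa_new": {
        "llama2-13b": "/path/to/output/coqa_new/llama2-13b/generations.pkl",
    },
    # ... one entry per (dataset, model) pair you generated above
}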
First, add all tasks to a queue on disk, by running
python -m scripts.dq_add
Then, run the actual computation via the following (in sequence). You can specify the device(s) to use via -d [device_numbers].
python -m scripts.dq_work -q qAll_1 -d 1
python -m scripts.dq_work -q qAll_2 -d 1
python -m scripts.dq_work -q qMult -d 0,1,2 # This runs a 70B model, so it might require more GPUs
python -m scripts.dq_work -q qAPI # This queue has only GPT API calls, so no GPU is needed
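For context, scripts.dq_add and scripts.dq_work implement a producer/consumer pattern on top of simple-disk-queue: tasks are persisted to disk once, then workers claim and run them until the queue is empty, which is why work can be resumed or shared across machines. The sketch below is not the package's actual API, just the underlying pattern in plain Python:

import os
import pickle
import uuid

QUEUE_DIR = "/tmp/demo_queue"  # hypothetical location; the real package manages its own

def add_task(func, args):
    # Producer (what scripts.dq_add does conceptually): persist one task per file.
    os.makedirs(QUEUE_DIR, exist_ok=True)
    path = os.path.join(QUEUE_DIR, uuid.uuid4().hex + ".pkl")
    with open(path, "wb") as f:
        pickle.dump((func, args), f)

def work():
    # Consumer (what scripts.dq_work does conceptually): claim and run tasks until empty.
    for name in sorted(os.listdir(QUEUE_DIR)):
        path = os.path.join(QUEUE_DIR, name)
        claimed = path + ".claimed"
        try:
            os.rename(path, claimed)  # atomic claim, so parallel workers don't collide
        except OSError:
            continue  # another worker grabbed this task first
        with open(claimed, "rb") as f:
            func, args = pickle.load(f)
        func(*args)
        os.remove(claimed)  # task done; remove it from the queue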
The previous computation can be skipped by downloading our cache from the link in "persist_to_disk".
Run python -m test so that the persist_to_disk package automatically creates a cache folder that looks like /path/persist_to_disk/cache/ContextSL-1/test.
Put all the contents of "persist_to_disk cache" under /path/persist_to_disk/cache/ContextSL-1.
Once you have downloaded the cache, run python -m scripts.dq_add to confirm that all queues are empty.
After all queues have finished, you can optionally run the following to cache some summary results.
python -m pipeline.uq
python -m scripts.cache
Now you can run notebook/demo.ipynb (or other notebooks).