How are prompts being picked up? #24
I am trying to experiment with prompts, and I'm unable to check whether the system is picking up my changed prompts (the `base_config` for my experiment) in the `prompts/` folder.
The following is my experiment script. I am using only prompt_set 1 and skipping the test evals for quicker prototyping:

```bash
#!/usr/bin/env bash
# Expected command line argument values.
valid_systems=("ircot" "ircot_qa" "oner" "oner_qa" "nor_qa")
valid_models=("codex" "flan-t5-xxl" "flan-t5-xl" "flan-t5-large" "flan-t5-base" "none")
valid_datasets=("hotpotqa" "2wikimultihopqa" "musique" "iirc")
# Function to check that an argument is one of the allowed values.
check_argument() {
    local arg="$1"
    local position="$2"
    local valid_values=("${!3}")  # indirect expansion of the array named by the caller
    local system="$4"             # system argument, needed for the model check below
    if ! [[ " ${valid_values[*]} " =~ " $arg " ]]; then
        echo "Argument number $position is not valid. Please provide one of: ${valid_values[*]}"
        exit 1
    fi
    # The model may be 'none' only when the system is 'oner'.
    if [[ $position -eq 2 && $arg == "none" && $system != "oner" ]]; then
        echo "The model argument can be 'none' only if the system argument is 'oner'."
        exit 1
    fi
}
# Check the number of arguments
if [[ $# -ne 3 ]]; then
    echo "Error: Invalid number of arguments. Expected format: ./reproduce.sh SYSTEM MODEL DATASET"
    exit 1
fi
# Check the validity of arguments
check_argument "$1" 1 valid_systems[*]
check_argument "$2" 2 valid_models[*]
check_argument "$3" 3 valid_datasets[*]
echo ">>>> Instantiate experiment configs with different HPs and write them in files. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1
echo ">>>> Run experiments for different HPs on the dev set. <<<<"
python runner.py $1 $2 $3 predict --prompt_set 1
echo ">>>> Show results for experiments with different HPs <<<<"
python runner.py $1 $2 $3 summarize --prompt_set 1
echo ">>>> Pick the best HP and save the config with that HP. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1 --best
```
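For reference, the script is then invoked with the three positional arguments, e.g. `./reproduce.sh ircot_qa flan-t5-xl hotpotqa` (this particular combination is just an illustration; any values from the valid lists above should work).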
@HarshTrivedi did you get a chance to look at this issue? Any help is greatly appreciated!
@mohdsanadzakirizvi Sorry for the late response. What you are doing seems correct. But to see whether the prompt is affected or not, you should put a breakpoint/print statement elsewhere. Put a breakpoint here and ensure that self.prompt is what you expect it to be. I left some notes on how to navigate the code/flow better here some time ago; they should help you figure out where the relevant code is based on the config. Lastly, note that the command
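To make that suggestion concrete, here is a minimal sketch of the kind of check being described. The class below is a toy stand-in, not the repository's actual code; in the real codebase the print/breakpoint would go wherever the linked line in inference/ircot.py sets self.prompt:

```python
class PromptedGenerator:
    """Toy stand-in for a generator that carries a prompt template."""

    def __init__(self, prompt: str):
        self.prompt = prompt

    def generate(self, question: str) -> str:
        # The check being suggested: confirm that the prompt actually loaded
        # is your edited one, before any generation happens.
        print("PROMPT HEAD >>>", self.prompt[:200])
        # breakpoint()  # uncomment to inspect self.prompt interactively
        return f"{self.prompt}\n\nQ: {question}\nA:"


if __name__ == "__main__":
    generator = PromptedGenerator(prompt="<contents of your edited prompt file>")
    generator.generate("Who wrote Hamlet?")
```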
Thanks for your response! Another question I had: how do you pick up the dev and eval sets? I have seen the subsampled files in the folders, but where do you read them? It seems to me that inference/ircot.py's "StepByStepCOT..." only gets executed when I call predict on the eval set, which doesn't make sense. Shouldn't it also run during predict on the dev set? If so, why am I not getting any print in the output? Are we not making an LLM prediction on the dev set unless eval is involved?
Just to clarify: if you leave out the additional test-eval flags, predict runs on the dev set. It seems you already know this. So the real question is why, when you drop these additional flags, you still don't get any log/breakpoint hit in inference/ircot.py.
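Since the subsampled dev/test files came up above, one quick way to see what a dev run would consume is to read the subsampled file directly. This is only a sketch: the path below and the JSON-Lines format are assumptions about the repo's data layout, so point it at whichever subsampled file you actually see in your checkout:

```python
import json

# Hypothetical path -- adjust to the subsampled dev file in your checkout;
# the exact layout is an assumption, not the repo's documented structure.
path = "processed_data/hotpotqa/dev_subsampled.jsonl"

with open(path) as f:
    for i, line in enumerate(f):
        instance = json.loads(line)  # assumes one JSON object per line (JSONL)
        print(i, sorted(instance.keys()))
        if i >= 2:
            break
```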