Thank you for your work. I have a question about the Correct Format metric reported for the BFCL-v3 benchmark in Table 1 of the paper. Could you explain how this metric was evaluated, and do you plan to open-source the relevant code to facilitate follow-up work?