
How to reproduce the performance on the ReDial dataset? #8

Open
dandyxxxx opened this issue Nov 4, 2023 · 4 comments

Comments

@dandyxxxx

dandyxxxx commented Nov 4, 2023

I trained with the code provided on GitHub, but since the dataset link you provided cannot be opened, I used the mappingbased-objects_lang=en_202112.ttl dump instead. The final results of my training are as follows:

conv:
'test/dist@2': 0.310709750246931, 'test/dist@3': 0.49851841399746016, 'test/dist@4': 0.6383519119514605
rec:
'test/recall@1': 0.029324894514767934, 'test/recall@10': 0.16729957805907172, 'test/recall@50': 0.37953586497890296

(1) These results differ greatly from those presented in the paper. Could you give me some guidance? I hope to reproduce results similar to yours. Thank you very much.
(2) According to your paper, do I need to set --n_prefix_conv 50 in train_conv.py and --use_resp in train_rec.py?

@linshan-79

I ran into the same problem as you. After the author provided the missing files, I trained by following the guidance, but my metric results are also similar to yours. Here are the details of the ReDial dataset metrics:

conv:

'test/dist@2': 0.26710879074361504, 'test/dist@3': 0.4199238041484408, 'test/dist@4': 0.5233526174686045,

rec:

'test/recall@1': 0.035443037974683546, 'test/recall@10': 0.1729957805907173, 'test/recall@50': 0.3744725738396624, 
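For reference on what these numbers measure: recall@k here is the fraction of test examples whose ground-truth item appears in the model's top-k recommendations. A minimal sketch in Python (hypothetical helper names, not the repo's evaluation code):

    def recall_at_k(ranked_items, gold_item, k):
        # 1.0 if the ground-truth item is in the top-k ranked list, else 0.0
        return float(gold_item in ranked_items[:k])

    def mean_recall_at_k(all_rankings, all_gold, k):
        # average the per-example hit over the whole test set
        hits = [recall_at_k(r, g, k) for r, g in zip(all_rankings, all_gold)]
        return sum(hits) / len(hits)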

Here is my config for the conversational task:

accelerate launch train_conv.py \
           --dataset redial \
           --tokenizer ~/model/DialoGPT-small \
           --model ~/model/DialoGPT-small \
           --text_tokenizer ~/model/roberta-base \
           --text_encoder ~/model/roberta-base \
           --n_prefix_conv 50 \
           --prompt_encoder ${prompt_encoder_dir}/final \
           --num_train_epochs 10 \
           --gradient_accumulation_steps 1 \
           --ignore_pad_token_for_loss \
           --per_device_train_batch_size 8 \
           --per_device_eval_batch_size 16 \
           --num_warmup_steps 6345 \
           --context_max_length 200 \
           --resp_max_length 183 \
           --prompt_max_length 200 \
           --entity_max_length 32 \
           --learning_rate 1e-4 \
           --output_dir ${output_dir} \
           --log_all
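One thing worth sanity-checking in this config is how --num_warmup_steps 6345 compares to the total number of optimizer steps. A minimal sketch in Python, assuming single-process training and a placeholder training-set size (replace n_train with your actual count):

    # n_train is a placeholder, NOT the verified size of the ReDial conv training set
    n_train = 65_000
    batch_size = 8        # --per_device_train_batch_size (single GPU assumed)
    grad_accum = 1        # --gradient_accumulation_steps
    epochs = 10           # --num_train_epochs

    steps_per_epoch = n_train // (batch_size * grad_accum)
    total_steps = steps_per_epoch * epochs
    warmup_steps = 6345   # --num_warmup_steps

    print(f"warmup covers {warmup_steps / total_steps:.1%} of {total_steps} steps")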

(1) @dandyxxxx, you can see that I set --n_prefix_conv 50, but the results still don't match the paper. Could you share your configuration details? Maybe we can work together to solve the problem. Thank you very much!
(2) @wxl1999 Thanks for your work! As a beginner, I learned a lot from your paper and code. I suspect the issue might be that the KG module is not correctly capturing relations from the dataset. Could you provide some guidance? Thank you very much!

@wxl1999
Owner

wxl1999 commented Nov 27, 2023

Sorry for the late reply!

  • The pre-training stage is very important for the final performance. You should observe very good Recall@50 during pre-training, since the answer is actually provided in the response.
    [screenshot: Recall@50 curve]
    • If your pre-training is not going well, you will not observe a continuous drop in the loss curve.
      [screenshot: loss curve]
  • Once your pre-training is well conducted, you will observe similar performance on the recommendation task after fine-tuning.
  • As for the conversation task, distinct is not a very reliable metric (you can observe continuous gains if you simply keep training), so I suggest not paying too much attention to it and focusing more on human evaluation instead; this is also the practice for large language models. A sketch of how this metric is typically computed follows this list.
  • About the evaluation of conversational recommendation, you can also refer to this paper: Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models.
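A minimal sketch in Python of one common corpus-level variant of dist@n (unique n-grams divided by total n-grams across all generated responses); the repo's exact implementation may differ:

    def dist_n(responses, n):
        # ratio of unique n-grams to total n-grams over all generated responses
        total, unique = 0, set()
        for resp in responses:
            tokens = resp.split()
            ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
            total += len(ngrams)
            unique.update(ngrams)
        return len(unique) / total if total else 0.0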

Hope this can help you!

@linshan-79

Thanks for your reply! This helps me a lot.

@careerists

@linshan-79 I have the same problem. Did you finally solve it? Thank you so much.
