Issue: Fine-tuning PE-Core Models with OpenCLIP
Thank you for your excellent work on the PE-Core models and for open-sourcing them!
In the documentation, you only refer to using the OpenCLIP framework for training and evaluating the PE-Core encoder models. We're attempting contrastive fine-tuning of these models (e.g., `PE-Core-L14-336`) using OpenCLIP with custom image-caption datasets, but have encountered a few challenges:
1. Model Registration in OpenCLIP
PE-Core models (e.g., `PE-Core-L14-336`) aren't registered directly in OpenCLIP:

`open_clip_train.main --model PE-Core-L14-336`

This results in `RuntimeError(f'Model config for {model_name} not found.')`.
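As a sanity check, here is a minimal sketch (our own, not from the PE docs) that lists the model configs registered with the installed open_clip build; the output depends on the installed version, but on ours no PE-Core entry appears:

```python
import open_clip

# List every model config registered with the installed open_clip build and
# filter for PE-Core-like names; on our install this prints an empty list.
pe_like = [name for name in open_clip.list_models() if "pe-core" in name.lower()]
print(pe_like)
```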
2. JSON Configuration Challenges
We attempted to register the model using a custom configuration as suggested in OpenCLIP Discussion #1022:
`open_clip.add_model_config(custom_config_path)`
With a configuration like:
{
  "embed_dim": 1024,
  "vision_cfg": {
    "image_size": 336,
    "layers": 24,
    "width": 1024,
    "patch_size": 14,
    "mlp_ratio": 4
  },
  "text_cfg": {
    "context_length": 32,
    "layers": 24,
    "width": 1024,
    "mlp_ratio": 4,
    "heads": 16,
    "vocab_size": 49408
  }
}
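For reference, here is a sketch of how we register and instantiate the model from this config; the file paths and the derived config name `pe-core-l14-336` are our own placeholders, not part of the PE release:

```python
import open_clip

# Register the custom config above. open_clip derives the model name from the
# JSON filename, so saving it as configs/pe-core-l14-336.json registers
# "pe-core-l14-336". (Paths below are placeholders.)
open_clip.add_model_config("configs/pe-core-l14-336.json")

# Strict checkpoint loading happens inside create_model_and_transforms; this is
# where the unexpected-key errors described below are raised.
model, _, preprocess = open_clip.create_model_and_transforms(
    "pe-core-l14-336",
    pretrained="checkpoints/PE-Core-L14-336.pt",  # locally downloaded checkpoint
)
```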
However, this approach led to issues:
- Unexpected key errors (e.g., `visual.attn_pool.*`)
- Dimension mismatches due to differing pooling methods (see the sketch below)
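To make the mismatch concrete, here is a small sketch (again with placeholder paths) that loads the checkpoint non-strictly and prints the offending keys:

```python
import torch
import open_clip

# Build the model from the custom config without loading weights, then compare
# its state dict against the released checkpoint instead of failing on strict load.
model = open_clip.create_model("pe-core-l14-336")

ckpt = torch.load("checkpoints/PE-Core-L14-336.pt", map_location="cpu")
# Some checkpoints wrap the weights under a "state_dict" key; unwrap if present.
state_dict = ckpt.get("state_dict", ckpt)

missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("unexpected keys:", [k for k in unexpected if k.startswith("visual.attn_pool")])
print("missing keys (first 10):", missing[:10])
```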
Question
Could you clarify:
- Is there a more straightforward approach or recommended workflow to fine-tune PE-Core models with OpenCLIP than the manual approach described above?
- If not, could you provide additional documentation or examples detailing the correct configuration and fine-tuning steps using OpenCLIP for PE-Core encoder models?
Your support on this would be greatly appreciated!
Thanks!