[WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models"
Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yanyin Chen, Haohan Wang, Linkai Liu, Yaoyu Li, Jinyuan Zhao, Yu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang
Huazhong University of Science and Technology, JD.COM
WWW 2025
In web data, advertising images are crucial for capturing user attention and improving advertising effectiveness. Most existing methods that generate backgrounds for products focus primarily on aesthetic quality and may therefore fail to achieve satisfactory online performance. To address this limitation, we explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective. First, we build targeted pre-training tasks and leverage a large-scale e-commerce multimodal dataset to equip MLLMs with initial capabilities for advertising image generation. To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL); it jointly utilizes multimodal features and accurately reflects user click preferences. Meanwhile, a product-centric preference optimization strategy is developed to ensure that the generated background content aligns with the product characteristics after fine-tuning, enhancing the overall relevance and effectiveness of the advertising images. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both online and offline metrics.
[2025-03-16]: ✨ Everything’s Here! Pre-training datasets, models, and the full PCPO training code are now available!
[2025-02-25]: 🔥 We've released our pre-trained Prompt Model and inference code! Check out the repository for implementation details.
[2025-02-12]: 🎯 Our paper is now available on arXiv! Check it out here: https://arxiv.org/abs/2502.06823.
[2025-01-20]: 🎉 Exciting news! Our paper has been accepted to WWW 2025! Stay tuned for more updates!
- Python >= 3.8 (Recommend to use Anaconda or Miniconda)
- PyTorch >= 2.3.1 (CUDA 11.8, i.e. the cu118 wheels)
conda create -n caig python==3.8.20
conda activate caig
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/Chenguoz/CAIG.git
cd CAIG
pip install -r requirements.txt
Download and extract the pre-trained Prompt Model [Google Drive] [Hugging Face] and its corresponding vision tower [download link]. Then modify the mm_vision_tower key in the config.json file of the Prompt Model to point to the correct path.
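Editing the mm_vision_tower key can also be scripted. The sketch below is a minimal illustration, not part of the official repo: the directory and checkpoint paths are placeholders (here a temporary directory stands in for the extracted Prompt Model folder), but the load-modify-save pattern is the same for the real config.json:

```python
import json
import os
import tempfile

# Placeholder paths -- substitute the directories you actually extracted to.
model_dir = tempfile.mkdtemp()        # e.g. "./Prompt_Model"
vision_tower_path = "./vision_tower"  # e.g. the downloaded vision tower checkpoint

config_path = os.path.join(model_dir, "config.json")

# For demonstration only: write a minimal config. In practice, config.json
# already exists inside the extracted Prompt Model directory.
with open(config_path, "w") as f:
    json.dump({"mm_vision_tower": "placeholder/old-path"}, f)

# Load the config, repoint mm_vision_tower at the local checkpoint, write it back.
with open(config_path) as f:
    config = json.load(f)
config["mm_vision_tower"] = vision_tower_path
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

with open(config_path) as f:
    print(json.load(f)["mm_vision_tower"])  # prints ./vision_tower
```

The same edit applies later to the Reward Model's config.json.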
Next, run the following code:
bash scripts/gen_demo.sh
Please ensure that the Prompt_Model_Path in the gen_demo.sh script is set correctly (there is no need to set other paths, as the remaining models will be downloaded automatically).
Description: Large-scale e-commerce multimodal dataset for equipping MLLMs with advertising image generation capabilities. Access:
- Download Link
- Access password: 4o69kt
Description: Curated dataset with user interaction signals for click preference modeling. Source:
- Primary training data: Tianchi Competition Dataset
- Complementary test set: Google Drive
Note: All JD.COM-provided datasets are for academic research only. Commercial use requires explicit authorization.
Download the pre-trained Reward Model from [Hugging Face], along with the test set mentioned above (there is no need to download the full training set). Then modify the mm_vision_tower key in the config.json file of the Reward Model to point to the correct path.
Next, run the following code:
bash scripts/eval_reward_model.sh
Please ensure that the inference processes for the Prompt Model and Reward Model mentioned above are functioning correctly, then install the additional libraries required for training with: pip install -r train_requirements.txt.
Next, run the following code:
bash scripts/train_pcpo.sh
Note that there is no need to download any additional datasets; the code will use the tiny_dataset included in the repository to complete the training. Alternatively, you can customize your own dataset according to the tiny_dataset format (this is very simple😊).
If you find our paper or repo helpful for your research, please consider citing our paper and giving this repo a star⭐. Thank you! :)
@inproceedings{chen2025ctr,
title={CTR-Driven Advertising Image Generation with Multimodal Large Language Models},
author={Chen, Xingye and Feng, Wei and Du, Zhenbang and Wang, Weizhen and Chen, Yanyin and Wang, Haohan and Liu, Linkai and Li, Yaoyu and Zhao, Jinyuan and Li, Yu and others},
booktitle={Proceedings of the ACM on Web Conference 2025},
pages={2262--2275},
year={2025}
}
The dataset and code in this project are provided by JD.COM and are intended solely for academic research purposes. Any commercial use requires explicit authorization from JD.COM. Unauthorized commercial use of any part of this project is strictly prohibited.
