The official codebase for our paper "Evaluating Durability: Benchmark Insights into Multimodal Watermarking".
Jielin Qiu*, William Jongwon Han*, Xuandong Zhao, Shangbang Long, Christos Faloutsos, Lei Li.
More details can be found on the project webpage.
If you find our code or models helpful in your research, please cite our paper:
@article{Qiu2024EvaluatingDB,
title={Evaluating Durability: Benchmark Insights into Multimodal Watermarking},
author={Jielin Qiu and William Han and Xuandong Zhao and Shangbang Long and Christos Faloutsos and Lei Li},
journal={arXiv preprint arXiv:2406.03728},
year={2024}
}
We generally recommend the following pipeline:
- Generate text and images utilizing multimodal models.
- Watermark generated text and images.
- Perturb watermarked text and images.
- Detect watermarks in the perturbed text and images.
We will now go into more depth on how to do each step.
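At a glance, the evaluation loop for a single sample looks like the sketch below. The helper functions are placeholders, not part of this repository; each step is implemented by the model-, watermark-, and perturbation-specific scripts described in the following sections.

```python
# Illustrative flow only -- generate_fn, watermark_fn, perturb_fn, and detect_fn
# are placeholders for the model-, watermark-, and perturbation-specific scripts
# described below, not functions shipped in this repository.
def run_pipeline(prompt, generate_fn, watermark_fn, perturb_fn, detect_fn):
    sample = generate_fn(prompt)         # 1. text/image from a multimodal model
    watermarked = watermark_fn(sample)   # 2. embed the watermark
    perturbed = perturb_fn(watermarked)  # 3. apply a text/image perturbation
    return detect_fn(perturbed)          # 4. run the watermark detector
```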
In our study, we build on the existing codebases of each method for comprehensive benchmarking.
We recommend creating separate environments for each multimodal model and watermarking method. All perturbations (text and image) can be done in a single environment.
Note that some of the links below are not repositories but Hugging Face model cards describing how to use the models. For such models, we found that installing the latest transformers version works well; a rough sketch is given below. However, if you run into errors when using multiple multimodal models in a single environment, feel free to create another environment.
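As a rough sketch (not the exact scripts in mm_model), a Hugging Face hosted model such as InstructBLIP can be called through transformers roughly as follows. The prompt and image path are examples, and other models in the table below may require their own processor and model classes.

```python
# Rough sketch of calling a Hugging Face hosted model (InstructBLIP here) through
# transformers. The prompt and image path are examples; other models in the table
# below may need their own processor/model classes.
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

model_id = "Salesforce/instructblip-vicuna-7b"
processor = InstructBlipProcessor.from_pretrained(model_id)
model = InstructBlipForConditionalGeneration.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("COCO/val2017/000000000139.jpg").convert("RGB")
inputs = processor(images=image, text="Describe this image.", return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```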
We provide links to all of the necessary repositories for this project below. Please carefully follow their environment setup instructions, and keep generation, watermarking, and perturbation in separate environments. We also thank all of the repositories for open-sourcing their code.
Type | Link |
---|---|
Multimodal Model | NExT-GPT |
Multimodal Model | RPG |
Multimodal Model | LCMs |
Multimodal Model | Kandinsky |
Multimodal Model | PIXART |
Multimodal Model | SDXL-Lightning |
Multimodal Model | DALLE3 |
Multimodal Model | Stable Diffusion |
Multimodal Model | Fuyu-8B |
Multimodal Model | InternLM-XComposer |
Multimodal Model | InstructBLIP |
Multimodal Model | LLaVA 1.6 |
Multimodal Model | MiniGPT-4 |
Multimodal Model | mPLUG-Owl2 |
Multimodal Model | Qwen-VL |
Watermark | KGW |
Watermark | KTH |
Watermark | Blackbox |
Watermark | Unigram |
Watermark | DwtDctSvd |
Watermark | RivaGAN |
Watermark | SSL |
Watermark | Stega Stamp |
Image and Text Perturbations | MM_Robustness |
Please download the COCO validation split from the official cocodataset website; you will need images-val2017 and annotations-val2017.
If for some reason there is a problem with the link, a copy of the data can be found here.
Then move the data into the COCO folder. The coco.py file is the data loader used to iterate through the data.
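For reference, iterating over the validation split can look like the sketch below, which uses pycocotools rather than the repository's coco.py loader; the folder layout (COCO/val2017 and COCO/annotations) is assumed.

```python
# Sketch only -- coco.py in this repo implements the actual loader. This version
# uses pycocotools and assumes COCO/val2017/ and COCO/annotations/captions_val2017.json.
import os
from pycocotools.coco import COCO

coco = COCO("COCO/annotations/captions_val2017.json")
for img_id in coco.getImgIds()[:5]:
    info = coco.loadImgs(img_id)[0]
    img_path = os.path.join("COCO/val2017", info["file_name"])
    captions = [ann["caption"] for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id))]
    print(img_path, captions[0])
```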
All multimodal models used in this study are available in the mm_model directory.
Note that not all models have a GitHub repository; for those models, we still provide an example of how to use them for text or image generation.
Additionally, some of the models on Hugging Face are fairly large. We recommend pointing the model download cache to a folder on your local machine with enough disk space, for example as sketched below.
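One way to do this (the path below is a placeholder for your own setup) is to set the Hugging Face cache environment variable before any model is loaded:

```python
# Example only: redirect the Hugging Face download cache to a large disk.
# Set this before importing/loading any model; the path is a placeholder.
import os
os.environ["HF_HOME"] = "/data/hf_cache"
```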
All watermarking methods are in the watermark directory. After setting up their respective environments and generating the text or images, please proceed to watermark all generated texts or images.
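As one concrete example, embedding an image watermark with the invisible-watermark package (the DwtDctSvd/RivaGAN rows above) can look like the sketch below. Paths and the 4-byte payload are placeholders; the other watermarking methods follow the APIs documented in their own repositories.

```python
# Minimal sketch of embedding an image watermark with the invisible-watermark
# package (the DwtDctSvd / RivaGAN methods). Paths and the 4-byte payload are
# placeholders; other watermarking methods have their own APIs.
import cv2
from imwatermark import WatermarkEncoder

bgr = cv2.imread("generated/sample_0001.png")
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", b"test")            # 4-byte example payload
bgr_watermarked = encoder.encode(bgr, "dwtDctSvd")
cv2.imwrite("watermarked/sample_0001.png", bgr_watermarked)
```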
All perturbations are in the perturbation directory. After setting up the perturbation environment from the perturbations/MM_Robustness repository, please proceed to perturb all of the watermarked images or text. Additionally, inside the perturbation directory, the image_perturb.py and text_perturb.py files contain all of the image and text perturbations needed for this study.
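For illustration only (this is not the MM_Robustness API), a single image perturbation such as JPEG re-compression can be written with Pillow as below; image_perturb.py and text_perturb.py wrap the full perturbation set used in the paper.

```python
# Illustration only: one JPEG re-compression perturbation written with Pillow.
# The full image/text perturbation set comes from MM_Robustness via
# image_perturb.py and text_perturb.py; paths and quality are placeholders.
from PIL import Image

def jpeg_compress(in_path, out_path, quality=30):
    """Re-encode a watermarked image at low JPEG quality to stress the watermark."""
    Image.open(in_path).convert("RGB").save(out_path, "JPEG", quality=quality)

jpeg_compress("watermarked/sample_0001.png", "perturbed/sample_0001.jpg")
```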
Because each watermarking method has its own detection procedure, we provide an example pipeline for detecting watermarks; please refer to it for detection examples. We also provide the calculation of the other evaluation metrics (e.g., ROUGE, PSNR).
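As a sketch under the same assumptions as the watermarking example above (invisible-watermark, placeholder paths, a 4-byte payload), detection plus a PSNR check can look like this:

```python
# Sketch: recover the invisible-watermark payload from a perturbed image and
# compute PSNR between the original and watermarked images. Paths and the
# 32-bit payload length match the placeholder example above.
import cv2
import numpy as np
from imwatermark import WatermarkDecoder

perturbed = cv2.imread("perturbed/sample_0001.jpg")
decoder = WatermarkDecoder("bytes", 32)            # 32 bits = the 4-byte payload
payload = decoder.decode(perturbed, "dwtDctSvd")
print("recovered payload:", payload)

def psnr(a, b):
    """Peak signal-to-noise ratio for 8-bit images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

original = cv2.imread("generated/sample_0001.png")
watermarked = cv2.imread("watermarked/sample_0001.png")
print("PSNR:", psnr(original, watermarked))
```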
This project is licensed under the CC BY-NC-SA license.
If you have any questions, please contact [email protected], [email protected].