4 Nankai Institute of Advanced Research (SHENZHEN FUTIAN) · 5 Nankai University · 6 MBZUAI
⭐ Here is a quick recap of our main idea. Full-screen viewing is recommended for better visual detail.
(Video: lawdis_show_quick_v5_small.mov)
We present LawDIS, a language-window-based controllable dichotomous image segmentation (DIS) framework. It supports two forms of user control: generating an initial mask based on user-provided language prompts, and enabling flexible refinement of user-defined regions (i.e., size-adjustable windows) within initial masks.
The following is an introductory video of our work:
(Video: introduction_v2_small.mov)
- **Framework innovation.** We recast the DIS task as an image-conditioned mask generation problem within a latent diffusion model. This enables LawDIS to seamlessly integrate both macro and micro user controls under a unified model and a shared set of parameters.
- **Dual control modes.** LawDIS employs a mode switcher to coordinate two distinct control modes. In macro mode, a language-controlled segmentation strategy (LS) generates an initial mask guided by user prompts. In micro mode, a window-controlled refinement strategy (WR) supports unlimited refinements on user-specified regions via size-adjustable local windows, enabling precise delineation of fine structures.
- **Flexible adaptation.** LS and WR can function independently or in collaboration. Their joint use meets high-accuracy personalized demands, while the micro mode (WR) alone can serve as a general-purpose post-refinement tool to enhance outputs from any segmentation model.
- **Superior performance.** Extensive evaluations on the DIS5K benchmark demonstrate that LawDIS consistently outperforms 11 state-of-the-art methods. Compared with the second-best model, MVANet, LawDIS achieves a 3.6% $F_\beta^\omega$ improvement using LS alone, and a 4.6% gain when combining both LS and WR on DIS-TE.
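Here $F_\beta^\omega$ denotes the weighted F-measure. As a rough illustration of what this family of metrics captures, a plain (unweighted) $F_\beta$ over binarized masks can be sketched as below; note this is for intuition only — the weighted variant reported in the paper additionally weights each pixel's error by its location:

```python
import numpy as np

def f_beta(pred, gt, beta2=0.3, thresh=0.5):
    """Plain F-beta between a binarized prediction and ground truth.

    Illustration only: the paper reports the weighted variant F_beta^omega,
    which additionally weights each pixel's error by its location.
    """
    p = pred >= thresh
    g = gt >= 0.5
    tp = float(np.logical_and(p, g).sum())
    precision = tp / max(p.sum(), 1)
    recall = tp / max(g.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

gt = np.zeros((8, 8)); gt[2:6, 2:6] = 1.0
print(f_beta(gt, gt))  # a perfect prediction scores 1.0
```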
Note
The future directions of this project include more precise prompt control and improved efficiency. We warmly invite all potential collaborators to contribute to making LawDIS more accessible and practical. If you are interested in collaboration or have any questions about our paper, feel free to contact us via email ([email protected] & [email protected]). If you are using our code for your research, please cite this paper (BibTeX).
- 2025.07 We have open-sourced the core code of LawDIS!
- 2025.06 🎉 Our paper has been accepted by ICCV 2025, Honolulu, Hawai'i!
Clone the repository (requires git):
```shell
git clone https://github.com/XinyuYanTJU/LawDIS.git
cd LawDIS
conda create -n lawdis python=3.8
conda activate lawdis
pip install -r requirements.txt
```
This project uses a custom VAE class `AutoencoderKlLawDIS` that must be manually added to the `diffusers` library:

```shell
bash install_lawdis_diffusers.sh
```
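After running the script, a quick sanity check (a hypothetical helper written for this note, not shipped with the repo) is to confirm that the patched `diffusers` build actually exposes the custom class:

```python
def check_lawdis_vae():
    """Return True if the patched diffusers build exposes AutoencoderKlLawDIS.

    Convenience check only: it tells you whether install_lawdis_diffusers.sh
    took effect in the current environment.
    """
    try:
        from diffusers import AutoencoderKlLawDIS  # noqa: F401
        return True
    except ImportError:
        return False

print("diffusers patched:", check_lawdis_vae())
```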
Download the DIS5K dataset from this Google Drive link or Baidu Pan link (fetch code: rtgw). Unzip the dataset and move the DIS5K folder into the LawDIS/data directory.
The language prompts we annotated for DIS5K can be found in `LawDIS/data/json/`.
Download the pre-trained checkpoints from this Google Drive link or Baidu Pan link (fetch code: 2025).
Place the checkpoint files under `./stable-diffusion-2/`.
We provide scripts for:
- Batch testing a dataset
- Testing a single image with multiple language prompts
- Fully automatic testing of a single image without requiring prompt input
Batch testing
```shell
python script/infer_macro_batch_imgs.py \
    --checkpoint "stable-diffusion-2" \
    --input_rgb_dir "data/DIS5K" \
    --subset_name "DIS-TE4" \
    --prompt_dir 'data/json' \
    --output_dir "output/output-macro" \
    --denoise_steps 1 \
    --processing_res 1024
```
Testing a single image with prompts
```shell
python script/infer_macro_single_img.py \
    --checkpoint "stable-diffusion-2" \
    --input_img_path "data/imgs/2#Aircraft#7#UAV#16522310810_468dfa447a_o.jpg" \
    --prompts "Black professional camera drone with a high-definition camera mounted on a gimbal." "Three men beside a UAV." \
    --output_dir 'output/output-macro-single' \
    --denoise_steps 1 \
    --processing_res 1024
```
Fully automatic testing of a single image without a prompt
```shell
python script/infer_macro_single_img.py \
    --checkpoint "stable-diffusion-2" \
    --input_img_path "data/imgs/2#Aircraft#7#UAV#16522310810_468dfa447a_o.jpg" \
    --prompts "" \
    --output_dir 'output/output-macro-single' \
    --denoise_steps 1 \
    --processing_res 1024
```
We provide scripts for:
- Batch testing a dataset
- Testing a single image
You can choose how the refinement windows are generated via `--window_mode`:

- `"auto"`: automatically select windows based on object edges in the initial segmentation map.
- `"semi-auto"`: simulate user-guided selection using the GT segmentation.
- `"manual"`: the user manually selects windows (⚠️ only works on local servers).
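For intuition, the `"auto"` mode can be pictured as tiling the initial soft mask and keeping tiles that contain many uncertain (soft-edge) pixels. The sketch below is our own illustration of that idea; the function name, tile size, and thresholds are assumptions, not the repository's actual selection logic:

```python
import numpy as np

def auto_windows(mask, window=256, stride=256, min_uncertain=0.05):
    """Sketch of edge-driven window selection: tile the initial soft mask
    and keep tiles whose fraction of 'uncertain' pixels (values far from
    both 0 and 1, i.e. near a predicted boundary) exceeds a threshold.

    Illustration only; names, tile size, and thresholds are assumptions.
    """
    h, w = mask.shape
    windows = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            tile = mask[y:y + window, x:x + window]
            uncertain = np.mean((tile > 0.1) & (tile < 0.9))
            if uncertain > min_uncertain:
                windows.append((x, y, window, window))  # (x, y, w, h)
    return windows

# Toy example: a soft mask with a blurry vertical boundary.
mask = np.zeros((512, 512), dtype=np.float32)
mask[:, 240:280] = 0.5   # uncertain boundary strip
mask[:, 280:] = 1.0      # confident foreground
print(auto_windows(mask))
```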
Batch testing
```shell
python script/infer_micro_batch_imgs.py \
    --checkpoint "stable-diffusion-2" \
    --input_rgb_dir "data/DIS5K" \
    --subset_name "DIS-TE4" \
    --init_seg_dir 'output/output-macro/' \
    --output_dir "output/output-micro/" \
    --window_mode "semi-auto" \
    --denoise_steps 1 \
    --processing_res 1024
```
Single image testing
```shell
python script/infer_micro_single_img.py \
    --checkpoint "stable-diffusion-2" \
    --input_img_path "data/imgs/2#Aircraft#7#UAV#16522310810_468dfa447a_o.jpg" \
    --init_seg_dir 'output/output-macro-single/2#Aircraft#7#UAV#16522310810_468dfa447a_o_0.png' \
    --output_dir "output/output-micro-single" \
    --window_mode "auto" \
    --denoise_steps 1 \
    --processing_res 1024
```
The predicted segmentation maps can be downloaded from this Google Drive link or Baidu Pan link (fetch code: lawd).
`LawDIS-S` refers to the initial segmentation results obtained in macro mode under language-prompt control. `LawDIS-R` refers to the refined results obtained in micro mode, where window-based refinement is applied to LawDIS-S.
Notably, the initial results (LawDIS-S) already achieve SOTA performance, and LawDIS-R further improves the metrics.
Fig. 2: Quantitative comparison on DIS5K against 11 representative methods.
Fig. 3: Qualitative comparison of our model with four leading models. Local masks are evaluated with the MAE score for clarity.
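For reference, MAE here is the mean absolute error between a predicted mask and its ground truth, both scaled to [0, 1] (lower is better); a minimal sketch:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a predicted mask and its ground truth,
    both scaled to [0, 1]; lower is better."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))

pred = np.array([[0.9, 0.1], [0.8, 0.0]])
gt   = np.array([[1.0, 0.0], [1.0, 0.0]])
print(mae(pred, gt))  # approximately 0.1
```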
Thanks to its ability to segment foreground objects with high precision at high resolution, LawDIS lends itself to a wide range of applications. Fig. 6 shows application cases of background removal. Compared with the original image, the background-removed result has higher aesthetic value and better usability, and can be fed directly into downstream tasks such as 3D modeling, augmented reality (AR), and still-image animation.
Fig. 5: Application cases of 3D modeling.
Fig. 6: Application cases of AR.
Fig. 7: Application cases of still image animation.
Our code is based on Marigold and Diffusers. For the latest DIS studies, see this awesome paper list organized by Xianjie Liu (SCU). We are grateful to the authors of these projects for their pioneering work and contributions!
If you find this code useful, we kindly ask you to cite our paper in your work.
@article{yan2025lawdis,
title={LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation},
author={Xinyu Yan and Meijun Sun and Ge-Peng Ji and Fahad Shahbaz Khan and Salman Khan and Deng-Ping Fan},
journal={arXiv preprint arXiv:2508.01152},
year={2025}
}