
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation (ICCV 2025)

Xinyu Yan 1,2,6, Meijun Sun 1,2, Ge-Peng Ji 3, Fahad Shahbaz Khan 6, Salman Khan 6, Deng-Ping Fan 4,5*
1 Tianjin University  2 Tianjin Key Laboratory of Machine Learning  3 Australian National University
4 Nankai Institute of Advanced Research (SHENZHEN FUTIAN)  5 Nankai University  6 MBZUAI

โญ Let's have a quick recap of our main idea. Full-screen viewing is recommended for better visual details.

(Video: lawdis_show_quick_v5_small.mov)

We present LawDIS, a language-window-based controllable dichotomous image segmentation (DIS) framework. It supports two forms of user control: generating an initial mask based on user-provided language prompts, and enabling flexible refinement of user-defined regions (i.e., size-adjustable windows) within initial masks.

🚀 1. Features

The following is an introductory video of our work:

(Video: introduction_v2_small.mov)
  • Framework innovation. We recast the DIS task as an image-conditioned mask generation problem within a latent diffusion model. This enables LawDIS to seamlessly integrate both macro and micro user controls under a unified model and a shared set of parameters.

  • Dual control modes. LawDIS employs a mode switcher to coordinate two distinct control modes. In macro mode, a language-controlled segmentation strategy (LS) generates an initial mask guided by user prompts. In micro mode, a window-controlled refinement strategy (WR) supports unlimited refinements on user-specified regions via size-adjustable local windows, enabling precise delineation of fine structures.

  • Flexible adaptation. LS and WR can function independently or in collaboration. Their joint use meets high-accuracy personalized demands, while the micro mode (WR) alone can serve as a general-purpose post-refinement tool to enhance outputs from any segmentation model.

  • Superior performance. Extensive evaluations on the DIS5K benchmark demonstrate that LawDIS consistently outperforms 11 state-of-the-art methods. Compared to the second-best model MVANet, LawDIS achieves a 3.6% $F_\beta^\omega$ improvement using LS alone, and a 4.6% gain when combining both LS and WR on DIS-TE.

📢 2. News

Note

The future directions of this project include more precise prompt control and improved efficiency. We warmly invite all potential collaborators to contribute to making LawDIS more accessible and practical. If you are interested in collaboration or have any questions about our paper, feel free to contact us via email ([email protected] & [email protected]). If you are using our code for your research, please cite this paper (BibTeX).

๐Ÿ› ๏ธ 3. Setup

3.1. Repository

Clone the repository (requires git):

git clone https://github.com/XinyuYanTJU/LawDIS.git
cd LawDIS

3.2. Dependencies

✅ Step 1. Install the dependencies:

conda create -n lawdis python=3.8
conda activate lawdis
pip install -r requirements.txt

✅ Step 2. Integrate the custom VAE into diffusers

This project uses a custom VAE class, AutoencoderKlLawDIS, which must be manually added to the diffusers library:

bash install_lawdis_diffusers.sh

3.3. Dataset Preparation

Download the DIS5K dataset from this Google Drive link or Baidu Pan link with the fetch code: rtgw. Unzip the dataset and move the DIS5K folder into the LawDIS/data directory.

The language prompts we annotated for DIS5K can be found in LawDIS/data/json/.
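A typical way to consume such a prompt file is to map each image name to its annotated prompt, falling back to an empty prompt (i.e., the fully automatic mode below) when no annotation exists. The exact JSON schema used in LawDIS/data/json/ may differ; the structure here is an illustrative assumption:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schema: image filename -> language prompt.
# The actual files under LawDIS/data/json/ may be organised differently.
sample_prompts = {
    "2#Aircraft#7#UAV#16522310810_468dfa447a_o.jpg":
        "Black professional camera drone with a high-definition "
        "camera mounted on a gimbal.",
}

def load_prompt(prompt_file, image_name):
    """Return the annotated prompt for an image, or '' to fall back
    to fully automatic (prompt-free) segmentation."""
    with open(prompt_file, "r", encoding="utf-8") as f:
        prompts = json.load(f)
    return prompts.get(image_name, "")

with tempfile.TemporaryDirectory() as d:
    prompt_file = Path(d) / "DIS-TE4.json"
    prompt_file.write_text(json.dumps(sample_prompts, indent=2), encoding="utf-8")
    print(load_prompt(prompt_file, "2#Aircraft#7#UAV#16522310810_468dfa447a_o.jpg"))
```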

3.4. Inference

✅ Step 1. Download the Checkpoints

Download the pre-trained checkpoints from this Google Drive link or Baidu Pan link with the fetch code: 2025. Place the checkpoint files under:

stable-diffusion-2/

✅ Step 2. Inference in Macro Mode

We provide scripts for:

  • Batch testing a dataset
  • Testing a single image with multiple language prompts
  • Fully automatic testing of a single image without requiring prompt input

Batch testing

python script/infer_macro_batch_imgs.py \
    --checkpoint "stable-diffusion-2" \
    --input_rgb_dir "data/DIS5K" \
    --subset_name "DIS-TE4" \
    --prompt_dir 'data/json' \
    --output_dir "output/output-macro" \
    --denoise_steps 1 \
    --processing_res 1024 
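To evaluate every DIS5K split in one go, the command above can be generated per subset and dispatched from a small driver. This is a sketch, not part of the repository; the subset names follow the standard DIS5K convention, and the print can be swapped for `subprocess.run(cmd, check=True)` to actually execute:

```python
import shlex

# Standard DIS5K validation/test splits; adjust if your copy differs.
SUBSETS = ["DIS-VD", "DIS-TE1", "DIS-TE2", "DIS-TE3", "DIS-TE4"]

def macro_batch_cmd(subset):
    """Build the macro-mode batch inference command for one subset."""
    return [
        "python", "script/infer_macro_batch_imgs.py",
        "--checkpoint", "stable-diffusion-2",
        "--input_rgb_dir", "data/DIS5K",
        "--subset_name", subset,
        "--prompt_dir", "data/json",
        "--output_dir", "output/output-macro",
        "--denoise_steps", "1",
        "--processing_res", "1024",
    ]

for subset in SUBSETS:
    # Replace print with subprocess.run(macro_batch_cmd(subset), check=True)
    # to run the inference instead of only showing the command.
    print(shlex.join(macro_batch_cmd(subset)))
```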

Testing a single image with prompts

python script/infer_macro_single_img.py \
    --checkpoint "stable-diffusion-2" \
    --input_img_path "data/imgs/2#Aircraft#7#UAV#16522310810_468dfa447a_o.jpg" \
    --prompts "Black professional camera drone with a high-definition camera mounted on a gimbal." "Three men beside a UAV." \
    --output_dir 'output/output-macro-single' \
    --denoise_steps 1 \
    --processing_res 1024 

Fully automatic testing of a single image without a prompt

python script/infer_macro_single_img.py \
    --checkpoint "stable-diffusion-2" \
    --input_img_path "data/imgs/2#Aircraft#7#UAV#16522310810_468dfa447a_o.jpg" \
    --prompts "" \
    --output_dir 'output/output-macro-single' \
    --denoise_steps 1 \
    --processing_res 1024 

✅ Step 3. Inference in Micro Mode

We provide scripts for:

  • Batch testing a dataset
  • Testing a single image

You can choose how to generate the refinement windows using --window_mode:

  • "auto": Automatically select windows based on object edges in the initial segmentation map.
  • "semi-auto": Simulate user-guided selection using GT segmentation.
  • "manual": User manually selects windows (โš ๏ธ Only works on local servers).

Batch testing

python script/infer_micro_batch_imgs.py \
    --checkpoint "stable-diffusion-2" \
    --input_rgb_dir "data/DIS5K" \
    --subset_name "DIS-TE4" \
    --init_seg_dir 'output/output-macro/' \
    --output_dir "output/output-micro/" \
    --window_mode "semi-auto" \
    --denoise_steps 1 \
    --processing_res 1024 

Single image testing

python script/infer_micro_single_img.py \
    --checkpoint "stable-diffusion-2" \
    --input_img_path "data/imgs/2#Aircraft#7#UAV#16522310810_468dfa447a_o.jpg" \
    --init_seg_dir 'output/output-macro-single/2#Aircraft#7#UAV#16522310810_468dfa447a_o_0.png' \
    --output_dir "output/output-micro-single" \
    --window_mode "auto" \
    --denoise_steps 1 \
    --processing_res 1024 

๐Ÿ‹๏ธ 4. SOTA Results

The predicted segmentation maps can be downloaded from this Google Drive link or Baidu Pan link with the fetch code: lawd.

LawDIS-S refers to the initial segmentation results obtained in macro mode with language-prompt control. LawDIS-R refers to the refined results obtained in micro mode, where window-based refinement is applied to LawDIS-S.

Notably, the initial results (LawDIS-S) already achieve SOTA performance, and LawDIS-R further improves the metrics.
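For reference, the two metrics reported in this README are commonly defined as follows (the weighted F-measure follows the standard formulation from the saliency/DIS evaluation literature; notation here is ours):

```latex
% Weighted F-measure (higher is better), combining weighted precision
% P^\omega and weighted recall R^\omega; \beta^2 is typically set to 1.
F_\beta^\omega = \frac{(1+\beta^2)\, P^\omega \cdot R^\omega}{\beta^2 \cdot P^\omega + R^\omega}

% Mean absolute error (lower is better) between prediction P and
% ground truth G over an H \times W segmentation map.
\mathrm{MAE} = \frac{1}{H \times W} \sum_{x=1}^{H} \sum_{y=1}^{W} \lvert P(x,y) - G(x,y) \rvert
```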


Fig. 2: Quantitative comparison with 11 representative methods on DIS5K.


Fig. 3: Qualitative comparison of our model with four leading models. Local masks are evaluated with MAE score for clarity.

🎮 5. Applications

Thanks to its high-precision segmentation of foreground objects at high resolutions, LawDIS enables a wide range of applications. Fig. 4 shows application cases of background removal. Compared with the original image, the background-removed result has higher aesthetic value and good usability, and can be used directly for downstream tasks such as 3D modeling, augmented reality (AR), and still image animation.

Fig. 4: Application cases of background-removed results in various scenarios.


Fig. 5: Application cases of 3D modeling.


Fig. 6: Application cases of AR.


Fig. 7: Application cases of still image animation.

📦 6. Acknowledgement

Our code is based on Marigold and Diffusers. For the latest DIS studies, refer to this awesome paper list organised by Xianjie Liu (SCU). We are grateful to the authors of these projects for their pioneering work and contributions!

🎓 7. Citations

If you find this code useful, please cite our paper in your work.

@article{yan2025lawdis,
  title={LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation},
  author={Xinyu Yan and Meijun Sun and Ge-Peng Ji and Fahad Shahbaz Khan and Salman Khan and Deng-Ping Fan},
  journal={arXiv preprint arXiv:2508.01152},
  year={2025}
}