Commit 0fee2a9

Author: Mark-ZhouWX
Message: update readme of blip2 and clip
1 parent eb04165 · commit 0fee2a9

File tree

1 file changed (+3 −3 lines)


official/cv/segment-anything/README.md

@@ -124,13 +124,13 @@ with the extracted CLIP image embeddings as text prompt input. At inference time
 
 The key that make the training procedure work is that CLIP’s image embeddings are trained to align with its text embeddings.
 
-This repository provides an implementation of text-to-mask finetune referring to the model structure and training procedure described in the official SAM paper and replace CLIP to a stronger multimodal encoder BLIP2.
+This repository provides an implementation of text-to-mask finetune referring to the model structure and training procedure described in the official SAM paper and introduces a stronger multimodal encoder BLIP2 in addition to CLIP.
 
 A machine with **64G ascend memory** is required for text-prompt finetune.
 
 First download SA-1B dataset and put it under `${project_root}/datasets/sa-1b`.
 
-for standalone finetune of SA-1B dataset, please run:
+for standalone finetune of SA-1B dataset with BLIP2 (CLIP is similar), please run:
 ```shell
 python train.py -c configs/sa1b_text_finetune_blip2.yaml
 ```
@@ -146,7 +146,7 @@ the fine-tuned model will be saved at the work_root specified in `configs/sa1b_t
 python text_inference.py --checkpoint=your/path/to/ckpt --text-prompt your_prompt
 ```
 
-Below are some zero-shot experimental result prompted with `floor` and `buildings`. The checkpoint can be downloaded [here](https://download-mindspore.osinfra.cn/toolkits/mindone/sam/sam_vitb_text_finetune_sa1b_10k-972de39e.ckpt). _Note that the model is trained with limited data and the smallest SAM type `vit_b`._
+Below are some zero-shot experimental result prompted with `floor` and `buildings`. The checkpoint fine-tuned with BLIP2 can be downloaded [here](https://download-mindspore.osinfra.cn/toolkits/mindone/sam/sam_vitb_text_finetune_sa1b_10k-972de39e.ckpt). _Note that the model is trained with limited data and the smallest SAM type `vit_b`._
 
 <div align="center">
 <img src="images/dengta-floor.png" height="350" />
