official/cv/segment-anything/README.md (+3 −3)
@@ -124,13 +124,13 @@ with the extracted CLIP image embeddings as text prompt input. At inference time

 The key that make the training procedure work is that CLIP’s image embeddings are trained to align with its text embeddings.

-This repository provides an implementation of text-to-mask finetune referring to the model structure and training procedure described in the official SAM paper and replace CLIP to a stronger multimodal encoder BLIP2.
+This repository provides an implementation of text-to-mask finetune referring to the model structure and training procedure described in the official SAM paper and introduces a stronger multimodal encoder BLIP2 in addition to CLIP.

 A machine with **64G ascend memory** is required for text-prompt finetune.

 First download SA-1B dataset and put it under `${project_root}/datasets/sa-1b`.

-for standalone finetune of SA-1B dataset, please run:
+for standalone finetune of SA-1B dataset with BLIP2 (CLIP is similar), please run:
@@ -149 +149 @@
-Below are some zero-shot experimental result prompted with `floor` and `buildings`. The checkpoint can be downloaded [here](https://download-mindspore.osinfra.cn/toolkits/mindone/sam/sam_vitb_text_finetune_sa1b_10k-972de39e.ckpt). _Note that the model is trained with limited data and the smallest SAM type `vit_b`._
+Below are some zero-shot experimental result prompted with `floor` and `buildings`. The checkpoint fine-tuned with BLIP2 can be downloaded [here](https://download-mindspore.osinfra.cn/toolkits/mindone/sam/sam_vitb_text_finetune_sa1b_10k-972de39e.ckpt). _Note that the model is trained with limited data and the smallest SAM type `vit_b`._