
How to Use Grounding SAM

Install without Docker

If you want to build a local GPU environment for Grounded-SAM, set the following environment variables manually:

export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.3/
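
To sanity-check the build environment first, one option (assuming PyTorch is already installed) is:

import torch

# CUDA version PyTorch was compiled against; should match CUDA_HOME above.
print(torch.version.cuda)
# True if a GPU is visible to PyTorch.
print(torch.cuda.is_available())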

Install Segment Anything:

python -m pip install -e segment_anything
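
Once installed, the package can be driven directly from Python. A minimal sketch using the SamPredictor API (the image path and click coordinates below are placeholders):

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H checkpoint downloaded in Step 1 further below.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an HWC RGB uint8 array.
image = cv2.cvtColor(cv2.imread("assets/demo9.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Predict masks from a single foreground click (placeholder coordinates).
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)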

Install Grounding DINO:

pip install --no-build-isolation -e GroundingDINO
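
After installation, Grounding DINO can generate text-conditioned boxes from Python. A minimal sketch using its inference helpers (the caption and thresholds here are illustrative):

from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "groundingdino_swint_ogc.pth",
)
image_source, image = load_image("assets/demo9.jpg")

# Boxes are returned normalized, in (cx, cy, w, h) format.
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="dog . chair .",
    box_threshold=0.35,
    text_threshold=0.25,
)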

Install diffusers:

pip install --upgrade diffusers[torch]
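
diffusers powers the inpainting demos. A minimal sketch of loading a Stable Diffusion inpainting pipeline (the model id, mask file, and prompt are illustrative; any compatible inpainting checkpoint works):

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("assets/demo9.jpg").convert("RGB")
mask_image = Image.open("mask.png").convert("RGB")  # hypothetical mask: white = repaint

result = pipe(prompt="a wooden bench", image=init_image, mask_image=mask_image).images[0]
result.save("inpainted.png")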

Install osx:

git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh

Install RAM & Tag2Text:

git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/
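
Once installed, RAM can tag an image directly from Python. A sketch based on the recognize-anything examples (treat the exact helper names as assumptions and check that repository if they differ):

import torch
from PIL import Image
from ram import get_transform, inference_ram
from ram.models import ram

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the RAM Swin-L checkpoint downloaded in Step 1 below.
ram_model = ram(pretrained="ram_swin_large_14m.pth",
                image_size=384, vit="swin_l").to(device).eval()

transform = get_transform(image_size=384)
image_tensor = transform(Image.open("assets/demo9.jpg")).unsqueeze(0).to(device)

# Returns English and Chinese tag strings, e.g. "dog | grass | frisbee".
tags_en, tags_zh = inference_ram(image_tensor, ram_model)
print(tags_en)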

The following optional dependencies are needed for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format (jupyter is also required to run the example notebooks):

pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
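
For reference, saving a binary mask in COCO RLE format with pycocotools looks like this (the mask here is a dummy array standing in for a SAM output):

import numpy as np
from pycocotools import mask as mask_utils

# Dummy binary mask; in practice this would be a SAM mask of shape (H, W).
binary_mask = np.zeros((480, 640), dtype=np.uint8)
binary_mask[100:200, 150:300] = 1

# pycocotools expects Fortran-contiguous uint8 arrays.
rle = mask_utils.encode(np.asfortranarray(binary_mask))
rle["counts"] = rle["counts"].decode("utf-8")  # make the RLE JSON-serializable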

More details can be found in the installation instructions for Segment Anything, GroundingDINO, and OSX.

🏷️ Grounded-SAM with RAM or Tag2Text for Automatic Labeling

The Recognize Anything Models are a series of strong, open-source foundational image recognition models, including RAM++, RAM, and Tag2Text.

They link seamlessly with Grounded-SAM to generate pseudo labels automatically, as follows (a condensed sketch of the hand-off appears after this list):

  1. Use RAM/Tag2Text to generate tags.
  2. Use Grounded-Segment-Anything to generate the boxes and masks.
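
A condensed, self-contained sketch of this hand-off, following the structure of the Grounded-SAM demo scripts (the tag-to-caption conversion and thresholds are assumptions; the demo commands below are the authoritative version):

import torch
from PIL import Image
from groundingdino.util import box_ops
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor
from ram import get_transform, inference_ram
from ram.models import ram

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Generate tags with RAM.
ram_model = ram(pretrained="ram_swin_large_14m.pth",
                image_size=384, vit="swin_l").to(device).eval()
img_t = get_transform(image_size=384)(Image.open("assets/demo9.jpg")).unsqueeze(0).to(device)
tags_en, _ = inference_ram(img_t, ram_model)
caption = tags_en.replace(" |", ".")  # Grounding DINO expects '.'-separated phrases

# 2a. The tags become the text prompt for Grounding DINO; boxes come back
#     normalized, in (cx, cy, w, h) format.
dino = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
                  "groundingdino_swint_ogc.pth")
image_source, image = load_image("assets/demo9.jpg")
boxes, logits, phrases = predict(model=dino, image=image, caption=caption,
                                 box_threshold=0.25, text_threshold=0.2)

# 2b. Convert the boxes to absolute xyxy coordinates and use them to prompt SAM.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)
predictor.set_image(image_source)
H, W = image_source.shape[:2]
boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([W, H, W, H])
transformed = predictor.transform.apply_boxes_torch(boxes_xyxy, (H, W)).to(device)
masks, _, _ = predictor.predict_torch(point_coords=None, point_labels=None,
                                      boxes=transformed, multimask_output=False)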

Step 1: Initialize the submodules and download the pretrained checkpoints

  • Initialize the submodules:
cd Grounded-Segment-Anything
git submodule init
git submodule update
  • Download the pretrained weights for Grounding DINO, SAM, and RAM/Tag2Text:
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth


wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/tag2text_swin_14m.pth

Step 2: Run the demo with RAM

export CUDA_VISIBLE_DEVICES=0
python automatic_label_ram_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --ram_checkpoint ram_swin_large_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"

Step 2 (alternative): Run the demo with Tag2Text

export CUDA_VISIBLE_DEVICES=0
python automatic_label_tag2text_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --tag2text_checkpoint tag2text_swin_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"

  • RAM++ significantly improves RAM's open-set capability, enabling inference on unseen categories.
  • Tag2Text also provides powerful captioning capabilities; for processing with captions, refer to BLIP.
  • The pseudo labels and the model prediction visualization will be saved in output_dir.
