
How to Use Grounding SAM

Install without Docker

If you want to build a local GPU environment for Grounded-SAM, set the following environment variables manually:

export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.3/
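
To sanity-check the build environment first, one option (assuming PyTorch is already installed) is:

import torch

# CUDA version PyTorch was compiled against; should match CUDA_HOME above.
print(torch.version.cuda)
# True if a GPU is visible to PyTorch.
print(torch.cuda.is_available())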

Install Segment Anything:

python -m pip install -e segment_anything
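
Once installed, the package can be driven directly from Python. A minimal sketch using the SamPredictor API (the image path and click coordinates below are placeholders):

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H checkpoint downloaded in Step 1 further below.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an HWC RGB uint8 array.
image = cv2.cvtColor(cv2.imread("assets/demo9.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Predict masks from a single foreground click (placeholder coordinates).
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)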

Install Grounding DINO:

pip install --no-build-isolation -e GroundingDINO
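
After installation, Grounding DINO can generate text-conditioned boxes from Python. A minimal sketch using its inference helpers (the caption and thresholds here are illustrative):

from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "groundingdino_swint_ogc.pth",
)
image_source, image = load_image("assets/demo9.jpg")

# Boxes are returned normalized, in (cx, cy, w, h) format.
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="dog . chair .",
    box_threshold=0.35,
    text_threshold=0.25,
)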

Install diffusers:

pip install --upgrade diffusers[torch]
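
diffusers powers the inpainting demos. A minimal sketch of loading a Stable Diffusion inpainting pipeline (the model id, mask file, and prompt are illustrative; any compatible inpainting checkpoint works):

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("assets/demo9.jpg").convert("RGB")
mask_image = Image.open("mask.png").convert("RGB")  # hypothetical mask: white = repaint

result = pipe(prompt="a wooden bench", image=init_image, mask_image=mask_image).images[0]
result.save("inpainted.png")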

Install osx:

git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh

Install RAM & Tag2Text:

git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/
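
Once installed, RAM can tag an image directly from Python. A sketch based on the recognize-anything examples (treat the exact helper names as assumptions and check that repository if they differ):

import torch
from PIL import Image
from ram import get_transform, inference_ram
from ram.models import ram

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the RAM Swin-L checkpoint downloaded in Step 1 below.
ram_model = ram(pretrained="ram_swin_large_14m.pth",
                image_size=384, vit="swin_l").to(device).eval()

transform = get_transform(image_size=384)
image_tensor = transform(Image.open("assets/demo9.jpg")).unsqueeze(0).to(device)

# Returns English and Chinese tag strings, e.g. "dog | grass | frisbee".
tags_en, tags_zh = inference_ram(image_tensor, ram_model)
print(tags_en)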

The following optional dependencies are needed for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format (jupyter is also required to run the example notebooks):

pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
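
For reference, saving a binary mask in COCO RLE format with pycocotools looks like this (the mask here is a dummy array standing in for a SAM output):

import numpy as np
from pycocotools import mask as mask_utils

# Dummy binary mask; in practice this would be a SAM mask of shape (H, W).
binary_mask = np.zeros((480, 640), dtype=np.uint8)
binary_mask[100:200, 150:300] = 1

# pycocotools expects Fortran-contiguous uint8 arrays.
rle = mask_utils.encode(np.asfortranarray(binary_mask))
rle["counts"] = rle["counts"].decode("utf-8")  # make the RLE JSON-serializable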

More details can be found in the installation instructions for Segment Anything, GroundingDINO, and OSX.

🏷️ Grounded-SAM with RAM or Tag2Text for Automatic Labeling

The Recognize Anything Models are a series of strong, open-source foundational image recognition models, including RAM++, RAM, and Tag2Text.

They link seamlessly with Grounded-SAM to generate pseudo labels automatically, as follows (a condensed sketch of the hand-off appears after this list):

  1. Use RAM/Tag2Text to generate tags.
  2. Use Grounded-Segment-Anything to generate the boxes and masks.
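
A condensed, self-contained sketch of this hand-off, following the structure of the Grounded-SAM demo scripts (the tag-to-caption conversion and thresholds are assumptions; the demo commands below are the authoritative version):

import torch
from PIL import Image
from groundingdino.util import box_ops
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor
from ram import get_transform, inference_ram
from ram.models import ram

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Generate tags with RAM.
ram_model = ram(pretrained="ram_swin_large_14m.pth",
                image_size=384, vit="swin_l").to(device).eval()
img_t = get_transform(image_size=384)(Image.open("assets/demo9.jpg")).unsqueeze(0).to(device)
tags_en, _ = inference_ram(img_t, ram_model)
caption = tags_en.replace(" |", ".")  # Grounding DINO expects '.'-separated phrases

# 2a. The tags become the text prompt for Grounding DINO; boxes come back
#     normalized, in (cx, cy, w, h) format.
dino = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
                  "groundingdino_swint_ogc.pth")
image_source, image = load_image("assets/demo9.jpg")
boxes, logits, phrases = predict(model=dino, image=image, caption=caption,
                                 box_threshold=0.25, text_threshold=0.2)

# 2b. Convert the boxes to absolute xyxy coordinates and use them to prompt SAM.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)
predictor.set_image(image_source)
H, W = image_source.shape[:2]
boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([W, H, W, H])
transformed = predictor.transform.apply_boxes_torch(boxes_xyxy, (H, W)).to(device)
masks, _, _ = predictor.predict_torch(point_coords=None, point_labels=None,
                                      boxes=transformed, multimask_output=False)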

Step 1: Initialize the submodules and download the pretrained checkpoints

  • Initialize the submodules:
cd Grounded-Segment-Anything
git submodule init
git submodule update
  • Download the pretrained weights for Grounding DINO, SAM, and RAM/Tag2Text:
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth


wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/tag2text_swin_14m.pth

Step 2: Run the demo with RAM

export CUDA_VISIBLE_DEVICES=0
python automatic_label_ram_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --ram_checkpoint ram_swin_large_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"

Step 2 (alternative): Run the demo with Tag2Text

export CUDA_VISIBLE_DEVICES=0
python automatic_label_tag2text_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --tag2text_checkpoint tag2text_swin_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"

  • RAM++ significantly improves RAM's open-set capability, enabling inference on unseen categories.
  • Tag2Text also provides powerful captioning capabilities; for processing with captions, refer to BLIP.
  • The pseudo labels and the model prediction visualization will be saved in output_dir.
