Install without Docker
If you want to build a local GPU environment for Grounded-SAM, set the following environment variables manually:
```bash
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.3/
```
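Before building, it can be worth sanity-checking the setup. The sketch below (not part of the official instructions) confirms that PyTorch sees a GPU and that `CUDA_HOME` points at a toolkit containing `nvcc`:

```python
# Optional sanity check before building the CUDA extensions.
import os
import subprocess

import torch

print("CUDA available:", torch.cuda.is_available())
print("PyTorch CUDA version:", torch.version.cuda)

# CUDA_HOME should contain bin/nvcc; this prints the toolkit version.
nvcc = os.path.join(os.environ["CUDA_HOME"], "bin", "nvcc")
subprocess.run([nvcc, "--version"], check=True)
```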
Install Segment Anything:
```bash
python -m pip install -e segment_anything
```
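A quick way to verify the editable install is an import check (a minimal smoke test, not from the official docs):

```python
# The editable install should make the package importable and expose the
# model registry with the vit_h / vit_l / vit_b variants.
from segment_anything import SamPredictor, sam_model_registry

print(sorted(sam_model_registry.keys()))
```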
Install Grounding DINO:
```bash
pip install --no-build-isolation -e GroundingDINO
```
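A similar smoke test for Grounding DINO; the `_C` import assumes the CUDA extension actually compiled (it holds the compiled deformable-attention ops) and is expected to fail on a CPU-only build:

```python
# Fails at the first import if the editable install did not complete, and at
# the second if the compiled CUDA ops are missing, e.g. when BUILD_WITH_CUDA
# was not set before building.
import groundingdino
from groundingdino import _C  # noqa: F401

print("GroundingDINO OK")
```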
Install diffusers:
```bash
pip install --upgrade diffusers[torch]
```
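diffusers is only needed for the inpainting demos. One way to check the install is to load an inpainting pipeline; the checkpoint ID below is an illustrative choice, and the weights are downloaded on first use:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# Downloads the weights on first run; the model ID is illustrative.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")
print(type(pipe).__name__, "loaded")
```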
Install osx:
```bash
git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh
```
Install RAM & Tag2Text:
```bash
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/
```
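As with the other packages, a quick import check (a minimal smoke test; the symbols follow the recognize-anything README):

```python
# The editable install exposes the `ram` package used by the
# automatic-labeling demos below.
from ram import get_transform, inference_ram
from ram.models import ram, tag2text

print("RAM / Tag2Text imports OK")
```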
The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. jupyter is also required to run the example notebooks.
```bash
pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
```
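For the COCO-format output path specifically, pycocotools encodes binary masks as run-length encodings (RLE). A minimal sketch with a synthetic mask (the demo scripts handle this internally, possibly differently):

```python
import numpy as np
from pycocotools import mask as mask_utils

# Synthetic binary mask; pycocotools expects a Fortran-ordered uint8 array.
binary_mask = np.zeros((480, 640), dtype=np.uint8)
binary_mask[100:200, 150:300] = 1

rle = mask_utils.encode(np.asfortranarray(binary_mask))
print("area:", mask_utils.area(rle))    # 100 * 150 = 15000 pixels
print("bbox:", mask_utils.toBbox(rle))  # [x, y, w, h] = [150, 100, 150, 100]

rle["counts"] = rle["counts"].decode("utf-8")  # bytes -> str for JSON output
```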
More details can be found in the installation guides for Segment Anything, GroundingDINO, and OSX.
🏷️ Grounded-SAM with RAM or Tag2Text for Automatic Labeling
The Recognize Anything Models (RAM++, RAM, and Tag2Text) are a series of strong, open-source fundamental image recognition models.
Combined with Grounded-SAM, they generate pseudo labels automatically as follows:
- Use RAM/Tag2Text to generate tags.
- Use Grounded-Segment-Anything to generate the boxes and masks.
Step 1: Initialize the submodules and download the pretrained checkpoints
```bash
cd Grounded-Segment-Anything
git submodule init
git submodule update
```
- Download the pretrained weights for GroundingDINO, SAM, and RAM/Tag2Text:
```bash
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/tag2text_swin_14m.pth
```
Step 2: Run the demo with RAM
```bash
export CUDA_VISIBLE_DEVICES=0
python automatic_label_ram_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --ram_checkpoint ram_swin_large_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"
```
Step 2 (alternative): Run the demo with Tag2Text
```bash
export CUDA_VISIBLE_DEVICES=0
python automatic_label_tag2text_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --tag2text_checkpoint tag2text_swin_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"
```
- RAM++ significantly improves the open-set capability of RAM and supports inference on unseen categories.
- Tag2Text also provides powerful captioning capabilities; the caption-driven workflow is analogous to the BLIP demo.
- The pseudo labels and model prediction visualizations will be saved in `output_dir`.
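For orientation, here is a condensed sketch of the RAM → Grounding DINO → SAM flow that `automatic_label_ram_demo.py` implements. It uses the public APIs of the three packages installed above, but prompt formatting, thresholds, NMS, and output handling in the real script differ:

```python
import torch
from PIL import Image
from torchvision.ops import box_convert

from groundingdino.util.inference import load_image, load_model, predict
from ram import get_transform, inference_ram
from ram.models import ram
from segment_anything import SamPredictor, sam_model_registry

device = "cuda"
image_path = "assets/demo9.jpg"

# 1. Tag the image with RAM; returns a string like "chair | table | ...".
ram_model = ram(pretrained="ram_swin_large_14m.pth",
                image_size=384, vit="swin_l").eval().to(device)
transform = get_transform(image_size=384)
ram_input = transform(Image.open(image_path)).unsqueeze(0).to(device)
tags = inference_ram(ram_input, ram_model)[0]
caption = tags.replace(" | ", ". ")  # Grounding DINO takes a text prompt

# 2. Ground the tags as boxes with Grounding DINO (normalized cxcywh output).
dino = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "groundingdino_swint_ogc.pth",
)
image_source, image = load_image(image_path)
boxes, logits, phrases = predict(model=dino, image=image, caption=caption,
                                 box_threshold=0.25, text_threshold=0.2)

# 3. Convert boxes to pixel xyxy and turn each one into a mask with SAM.
h, w, _ = image_source.shape
boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]),
                         in_fmt="cxcywh", out_fmt="xyxy").numpy()
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)
predictor.set_image(image_source)
masks = [predictor.predict(box=box, multimask_output=False)[0]
         for box in boxes_xyxy]
print(f"{len(masks)} masks for phrases: {phrases}")
```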
