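Grounded-SAM combines Grounding DINO and SAM (Segment Anything Model) to perform text-prompted automatic segmentation. Paired with RAM/Tag2Text, it can generate labels fully automatically.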

Environment Setup

export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-11.3/
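With `BUILD_WITH_CUDA=True`, GroundingDINO's custom CUDA operators are compiled during installation, which requires a real CUDA toolkit at `CUDA_HOME`. A quick check before running `pip install` can catch a wrong path early. The sketch below is a hypothetical helper, `check_cuda_home`, that is not part of Grounded-SAM; it assumes a usable toolkit contains an executable `bin/nvcc`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Grounded-SAM): verify that CUDA_HOME points
# at a CUDA toolkit, assumed here to mean it contains an executable bin/nvcc.
# Takes the path as $1, falling back to the CUDA_HOME environment variable.
check_cuda_home() {
  local home="${1:-${CUDA_HOME:-}}"
  if [ -z "$home" ]; then
    echo "CUDA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$home/bin/nvcc" ]; then
    echo "no executable nvcc under $home/bin" >&2
    return 1
  fi
  echo "using CUDA toolkit at $home"
}
```

For example, `check_cuda_home && "$CUDA_HOME/bin/nvcc" --version` confirms both the path and the toolkit version (11.3 in the path above) before building.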

# Install core components
python -m pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO
pip install --upgrade diffusers[torch]

# Install RAM & Tag2Text
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/

# Optional dependencies
pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
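After installation, an import test can confirm that the packages are visible to the interpreter that will run the demo. `check_imports` below is a hypothetical helper. The module names `segment_anything`, `groundingdino` and `ram` are assumed from the packages installed above, and `groundingdino._C` is assumed to be GroundingDINO's compiled CUDA extension, so treat that last check as optional.

```bash
#!/usr/bin/env bash
# Hypothetical post-install check: try to import each module with the chosen
# Python interpreter (override with PYTHON=...), print ok/missing per module,
# and return 1 if any import failed.
check_imports() {
  local py="${PYTHON:-python}" mod missing=0
  for mod in "$@"; do
    if "$py" -c "import $mod" 2>/dev/null; then
      echo "ok: $mod"
    else
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}

# Module names assumed for the packages installed above:
# check_imports segment_anything groundingdino ram groundingdino._C
```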

Automatic Labeling Pipeline

Download the pretrained weights:

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
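Running plain `wget` again on a file that already exists saves a second copy with a `.1` suffix, whereas `wget -c` resumes an interrupted download and typically does nothing for a file that is already complete. Before running the demo, it can also help to confirm that all three checkpoints are present and non-empty. `check_weights` below is a hypothetical pre-flight helper, not part of the repository; note that it cannot detect a truncated but non-empty download.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm each checkpoint file exists in DIR
# and is non-empty. Prints the missing ones; returns 1 if any are missing.
# Usage: check_weights DIR FILE...
check_weights() {
  local dir="$1"; shift
  local f missing=0
  for f in "$@"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Checkpoint file names as downloaded above:
# check_weights . sam_vit_h_4b8939.pth groundingdino_swint_ogc.pth ram_swin_large_14m.pth
```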

Run RAM automatic labeling:

python automatic_label_ram_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --ram_checkpoint ram_swin_large_14m.pth \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo9.jpg \
  --output_dir "outputs" \
  --box_threshold 0.25 \
  --text_threshold 0.2 \
  --iou_threshold 0.5 \
  --device "cuda"
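The command above labels a single image via `--input_image`. To label a whole folder, one option is to loop over the images with the same flags. `label_dir` below is a hypothetical wrapper; it gives each image its own sub-directory of `outputs/`, assuming the script writes fixed file names into `--output_dir` so that separate directories keep runs from overwriting each other. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run automatic_label_ram_demo.py once per .jpg in
# IN_DIR, writing each result to OUT_ROOT/<image name>/.
# Set DRY_RUN=1 to print the commands instead of executing them.
label_dir() {
  local in_dir="$1" out_root="${2:-outputs}" img name
  local -a cmd
  for img in "$in_dir"/*.jpg; do
    [ -e "$img" ] || continue   # no matches: the glob stays literal, skip it
    name="$(basename "$img" .jpg)"
    cmd=(python automatic_label_ram_demo.py
      --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py
      --ram_checkpoint ram_swin_large_14m.pth
      --grounded_checkpoint groundingdino_swint_ogc.pth
      --sam_checkpoint sam_vit_h_4b8939.pth
      --input_image "$img"
      --output_dir "$out_root/$name"
      --box_threshold 0.25 --text_threshold 0.2 --iou_threshold 0.5
      --device cuda)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"   # show the command that would run
    else
      "${cmd[@]}" || return 1     # stop at the first failing image
    fi
  done
}
```

For example, `DRY_RUN=1 label_dir assets` lists the commands for every `.jpg` under `assets/` without loading any model.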

Workflow

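1. RAM/Tag2Text generates tags for the image.
2. Grounding DINO uses the tags as a text prompt to detect bounding boxes.
3. SAM uses the bounding boxes as prompts to generate segmentation masks.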
