annomate

annomate is an MCP server for VIA v3. It serves the VIA image annotator at a localhost URL, syncs annotations to a local store, and exposes a suite of MCP tools so Claude can read, add, edit, and delete regions in real time alongside the user.

It is the perceptual counterpart to jscad-mcp. Where jscad-mcp closes the loop by letting Claude render and look at what it built, annomate closes the loop by letting Claude commit coordinates and look at what it sees. A bounding box at [2, 1850, 2400, 380, 410] (VIA's shape id for a rectangle, then x, y, width, height in pixels) is either on the brass milk pail or it isn’t: vague language hides perception errors; precise coordinates expose them.

Figure: Jaguar annotated with 11 regions via annomate.

For 20 worked perception sessions on real images — wildlife, art, photography, drawings — see annomate-examples.

What it does

Serves the VIA image annotator at a localhost URL
Implements VIA’s push/pull protocol so annotations sync to a local store
Auto-refreshes the browser when Claude makes changes (no manual pull needed)
Exposes MCP tools so Claude can read, add, edit, and delete regions
Optional local-model assistance — open-vocabulary detection (GroundingDINO / YOLO-World), promptable segmentation (SAM 2), VLM verification (Florence-2), scene classification, annotation grading, and free-form Q&A (Qwen2.5-VL)
Optional format conversion — HEIC / HEIF / AVIF (iPhone photos), PDF pages, EXIF / GPS / camera metadata
Optional OCR — Tesseract over image regions, returns word-level boxes as detection candidates

Optional features advertise themselves on the MCP tool surface even when their dependencies aren’t installed — they return a structured install hint instead of erroring.
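
The behaviour described above can be pictured with a minimal sketch. It is not annomate's actual code: the function names, the hint's field names, and the choice of `pytesseract` as the probed module are all assumptions made for illustration.

```python
import importlib.util
from typing import Callable, Optional

# Install command template, taken from the Install section of this README.
INSTALL_TEMPLATE = (
    "pip install 'annomate[{extra}] @ git+https://github.com/caliperhq/annomate.git'"
)


def install_hint(module: str, extra: str) -> Optional[dict]:
    """Return None if `module` is importable, else a structured hint.

    `module` should be a top-level module name. The field names in the
    returned dict are hypothetical, not annomate's real schema.
    """
    if importlib.util.find_spec(module) is not None:
        return None
    return {
        "status": "unavailable",
        "missing_module": module,
        "install": INSTALL_TEMPLATE.format(extra=extra),
    }


def guarded(module: str, extra: str, tool: Callable[[], dict]) -> dict:
    """Run `tool` only when its dependency is present; otherwise return the hint."""
    hint = install_hint(module, extra)
    return hint if hint is not None else tool()


# The tool stays callable either way; only the payload differs.
result = guarded("pytesseract", "ocr", lambda: {"status": "ok", "words": []})
print(result["status"])
```

The point of the design is that the tool always exists on the MCP surface, so the client learns how to enable a feature by calling it rather than by hitting an error.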

Install

The base install is the annotation server alone. Install from GitHub into a dedicated venv:

python -m venv ~/.local/annomate
~/.local/annomate/bin/pip install \
  'annomate @ git+https://github.com/caliperhq/annomate.git'

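Pick the extras you want by listing them in square brackets after the package name, before the `@ git+…` URL. Run these with the same venv pip (~/.local/annomate/bin/pip) so they land in the same environment: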

# Local AI models (~3 GB on disk, lazy-downloaded on first use)
pip install 'annomate[ai] @ git+https://github.com/caliperhq/annomate.git'

# Faster detection (YOLO-World, ~95 MB)
pip install 'annomate[ai,yolo] @ git+https://github.com/caliperhq/annomate.git'

# Free-form Q&A via chat VLM (Qwen2.5-VL-3B, ~6 GB)
pip install 'annomate[ai,chat] @ git+https://github.com/caliperhq/annomate.git'

# Format conversion (HEIC, PDF) + EXIF metadata
pip install 'annomate[io] @ git+https://github.com/caliperhq/annomate.git'

# OCR via Tesseract
pip install 'annomate[ocr] @ git+https://github.com/caliperhq/annomate.git'

# Everything
pip install 'annomate[ai,yolo,chat,io,ocr] @ git+https://github.com/caliperhq/annomate.git'

The package is not on PyPI; the name annomate there belongs to an unrelated project, so a plain `pip install annomate` installs the wrong package.

System packages

Several optional features need command-line tools the Python extras can’t install themselves. Install only what you’ll use.

Feature	Tool	Debian/Ubuntu	macOS	Gentoo	Fedora/RHEL	Arch
PDF loading (`[io]`)	`pdftoppm`	`apt install poppler-utils`	`brew install poppler`	`emerge app-text/poppler`	`dnf install poppler-utils`	`pacman -S poppler`
OCR (`[ocr]`)	`tesseract`	`apt install tesseract-ocr`	`brew install tesseract`	`emerge app-text/tesseract`	`dnf install tesseract`	`pacman -S tesseract`
OCR language packs	`tesseract-<lang>`	`apt install tesseract-ocr-eng …`	`brew install tesseract-lang`	set `LINGUAS="en es …"` then re-emerge tesseract	`dnf install tesseract-langpack-eng …`	`pacman -S tesseract-data-eng …`
Rich EXIF (`[io]`)	`exiftool`	`apt install libimage-exiftool-perl`	`brew install exiftool`	`emerge media-libs/exiftool`	`dnf install perl-Image-ExifTool`	`pacman -S perl-image-exiftool`
AI accelerator (optional)	NVIDIA drivers + CUDA	distribution-specific	n/a (use MPS)	`emerge nvidia-drivers`	`dnf install akmod-nvidia`	`pacman -S nvidia`

HEIC support (pillow-heif) bundles its own libheif; no system package is required.

Claude Code config

Add to your .mcp.json:

{
  "mcpServers": {
    "annomate": {
      "command": "/path/to/annomate"
    }
  }
}

Find the entry point after install:

which annomate                    # global/user install
~/.local/annomate/bin/annomate    # venv
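
If you would rather generate the config than edit it by hand, the two steps above can be combined in a short script. This is a convenience sketch, not part of annomate: the helper names are hypothetical, and it only prints the JSON so an existing .mcp.json with other servers is never overwritten.

```python
import json
import shutil
from pathlib import Path
from typing import Optional

# Venv location used in the Install section of this README.
DEFAULT_VENV = Path.home() / ".local" / "annomate"


def resolve_entry_point(venv: Path = DEFAULT_VENV) -> Optional[str]:
    """Prefer the venv's entry point, then fall back to whatever is on PATH."""
    candidate = venv / "bin" / "annomate"
    if candidate.is_file():
        return str(candidate)
    return shutil.which("annomate")


def mcp_config(command: str) -> dict:
    """Build the same structure as the .mcp.json example above."""
    return {"mcpServers": {"annomate": {"command": command}}}


# Fall back to the README's placeholder when nothing is installed yet.
command = resolve_entry_point() or "/path/to/annomate"
print(json.dumps(mcp_config(command), indent=2))
```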

Usage

annomate                         # start server (port OS-assigned)
annomate --port 9669             # pin the port
annomate --browser               # open the UI on startup
annomate --no-ai                 # skip the model registry entirely
annomate --models-config FILE    # override ~/.config/annomate/models.toml

The local URL prints to stderr on startup. Open it to use the annotator; load images, draw boxes, then ask Claude about them — or ask Claude to add annotations directly. Changes appear in your browser within a few seconds.

Skills

Install the companion Claude Code skill so Claude knows the annotation workflow, the AI tools, and the trigger-phrase patterns automatically:

cp -r skills/annomate ~/.claude/skills/

The skill is split into a small SKILL.md plus sibling reference files (region-encoding, attributes, perception-gotchas, ai-tools, common-patterns) — Claude loads only what’s relevant to a given request.

MCP tools

Core (always available)

Tool	Description
`via_get_annotator_url`	Return the local URL of the running VIA UI
`via_add_file`	Add an image (or PDF page, via `[io]`) to the project
`via_get_image` / `via_get_image_crop`	Fetch the image (or a crop) at a target resolution, optionally with the current annotations overlaid
`via_get_project` / `via_list_files` / `via_get_annotations`	Read the project state
`via_add_region` / `via_update_region` / `via_delete_region`	Write regions in pixel, returned-pixel, or fraction (0–1) coordinates
`via_update_project` / `via_save_project`	Bulk-load or persist a VIA v3 project JSON
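
To make the coordinate systems concrete, here is a small sketch of how the other two could map onto original-image pixels. The helper names are hypothetical, and the reading of "returned-pixel" as "pixels measured on the downscaled copy that `via_get_image` returned" is an assumption; the leading shape id 2 follows the rectangle example in the introduction.

```python
RECT = 2  # leading shape id, as in the [2, 1850, 2400, 380, 410] example


def fraction_to_via_rect(fx, fy, fw, fh, image_w, image_h):
    """Convert a box given as 0-1 fractions of the image into pixel units."""
    for value in (fx, fy, fw, fh):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"fraction out of range: {value}")
    return [
        RECT,
        round(fx * image_w),
        round(fy * image_h),
        round(fw * image_w),
        round(fh * image_h),
    ]


def returned_to_via_rect(x, y, w, h, scale):
    """Map a box measured on a downscaled copy back to original pixels.

    `scale` is returned_width / original_width (assumed uniform on both axes).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [RECT, round(x / scale), round(y / scale), round(w / scale), round(h / scale)]


# A box covering the centre-left tenth of a 4000x3000 photo.
print(fraction_to_via_rect(0.5, 0.25, 0.1, 0.2, 4000, 3000))
# The same kind of box read off a quarter-size preview.
print(returned_to_via_rect(100, 50, 20, 10, 0.25))
```

Whichever system a caller uses, converting to original pixels before comparing boxes keeps every voice in the loop talking about the same region.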

Local-model assistance — needs [ai]

Tool	Description
`via_model_status`	Report which model adapters are loaded and on which device
`via_suggest_regions`	Open-vocabulary detection (GroundingDINO / YOLO-World), excluding existing regions
`via_tighten_region`	SAM-derived tight bounding box + IoU score
`via_verify_region`	Florence-2 / Qwen-VL crop caption check for generic categories
`via_grade_annotations`	CLIP-cosine rubric across the project — flags misplacements and suggests shape encodings
`via_classify_scene`	Sub-second CLIP scene class — auto-routes detection to the right pipeline
`via_ask_model`	Free-form Q&A against a chat VLM (Qwen2.5-VL-3B)
`via_find_similar`	CLIP nearest-neighbour lookup across the project
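
As an illustration of what a "tight bounding box + IoU score" means, the sketch below derives a box from a binary mask and scores it against the original box. It is a dependency-free toy, not annomate's implementation: a real SAM mask would be an array rather than nested lists, and whether annomate scores box against box or box against mask is not stated here.

```python
def mask_bbox(mask):
    """Tight (x, y, w, h) around the truthy cells of a row-major 2D mask, or None."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    x0, y0 = min(xs), min(ys)
    return (x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


# A loose 4x4 box drawn around an object that only fills the middle 2x2.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
loose = (0, 0, 4, 4)
tight = mask_bbox(mask)
print(tight, iou(loose, tight))
```

A low IoU between the proposed and the tightened box is a signal that the original placement was loose or off-target.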

IO layer — needs [io] / [ocr]

Tool	Description
`via_load_document`	Rasterize PDF pages and add them as images
`via_read_metadata`	EXIF / GPS / camera metadata; where and when a photo was taken, and with what camera, can rule out a class of mistaken priors before regions are placed
`via_run_ocr`	Tesseract over image regions, returning word-level boxes as detection candidates
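
For a sense of what "word-level boxes as detection candidates" could look like, the sketch below filters Tesseract word output into rectangle candidates. It assumes the dict layout produced by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`; whether annomate uses pytesseract, and the shape of its real return value, are assumptions. The candidate field names are hypothetical.

```python
def word_boxes(data, min_conf=60.0):
    """Turn Tesseract word-level output into rectangle candidates.

    `data` is a dict of parallel lists with the keys text, conf, left, top,
    width and height. Empty strings and low-confidence words are dropped.
    """
    candidates = []
    rows = zip(
        data["text"], data["conf"],
        data["left"], data["top"], data["width"], data["height"],
    )
    for text, conf, left, top, width, height in rows:
        conf = float(conf)  # conf may arrive as int, float, or a string like "-1"
        if not text.strip() or conf < min_conf:
            continue
        candidates.append({
            "label": text,
            "xy": [2, left, top, width, height],  # 2 = rectangle, as in the intro example
            "conf": conf,
        })
    return candidates


# Hand-made sample in the same layout; with pytesseract installed it would come from:
#   pytesseract.image_to_data(Image.open("page.png"), output_type=pytesseract.Output.DICT)
sample = {
    "text": ["", "Hello", "w0rld"],
    "conf": ["-1", 96, 41.5],
    "left": [0, 10, 80],
    "top": [0, 20, 20],
    "width": [200, 60, 55],
    "height": [50, 18, 18],
}
print(word_boxes(sample))
```

Treating OCR words as candidates rather than finished annotations leaves the final call to Claude and the user, in line with the review loop the rest of the tool surface supports.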

Why a third voice

annomate is built around a three-voice loop. The model proposes regions; the user reviews in the browser at full resolution; the local-model layer adds a pixel-statistics verdict between the two. Each voice catches what the others miss:

Voice	What it knows	How it speaks
Claude	Cultural priors, language, whole-image gestalt	Boxes, polygons, polylines, circles; evidence-based labels
Local model (`[ai]`)	Pixel statistics, SAM masks, CLIP embeddings, VLM captions	Suggests candidates, tightens boxes, verifies labels, grades placements
User	Image at full resolution + actual knowledge of the subject	Adjudicates, corrects, deletes

Essays

How it pairs with jscad-mcp

Loop	Origin of the artifact	Forcing function	What gets exposed
`jscad-mcp`	Claude wrote it	Render and look	Code-vs-intent mismatch
`annomate`	Someone else made it	Commit coordinates and look	Prior-vs-reality mismatch

Both tools refuse to let the model get away with sounding right.

Source: github.com/caliperhq/annomate
License: MIT (VIA included under BSD 2-Clause — see NOTICE; the optional YOLOE adapter is upstream AGPL-3.0)