annomate

annomate is an MCP server for VIA v3. It serves the VIA image annotator at a localhost URL, syncs annotations to a local store, and exposes a suite of MCP tools so Claude can read, add, edit, and delete regions in real time alongside the user.

It is the perceptual counterpart to jscad-mcp. Where jscad-mcp closes the loop by letting Claude render and look at what it built, annomate closes the loop by letting Claude commit coordinates and look at what it sees. A bounding box at [2, 1850, 2400, 380, 410] is either on the brass milk pail or it isn’t — vague language hides perception errors; precise coordinates expose them.
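
To make the coordinate commitment concrete, here is a minimal sketch that decodes a region array like the one above and tests whether a pixel falls inside it. It assumes the VIA v3 convention that the first element is a shape id (2 for a rectangle) followed by x, y, width, and height in pixels. The names `Rect`, `decode_rect`, and `contains` are hypothetical helpers for illustration, not part of annomate.

```python
from typing import NamedTuple, Sequence

RECT_SHAPE_ID = 2  # assumed VIA v3 shape id for a rectangle


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def decode_rect(xy: Sequence[float]) -> Rect:
    """Decode [shape_id, x, y, width, height] into a Rect."""
    if len(xy) != 5 or xy[0] != RECT_SHAPE_ID:
        raise ValueError(f"not a rectangle region: {list(xy)!r}")
    _, x, y, width, height = xy
    return Rect(x, y, width, height)


def contains(rect: Rect, px: float, py: float) -> bool:
    """Return True if pixel (px, py) lies inside the rectangle, edges included."""
    inside_x = rect.x <= px <= rect.x + rect.width
    inside_y = rect.y <= py <= rect.y + rect.height
    return inside_x and inside_y


box = decode_rect([2, 1850, 2400, 380, 410])
print(box)
print(contains(box, 2000, 2600))  # a pixel inside the committed box
print(contains(box, 100, 100))    # a pixel far outside it
```

A committed box gives a yes-or-no answer for any pixel, which is what makes a misplaced region visible in a way a prose description is not.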

[Figure: Jaguar annotated with 11 regions via annomate]

For 20 worked perception sessions on real images — wildlife, art, photography, drawings — see annomate-examples.

  • Serves the VIA image annotator at a localhost URL
  • Implements VIA’s push/pull protocol so annotations sync to a local store
  • Auto-refreshes the browser when Claude makes changes (no manual pull needed)
  • Exposes MCP tools so Claude can read, add, edit, and delete regions
  • Optional local-model assistance — open-vocabulary detection (GroundingDINO / YOLO-World), promptable segmentation (SAM 2), VLM verification (Florence-2), scene classification, annotation grading, and free-form Q&A (Qwen2.5-VL)
  • Optional format conversion — HEIC / HEIF / AVIF (iPhone photos), PDF pages, EXIF / GPS / camera metadata
  • Optional OCR — Tesseract over image regions, returns word-level boxes as detection candidates

Optional features advertise themselves on the MCP tool surface even when their dependencies aren’t installed — they return a structured install hint instead of erroring.
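
The pattern described above, where a tool stays registered but answers with an install hint when its dependency is missing, could be sketched roughly as follows. This is an illustration under assumptions, not annomate's actual code: the `requires` decorator and the hint fields (`ok`, `missing_module`, `install_hint`) are hypothetical, and only the install URL comes from this document.

```python
import functools
import importlib.util
from typing import Any, Callable, Dict

# Install source taken from the install instructions in this document.
REPO = "git+https://github.com/caliperhq/annomate.git"


def requires(module: str, extra: str) -> Callable:
    """Wrap a tool so a missing dependency yields an install hint, not an exception."""

    def decorator(tool: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
        @functools.wraps(tool)
        def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
            # find_spec returns None when a top-level module is not installed.
            if importlib.util.find_spec(module) is None:
                return {
                    "ok": False,
                    "missing_module": module,
                    "install_hint": f"pip install 'annomate[{extra}] @ {REPO}'",
                }
            return tool(*args, **kwargs)

        return wrapper

    return decorator


@requires("json", extra="io")  # stdlib module, so this tool always runs
def available_tool() -> Dict[str, Any]:
    return {"ok": True}


@requires("module_that_is_not_installed_xyz", extra="ocr")  # deliberately absent
def unavailable_tool() -> Dict[str, Any]:
    return {"ok": True}


print(available_tool())
print(unavailable_tool())
```

Returning a structured value instead of raising means the caller can read the hint and relay the exact install command to the user.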

The base install is the annotation server alone. Install from GitHub into a dedicated venv:

python -m venv ~/.local/annomate
~/.local/annomate/bin/pip install \
'annomate @ git+https://github.com/caliperhq/annomate.git'

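Pick the extras you want by listing them in square brackets after the package name. Run these with the same pip as the base install (for the venv above, ~/.local/annomate/bin/pip):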

# Local AI models (~3 GB on disk, lazy-downloaded on first use)
pip install 'annomate[ai] @ git+https://github.com/caliperhq/annomate.git'
# Faster detection (YOLO-World, ~95 MB)
pip install 'annomate[ai,yolo] @ git+https://github.com/caliperhq/annomate.git'
# Free-form Q&A via chat VLM (Qwen2.5-VL-3B, ~6 GB)
pip install 'annomate[ai,chat] @ git+https://github.com/caliperhq/annomate.git'
# Format conversion (HEIC, PDF) + EXIF metadata
pip install 'annomate[io] @ git+https://github.com/caliperhq/annomate.git'
# OCR via Tesseract
pip install 'annomate[ocr] @ git+https://github.com/caliperhq/annomate.git'
# Everything
pip install 'annomate[ai,yolo,chat,io,ocr] @ git+https://github.com/caliperhq/annomate.git'

The package is not on PyPI — the name annomate there belongs to an unrelated project.

Several optional features need command-line tools the Python extras can’t install themselves. Install only what you’ll use.

| Feature | Tool | Debian/Ubuntu | macOS | Gentoo | Fedora/RHEL | Arch |
|---|---|---|---|---|---|---|
| PDF loading ([io]) | pdftoppm | apt install poppler-utils | brew install poppler | emerge app-text/poppler | dnf install poppler-utils | pacman -S poppler |
| OCR ([ocr]) | tesseract | apt install tesseract-ocr | brew install tesseract | emerge app-text/tesseract | dnf install tesseract | pacman -S tesseract |
| OCR language packs | tesseract-<lang> | apt install tesseract-ocr-eng … | brew install tesseract-lang | set LINGUAS="en es …" then re-emerge tesseract | dnf install tesseract-langpack-eng … | pacman -S tesseract-data-eng … |
| Rich EXIF ([io]) | exiftool | apt install libimage-exiftool-perl | brew install exiftool | emerge media-libs/exiftool | dnf install perl-Image-ExifTool | pacman -S perl-image-exiftool |
| AI accelerator (optional) | NVIDIA drivers + CUDA | distribution-specific | n/a (use MPS) | emerge nvidia-drivers | dnf install akmod-nvidia | pacman -S nvidia |

HEIC support (pillow-heif) bundles its own libheif; no system package is required.

Add to your .mcp.json:

{
  "mcpServers": {
    "annomate": {
      "command": "/path/to/annomate"
    }
  }
}

Find the entry point after install:

which annomate # global/user install
~/.local/annomate/bin/annomate # venv
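
If you would rather script the two steps above than edit the file by hand, the following sketch locates the entry point and merges it into an existing `.mcp.json` without disturbing other servers. The function `register_annomate` is a hypothetical helper, not something annomate ships. It assumes the config file uses the `mcpServers` layout shown above.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def register_annomate(config_path: Path, command: Optional[str] = None) -> dict:
    """Add or update the annomate entry in an .mcp.json file, keeping other servers."""
    # Fall back to a PATH lookup, the programmatic equivalent of `which annomate`.
    command = command or shutil.which("annomate")
    if command is None:
        raise FileNotFoundError("annomate not found on PATH; pass the venv path explicitly")
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["annomate"] = {"command": command}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demonstrate against a throwaway directory so no real config is touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = Path(tmp) / ".mcp.json"
        register_annomate(demo_path, "/path/to/annomate")
        print(demo_path.read_text())
```

For the venv install, pass the path explicitly, for example `register_annomate(Path(".mcp.json"), str(Path.home() / ".local/annomate/bin/annomate"))`.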

Run the server with any of these options:

annomate # start server (port OS-assigned)
annomate --port 9669 # pin the port
annomate --browser # open the UI on startup
annomate --no-ai # skip the model registry entirely
annomate --models-config FILE # override ~/.config/annomate/models.toml

The local URL prints to stderr on startup. Open it to use the annotator; load images, draw boxes, then ask Claude about them — or ask Claude to add annotations directly. Changes appear in your browser within a few seconds.

Install the companion Claude Code skill so Claude knows the annotation workflow, the AI tools, and the trigger-phrase patterns automatically. The source path is relative, so run this from the root of a repository checkout:

cp -r skills/annomate ~/.claude/skills/

The skill is split into a small SKILL.md plus sibling reference files (region-encoding, attributes, perception-gotchas, ai-tools, common-patterns) — Claude loads only what’s relevant to a given request.

Core (always available)

| Tool | Description |
|---|---|
| via_get_annotator_url | Return the local URL of the running VIA UI |
| via_add_file | Add an image (or PDF page, via [io]) to the project |
| via_get_image / via_get_image_crop | Fetch the image (or a crop) at a target resolution, optionally with the current annotations overlaid |
| via_get_project / via_list_files / via_get_annotations | Read the project state |
| via_add_region / via_update_region / via_delete_region | Write regions in pixel, returned-pixel, or fraction (0–1) coordinates |
| via_update_project / via_save_project | Bulk-load or persist a VIA v3 project JSON |
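
The three coordinate spaces accepted by the region tools can be related with a small conversion sketch. It assumes that "returned-pixel" means pixels measured on the (possibly downscaled) image that via_get_image returned, which this page does not spell out. The function `to_pixels` and its `space` values are hypothetical, not annomate's API.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)
Size = Tuple[int, int]                   # (width, height)


def to_pixels(box: Box, space: str, full_size: Size,
              returned_size: Optional[Size] = None) -> Box:
    """Convert a box from the given coordinate space to full-resolution pixels."""
    full_w, full_h = full_size
    if space == "pixel":
        scale_x = scale_y = 1.0
    elif space == "fraction":
        # Fractions in 0-1 scale directly by the full image dimensions.
        scale_x, scale_y = float(full_w), float(full_h)
    elif space == "returned":
        if returned_size is None:
            raise ValueError("returned_size is required for returned-pixel boxes")
        ret_w, ret_h = returned_size
        # Undo the downscale applied when the image was fetched.
        scale_x, scale_y = full_w / ret_w, full_h / ret_h
    else:
        raise ValueError(f"unknown coordinate space: {space!r}")
    x, y, w, h = box
    return (x * scale_x, y * scale_y, w * scale_x, h * scale_y)


full = (4000, 3000)  # made-up image size for illustration
print(to_pixels((0.5, 0.5, 0.25, 0.25), "fraction", full))
print(to_pixels((462.5, 600, 95, 102.5), "returned", full, returned_size=(1000, 750)))
```

Under these assumptions, a box drawn on a 1000 × 750 preview maps back to the same full-resolution pixels that a direct pixel-space box would name.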

Local-model assistance — needs [ai]

| Tool | Description |
|---|---|
| via_model_status | Report which model adapters are loaded and on which device |
| via_suggest_regions | Open-vocabulary detection (GroundingDINO / YOLO-World), excluding existing regions |
| via_tighten_region | SAM-derived tight bounding box + IoU score |
| via_verify_region | Florence-2 / Qwen-VL crop caption check for generic categories |
| via_grade_annotations | CLIP-cosine rubric across the project; flags misplacements and suggests shape encodings |
| via_classify_scene | Sub-second CLIP scene class; auto-routes detection to the right pipeline |
| via_ask_model | Free-form Q&A against a chat VLM (Qwen2.5-VL-3B) |
| via_find_similar | CLIP nearest-neighbour lookup across the project |
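
For readers unfamiliar with the IoU score that via_tighten_region reports, here is a minimal sketch of intersection-over-union for two axis-aligned boxes. It assumes the score compares the original box with the tightened one, which this page does not state. The function `iou` and the tightened box values are illustrative only.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes, in the range 0 to 1."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis, clamped to zero when the boxes are disjoint.
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


original = (1850, 2400, 380, 410)
tightened = (1870, 2420, 340, 370)  # made-up tightened box inside the original
print(round(iou(original, tightened), 3))
print(iou(original, original))
print(iou(original, (0, 0, 10, 10)))
```

A score near 1 means the tightened box barely changed the original; a low score suggests the original box was loose or misplaced.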

IO layer — needs [io] / [ocr]

| Tool | Description |
|---|---|
| via_load_document | Rasterize PDF pages and add them as images |
| via_read_metadata | EXIF / GPS / camera metadata; catches a class of priors before regions are placed |
| via_run_ocr | Tesseract over image regions, returning word-level boxes as detection candidates |
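
To illustrate what "word-level boxes as detection candidates" might look like, the sketch below converts Tesseract word data into rectangle candidates. It assumes the dictionary layout produced by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`. The function `words_to_candidates`, the candidate fields, the confidence threshold, and the sample data are all hypothetical, and this is not necessarily how via_run_ocr is implemented.

```python
from typing import Any, Dict, List


def words_to_candidates(data: Dict[str, list], min_conf: float = 60.0) -> List[Dict[str, Any]]:
    """Turn Tesseract word rows into rectangle candidates, skipping weak or empty ones."""
    candidates = []
    rows = zip(data["text"], data["conf"], data["left"],
               data["top"], data["width"], data["height"])
    for text, conf, left, top, width, height in rows:
        word = text.strip()
        # Tesseract reports conf -1 for non-word rows; float() also handles string values.
        if not word or float(conf) < min_conf:
            continue
        candidates.append({
            "label": word,
            "xy": [2, left, top, width, height],  # assumed VIA v3 rectangle encoding
            "confidence": float(conf),
        })
    return candidates


# Made-up sample in the image_to_data dictionary layout, so the sketch runs
# without Tesseract installed.
sample = {
    "text": ["", "FRESH", "MILK", "x"],
    "conf": ["-1", 96.2, 91.0, 12.5],
    "left": [0, 120, 260, 400],
    "top": [0, 40, 42, 44],
    "width": [640, 130, 110, 8],
    "height": [480, 36, 34, 30],
}
for candidate in words_to_candidates(sample):
    print(candidate)
```

Filtering on confidence keeps noise out of the candidate list; the threshold of 60 is an arbitrary starting point rather than a recommended value.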

annomate is built around a three-voice loop. Claude proposes regions; the user reviews them in the browser at full resolution; the local-model layer adds a pixel-statistics verdict between the two. Each voice catches what the others miss:

| Voice | What it knows | How it speaks |
|---|---|---|
| Claude | Cultural priors, language, whole-image gestalt | Boxes, polygons, polylines, circles; evidence-based labels |
| Local model ([ai]) | Pixel statistics, SAM masks, CLIP embeddings, VLM captions | Suggests candidates, tightens boxes, verifies labels, grades placements |
| User | Image at full resolution + actual knowledge of the subject | Adjudicates, corrects, deletes |

| Loop | Origin of the artifact | Forcing function | What gets exposed |
|---|---|---|---|
| jscad-mcp | Claude wrote it | Render and look | Code-vs-intent mismatch |
| annomate | Someone else made it | Commit coordinates and look | Prior-vs-reality mismatch |

Both tools refuse to let the model get away with sounding right.