
The companion corpus

Both jscad-mcp and annomate ship as two repositories, not one. Each tool has a companion examples repo — jscad-mcp-example and annomate-examples — and the companion is not a “demos folder.” It is the structural choice that lets the tool grow in response to being used.

This essay is about the cycle that pairing produces. It is the thing the perception-loop thesis takes for granted, the thing the bench lessons document the consequences of, and the thing the four-round engine and the 440-pixel pail are individual instances of. None of those essays is about the cycle itself.

When a session goes wrong, the friction has to live somewhere. Both projects use three places:

| Layer | Form | Cost to edit | Persistence |
| --- | --- | --- | --- |
| The server | TypeScript / Python source, MCP tool surface | Highest: change → rebuild → reinstall → restart session | Permanent across all users |
| The skill | Markdown that Claude loads on demand | Medium: text edit, lands next session | Permanent for users with the skill installed |
| The corpus / lab notebook | Per-example READMEs, VIA project JSON, .training/ entries, iteration GIFs | Lowest: write down what happened | Permanent for the session, not enforced on future ones |

The discipline is to put each friction in the cheapest layer where it can actually be fixed. A one-off mistake (Claude misidentified a particular figure in Night Watch) stays in the notebook. A recurring placement error (face boxes drift on leaning figures) becomes a skill rule. A capability gap (no way to fetch the image with overlays) becomes a server change.
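The triage rule above can be written down as a small decision function. This is only an illustrative sketch, not code from either project: the `Friction` fields, the `Layer` names, and the `recurring_after` threshold of two occurrences are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    NOTEBOOK = "corpus / lab notebook"  # cheapest to edit
    SKILL = "skill"                     # medium cost
    SERVER = "server"                   # most expensive to edit


@dataclass
class Friction:
    """Hypothetical record of one thing that went wrong in a session."""
    description: str
    occurrences: int            # number of sessions that have hit it so far
    needs_new_capability: bool  # True if no existing tool can do the job


def cheapest_layer(friction: Friction, recurring_after: int = 2) -> Layer:
    """Pick the cheapest layer where the friction can actually be fixed.

    `recurring_after` is an assumed threshold, not a documented project rule.
    """
    if friction.needs_new_capability:
        return Layer.SERVER      # a capability gap needs a server change
    if friction.occurrences >= recurring_after:
        return Layer.SKILL       # a recurring error becomes a skill rule
    return Layer.NOTEBOOK        # a one-off mistake stays in the notebook


examples = [
    Friction("misidentified a figure in Night Watch", 1, False),
    Friction("face boxes drift on leaning figures", 3, False),
    Friction("no way to fetch the image with overlays", 1, True),
]
for f in examples:
    print(f"{f.description} -> {cheapest_layer(f).name}")
```

Checking the capability gap first is deliberate: no amount of recurrence lets a skill rule substitute for a tool that does not exist.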

Without the corpus, friction has no rung-by-rung path between layers. There is only the server, which either gets cluttered with edits that should have been skill rules or is starved of edits that should have been tools.

What the corpus does that a test suite doesn’t


It would be tempting to call the example repos “regression tests.” That undersells them. A regression test asserts that a known input still produces a known output. The corpus does that — every .jscad file still renders, every annotations.json still loads — but it also does three things a test suite can’t:

It freezes context. The cycloidal-drive example’s iteration GIF is not just an image; it is the visible record of the four wrong shapes that came out of the renderer on the way to the right one. When a future session brushes against the same failure mode, the GIF is there as a prior. The same is true of the bench’s per-example READMEs: the Vermeer one names the 440-pixel pail explicitly, so the next session that opens a Vermeer reads the failure mode before it makes it again.

It accumulates failure vocabulary. “Pratt diagonal rule,” “marching-cubes kink at f = 0,” “posture-induced face drift,” “specialist-vocabulary VLM blind spot” — none of these phrases existed at the start of either project. Each one was coined in a notebook entry to describe a specific session’s pain, then promoted up the stack once it had recurred enough times to be worth naming. The vocabulary is itself a tool: once a failure has a name, future sessions can talk about it.

It contains the negative space. git blame tells a future reader what code changed and when. It does not tell them what was tried and rejected, what almost shipped, what the user corrected before save. The corpus does — every annotated.gif is a record of where the boxes ended up after correction, and the per-example README often names the regions that were originally placed wrong. Those failures are not in any commit; they are only in the corpus.
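For contrast, the plain regression-test part of the corpus ("every annotations.json still loads") is easy to sketch. The function below is a hypothetical smoke check, not a script from either repository: it assumes only that the files are named `annotations.json` and are valid JSON, and it says nothing about the VIA schema or about rendering `.jscad` files.

```python
import json
from pathlib import Path


def find_broken_annotations(corpus_root: str) -> list[tuple[Path, str]]:
    """Return (path, error message) for every annotations.json that fails to load."""
    broken = []
    for path in sorted(Path(corpus_root).rglob("annotations.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (OSError, ValueError) as exc:
            # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
            broken.append((path, str(exc)))
    return broken
```

A check like this can say that a file still parses. It cannot say which boxes were originally placed wrong, which is the part only the READMEs and GIFs record.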

The visible record of the cycle is in both repos’ commit histories.

For jscad-mcp, the trajectory is roughly:

  • render_test added after first-install pipelines kept silently producing white images and there was no way to localize the break.
  • slice added because per-part inspection wasn’t enough — the engine’s intake/exhaust ports were inside the block and visible only in cross-section.
  • Named parts + highlight + label_parts added because every demo walkthrough kept saying “the disc” or “the conrod” with no way for Claude to isolate the part it was talking about.
  • The skill split (jscad, jscad-wiki, jscad-examples) happened after sessions kept loading the full API surface to look up one function — paying token cost for context that wasn’t relevant to the task.

For annomate, the bench-lessons table is the same record, more explicit. Fraction coordinates after four sessions of arithmetic errors. overlay=true after the milkmaid. via_classify_scene and via_read_metadata after GRACE-FO. via_grade_annotations after “eyeball every box” stopped scaling at ~15 regions.

None of those changes would have shown up in a planning document. They were not foreseeable from the spec; they were extracted from use. The corpus is what made them extractable.

The non-obvious benefit of the cycle is not that any individual loop is fast — it is that each loop’s output makes the next loop cheaper.

  • A new MCP tool is permanent. After via_get_image_crop lands, every subsequent small-feature placement is one tool call instead of three.
  • A new skill rule is permanent. After “state your prior in writing” enters the skill, every subsequent session writes down the prior — the corpus stops accumulating the prior-shaped misplacement class.
  • A new corpus entry is permanent. After the Vermeer session ships, every future Vermeer-like session has a worked example to read first.

The three layers are different in cost, but they are the same in direction: each edit removes a class of friction from every future session. The model doesn’t get better. The tool around the model does.

This is why the estimates-vs-reality numbers are misleading on first read. The pace isn’t fast because Claude is fast. The pace is fast because the second demo benefits from infrastructure built for the first, and the seventh demo benefits from skills hardened by the third through sixth. By the time the lithophane demo shipped, the perception loop had a render-policy, a parameter-UI generator, a screenshot pipeline, a bundled-jscad export script, and a skill that knew when to invoke each. None of that existed when the cycloidal drive shipped.

The companion corpus is not documentation. It is the half of the development loop that lives outside the tool. Without it, an MCP server is a static surface — users install it, run it, hit friction, work around the friction, and the server never knows.

With it, the friction has somewhere to go. It moves into the notebook the first time it happens, into the skill the second or third time, into the server when it becomes a capability gap. Each layer is editable at a cost matched to how often the friction recurs, and each edit lands permanently in the next session.

The two projects are built the same way for the same reason: the loop is the unit of work, and the loop only closes if there is a place for what was learned to land.

