stuff.md
open experiment

Tokenmaxing for agent tools — building tools an AI agent finishes a job with in the fewest tokens, whether you ship a CLI, an MCP, or a code-exec API.

The idea

axi.md is a style guide. stuff.md is a sport.

An agent's only currency is tokens; its only failure modes are wasted tokens and wrong actions. So we measure every tool by one number and compete to drive it down — a living leaderboard instead of a one-time study. Builds on axi.md, which made the first case that principled tool design beats raw CLIs and MCP.

The metric

TPT

Tokens Per Task: the tokens an agent spends to correctly finish a job, including the tokens it spends learning the tool. Lower is better.
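
One way the metric could be computed is sketched below. Everything here is an assumption rather than the project's scoring code: the `Run` fields are hypothetical names, the numbers are made up for illustration, and charging failed runs to the successes is just one plausible way to handle incorrect attempts.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One agent attempt at a task (all field names are hypothetical)."""
    task: str
    learn_tokens: int  # tokens spent learning the tool: help text, schemas, skill files
    work_tokens: int   # tokens spent on prompts, tool calls, and tool output
    correct: bool      # did the run finish the job correctly?


def tpt(runs: list[Run]) -> float:
    """Total tokens across all runs divided by the number of correct runs.

    Assumption: failed runs still cost tokens, so their spend is charged
    to the successes. Returns infinity when no run succeeded.
    """
    total = sum(r.learn_tokens + r.work_tokens for r in runs)
    solved = sum(1 for r in runs if r.correct)
    return total / solved if solved else float("inf")


# Illustrative numbers only, not measured results.
runs = [
    Run("close-issue", learn_tokens=400, work_tokens=1600, correct=True),
    Run("close-issue", learn_tokens=400, work_tokens=2600, correct=False),
    Run("close-issue", learn_tokens=400, work_tokens=600, correct=True),
]
print(tpt(runs))  # 3000.0
```

Under this definition a tool that is cheap per call but often leads the agent to a wrong action still scores badly, which matches the two failure modes named earlier.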

Playbook — spend fewer tokens

1. Answer, don't dump: Smallest output that picks the next move.
2. No new syntax: CSV for flat, JSON for nested. Nothing bespoke.
3. Truncate, offer the rest: Size hint + a --full escape hatch.
4. Lead with the summary: Totals first, so there is no round-trip to learn shape.
5. Stable top, fresh bottom: Keep the cacheable prefix byte-stable.
6. Say "unchanged": Don't re-send what the agent already saw.
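
Rules 3, 4, and 6 can be sketched together in a few lines. This is a minimal illustration under stated assumptions: `respond`, `digest`, the `seen` parameter, and the 8-character state tag are hypothetical names and choices, not an existing API. Only the `--full` flag comes from the playbook itself.

```python
import hashlib
from typing import Optional


def digest(rows: list[str]) -> str:
    """Short fingerprint of the full result (a hypothetical state tag)."""
    return hashlib.sha256("\n".join(rows).encode("utf-8")).hexdigest()[:8]


def respond(rows: list[str], limit: int = 3, full: bool = False,
            seen: Optional[str] = None) -> str:
    tag = digest(rows)
    # Rule 6: if the agent already holds this exact state, say so and stop.
    if seen == tag:
        return f"unchanged since {tag}"
    shown = rows if full else rows[:limit]
    # Rule 4: totals first, so the agent knows the size before reading rows.
    lines = [f"{len(rows)} rows total"]
    lines.extend(shown)
    # Rule 3: say how much was cut and how to get the rest.
    hidden = len(rows) - len(shown)
    if hidden > 0:
        lines.append(f"{hidden} more rows hidden · rerun with --full")
    # The tag goes last so the agent can pass it back on its next call.
    lines.append(f"state {tag}")
    return "\n".join(lines)
```

A second call that passes the tag back as `seen` costs one short line instead of the whole listing.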

Playbook — spend them right

7. No-args = live state + next steps: Orient in one call, not a manual.
8. Errors carry the fix: The exact next command, not just what broke.
9. Validate before acting: Bad write fails pre-flight; changes nothing.
10. Return the new state: Show what changed, never a bare "OK".

Same rules, any surface: CLI, MCP, or code-exec, judged on TPT.
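
Rules 8 to 10 could look like the sketch below for a single write operation. The in-memory `issues` dict stands in for a real backend, and the command name `tool`, the `set_state` function, and the message wording are all hypothetical, chosen only to show the pattern.

```python
VALID_STATES = {"open", "closed"}

# Hypothetical in-memory store standing in for a real backend.
issues = {51815: {"title": "telegram plugin crash", "state": "open"}}


def set_state(number: int, state: str) -> str:
    # Rule 9: validate everything before touching any data.
    if number not in issues:
        # Rule 8: the error names the command that would fix it.
        return f"error: no issue {number} · try: tool issues"
    if state not in VALID_STATES:
        options = "|".join(sorted(VALID_STATES))
        return f"error: bad state '{state}' · try: tool set-state {number} <{options}>"
    old = issues[number]["state"]
    if old == state:
        return f"unchanged: issue {number} already {state}"
    issues[number]["state"] = state
    # Rule 10: report the transition instead of a bare "OK".
    return f"issue {number}: {old} -> {state}"
```

Calling `set_state(51815, "done")` returns an error that spells out the valid command and leaves the stored state untouched, while `set_state(51815, "closed")` returns the old and new state in one line.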

What I'm trying to find

Which surface wins on TPT? CLI vs MCP vs code-exec vs raw fetch.
Does it change by domain? GitHub vs browser vs DB vs Slack.
Does it change by model? Opus / Sonnet / Haiku / GPT / Gemini.
Can a tokenmaxed MCP beat a CLI? We build both and check.

Benchmark conditions (fair TPT)

ghstuff [ours]: our CLI + skill
gh-axi [cli]: axi's CLI + skill/hook
gh [cli]: raw CLI, no skill (the floor)
gh-mcp [mcp]: GitHub MCP (full schemas)
gh-mcp-stuff [ours]: our MCP (the head-to-head)
gh-curl [fetch]: raw API (absolute floor)

What it looks like

$ ghstuff issues
14 open · 8771 total
issues[3]{number,title,state}:
  51815,"telegram plugin crash",open
  51803,"dark mode flicker",open
  51790,"slow cold start",open
next: ghstuff view <number>  ·  ghstuff issues --since a91f
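
Output in this shape could be produced by a small renderer like the one below. It is a sketch inferred from the sample alone, not ghstuff's actual implementation: `render`, `cell`, and the rule that only values containing spaces, commas, quotes, or newlines get quoted are assumptions.

```python
def cell(value) -> str:
    """Quote a value only when it contains a space, comma, quote, or newline."""
    text = str(value)
    if any(ch in text for ch in ' ,"\n'):
        return '"' + text.replace('"', '""') + '"'
    return text


def render(name, fields, rows, summary, next_steps) -> str:
    """Summary line, then a header naming count and columns, rows, next steps."""
    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    lines = [summary, header]
    for row in rows:
        lines.append("  " + ",".join(cell(row[f]) for f in fields))
    lines.append("next: " + "  ·  ".join(next_steps))
    return "\n".join(lines)


rows = [
    {"number": 51815, "title": "telegram plugin crash", "state": "open"},
    {"number": 51803, "title": "dark mode flicker", "state": "open"},
    {"number": 51790, "title": "slow cold start", "state": "open"},
]
print(render(
    "issues",
    ["number", "title", "state"],
    rows,
    "14 open · 8771 total",
    ["ghstuff view <number>", "ghstuff issues --since a91f"],
))
```

The header carries the row count and column names once, so each row needs no keys of its own, which is where most of the savings over a JSON array of objects would come from.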