The dials
Power-user settings. The defaults are honest; everything here is for people who want their hands on the loop.
Answering
Prompt version: Pin a version for reproducible runs. “Latest promoted” follows every gate win automatically.
pinned — the trust chip will say so
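A minimal sketch of how the choice between a pinned version and “Latest promoted” might resolve. The function name, the version strings, and the idea that promoted versions arrive as an oldest-first list are all assumptions made for illustration, not the app's actual interface.

```python
from typing import Optional, Sequence


def resolve_prompt_version(promoted: Sequence[str],
                           pinned: Optional[str] = None) -> str:
    """Pick the prompt version a run should use.

    `promoted` is assumed to list promoted versions oldest first.
    `pinned` is the version chosen in settings, or None for "Latest promoted".
    """
    if pinned is not None:
        # A pin wins, so later gate wins do not change the run.
        return pinned
    if not promoted:
        raise ValueError("no promoted prompt versions available")
    # "Latest promoted" follows the most recent gate win.
    return promoted[-1]
```

With a pin in place the result stays fixed as new versions are promoted, which is what makes runs reproducible.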
One clarifying question: Allow the agent a single question when material facts are missing. Off = it must state its assumption in the answer instead.
Broker-review threshold: Below this confidence, answers carry the “have a broker review” flag. Raising it makes Harmony more cautious, never more sure.
0.62
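The threshold rule can be sketched as a single comparison. The `Answer` type, the field names, and the assumption that confidence lies between 0 and 1 are hypothetical; only the 0.62 value comes from this page.

```python
from dataclasses import dataclass

# Value shown on this page; the constant name is hypothetical.
BROKER_REVIEW_THRESHOLD = 0.62


@dataclass(frozen=True)
class Answer:
    code: str
    confidence: float  # assumed to lie in [0, 1]


def needs_broker_review(answer: Answer,
                        threshold: float = BROKER_REVIEW_THRESHOLD) -> bool:
    """Return True when the answer should carry the broker-review flag.

    The flag applies strictly below the threshold. The confidence itself
    is never modified, only compared.
    """
    return answer.confidence < threshold
```

Raising the threshold only widens the set of flagged answers. An answer at 0.70 passes unflagged at 0.62 but is flagged at 0.80, and its confidence is 0.70 in both cases.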
Rulings retrieved per answer: How many CROSS precedents the search hands the agent. More rulings mean more context but slower answers.
8
The loop
Nightly round: Grade → introspect → fix → gate, unattended, every night.
02:00 UTC
Nightly eval batch: Held-out rulings graded per round. Bigger = steadier numbers, longer rounds.
300
“Improve now” batch: Scoped so a full live round fits in about a minute.
20
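To see why a bigger batch gives steadier numbers, here is a rough sketch comparing the two batch sizes above. It assumes each graded ruling is an independent pass/fail result, which may not match how the app actually scores; the function name is illustrative.

```python
import math


def accuracy_standard_error(accuracy: float, batch_size: int) -> float:
    """Standard error of an accuracy estimate from independent pass/fail cases."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return math.sqrt(accuracy * (1.0 - accuracy) / batch_size)


# Worst case (accuracy = 0.5) for the two batch sizes shown on this page.
nightly = accuracy_standard_error(0.5, 300)      # about 0.029, i.e. ~2.9 points
improve_now = accuracy_standard_error(0.5, 20)   # about 0.112, i.e. ~11.2 points
```

Under that assumption, a 20-case round is roughly four times noisier than a 300-case round, so a quick “Improve now” result is best read as a rough signal.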
Promotion gate: The tripwire. A new prompt ships only if it clears every line. Edit the margins, not the principle.
10-digit exact ≥ +pt
4- and 6-digit ≥ −pt (no backsliding)
judge score ≥ incumbent
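The gate lines above can be read as an all-or-nothing check. The sketch below assumes scores are percentages compared in points, and that the two margins are user-supplied; the page leaves the margin values blank, so the code takes them as required inputs rather than guessing defaults. All type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    exact_10: float  # 10-digit exact-match accuracy, in percent
    digit_4: float   # 4-digit accuracy, in percent
    digit_6: float   # 6-digit accuracy, in percent
    judge: float     # judge score, scale unspecified


@dataclass(frozen=True)
class Margins:
    min_gain_10: float   # required gain on 10-digit exact, in points
    max_drop_4_6: float  # tolerated drop on 4- and 6-digit, in points (>= 0)


def clears_gate(candidate: Scores, incumbent: Scores, margins: Margins) -> bool:
    """A candidate prompt ships only if every gate line holds."""
    checks = [
        candidate.exact_10 - incumbent.exact_10 >= margins.min_gain_10,
        candidate.digit_4 - incumbent.digit_4 >= -margins.max_drop_4_6,
        candidate.digit_6 - incumbent.digit_6 >= -margins.max_drop_4_6,
        candidate.judge >= incumbent.judge,
    ]
    return all(checks)


# Made-up numbers, for illustration only; not values from the app.
incumbent = Scores(exact_10=60.0, digit_4=90.0, digit_6=80.0, judge=4.0)
candidate = Scores(exact_10=62.0, digit_4=89.5, digit_6=80.0, judge=4.0)
example_margins = Margins(min_gain_10=1.0, max_drop_4_6=1.0)
ships = clears_gate(candidate, incumbent, example_margins)
```

Editing the margins changes how strict each line is, but a single failing line still blocks promotion.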
Fold live traffic into evals: Classifications from strangers become candidate eval cases in the next round (text only, never identity).
Phoenix
Connection: Phoenix Cloud project receiving every trace, experiment, prompt, and dataset.
checking…
MCP toolset: The runtime introspection surface through which the agent reads its own history.
—
Open in Phoenix: Same data, no makeup. What you see here is what the agent sees over MCP.
Your data
Keep my classification history: Stores your pastes and results on this device’s session so Catalog exports survive a refresh. Off = answers evaporate when the tab closes.
Download everything: Your classifications, codes, cited rulings, and trace links, in one CSV.
Forget me: Deletes your session history here. Traces already graded stay in the public eval record, anonymised; that’s the deal that keeps the curve honest.
Display: codes are grouped as 8518.30.20.00; switch to raw 8518302000 here. Round times are shown in your timezone.
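The two display forms differ only in the dots. A small sketch of the conversion, assuming every code is exactly ten ASCII digits grouped 4.2.2.2 as in the example above; the function names are illustrative.

```python
def group_code(raw: str) -> str:
    """Format a 10-digit code as 4.2.2.2, e.g. 8518302000 -> 8518.30.20.00."""
    digits = raw.replace(".", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 10 digits, got {raw!r}")
    return f"{digits[:4]}.{digits[4:6]}.{digits[6:8]}.{digits[8:]}"


def raw_code(grouped: str) -> str:
    """Strip the grouping dots, e.g. 8518.30.20.00 -> 8518302000."""
    return grouped.replace(".", "")
```

Because `group_code` strips dots before validating, it accepts either form as input and always returns the grouped one.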