Agent Trust Boundaries¶
Agent operations have different reliability characteristics. This standard defines three trust tiers so that sessions and contributors know which outputs to verify.
Trust Tiers¶
| Tier | Operations | Verification Required |
|---|---|---|
| Trusted | File reads, searches, grep, lint output, build output | None — results are deterministic and verifiable by output |
| Verify-after | Code edits, construction site updates, file writes, refactoring | Run the verification gate. |
| Never-trust | Convergence self-assessment, "all done" claims, severity classification, coverage claims, "0 issues found" reports | Human or orchestrator must independently evaluate the evidence |
Trusted¶
Operations whose output is deterministic and machine-verifiable. The tool either succeeds or fails visibly — there is no gray zone where the agent could misinterpret the result.
Examples: Read (file contents), Grep (search results), Glob (file
matches), go build exit code, golangci-lint output.
Verify-after¶
Operations that mutate state. The agent may believe it made the correct change, but the only proof is running the build and test suite afterward.
Examples: code edits, struct field additions (construction site updates), file creation, multi-file refactoring.
Required verification: Run the verification gate after any Verify-after operation.
Never-trust¶
Subjective assessments where the agent's self-report has been empirically unreliable. These require independent verification by a human or by an orchestrator using different evidence than the agent's claim.
Examples: "all construction sites updated," "0 CRITICAL issues," "review converged," "tests cover all contracts," "coverage is complete."
Known Failure Modes¶
Each failure mode below was discovered in a real PR. The tier system exists because these failures occurred.
FM-1: Construction site misses (during #381 implementation)¶
Tier violated: Never-trust (the completeness claim was trusted without verification)
What happened: During SimConfig decomposition (#381 implementation), a sub-agent reported "all construction sites updated" for a struct field addition. Two construction sites were missed, causing silent field-zero bugs. The operation itself (code edits) is Verify-after, but the agent's completeness claim ("all sites updated") is Never-trust.
Lesson: Completeness claims about Verify-after operations are Never-trust.
Always grep 'StructName{' after the agent claims completion. See also R4.
FM-2: Severity inflation/deflation (during #390 review)¶
Tier violated: Never-trust (treated as Trusted)
What happened: During a convergence review of #390 (hypothesis batch PR), the reviewing agent reported "0 CRITICAL, 0 IMPORTANT" when the artifact actually had 3 CRITICAL and 18 IMPORTANT issues. The team lead accepted the self-report without independently reading the review output.
Lesson: Convergence self-assessment is a Never-trust operation. The orchestrator must independently tally severity counts from the raw review output, never from the agent's summary.
FM-3: Premature convergence claim (#430)¶
Tier violated: Never-trust (treated as Trusted)
What happened: During a convergence review, the agent reported convergence after a single round without re-running the review to verify that fixes actually resolved the issues. The team lead accepted the claim.
Lesson: "Review converged" is a Never-trust claim. Convergence requires evidence: a clean round with zero CRITICAL and zero IMPORTANT findings across all perspectives. The orchestrator must verify the round ran and produced clean results. See the convergence protocol.
Relationship to Other Standards¶
- Antipattern rules (rules.md): R4 (construction site audit) is the specific rule that FM-1 violates. The trust tiers provide the meta-framework for when to apply verification.
- PR workflow (pr-workflow.md): The verification gate in Step 4.5 is the procedural implementation of Verify-after tier requirements.
- Convergence protocol (convergence.md): The convergence protocol's round-based evidence requirement is the procedural implementation of Never-trust tier requirements for review claims.