source
Ardur MVP Evaluator Guide
Quickstart guide for evaluating Ardur — the runtime governance and evidence
Quickstart guide for evaluating Ardur — the runtime governance and evidence layer for AI agents.
30-Second Sanity Check
git clone https://github.com/ArdurAI/ardur.git && cd ardur
make demo
Wait for both services to report healthy (docker compose ps shows healthy),
then:
curl -k https://localhost:8443/health
# → {"status": "ok", "version": "vibap.v0.1", "sessions": 0}
What You’re Looking At
┌──────────┐ ┌──────────────────┐ ┌──────────┐
│ Agent │────▶│ Ardur Proxy │────▶│ Tools │
│ (Claude, │ │ (port 8443) │ │ (APIs, │
│ LangChn)│ │ │ │ cmds) │
└──────────┘ │ ┌─────────────┐ │ └──────────┘
│ │Policy Engine│ │
│ │(Cedar/Nativ)│ │
│ └─────────────┘ │
│ ┌─────────────┐ │
│ │Receipt Chain│ │
│ └─────────────┘ │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Personal Hub │
│ (port 8765) │
└──────────────────┘
The proxy sits between the agent and its tools, evaluates every tool call against declared policy, and emits hash-chained receipts proving what was allowed, denied, or unknown.
Walkthrough: Session Lifecycle
1. Start the proxy with a mission
In one terminal:
make demo
2. Issue a mission passport
TOKEN=$(curl -sk https://localhost:8443/issue \
-H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
-H "Content-Type: application/json" \
-d '{"agent_id":"demo-agent","mission":"evaluate the governance proxy","allowed_tools":["Read","Bash","WebSearch"],"max_tool_calls":10}')
echo $TOKEN | python3 -c "import sys,json;print(json.loads(sys.stdin.read())['token'])" > /tmp/passport.jwt
Or use the CLI directly:
ardur issue --agent-id demo-agent \
--mission "evaluate the governance proxy" \
--allowed-tools Read Bash WebSearch \
--max-tool-calls 10 \
> /tmp/passport.json
PASSPORT=$(python3 -c "import json;print(json.load(open('/tmp/passport.json'))['token'])")
3. Start a session
curl -sk https://localhost:8443/session/start \
-H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
-H "Content-Type: application/json" \
-d "{\"token\":\"$PASSPORT\"}"
# → {"session_id":"...","agent_id":"demo-agent","status":"active"}
Capture the session_id from the response.
4. Evaluate a tool call
curl -sk https://localhost:8443/evaluate \
-H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
-H "Content-Type: application/json" \
-d "{\"session_id\":\"SESSION_ID\",\"tool\":\"Read\",\"resource\":\"/tmp/test.txt\",\"action\":\"read\"}"
# → {"decision":"allow",...} or {"decision":"deny","reason":"..."}
5. Evaluate a forbidden tool call
curl -sk https://localhost:8443/evaluate \
-H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
-H "Content-Type: application/json" \
-d "{\"session_id\":\"SESSION_ID\",\"tool\":\"WebFetch\",\"resource\":\"https://evil.com\",\"action\":\"fetch\"}"
# → {"decision":"deny","reason":"tool not in allowed_tools"}
6. Attest the session
curl -sk https://localhost:8443/attest \
-H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
-H "Content-Type: application/json" \
-d "{\"session_id\":\"SESSION_ID\"}"
# → {"attestation":"eyJh...","receipt_count":2,...}
7. End the session
curl -sk https://localhost:8443/session/end \
-H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
-H "Content-Type: application/json" \
-d "{\"session_id\":\"SESSION_ID\"}"
# → {"status":"closed","receipt_count":2}
What’s Being Proven
Each receipt is cryptographically linked to its predecessor via a parent hash:
Receipt 1 (session_start) Receipt 2 (evaluate)
┌─────────────────────┐ ┌─────────────────────┐
│ receipt_id: r1 │◀─────────│ parent_hash: sha(r1) │
│ parent_hash: null │ │ receipt_id: r2 │
│ digest: sha(...) │ │ verdict: allow │
└─────────────────────┘ └─────────────────────┘
This means:
- You can verify the entire chain independently
- No receipt can be inserted, removed, or reordered without detection
- The verifier needs only the public key — no trust in the proxy
Kill Switch Demo
# Activate the kill switch
ardur kill-switch --api-token "TOKEN"
# → {"kill_switch":"activated"}
# Try to evaluate — denied
curl -sk https://localhost:8443/evaluate \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"session_id":"SESSION_ID","tool":"Read","resource":"/tmp/x","action":"read"}'
# → {"error":"kill_switch_active"}
# Deactivate
ardur kill-switch --deactivate --api-token "TOKEN"
# → {"kill_switch":"deactivated"}
Health endpoint and metrics remain available even when the kill switch is active, so monitoring is not disrupted.
Observability
# Prometheus metrics (requires auth)
curl -sk https://localhost:8443/metrics \
-H "Authorization: Bearer TOKEN"
# Structured access logs on stderr
docker compose logs proxy | head -5
# → {"timestamp":"2026-...","remote_addr":"...","method":"GET","path":"/health",...}
Known Gaps (honest disclosure)
- Capture boundary: Ardur governs at the tool-call level. Side effects below
the tool boundary (subprocess trees, kernel events, network connections from
tool-spawned processes) are not captured. Roadmap: v0.5 (Linux eBPF), v1.0
(macOS Endpoint Security Framework). See
docs/coverage-map.md. - No SPIRE in docker-compose: The local demo uses auto-generated TLS certs. SPIFFE/SPIRE workload identity is available in the Python runtime and Helm chart but requires a Kubernetes cluster.
- Go AAT package: The Go AAT engine is fully implemented with constraint
checks, subsumption, issuance/derivation, PoP binding, and full §7 chain
verification (49 tests). See
go/README.md. - Python Token Status List: Token Status List revocation checking is implemented in the Go credential verifier but not yet in Python.
- Single-user: No multi-tenancy isolation in the local demo. The Helm chart provides namespace-level isolation.