Ardur MVP Evaluator Guide

Quickstart guide for evaluating Ardur — the runtime governance and evidence layer for AI agents.

30-Second Sanity Check

git clone https://github.com/ArdurAI/ardur.git && cd ardur
make demo

Wait for both services to report healthy (docker compose ps shows healthy), then:

curl -k https://localhost:8443/health
# → {"status": "ok", "version": "vibap.v0.1", "sessions": 0}

What You’re Looking At

┌──────────┐     ┌──────────────────┐     ┌──────────┐
│  Agent   │────▶│  Ardur Proxy     │────▶│  Tools   │
│ (Claude, │     │  (port 8443)     │     │  (APIs,  │
│  LangChn)│     │                  │     │   cmds)  │
└──────────┘     │  ┌─────────────┐ │     └──────────┘
                 │  │Policy Engine│ │
                 │  │(Cedar/Nativ)│ │
                 │  └─────────────┘ │
                 │  ┌─────────────┐ │
                 │  │Receipt Chain│ │
                 │  └─────────────┘ │
                 └────────┬─────────┘
                          │
                 ┌────────▼─────────┐
                 │  Personal Hub    │
                 │  (port 8765)     │
                 └──────────────────┘

The proxy sits between the agent and its tools, evaluates every tool call against declared policy, and emits hash-chained receipts proving what was allowed, denied, or unknown.

Walkthrough: Session Lifecycle

1. Start the proxy with a mission

In one terminal:

make demo

2. Issue a mission passport

RESPONSE=$(curl -sk https://localhost:8443/issue \
  -H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"demo-agent","mission":"evaluate the governance proxy","allowed_tools":["Read","Bash","WebSearch"],"max_tool_calls":10}')
# Extract the passport token; step 3 reads it from $PASSPORT
PASSPORT=$(echo "$RESPONSE" | python3 -c "import sys,json;print(json.load(sys.stdin)['token'])")
echo "$PASSPORT" > /tmp/passport.jwt

Or use the CLI directly:

ardur issue --agent-id demo-agent \
  --mission "evaluate the governance proxy" \
  --allowed-tools Read Bash WebSearch \
  --max-tool-calls 10 \
  > /tmp/passport.json
PASSPORT=$(python3 -c "import json;print(json.load(open('/tmp/passport.json'))['token'])")
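
To see what a passport carries before starting a session, you can decode its payload locally. The sketch below assumes the passport is a compact JWT (the /tmp/passport.jwt file name suggests this, but the guide does not confirm it). It does not verify the signature. The sample token and its claim names are made up for illustration and may not match what Ardur issues.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a compact JWT WITHOUT verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload_b64 = parts[1]
    # base64url segments omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Hypothetical sample token built locally; claim names are illustrative only.
sample = ".".join([
    _b64url(json.dumps({"alg": "none"}).encode()),
    _b64url(json.dumps({"agent_id": "demo-agent", "max_tool_calls": 10}).encode()),
    _b64url(b"signature-placeholder"),
])

claims = decode_jwt_payload(sample)
print(claims)
```

To inspect a real passport, pass the contents of /tmp/passport.jwt to decode_jwt_payload in place of the sample.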

3. Start a session

curl -sk https://localhost:8443/session/start \
  -H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
  -H "Content-Type: application/json" \
  -d "{\"token\":\"$PASSPORT\"}"
# → {"session_id":"...","agent_id":"demo-agent","status":"active"}

Capture the session_id from the response and substitute it for the SESSION_ID placeholder in the commands below.
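
If you would rather script this than copy the ID by hand, the following is a minimal sketch that pulls session_id out of a response and builds the /evaluate request body with json.dumps, which avoids the shell quote escaping. The helper names are hypothetical, and the sample response is a made-up instance of the shape shown above.

```python
import json


def extract_session_id(response_text: str) -> str:
    """Parse a /session/start response and return its session_id."""
    return json.loads(response_text)["session_id"]


def build_evaluate_body(session_id: str, tool: str, resource: str, action: str) -> str:
    """Serialize an /evaluate request body using the field names from this guide."""
    return json.dumps({
        "session_id": session_id,
        "tool": tool,
        "resource": resource,
        "action": action,
    })


# Made-up response; a real session_id will look different.
start_response = '{"session_id":"abc123","agent_id":"demo-agent","status":"active"}'

session_id = extract_session_id(start_response)
body = build_evaluate_body(session_id, "Read", "/tmp/test.txt", "read")
print(body)
```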

4. Evaluate a tool call

curl -sk https://localhost:8443/evaluate \
  -H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
  -H "Content-Type: application/json" \
  -d "{\"session_id\":\"SESSION_ID\",\"tool\":\"Read\",\"resource\":\"/tmp/test.txt\",\"action\":\"read\"}"
# → {"decision":"allow",...} or {"decision":"deny","reason":"..."}

5. Evaluate a forbidden tool call

curl -sk https://localhost:8443/evaluate \
  -H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
  -H "Content-Type: application/json" \
  -d "{\"session_id\":\"SESSION_ID\",\"tool\":\"WebFetch\",\"resource\":\"https://evil.com\",\"action\":\"fetch\"}"
# → {"decision":"deny","reason":"tool not in allowed_tools"}
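
The allow and deny outcomes in steps 4 and 5 can be pictured as an allowlist check plus a call budget. This is only an illustrative sketch, not Ardur's policy engine. The Mission class, the budget-exceeded reason string, and the choice to count only allowed calls against max_tool_calls are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Mission:
    """Hypothetical in-memory view of the fields sent to /issue."""
    allowed_tools: frozenset
    max_tool_calls: int
    calls_made: int = 0


def evaluate(mission: Mission, tool: str) -> dict:
    """Deny tools outside the allowlist, then enforce the call budget."""
    if tool not in mission.allowed_tools:
        return {"decision": "deny", "reason": "tool not in allowed_tools"}
    if mission.calls_made >= mission.max_tool_calls:
        # Assumed reason text; the real proxy may word this differently.
        return {"decision": "deny", "reason": "max_tool_calls exceeded"}
    mission.calls_made += 1  # Assumption: only allowed calls consume budget.
    return {"decision": "allow"}


mission = Mission(frozenset({"Read", "Bash", "WebSearch"}), max_tool_calls=10)
allowed = evaluate(mission, "Read")
denied = evaluate(mission, "WebFetch")
print(allowed)
print(denied)
```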

6. Attest the session

curl -sk https://localhost:8443/attest \
  -H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
  -H "Content-Type: application/json" \
  -d "{\"session_id\":\"SESSION_ID\"}"
# → {"attestation":"eyJh...","receipt_count":2,...}

7. End the session

curl -sk https://localhost:8443/session/end \
  -H "Authorization: Bearer $(docker compose exec proxy printenv ARDUR_API_TOKEN)" \
  -H "Content-Type: application/json" \
  -d "{\"session_id\":\"SESSION_ID\"}"
# → {"status":"closed","receipt_count":2}

What’s Being Proven

Each receipt is cryptographically linked to its predecessor via a parent hash:

Receipt 1 (session_start)         Receipt 2 (evaluate)
┌─────────────────────┐           ┌──────────────────────┐
│ receipt_id: r1      │◀──────────│ parent_hash: sha(r1) │
│ parent_hash: null   │           │ receipt_id: r2       │
│ digest: sha(...)    │           │ verdict: allow       │
└─────────────────────┘           └──────────────────────┘

This means:

You can verify the entire chain independently
No receipt can be inserted, removed, or reordered without detection
The verifier needs only the public key; it does not have to trust the proxy
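
The parent-hash linkage can be sketched in a few lines. The example below assumes each receipt's parent_hash is the SHA-256 of the previous receipt's canonical JSON. Ardur's actual canonicalization, hash function, and field set are not specified in this guide, and the signature check against the public key is omitted.

```python
import hashlib
import json


def receipt_hash(receipt: dict) -> str:
    """Hash a receipt over a canonical JSON encoding (assumed scheme)."""
    canonical = json.dumps(receipt, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(receipts: list) -> bool:
    """Check that every receipt points at the hash of its predecessor."""
    expected_parent = None  # The first receipt has no parent.
    for receipt in receipts:
        if receipt.get("parent_hash") != expected_parent:
            return False
        expected_parent = receipt_hash(receipt)
    return True


# Illustrative receipts; the field names beyond those in the diagram are made up.
r1 = {"receipt_id": "r1", "parent_hash": None, "event": "session_start"}
r2 = {"receipt_id": "r2", "parent_hash": receipt_hash(r1), "event": "evaluate", "verdict": "allow"}
r3 = {"receipt_id": "r3", "parent_hash": receipt_hash(r2), "event": "evaluate", "verdict": "deny"}

print(verify_chain([r1, r2, r3]))  # → True (intact chain)
print(verify_chain([r1, r3]))      # → False (r2 removed)
print(verify_chain([r1, r3, r2]))  # → False (reordered)
```

Parent links alone do not reveal a chain whose trailing receipts were dropped. Detecting that requires an externally anchored head hash or receipt count, which this sketch does not model.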

Kill Switch Demo

# Activate the kill switch (TOKEN is a placeholder for the ARDUR_API_TOKEN value, not the passport)
ardur kill-switch --api-token "TOKEN"
# → {"kill_switch":"activated"}

# Try to evaluate — denied
curl -sk https://localhost:8443/evaluate \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"session_id":"SESSION_ID","tool":"Read","resource":"/tmp/x","action":"read"}'
# → {"error":"kill_switch_active"}

# Deactivate
ardur kill-switch --deactivate --api-token "TOKEN"
# → {"kill_switch":"deactivated"}

The health and metrics endpoints remain available while the kill switch is active, so monitoring is not disrupted.
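
One way to picture this behavior is a gate that runs before request handling. The sketch below is hypothetical: it assumes every path other than /health and /metrics is refused while the switch is active, which this guide confirms only for /evaluate.

```python
from typing import Optional

# Paths the guide says stay reachable while the kill switch is active.
ALWAYS_AVAILABLE = frozenset({"/health", "/metrics"})


def gate(path: str, kill_switch_active: bool) -> Optional[dict]:
    """Return an error body if the request must be refused, else None."""
    if not kill_switch_active or path in ALWAYS_AVAILABLE:
        return None
    return {"error": "kill_switch_active"}


print(gate("/evaluate", kill_switch_active=True))
print(gate("/health", kill_switch_active=True))
print(gate("/evaluate", kill_switch_active=False))
```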

Observability

# Prometheus metrics (requires auth)
curl -sk https://localhost:8443/metrics \
  -H "Authorization: Bearer TOKEN"

# Structured access logs on stderr
docker compose logs proxy | head -5
# → {"timestamp":"2026-...","remote_addr":"...","method":"GET","path":"/health",...}
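
Because the access logs are one JSON object per line, they are easy to summarize. The following sketch counts requests per path. It assumes the default docker compose logs prefix (service name followed by "|") and uses only the path field shown above. The sample lines are invented for illustration.

```python
import json
from collections import Counter


def count_requests_by_path(lines) -> Counter:
    """Count access-log records per request path, skipping non-JSON lines."""
    counts = Counter()
    for line in lines:
        # Drop any "service | " prefix by starting at the first brace.
        start = line.find("{")
        if start == -1:
            continue
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "path" in record:
            counts[record["path"]] += 1
    return counts


# Invented sample lines in the shape of the log output above.
sample_lines = [
    'proxy-1  | {"timestamp":"2026-01-01T00:00:00Z","remote_addr":"172.18.0.1","method":"GET","path":"/health"}',
    'proxy-1  | {"timestamp":"2026-01-01T00:00:05Z","remote_addr":"172.18.0.1","method":"POST","path":"/evaluate"}',
    'proxy-1  | {"timestamp":"2026-01-01T00:00:10Z","remote_addr":"172.18.0.1","method":"GET","path":"/health"}',
    'proxy-1  | starting up',
]

counts = count_requests_by_path(sample_lines)
print(counts.most_common())
```

To run it on live output, pipe docker compose logs proxy into a script that passes sys.stdin to count_requests_by_path.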

Known Gaps (honest disclosure)

Capture boundary: Ardur governs at the tool-call level. Side effects below the tool boundary (subprocess trees, kernel events, network connections from tool-spawned processes) are not captured. Roadmap: v0.5 (Linux eBPF), v1.0 (macOS Endpoint Security Framework). See docs/coverage-map.md.
No SPIRE in docker-compose: The local demo uses auto-generated TLS certs. SPIFFE/SPIRE workload identity is available in the Python runtime and Helm chart but requires a Kubernetes cluster.
Go AAT package: The Go AAT engine is fully implemented with constraint checks, subsumption, issuance/derivation, PoP binding, and full §7 chain verification (49 tests). See go/README.md.
Python Token Status List: Token Status List revocation checking is implemented in the Go credential verifier but not yet in Python.
Single-user: No multi-tenancy isolation in the local demo. The Helm chart provides namespace-level isolation.