Testing
The public tree includes curated Python and Go runtime code under python/
and go/. GitHub Actions now covers runtime tests, repository hygiene,
structured-file parsing, link checks, secret scanning, and CodeQL.
Do not claim broader coverage than the workflows provide. If a feature needs a manual smoke test, list the exact command and the observed result in the PR.
What Runs Today
Five GitHub Actions workflows. Most run on push to dev/main and on every
pull request; link-check runs on PRs and a weekly cron only.
secret-scan — gitleaks + forbidden-term gate
/.github/workflows/secret-scan.yml
- `gitleaks` scans the full git history (`fetch-depth: 0`) for secrets: API keys, tokens, private key material. Pinned to commit SHA `ff98106e...`.
- `forbidden-terms` is a custom `grep -RInE` job. The configured pattern is defined inline in `/.github/workflows/secret-scan.yml`; read the workflow file for the authoritative regex (this page deliberately doesn't reproduce the pattern, because doing so would self-trip the gate). The pattern targets a small set of historical-internal references the repo cannot leak. Excludes `.github/`, `.git/`, `artifacts/`. Includes Markdown, YAML, JSON, asciinema casts, TOML, Python, Go, shell, `.gitignore`, `.env*`, `Dockerfile*`, `Makefile*`.
link-check — lychee on Markdown links
/.github/workflows/link-check.yml
- Runs on PRs touching `**/*.md` and weekly via cron. Uses `lycheeverse/lychee-action@v2.8.0` (commit-pinned).
- Currently excludes one URL pattern that 404s for an unauthenticated checker: `security/advisories/new` (the page requires being signed in to GitHub). The earlier Discussions-tab exclude was removed once Discussions was enabled on the repo.
validate-formats — JSON and YAML parsers
/.github/workflows/validate-formats.yml
- JSON job: every `.json` file (excluding `.git`, `.claude`, `artifacts/`) parses with `python3 -c "import json; json.load(...)"`.
- YAML job: every `.yml`/`.yaml` file parses with PyYAML's `safe_load_all` (handles multi-document YAML).

This workflow exists because a misplaced comma in a JSON schema or a stray indent in an issue-template YAML would otherwise stay silently broken. A Markdown-table heuristic was prototyped and removed because the false-positive rate was too high; use a real Markdown parser before adding that gate back.
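For a local equivalent of the JSON job, a minimal sketch follows. It assumes the same three directory exclusions listed above; the function name `find_broken_json` is hypothetical and is not part of the workflow. The YAML half would follow the same shape with PyYAML's `safe_load_all`, omitted here to keep the sketch stdlib-only.

```python
import json
from pathlib import Path

# Directory names skipped by the JSON job, per the list above.
EXCLUDED_DIRS = {".git", ".claude", "artifacts"}


def find_broken_json(root: Path) -> list[tuple[Path, str]]:
    """Return (path, error message) for every .json file under root that fails to parse."""
    failures = []
    for path in sorted(root.rglob("*.json")):
        # Skip any file that sits below an excluded directory.
        if EXCLUDED_DIRS.intersection(path.relative_to(root).parts):
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            failures.append((path, str(exc)))
    return failures
```

Calling `find_broken_json(Path("."))` from the repository root and treating a non-empty result as a failure approximates the gate before pushing.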
codeql — CodeQL static analysis
- A pre-flight job (`detect-languages`) checks whether `python/` or `go/` carries source files. With the current dev tree, the matrix detects Python and Go and runs analysis per language.
- Pinned to `github/codeql-action@ce64ddcb` (commit-pinned; `v3` is an annotated tag whose tag-object is `865f5f5c...` and whose underlying commit is `ce64ddcb...`). Same pin discipline as the rest of the workflow set.
- Pairs with the `code_quality` ruleset rule on `main`: that rule reads from GitHub's code-scanning alerts table, so it passes vacuously while the matrix is empty and substantively once code lands. The CI job name (`codeql`) is intentionally not in the required-status-checks list; the ruleset already gates merges via the alerts mechanism.
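The pre-flight check can be pictured with the sketch below. It is an illustration only: the real `detect-languages` job lives in the workflow file and may use different criteria. The assumption here is that a language counts as present when its directory holds at least one file with the matching extension; `LANGUAGE_SOURCES` and `detect_languages` are hypothetical names.

```python
from pathlib import Path

# Assumption: one directory and one file extension per language.
LANGUAGE_SOURCES = {
    "python": ("python", "*.py"),
    "go": ("go", "*.go"),
}


def detect_languages(repo_root: Path) -> list[str]:
    """Return the languages whose source directory contains at least one source file."""
    detected = []
    for language, (directory, pattern) in LANGUAGE_SOURCES.items():
        source_dir = repo_root / directory
        # rglob is lazy, so any() stops at the first matching file.
        if source_dir.is_dir() and any(source_dir.rglob(pattern)):
            detected.append(language)
    return detected
```

An empty result corresponds to the empty-matrix case described above, where the ruleset rule passes vacuously.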
tests — Python and Go runtime tests
- Python job: installs `python/` with dev extras and runs `python -m pytest tests/ -q --tb=short` from the `python/` directory on Python 3.10 and Python 3.13.
- Go job: runs `go test -count=1 ./...` and `go vet ./...` from `go/`.
What’s Not Enforced By CI Today
Honest list, so the gap is visible:
- No content-fact verification (article claims, ADR cross-references): caught only by review rounds and the cool-off re-read in the `dev → main` PR template.
- No Markdown lint: `markdownlint` adds noise we don't want yet, and the earlier table-pipe heuristic was removed.
- No YAML link-check (the issue-template `config.yml` URLs are not under `**/*.md`).
- No spell-checking.
- No external link-check on YAML or `.cast` files.
Local Development Setup
# First-run setup — Python 3.13 required
cd /path/to/ardur/python
python3.13 -m venv .venv
.venv/bin/pip install -e '.[dev]'
# Run the curated test suite
.venv/bin/pytest tests/ -q
# Run a specific module
.venv/bin/pytest tests/test_passport.py -v
# End-to-end reproduce (Z3 proofs, signed proof bundle, corpus consistency)
make reproduce
Module-Specific Gotchas
- `test_mission_binding.py`: one xfail (`test_tampered_md_returns_chain_invalid`) due to a module-level `urllib.request.urlopen` state leak; it runs green in isolation. CI invokes it as a separate `pytest` call.
- `test_biscuit_passport.py`: requires `biscuit-python==0.4.0`. ABI breaks on 0.5+ and on Python 3.14.
- Live LLM tests: tests under the semantic-judge / behavioral-fingerprint lanes need API access. Default test runs use test doubles; live runs require explicit env vars (`ARDUR_SEMANTIC_JUDGE=anthropic` + `ANTHROPIC_API_KEY`).
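The opt-in gate for live LLM tests could look roughly like the sketch below. The two environment variable names come from the list above; the helper `live_judge_enabled` and the test class are hypothetical, and the repository's own suite may gate live runs differently. The standard-library `unittest` skip decorator is used here; pytest collects such classes as well.

```python
import os
import unittest


def live_judge_enabled(env=None) -> bool:
    """True only when both opt-in variables named above are set."""
    env = os.environ if env is None else env
    judge_selected = env.get("ARDUR_SEMANTIC_JUDGE") == "anthropic"
    return judge_selected and bool(env.get("ANTHROPIC_API_KEY"))


@unittest.skipUnless(live_judge_enabled(), "live LLM run not requested")
class LiveSemanticJudgeTest(unittest.TestCase):  # hypothetical test class
    def test_gate_is_open(self):
        # A real live test would call the provider here; this placeholder
        # only confirms that the gate let the test through.
        self.assertTrue(live_judge_enabled())
```

With neither variable set, the class is reported as skipped rather than failed, so default runs stay on test doubles.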
Go AAT Test Suite
The go/pkg/aat package has 49 tests covering the full AAT specification:
cd go && go test ./pkg/aat/... -v
Covers: all 13 constraint Check/Subsumes functions, IssueRoot validation, DeriveChild depth/TTL/capability enforcement, BuildPoPJWT/VerifyPoPJWT round-trips, full §7 chain verification scenarios, and Registry operations.
Cloud Model Governance Tests
Real-world integration tests proving governance proxy enforcement with live
LLMs. Results are in python/tests/test-results/.
ARDUR_OLLAMA_API_KEY="<key>" python tests/run_cloud_model_test.py <model_name>
These are not CI-gated tests (they require live API access) but serve as integration proof that the proxy evaluates every tool call correctly with production models.
Ardur Personal And Claude Code RC
When touching the Hub, browser adapter, Claude Code hook, or ARDUR.md
profile setup, run:
PYTHONPATH=python python -m pytest -q \
python/tests/test_claude_code_hook.py \
python/tests/test_claude_code_telemetry.py \
python/tests/test_ardur_personal_hub.py \
python/tests/test_ardur_profile.py
PYTHONPATH=python python plugins/claude-code/scripts/smoke.py
claude plugin validate plugins/claude-code
node --check examples/ardur-personal-extension/src/service_worker.js
node --check examples/ardur-personal-extension/src/content_script.js
node --check examples/ardur-personal-extension/src/popup.js
node examples/ardur-personal-extension/scripts/auth-header-smoke.mjs
The Hub test confirms browser observations produce standard Ardur Execution
Receipts through GovernanceProxy, CLI policy can block a controllable command,
the export path includes Session Reviews, and authenticated Hub endpoints reject
untrusted browser-origin requests.
Coverage Targets
| Surface | Minimum coverage | Source of bar |
|---|---|---|
| `python/vibap/` | 80% | runtime package |
| `python/cli/` (when imported) | 60% | command surfaces |
| `python/integrations/<framework>/` (when imported) | 70% | public adapters |
Coverage runs against the renamed Ardur runtime only; legacy-era results are archived under artifacts/legacy-era-*/ for lineage but never count for gates.
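As an illustration of how the table could be checked mechanically, the sketch below compares measured percentages against the floors. The floors are copied from the table; the function name, the input shape, and the choice to skip surfaces with no measurement (mirroring the "when imported" qualifier) are assumptions, not a description of an existing gate.

```python
# Minimum coverage per surface, copied from the table above.
COVERAGE_FLOORS = {
    "python/vibap/": 80.0,
    "python/cli/": 60.0,
    "python/integrations/": 70.0,
}


def coverage_shortfalls(measured: dict[str, float]) -> dict[str, float]:
    """Return {surface: missing percentage points} for surfaces below their floor.

    Assumption: a surface absent from `measured` was not imported in this
    run and is skipped rather than treated as 0%.
    """
    shortfalls = {}
    for surface, floor in COVERAGE_FLOORS.items():
        if surface in measured and measured[surface] < floor:
            shortfalls[surface] = round(floor - measured[surface], 2)
    return shortfalls
```

An empty result means every measured surface meets its bar.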
Test-authoring rules (carry-over from private research, applies to all phases)
- No rigged adapters. Labels come from a separate file derived from public dataset labels. Adapters never see the ground truth. Violations are the single fastest way to get a benchmark retracted; the discipline that produced this rule is documented in the test-harness contract at the top of `python/tests/conftest.py`.
- Regression tests for every bug fix. If you fix bug X, write a test that fails on the pre-fix code and passes on the fixed code. The test goes in the same PR as the fix.
- Name tests after what they prove, not what they exercise. `test_passport_with_invalid_sig_is_rejected` beats `test_verify_passport_case_3`.
- Avoid live-LLM tests by default. Unit suites run with local test doubles; live-LLM paths are explicit opt-in via env var. CI doesn't burn API budget on every push.
Before claiming “tests pass”
For docs/config-only changes: run a pre-commit local sweep: JSON parse over all `.json` files, YAML parse over all `.yml` / `.yaml` files, plus the forbidden-term grep. The exact grep invocation lives in `/.github/workflows/secret-scan.yml`; copy the include list, exclude list, and pattern string from there to run it locally:
# substitute <PATTERN> with the literal regex from secret-scan.yml's
# `PATTERN='...'` line; if you embed the pattern in this file the
# forbidden-term gate self-trips.
grep -RInE \
--include='*.md' --include='*.yml' --include='*.yaml' \
--include='*.json' --include='*.cast' --include='*.toml' \
--include='*.py' --include='*.go' --include='*.sh' \
--include='.gitignore' --include='.env*' \
--include='Dockerfile*' --include='Makefile*' \
--exclude-dir='.git' --exclude-dir='artifacts' \
--exclude-dir='.github' \
'<PATTERN>' .
For runtime changes:
- Exit code 0 on the full pytest suite
- Exit code 0 on the relevant Go build/test command when touching `go/`
- Known-failing / known-collecting-error count has not grown
- No `xfail` flipped to pass-or-fail without an explicit reason
- The pytest summary line (`N passed, M skipped, K xfailed`) pasted into the commit body so a reviewer can see the delta vs the known baseline without re-running
- When touching the Claude Code hook plugin, run `PYTHONPATH=python python3 -m pytest python/tests/test_claude_code_hook.py python/tests/test_claude_code_telemetry.py python/tests/test_ardur_profile.py -q`. Also run the end-to-end hook smoke: `PYTHONPATH=python python3 plugins/claude-code/scripts/smoke.py` (expects `PASS:` output and exit 0). Validate the current Claude Code plugin package: `claude plugin validate plugins/claude-code`. Live-binary smoke against an actual Claude Code session is optional and not gated in CI because it requires a Claude Code install.
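To make the baseline comparison less error-prone, the summary line can be parsed and diffed. The sketch below is one possible helper, not an existing repository tool: `parse_summary` and `baseline_drift` are hypothetical names, and the rules it flags (failed or error counts growing, xfailed count changing) are a reading of the checklist above.

```python
import re

# Matches "<count> <label>" pairs such as "412 passed" or "2 errors".
_COUNT = re.compile(
    r"(\d+) (passed|failed|skipped|xfailed|xpassed|deselected|error|warning)s?\b"
)


def parse_summary(line: str) -> dict[str, int]:
    """Turn a pytest summary line into {label: count}."""
    return {label: int(count) for count, label in _COUNT.findall(line)}


def baseline_drift(baseline: dict[str, int], current: dict[str, int]) -> list[str]:
    """List the changes a reviewer should ask about."""
    notes = []
    for label in ("failed", "error"):
        before, after = baseline.get(label, 0), current.get(label, 0)
        if after > before:
            notes.append(f"{label} count grew: {before} -> {after}")
    before, after = baseline.get("xfailed", 0), current.get("xfailed", 0)
    if after != before:
        notes.append(f"xfailed count changed: {before} -> {after}")
    return notes
```

Feeding it the baseline summary and the summary from the current run gives a short list to paste under the summary line in the commit body; an empty list means no drift under these rules.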
Why this page exists
Public security-software repos that fail their own CI on the first PR every time train contributors not to trust the gates. This page keeps the automated and manual checks explicit so release claims stay tied to evidence.