Receipts
Two receipts on this page.
The Foundation v1 refactor moved the platform from a single-project consumer-Gmail-owned setup to a Workspace-rooted, multi-project, CMEK-encrypted, pause-by-default platform with per-domain Cloud Run services replacing the functions/main.py monolith. It substantially landed; what is and isn't live as of 2026-05-10 is below.
The calibration pipeline ran from infrastructure deploy through end-to-end trainer convergence on real PSL-derived data: 18 workflow smokes plus 5 manual trainer probes, with 27 bugs across 8 categories. The final probe converged the NumPyro hierarchical NUTS GRM at R-hat = 1.002, n_eff_min = 1,318, and zero divergences in 9.0 seconds wallclock.
The full per-component receipts (the gated reports tier) carry the IAM bindings, the BigQuery DDL excerpts, the per-cluster controller logs, and the per-smoke ledger. This page is the public-facing summary.
Foundation v1 — what landed
The execution spec the refactor is built against is nexus_foundation_v1.md in the repo root (the F-3.x architectural commitments + S-4.x parallel security track). The receipts:
| Component | Status | Where to verify |
|---|---|---|
| Workspace org neumatics.eu | ✅ live | Pre-existing; folder structure under infra/bootstrap/ |
| Folder platform/ + shared-services/ | ✅ live | infra/bootstrap/, infra/projects/main.tf |
| Project neumatics-prod (workload) | ✅ live | infra/projects/main.tf:34 |
| Project neumatics-audit-logs (sealed sink) | ✅ live | infra/projects/main.tf:48 + infra/audit/ |
| Project neumatics-network-host (Shared VPC) | ✅ live | infra/projects/main.tf:53 |
| Project neumatics-staging | ⏳ deferred | infra/projects/main.tf:42 commented; billing-account project quota cap |
| Org-policy bundle (resourceLocations, disableServiceAccountKeyCreation, etc.) | ✅ live | infra/org-policies/ |
| Essential contacts → neumatics.eu domain | ✅ live | infra/projects/main.tf:266 |
| KMS keyring nexus-foundation (eu-west4) + 7 CMEK keys + 3 HSM keys | ✅ live | infra/kms/ |
| Per-resource CMEK applied (AlloyDB, BigQuery, GCS, Pub/Sub, etc.) | ✅ live | infra/kms/ IAM bindings; resource-level kms_key_name references |
| Aggregated org log sink → neumatics-audit-logs BigQuery | ✅ live | infra/audit/ |
| GCS object-lock archive (10y retention) | ✅ live | infra/audit/ |
| Shared VPC + subnets (runtime / data / mgmt) | ✅ live | infra/network/ |
| VPC-SC perimeter (prod) | ✅ live | infra/network/main.tf google_access_context_manager_service_perimeter.nexus_prod |
| VPC-SC perimeter (staging) | ⏳ deferred | Commented as dryrun; depends on staging project |
| Private Service Connect (AlloyDB, googleapis) | ✅ live | infra/network/, infra/alloydb/ |
| AlloyDB regional cluster nexus-prod (CMEK, pgvector, columnar) | ✅ live | infra/alloydb/cluster.tf |
| AlloyDB cost-control plane (controller + auto-pause + warming UX + operator CLI) | ✅ live | services/nexus-alloydb-controller/, services/nexus-alloydb-auto-pause/, src/lib/alloydb-warming.ts, scripts/nexus-alloydb*.ps1 |
| AlloyDB staging cluster | ⏳ deferred | Depends on staging project |
| BigQuery datasets (calibration_corpus, warehouse, substrate, alloydb_cdc, synth_substrate, etc.) | ✅ live | infra/bigquery/ — 7 datasets, all CMEK |
| Datastream AlloyDB → BigQuery CDC stream | ✅ live | infra/datastream/ (private connection + connection profiles + stream) |
| Firebase Stream-Firestore-to-BigQuery extension | ⏳ pending | docs/operations/firestore_extension.md runbook |
| Firestore region cutover (legacy → eu-west4 under CMEK) | ⏳ deferred | soulmap-v4 database name preserved; region cutover gated on dual-write shim |
| Per-domain Cloud Run services (24 deployed) | ✅ live | services/nexus-*/ — see catalogue at architecture |
| Per-route cutover (BACKEND_ROUTING in src/lib/api-helpers.ts) | ✅ live, partial | 14 routes flipped to new; 7 routes still on legacy |
| Vertex AI Reasoning Engine redeploy in eu-west4 | ⏳ pending | Resource IDs in apphosting.yaml still on eu-west1 |
| Knowledge Catalog tagging baseline | ⏳ pending | Catalog service deployed; tagging not yet applied |
| Workforce groups + WIF pool + custom roles | ✅ live | infra/iam/ |
| PAM mediator (per-grant delays per OD-19) | ✅ live | services/nexus-pam-mediator/, infra/pam-mediator/ |
| Audit alerter (Pub/Sub → email on curated events) | ✅ live | services/nexus-audit-alerter/, infra/audit-alerter/ |
| Workflows: iteration_runner, bq_inspect, iam_probe | ✅ live | workflows/ |
| Workflows: erasure-cascade.yaml, calibration-promote.yaml, cohort-freeze.yaml | ⏳ reserved | Per F-3.9 / S-4.8; spec'd, not yet implemented |
| Operator playbook + access-pattern docs | ✅ live | docs/security/operator_playbook.md, docs/security/access_patterns.md, docs/security/claude_code_access.md, docs/security/incident_response.md |
| LinkML schema extension (vocabulary → marketplace + analytics + warehouse) | ✅ partial | vocabulary/schema.linkml.yaml extended; schemas/foundation_v1.linkml.yaml added |
| Assured Workloads enrolment on neumatics-prod | ⏳ pending | Per OD-18 (resolved-optimistic 2026-05-08); operator-side confirmation queued |
The headline: the foundation Stages 0–2 (org / projects / network / KMS / audit) are substantially live. Stage 3a (AlloyDB + cost-control plane) is live for prod and waits on the staging project for full parity. Stage 4 (compute) is live with 24 services deployed and per-route cutover in flight. Stages 5–7 (CDC / frontend / decommission) are partially live; the Firestore region cutover and Vertex Reasoning Engine redeploy are the remaining visible-to-the-user pieces.
The pause-by-default cost-control plane was structural, not optional: Stage 3a does not exit until both an idle-pause cycle and a fail-fast-503-retry path are integration-tested. Both have been demonstrated against the live prod cluster.
Calibration pipeline — what converged
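| Metric | Probe 5 value | Note |
|---|---|---|
| R-hat | 1.002 | Convergence threshold R-hat < 1.01; we are well inside |
| n_eff_min | 1,318 | Healthy effective sample size across all parameters |
| Divergences | 0 | No funnel-trap warnings from NumPyro NUTS |
| Wallclock | 9.0 s | On n2-highmem-16, 10 personas × 6 openness constructs |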
Probe 5 was the final shakedown probe. Sampler: NumPyro NUTS, four chains × 1,000 warmup × 1,000 samples, target_accept = 0.85, non-centered parameterisation on per-item log-discrimination. Inputs: ten personas drawn from the Phase-1 smoke library, six openness-family constructs scored from real PSL evaluator output (not synthetic responses). Outputs: six calibration_metrics rows plus three invariance rows MERGE'd into BigQuery; HDI95 credible intervals on every parameter; JSON-roundtrippable grm.json artifact written to GCS.
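The convergence criteria can be expressed as a mechanical gate. The following is a minimal sketch, not the trainer's actual check: `FitSummary` and `convergence_failures` are hypothetical names, only the R-hat < 1.01 threshold comes from this page, and the effective-sample-size floor of 400 and the zero-divergence requirement are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

R_HAT_MAX = 1.01      # threshold stated on this page
N_EFF_FLOOR = 400.0   # assumption: the page does not state an n_eff floor
MAX_DIVERGENCES = 0   # assumption: treat any divergent transition as a failure


@dataclass(frozen=True)
class FitSummary:
    """Worst-case diagnostics across all parameters of one fit (hypothetical record)."""
    r_hat_max: float
    n_eff_min: float
    divergences: int


def convergence_failures(fit: FitSummary) -> List[str]:
    """Return a list of human-readable failures; an empty list means the gate passes."""
    failures = []
    # "not x < limit" also rejects NaN, which a plain ">=" comparison would let through.
    if not fit.r_hat_max < R_HAT_MAX:
        failures.append(f"r_hat_max={fit.r_hat_max} is not below {R_HAT_MAX}")
    if not fit.n_eff_min >= N_EFF_FLOOR:
        failures.append(f"n_eff_min={fit.n_eff_min} is below {N_EFF_FLOOR}")
    if fit.divergences > MAX_DIVERGENCES:
        failures.append(f"{fit.divergences} divergent transitions")
    return failures


# Probe 5 diagnostics as reported on this page.
probe_5 = FitSummary(r_hat_max=1.002, n_eff_min=1318, divergences=0)
```

With the Probe 5 numbers, `convergence_failures(probe_5)` returns an empty list under these assumed limits.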
Probe 5 is the single most informative receipt in the shakedown. It demonstrates that the fully implemented pipeline (workflow → shard worker → Vertex Batch → BQ MERGE → Custom Training → NumPyro NUTS → BQ MERGE) produces a converged psychometric calibration on data the model has never seen, in less time and at lower cost than any single shakedown smoke iteration.
The graded-response model itself is Samejima 1969; convergence at this scale is the small-N receipt that the pipeline runs end-to-end. Production-scale Phase-2 fits will be larger but no harder.
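For readers unfamiliar with the graded-response model, its likelihood is compact enough to sketch. The function below computes category probabilities for one item under the standard Samejima formulation, where the probability of responding in category k or above is a logistic function of discrimination × (θ − threshold). The function name and the four-category example are illustrative assumptions; this is not the trainer's NumPyro code, which the page describes as sampling log-discrimination with a non-centered parameterisation.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, discrimination: float,
                       thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the graded-response model.

    theta          latent trait value for one respondent
    discrimination item slope (must be positive)
    thresholds     K-1 ordered category boundaries for a K-category item
    """
    if discrimination <= 0:
        raise ValueError("discrimination must be positive")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")

    # Cumulative P(X >= k) for k = 1..K-1, bracketed by
    # P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)

    # P(X = k) is the difference of adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Illustrative item: 4 response categories, symmetric thresholds.
example_probs = grm_category_probs(theta=0.0, discrimination=1.5,
                                   thresholds=[-1.0, 0.0, 1.0])
```

Because the thresholds are ordered and the slope is positive, the cumulative curve is monotone and every category probability is non-negative; the probabilities sum to one by construction.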
Bug taxonomy
Eight categories absorbed all twenty-seven bugs. Every fix landed either in the production code path or in the eight-check local QA harness that gates Phase-2 readiness.
| Category | Count | Cost (est.) | Where the fix lives |
|---|---|---|---|
| Local toolchain | 1 | $0 | Operator-side; switched to gcloud-bundled Python |
| GCP IAM bindings | 4 | $0 | scripts/deploy/deploy_all.sh |
| Container build / requirements | 5 | ~$0 | Dockerfile contexts + test_container_imports.py |
| Vertex AI quirks | 4 | ~$50 | Worker config + workflow YAML + test_workflow_safety.py |
| Cloud Workflows YAML | 4 | ~$3 | Inline workflow YAML + test_workflow_safety.py |
| BigQuery schemas | 5 | ~$70 | Worker flatten_*_for_bq + test_bq_row_shape.py |
| Trainer logic | 3 | ~$10 | Trainer + test_attribute_safety.py |
| LLM output quality | 1 | ~$25 | maxOutputTokens=8192 cap |
The largest cost was BigQuery schemas at ~$70. The iteration touched five sub-classes — STRING-vs-FLOAT64 autodetect drift, JSON dict-vs-string mismatch, JSON_EXTRACT ambiguity on JSON-typed columns, LIKE on JSON column, and namespace collision risk. All five are now caught statically in 60 seconds for $0 by test_bq_row_shape.py.
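To illustrate the kind of static check that catches two of these sub-classes (STRING-vs-FLOAT64 drift and JSON dict-vs-string mismatch), here is a minimal sketch. It is not the contents of test_bq_row_shape.py: the schema, column names, and `row_shape_errors` helper are hypothetical, and it assumes every column is required.

```python
import json
from typing import Any, Dict, List

# Hypothetical canonical schema: column name -> BigQuery type.
CANONICAL_SCHEMA = {"iteration": "INT64", "score": "FLOAT64", "payload": "JSON"}

# Python types accepted for each BigQuery type in this sketch.
_ACCEPTED = {
    "INT64": (int,),
    "FLOAT64": (int, float),
    "STRING": (str,),
    "JSON": (dict, list),  # pass the object itself, not a json.dumps() string
}


def row_shape_errors(row: Dict[str, Any],
                     schema: Dict[str, str] = CANONICAL_SCHEMA) -> List[str]:
    """Compare one outgoing row against the canonical schema, without calling BigQuery."""
    errors = []
    for column, bq_type in schema.items():
        if column not in row:
            errors.append(f"{column}: missing")
            continue
        value = row[column]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _ACCEPTED[bq_type]):
            errors.append(f"{column}: expected {bq_type}, got {type(value).__name__}")
    for column in sorted(row.keys() - schema.keys()):
        errors.append(f"{column}: not in canonical schema")
    return errors


good_row = {"iteration": 3, "score": 0.73, "payload": {"construct": "openness"}}
# Reproduces two shakedown bug classes: a numeric value sent as a string,
# and a dict serialised to a string before landing in a JSON column.
bad_row = {"iteration": 3, "score": "0.73", "payload": json.dumps({"construct": "openness"})}
```

Here `row_shape_errors(good_row)` is empty, while `bad_row` yields one error for `score` and one for `payload`.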
The second-largest was Vertex AI quirks at ~$50. The iteration discovered that gemini-3-flash-preview is /global/-only (smoke #8), that Vertex Batch enforces same-location for job + model (smoke #9), that the metadata field is rejected by the validator, and that Cloud Run Job auto-retries can spawn duplicate batches if upstream MERGE fails. All four are now baked into the worker config and workflow YAML.
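The two location constraints are cheap to validate before a batch job is submitted. The sketch below shows one way such a pre-flight check could look; `batch_location_errors` and `GLOBAL_ONLY_MODELS` are hypothetical names, the check makes no Vertex API calls, and the real worker config may encode these rules differently.

```python
from typing import List

# Models observed during the shakedown to be served only from the global location.
GLOBAL_ONLY_MODELS = {"gemini-3-flash-preview"}


def batch_location_errors(model: str, model_location: str, job_location: str) -> List[str]:
    """Pre-flight check for the two location rules found in smokes #8 and #9."""
    errors = []
    if model in GLOBAL_ONLY_MODELS and model_location != "global":
        errors.append(f"{model} is global-only, but model_location={model_location!r}")
    if job_location != model_location:
        errors.append(
            f"job_location={job_location!r} must match model_location={model_location!r}"
        )
    return errors
```

For example, `batch_location_errors("gemini-3-flash-preview", "global", "global")` returns no errors, while pointing either location at a regional endpoint returns at least one.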
What every smoke caught
The full per-smoke ledger lives at docs/shakedown_ledger.md and at the gated R3 infrastructure report. Compressed:
| # | Layer | Fix |
|---|---|---|
| 1–4 | Local + IAM | gcloud Python; roles/logging.logWriter; IAM-propagation lag wait; roles/run.developer |
| 5–7 | Container | Dockerfile COPY path; image-digest refresh; first successful Vertex submit |
| 8–9 | Vertex | /global/-only; same-location for job + model |
| 10 | Workflow | LRO connector default 1,800 s timeout → 7,200 s |
| 11 | BQ schema | STRING vs FLOAT64 autodetect drift; canonical schema explicit |
| 12–13 | Workflow expr | YAML colon-in-string; BQ connector body-wrapper drift |
| 14 | Container | firebase_admin transitive import via synthesis.profile; extracted dep-free synthesis/construct_mappings.py |
| 15 | BQ JSON | LIKE on JSON column; switched to TO_JSON_STRING(...) |
| 16 | BQ JSON write | flatten_session_for_bq doing json.dumps() into JSON column; pass dict directly |
| 17 | BQ JSON read | JSON_EXTRACT(col, '$') ambiguous; switched to TO_JSON_STRING(col) |
| 18 | Trainer container | Missing pydantic + networkx requirements |
| Probe 1 | Trainer SQL | Family-filter tuple-unpacking; same firebase_admin chain via synthesis.profile |
| Probe 2 | Trainer attr | conformal_report.per_trait doesn't exist (real attribute is quantiles) |
| Probe 3 | Trainer attr (other) | Same firebase_admin chain via five additional callers |
| Probe 4 | Trainer logic | Final attribute drift fixed; conformal + BBN + Gate 10 + BQ writes reached |
| Probe 5 | — | All 3 succeeded. R-hat = 1.002, n_eff_min = 1,318, 0 divergences |
Phase-2 readiness signals
Six green, two open:
| Signal | Status |
|---|---|
| End-to-end calibration trainer success on real PSL data (Probe 5) | ✅ |
| 8-check local QA harness covers every bug class hit in shakedown | ✅ |
| Cohort-keyed calibration profile registry prevents smoke → prod config leak | ✅ |
| Iteration namespace convention (0–9 smoke; ≥ 10 production) prevents BQ MERGE collisions | ✅ |
| Cost engineering choices verified at small scale; forecasts re-derived against actual smoke spend | ✅ |
| Foundation v1 substantially landed (workload projects, KMS, AlloyDB pause-by-default, services, VPC-SC, Datastream live) | ✅ |
| Cloud Billing budget alerts (auto-pause drill) | ⏳ deferred — requires roles/billing.user |
| YourMorals.org DUA (parallel procurement; weeks lead time) | ⏳ |
The two open items: the first resolves with a roles/billing.user grant plus a half-day drill; the second proceeds on its own external schedule. Neither blocks Phase-2 milestone-review approval.
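The iteration namespace convention in the table (0–9 smoke; ≥ 10 production) is simple enough to enforce in code at the point where a run is launched. A minimal sketch, with hypothetical function names:

```python
def iteration_namespace(iteration: int) -> str:
    """Map an iteration number to its namespace: 0-9 is smoke, 10 and above is production."""
    if isinstance(iteration, bool) or not isinstance(iteration, int) or iteration < 0:
        raise ValueError(f"iteration must be a non-negative integer, got {iteration!r}")
    return "smoke" if iteration <= 9 else "production"


def require_namespace(iteration: int, expected: str) -> None:
    """Refuse to proceed when an iteration number falls in the wrong namespace."""
    actual = iteration_namespace(iteration)
    if actual != expected:
        raise ValueError(f"iteration {iteration} is {actual}, expected {expected}")
```

A smoke launcher would call `require_namespace(iteration, "smoke")` before writing anything, so a mistyped iteration number fails before it can reach a production MERGE key.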
What is not yet drilled
Honest framing of the four R5 robustness drills, per the robustness report:
- Auto-pause end-to-end (Cloud Billing pathway): ❌ NOT EXECUTED. Infrastructure built (functions/budget_check/main.py), gated on roles/billing.user.
- Mid-shard worker crash + resume: ❌ NOT EXECUTED. No CHAOS_FAULT branch in worker; max-retries-3 untested under deliberate failure.
- Vertex 429 backoff under live quota pressure: ◐ PARTIAL. _RateLimiter fully built; shakedown never drove quota saturation.
- Idempotency stress: ◐ PARTIAL. Smokes #11 and #16 incidentally exercised the BQ MERGE dedupe path; no deliberate 6-parallel-shards stress drill.
All four are on the Phase-2 hardening backlog. Two are blocked on external dependencies (roles/billing.user for #1; chaos-image build for #2); two can be drilled tomorrow if needed (#3, #4). Phase-2 corpus generation may incidentally exercise #3 even without the deliberate drill.
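For context on drill #3, the behaviour under test is a retry loop that backs off when the upstream API signals quota exhaustion. The sketch below shows exponential backoff with full jitter, a common pattern for this; it is not the project's _RateLimiter, and `QuotaExceeded` and `call_with_backoff` are hypothetical names. The `sleep` and `rng` parameters are injectable so the loop can be exercised without real waiting or live quota pressure.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Stand-in for an HTTP 429 response from the upstream API (hypothetical)."""


def call_with_backoff(fn: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Call fn, retrying on QuotaExceeded with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: a random fraction of the capped exponential delay.
            sleep(rng() * min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

A deliberate drill would replace the fake exception with real quota saturation; the injectable clock only covers the control flow.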
Foundation-side gaps from the table at the top of this page — Firestore region cutover, Vertex Reasoning Engine redeploy in eu-west4, Knowledge Catalog tagging baseline, the three reserved workflows, the staging project + cluster, Assured Workloads enrolment, the Firebase Stream-Firestore-to-BQ extension — are scheduled as the foundation refactor's later stages land. None blocks Phase-2 milestone-review approval; all are in the public roadmap at /about/science/roadmap.
What ships next
- Grant roles/billing.user to an operator account → configure Cloud Billing budget → run the auto-pause drill.
- Run idempotency stress drill (6 concurrent re-fires of the same (iteration, shard_index)).
- Run mid-shard crash drill once a CHAOS_FAULT-flagged worker image is built.
- Resolve the three BigQuery write-path divergences flagged in R7 (legacy tabledata.insertAll calls in pre-shakedown scaffolding).
- Cut iteration 10, the first Phase-2 production iteration: full corpus (2,000 personas × 50 sessions × 12 turns); all ten calibration families in parallel.
- Provision neumatics-staging once the billing-account project quota is raised; bring up the staging AlloyDB cluster + perimeter.
- Redeploy the three Vertex Reasoning Engines in eu-west4 inside neumatics-prod; flip the apphosting.yaml resource IDs.
- Install the Firebase Stream-Firestore-to-BQ extension; cut over from the legacy nightly export.
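The idempotency stress drill in the list above can be rehearsed locally before it touches BigQuery. The sketch below models MERGE semantics with an in-memory table keyed on (iteration, shard_index) and fires six concurrent writes at the same key. `MergeTable` and `refire` are hypothetical names, and the sketch only demonstrates the property the drill is meant to confirm; it says nothing about BigQuery's actual concurrency behaviour.

```python
import threading
from typing import Dict, Tuple


class MergeTable:
    """In-memory stand-in for a table written via MERGE (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[int, int], dict] = {}
        self._lock = threading.Lock()

    def merge(self, iteration: int, shard_index: int, payload: dict) -> None:
        # MERGE semantics: update when the key matches, insert otherwise.
        with self._lock:
            self._rows[(iteration, shard_index)] = dict(payload)

    def row_count(self) -> int:
        with self._lock:
            return len(self._rows)


def refire(table: MergeTable, iteration: int, shard_index: int, times: int = 6) -> None:
    """Write the same (iteration, shard_index) key from several threads at once."""
    threads = [
        threading.Thread(target=table.merge,
                         args=(iteration, shard_index, {"status": "done"}))
        for _ in range(times)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The pass condition is that six re-fires of one key leave exactly one row; the live drill would assert the same count against the real table.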
The pipeline is shaken down. The foundation is substantially landed. The receipts are filed. Phase-2 now waits only on the roles/billing.user grant and milestone-review approval.