Receipts

This page carries two receipts.

The Foundation v1 refactor moved the platform from a single-project setup owned by a consumer Gmail account to a Workspace-rooted, multi-project, CMEK-encrypted, pause-by-default platform, with per-domain Cloud Run services replacing the functions/main.py monolith. It substantially landed; what is and isn't live as of 2026-05-10 is listed below.

The calibration pipeline ran from infrastructure deploy through end-to-end trainer convergence on real PSL-derived data: 18 workflow smokes plus 5 manual trainer probes, with 27 bugs across 8 categories. The final probe converged the NumPyro hierarchical NUTS GRM at R-hat = 1.002, n_eff_min = 1,318, with zero divergences in 9.0 seconds wallclock.

The full per-component receipts (the gated reports tier) carry the IAM bindings, the BigQuery DDL excerpts, the per-cluster controller logs, and the per-smoke ledger. This page is the public-facing summary.


Foundation v1 — what landed

The refactor is built against the execution spec nexus_foundation_v1.md in the repo root (the F-3.x architectural commitments plus the S-4.x parallel security track). The receipts:

| Component | Status | Where to verify |
| --- | --- | --- |
| Workspace org neumatics.eu | ✅ live | Pre-existing; folder structure under infra/bootstrap/ |
| Folder platform/ + shared-services/ | ✅ live | infra/bootstrap/, infra/projects/main.tf |
| Project neumatics-prod (workload) | ✅ live | infra/projects/main.tf:34 |
| Project neumatics-audit-logs (sealed sink) | ✅ live | infra/projects/main.tf:48 + infra/audit/ |
| Project neumatics-network-host (Shared VPC) | ✅ live | infra/projects/main.tf:53 |
| Project neumatics-staging | ⏳ deferred | infra/projects/main.tf:42 commented; billing-account project quota cap |
| Org-policy bundle (resourceLocations, disableServiceAccountKeyCreation, etc.) | ✅ live | infra/org-policies/ |
| Essential contacts → neumatics.eu domain | ✅ live | infra/projects/main.tf:266 |
| KMS keyring nexus-foundation (eu-west4) + 7 CMEK keys + 3 HSM keys | ✅ live | infra/kms/ |
| Per-resource CMEK applied (AlloyDB, BigQuery, GCS, Pub/Sub, etc.) | ✅ live | infra/kms/ IAM bindings; resource-level kms_key_name references |
| Aggregated org log sink → neumatics-audit-logs BigQuery | ✅ live | infra/audit/ |
| GCS object-lock archive (10y retention) | ✅ live | infra/audit/ |
| Shared VPC + subnets (runtime / data / mgmt) | ✅ live | infra/network/ |
| VPC-SC perimeter (prod) | ✅ live | infra/network/main.tf google_access_context_manager_service_perimeter.nexus_prod |
| VPC-SC perimeter (staging) | ⏳ deferred | Commented as dryrun; depends on staging project |
| Private Service Connect (AlloyDB, googleapis) | ✅ live | infra/network/, infra/alloydb/ |
| AlloyDB regional cluster nexus-prod (CMEK, pgvector, columnar) | ✅ live | infra/alloydb/cluster.tf |
| AlloyDB cost-control plane (controller + auto-pause + warming UX + operator CLI) | ✅ live | services/nexus-alloydb-controller/, services/nexus-alloydb-auto-pause/, src/lib/alloydb-warming.ts, scripts/nexus-alloydb*.ps1 |
| AlloyDB staging cluster | ⏳ deferred | Depends on staging project |
| BigQuery datasets (calibration_corpus, warehouse, substrate, alloydb_cdc, synth_substrate, etc.) | ✅ live | infra/bigquery/; 7 datasets, all CMEK |
| Datastream AlloyDB → BigQuery CDC stream | ✅ live | infra/datastream/ (private connection + connection profiles + stream) |
| Firebase Stream-Firestore-to-BigQuery extension | ⏳ pending | docs/operations/firestore_extension.md runbook |
| Firestore region cutover (legacy → eu-west4 under CMEK) | ⏳ deferred | soulmap-v4 database name preserved; region cutover gated on dual-write shim |
| Per-domain Cloud Run services (24 deployed) | ✅ live | services/nexus-*/; see catalogue at architecture |
| Per-route cutover (BACKEND_ROUTING in src/lib/api-helpers.ts) | ✅ live, partial | 14 routes flipped to new; 7 routes still on legacy |
| Vertex AI Reasoning Engine redeploy in eu-west4 | ⏳ pending | Resource IDs in apphosting.yaml still on eu-west1 |
| Knowledge Catalog tagging baseline | ⏳ pending | Catalog service deployed; tagging not yet applied |
| Workforce groups + WIF pool + custom roles | ✅ live | infra/iam/ |
| PAM mediator (per-grant delays per OD-19) | ✅ live | services/nexus-pam-mediator/, infra/pam-mediator/ |
| Audit alerter (Pub/Sub → email on curated events) | ✅ live | services/nexus-audit-alerter/, infra/audit-alerter/ |
| Workflows: iteration_runner, bq_inspect, iam_probe | ✅ live | workflows/ |
| Workflows: erasure-cascade.yaml, calibration-promote.yaml, cohort-freeze.yaml | ⏳ reserved | Per F-3.9 / S-4.8; spec'd, not yet implemented |
| Operator playbook + access-pattern docs | ✅ live | docs/security/operator_playbook.md, docs/security/access_patterns.md, docs/security/claude_code_access.md, docs/security/incident_response.md |
| LinkML schema extension (vocabulary → marketplace + analytics + warehouse) | ✅ partial | vocabulary/schema.linkml.yaml extended; schemas/foundation_v1.linkml.yaml added |
| Assured Workloads enrolment on neumatics-prod | ⏳ pending | Per OD-18 (resolved-optimistic 2026-05-08); operator-side confirmation queued |

The headline: Foundation Stages 0–2 (org / projects / network / KMS / audit) are substantially live. Stage 3a (AlloyDB + cost-control plane) is live for prod and waits on the staging project for full parity. Stage 4 (compute) is live with 24 services deployed and per-route cutover in flight. Stages 5–7 (CDC / frontend / decommission) are partially live; the Firestore region cutover and the Vertex Reasoning Engine redeploy are the remaining user-visible pieces.

The pause-by-default cost-control plane was structural, not optional: Stage 3a does not exit until both an idle-pause cycle and a fail-fast-503-retry path are integration-tested. Both have been demonstrated against the live prod cluster.

Calibration pipeline — what converged

| Probe 5 metric | Value | Note |
| --- | --- | --- |
| R-hat | 1.002 | Convergence threshold R-hat < 1.01; we are well inside |
| n_eff_min | 1,318 | Healthy effective sample size across all parameters |
| Divergences | 0 | No funnel-trap warnings from NumPyro NUTS |
| Wallclock | 9.0 s | On n2-highmem-16, 10 personas × 6 openness constructs |

Probe 5 was the final shakedown probe. Sampler: NumPyro NUTS, four chains × 1,000 warmup × 1,000 samples, target_accept = 0.85, non-centered parameterisation on per-item log-discrimination. Inputs: ten personas drawn from the Phase-1 smoke library, six openness-family constructs scored from real PSL evaluator output (not synthetic responses). Outputs: six calibration_metrics rows plus three invariance rows MERGE'd into BigQuery; HDI95 credible intervals on every parameter; JSON-roundtrippable grm.json artifact written to GCS.
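
To make the R-hat threshold concrete, here is a minimal sketch of a convergence check over four chains of 1,000 draws. It uses the classic (non-split) Gelman-Rubin statistic in pure standard-library Python, which is simpler than the rank-normalised diagnostic that sampler tooling typically reports. The function names, the `passes_gate` rule, and the synthetic draws are illustrative assumptions, not the trainer's code or Probe 5 data.

```python
import random
import statistics

RHAT_THRESHOLD = 1.01  # convergence threshold quoted on this page


def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for one scalar parameter.

    `chains` is a list of equal-length lists of posterior draws.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 equal-length chains of at least 2 draws")
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)  # B
    if within == 0:
        raise ValueError("chains have zero within-chain variance")
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return (pooled / within) ** 0.5


def passes_gate(rhat, divergences, threshold=RHAT_THRESHOLD):
    """Hypothetical gate: R-hat under the threshold and no divergent transitions."""
    return rhat < threshold and divergences == 0


# Synthetic stand-in for sampler output: 4 chains x 1,000 draws, same target.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Shift one chain so it sits around a different mean than the other three.
stuck = [list(c) for c in mixed]
stuck[0] = [x + 5.0 for x in stuck[0]]

print(passes_gate(gelman_rubin_rhat(mixed), divergences=0))
print(passes_gate(gelman_rubin_rhat(stuck), divergences=0))
```

The well-mixed chains pass the gate and the shifted chain fails it, which is the behaviour the R-hat < 1.01 threshold is meant to capture.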

Probe 5 is the single most informative receipt in the shakedown. It demonstrates that the fully implemented pipeline (workflow → shard worker → Vertex Batch → BQ MERGE → Custom Training → NumPyro NUTS → BQ MERGE) produces a converged psychometric calibration on data the model has never seen, in less time and at lower cost than any single shakedown smoke iteration.

The graded-response model itself is Samejima 1969; convergence at this scale is the small-N receipt that the pipeline runs end-to-end. Production-scale Phase-2 fits will be larger but no harder.
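
For readers unfamiliar with the graded-response model, the sketch below computes category probabilities for a single item in its standard logistic form: the probability of responding in category k or above is a logistic function of discrimination × (ability − threshold k), and each category probability is the difference of adjacent cumulative curves. The function name and the example parameter values are assumptions for illustration; the discrimination is taken on the log scale only to mirror the parameterisation described above, and this is not the trainer's model code.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def grm_category_probs(theta, log_discrimination, thresholds):
    """Graded-response category probabilities for one item.

    theta: latent trait value for one respondent.
    log_discrimination: item discrimination on the log scale.
    thresholds: ordered category boundaries b_1 <= ... <= b_{K-1}.
    Returns K probabilities, one per response category.
    """
    thresholds = list(thresholds)
    if thresholds != sorted(thresholds):
        raise ValueError("thresholds must be in non-decreasing order")
    a = math.exp(log_discrimination)
    # Cumulative curves P(Y >= k), padded with P(Y >= 0) = 1 and P(Y >= K) = 0.
    cumulative = [1.0] + [_sigmoid(a * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Illustrative values only: a four-category item and one respondent.
probs = grm_category_probs(theta=0.3, log_discrimination=0.0, thresholds=[-1.0, 0.0, 1.5])
print(probs, sum(probs))
```

Ordered thresholds keep every category probability non-negative, and the padding terms make the probabilities sum to one by construction.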


Bug taxonomy

Eight categories absorbed all twenty-seven bugs. Every fix landed either in the production code path or in the eight-check local QA harness that gates Phase-2 readiness.

| Category | Count | Cost (est.) | Where the fix lives |
| --- | --- | --- | --- |
| Local toolchain | 1 | $0 | Operator-side; switched to gcloud-bundled Python |
| GCP IAM bindings | 4 | $0 | scripts/deploy/deploy_all.sh |
| Container build / requirements | 5 | ~$0 | Dockerfile contexts + test_container_imports.py |
| Vertex AI quirks | 4 | ~$50 | Worker config + workflow YAML + test_workflow_safety.py |
| Cloud Workflows YAML | 4 | ~$3 | Inline workflow YAML + test_workflow_safety.py |
| BigQuery schemas | 5 | ~$70 | Worker flatten_*_for_bq + test_bq_row_shape.py |
| Trainer logic | 3 | ~$10 | Trainer + test_attribute_safety.py |
| LLM output quality | 1 | ~$25 | maxOutputTokens=8192 cap |

The largest cost was BigQuery schemas at ~$70. The iteration touched five sub-classes — STRING-vs-FLOAT64 autodetect drift, JSON dict-vs-string mismatch, JSON_EXTRACT ambiguity on JSON-typed columns, LIKE on JSON column, and namespace collision risk. All five are now caught statically in 60 seconds for $0 by test_bq_row_shape.py.
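
As an illustration of what a static row-shape check can look like, the sketch below flags two of the sub-classes named above before any row reaches BigQuery: a string value headed for a FLOAT64 column, and a pre-serialised JSON string headed for a JSON column. The function name, the schema dictionary format, and the sample rows are hypothetical; this is not the contents of test_bq_row_shape.py.

```python
import json
from numbers import Real


def find_row_shape_issues(row, schema):
    """Return human-readable issues for one row against an explicit schema.

    `schema` maps column name to a BigQuery type name such as "FLOAT64" or
    "JSON" (a hypothetical format chosen for this sketch).
    """
    issues = []
    for column, value in row.items():
        bq_type = schema.get(column)
        if bq_type is None:
            issues.append(f"{column}: not declared in the explicit schema")
        elif value is None:
            continue  # NULL handling is left to the column's nullability mode
        elif bq_type == "FLOAT64":
            # bool is a numeric subclass in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                issues.append(f"{column}: expected a number, got {type(value).__name__}")
        elif bq_type == "JSON":
            # A str here usually means json.dumps() was applied too early.
            if isinstance(value, str):
                issues.append(f"{column}: pass the dict/list itself, not a JSON string")
    return issues


schema = {"score": "FLOAT64", "payload": "JSON"}
bad_row = {"score": "0.73", "payload": json.dumps({"turns": 12})}
good_row = {"score": 0.73, "payload": {"turns": 12}}

print(find_row_shape_issues(bad_row, schema))
print(find_row_shape_issues(good_row, schema))
```

A check of this kind runs locally with no cloud calls, which is what makes it cheap relative to discovering the same mismatch in a live smoke.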

The second-largest was Vertex AI quirks at ~$50. The iteration discovered that gemini-3-flash-preview is /global/-only (smoke #8), that Vertex Batch enforces same-location for job + model (smoke #9), that the metadata field is rejected by the validator, and that Cloud Run Job auto-retries can spawn duplicate batches if upstream MERGE fails. All four are now baked into the worker config and workflow YAML.


What every smoke caught

The full per-smoke ledger lives at docs/shakedown_ledger.md and at the gated R3 infrastructure report. Compressed:

| # | Layer | Fix |
| --- | --- | --- |
| 1–4 | Local + IAM | gcloud Python; roles/logging.logWriter; IAM-propagation lag wait; roles/run.developer |
| 5–7 | Container | Dockerfile COPY path; image-digest refresh; first successful Vertex submit |
| 8–9 | Vertex | /global/-only; same-location for job + model |
| 10 | Workflow | LRO connector default 1,800 s timeout → 7,200 s |
| 11 | BQ schema | STRING vs FLOAT64 autodetect drift; canonical schema explicit |
| 12–13 | Workflow expr | YAML colon-in-string; BQ connector body-wrapper drift |
| 14 | Container | firebase_admin transitive import via synthesis.profile; extracted dep-free synthesis/construct_mappings.py |
| 15 | BQ JSON | LIKE on JSON column; switched to TO_JSON_STRING(...) |
| 16 | BQ JSON write | flatten_session_for_bq doing json.dumps() into JSON column; pass dict directly |
| 17 | BQ JSON read | JSON_EXTRACT(col, '$') ambiguous; switched to TO_JSON_STRING(col) |
| 18 | Trainer container | Missing pydantic + networkx requirements |
| Probe 1 | Trainer SQL | Family-filter tuple-unpacking; same firebase_admin chain via synthesis.profile |
| Probe 2 | Trainer attr | conformal_report.per_trait doesn't exist (real attribute is quantiles) |
| Probe 3 | Trainer attr (other) | Same firebase_admin chain via five additional callers |
| Probe 4 | Trainer logic | Final attribute drift fixed; conformal + BBN + Gate 10 + BQ writes reached |
| Probe 5 | | All 3 succeeded. R-hat = 1.002, n_eff_min = 1,318, 0 divergences |

Phase-2 readiness signals

Six green, two open:

| Signal | Status |
| --- | --- |
| End-to-end calibration trainer success on real PSL data (Probe 5) | ✅ green |
| 8-check local QA harness covers every bug class hit in shakedown | ✅ green |
| Cohort-keyed calibration profile registry prevents smoke → prod config leak | ✅ green |
| Iteration namespace convention (0–9 smoke; ≥ 10 production) prevents BQ MERGE collisions | ✅ green |
| Cost engineering choices verified at small scale; forecasts re-derived against actual smoke spend | ✅ green |
| Foundation v1 substantially landed (workload projects, KMS, AlloyDB pause-by-default, services, VPC-SC, Datastream live) | ✅ green |
| Cloud Billing budget alerts (auto-pause drill) | ⏳ deferred; requires roles/billing.user |
| YourMorals.org DUA (parallel procurement; weeks lead time) | ⏳ open |

The two open items: the first resolves with a roles/billing.user grant plus a half-day drill; the second proceeds on its own external schedule. Neither blocks Phase-2 milestone-review approval.
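
The iteration namespace convention listed among the signals (0–9 smoke; ≥ 10 production) can be sketched as a small guard. The function names and the idea of tagging a calibration profile with a namespace string are assumptions made for illustration; the actual registry is keyed by cohort and its interface is not shown on this page.

```python
SMOKE_ITERATIONS = range(0, 10)  # iterations 0-9 are reserved for smokes


def iteration_namespace(iteration):
    """Classify an iteration number as 'smoke' or 'production'."""
    if isinstance(iteration, bool) or not isinstance(iteration, int) or iteration < 0:
        raise ValueError(f"iteration must be a non-negative integer, got {iteration!r}")
    return "smoke" if iteration in SMOKE_ITERATIONS else "production"


def check_profile_for_iteration(iteration, profile_namespace):
    """Hypothetical guard: refuse a profile tagged for the other namespace."""
    expected = iteration_namespace(iteration)
    if profile_namespace != expected:
        raise ValueError(
            f"iteration {iteration} is {expected}, "
            f"but the profile is tagged {profile_namespace}"
        )
    return expected


print(iteration_namespace(9))   # last smoke iteration
print(iteration_namespace(10))  # first production iteration
```

Failing fast on a mismatch keeps a smoke-sized configuration from being picked up by a production iteration, and keeps both kinds of rows in disjoint iteration ranges.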


What is not yet drilled

Honest framing of the four R5 robustness drills, per the robustness report:

  • Auto-pause end-to-end — Cloud Billing pathway: ❌ NOT EXECUTED. Infrastructure built (functions/budget_check/main.py), gated on roles/billing.user.
  • Mid-shard worker crash + resume — ❌ NOT EXECUTED. No CHAOS_FAULT branch in worker; max-retries-3 untested under deliberate failure.
  • Vertex 429 backoff under live quota pressure — ◐ PARTIAL. _RateLimiter fully built; shakedown never drove quota saturation.
  • Idempotency stress — ◐ PARTIAL. Smokes #11 and #16 incidentally exercised the BQ MERGE dedupe path; no deliberate 6-parallel-shards stress drill.

All four are on the Phase-2 hardening backlog. Numbering them in the order listed above, two are blocked on external dependencies (roles/billing.user for #1; chaos-image build for #2) and two can be drilled at short notice if needed (#3, #4). Phase-2 corpus generation may incidentally exercise #3 even without the deliberate drill.
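
For the 429 backoff drill, the behaviour under test can be sketched as capped exponential backoff with full jitter. This is a generic pattern under stated assumptions, not the worker's _RateLimiter: the `RateLimited` exception, the function name, and the default delays are all hypothetical.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from the upstream API."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimited with capped exponential backoff.

    Each wait is drawn uniformly from [0, min(max_delay, base_delay * 2**attempt)),
    the "full jitter" variant, so concurrent workers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted; surface the 429 to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)


# Drill-style usage with a fake upstream that rejects the first two calls.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimited()
    return "ok"


delays = []
result = call_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0)
print(result, delays)
```

Injecting `sleep` and `rng` lets a drill assert on the exact delay schedule without waiting in real time, which is one way to exercise the path before live quota pressure does.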

Foundation-side gaps from the table at the top of this page — Firestore region cutover, Vertex Reasoning Engine redeploy in eu-west4, Knowledge Catalog tagging baseline, the three reserved workflows, the staging project + cluster, Assured Workloads enrolment, the Firebase Stream-Firestore-to-BQ extension — are scheduled as the foundation refactor's later stages land. None blocks Phase-2 milestone-review approval; all are in the public roadmap at /about/science/roadmap.


What ships next

  1. Grant roles/billing.user to an operator account → configure Cloud Billing budget → run the auto-pause drill.
  2. Run idempotency stress drill (6 concurrent re-fires of the same (iteration, shard_index)).
  3. Run mid-shard crash drill once a CHAOS_FAULT-flagged worker image is built.
  4. Resolve the three BigQuery write-path divergences flagged in R7 (legacy tabledata.insertAll calls in pre-shakedown scaffolding).
  5. Cut iteration 10 — first Phase-2 production iteration; full corpus (2,000 personas × 50 sessions × 12 turns); all ten calibration families parallel.
  6. Provision neumatics-staging once the billing-account project quota is raised; bring up the staging AlloyDB cluster + perimeter.
  7. Redeploy the three Vertex Reasoning Engines in eu-west4 inside neumatics-prod; flip the apphosting.yaml resource IDs.
  8. Install the Firebase Stream-Firestore-to-BQ extension; cut over from the legacy nightly export.
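
The idempotency stress drill in step 2 can be prototyped locally before it is run against BigQuery. The sketch below fires six concurrent writes for the same (iteration, shard_index) key at an in-memory stand-in for a MERGE target and checks that exactly one row results. The class and function names are hypothetical, and a dictionary guarded by a lock only approximates MERGE semantics.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryMergeTable:
    """Stand-in for a table written with MERGE keyed on (iteration, shard_index)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def merge(self, iteration, shard_index, payload):
        """Upsert one row; return True if it was inserted, False if updated."""
        key = (iteration, shard_index)
        with self._lock:
            inserted = key not in self._rows
            self._rows[key] = payload
            return inserted

    def row_count(self):
        with self._lock:
            return len(self._rows)


def run_stress(table, iteration, shard_index, fires=6):
    """Re-fire the same shard write `fires` times concurrently."""
    def fire(_):
        return table.merge(iteration, shard_index, {"status": "done"})

    with ThreadPoolExecutor(max_workers=fires) as pool:
        return list(pool.map(fire, range(fires)))


table = InMemoryMergeTable()
results = run_stress(table, iteration=3, shard_index=0)
print(table.row_count(), sum(results))
```

The invariant worth asserting in the real drill is the same one checked here: however many re-fires land, the key maps to a single row and only one of the writes counts as an insert.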

The pipeline is shaken down. The foundation is substantially landed. The receipts are filed. Phase-2 is unblocked once roles/billing.user is granted and the milestone review approves.