Reference

Methodology

The pipeline — prompts, rules, weights, thresholds, reliability coefficients, invariance tests, persona validity reports, factory iteration logs — is peer-reviewable and open. The raw dataset is closed to external third parties.

Two registries · one source of truth

Vocabulary registry

functions/vocabulary/registry.yaml — every atomic predicate, observation, situation type, metric threshold, construct, and composite. LinkML-authored schema; CI generates Pydantic + JSON Schema + OWL + BigQuery DDL + OpenAPI from one source. Three-gate validation (pre-commit, CI, runtime fail-fast). Semantic versioning per atom.

Feature registry

nexus_feature_registry — every field stored in Firestore + BigQuery. Carries layer (Context / State / Trait), tier (A / B / C), retention, consent scope, invariance grouping, presented_as, and instrument citation. Generates DDL + API types + Pydantic via Cloud Build.

Reasoning substrate

The engine is deterministic, formally characterized, and embedding-free on the construct-scoring path:

Probabilistic Soft Logic (PSL)
Replaces ad-hoc confidence arithmetic with a formally characterized calculus over rule annotations. Deterministic, reproducible.
Item Response Theory (Graded Response Model)
Per-trait aggregation (Samejima 1969) instead of a weighted sum. Item parameters are re-fit quarterly in the factory.
Conformal Prediction
Distribution-free 95% confidence intervals with empirical coverage held in [94%, 96%]. Reported as a CI on every score.
Bayesian belief network
Cross-construct propagation derived from the vocabulary `consumes` field. Quarterly CPT re-fit.
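
As an illustration of the Graded Response Model named above, the sketch below computes category probabilities for a single ordinal item from Samejima's cumulative logistic form. The function name, the parameter values, and the five-category item are assumptions chosen for the example; they are not taken from the pipeline's calibrated item parameters.

```python
import math

def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    theta          -- latent trait level of the respondent
    discrimination -- item slope (a)
    thresholds     -- ordered category boundaries b_1 < ... < b_{K-1}
    Returns a list of K probabilities, one per response category.
    """
    # Cumulative curves P(X >= k), bracketed by P(X >= 0) = 1 and P(X >= K) = 0.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

# Hypothetical item with five response categories (four thresholds).
probs = grm_category_probs(theta=0.5, discrimination=1.2,
                           thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, the cumulative curves are monotone and the category probabilities are non-negative and sum to one. A trait estimate would then come from combining such likelihoods across items, which this sketch does not attempt.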

Embeddings are admitted at atom-level affective and intent detection only — a frozen, MIT-licensed BGE-small-en-v1.5 with author-curated seed-anchor sets, served from a Cloud Run service inside the perimeter. The downstream substrate remains embedding-free.
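
One way to picture atom-level detection against seed anchors is a cosine-similarity comparison between an utterance embedding and each anchor embedding. The sketch below is only that picture: the toy 3-dimensional vectors stand in for real BGE-small-en-v1.5 outputs, and the function names and the max-over-anchors aggregation are assumptions, not the service's documented behavior.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def anchor_score(embedding, anchors):
    """Hypothetical atom score: similarity to the closest seed anchor."""
    return max(cosine(embedding, anchor) for anchor in anchors)

# Toy vectors standing in for precomputed embeddings (assumed, not real model output).
seed_anchors = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
utterance = [0.9, 0.3, 0.1]
print(round(anchor_score(utterance, seed_anchors), 3))
```

With a frozen model and fixed anchor sets, a score like this is a pure function of the input text, which is consistent with keeping the downstream substrate deterministic.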

Calibration & factory

Persona Factory generates ~10k Big-Five-validated synthetic users quarterly (TinyTroupe + Serapio-García, ρ ≥ 0.80), runs them through the full pipeline, and feeds five refinement engines: Agent Optimizer (prompts), Rule Calibration Loop (PSL weights), Threshold Sweeper, Weight Tuner (IRT / BN), Persona Instruction Optimizer (sim-to-real gap).

Cloud Build Eval Gate enforces hard gates on every revision: ground-truth recovery ρ ≥ 0.80, test-retest reliability degraded by no more than 10%, scalar invariance preserved, conformal coverage in [94%, 96%], Q3 residual correlation < 0.20, and embedding-anchored atom convergent validity ≥ 0.65. Auto-promotion requires a 7-day, 5% canary with no regression.
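
The hard gates listed above can be read as a set of threshold checks over a revision's evaluation metrics. The following is a minimal sketch under that reading: the thresholds come from the text, while the metric key names, the dictionary layout, and the `check_gates` function are hypothetical and do not describe the actual Cloud Build step.

```python
def check_gates(metrics):
    """Return the names of failed gates; an empty list means the revision passes."""
    failures = []
    if metrics["ground_truth_rho"] < 0.80:
        failures.append("ground-truth recovery")
    # Relative drop in test-retest reliability versus the current baseline.
    if metrics["test_retest_relative_drop"] > 0.10:
        failures.append("test-retest")
    if not metrics["scalar_invariance_preserved"]:
        failures.append("scalar invariance")
    if not 0.94 <= metrics["conformal_coverage"] <= 0.96:
        failures.append("conformal coverage")
    if metrics["q3_residual_correlation"] >= 0.20:
        failures.append("Q3 residual correlation")
    if metrics["atom_convergent_validity"] < 0.65:
        failures.append("atom convergent validity")
    return failures

# Hypothetical metrics for one candidate revision.
candidate = {
    "ground_truth_rho": 0.84,
    "test_retest_relative_drop": 0.03,
    "scalar_invariance_preserved": True,
    "conformal_coverage": 0.968,  # outside [0.94, 0.96]
    "q3_residual_correlation": 0.12,
    "atom_convergent_validity": 0.71,
}
print(check_gates(candidate))  # ['conformal coverage']
```

Note that coverage is gated on both sides: intervals that cover too often are rejected as well as intervals that cover too rarely, since over-coverage usually means the intervals are wider than they need to be.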

Tier A roster

The 13 canonical constructs that make up the default API response and the Profile reading.

Construct	Instrument	Status
Openness big5_openness	BFI-2 / IPIP-NEO-120	GA
Conscientiousness big5_conscientiousness	BFI-2 / IPIP-NEO-120	GA
Extraversion big5_extraversion	BFI-2 / IPIP-NEO-120	GA
Agreeableness big5_agreeableness	BFI-2 / IPIP-NEO-120	GA
Emotional Sensitivity big5_neuroticism	BFI-2 / IPIP-NEO-120	GA
Honesty & Humility hexaco_h	HEXACO-60	Phase 4
Attachment — Anxiety attachment_anxiety	ECR-R	Phase 4
Attachment — Avoidance attachment_avoidance	ECR-R	Phase 4
Values orientation schwartz_values_higher_order	PVQ-RR taxonomy	Tier B default; opt-in Tier A
Emotional Intelligence emotional_intelligence	Treynor-Salovey-Mayer	GA
Moral foundations moral_foundations_profile	MFQ-20	GA
Resilience resilience	CD-RISC-10	GA — formalization Phase 6
Emotion granularity emotion_granularity	ICC of emotion-label corpus	Emergent

Versioning provenance

Every API response carries model_version, rule_set_version (+ SHA), vocabulary_version, irt_calibration_version, conformal_calibration_version, bn_cpt_version, and feature_registry_version. The atom-provenance URL resolves to a published page listing every atom with its theoretical_basis, lexicon/embedding anchors, calibration metrics, and version history.
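
A client that wants to audit provenance could verify that a response actually carries all seven version fields before storing a score. The sketch below uses the field names listed above; the flat response layout, the example values, and the helper name are assumptions, since the response schema itself is not shown here.

```python
REQUIRED_VERSION_FIELDS = (
    "model_version",
    "rule_set_version",
    "vocabulary_version",
    "irt_calibration_version",
    "conformal_calibration_version",
    "bn_cpt_version",
    "feature_registry_version",
)

def missing_version_fields(response):
    """Return the version fields that are absent or empty in a response dict."""
    return [name for name in REQUIRED_VERSION_FIELDS if not response.get(name)]

# Hypothetical response payload; all values are placeholders.
response = {
    "model_version": "1.4.0",
    "rule_set_version": "2.1.0",
    "vocabulary_version": "3.0.2",
    "irt_calibration_version": "2025Q1",
    "conformal_calibration_version": "2025Q1",
    "bn_cpt_version": "2025Q1",
}
print(missing_version_fields(response))  # ['feature_registry_version']
```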
