Reference

Methodology

The pipeline is open and peer-reviewable: prompts, rules, weights, thresholds, reliability coefficients, invariance tests, persona validity reports, and factory iteration logs. The raw dataset is closed to third parties.

Two registries · one source of truth

Vocabulary registry

functions/vocabulary/registry.yaml — every atomic predicate, observation, situation type, metric threshold, construct, and composite. LinkML-authored schema; CI generates Pydantic + JSON Schema + OWL + BigQuery DDL + OpenAPI from one source. Three-gate validation (pre-commit, CI, runtime fail-fast). Semantic versioning per atom.
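
As an illustration of the runtime fail-fast gate and per-atom semantic versioning, the following is a minimal sketch using only the Python standard library. The `Atom` class, its field names (`atom_id`, `kind`, `version`), and `load_registry` are hypothetical and are not taken from the actual registry schema, which is authored in LinkML.

```python
import re
from dataclasses import dataclass

# Atom kinds named in the text; the identifiers used here are illustrative only.
ATOM_KINDS = {
    "atomic_predicate", "observation", "situation_type",
    "metric_threshold", "construct", "composite",
}
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


@dataclass(frozen=True)
class Atom:
    atom_id: str
    kind: str
    version: str  # semantic version, tracked per atom

    def __post_init__(self) -> None:
        # Fail fast: reject a malformed atom at construction time.
        if self.kind not in ATOM_KINDS:
            raise ValueError(f"{self.atom_id}: unknown kind {self.kind!r}")
        if not SEMVER.match(self.version):
            raise ValueError(f"{self.atom_id}: bad version {self.version!r}")


def load_registry(entries: list[dict]) -> dict[str, Atom]:
    """Validate every entry and reject duplicate atom ids."""
    registry: dict[str, Atom] = {}
    for entry in entries:
        atom = Atom(**entry)
        if atom.atom_id in registry:
            raise ValueError(f"duplicate atom id {atom.atom_id!r}")
        registry[atom.atom_id] = atom
    return registry
```

The same check could in principle run at all three gates (pre-commit, CI, runtime), so that an invalid atom never reaches the scoring path.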

Feature registry

nexus_feature_registry — every field stored in Firestore + BigQuery. Carries layer (Context / State / Trait), tier (A / B / C), retention, consent scope, invariance grouping, presented_as, and instrument citation. Generates DDL + API types + Pydantic via Cloud Build.
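
To make the "one entry, several generated artifacts" idea concrete, here is a small sketch of how a feature entry might be rendered into a BigQuery column definition. `FeatureEntry`, `column_ddl`, and the field encodings (for example `retention_days`) are assumptions made for illustration, and only a subset of the attributes listed above is modelled; the real generator runs in Cloud Build and may look quite different.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    CONTEXT = "Context"
    STATE = "State"
    TRAIT = "Trait"


class Tier(Enum):
    A = "A"
    B = "B"
    C = "C"


@dataclass(frozen=True)
class FeatureEntry:
    name: str
    bq_type: str         # hypothetical: BigQuery column type, e.g. "FLOAT64"
    layer: Layer
    tier: Tier
    retention_days: int  # hypothetical encoding of "retention"
    consent_scope: str
    instrument: str


def column_ddl(f: FeatureEntry) -> str:
    """Render one BigQuery column definition with metadata in its description."""
    desc = (
        f"layer={f.layer.value}; tier={f.tier.value}; "
        f"retention_days={f.retention_days}; consent={f.consent_scope}; "
        f"instrument={f.instrument}"
    )
    return f'{f.name} {f.bq_type} OPTIONS(description="{desc}")'
```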

Reasoning substrate

The engine is deterministic, formally characterized, and embedding-free on the construct-scoring path:

  • Probabilistic Soft Logic (PSL)

    Replaces ad-hoc confidence arithmetic with a formally characterized calculus over rule annotations. Deterministic, reproducible.

  • Item Response Theory (Graded Response Model)

    Per-trait aggregation (Samejima 1969) instead of weighted sum. Quarterly re-fit of item parameters in the factory.

  • Conformal Prediction

    Distribution-free 95% confidence intervals with empirical coverage held in [94%, 96%]. Reported as a CI on every score.

  • Bayesian belief network

    Cross-construct propagation derived from the vocabulary `consumes` field. Quarterly CPT re-fit.
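
Two of the components listed above can be sketched numerically. The first two functions show the Lukasiewicz relaxation that PSL uses for soft truth values; the third shows the category probabilities of Samejima's Graded Response Model. Function names and parameter values are illustrative assumptions, not the engine's actual code or calibrated parameters.

```python
import math


def luk_and(*truths: float) -> float:
    """Lukasiewicz conjunction over soft truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body: float, head: float) -> float:
    """Hinge penalty for the rule body -> head; 0 means the rule is satisfied."""
    return max(0.0, body - head)


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Graded Response Model: P(X = k | theta) for k = 0..len(thresholds)."""
    if sorted(thresholds) != list(thresholds):
        raise ValueError("thresholds must be non-decreasing")
    # Cumulative curves P(X >= k), padded with 1 (lowest) and 0 (past the top).
    cum = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
```

In PSL each rule's distance to satisfaction is multiplied by the rule weight and summed, which gives a convex objective; that is one reason inference can be reproducible. In the GRM, `a` is the item discrimination and `thresholds` are the ordered category boundaries, the kind of item parameters a quarterly re-fit would update.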

Embeddings are admitted at atom-level affective and intent detection only — a frozen, MIT-licensed BGE-small-en-v1.5 with author-curated seed-anchor sets, served from a Cloud Run service inside the perimeter. The downstream substrate remains embedding-free.
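
One plausible way to score an atom against seed anchors is cosine similarity between the text embedding and each anchor embedding. The sketch below uses toy vectors in place of real BGE-small-en-v1.5 outputs, and taking the maximum over anchors is an assumption; the source does not say how anchor similarities are aggregated.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def anchor_score(text_vec: list[float], anchors: list[list[float]]) -> float:
    """Score an atom as the maximum cosine similarity to any seed anchor.

    Assumption: max-aggregation is illustrative, not the documented method.
    """
    return max(cosine(text_vec, a) for a in anchors)
```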

Calibration & factory

Persona Factory generates ~10k Big-Five-validated synthetic users quarterly (TinyTroupe + Serapio-García, ρ ≥ 0.80), runs them through the full pipeline, and feeds five refinement engines: Agent Optimizer (prompts), Rule Calibration Loop (PSL weights), Threshold Sweeper, Weight Tuner (IRT / BN), Persona Instruction Optimizer (sim-to-real gap).

Cloud Build Eval Gate enforces hard gates on every revision: ground-truth recovery ρ ≥ 0.80, test-retest reliability degraded by no more than 10%, scalar invariance preserved, conformal coverage in [94%, 96%], Q3 residual correlation < 0.20, embedding-anchored atom convergent validity ≥ 0.65. Auto-promotion requires a 7-day 5% canary with no regression.
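
The hard gates above could be expressed as a single pass/fail check, sketched here. The `Metrics` fields and function names are hypothetical. Two readings are assumptions: the test-retest gate is treated as a relative drop against a baseline, and the Q3 gate is applied to the largest absolute residual correlation.

```python
from dataclasses import dataclass


def empirical_coverage(intervals: list[tuple[float, float]], truths: list[float]) -> float:
    """Fraction of held-out true scores that fall inside their interval."""
    if not truths:
        raise ValueError("need at least one held-out score")
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    return hits / len(truths)


@dataclass(frozen=True)
class Metrics:
    recovery_rho: float
    test_retest: float
    baseline_test_retest: float
    scalar_invariance_ok: bool
    coverage: float
    max_abs_q3: float
    convergent_validity: float


def failed_gates(m: Metrics) -> list[str]:
    """Return the names of the hard gates a revision fails (empty = pass)."""
    checks = {
        "ground_truth_recovery": m.recovery_rho >= 0.80,
        # Assumption: "degraded by no more than 10%" is a relative drop.
        "test_retest": m.test_retest >= 0.90 * m.baseline_test_retest,
        "scalar_invariance": m.scalar_invariance_ok,
        "conformal_coverage": 0.94 <= m.coverage <= 0.96,
        # Assumption: the gate applies to the largest absolute Q3 value.
        "q3_residual_correlation": m.max_abs_q3 < 0.20,
        "convergent_validity": m.convergent_validity >= 0.65,
    }
    return [name for name, ok in checks.items() if not ok]
```

Note that the coverage gate is two-sided: intervals that are too wide (coverage above 96%) fail just as intervals that are too narrow do.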

Tier A roster

The 13 canonical constructs that comprise the default API response and the Profile reading.

Construct               Identifier                    Instrument                   Status
Openness                big5_openness                 BFI-2 / IPIP-NEO-120         GA
Conscientiousness       big5_conscientiousness        BFI-2 / IPIP-NEO-120         GA
Extraversion            big5_extraversion             BFI-2 / IPIP-NEO-120         GA
Agreeableness           big5_agreeableness            BFI-2 / IPIP-NEO-120         GA
Emotional Sensitivity   big5_neuroticism              BFI-2 / IPIP-NEO-120         GA
Honesty & Humility      hexaco_h                      HEXACO-60                    Phase 4
Attachment: Anxiety     attachment_anxiety            ECR-R                        Phase 4
Attachment: Avoidance   attachment_avoidance          ECR-R                        Phase 4
Values orientation      schwartz_values_higher_order  PVQ-RR taxonomy              Tier B default; opt-in Tier A
Emotional Intelligence  emotional_intelligence        Treynor-Salovey-Mayer        GA
Moral foundations       moral_foundations_profile     MFQ-20                       GA
Resilience              resilience                    CD-RISC-10                   GA (formalization Phase 6)
Emotion granularity     emotion_granularity           ICC of emotion-label corpus  Emergent

Versioning provenance

Every API response carries model_version, rule_set_version (+ SHA), vocabulary_version, irt_calibration_version, conformal_calibration_version, bn_cpt_version, and feature_registry_version. The atom-provenance URL resolves to a published page listing every atom with its theoretical_basis, lexicon/embedding anchors, calibration metrics, and version history.
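
A client that depends on reproducibility might verify that a response carries every version field before storing it. This is a minimal sketch: the field names come from the paragraph above, while `missing_version_fields` and the example payload values are hypothetical. The SHA that accompanies `rule_set_version` is left out because its field name is not specified here.

```python
REQUIRED_VERSION_FIELDS = (
    "model_version",
    "rule_set_version",
    "vocabulary_version",
    "irt_calibration_version",
    "conformal_calibration_version",
    "bn_cpt_version",
    "feature_registry_version",
)


def missing_version_fields(response: dict) -> list[str]:
    """List the provenance fields that are absent or empty in a response."""
    return [name for name in REQUIRED_VERSION_FIELDS if not response.get(name)]


# Hypothetical payload: values are placeholders, not real version numbers.
example = {name: "1.0.0" for name in REQUIRED_VERSION_FIELDS}
assert missing_version_fields(example) == []
```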