The science

We map how a person's mind tilts across 29 trait dimensions using ten of the most-studied questionnaires in psychology — without ever asking you to fill out a questionnaire. We do this by listening to short reflective conversations the way a trained psychologist listens, then translating what we hear into calibrated trait estimates with explicit uncertainty bounds.

This page is the hub. The deeper material lives in seven sub-pages. If you have ten minutes, read the data layers page; if you have an hour, read all seven in order. If you want to see how your SoulMap readings hold up against the standards used to validate the gold-standard questionnaires themselves, methodology is the page you want.

The big picture, in three cards

Card	What it covers	Reading time
The four data layers	Signals → Rules → Constructs → Traits. How approximately 250 small linguistic and behavioural tells per conversational turn flow through psychologist-style rules into 113 mid-level construct atoms and ultimately into 29 broad trait dimensions. The hierarchy is the point: every claim we make about a person is anchored in this chain.	9 min
Synthetic personas + scenarios	Why we model how traits cluster realistically — not as independent dice rolls — using a Gaussian copula whose dependence structure is assembled from five open empirical datasets, fifty-plus peer-reviewed meta-analyses, statistical bridges, and explicit conservative uncertainty where neither bridge nor data exists.	11 min
The calibration loop	The same statistical tools used to score the SAT and LSAT, applied to a corpus of two thousand synthetic personas across fifty conversational scenarios each. Probe 5 (the final shakedown probe) converged on real PSL data at R-hat = 1.002, n_eff_min = 1,318, zero divergences.	9 min
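
To make the copula idea in the personas card concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the production persona generator: the three-trait correlation matrix and the function names are hypothetical, and the real dependence structure spans all 29 trait dimensions.

```python
import math
import random
from statistics import NormalDist

# Hypothetical correlation matrix for three illustrative traits.
# These values are invented for the sketch; they are not SoulMap's.
CORR = [
    [1.0, 0.4, -0.3],
    [0.4, 1.0, 0.1],
    [-0.3, 0.1, 1.0],
]

def cholesky(matrix):
    """Return the lower-triangular factor L such that L * L^T == matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower

def sample_copula(n_samples, corr, rng):
    """Draw rows of correlated uniforms from a Gaussian copula."""
    lower = cholesky(corr)
    std_normal = NormalDist()
    dim = len(corr)
    rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlate the independent normals, then map each one to a uniform.
        correlated = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        rows.append([std_normal.cdf(x) for x in correlated])
    return rows

def pearson(xs, ys):
    """Plain Pearson correlation, used here only to inspect the draws."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

draws = sample_copula(5000, CORR, random.Random(0))
first, second = [r[0] for r in draws], [r[1] for r in draws]
print("correlation of traits 0 and 1:", round(pearson(first, second), 2))
```

Each uniform would then be passed through the quantile function of that trait's marginal distribution. Independent dice rolls correspond to an identity matrix in place of CORR, which is the case the card argues against.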

And three more:

Card	What it covers	Reading time
Methodology	Measurement invariance (Cheung & Rensvold 2002), the four primary KPIs, the multivariate Σ-fidelity check, the targeted test-retest sub-corpus, the holdout discipline. Frame: the same statistical playbook the best-respected academic teams use.	12 min
Validation	The 2026-05 Phase-1 validation run, against the canonical shakedown ledger. Numbers: R-hat = 1.002, n_eff_min = 1,318, 0 divergences. The end-to-end pipeline is proven, and the calibration_metrics and invariance rows are live in BigQuery. What we promised in methodology, observed in the wild.	8 min
Roadmap	Phase 1 (done — pipeline validated end-to-end). Phase 2 (full corpus, locked design). Phase 3 (real-user invariance recalibration). Honest dates. Honest scope.	6 min
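
For readers unfamiliar with the R-hat figure quoted in the cards, the following Python sketch computes the classic Gelman-Rubin statistic on simulated chains. It is a simplified illustration, not the diagnostic code behind the reported numbers: the function name and the simulated chains are hypothetical, and current samplers typically report a refined rank-normalised split variant.

```python
import math
import random
from statistics import fmean, variance

def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = fmean([variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

rng = random.Random(0)
# Four chains exploring the same distribution: R-hat should sit near 1.
agreeing = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Two chains stuck in a different region: R-hat should rise well above 1.
disagreeing = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 3.0, 3.0)]
print("agreeing chains:", round(gelman_rubin_rhat(agreeing), 3))
print("disagreeing chains:", round(gelman_rubin_rhat(disagreeing), 3))
```

A value close to 1 means the between-chain variance adds almost nothing to the within-chain variance, which is the sense in which a figure such as 1.002 indicates convergence.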

Plus an alphabetical citations roster that backs every footnote on every page in this tier.

Why this exists

Standardised questionnaires are the gold-standard measurement tool in personality psychology. They also have well-known limitations: they ask people to evaluate themselves on abstract claims ("I am the life of the party"), they suffer from acquiescence bias and social desirability bias, and they ask people to remember and average behaviour over weeks or months. Conversational evidence is richer than questionnaire evidence, because someone talking about a recent decision reveals things they would not endorse if asked directly, but it is also less standardised. Different conversations elicit different signals; different speakers reveal different facets of themselves.

We bridge this gap by taking the inferential machinery that makes questionnaires defensible — item-response theory, measurement invariance, calibrated uncertainty — and applying it to short reflective conversations instead. The result is a measurement substrate that is calibrated to questionnaire equivalents but elicited from natural language.

This is the same statistical playbook used by the best-respected academic teams. The difference is that we apply it to conversational evidence rather than self-report scales, and we do so in a production system that updates its calibrations as we learn what actually generalises to real users.


What we are NOT trying to do

The boundaries matter as much as the claims:

We do not diagnose. We are not a clinical assessment tool. The trait estimates we produce describe normal-range variation in psychological tendencies, calibrated against population-level reference data; they are not psychiatric diagnoses, they are not screening instruments, and they should not be used as either.
We do not replace clinical assessment. If you are wondering whether a person is at clinical risk for any condition, the right tool is a licensed clinician, not us. Our calibration data is from non-clinical adult populations; we have no claim on clinical sensitivity or specificity.
We do not assume demographic validity from synthetic data; we recalibrate it against real users. Phase 1 ships a synthetic-corpus calibration as the first stable artifact. Phase 3 will recalibrate against real-user data with explicit measurement-invariance testing across the most consequential demographic boundaries. Synthetic-data calibration is the warm-up, not the destination.
We do not promise individual prediction at clinical-grade reliability. We promise calibrated population-level inference with explicit uncertainty bounds. The difference matters: at the population level, our estimates are well-calibrated and the uncertainty bands cover the truth at the rate we promise; at the individual level, the bands state plainly how much remains unknown about any one person.
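
One way to picture "bands that cover the truth at the rate we promise" is split conformal prediction. The Python sketch below is an illustration under stated assumptions, not the scoring engine itself: the function names, the absolute-residual score, and the simulated estimates are all hypothetical.

```python
import math
import random

def conformal_halfwidth(cal_preds, cal_truths, alpha):
    """Half-width q so that pred +/- q targets 1 - alpha marginal coverage."""
    scores = sorted(abs(t - p) for p, t in zip(cal_preds, cal_truths))
    n = len(scores)
    # Finite-sample corrected rank of the calibration score to use.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]

def coverage(preds, truths, halfwidth):
    """Fraction of true values that fall inside pred +/- halfwidth."""
    hits = sum(abs(t - p) <= halfwidth for p, t in zip(preds, truths))
    return hits / len(preds)

def simulate(n, rng, noise_sd=0.5):
    """Hypothetical trait estimates paired with noisy ground truth."""
    preds = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    truths = [p + rng.gauss(0.0, noise_sd) for p in preds]
    return preds, truths

rng = random.Random(1)
cal_preds, cal_truths = simulate(1000, rng)
test_preds, test_truths = simulate(2000, rng)
q = conformal_halfwidth(cal_preds, cal_truths, alpha=0.1)
print("half-width:", round(q, 3))
print("empirical coverage:", round(coverage(test_preds, test_truths, q), 3))
```

The guarantee in this construction is marginal: it holds on average across the population of held-out cases, not for any single person, which mirrors the population-level versus individual-level distinction drawn above.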

The demographic boundary is documented as the demographic-blind scoring policy: demographic information shapes the synthetic ground truth and the persona's behaviour in scenarios, but it is never an input feature to the downstream scoring engine (PSL → IRT → conformal → BBN). Calibration is global, with measurement-invariance testing as the discipline that proves the global model is fair across demographics. The personas page carries the full rationale. This is the strongest guarantee we make to the people who read their own results, and it avoids the high-risk-profiling flags of the EU AI Act.
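
A demographic-blind policy of this kind can be pictured as a type boundary. The Python sketch below is a hypothetical illustration: the class names, field names, and the to_scoring_input helper are invented for this example and do not describe the actual scoring engine.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Persona:
    """Hypothetical synthetic persona. Demographics shape generation only."""
    persona_id: str
    demographics: dict
    turns: tuple

@dataclass(frozen=True)
class ScoringInput:
    """Hypothetical scoring-engine input. It has no demographic field at all."""
    persona_id: str
    turns: tuple

def to_scoring_input(persona):
    """Copy across only what the scoring engine is allowed to see."""
    return ScoringInput(persona_id=persona.persona_id, turns=persona.turns)

persona = Persona(
    persona_id="p-0001",
    demographics={"age_band": "30-39"},
    turns=("I changed jobs last spring after a long think.",),
)
scoring_input = to_scoring_input(persona)
print([f.name for f in fields(scoring_input)])
```

Because the scoring input type carries no demographic field, a scorer written against it cannot use demographics as a feature even by accident; fairness across groups is then checked separately through invariance testing rather than assumed.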

A short reading order

For a journalist or a public visitor with ten minutes: just the data layers page, plus the "What we are NOT trying to do" box above.

For a technically trained reader who wants to audit the full chain behind their own readings: data layers → personas → methodology → validation. About 40 minutes.

For a journalist or analyst writing about the methodology: data layers → calibration → validation. About 25 minutes. The citations page is the bibliography.

Reviewers with passkey access can also reach the gated reports tier (available on request), which carries the proprietary IP behind the methodology in full reviewer-grade detail.

Where the data flows

The reflective conversations themselves stay private. None of your conversation content leaves the EU, and none of it trains another user's model. Your trait estimates are visible in your account. You can export everything as a single JSON file, withdraw from the analytics export at any time, and delete your account with a verifiable cascade scrub. The product-level data controls live on the in-app Settings → Privacy & Data page.

This is the science page. The infrastructure that makes the calibration loop run on Google Cloud — the architecture, the operations, the cost engineering, the receipts — sits at /about/cloud-infrastructure. The full per-report deliverable surface is gated behind a passkey for reviewers and is available on request.