The four data layers

Every claim we make about a person is anchored in a four-layer chain: signals → rules → constructs → traits. The chain is what makes the system auditable. If we say someone tilts toward higher-than-average conscientiousness, you can follow that claim back through the trait estimate, the constructs that fed it, the rules that fired the constructs, and the signals the rules saw — all the way down to specific words and behaviours in actual conversations. Nothing is a black box.

This page walks each layer in plain English, then explains why the hierarchy exists and how we test that it is doing its job.

Signal: the small tells we listen for

The bottom layer is signals: small linguistic and behavioural tells the system extracts from each conversational turn. There are approximately 250 distinct signals per turn, each in the form of a predicate: a short structured fact that records something specific the speaker did in the conversation.

A signal is not "the speaker sounded conscientious." It is much smaller and much more concrete than that:

the speaker mentioned a deadline
the speaker named a specific person they were responsible to
the speaker used a hedge word ("maybe," "I guess") at the start of their turn
the speaker corrected themselves mid-sentence
the speaker made a value judgment about another person's behaviour

These are observations. They do not interpret; they record. The interpretation happens at the next layer.
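As a rough illustration of what a recorded predicate might look like, here is a minimal Python sketch. The Signal fields, the predicate names, and the keyword checks are all hypothetical stand-ins; the real signal set and its detection patterns are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One recorded observation about a turn. It describes; it does not interpret."""
    predicate: str     # hypothetical predicate name, e.g. "mentions_deadline"
    turn_id: int       # which conversational turn produced the signal
    fragment: str      # the text span that triggered it, kept for inspection
    confidence: float  # assumed extractor confidence, between 0 and 1


def signals_for_turn(turn_id: int, text: str) -> list[Signal]:
    """Toy extractor: two keyword checks standing in for real linguistic patterns."""
    found = []
    lowered = text.lower()
    if "deadline" in lowered:
        found.append(Signal("mentions_deadline", turn_id, "deadline", 0.9))
    for hedge in ("maybe", "i guess"):
        if lowered.startswith(hedge):
            found.append(Signal("opens_with_hedge", turn_id, text[:len(hedge)], 0.8))
    return found


signals = signals_for_turn(7, "Maybe I can hit the deadline if I start tonight.")
for s in signals:
    print(s.predicate, repr(s.fragment))
```

Each record keeps the fragment that triggered it, which is what makes a signal inspectable later on.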

There are two reasons for this design. First, signals are inspectable. If we ever produce a trait estimate that surprises a reader, the estimate can be traced back to the specific signals that fed it, and the reader can see the conversational fragment that produced each signal. Second, signals are reusable. The same signal (say, "the speaker named a future commitment") feeds different rules in different contexts; it is part of the conscientiousness chain in one rule and part of the agreeableness chain in another. Reusing small primitives keeps the system compact.

The exact set of signals is proprietary engineering (the linguistic patterns that detect them are how we earn our keep), but the shape of what they detect is documented in this hierarchy. The figure above (approximately 250 per turn) is a rough order of magnitude; the exact count drifts as we add and refine signals.

Rule: combining the tells into things a psychologist would notice

The next layer up is rules. A rule combines several signals into a single observation — the kind of thing a psychologist might write in a note after listening to the conversation.

Where a signal is "the speaker named a future commitment," a rule might be "the speaker described prioritising a future commitment over an immediate desire" — built from three or four signals that together suggest delay-of-gratification behaviour. Where a signal is "the speaker used a hedge word," a rule might be "the speaker hedged a value judgment about themselves" — which carries different inferential weight than hedging a value judgment about someone else.

Rules live in a system called PSL — Probabilistic Soft Logic — which lets us write each rule as a soft logical statement with an attached weight. The weights came out of careful psychometric work: which combinations of signals actually correlate with which constructs, and how strongly? The rule library is large (we currently have around 600 rules) and grows slowly as we encounter conversational patterns we are not yet capturing.

Crucially, rules can fire partially. A rule that detects "the speaker described prioritising a future commitment" does not fire as TRUE or FALSE; it fires as a probability between 0 and 1. This matters for uncertainty propagation: a rule that fires at 0.7 contributes seventy percent of its weight to the construct, rather than contributing its full weight the moment any of its signals matches. The uncertainty at the bottom of the chain flows up through the chain rather than being discarded at each layer.
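The arithmetic of partial firing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production rule engine: it combines signal truth values with the Lukasiewicz soft conjunction commonly used in Probabilistic Soft Logic, and the signal values and the rule weight are made up.

```python
def lukasiewicz_and(*truths: float) -> float:
    """Soft conjunction: max(0, sum(truths) - (n - 1)), with every truth in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def rule_contribution(signal_truths: list[float], weight: float) -> tuple[float, float]:
    """Return (firing strength, weighted contribution to the construct)."""
    firing = lukasiewicz_and(*signal_truths)
    return firing, firing * weight


# Two hypothetical signals observed at 0.9 and 0.8, feeding a rule of weight 2.0.
firing, contribution = rule_contribution([0.9, 0.8], weight=2.0)
print(round(firing, 2), round(contribution, 2))
```

With these inputs the rule fires at about 0.7 and passes roughly seventy percent of its weight upward; a hard TRUE/FALSE rule would have passed either all of it or none.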

The PSL rule library itself is proprietary — it is the densest repository of clinically-informed structure in the system — but the methodology that produced it is documented openly: signal sets, rule weighting from psychometric data, soft-logic semantics for partial firing.

Construct: what specifically is being noticed

The third layer is constructs. A construct is a specific psychological tendency that has a name in the published literature: empathic concern, behavioural inhibition, cognitive flexibility, prosocial intentionality. There are 113 such constructs in our atom library. Each has a published definition, a published validation history, and at least one peer-reviewed measurement instrument that has been used to assess it.

Constructs are the layer where rules aggregate. A single construct (say, "empathic concern") is fed by many rules; each rule makes a weighted contribution to the construct's score for the conversation. The construct's score is a calibrated estimate, not a raw count. The calibration is what the IRT machinery in the next layer is for.

Why 113? It is not a magical number. It is the count of distinct construct atoms that the published literature recognises across the ten frameworks we calibrate against (Big Five, HEXACO Honesty-Humility, Adult Attachment, Schwartz higher-order Values, Moral Foundations, Dark Triad, Emotional Intelligence, Cognitive Flexibility, Resilience, Emotion Granularity). Each construct atom maps to one or more of the 29 broader traits in the next layer; the mapping is many-to-many because a single construct often informs multiple traits.

The construct list is the most stable layer in the system. It updates only when the published literature evolves — for example, when a meta-analysis demonstrates that two constructs previously thought distinct are functionally equivalent, or when a new construct earns enough validation history to be added.

Trait: the broad strokes psychology has agreed on

The top layer is traits: the 29 broad dimensions that psychologists have worked out, often over decades, as the fundamental axes of normal-range human personality.

These are the names you may already know: the Big Five (Openness, Conscientiousness, Extraversion, Agreeableness, Emotional Stability), HEXACO Honesty-Humility, the two attachment dimensions (Anxiety, Avoidance), the four Schwartz higher-order values (Self-Transcendence, Self-Enhancement, Conservation, Openness-to-Change), six Moral Foundations (Care, Fairness, Loyalty, Authority, Sanctity, Liberty), the Dark Triad (Machiavellianism, Narcissism, Psychopathy), five Emotional Intelligence facets, Cognitive Flexibility, Resilience, and Emotion Granularity.

Each trait is fed by some subset of the 113 constructs through a learned weight matrix. The weights are not made up; they came out of statistical fits against the empirical correlation structure of the published instruments.
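To show the many-to-many shape of that mapping, here is a deliberately simplified Python sketch. The construct names, trait rows, and weights are invented for illustration, and a plain weighted sum stands in for the calibrated model described on this page.

```python
# Hypothetical slice of a construct-to-trait weight matrix (values are made up).
WEIGHTS = {
    "conscientiousness": {"delay_of_gratification": 0.6, "goal_persistence": 0.4},
    "agreeableness": {"empathic_concern": 0.7, "delay_of_gratification": 0.1},
}


def trait_scores(construct_scores: dict[str, float]) -> dict[str, float]:
    """Weighted sum of construct scores per trait; missing constructs count as 0."""
    return {
        trait: sum(w * construct_scores.get(name, 0.0) for name, w in row.items())
        for trait, row in WEIGHTS.items()
    }


scores = trait_scores({
    "delay_of_gratification": 0.5,
    "goal_persistence": 1.0,
    "empathic_concern": -0.2,
})
print({trait: round(value, 2) for trait, value in scores.items()})
```

Note that delay_of_gratification appears in both rows: one construct informs two traits, with a different weight in each.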

A trait estimate is not a single number; it is a calibrated distribution. We report a point estimate (where the centre of the distribution sits) and an uncertainty band (how confident we are about that centre). The width of the uncertainty band is the single most important number on a trait estimate: it tells you whether the estimate is informative or whether the conversational evidence is too thin to support a confident reading.
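A small Python sketch can make the relationship between evidence and band width concrete. It uses a textbook normal-normal update as a stand-in, which is an assumption made for brevity and not the model used in production; the observations, noise level, and prior are hypothetical.

```python
import math


def posterior_summary(observations, noise_sd=1.0, prior_mean=0.0, prior_sd=1.0, z=1.96):
    """Return (point estimate, (lower, upper)) under a normal prior and normal noise."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(observations) / noise_sd ** 2
    precision = prior_precision + data_precision
    mean = (prior_mean * prior_precision + sum(observations) / noise_sd ** 2) / precision
    sd = math.sqrt(1.0 / precision)
    return mean, (mean - z * sd, mean + z * sd)


thin = posterior_summary([0.8, 1.2])       # two pieces of evidence
rich = posterior_summary([0.8, 1.2] * 10)  # twenty pieces of evidence

for label, (mean, (lo, hi)) in (("thin", thin), ("rich", rich)):
    print(label, round(mean, 2), round(hi - lo, 2))
```

The point estimates are similar, but the band from two observations is much wider than the band from twenty, which is the signal that the thinner reading should be treated with more caution.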

The framework for producing these calibrated distributions is item-response theory — specifically the graded-response model, the same family of statistical tools used to score the SAT and the LSAT. The IRT layer is what turns the raw construct scores into trait estimates with proper uncertainty propagation.
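For readers who have not met the graded-response model, here is a minimal Python sketch of its core formula (Samejima's form): the probability of responding in category k or above is a logistic function of the trait level, and each category probability is the difference between adjacent cumulative curves. The discrimination and threshold values below are hypothetical.

```python
import math


def grm_category_probs(theta: float, discrimination: float, thresholds: list[float]) -> list[float]:
    """P(X = k | theta) for k = 0..len(thresholds); thresholds must be ascending."""
    def p_at_least(b: float) -> float:
        # Cumulative curve: probability of responding at or above the threshold b.
        return 1.0 / (1.0 + math.exp(-discrimination * (theta - b)))

    cumulative = [1.0] + [p_at_least(b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# A four-category item evaluated at an average trait level (theta = 0).
probs = grm_category_probs(theta=0.0, discrimination=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Fitting the model means finding the trait level, and the uncertainty around it, that best explains the observed categories across many items.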

Why a hierarchy

The four-layer chain costs more to build than a flat regression from raw signals to trait scores. We chose it for two reasons:

Auditability. When the system says someone tilts toward high conscientiousness, you can follow the claim back layer by layer: which constructs fed it, which rules fired, which signals supported each rule, and which conversational fragments produced each signal. Auditability matters for a system that produces psychometric estimates; it is the discipline that keeps "the model says so" out of the explanatory toolkit.
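The trace-back described here can be pictured as a walk down a tree. The Python sketch below is an illustration only: the Node structure, the layer contents, and the quoted fragment are hypothetical, not the system's actual audit format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    layer: str           # "trait", "construct", "rule", or "signal"
    name: str
    evidence: str = ""   # conversational fragment, set on signals only
    children: list["Node"] = field(default_factory=list)


def trace(node: Node, depth: int = 0) -> list[str]:
    """Walk from a claim down to its supporting fragments, one indented line per node."""
    line = "  " * depth + f"{node.layer}: {node.name}"
    if node.evidence:
        line += f' <- "{node.evidence}"'
    lines = [line]
    for child in node.children:
        lines.extend(trace(child, depth + 1))
    return lines


claim = Node("trait", "conscientiousness", children=[
    Node("construct", "delay_of_gratification", children=[
        Node("rule", "prioritised_future_commitment", children=[
            Node("signal", "named_future_commitment",
                 evidence="I promised I'd finish the report first"),
        ]),
    ]),
])
lines = trace(claim)
print("\n".join(lines))
```

Every line in the printed trace answers "why?" for the line above it, ending in words the speaker actually said.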

Uncertainty propagation. At each layer, partial firing and partial weight matter. Signals fire with probability; rules combine signals with weighted soft logic; constructs aggregate rules with calibrated weights; traits aggregate constructs through an IRT model with explicit uncertainty bands. The conversational evidence produces a calibrated distribution at the top, not a single point estimate, because the uncertainty was carried up through the chain rather than discarded at each layer.

A flat regression from signals to traits would compress all of this into one black box. We prefer the slower, traceable chain.

Figure: the four-layer hierarchy. Each layer aggregates a partial-firing pattern from the layer below; uncertainty propagates up through the chain rather than being discarded at each step.

How we test that it works

Two reliability checks anchor the trust we place in this chain.

Spearman-Brown reliability. For each user with at least eight conversations, we compute a split-half reliability coefficient — split the conversations into two halves, compute trait estimates from each half independently, measure how strongly the two halves agree, then apply the Spearman-Brown correction to project to the full conversation set. We require ≥ 0.70 Spearman-Brown reliability before reporting trait estimates as anything other than provisional. This is the standard psychometric threshold (Nunnally 1978).
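The correction itself is a one-line formula: if the two halves correlate at r, the projected full-length reliability is 2r / (1 + r). The Python sketch below works through it; the half-split estimates are made-up numbers, and treating the paired values as the quantities being correlated is an assumption about how the halves are compared.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(half_correlation: float) -> float:
    """Project a split-half correlation to the reliability of the full set."""
    return 2 * half_correlation / (1 + half_correlation)


# Hypothetical estimates computed independently from each half of the conversations.
half_a = [0.2, -0.5, 1.1, 0.4, -0.9]
half_b = [0.1, -0.2, 0.9, 0.7, -1.0]

r = pearson(half_a, half_b)
reliability = spearman_brown(r)
reliable = reliability >= 0.70  # threshold quoted in the text above
print(round(r, 3), round(reliability, 3), reliable)
```

As a quick check of the formula, a split-half correlation of 0.6 projects to 1.2 / 1.6 = 0.75, which would clear the 0.70 bar.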

Q3 residual correlations. For each pair of items (rules) in the calibration corpus, we measure the correlation between their residuals: the part of each item response left over after the IRT model has accounted for the trait. If Q3 is large for some pair of items, the items share variance the IRT model is not capturing; we use this as a flag that the model needs refinement. We require |Q3| < 0.20 for the items we include in the production substrate.
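A minimal Python sketch of the Q3 check follows, assuming the fitted model's expected responses are already available. The observed and expected matrices are toy numbers (rows are respondents, columns are items), and the function name is hypothetical.

```python
import math
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def q3_flags(observed, expected, threshold=0.20):
    """Return {(item_j, item_k): Q3} for item pairs whose |Q3| meets the threshold."""
    residuals = [[o - e for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(observed, expected)]
    flagged = {}
    for j, k in combinations(range(len(residuals[0])), 2):
        q3 = pearson([row[j] for row in residuals], [row[k] for row in residuals])
        if abs(q3) >= threshold:
            flagged[(j, k)] = q3
    return flagged


# Toy data: items 0 and 1 leave identical residuals; item 2 is unrelated to both.
observed = [[0.9, 0.9, 0.8], [0.1, 0.1, 0.8], [0.8, 0.8, 0.1], [0.2, 0.2, 0.1]]
expected = [[0.5, 0.5, 0.5]] * 4

flagged = q3_flags(observed, expected)
print({pair: round(q3, 2) for pair, q3 in flagged.items()})
```

Only the pair (0, 1) is flagged here: whatever drives those two items together is something the trait alone does not explain, so one of them would be a candidate for removal or the model for refinement.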

Both checks are computed nightly. Both are visible in the data export you can request from the in-app Settings → Privacy & Data page. The exact thresholds we ship at are specified in the reviewer-grade R1 dataset-design report (available on request); the headlines (Spearman-Brown ≥ 0.70, |Q3| < 0.20) are the public-facing summary.

What this layer does not try to do

The four-layer chain produces trait estimates with uncertainty bands. It does not:

Produce a single point score with no uncertainty. Trait estimates without uncertainty bands are not what we ship.
Produce diagnostic labels. The 29 traits are normal-range psychological tendencies, not psychiatric categories.
Produce predictions about specific future behaviours. Traits describe central tendencies, not deterministic forecasts.
Recover information the conversation does not contain. If a conversation does not exercise a particular construct, that construct's score will have a wider uncertainty band — the system tells you when the evidence is thin.

The next page, synthetic personas + scenarios, explains how we calibrate this chain against a corpus of two thousand synthetic personas across fifty conversational scenarios each — and why the synthetic-persona approach is scientifically defensible at this stage of the project.