Data ethics — the rules we bind ourselves to

SoulMap works by asking you to make choices under pressure — and by inferring, from those choices, things about you that you may never have said out loud. Data of that kind deserves rules stricter than the law requires, written down where anyone can read them. This page is those rules. The /privacy and /terms pages cover the legal mechanics; this one covers the architectural and regulatory reasoning underneath.

What SoulMap collects

Your responses inside the app — the choices you make in each session, and the basic interaction signals around them (timing, hesitation, reversals).
The trait estimates derived from those responses — the inference engine's output, held to the published methodology at /about/science/methodology.
Optional demographics — only what you explicitly volunteer, each item behind its own consent, each removable at any time.
Account basics — email and authentication identifiers.

Everything below exists to constrain what can happen to that data.

The non-negotiables

Seven invariants hold across SoulMap, without exception clauses:

No advertising. SoulMap is not ad-funded and never will be. Nothing you do in the app is used to target you with anything.
No sale of identifiable data. Data that can identify you is never sold, rented, or shared for commercial gain.
No training on your data without explicit opt-in. Your data does not train models — ours or anyone else's — unless you enable a dedicated, separable consent switch that defaults to off.
One-click total deletion. Deleting your account cascades through every store we operate — the application database, the analytics warehouse, and trace storage — with an audit record.
18+ only. Minors are excluded from trait-scoring distributions entirely.
No clinical diagnosis surface. Outputs are for reflection and self-discovery; they are not medical advice and we do not market them as such.
No hiring or educational-assessment use. These uses are EU AI Act high-risk and are excluded by design, not merely by policy.

Three layers of law and architecture hold these in place.

Layer 1 — Granular, separate, withdrawable consent

GDPR Article 6(1)(a) recognises consent as a lawful basis for processing, and Article 9(2)(a) allows special-category data, including any special-category inferences, to be processed on explicit consent. SoulMap exposes a dedicated, separable consent toggle in user settings, distinct from the consent that operates the product itself:

"License my anonymized profile data for AI / ML training research."

It defaults to off, is withdrawable at any time, and revoking it cascades:

Any future data export excludes you immediately.
Every export carries the consent vintages of the users it draws on; any downstream use would require re-checking those vintages before a model retrain.
Your raw data is removed from the analytics warehouse on the next deletion cycle.
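
The revocation cascade above can be sketched in a few lines. This is a minimal illustration, not SoulMap's implementation: the class names, the integer `vintage` counter, and the method names are all assumptions made for the example. It shows the toggle defaulting to off, an export capturing the vintage of each contributing user, and a re-check that fails once any of those users has withdrawn.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


@dataclass
class ConsentRecord:
    """Research-licensing consent for one user (hypothetical shape)."""
    user_id: str
    research_opt_in: bool = False      # the toggle defaults to off
    vintage: int = 0                   # assumption: bumped on every change
    changed_at: Optional[datetime] = None

    def set_opt_in(self, value: bool) -> None:
        self.research_opt_in = value
        self.vintage += 1
        self.changed_at = datetime.now(timezone.utc)


@dataclass
class ConsentRegistry:
    records: Dict[str, ConsentRecord] = field(default_factory=dict)

    def get(self, user_id: str) -> ConsentRecord:
        # Unknown users get a default record, which means opted out.
        return self.records.setdefault(user_id, ConsentRecord(user_id))

    def eligible_for_export(self, user_ids: Iterable[str]) -> Dict[str, int]:
        """Map each currently opted-in user to the vintage the export carries."""
        return {
            uid: self.get(uid).vintage
            for uid in user_ids
            if self.get(uid).research_opt_in
        }

    def vintages_still_valid(self, export_vintages: Dict[str, int]) -> bool:
        """Re-check before a retrain: every user must still be opted in
        under the same vintage that the export recorded."""
        return all(
            self.get(uid).research_opt_in and self.get(uid).vintage == v
            for uid, v in export_vintages.items()
        )


registry = ConsentRegistry()
registry.get("user-1").set_opt_in(True)
snapshot = registry.eligible_for_export(["user-1", "user-2"])  # user-2 never opted in

registry.get("user-1").set_opt_in(False)                        # revocation
later = registry.eligible_for_export(["user-1", "user-2"])
```

In this sketch `snapshot` contains only `user-1`, `later` is empty, and `registry.vintages_still_valid(snapshot)` is false after the revocation, which is the signal that a retrain on the old export should not proceed.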

Consent is the most legally defensible basis available. It is also, deliberately, not the only one we rely on — which is why the next two layers exist.

Layer 2 — Anonymization or synthesis (the GDPR Recital 26 bar)

GDPR Recital 26 places truly anonymous data outside GDPR scope entirely. The test is whether a person remains identifiable by any means reasonably likely to be used. The EDPB's Opinion 28/2024 on AI-model anonymity extends this to the model itself: a model is anonymous only if it is very unlikely to directly or indirectly identify any training subject, including through queries against the model.

The pipeline is built to meet this bar:

Salted SHA-256 hashing of user identifiers before any export to the analytics warehouse.
Aggregation to construct-level distributions — any research view of the data is distributions across cohorts, never raw individual responses.
k-anonymity and differential-privacy controls on demographic facets before any release. Cells below a minimum group size are suppressed.
Synthetic-data generation for cases where downstream re-identification risk would otherwise be non-trivial.
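
As a rough illustration of the first three steps, the sketch below hashes identifiers with a salt, aggregates rows to cohort counts, suppresses cells below a minimum size, and adds Laplace noise to the counts that remain. It is a sketch under stated assumptions, not the production pipeline: the function names, the row layout, the threshold `k`, and the privacy parameter `epsilon` are all hypothetical, and synthetic-data generation is not shown.

```python
import hashlib
import random
from collections import Counter
from typing import Dict, Iterable


def pseudonymize(user_id: str, salt: bytes) -> str:
    """Salted SHA-256 of a user identifier, as a hex string."""
    return hashlib.sha256(salt + user_id.encode("utf-8")).hexdigest()


def cohort_counts(rows: Iterable[dict], facet: str) -> Counter:
    """Aggregate rows to a count per value of one demographic facet."""
    return Counter(row[facet] for row in rows)


def suppress_small_cells(counts: Dict[str, int], k: int) -> Dict[str, int]:
    """Drop every cell with fewer than k members."""
    return {cell: n for cell, n in counts.items() if n >= k}


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts: Dict[str, int], epsilon: float,
                 rng: random.Random) -> Dict[str, float]:
    """Laplace mechanism for counts. Assumes each user falls in exactly
    one cell, so adding or removing a user changes one count by 1."""
    return {cell: n + laplace_noise(1.0 / epsilon, rng)
            for cell, n in counts.items()}


# Hypothetical rows: identifiers are hashed before anything else happens.
salt = b"example-salt-kept-out-of-the-warehouse"
rows = [
    {"user": pseudonymize(uid, salt), "age_band": band}
    for uid, band in [("a", "25-34"), ("b", "25-34"), ("c", "25-34"), ("d", "65+")]
]

counts = cohort_counts(rows, "age_band")
released = suppress_small_cells(counts, k=3)          # the lone "65+" cell is dropped
noisy = noisy_counts(released, epsilon=1.0, rng=random.Random())
```

Here `released` keeps only the `25-34` cell with its count of 3. Note that a salted hash on its own is a pseudonym rather than an anonymous value, so in this sketch the hash only protects the identifier in transit to the warehouse; the suppression and noise steps are what limit what a released view can reveal.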

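A dataset that has passed this pipeline is intended to sit outside GDPR scope under Recital 26. Hashing alone would not achieve that, because a salted hash is still a pseudonym; the aggregation, suppression, and synthesis steps carry the anonymity claim. We do not rely on this alone, since Layer 1 consent vintages are always required as well, but it is the second layer of defence.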

Layer 3 — EU AI Act Article 10 training-data governance

The EU AI Act becomes enforceable for high-risk AI systems on 2 August 2026. Article 10 — data governance — requires training-data documentation: sources, preprocessing steps, anonymization methodology, bias and representativeness assessment, and known limitations.

The architectural lineage that makes this paperwork tractable is already in place:

Consent registry — every Echo carries an explicit consent vector at write time.
Audit log — every data export references the underlying consent vintages, anonymization parameters, and aggregation thresholds.
Deletion cascades — deletion requests propagate across the application database, the analytics warehouse, and trace storage with an audit record.
Methodology releases — quarterly publication of measurement parameters, reliability metrics, and invariance proofs (see /about/science/methodology).
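
The deletion cascade and its audit record can be pictured as follows. This is a minimal in-memory sketch with hypothetical names (`Store`, `delete_everywhere`, `audit_ref`); a real system would issue deletes against separate services and handle partial failure, which is not shown here.

```python
from datetime import datetime, timezone
from typing import Dict, List


class Store:
    """Stand-in for one data store (hypothetical, in-memory)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, List[dict]] = {}

    def delete_user(self, user_id: str) -> int:
        """Remove every row for the user and return how many were removed."""
        return len(self.rows.pop(user_id, []))


def delete_everywhere(user_id: str, audit_ref: str,
                      stores: List[Store], audit_log: List[dict]) -> dict:
    """Delete the user from every store, then append one audit record.

    The record holds a request reference rather than the user id, on the
    assumption that the audit trail should not itself retain the identifier.
    """
    removed = {store.name: store.delete_user(user_id) for store in stores}
    record = {
        "audit_ref": audit_ref,
        "deleted_at": datetime.now(timezone.utc).isoformat(),
        "rows_removed": removed,
    }
    audit_log.append(record)
    return record


app_db = Store("application_db")
warehouse = Store("analytics_warehouse")
traces = Store("trace_storage")
app_db.rows["user-1"] = [{"session": 1}, {"session": 2}]
warehouse.rows["user-1"] = [{"trait": "example"}]
app_db.rows["user-2"] = [{"session": 1}]

audit_log: List[dict] = []
record = delete_everywhere("user-1", "req-0001",
                           [app_db, warehouse, traces], audit_log)
```

After the call, `user-1` is gone from all three stores, `user-2` is untouched, and `audit_log` holds one record listing the rows removed per store.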

Any future dataset release would ship with an Article 10–ready data card — sources, preprocessing, anonymization methodology, bias assessment, known limitations. The paperwork exists before anything that would need it.
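
One way to picture such a data card is as a small structured record with one field per section named above, plus a check that refuses to treat a card as complete while any section is empty. The class and field names below are illustrative assumptions, not a published SoulMap schema, and the example values simply restate steps described on this page.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class DataCard:
    """Hypothetical data card with the five sections listed above."""
    sources: List[str]
    preprocessing: List[str]
    anonymization_methodology: str
    bias_assessment: str
    known_limitations: List[str]

    def missing_sections(self) -> List[str]:
        """Names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


card = DataCard(
    sources=["in-app responses from users with the research consent enabled"],
    preprocessing=[
        "salted SHA-256 hashing of user identifiers",
        "aggregation to construct-level distributions",
    ],
    anonymization_methodology="minimum-group-size suppression and "
                              "differential privacy on demographic facets",
    bias_assessment="",            # deliberately left empty in this example
    known_limitations=["adults only; minors are excluded"],
)

incomplete = card.missing_sections()   # sections that would block a release
```

In this example `incomplete` names only `bias_assessment`, so a release gate built on the check would hold the dataset back until that section is written.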

Tension to be honest about

We explicitly avoid the trap of arguing that psychometric inferences are never special-category data under GDPR Article 9. A regulator could reasonably take a more conservative view where inferences approach mental-health territory. Our non-goals (no clinical diagnosis, no political orientation on the primary surface) and Article 9(2)(a) explicit-consent posture together mitigate this — but we treat any unresolved ambiguity as a reason to be more conservative, not less.

A forward-looking note — consented research contributions

Nothing in this section exists today. It is a design intention, documented in advance so that if it ever ships, it ships under rules that were public first.

The intention is this: users who opt in, through the separate consent described in Layer 1, could contribute anonymized, aggregated data to psychometric and AI research, with royalties from any resulting license paid out to the contributing users. That programme would ship only after the inference engine completes its validation programme, meaning the published measurement parameters, reliability metrics, and invariance proofs described above meet the bar we have set for them.

Until then, no dataset is exported and no license exists. The consent toggle, the anonymization pipeline, and the Article 10 lineage stand ready regardless — because the rules on this page bind the product that is live, not the one that is imagined.