Redaction is a promise.
Architecture is a guarantee.
Four approaches to keeping PII out of AI inference: provider-side filters, client-side redaction libraries, AI-gateway proxies, or infrastructure-level split-knowledge. Three of them are software promises — a bug leaks PII. The fourth is an architectural property: the AI cannot see identity because the route does not exist.
Redaction is a software promise: code reads input, removes identifiers, then sends the rest to the model. A bug, a regex miss, a context-dependent identifier the matcher didn't catch — and PII reaches the model. Split-knowledge is an architectural property: identity data lives in Sandbox A, inference runs in Sandbox B, and there is no network path from B back to A. The difference matters when an auditor asks not "are you trying to keep PII out?" but "can you prove PII never crossed?"
How teams keep PII out of the model today.
Three software approaches and one architectural one. Each protects against a different failure mode; each fails under a different one. Pick the lightest approach that survives your regulator's review.
Provider-side privacy mode
Vendor-provided privacy modes: the LLM provider's content filtering and data-handling settings. The vendor promises not to log or train on your data. Inference still sees the full input including PII. Compliance is policy-bound, not architectural.
Fine for non-regulated tooling. Fails any audit that requires the AI to provably not see identity.
Client-side redaction libraries
Presidio, spaCy NER, regex matchers, custom Python. Your application strips identifiers before the LLM call. Mappings (token → real value) typically held in app memory or a side database. Coverage is matcher-quality bound; context-dependent PII often missed.
Standard for early-stage products. Fails when an auditor asks for proof that redaction happened on every call.
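The failure mode is easy to demonstrate. The sketch below is a deliberately minimal regex redactor in stdlib Python (not Presidio or a trained NER model): pattern-bound identifiers are caught, but a quasi-identifying free-text sentence passes straight through, because no pattern exists for it.

```python
import re

# Minimal pattern-bound redactor: a sketch of the approach,
# not a substitute for Presidio or a trained NER pipeline.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "PHONE": re.compile(r"\+\d{1,3}[\s\d-]{7,14}\d"),
}

def redact(text: str) -> str:
    # Replace every match with a typed placeholder token.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

# Pattern-bound PII is caught:
print(redact("Wire to DE44500105175407324931, reply to anna@example.com"))
# -> Wire to <IBAN>, reply to <EMAIL>

# Context-dependent PII is not -- nothing here matches a pattern,
# yet the sentence quasi-identifies one person:
print(redact("The only left-handed violinist on Ward 3 was diagnosed yesterday"))
```

Whatever the matcher misses goes to the model verbatim; that is the "software promise" in one function.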
AI-gateway proxy redaction
Cloudflare AI Gateway, Lakera, Robust Intelligence — middleware that sits between your app and the LLM, redacts on the way out, and re-hydrates on the way back. Centralised redaction policy; better than client-side. Still software, still bug-shaped.
Right when client-side is unmanageable. Fails when the gateway itself is the trust boundary the auditor pushes on.
Infrastructure-level split-knowledge
Sandbox A holds identity (WHO). Sandbox B runs inference (WHAT). There is no network path from B to A. Even Lucairn operators with full Sandbox B access cannot re-identify a single response. Plus: every decision produces a signed receipt anchored in a public log.
Right when procurement requires architectural evidence, not vendor promises.
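The split-knowledge property can be sketched in a few lines. This is a hypothetical illustration, not Lucairn's actual API: the class and field names are invented, and the two "sandboxes" here are Python objects where production would use a hard network boundary. The point is that the token-to-identity mapping exists only on the A side.

```python
import secrets

# Hypothetical sketch of split-knowledge, not Lucairn's API.
# Sandbox A holds WHO: the token -> identity mapping never leaves it.
class SandboxA:
    def __init__(self):
        self._identities = {}  # token -> real identity (WHO)

    def deidentify(self, customer_id: str, free_text: str) -> dict:
        token = secrets.token_hex(8)
        self._identities[token] = customer_id
        # Only the opaque token and sanitised text cross the bridge.
        return {"token": token,
                "payload": free_text.replace(customer_id, "<SUBJECT>")}

    def rehydrate(self, token: str, result: str) -> str:
        return result.replace("<SUBJECT>", self._identities[token])

# Sandbox B runs inference on WHAT. It holds no mapping and, in
# production, has no network route back to Sandbox A.
def sandbox_b_infer(message: dict) -> str:
    assert "customer_id" not in message  # identity cannot appear here
    return f"decision for {message['payload']}"

a = SandboxA()
msg = a.deidentify("cust-4711", "review limits for cust-4711")
answer = sandbox_b_infer(msg)            # B sees only "<SUBJECT>"
print(a.rehydrate(msg["token"], answer))
```

Even an operator with full access to `sandbox_b_infer` and its inputs holds only tokens; re-identification requires `_identities`, which lives on the other side of the boundary.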
Eight criteria, four approaches.
The criteria below are what a DPO, CISO, or external auditor will actually push on. Lucairn's split-knowledge architecture wins on five, ties on two, loses on one (operational burden).
Each approach is the right answer somewhere.
Honest framing: not every workload needs split-knowledge. Pick the lightest approach your audit will accept.
Provider-side privacy mode is right when:
- Internal tooling, non-customer-facing decisioning
- No regulator with audit authority over the data path
- Vendor's privacy contract is acceptable evidence
- Engineering convenience outweighs compliance depth
Redaction libraries or a gateway proxy are right when:
- PII detection is a defence-in-depth layer, not the primary control
- Your auditor accepts software-based redaction with logs
- Internal use; PII categories are well-bounded by patterns
- You need to ship before the architectural option is operationally feasible
Split-knowledge is right when:
- Customer-impacting AI decisions in regulated industries
- An external auditor will challenge the redaction integrity
- Context-dependent PII is in scope (clinical notes, contracts)
- DORA Art 28 or EU AI Act Art 12 is in your future
- Procurement requires architectural evidence, not vendor promises
Redaction vs split-knowledge — questions, answered.
Isn't a good redaction library 'good enough'?
It depends on what you're protecting against. For pattern-bound PII (IBANs, phone numbers, names in structured fields), a good library catches 95%+. For context-dependent PII (a medical condition tied to a free-text identifier, transactional details that quasi-identify a customer), pattern matching misses. The deeper issue isn't matcher quality — it's the failure mode. When redaction misses, PII reaches the model with identity attached. When split-knowledge "misses", the identity data still lives in Sandbox A; the model cannot tie what slipped through back to a person, because the network path to identity does not exist.
What about provider privacy modes — aren't those enough?
Provider privacy modes guarantee the vendor will not log or train on your data. They do not change what the model sees during inference. The model still processes the raw input including identity. For regulated work where the regulator's question is "prove the AI didn't see personal data," a vendor's promise that they won't keep it isn't the same as proof it never reached the inference path. EU AI Act Art 13 transparency and GDPR Art 25 by-design are about the data path, not the vendor's logging policy.
Is Lucairn's architecture overkill for typical SaaS?
For non-regulated SaaS, yes — the operational overhead of running a gateway, bridge, and witness isn't worth it if your AI is internal tooling and no auditor will examine the data path. For regulated work, the calculus inverts: provider-side redaction or client-side libraries leave you defending a software promise in front of an auditor, which is not where you want to be. Lucairn is heavier than gateway redaction by maybe 20% in operational complexity, but it changes the conversation from "trust us" to "here's the receipt."
Can I combine redaction libraries with Lucairn?
Yes — this is the production default. Lucairn's sanitiser uses Presidio plus a quasi-identifier risk engine inside Sandbox A. The redaction libraries do the matching; the architectural property, not the matcher, is what carries the guarantee. Together they give you both pattern-bound and architectural coverage. A custom-trained PII shield model fitted to your domain corpus is available as an Enterprise-only option (priced per scope).
What if a context-dependent identifier slips through the sanitiser anyway?
Two things happen. First, the blast radius is bounded: the identity mapping never leaves Sandbox A, so even if a stray quasi-identifier reaches the model inside the de-identified payload, it cannot be linked back to a person — the path to the WHO data does not exist. Second, the receipt records the sanitiser scheme used. If a class of identifiers turns out to be undermatched, you can identify the affected receipts retroactively by querying for the scheme version — the audit chain helps you scope the incident rather than complicating it.
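Scoping by scheme version is a plain filter over the receipt log. The record and field names below are illustrative, not Lucairn's actual receipt schema: they assume each receipt carries an identifier and the sanitiser scheme version it was produced under.

```python
# Hypothetical receipt records: field names are illustrative,
# not Lucairn's actual schema. Each decision carries the sanitiser
# scheme version that was in force when it ran.
receipts = [
    {"receipt_id": "r-001", "sanitiser_scheme": "presidio-v2.1", "ts": "2025-03-01"},
    {"receipt_id": "r-002", "sanitiser_scheme": "presidio-v2.2", "ts": "2025-03-09"},
    {"receipt_id": "r-003", "sanitiser_scheme": "presidio-v2.1", "ts": "2025-03-04"},
]

def affected_receipts(receipts, bad_scheme):
    """Scope an incident: every decision made under the undermatching scheme."""
    return [r["receipt_id"] for r in receipts if r["sanitiser_scheme"] == bad_scheme]

print(affected_receipts(receipts, "presidio-v2.1"))  # ['r-001', 'r-003']
```

The incident report then enumerates exactly those receipts, instead of "every call since the library was deployed".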
Does split-knowledge work for context-rich inputs (clinical notes, legal contracts)?
Yes, and that's where the architectural property matters most. Clinical notes and contracts are full of context-dependent PII that pattern matching misses. Lucairn's three-layer sanitiser (Presidio + quasi-identifier risk + an optional custom-trained PII shield on the Enterprise tier) handles roughly 90% of these cases. Whatever slips through cannot be linked back to identity, which stays in Sandbox A — the model never sees who. That's the architectural payoff in practice.
From assessment to production.
Run the self-serve assessment against your AI workflow and see whether software-based redaction is enough or whether split-knowledge is the right call. 15 minutes. Output goes to your DPO.