Honest comparison

Why not just use X?

Ten honest questions a security-conscious developer asks before adopting Lucairn. Each answer admits when the alternative is the right choice.

We don't compete on feature checklists — those age in weeks. We compete on a structural property: identity data and AI inference live in network-isolated sandboxes, and the route between them is governed. If your use case doesn't need that property, the lighter option is the right answer. The questions below say so.

Last updated: May 2026

01. Why not just use Microsoft Presidio directly?

Honest answer: If your only need is text redaction at the application layer, using Presidio directly is fine.

Presidio is what we use for L2 — the Named Entity Recognition layer of the sanitizer. The case for Lucairn is not the redaction layer alone. It is the architectural separation between Sandbox A (identity) and Sandbox B (inference), the ID Bridge with audited re-linkage governance, and the signed certificate chain that makes every call independently verifiable. Presidio gives you redaction; Lucairn gives you redaction plus an infrastructure-level guarantee that even if a redactor regex misses a pattern, the AI sandbox cannot reach the identity store across the network.

02. Why not OpenAI's content filtering + a system prompt?

Honest answer: If you trust the LLM provider's policy implementation against your DPO and your auditor, system prompts are simpler and cheaper.

Content filtering and system prompts are policies the LLM is asked to follow. The model still receives the identity data and decides — at inference time — whether to honour the policy. Lucairn enforces at the infrastructure layer: the LLM never sees the identity data, period. The pseudonymisation happens before the request leaves the gateway, so policy compliance is not a behavioural property of the model; it is a structural property of the network. That distinction matters when an auditor asks for proof that PII never crossed, rather than proof that you tried to keep it out.
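The structural idea can be sketched in a few lines: identity values are swapped for placeholders before the prompt is built, and the placeholder-to-value mapping never leaves the identity side. This is an illustration only; the class name, token format, and in-memory mapping are assumptions, not Lucairn's actual implementation.

```python
import secrets

class Pseudonymiser:
    """Toy sketch: placeholders go to the model, the mapping stays behind."""

    def __init__(self):
        self._forward = {}   # real value -> placeholder
        self._reverse = {}   # placeholder -> real value (identity side only)

    def replace(self, value, kind):
        if value not in self._forward:
            token = f"<{kind}_{secrets.token_hex(4)}>"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def relink(self, text):
        # Re-linkage happens only on the identity side, never in the AI sandbox.
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text

p = Pseudonymiser()
prompt = f"Summarise the complaint from {p.replace('Anna Meier', 'NAME')}."
response = prompt  # stand-in for the model's output
print(p.relink(response))
```

The point the sketch makes is the structural one from above: the model cannot leak a name it was never given, regardless of how it behaves at inference time.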

03. Why not Nightfall, BigID, or commercial DLP vendors?

Honest answer: If you need network-egress DLP across your SaaS apps and endpoints, use those — they're complementary, not competitors.

Those vendors operate at a different point in the architecture: DLP at endpoints, SaaS apps, network egress, or data-at-rest scanning. Lucairn operates at the LLM-call boundary — the moment a prompt is about to be sent to an inference provider. The two stack cleanly: DLP catches PII flowing out of your laptops and SaaS tools; Lucairn catches PII flowing into the AI hop and produces a signed evidence trail for that specific call. If your compliance program requires both surfaces covered, run both.

04. Why not Skyflow?

Honest answer: If your problem is structured PII storage, use Skyflow. The two can stack: Skyflow for storage, Lucairn for the LLM hop.

Skyflow is a vault for structured PII — databases, customer profiles, payment records. It excels at tokenising structured records you already store and at proxying access to them through a privacy-aware API. Lucairn focuses on the LLM-call boundary: sanitising prompts in flight and producing a per-call certificate that an auditor can verify against the witness. The two solve different problems and can run side-by-side — Skyflow holds the canonical identity record; Lucairn ensures that when your AI workflow needs to reason about a record, no identity-linked content reaches the model.

05. Why not just use Ollama, Llama 3, or a self-hosted open-weight model?

Honest answer: Self-hosting an open-weight model removes the third-party LLM provider as a sub-processor — that's a valid privacy posture on its own.

Running a model on your hardware genuinely closes one threat (the LLM provider as a sub-processor). It does not close the others: an open-weight model running on your hardware can still leak PII into prompt logs, retrieval indexes, embeddings databases, fine-tuning corpora, and operator dashboards. Those leaks happen inside your infrastructure, but they happen. Lucairn gives you the architectural identity/inference separation regardless of where the model runs — BYOK supports self-hosted endpoints, so Sandbox B can call your own Ollama / vLLM / TGI deployment and still produce a signed certificate that proves no identity data crossed.

06. Does streaming work?

Honest answer: Not yet.

Per-chunk DLP is a hard problem: you cannot redact a PII pattern you haven't seen yet, and once a token has been streamed to the client, you cannot unsend it. Today the gateway gates streaming at services/gateway/internal/api/proxy.go:266-275 (default off via the STREAMING_ENABLED env var, controlled at services/gateway/cmd/server/main.go:405), and the OpenAI Chat Completions adapter rejects stream:true with HTTP 400 at services/gateway/internal/api/openai_handler.go:30-32 and :103-104. We chose hard rejection over a fragile half-measure that would have shipped tokens before the sanitizer had finished. Streaming with an evidence-preserving chunking strategy is on the roadmap. The capability matrix on /integration tracks this honestly and will flip when it ships.
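Because the gateway answers stream:true with HTTP 400 today, the pragmatic client-side workaround is to strip the flag and fall back to a blocking call. A minimal sketch of that guard, where the request shape mirrors a standard Chat Completions body and the model name is purely illustrative:

```python
def prepare_request(body: dict) -> dict:
    """Drop stream:true before sending, since the gateway rejects it today."""
    if body.get("stream"):
        # Fall back to a blocking call; optionally log that streaming
        # is not yet supported through the proxy.
        body = {k: v for k, v in body.items() if k != "stream"}
    return body

req = prepare_request({"model": "gpt-4o", "messages": [], "stream": True})
print(req)
```

When streaming support ships, this guard becomes a no-op you can delete rather than a behaviour change you have to migrate.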

07. Do tool-calls and function-calling work?

Honest answer: Not yet — the gateway does not forward tools or tool_choice today, and we surface it on every developer page.

Tool inputs need their own sanitization pass, and the distinction between sanitising tool-call definitions and sanitising tool-call arguments hides subtle correctness traps that are easy to ship wrong. Today the gateway handlers do not forward the tools or tool_choice fields at all. Messages with role:"tool" go through the freetext sanitizer at services/gateway/internal/api/anthropic_handler.go:230 and services/gateway/internal/api/openai_handler.go:309, but the tool-definition array and call-argument payloads themselves are not sanitised. We disabled the fields rather than ship them half-working. If you need PII-aware tool inputs today, use the DSA Proxy API with explicit field routing — you classify each input as identity, freetext, or passthrough, and the bridge handles each path explicitly. The tool-call coverage gap is on the roadmap and tracked on the /integration capability matrix.
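To make the field-routing idea concrete, here is a sketch of classifying inputs into the three paths. The function name, payload shape, and example values are assumptions for illustration; consult the DSA Proxy API reference for the real schema.

```python
def route_fields(fields):
    """fields maps name -> (route, value); route is one of the three paths."""
    routed = {"identity": {}, "freetext": {}, "passthrough": {}}
    for name, (route, value) in fields.items():
        if route not in routed:
            raise ValueError(f"unknown route for {name!r}: {route!r}")
        routed[route][name] = value
    return routed

payload = route_fields({
    "customer_name": ("identity", "Anna Meier"),       # pseudonymised via the ID Bridge
    "ticket_body": ("freetext", "My order arrived damaged."),  # sanitizer pipeline
    "locale": ("passthrough", "de-DE"),                # forwarded unchanged
})
print(payload)
```

The value of explicit routing is that each field's handling is a reviewable decision in your code, not an inference the gateway has to make.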

08. Is GDPR or EU AI Act compliance “by default”?

Honest answer: Architecture-by-default helps. Compliance-by-default is something no vendor can sell you — it is your program.

Lucairn ships the architectural building blocks: Article 25 data-protection-by-design via the dual sandbox separation, Article 32 security-of-processing via signed certificates and NetworkPolicy isolation, AI Act Article 10 + 12 + 14 + 15 mappings via the sanitizer pipeline and append-only audit chain. The compliance program around those building blocks is your responsibility: you still need a DPIA for your AI use case, sub-processor disclosures (Lucairn becomes one), a Data Processing Agreement with us, AI Act risk classification of your application, retention policies, and operator training. We provide reference templates; we do not — and cannot — issue your DPIA on your behalf.

09. What's the latency overhead?

Honest answer: A single proxy hop typically adds 200–1500 ms over the raw LLM call. If sub-200 ms is non-negotiable, Lucairn is not the right hop.

The variance comes from sanitization layer count and prompt size. A short German support ticket through L1 + L2 sits at the low end; a long multi-paragraph payload through all three sanitizer layers (L1 known-entity matching, L2 Presidio NER, L3 LLM PII Shield) sits at the high end. Anchored attestation — Time-Stamp Authority signature and Sigstore Rekor inclusion — is asynchronous and does not add user-facing latency; the certificate URL returns immediately and the anchors fill in within a few seconds. Streaming would lower perceived latency for long completions but is not yet supported (see above).
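To validate the overhead on your own traffic, time the same prompt through the raw provider endpoint and through the gateway, and compare medians rather than single samples. A minimal harness, where the two lambdas are stand-ins for the real calls:

```python
import time

def median_latency_ms(call, runs=5):
    """Time `call` several times and return the median in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[len(samples) // 2]

# Replace these sleeps with your real direct call and gateway call.
raw = median_latency_ms(lambda: time.sleep(0.01))
proxied = median_latency_ms(lambda: time.sleep(0.03))
print(f"overhead ~ {proxied - raw:.0f} ms")
```

Medians dampen the effect of cold starts and network jitter, which otherwise dominate small benchmark runs.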

10. What if the sanitizer misses a PII pattern?

Honest answer: Sanitizer misses can happen. The architectural separation is what makes them survivable.

The sanitizer is layered — L1 known-entity matching against your customer-supplied recognizers, L2 Presidio NER for general identifiers, L3 LLM PII Shield for context-dependent and domain-specific patterns. None of those layers is perfect on its own; together they cover most production traffic. The reason this is acceptable rather than alarming is structural: even if a PII string slips past every sanitizer layer, Sandbox B (AI processing) is network-isolated from Sandbox A (identity store). The leak is bounded — it cannot cross the boundary back into a reusable identity-linked record, and the audit chain captures the request anyway. Customer-supplied recognizers and a custom-trained level-3 PII shield (an Enterprise-tier add-on, priced per scope) close most domain-specific gaps for teams that need them.
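The layered principle is easy to see in miniature: each layer redacts what the earlier ones missed, so no single layer has to be perfect. The recognizers below are toy stand-ins for L1 and L2, not Lucairn's implementations, and L3 is omitted entirely.

```python
import re

def l1_known_entities(text, known):
    # L1 stand-in: exact matches against customer-supplied entities.
    for entity in known:
        text = text.replace(entity, "<KNOWN>")
    return text

def l2_pattern_ner(text):
    # L2 stand-in: a toy email pattern in place of Presidio NER.
    return re.sub(r"\b[\w.]+@[\w.]+\b", "<EMAIL>", text)

def sanitize(text, known):
    # Layers run in order; each one only has to catch what survived so far.
    for layer in (lambda t: l1_known_entities(t, known), l2_pattern_ner):
        text = layer(text)
    return text

out = sanitize("Contact Anna Meier at anna@example.com", known=["Anna Meier"])
print(out)
```

Here the name slips past the email pattern and the email slips past the entity list, yet the combined pipeline redacts both, which is the defence-in-depth argument in two functions.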

Want to see this in action?

Book a working session — we'll walk through your use case together.