GDPR · DPA · data-processor · LLM-architecture · compliance

Why your LLM proxy is a GDPR data processor (and what that means for you)

Most teams treat the LLM provider as a vendor and skip the data-processing agreement. GDPR Article 28 disagrees. Here's why your LLM proxy is structurally a processor — and what your DPA stack actually has to contain.

Lucairn · 10 min read
On this page
  1. The default mistake
  2. Controller vs processor — applied to LLM proxies
  3. Article 28 paragraph 3 — the binding clauses
  4. Common gaps
  5. Schrems II and the LLM stack
  6. A working DPA stack for an EU AI deployment
  7. Where Lucairn fits

The default mistake

A pattern recurs in almost every LLM architecture review we run. A team has built a customer-facing application that calls an upstream LLM provider — Anthropic, OpenAI, sometimes both. The user's prompt contains personal data, often identifiable: a name, an email address, occasionally a customer ID number. The application forwards the prompt to the model. The model returns an answer. The team stores the conversation log.

We ask to see the data-processing agreement with the LLM provider. The reply is some variation of: "We agreed to their terms of service. The processor agreement is included by reference, I think."

That answer is the violation. GDPR Article 28(3) requires that processing by a processor "shall be governed by a contract or other legal act under Union or Member State law, that is binding on the processor with regard to the controller and that sets out the subject-matter and duration of the processing, the nature and purpose of the processing, the type of personal data and categories of data subjects, and the obligations and rights of the controller." A box on a sign-up form titled "I agree to the Terms of Service" rarely satisfies this.

The error is not just paperwork. It is a structural misclassification. The team is treating the LLM provider as a vendor that supplies a tool. Article 28 treats it as a processor that handles personal data on the controller's behalf. The two relationships carry very different obligations. Mistaking the second for the first is one of the most common findings in LLM-architecture compliance reviews under GDPR.

Controller vs processor — applied to LLM proxies

Article 4(7) of the GDPR defines a controller as "the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data." Article 4(8) defines a processor as "a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller."

Apply this to an LLM proxy that sits between your customer and an upstream LLM provider. Three distinct roles emerge:

- Your customer is the controller: they determine the purposes of the processing.
- You, operating the proxy, are a processor: you handle personal data on your customer's behalf.
- The upstream LLM provider is a sub-processor: it processes the same data on your behalf, within your customer's authorisation.

Already there are two Article 28 contracts required: one between your customer and you, and one between you and the LLM provider. Article 28(2) explicitly addresses sub-processors: "the processor shall not engage another processor without prior specific or general written authorisation of the controller." Your customer has to be told who their sub-processors are, and they have to be able to object.

There is a second wrinkle that catches many teams. When the proxy itself decides which LLM model to call, what system prompt to apply, what filtering to run — it is making decisions about means of processing. The European Data Protection Board's guidelines on the concepts of controller and processor (Guidelines 07/2020) are explicit: a processor that determines the essential means of processing is no longer a pure processor; it is at minimum a joint controller for those decisions.

For pure pass-through proxies — the request is forwarded verbatim to the configured upstream model — this stays cleanly within the processor role. For proxies that add system prompts, run sanitization, choose models based on content type, or apply post-processing rules, the role becomes more nuanced. The defensive position is to document those decisions transparently in the customer-facing data-processing agreement so the controller is not surprised.
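The boundary between pass-through and decision-making can be made concrete in code. Here is a minimal sketch, with hypothetical model names and routing rules, of a proxy that records every routing decision so it can be disclosed in the customer-facing DPA:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingDecision:
    # One "means of processing" decision the proxy made for this request.
    decision: str   # e.g. "model_selection"
    value: str      # what was chosen
    rationale: str  # the justification the DPA can reference

@dataclass
class ProxyRequest:
    prompt: str
    decisions: list[ProcessingDecision] = field(default_factory=list)

def route(req: ProxyRequest) -> str:
    """Choose an upstream model. Every branch here is a decision about the
    means of processing, so it is recorded for disclosure to the controller."""
    model = "legal-tuned-model" if "contract" in req.prompt.lower() else "general-model"
    req.decisions.append(ProcessingDecision(
        decision="model_selection",
        value=model,
        rationale="content-based routing disclosed to the controller",
    ))
    return model
```

A pure pass-through proxy would leave the decisions list empty; anything that accumulates in it belongs in the customer-facing agreement.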

Article 28 paragraph 3 — the binding clauses

GDPR Article 28(3) lists eight specific obligations that must be in the controller-to-processor contract. Not "should." Not "may." "Shall." Each one maps to a concrete clause in the DPA stack:

- (a) Process personal data only on the controller's documented instructions, including with regard to transfers to third countries.
- (b) Ensure that persons authorised to process the data have committed themselves to confidentiality.
- (c) Take all measures required pursuant to Article 32 (security of processing).
- (d) Respect the conditions of paragraphs 2 and 4 for engaging another processor.
- (e) Assist the controller in responding to requests to exercise data-subject rights.
- (f) Assist the controller in complying with Articles 32 to 36: security, breach notification, and data protection impact assessments.
- (g) At the controller's choice, delete or return all personal data at the end of the provision of services.
- (h) Make available all information necessary to demonstrate compliance, and allow for and contribute to audits.

A DPA missing any of these eight is non-compliant. A DPA that lists all eight in vague language ("the processor will assist") with no operational mechanism behind the words is also non-compliant in any meaningful sense — Article 28(3)(h) requires the processor to make available the information the controller needs to demonstrate compliance, which means the operational mechanism must exist somewhere.
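The eight clauses can be treated as a literal checklist. A toy self-test sketch (the clause summaries are paraphrases, not legal text):

```python
# Paraphrased summaries of the eight GDPR Article 28(3) obligations.
ARTICLE_28_3 = {
    "a": "processing only on documented instructions",
    "b": "confidentiality commitments for authorised persons",
    "c": "Article 32 security measures",
    "d": "sub-processor conditions (paragraphs 2 and 4)",
    "e": "assistance with data-subject rights",
    "f": "assistance with Articles 32-36",
    "g": "deletion or return at end of services",
    "h": "information and audits to demonstrate compliance",
}

def dpa_gaps(clauses_present: set[str]) -> dict[str, str]:
    """Return the Article 28(3) clauses the DPA fails to cover."""
    return {k: v for k, v in ARTICLE_28_3.items() if k not in clauses_present}
```

Any non-empty result means the DPA is non-compliant on its face.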

Common gaps

Reviewing real LLM-stack DPAs, the same handful of gaps appear repeatedly.

Data localisation absent. The DPA does not say where the personal data will be processed. For an EU controller using a US-headquartered LLM provider, this is a Schrems II problem before any processor obligation question even arises. Article 28(3)(a)'s "documented instructions" should include geographical scope of processing — where the controller has decided the data is allowed to go.

Sub-processor list stale. The DPA includes a sub-processor list, but it was drafted six months ago. The LLM provider has since added a new infrastructure dependency — perhaps a new region of a cloud provider, perhaps a new content-safety partner. The controller was not notified. Article 28(2) requires prior authorisation; a stale list is a violation in slow motion.
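Keeping the list current is mechanisable. A sketch of a periodic check — the assumption here is that the provider-side list can be scraped or fetched from the provider's published sub-processor page, and all entity names are illustrative:

```python
def subprocessor_drift(authorised: set[str], current: set[str]) -> dict[str, set[str]]:
    """Compare the sub-processors the controller authorised against the list
    the provider currently publishes. Any 'unauthorised' entry is an
    Article 28(2) problem the moment processing touches it."""
    return {
        "unauthorised": current - authorised,
        "no_longer_used": authorised - current,
    }

# Example: the provider quietly added a US region six months after signing.
drift = subprocessor_drift(
    authorised={"cloud-eu-west", "safety-partner-a"},
    current={"cloud-eu-west", "safety-partner-a", "cloud-us-east"},
)
```

Running this on a schedule turns "a violation in slow motion" into an alert the controller can act on before processing reaches the new dependency.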

No breach-notification SLA. Article 33 obliges the controller to notify the supervisory authority of a personal-data breach without undue delay and where feasible within 72 hours. The processor's contract has to give the controller time to do that. A processor agreement that says "we will notify the controller of a breach as soon as practicable" leaves no headroom for the controller to comply with their 72-hour clock.
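The arithmetic is worth making explicit. A sketch of the controller's remaining headroom, on the conservative reading that the 72-hour clock is measured from first detection anywhere in the chain:

```python
from datetime import datetime, timedelta, timezone

ARTICLE_33_WINDOW = timedelta(hours=72)

def controller_headroom(detected: datetime, processor_notified: datetime) -> timedelta:
    """Time left on the Article 33 clock once the processor notifies,
    measured conservatively from first detection in the chain."""
    return ARTICLE_33_WINDOW - (processor_notified - detected)

detected = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
# An "as soon as practicable" clause that in practice means 60 hours
# leaves the controller only 12 hours to investigate, draft, and file.
notified = detected + timedelta(hours=60)
```

A contractual notification SLA of, say, 24 hours keeps the larger share of the window on the controller's side of the table.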

"Training opt-out" not actually wired. Many LLM provider terms include language about not training on customer data without consent. The contractual clause exists. The technical implementation is sometimes uncertain. A controller cannot rely on a clause without being able to verify the technical mechanism that enforces it. This is one of the questions a competent regulator will ask, and the controller has to have the answer.

Joint-controllership ambiguity. Where the proxy itself adds material processing logic (sanitization, content filtering, model selection), the DPA may need to be redrafted as a joint-controller agreement under Article 26 for that subset of decisions. Most templates do not anticipate this. A clean architecture review classifies each processing decision and routes it to the correct legal instrument.

Schrems II and the LLM stack

The Court of Justice of the EU's 2020 ruling in Data Protection Commissioner v. Facebook Ireland and Maximillian Schrems (Case C-311/18, "Schrems II") invalidated the Privacy Shield framework that had governed US-EU data transfers. The replacement EU-US Data Privacy Framework adopted in 2023 is in force, but the underlying analytical structure remains: an EU controller transferring personal data to a third country has to assess whether the third-country legal regime offers protection essentially equivalent to that of the GDPR.

For an LLM stack, this means: every API call from your gateway to a US-based LLM provider is a third-country transfer of whatever personal data is in the prompt. Standard contractual clauses (SCCs) help. They are necessary. They are not sufficient if the data being transferred is identifiable and the third country grants its government surveillance powers that materially override the protections in the SCCs.

This is where pseudonymisation upstream of the LLM call genuinely changes the calculus. The European Data Protection Board's guidelines on supplementary measures (Recommendations 01/2020) are explicit that pseudonymisation, performed before transfer in such a way that the recipient cannot re-identify data subjects, can be an effective supplementary measure. The architectural test is whether the recipient — including any government with lawful access to the recipient's systems — can identify a natural person from what they receive. If the answer is no, the data being transferred is no longer "personal data" in the same sense from the third-country recipient's perspective, and the Schrems II analysis becomes considerably more favourable.

This is not a loophole. It is the regulatory-design rationale behind GDPR Article 25 ("data protection by design") working as intended. If the architecture ensures that identifiability does not cross the transfer boundary, the transfer risk profile changes. We have written more on this in why-data-redaction-fails — redaction alone is not the same thing as architectural pseudonymisation, and the difference matters under Schrems II.

A working DPA stack for an EU AI deployment

A working DPA stack for a typical EU AI deployment looks like this:

- A controller-to-processor DPA between your customer and you, covering all eight Article 28(3) clauses.
- A processor-to-sub-processor agreement between you and each LLM provider, flowing the same obligations down the chain.
- Standard contractual clauses for any third-country leg of the request path, backed by a transfer impact assessment documenting the Schrems II analysis.
- A maintained sub-processor register with an Article 28(2) notification-and-objection mechanism.

The LLM-specific addition is being explicit about model providers as sub-processors, about the geographical path of each request, and about which decisions in the proxy stack qualify as "essential means" under EDPB Guidelines 07/2020.

Where Lucairn fits

Pseudonymisation upstream of the LLM call is the architectural primitive that makes the rest of the DPA stack tractable. When the proxy strips identifiable elements from the prompt before the call leaves your boundary — substituting [PERSON_1], [EMAIL_1], [IBAN_1] for the underlying values — you do not need to extend the Article 28 trust chain to the LLM provider for those data elements. The LLM provider is processing tokens, not personal data, for the parts that have been pseudonymised. That changes the contract surface.
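The substitution step can be sketched in a few lines. This is a toy regex-based version — a production gateway uses NER models and far broader entity coverage, and the two patterns here are illustrative only:

```python
import re

# Illustrative patterns only; real sanitizers do not rely on regexes alone.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

def pseudonymise(prompt: str) -> tuple[str, dict]:
    """Replace identifiable values with opaque tokens before the prompt
    leaves the boundary. The mapping stays on our side of the boundary;
    the LLM provider only ever sees the tokens."""
    mapping: dict = {}
    counters: dict = {}

    def make_repl(kind):
        def repl(match):
            counters[kind] = counters.get(kind, 0) + 1
            token = f"[{kind}_{counters[kind]}]"
            mapping[token] = match.group(0)
            return token
        return repl

    for kind, pattern in PATTERNS.items():
        prompt = pattern.sub(make_repl(kind), prompt)
    return prompt, mapping

safe, mapping = pseudonymise(
    "Contact anna@example.com about invoice DE44500105175407324931."
)
```

The `safe` string is what crosses the transfer boundary; `mapping` never does.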

The Lucairn gateway runs that sanitization layer at the boundary. The cryptographic re-linkage state sits in a separate sandbox the LLM model has no network path to. The integrity of the redaction is captured in a signed manifest that becomes part of the audit record under Article 12 of the EU AI Act. When a regulator asks how the controller can be sure the LLM provider did not see the underlying personal data, the answer is structural rather than contractual: the gateway emitted a signed manifest stating which fields were redacted, the manifest is anchored in a public transparency log, and the LLM provider has no key material that would let it reverse the redaction even if it tried.
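The manifest mechanism can be illustrated with a small sketch. A real deployment would use an asymmetric signature anchored in a public transparency log; HMAC stands in here only to keep the example stdlib-only, and the key and field names are hypothetical:

```python
import hashlib
import hmac
import json

# Illustrative only: production uses an asymmetric key, never a shared secret.
GATEWAY_SIGNING_KEY = b"demo-key-held-only-by-the-gateway"

def sign_manifest(redacted_fields: list, prompt_hash: str) -> dict:
    """Emit a signed record of which fields were redacted from the prompt."""
    manifest = {
        "redacted_fields": sorted(redacted_fields),
        "prompt_sha256": prompt_hash,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(
        GATEWAY_SIGNING_KEY, payload, hashlib.sha256
    ).hexdigest()
    return manifest

def verify_manifest(manifest: dict) -> bool:
    """Recompute the signature over everything except the signature itself."""
    body = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(GATEWAY_SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])

prompt_hash = hashlib.sha256(b"Contact [EMAIL_1] about invoice [IBAN_1].").hexdigest()
m = sign_manifest(["[EMAIL_1]", "[IBAN_1]"], prompt_hash)
```

Any later tampering with the recorded field list invalidates the signature, which is what makes the audit answer structural rather than contractual.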

This does not eliminate the DPA stack. You still need the customer DPA, the sub-processor agreement with the LLM provider, and the rest of the Article 28 chain. What it does is shrink the trust surface of each of those agreements. The DPA with the LLM provider covers what the LLM provider actually sees — which, for sanitized fields, is an opaque token. The Schrems II analysis on that path improves accordingly.

For details on the operational architecture supporting this, see /private-ai-inference and /security. For the legal framing of how this maps to the broader GDPR Article 25 obligation, see GDPR Article 25 in practice.

GDPR Article 28 is not paperwork. It is the contractual scaffolding that lets a controller defend their architecture choices to a regulator. An LLM proxy that ignores Article 28 is not just incomplete — it is the controller's first finding when an investigation opens. The eight-clause checklist above is a useful self-test. If your current DPA stack misses any of them, the gap is knowable now and fixable before a regulator names it.
