GDPR · DPA · data-processor · LLM-architecture · compliance

Why your LLM proxy is a GDPR data processor (and what that means for you)

Most teams treat the LLM provider as a vendor and skip the data-processing agreement. GDPR Article 28 disagrees. Here's why your LLM proxy is structurally a processor — and what your DPA stack actually has to contain.

Lucairn · 10 min read
On this page
  1. The default mistake
  2. Controller vs processor — applied to LLM proxies
  3. Article 28 paragraph 3 — the binding clauses
  4. Common gaps
  5. Schrems II and the LLM stack
  6. A working DPA stack for an EU AI deployment
  7. Where Lucairn fits

The default mistake

A pattern recurs in almost every LLM architecture review we run. A team has built a customer-facing application that calls an upstream LLM provider — Anthropic, OpenAI, sometimes both. The user's prompt contains personal data, often identifiable: a name, an email address, occasionally a customer ID number. The application forwards the prompt to the model. The model returns an answer. The team stores the conversation log.

We ask to see the data-processing agreement with the LLM provider. The reply is some variation of: "We agreed to their terms of service. The processor agreement is included by reference, I think."

That answer is the violation. GDPR Article 28(3) requires that processing by a processor "shall be governed by a contract or other legal act under Union or Member State law, that is binding on the processor with regard to the controller and that sets out the subject-matter and duration of the processing, the nature and purpose of the processing, the type of personal data and categories of data subjects, and the obligations and rights of the controller." A box on a sign-up form titled "I agree to the Terms of Service" rarely satisfies this.

The error is not just paperwork. It is a structural misclassification. The team is treating the LLM provider as a vendor that supplies a tool. Article 28 treats it as a processor that handles personal data on the controller's behalf. The two relationships carry very different obligations. Mistaking the second for the first is one of the most common findings in LLM-architecture compliance reviews under GDPR.

Controller vs processor — applied to LLM proxies

Article 4(7) of the GDPR defines a controller as "the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data." Article 4(8) defines a processor as "a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller."

Apply this to an LLM proxy that sits between your customer and an upstream LLM provider. Three distinct roles emerge:

- Your customer is the controller: they determine the purposes of the processing.
- You, operating the proxy, are a processor: you handle personal data on your customer's behalf.
- The upstream LLM provider is a sub-processor: it processes the same data on your behalf, within your customer's authorisation.

Already there are two Article 28 contracts required: one between your customer and you, and one between you and the LLM provider. Article 28(2) explicitly addresses sub-processors: "the processor shall not engage another processor without prior specific or general written authorisation of the controller." Your customer has to be told who their sub-processors are, and they have to be able to object.

There is a second wrinkle that catches many teams. When the proxy itself decides which LLM model to call, what system prompt to apply, what filtering to run — it is making decisions about means of processing. The European Data Protection Board's guidelines on the concepts of controller and processor (Guidelines 07/2020) are explicit: a processor that determines the essential means of processing is no longer a pure processor; it is at minimum a joint controller for those decisions.

For pure pass-through proxies — the request is forwarded verbatim to the configured upstream model — this stays cleanly within the processor role. For proxies that add system prompts, run sanitization, choose models based on content type, or apply post-processing rules, the role becomes more nuanced. The defensive position is to document those decisions transparently in the customer-facing data-processing agreement so the controller is not surprised.
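The boundary between pass-through and decision-making can be made concrete in code. Here is a minimal sketch, with hypothetical model names and routing rules, of a proxy that records every routing decision so it can be disclosed in the customer-facing DPA:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingDecision:
    # One "means of processing" decision the proxy made for this request.
    decision: str   # e.g. "model_selection"
    value: str      # what was chosen
    rationale: str  # the justification the DPA can reference

@dataclass
class ProxyRequest:
    prompt: str
    decisions: list[ProcessingDecision] = field(default_factory=list)

def route(req: ProxyRequest) -> str:
    """Choose an upstream model. Every branch here is a decision about the
    means of processing, so it is recorded for disclosure to the controller."""
    model = "legal-tuned-model" if "contract" in req.prompt.lower() else "general-model"
    req.decisions.append(ProcessingDecision(
        decision="model_selection",
        value=model,
        rationale="content-based routing disclosed to the controller",
    ))
    return model
```

A pure pass-through proxy would leave the decisions list empty; anything that accumulates in it belongs in the customer-facing agreement.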

Article 28 paragraph 3 — the binding clauses

GDPR Article 28(3) lists eight specific obligations that must be in the controller-to-processor contract. Not "should." Not "may." "Shall." Each one maps to a concrete clause in the DPA stack:

- (a) Process personal data only on the controller's documented instructions, including with regard to transfers to third countries.
- (b) Ensure that persons authorised to process the data have committed themselves to confidentiality.
- (c) Take all measures required pursuant to Article 32 (security of processing).
- (d) Respect the conditions of paragraphs 2 and 4 for engaging another processor.
- (e) Assist the controller in responding to requests to exercise data-subject rights.
- (f) Assist the controller in complying with Articles 32 to 36: security, breach notification, and data protection impact assessments.
- (g) At the controller's choice, delete or return all personal data at the end of the provision of services.
- (h) Make available all information necessary to demonstrate compliance, and allow for and contribute to audits.

A DPA missing any of these eight is non-compliant. A DPA that lists all eight in vague language ("the processor will assist") with no operational mechanism behind the words is also non-compliant in any meaningful sense — Article 28(3)(h) requires the processor to make available the information the controller needs to demonstrate compliance, which means the operational mechanism must exist somewhere.
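The eight clauses can be treated as a literal checklist. A toy self-test sketch (the clause summaries are paraphrases, not legal text):

```python
# Paraphrased summaries of the eight GDPR Article 28(3) obligations.
ARTICLE_28_3 = {
    "a": "processing only on documented instructions",
    "b": "confidentiality commitments for authorised persons",
    "c": "Article 32 security measures",
    "d": "sub-processor conditions (paragraphs 2 and 4)",
    "e": "assistance with data-subject rights",
    "f": "assistance with Articles 32-36",
    "g": "deletion or return at end of services",
    "h": "information and audits to demonstrate compliance",
}

def dpa_gaps(clauses_present: set[str]) -> dict[str, str]:
    """Return the Article 28(3) clauses the DPA fails to cover."""
    return {k: v for k, v in ARTICLE_28_3.items() if k not in clauses_present}
```

Any non-empty result means the DPA is non-compliant on its face.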

Common gaps

Reviewing real LLM-stack DPAs, the same handful of gaps appear repeatedly.

Data localisation absent. The DPA does not say where the personal data will be processed. For an EU controller using a US-headquartered LLM provider, this is a Schrems II problem before any processor obligation question even arises. Article 28(3)(a)'s "documented instructions" should include geographical scope of processing — where the controller has decided the data is allowed to go.

Sub-processor list stale. The DPA includes a sub-processor list, but it was drafted six months ago. The LLM provider has since added a new infrastructure dependency — perhaps a new region of a cloud provider, perhaps a new content-safety partner. The controller was not notified. Article 28(2) requires prior authorisation; a stale list is a violation in slow motion.
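Keeping the list current is mechanisable. A sketch of a periodic check — the assumption here is that the provider-side list can be scraped or fetched from the provider's published sub-processor page, and all entity names are illustrative:

```python
def subprocessor_drift(authorised: set[str], current: set[str]) -> dict[str, set[str]]:
    """Compare the sub-processors the controller authorised against the list
    the provider currently publishes. Any 'unauthorised' entry is an
    Article 28(2) problem the moment processing touches it."""
    return {
        "unauthorised": current - authorised,
        "no_longer_used": authorised - current,
    }

# Example: the provider quietly added a US region six months after signing.
drift = subprocessor_drift(
    authorised={"cloud-eu-west", "safety-partner-a"},
    current={"cloud-eu-west", "safety-partner-a", "cloud-us-east"},
)
```

Running this on a schedule turns "a violation in slow motion" into an alert the controller can act on before processing reaches the new dependency.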

No breach-notification SLA. Article 33 obliges the controller to notify the supervisory authority of a personal-data breach without undue delay and where feasible within 72 hours. The processor's contract has to give the controller time to do that. A processor agreement that says "we will notify the controller of a breach as soon as practicable" leaves no headroom for the controller to comply with their 72-hour clock.
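The arithmetic is worth making explicit. A sketch of the controller's remaining headroom, on the conservative reading that the 72-hour clock is measured from first detection anywhere in the chain:

```python
from datetime import datetime, timedelta, timezone

ARTICLE_33_WINDOW = timedelta(hours=72)

def controller_headroom(detected: datetime, processor_notified: datetime) -> timedelta:
    """Time left on the Article 33 clock once the processor notifies,
    measured conservatively from first detection in the chain."""
    return ARTICLE_33_WINDOW - (processor_notified - detected)

detected = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
# An "as soon as practicable" clause that in practice means 60 hours
# leaves the controller only 12 hours to investigate, draft, and file.
notified = detected + timedelta(hours=60)
```

A contractual notification SLA of, say, 24 hours keeps the larger share of the window on the controller's side of the table.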

"Training opt-out" not actually wired. Many LLM provider terms include language about not training on customer data without consent. The contractual clause exists. The technical implementation is sometimes uncertain. A controller cannot rely on a clause without being able to verify the technical mechanism that enforces it. This is one of the questions a competent regulator will ask, and the controller has to have the answer.

Joint-controllership ambiguity. Where the proxy itself adds material processing logic (sanitization, content filtering, model selection), the DPA may need to be redrafted as a joint-controller agreement under Article 26 for that subset of decisions. Most templates do not anticipate this. A clean architecture review classifies each processing decision and routes it to the correct legal instrument.

Schrems II and the LLM stack

The Court of Justice of the EU's 2020 ruling in Data Protection Commissioner v. Facebook Ireland and Maximillian Schrems (Case C-311/18, "Schrems II") invalidated the Privacy Shield framework that had governed US-EU data transfers. The replacement EU-US Data Privacy Framework adopted in 2023 is in force, but the underlying analytical structure remains: an EU controller transferring personal data to a third country has to assess whether the third-country legal regime offers protection essentially equivalent to that of the GDPR.

For an LLM stack, this means: every API call from your gateway to a US-based LLM provider is a third-country transfer of whatever personal data is in the prompt. Standard contractual clauses (SCCs) help. They are necessary. They are not sufficient if the data being transferred is identifiable and the third country grants its government surveillance powers that materially override the protections in the SCCs.

This is where pseudonymisation upstream of the LLM call genuinely changes the calculus. The European Data Protection Board's guidelines on supplementary measures (Recommendations 01/2020) are explicit that pseudonymisation, performed before transfer in such a way that the recipient cannot re-identify data subjects, can be an effective supplementary measure. The architectural test is whether the recipient — including any government with lawful access to the recipient's systems — can identify a natural person from what they receive. If the answer is no, the data being transferred is no longer "personal data" in the same sense from the third-country recipient's perspective, and the Schrems II analysis becomes considerably more favourable.

This is not a loophole. It is the regulatory-design rationale behind GDPR Article 25 ("data protection by design") working as intended. If the architecture ensures that identifiability does not cross the transfer boundary, the transfer risk profile changes. We have written more on this in why-data-redaction-fails — redaction alone is not the same thing as architectural pseudonymisation, and the difference matters under Schrems II.

A working DPA stack for an EU AI deployment

A working DPA stack for a typical EU AI deployment looks like this:

- A controller-to-processor DPA between your customer and you, covering all eight Article 28(3) clauses.
- A processor-to-sub-processor agreement between you and each LLM provider, flowing the same obligations down the chain.
- Standard contractual clauses for any third-country leg of the request path, backed by a transfer impact assessment documenting the Schrems II analysis.
- A maintained sub-processor register with an Article 28(2) notification-and-objection mechanism.

The LLM-specific addition is being explicit about model providers as sub-processors, about the geographical path of each request, and about which decisions in the proxy stack qualify as "essential means" under EDPB Guidelines 07/2020.

Where Lucairn fits

Pseudonymisation upstream of the LLM call is the architectural primitive that makes the rest of the DPA stack tractable. When the proxy strips identifiable elements from the prompt before the call leaves your boundary — substituting [PERSON_1], [EMAIL_1], [IBAN_1] for the underlying values — you do not need to extend the Article 28 trust chain to the LLM provider for those data elements. The LLM provider is processing tokens, not personal data, for the parts that have been pseudonymised. That changes the contract surface.
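The substitution step can be sketched in a few lines. This is a toy regex-based version — a production gateway uses NER models and far broader entity coverage, and the two patterns here are illustrative only:

```python
import re

# Illustrative patterns only; real sanitizers do not rely on regexes alone.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

def pseudonymise(prompt: str) -> tuple[str, dict]:
    """Replace identifiable values with opaque tokens before the prompt
    leaves the boundary. The mapping stays on our side of the boundary;
    the LLM provider only ever sees the tokens."""
    mapping: dict = {}
    counters: dict = {}

    def make_repl(kind):
        def repl(match):
            counters[kind] = counters.get(kind, 0) + 1
            token = f"[{kind}_{counters[kind]}]"
            mapping[token] = match.group(0)
            return token
        return repl

    for kind, pattern in PATTERNS.items():
        prompt = pattern.sub(make_repl(kind), prompt)
    return prompt, mapping

safe, mapping = pseudonymise(
    "Contact anna@example.com about invoice DE44500105175407324931."
)
```

The `safe` string is what crosses the transfer boundary; `mapping` never does.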

The Lucairn gateway runs that sanitization layer at the boundary. The cryptographic re-linkage state sits in a separate sandbox the LLM model has no network path to. The integrity of the redaction is captured in a signed manifest that becomes part of the audit record under Article 12 of the EU AI Act. When a regulator asks how the controller can be sure the LLM provider did not see the underlying personal data, the answer is structural rather than contractual: the gateway emitted a signed manifest stating which fields were redacted, the manifest is anchored in a public transparency log, and the LLM provider has no key material that would let it reverse the redaction even if it tried.
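The manifest mechanism can be illustrated with a small sketch. A real deployment would use an asymmetric signature anchored in a public transparency log; HMAC stands in here only to keep the example stdlib-only, and the key and field names are hypothetical:

```python
import hashlib
import hmac
import json

# Illustrative only: production uses an asymmetric key, never a shared secret.
GATEWAY_SIGNING_KEY = b"demo-key-held-only-by-the-gateway"

def sign_manifest(redacted_fields: list, prompt_hash: str) -> dict:
    """Emit a signed record of which fields were redacted from the prompt."""
    manifest = {
        "redacted_fields": sorted(redacted_fields),
        "prompt_sha256": prompt_hash,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(
        GATEWAY_SIGNING_KEY, payload, hashlib.sha256
    ).hexdigest()
    return manifest

def verify_manifest(manifest: dict) -> bool:
    """Recompute the signature over everything except the signature itself."""
    body = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(GATEWAY_SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"])

prompt_hash = hashlib.sha256(b"Contact [EMAIL_1] about invoice [IBAN_1].").hexdigest()
m = sign_manifest(["[EMAIL_1]", "[IBAN_1]"], prompt_hash)
```

Any later tampering with the recorded field list invalidates the signature, which is what makes the audit answer structural rather than contractual.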

This does not eliminate the DPA stack. You still need the customer DPA, the sub-processor agreement with the LLM provider, and the rest of the Article 28 chain. What it does is shrink the trust surface of each of those agreements. The DPA with the LLM provider covers what the LLM provider actually sees — which, for sanitized fields, is an opaque token. The Schrems II analysis on that path improves accordingly.

For details on the operational architecture supporting this, see /private-ai-inference and /security. For the legal framing of how this maps to the broader GDPR Article 25 obligation, see GDPR Article 25 in practice.

GDPR Article 28 is not paperwork. It is the contractual scaffolding that lets a controller defend their architecture choices to a regulator. An LLM proxy that ignores Article 28 is not just incomplete — it is the controller's first finding when an investigation opens. The eight-clause checklist above is a useful self-test. If your current DPA stack misses any of them, the gap is knowable now and fixable before a regulator names it.
