
GDPR Compliance with LLMs in 2026: What Each Provider Actually Does With Your Data

Every enterprise LLM buyer asks the same question: "What does the provider do with my data?" The answers differ significantly by provider, and they matter for GDPR compliance, HIPAA compliance, and enterprise security reviews.

This guide covers what each major provider actually does with your data — not marketing copy, but the specific policies.

What GDPR Requires for LLM Usage

Under GDPR, if you're processing data about EU residents using an LLM API:

  1. Legal basis: You need a lawful basis for processing (typically legitimate interest or contract performance)
  2. Data Processing Agreement (DPA): Required if the provider is a data processor on your behalf
  3. Data residency: Data must not leave the EEA without adequate safeguards (SCCs or adequacy decisions)
  4. Data retention limits: Data should not be retained longer than necessary
  5. Purpose limitation: Data used only for the stated purpose (not training without consent)
  6. Breach notification: You must notify your supervisory authority within 72 hours of becoming aware of a breach, so your DPA should obligate the provider to alert you promptly

The critical question for LLMs: does the provider use your API data to train their models? This determines whether you're sharing personal data for a purpose your users likely didn't consent to.

Anthropic (Claude)

Data Usage Policy

API data is NOT used for training by default.

From Anthropic's usage policy: "We do not train on inputs and outputs from the API by default." This applies to:

  • All API calls via the Anthropic API
  • All API calls via Amazon Bedrock (AWS)
  • All API calls via Google Cloud Vertex AI

Note: Claude.ai (the consumer product) does use conversations to improve models unless you opt out in settings.

DPA Availability

Yes. Anthropic offers a Data Processing Agreement for business customers. Download at: anthropic.com/legal/data-processing-addendum

The DPA covers:

  • Subprocessor list
  • International data transfer (SCCs)
  • Security measures
  • Breach notification (72h)

Data Residency

  • Default: US-based processing
  • Via Amazon Bedrock: Can specify AWS regions including eu-west-1 (Ireland) and eu-central-1 (Frankfurt)
  • Via Google Cloud Vertex AI: Can specify GCP regions including EU regions

For strict EU data residency, use Anthropic via Amazon Bedrock or Vertex AI with an EU region configured.
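As a sketch of the Bedrock route, the snippet below pins the runtime client to Frankfurt so requests are served from the EEA. The function names are illustrative and the model ID is an assumption; check the Bedrock console for the IDs enabled in your account.

```python
import json

def build_claude_request(prompt: str, max_tokens: int = 1024) -> str:
    """Serialize a Messages-API-style request body for Bedrock's invoke_model."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def ask_claude_eu(prompt: str) -> str:
    """Invoke Claude via Bedrock pinned to eu-central-1 (Frankfurt)."""
    import boto3  # requires AWS credentials with Bedrock access configured
    client = boto3.client("bedrock-runtime", region_name="eu-central-1")
    response = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
        body=build_claude_request(prompt),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```

The region pin is a technical control, not a legal one on its own: pair it with the AWS DPA and your SCC review so the residency claim is documented, not just configured.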

BAA (Business Associate Agreement for HIPAA)

Available via Amazon Bedrock with AWS's BAA. A direct Anthropic BAA is available for enterprise customers; contact sales.

Retention Period

Anthropic retains API inputs/outputs for up to 30 days for trust & safety purposes, then deletes. With a DPA, you can negotiate shorter retention.

Summary: Anthropic

  • No training on API data: Yes
  • DPA available: Yes
  • EU data residency possible: Yes (via Bedrock/Vertex AI)
  • HIPAA BAA: Yes (via Bedrock)
  • SOC 2 Type II: Yes
  • ISO 27001: Yes

OpenAI

Data Usage Policy

API data is NOT used for training by default — but the history here matters.

OpenAI updated its API data policy in March 2023: API inputs and outputs are not used to train models by default. Before that date, they were. If you were using the OpenAI API before March 2023, your data may have been used for training.

Currently:

  • API: No training by default
  • ChatGPT (consumer): Yes, training by default (opt-out available in settings)
  • ChatGPT Team/Enterprise: No training by default

DPA Availability

Yes. Available at openai.com/policies/data-processing-addendum. The DPA includes:

  • Standard Contractual Clauses (SCCs)
  • Subprocessor list
  • 72-hour breach notification
  • Annual security audits

Azure OpenAI

For many enterprise GDPR requirements, Azure OpenAI is the better choice:

  • Microsoft has comprehensive EU data residency options
  • Microsoft's DPA is well-established for enterprise compliance
  • Data stays in the Azure region you configure
  • Microsoft processes under your Azure subscription's legal agreements

# Azure OpenAI — data stays in your Azure region
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01"
)

response = client.chat.completions.create(
    model="gpt-4o",  # Your deployment name
    messages=[{"role": "user", "content": "Hello"}]
)

Data Residency

  • Direct OpenAI API: US-based processing
  • Azure OpenAI: EU regions available (West Europe, North Europe, Sweden, etc.)

BAA (HIPAA)

Available via Azure OpenAI with Microsoft's HIPAA BAA.

Summary: OpenAI

  • No training on API data: Yes
  • DPA available: Yes
  • EU data residency: Yes (via Azure)
  • HIPAA BAA: Yes (via Azure)
  • SOC 2 Type II: Yes
  • ISO 27001: Yes

Google (Gemini)

Data Usage Policy

Google's position depends significantly on which product you use:

Google AI API (ai.google.dev): Google may use your data to improve its AI services. This makes it a poor fit for enterprise GDPR use.

Vertex AI (cloud.google.com/vertex-ai): Data is NOT used for model training by default. This is the enterprise-grade option and what Google recommends for business use.

From Google Cloud DPA: "Customer Data will not be used by Google to train its AI models unless Google has received explicit permission from Customer."

DPA Availability

Yes, via Google Cloud's standard Data Processing Addendum, which applies to all Google Cloud services including Vertex AI.

Data Residency

  • Vertex AI: Extensive EU region support (eu-central1 Frankfurt, eu-west1 Belgium, eu-west4 Netherlands, etc.)
  • Can specify that data must stay within EU boundaries

import vertexai
from vertexai.generative_models import GenerativeModel

# Specify EU region
vertexai.init(
    project="your-project",
    location="europe-west1"  # Belgium — data stays in EU
)

model = GenerativeModel("gemini-2.5-pro")
response = model.generate_content("Hello")

BAA (HIPAA)

Available via Google Cloud's HIPAA BAA.

Summary: Google Vertex AI

  • No training on API data: Yes
  • DPA available: Yes
  • EU data residency: Yes (strong EU region support)
  • HIPAA BAA: Yes
  • SOC 2 Type II: Yes
  • ISO 27001: Yes

The Self-Hosting Option

For the strictest data control requirements, self-hosting open-source models eliminates the third-party data concern entirely:

When Self-Hosting Is Warranted

  • Highly sensitive data (medical records, legal documents, financial data) where no third-party data exposure is acceptable
  • Regulated industries where auditors require proof that data never left your infrastructure
  • Organizations that cannot accept any contractual data sharing risk
  • Very high volume that makes self-hosting economically competitive

Self-Hosting Options

# Option 1: Ollama (local, for development/small scale)
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.3:70b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False
    }
)
print(response.json()["message"]["content"])

# Option 2: vLLM (production-grade self-hosted inference)
# Deploy: python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.3-70B-Instruct

from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://your-vllm-server:8000/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Analyze this patient record: ..."}]
)

Self-Hosting Costs

For a rough comparison at 10M tokens/day:

Option                            | Monthly Cost | Data Risk
Claude Sonnet 4 (API)             | ~$1,500      | Anthropic (with DPA)
Azure OpenAI                      | ~$750        | Microsoft (with DPA)
Llama 3.3 70B (self-hosted, A100) | ~$500-800    | None
Llama 3.3 70B (Groq)              | ~$700        | Groq

Self-hosting becomes cost-competitive at significant scale while providing maximum data control.
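To sanity-check that claim against your own volume, the break-even point can be computed directly. The sketch below assumes API cost scales linearly with tokens while the GPU cost is flat regardless of utilization; the dollar figures are placeholders consistent with the rough table above, not quotes.

```python
def break_even_tokens_per_day(api_cost_per_mtok: float, gpu_cost_per_month: float) -> float:
    """Daily token volume above which a flat-rate GPU undercuts per-token API pricing.

    Simplifying assumptions: API spend is perfectly linear in volume, and the
    GPU is a fixed monthly cost whether idle or saturated.
    """
    return gpu_cost_per_month / (api_cost_per_mtok * 30) * 1_000_000

# Placeholder figures: $5 blended per million tokens vs. a ~$650/month A100 rental.
threshold = break_even_tokens_per_day(api_cost_per_mtok=5.0, gpu_cost_per_month=650.0)
print(f"Self-hosting breaks even above ~{threshold / 1e6:.1f}M tokens/day")
# → Self-hosting breaks even above ~4.3M tokens/day
```

In practice the GPU side also carries ops overhead (monitoring, upgrades, on-call), so treat the computed threshold as a floor, not a decision.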

Practical GDPR Compliance Checklist

For any LLM integration processing EU resident data:

Legal basis:

  • [ ] Identified lawful basis for processing (legitimate interest, contract, or consent)
  • [ ] Privacy policy updated to mention LLM usage

Provider due diligence:

  • [ ] Signed DPA with your LLM provider
  • [ ] Confirmed API data is not used for training (in writing)
  • [ ] Subprocessors reviewed and documented
  • [ ] Provider has EU data residency option configured (if required)

Data minimization:

  • [ ] Only sending data necessary for the task (not entire user profiles for simple queries)
  • [ ] PII stripped or pseudonymized where possible before sending to API
  • [ ] System prompts don't contain user PII they don't need
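A minimal pre-send pseudonymization pass for the checklist items above might look like the sketch below. The regexes only catch obvious patterns (emails, phone numbers, IBANs) and will miss names, addresses, and free-text identifiers; production systems should use a vetted PII detector such as Microsoft Presidio rather than this.

```python
import re

# Deliberately simple patterns — illustrative, not exhaustive.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def pseudonymize(text: str) -> str:
    """Replace obvious PII with placeholder tokens before the prompt leaves
    your infrastructure. IBAN runs before PHONE so account numbers are not
    partially consumed by the looser phone pattern."""
    text = EMAIL.sub("[EMAIL]", text)
    text = IBAN.sub("[IBAN]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(pseudonymize("Contact jane@example.com or +49 30 1234567"))
# → Contact [EMAIL] or [PHONE]
```

Keeping the substitution on your side of the API boundary means the provider never receives the raw identifiers, which strengthens both your data-minimization and purpose-limitation arguments.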

Technical controls:

  • [ ] Logging all API calls for audit trail
  • [ ] Retention period for your logs reviewed
  • [ ] Breach detection and notification process in place
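One way to satisfy the audit-trail item without turning the log itself into a PII store is to record hashes and metadata rather than raw prompts. The helper below is a hypothetical sketch; the field names are my own, not a standard schema.

```python
import hashlib
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("llm.audit")

def log_llm_call(provider: str, model: str, prompt: str, purpose: str) -> dict:
    """Emit a structured audit entry for one LLM API call.

    Stores a SHA-256 of the prompt instead of the prompt itself, so the audit
    trail proves what was sent (and when, and why) without retaining the PII."""
    entry = {
        "ts": time.time(),
        "provider": provider,
        "model": model,
        "purpose": purpose,  # ties the call back to your documented lawful basis
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt_chars": len(prompt),
    }
    audit_log.info(json.dumps(entry))
    return entry
```

The `purpose` field is the piece auditors tend to ask about: it links each call to the processing purpose declared in your privacy policy.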

For HIPAA/healthcare:

  • [ ] BAA signed with LLM provider (or using Amazon Bedrock/Vertex AI with an existing cloud BAA)
  • [ ] PHI being sent to API documented in your BAA

The Key Point

All four major providers (Anthropic, OpenAI, Google, Azure OpenAI) offer:

  • No training on API data
  • DPA available
  • SOC 2 Type II certification
  • EU data residency options

For strict EU data residency requirements: route Claude through Amazon Bedrock (EU regions), use Azure OpenAI, or use Google Vertex AI with EU regions.

For the most sensitive data (healthcare, legal, financial): self-hosting open-source models, or using one of the cloud providers' dedicated deployment options, gives you the strongest technical controls alongside standard contractual protections.

Don't rely on provider marketing — get the DPA signed, confirm data residency in writing, and document your GDPR compliance reasoning. Auditors want documentation, not assurances.
