
GDPR Compliance with LLMs in 2026: What Each Provider Actually Does With Your Data

Every enterprise LLM buyer asks the same question: "What does the provider do with my data?" The answers differ significantly by provider, and they matter for GDPR compliance, HIPAA compliance, and enterprise security reviews.

This guide covers what each major provider actually does with your data — not marketing copy, but the specific policies.

What GDPR Requires for LLM Usage

Under GDPR, if you're processing data about EU residents using an LLM API:

  1. Legal basis: You need a lawful basis for processing (typically legitimate interest or contract performance)
  2. Data Processing Agreement (DPA): Required if the provider is a data processor on your behalf
  3. Data residency: Data must not leave the EEA without adequate safeguards (SCCs or adequacy decisions)
  4. Data retention limits: Data should not be retained longer than necessary
  5. Purpose limitation: Data used only for the stated purpose (not training without consent)
  6. Breach notification: You must notify your supervisory authority within 72 hours of becoming aware of a breach, so your DPA should obligate the provider to alert you promptly

The critical question for LLMs: does the provider use your API data to train their models? This determines whether you're sharing personal data for a purpose your users likely didn't consent to.

Anthropic (Claude)

Data Usage Policy

API data is NOT used for training by default.

From Anthropic's usage policy: "We do not train on inputs and outputs from the API by default." This applies to:

  • All API calls via the Anthropic API
  • All API calls via Amazon Bedrock (AWS)
  • All API calls via Google Cloud Vertex AI

Note: Claude.ai (the consumer product) does use conversations to improve models unless you opt out in settings.

DPA Availability

Yes. Anthropic offers a Data Processing Agreement for business customers. Download at: anthropic.com/legal/data-processing-addendum

The DPA covers:

  • Subprocessor list
  • International data transfer (SCCs)
  • Security measures
  • Breach notification (72h)

Data Residency

  • Default: US-based processing
  • Via Amazon Bedrock: Can specify AWS regions including eu-west-1 (Ireland) and eu-central-1 (Frankfurt)
  • Via Google Cloud Vertex AI: Can specify GCP regions including EU regions

For strict EU data residency, use Anthropic via Amazon Bedrock or Vertex AI with an EU region configured.
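As a sketch of the Bedrock route, the snippet below pins the runtime client to Frankfurt so requests are served from the EEA. The function names are illustrative and the model ID is an assumption; check the Bedrock console for the IDs enabled in your account.

```python
import json

def build_claude_request(prompt: str, max_tokens: int = 1024) -> str:
    """Serialize a Messages-API-style request body for Bedrock's invoke_model."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def ask_claude_eu(prompt: str) -> str:
    """Invoke Claude via Bedrock pinned to eu-central-1 (Frankfurt)."""
    import boto3  # requires AWS credentials with Bedrock access configured
    client = boto3.client("bedrock-runtime", region_name="eu-central-1")
    response = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
        body=build_claude_request(prompt),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```

The region pin is a technical control, not a legal one on its own: pair it with the AWS DPA and your SCC review so the residency claim is documented, not just configured.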

BAA (Business Associate Agreement for HIPAA)

Available via Amazon Bedrock with AWS's BAA. A direct Anthropic BAA is available for enterprise customers; contact sales.

Retention Period

Anthropic retains API inputs/outputs for up to 30 days for trust & safety purposes, then deletes. With a DPA, you can negotiate shorter retention.

Summary: Anthropic

  • No training on API data: Yes
  • DPA available: Yes
  • EU data residency possible: Yes (via Bedrock/Vertex AI)
  • HIPAA BAA: Yes (via Bedrock)
  • SOC 2 Type II: Yes
  • ISO 27001: Yes

OpenAI

Data Usage Policy

API data is NOT used for training by default — but the history here matters.

OpenAI updated its API data policy in March 2023: API inputs and outputs are not used to train models by default. Before that date, they were. If you were using the OpenAI API before March 2023, your data may have been used for training.

Currently:

  • API: No training by default
  • ChatGPT (consumer): Yes, training by default (opt-out available in settings)
  • ChatGPT Team/Enterprise: No training by default

DPA Availability

Yes. Available at openai.com/policies/data-processing-addendum. The DPA includes:

  • Standard Contractual Clauses (SCCs)
  • Subprocessor list
  • 72-hour breach notification
  • Annual security audits

Azure OpenAI

For many enterprise GDPR requirements, Azure OpenAI is the better choice:

  • Microsoft has comprehensive EU data residency options
  • Microsoft's DPA is well-established for enterprise compliance
  • Data stays in the Azure region you configure
  • Microsoft processes under your Azure subscription's legal agreements

# Azure OpenAI — data stays in your Azure region
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01"
)

response = client.chat.completions.create(
    model="gpt-4o",  # Your deployment name
    messages=[{"role": "user", "content": "Hello"}]
)

Data Residency

  • Direct OpenAI API: US-based processing
  • Azure OpenAI: EU regions available (West Europe, North Europe, Sweden, etc.)

BAA (HIPAA)

Available via Azure OpenAI with Microsoft's HIPAA BAA.

Summary: OpenAI

  • No training on API data: Yes
  • DPA available: Yes
  • EU data residency: Yes (via Azure)
  • HIPAA BAA: Yes (via Azure)
  • SOC 2 Type II: Yes
  • ISO 27001: Yes

Google (Gemini)

Data Usage Policy

Google's position depends significantly on which product you use:

Google AI API (ai.google.dev): Google may use your data to improve its AI services. This makes it a poor fit for enterprise GDPR use.

Vertex AI (cloud.google.com/vertex-ai): Data is NOT used for model training by default. This is the enterprise-grade option and what Google recommends for business use.

From Google Cloud DPA: "Customer Data will not be used by Google to train its AI models unless Google has received explicit permission from Customer."

DPA Availability

Yes, via Google Cloud's standard Data Processing Addendum, which applies to all Google Cloud services including Vertex AI.

Data Residency

  • Vertex AI: Extensive EU region support (eu-central1 Frankfurt, eu-west1 Belgium, eu-west4 Netherlands, etc.)
  • Can specify that data must stay within EU boundaries

import vertexai
from vertexai.generative_models import GenerativeModel

# Specify EU region
vertexai.init(
    project="your-project",
    location="europe-west1"  # Belgium — data stays in EU
)

model = GenerativeModel("gemini-2.5-pro")
response = model.generate_content("Hello")

BAA (HIPAA)

Available via Google Cloud's HIPAA BAA.

Summary: Google Vertex AI

  • No training on API data: Yes
  • DPA available: Yes
  • EU data residency: Yes (strong EU region support)
  • HIPAA BAA: Yes
  • SOC 2 Type II: Yes
  • ISO 27001: Yes

The Self-Hosting Option

For the strictest data control requirements, self-hosting open-source models eliminates the third-party data concern entirely:

When Self-Hosting Is Warranted

  • Highly sensitive data (medical records, legal documents, financial data) where no third-party data exposure is acceptable
  • Regulated industries where auditors require proof that data never left your infrastructure
  • Organizations that cannot accept any contractual data sharing risk
  • Very high volume that makes self-hosting economically competitive

Self-Hosting Options

# Option 1: Ollama (local, for development/small scale)
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.3:70b",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False
    }
)
print(response.json()["message"]["content"])

# Option 2: vLLM (production-grade self-hosted inference)
# Deploy: python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.3-70B-Instruct

from openai import OpenAI

client = OpenAI(
    api_key="not-needed",
    base_url="http://your-vllm-server:8000/v1"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Analyze this patient record: ..."}]
)

Self-Hosting Costs

For a rough comparison at 10M tokens/day:

Option                            | Monthly Cost | Data Risk
Claude Sonnet 4 (API)             | ~$1,500      | Anthropic (with DPA)
Azure OpenAI                      | ~$750        | Microsoft (with DPA)
Llama 3.3 70B (self-hosted, A100) | ~$500-800    | None
Llama 3.3 70B (Groq)              | ~$700        | Groq

Self-hosting becomes cost-competitive at significant scale while providing maximum data control.
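To sanity-check that claim against your own volume, the break-even point can be computed directly. The sketch below assumes API cost scales linearly with tokens while the GPU cost is flat regardless of utilization; the dollar figures are placeholders consistent with the rough table above, not quotes.

```python
def break_even_tokens_per_day(api_cost_per_mtok: float, gpu_cost_per_month: float) -> float:
    """Daily token volume above which a flat-rate GPU undercuts per-token API pricing.

    Simplifying assumptions: API spend is perfectly linear in volume, and the
    GPU is a fixed monthly cost whether idle or saturated.
    """
    return gpu_cost_per_month / (api_cost_per_mtok * 30) * 1_000_000

# Placeholder figures: $5 blended per million tokens vs. a ~$650/month A100 rental.
threshold = break_even_tokens_per_day(api_cost_per_mtok=5.0, gpu_cost_per_month=650.0)
print(f"Self-hosting breaks even above ~{threshold / 1e6:.1f}M tokens/day")
# → Self-hosting breaks even above ~4.3M tokens/day
```

In practice the GPU side also carries ops overhead (monitoring, upgrades, on-call), so treat the computed threshold as a floor, not a decision.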

Practical GDPR Compliance Checklist

For any LLM integration processing EU resident data:

Legal basis:

  • [ ] Identified lawful basis for processing (legitimate interest, contract, or consent)
  • [ ] Privacy policy updated to mention LLM usage

Provider due diligence:

  • [ ] Signed DPA with your LLM provider
  • [ ] Confirmed API data is not used for training (in writing)
  • [ ] Subprocessors reviewed and documented
  • [ ] Provider has EU data residency option configured (if required)

Data minimization:

  • [ ] Only sending data necessary for the task (not entire user profiles for simple queries)
  • [ ] PII stripped or pseudonymized where possible before sending to API
  • [ ] System prompts don't contain user PII they don't need
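A minimal pre-send pseudonymization pass for the checklist items above might look like the sketch below. The regexes only catch obvious patterns (emails, phone numbers, IBANs) and will miss names, addresses, and free-text identifiers; production systems should use a vetted PII detector such as Microsoft Presidio rather than this.

```python
import re

# Deliberately simple patterns — illustrative, not exhaustive.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def pseudonymize(text: str) -> str:
    """Replace obvious PII with placeholder tokens before the prompt leaves
    your infrastructure. IBAN runs before PHONE so account numbers are not
    partially consumed by the looser phone pattern."""
    text = EMAIL.sub("[EMAIL]", text)
    text = IBAN.sub("[IBAN]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(pseudonymize("Contact jane@example.com or +49 30 1234567"))
# → Contact [EMAIL] or [PHONE]
```

Keeping the substitution on your side of the API boundary means the provider never receives the raw identifiers, which strengthens both your data-minimization and purpose-limitation arguments.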

Technical controls:

  • [ ] Logging all API calls for audit trail
  • [ ] Retention period for your logs reviewed
  • [ ] Breach detection and notification process in place
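One way to satisfy the audit-trail item without turning the log itself into a PII store is to record hashes and metadata rather than raw prompts. The helper below is a hypothetical sketch; the field names are my own, not a standard schema.

```python
import hashlib
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("llm.audit")

def log_llm_call(provider: str, model: str, prompt: str, purpose: str) -> dict:
    """Emit a structured audit entry for one LLM API call.

    Stores a SHA-256 of the prompt instead of the prompt itself, so the audit
    trail proves what was sent (and when, and why) without retaining the PII."""
    entry = {
        "ts": time.time(),
        "provider": provider,
        "model": model,
        "purpose": purpose,  # ties the call back to your documented lawful basis
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt_chars": len(prompt),
    }
    audit_log.info(json.dumps(entry))
    return entry
```

The `purpose` field is the piece auditors tend to ask about: it links each call to the processing purpose declared in your privacy policy.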

For HIPAA/healthcare:

  • [ ] BAA signed with LLM provider (or using Amazon Bedrock/Vertex AI with an existing cloud BAA)
  • [ ] PHI being sent to API documented in your BAA

The Key Point

All four major providers (Anthropic, OpenAI, Google, Azure OpenAI) offer:

  • No training on API data
  • DPA available
  • SOC 2 Type II certification
  • EU data residency options

For strict EU data residency requirements: route Claude through Amazon Bedrock (EU regions), use Azure OpenAI, or use Google Vertex AI with EU regions.

For the most sensitive data (healthcare, legal, financial): self-hosting open-source models, or using one of the cloud providers' dedicated deployment options, gives you the strongest technical controls alongside standard contractual protections.

Don't rely on provider marketing — get the DPA signed, confirm data residency in writing, and document your GDPR compliance reasoning. Auditors want documentation, not assurances.
