
AI for Natural Language to SQL

Let business users query databases in plain English without SQL knowledge. LLM-powered text-to-SQL handles schema injection, query validation, multi-table joins, and read-only guardrails to democratize data access safely.

Updated Apr 16, 2026 · 5 workflows · ~$0.5–$8 per 1,000 requests

Quick answer

The best text-to-SQL stack injects a compressed schema into the LLM context (Claude Sonnet 4 or GPT-4o), validates the generated SQL for safety (no writes, no system tables, result set limits), executes on a read replica, and returns results with an explanation. Cost runs $0.50-$3 per 1,000 queries; accuracy on single-table queries exceeds 90%, dropping to 70-85% on complex multi-join queries without schema enrichment.

The problem

The average data analyst receives 40-80 ad hoc data requests per month from business stakeholders, with 30-40% being simple queries that consume 2-4 hours of analyst time weekly. Companies with 100+ business users generate backlogs of 2-3 weeks for basic reporting questions. Meanwhile, giving non-technical users direct database access without guardrails creates serious data integrity and security risks — and most BI tools require training that 60% of users never complete.

Core workflows

Schema-Injected Query Generation

Compress and inject table schemas, column descriptions, and example values into the LLM prompt. Boosts accuracy on complex schemas from 60% to 85%+ by giving the model the context it needs to pick the right columns and join keys.

claude-sonnet-4 · vanna-ai · Architecture →
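A rough sketch of what schema injection looks like in practice (the table, column names, and prompt wording below are invented for illustration, not a fixed API):

```python
# Minimal schema-injection sketch: render tables, typed columns, inline
# descriptions, and a few example values into one compact prompt block.

def build_schema_block(tables: dict) -> str:
    lines = []
    for table, columns in tables.items():
        lines.append(f"TABLE {table} (")
        for col, (col_type, desc, examples) in columns.items():
            ex = ", ".join(map(str, examples[:3]))  # cap at 3 example values
            lines.append(f"  {col} {col_type}, -- {desc} (e.g. {ex})")
        lines.append(")")
    return "\n".join(lines)

def build_prompt(question: str, tables: dict) -> str:
    return (
        "You are a SQL generator. Use ONLY the tables below.\n\n"
        f"{build_schema_block(tables)}\n\n"
        f"Question: {question}\n"
        "Return a single read-only SELECT statement."
    )

# Illustrative catalog entry
tables = {
    "orders": {
        "order_id": ("INT", "primary key", [1001, 1002]),
        "cust_id": ("INT", "foreign key to customers", [17, 42]),
        "total_usd": ("NUMERIC", "order total in US dollars", [19.99, 250.0]),
    }
}
prompt = build_prompt("What was total revenue last month?", tables)
```

The column comments and example values are the part that moves accuracy: they let the model map "revenue" to `total_usd` instead of guessing.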

Query Validation and Safety Layer

Parse generated SQL through a validation layer before execution: block DDL/DML statements, enforce row limits (LIMIT 10000), restrict to approved schemas, and detect injection patterns. Zero unsafe queries reach the database.

gpt-4o-mini · sqlglot · Architecture →
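One way to sketch the safety gate with plain keyword checks; a production gate should lean on a real SQL parser such as sqlglot rather than regexes, and the blocked-keyword list and messages here are illustrative:

```python
import re

# Illustrative safety gate: accept only a single SELECT statement,
# block DML/DDL keywords and system schemas, and append a row limit.

BLOCKED = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)
SYSTEM_SCHEMAS = re.compile(r"\b(pg_catalog|information_schema)\.", re.IGNORECASE)

def validate_sql(sql: str, max_rows: int = 10_000) -> str:
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:
        raise ValueError("multiple statements are not allowed")
    if not stmt.upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    if BLOCKED.search(stmt):
        raise ValueError("DML/DDL keyword detected")
    if SYSTEM_SCHEMAS.search(stmt):
        raise ValueError("system schema access is not allowed")
    if not re.search(r"\bLIMIT\s+\d+\s*$", stmt, re.IGNORECASE):
        stmt = f"{stmt} LIMIT {max_rows}"  # rewrite to enforce the row cap
    return stmt
```

Keyword matching is a floor, not a ceiling: it will not catch every evasion, which is why this layer sits on top of a read-only database user rather than replacing one.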

Natural Language Results Explanation

After query execution, pass results + original question back to the LLM to generate a plain-English summary with key insights. Reduces 'what does this number mean?' follow-ups by 50%.

claude-haiku-3-5 · datasette · Architecture →

RAG-Powered Schema Discovery

For large schemas (500+ tables), use vector search to retrieve only the relevant tables and columns before injecting into the prompt. Reduces context size by 80% and improves model accuracy on large enterprise databases.

claude-sonnet-4 · weaviate · Architecture →
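The two-stage retrieval can be sketched with bag-of-words overlap standing in for embeddings; swap `score` for a real vector-store query (Weaviate, pgvector, etc.) in production. The catalog below is invented:

```python
import re

def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def score(question: str, doc: str) -> int:
    # stand-in for cosine similarity over embeddings
    return len(tokenize(question) & tokenize(doc))

def top_tables(question: str, catalog: dict, k: int = 3) -> list:
    ranked = sorted(
        catalog,
        key=lambda t: score(question, f"{t} {catalog[t]}"),
        reverse=True,
    )
    return ranked[:k]  # inject only these tables' DDL into the prompt

# Toy catalog: table name -> searchable description of its columns
catalog = {
    "orders": "order id, customer id, order total usd, order date",
    "customers": "customer id, name, signup date, region",
    "web_events": "session id, page url, event timestamp",
}
relevant = top_tables("total revenue by customer region", catalog, k=2)
```

The point of the two stages is that retrieval quality is cheap to improve independently of the generator: better table descriptions in the index lift accuracy without touching the prompt.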

Self-Healing Query Retry

When a generated query returns an execution error, feed the error message back to the LLM for a correction pass. Resolves 65-75% of syntax errors automatically, reducing user-facing failures to under 5%.

gpt-4o · langchain · Architecture →
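The retry loop itself is small. The sketch below stubs the LLM behind a `generate` callable and the database driver behind `execute`; both names are placeholders:

```python
def run_with_retry(question, generate, execute, max_attempts=3):
    """generate(question, last_error) -> sql; execute(sql) -> rows or raises."""
    error = None
    for _ in range(max_attempts):
        sql = generate(question, error)
        try:
            return execute(sql)
        except Exception as exc:  # feed the DB error back for a correction pass
            error = str(exc)
    raise RuntimeError(f"query failed after {max_attempts} attempts: {error}")

# Toy demonstration: the first attempt has a typo; the "model" fixes it
# once it sees the error message.
def fake_generate(question, last_error):
    return "SELECT 1" if last_error else "SELEC 1"

def fake_execute(sql):
    if sql != "SELECT 1":
        raise ValueError("syntax error near 'SELEC'")
    return [(1,)]

rows = run_with_retry("sanity check", fake_generate, fake_execute)
```

Capping attempts matters: correction passes cost a full LLM call each, so two retries is usually the economic ceiling before returning the error to the user.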

Top tools

  • vanna-ai
  • defog-ai
  • sqlglot
  • langchain
  • datasette
  • text2sql-studio

Top models

  • claude-sonnet-4
  • gpt-4o
  • gpt-4o-mini
  • gemini-2-0-flash

FAQs

How accurate is text-to-SQL on real-world enterprise databases?

Accuracy varies significantly by query complexity. On single-table SELECT queries, frontier models (GPT-4o, Claude Sonnet 4) achieve 90-95% execution accuracy with proper schema injection. Multi-table joins drop accuracy to 75-85%. Queries requiring business logic (e.g., 'active customers' defined as purchased in last 90 days) drop further to 60-75% without domain-specific schema annotations. The Spider and BIRD benchmarks show GPT-4o achieving ~87% on standardized multi-table benchmarks, but enterprise schemas with inconsistent naming and missing documentation typically score 10-20 points lower.

How do I prevent users from running destructive queries or accessing sensitive data?

Defense in depth: (1) Connect text-to-SQL to a read-only database user with no INSERT/UPDATE/DELETE/DROP permissions at the database level. (2) Parse every generated query with a SQL parser (sqlglot, pg_query) before execution to detect any DML or DDL statements and reject them. (3) Enforce hard row limits (LIMIT 10000) by rewriting the query if no limit is present. (4) Maintain an allowlist of schemas and tables the user role can access and validate the query only references allowed objects. (5) Log all generated queries with the originating user for audit purposes. Never rely solely on the LLM to self-censor dangerous queries.
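Step (4) can be sketched as a table-allowlist check. The FROM/JOIN regex below is illustrative and misses subqueries and CTEs, which is exactly why a parser like sqlglot is preferable in production:

```python
import re

# Extract referenced tables with a simple FROM/JOIN pattern and check
# them against a per-role allowlist (illustrative, not parser-grade).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def referenced_tables(sql: str) -> set:
    return {t.lower() for t in TABLE_REF.findall(sql)}

def check_allowlist(sql: str, allowed: set) -> None:
    extra = referenced_tables(sql) - allowed
    if extra:
        raise PermissionError(
            f"query touches non-allowlisted tables: {sorted(extra)}"
        )
```

Because this is layer (4) of five, a false negative here is caught by the read-only database user underneath it; the layers fail independently.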

What schema injection strategy works best for large databases?

For schemas under 50 tables, inject the full DDL for all tables. For 50-200 tables, inject DDL for all tables but truncate column descriptions and examples. For 200+ tables, use a two-stage approach: first retrieve the top 10-20 relevant tables via vector search on table/column names and descriptions, then inject only those table DDLs with full column descriptions. Critically, augment column names with semantic descriptions in SQL comments (e.g., `cust_acq_dt -- date customer first made a purchase`) and provide 2-3 example values per column — this alone improves accuracy by 15-25% on ambiguous schemas.

Should I use text-to-SQL or a BI tool for business user self-service?

Text-to-SQL is best for exploratory, ad hoc queries where users ask questions in natural language that they couldn't express as a drag-and-drop query. Traditional BI tools (Looker, Tableau, Metabase) are better for standardized, recurring reports with established metrics definitions, role-based dashboards, and governed metric layers. The optimal architecture for most companies combines both: a semantic layer (dbt metrics, Looker LookML) defining approved business metrics, with a text-to-SQL interface that translates natural language into queries against that semantic layer — combining governance with flexibility.

How do I handle ambiguous questions where multiple SQL translations are valid?

Surface ambiguity to the user before executing. If the question 'show me top customers' could mean top by revenue, by order count, or by recency, have the LLM ask a clarifying question first. Alternatively, generate 2-3 candidate queries with plain-English explanations and let the user confirm which interpretation is correct. For repeated ambiguities, build a feedback loop where users' selections train a few-shot example library that makes future disambiguation automatic. Track the 20 most common ambiguities in your query logs and add explicit schema annotations to resolve them at the source.
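The feedback loop can be as simple as a keyed store of confirmed interpretations, reused as few-shot examples on repeat questions. The in-memory dict and lowercase normalization below are placeholder choices; production storage would be a database table:

```python
import re
from typing import Dict, Optional

class DisambiguationLibrary:
    """Store user-confirmed SQL per normalized question for few-shot reuse."""

    def __init__(self) -> None:
        self._examples: Dict[str, str] = {}

    @staticmethod
    def _key(question: str) -> str:
        # collapse whitespace and case so near-identical asks share a key
        return re.sub(r"\s+", " ", question.strip().lower())

    def record_choice(self, question: str, chosen_sql: str) -> None:
        self._examples[self._key(question)] = chosen_sql

    def few_shot(self, question: str) -> Optional[str]:
        # a hit is prepended to the prompt as a worked example next time
        return self._examples.get(self._key(question))
```

Exact-key matching only covers repeats; pairing this store with the same vector search used for schema discovery extends it to paraphrases.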

What is the latency profile of a text-to-SQL pipeline?

A typical pipeline has three latency contributors: (1) schema retrieval via vector search — 50-150ms. (2) LLM query generation — 500-2000ms for Claude Sonnet 4 or GPT-4o. (3) Query execution on the database — 50ms to 30+ seconds depending on query complexity and data volume. For most business intelligence queries, total P50 latency is 1-4 seconds, which is acceptable for an analytical context. If you need sub-second response, pre-generate and cache SQL for the 50 most common question templates, falling back to live generation for novel questions. Use streaming to show the generated SQL immediately while the database query runs.
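The cached fast path might look like this; the cache is a plain dict here (Redis or similar in production), and `generate_live` stands in for the LLM call:

```python
def answer(question, cache, generate_live):
    """Serve cached SQL for known questions, fall back to live generation."""
    key = question.strip().lower()
    if key in cache:
        return cache[key], "cache"    # sub-second path
    sql = generate_live(question)     # 0.5-2s LLM call in production
    cache[key] = sql                  # warm the cache for next time
    return sql, "live"

# Toy demonstration with a stubbed LLM
calls = []
def fake_llm(question):
    calls.append(question)
    return "SELECT count(*) FROM orders"

cache = {"how many orders?": "SELECT count(*) FROM orders"}
sql, source = answer("How many orders?", cache, fake_llm)
```

Caching generated SQL (rather than results) keeps answers fresh: the cached query re-executes against live data, so only the generation latency is skipped.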

Related architectures