
Design an ETL Pipeline

Design a complete ETL pipeline architecture with transformation logic, error handling, and monitoring.

The Prompt

prompt.txt
Design an ETL pipeline for the following data flow. Provide:
1. Pipeline architecture diagram (text-based)
2. Extract: source systems, extraction method, frequency
3. Transform: cleaning, joining, business logic rules
4. Load: target destination, load strategy (full/incremental/upsert)
5. Error handling: what to do with malformed or missing data
6. Monitoring: what to alert on
7. Technology recommendations with trade-off notes
8. Code skeleton for the transform step

Data pipeline details:
- Source: [WHERE DATA COMES FROM]
- Target: [WHERE DATA SHOULD GO]
- Transformation needed: [WHAT NEEDS TO HAPPEN TO THE DATA]
- Frequency: [BATCH / NEAR-REAL-TIME / STREAMING]
- Volume: [ROWS PER DAY / GB]
- Tech constraints: [EXISTING STACK OR PREFERENCES]

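Item 8 asks for a transform-step skeleton. A minimal sketch in Python of what that step might look like (column names, keys, and the cents-to-dollars rule are hypothetical, chosen to illustrate cleaning, deduplication, and dead-lettering of malformed rows):

```python
from datetime import datetime, timezone

def transform(raw_rows):
    """Clean, deduplicate, and apply business rules to extracted rows.

    Expects dicts with hypothetical keys: 'user_id', 'email', 'amount_cents'.
    Returns (clean_rows, rejected_rows) so bad records can be dead-lettered
    instead of failing the whole batch.
    """
    clean, rejected = [], []
    seen = set()
    for row in raw_rows:
        # Error handling: route malformed rows to a reject list with a reason.
        if not row.get("user_id") or row.get("amount_cents") is None:
            rejected.append({"row": row, "reason": "missing required field"})
            continue
        # Deduplicate on the natural key, keeping the first occurrence.
        if row["user_id"] in seen:
            continue
        seen.add(row["user_id"])
        clean.append({
            "user_id": row["user_id"],
            "email": (row.get("email") or "").strip().lower(),
            "amount_usd": row["amount_cents"] / 100,  # business rule: cents to dollars
            "loaded_at": datetime.now(timezone.utc).isoformat(),
        })
    return clean, rejected
```

Returning the rejects alongside the clean rows keeps the pipeline idempotent and gives the monitoring step (item 6) a reject count to alert on.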
Example Output

Architecture: PostgreSQL → dbt transform → BigQuery → Looker.
Extraction: daily cron job using pg_dump → GCS staging bucket.
Transform: 4 dbt models with incremental materialization, handling SCD Type 2 for user attributes.
Load: BigQuery MERGE statement for upsert.
Alert on: row count drop >10%, transform failures, load latency >30 min.
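The row-count alert in the example output boils down to a threshold check against the previous run. A sketch in Python (function and parameter names are illustrative, with the 10% drop threshold from the example as the default):

```python
def should_alert(prev_count, curr_count, max_drop_pct=10.0):
    """Return True when the current run's row count dropped more than
    max_drop_pct relative to the previous run's count."""
    if prev_count <= 0:  # no baseline yet: never alert on the first run
        return False
    drop_pct = (prev_count - curr_count) / prev_count * 100
    return drop_pct > max_drop_pct

# A 15% drop fires the alert; a 5% drop does not.
print(should_alert(1_000_000, 850_000))  # True
print(should_alert(1_000_000, 950_000))  # False
```

In practice the previous count would come from a pipeline-metadata table and the alert would go to whatever the stack already uses (e.g. a Slack webhook or PagerDuty), but the comparison itself stays this simple.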

FAQ

Which AI model is best for Design an ETL Pipeline?

Claude Sonnet 4 — excellent at data engineering architecture and code scaffolding.

How do I use the Design an ETL Pipeline prompt?

Copy the prompt, replace the [BRACKETED] placeholders with your specific information, and paste into your preferred AI assistant (ChatGPT, Claude, Gemini, etc.).