Data Engineering
Go from developer to production data engineer in 14 weeks.
- Duration: 14 weeks
- Sessions: 21
- Labs: 14
- Projects: 3
What You'll Be Able To Do
After completing this course, you will confidently:
- Design and implement batch ETL pipelines that process millions of records with Apache Spark
- Orchestrate complex data workflows with Apache Airflow using DAGs, sensors, and dynamic task mapping
- Build real-time streaming pipelines with Apache Kafka and Spark Structured Streaming
- Transform and model data using dbt with testing, documentation, and incremental materialization
- Design data warehouse schemas in Snowflake and BigQuery using dimensional modeling techniques
- Implement data quality checks with Great Expectations and automated pipeline monitoring
- Apply the medallion architecture (bronze, silver, gold) for organizing data lake storage
- Deploy data pipelines with Docker and manage infrastructure with Terraform
What You'll Build
Real portfolio projects that showcase your skills to employers.
Batch ETL Pipeline
Build an ETL pipeline with Airflow and Spark that ingests raw data from S3, cleans and transforms it, loads it into a Snowflake data warehouse, and runs data quality checks with Great Expectations.
Interview value:
Batch ETL is the bread and butter of data engineering interviews. This project demonstrates Spark, Airflow, and data quality skills together.
Real-Time Streaming Pipeline
Design a streaming pipeline that consumes events from Kafka, processes them with Spark Structured Streaming, computes real-time aggregations, and writes results to both a data warehouse and a Redis cache for dashboards.
Interview value:
Streaming is one of the highest-demand skills in data engineering. This project shows you can build real-time systems with proper windowing and state management.
Modern Data Platform (Capstone)
Build a complete data platform using the medallion architecture: bronze layer (raw ingestion), silver layer (cleaned and enriched), gold layer (business-ready aggregations). Includes dbt transformations, Airflow orchestration, data quality gates, and a monitoring dashboard.
Interview value:
The capstone demonstrates platform-level thinking: the ability to design an entire data architecture, not just individual pipelines.
Course Curriculum
14 weeks of structured, hands-on learning.
1. Data Engineering Fundamentals
- Data engineering landscape – roles, responsibilities, and career paths
- Data pipeline architectures – ETL, ELT, and streaming
- Data modeling fundamentals – star schema, snowflake schema, OBT
- Python for data engineering – file handling, generators, and parallelism (see the sketch below)
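Generators are the centerpiece of the Python module: they let a pipeline stream through files far larger than memory. A minimal sketch of the pattern using only the standard library (the file name `orders.csv` is a hypothetical placeholder):

```python
import csv
from pathlib import Path

def read_rows(path: Path):
    """Yield one parsed row at a time so the whole file never sits in memory."""
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            yield row

def valid_orders(rows):
    """Lazily filter rows; nothing is read until the pipeline is consumed."""
    for row in rows:
        if row.get("order_id"):
            yield row

# Chaining generators costs no extra memory, no matter how large the file is.
total = sum(1 for _ in valid_orders(read_rows(Path("orders.csv"))))
print(f"valid rows: {total}")
```

This lazy-chaining property is exactly what Spark's lazy evaluation generalizes to a cluster later in the course.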
2. SQL for Data Engineers
- Advanced SQL – window functions, CTEs, and recursive queries (see the sketch below)
- Query optimization – EXPLAIN plans, indexes, and partitioning
- DDL design – constraints, triggers, and materialized views
- Data quality SQL patterns – null checks, uniqueness, referential integrity
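To make the window-function material concrete, here is a runnable taste using Python's built-in sqlite3 module (window functions need SQLite 3.25+, which ships with modern Python builds; the table and rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 20);
""")

# A CTE plus a window function: each order alongside its customer's running total.
query = """
WITH ranked AS (
    SELECT customer,
           amount,
           SUM(amount) OVER (PARTITION BY customer ORDER BY amount) AS running_total
    FROM orders
)
SELECT * FROM ranked;
"""
for row in conn.execute(query):
    print(row)
```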
3. Apache Spark Fundamentals
- Spark architecture – driver, executors, and the DAG scheduler
- RDDs, DataFrames, and the Spark SQL engine
- Transformations vs actions and lazy evaluation (see the sketch below)
- Reading and writing data – CSV, Parquet, JSON, and Delta
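The key mental model is lazy evaluation: transformations only build an execution plan, and nothing runs until an action fires. A minimal PySpark sketch, assuming pyspark is installed:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lazy-eval-demo").getOrCreate()

df = spark.createDataFrame(
    [("widget", 3), ("gadget", 5), ("widget", 2)], ["product", "qty"]
)

# Transformations are lazy: this line builds a plan but executes nothing.
totals = df.groupBy("product").agg(F.sum("qty").alias("total_qty"))

# An action (show/collect/count) triggers the DAG scheduler to run the plan.
totals.show()
```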
4. Spark Advanced – Joins, Aggregations & Optimization
- Join strategies – broadcast, sort-merge, and shuffle hash (see the sketch below)
- Aggregations, window functions, and UDFs in Spark
- Partition tuning, bucketing, and predicate pushdown
- Spark UI – understanding stages, tasks, and shuffle metrics
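Broadcast joins are the classic first optimization: ship the small dimension table to every executor instead of shuffling the large fact table. A sketch with toy tables (real workloads would read these from storage):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-demo").getOrCreate()

facts = spark.createDataFrame([(1, 9.99), (2, 5.00)], ["product_id", "price"])
dims = spark.createDataFrame([(1, "widget"), (2, "gadget")], ["product_id", "name"])

# Broadcasting the small table avoids a shuffle of the large one.
joined = facts.join(broadcast(dims), "product_id")
joined.explain()  # the physical plan should show a BroadcastHashJoin
joined.show()
```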
5. Apache Airflow Fundamentals
- Airflow architecture – scheduler, web server, workers, and metadata DB
- DAG design – tasks, dependencies, and trigger rules
- Operators – BashOperator, PythonOperator, and custom operators
- Connections, variables, and XCom for task communication (see the sketch below)
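A minimal two-task DAG in the Airflow 2.x style, passing a value between tasks through XCom. The task names and row count are illustrative, and the `schedule` argument assumes Airflow 2.4+ (older releases call it `schedule_interval`):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(ti):
    # Push a value to XCom for the downstream task.
    ti.xcom_push(key="row_count", value=42)

def load(ti):
    count = ti.xcom_pull(task_ids="extract", key="row_count")
    print(f"loaded {count} rows")

with DAG(
    dag_id="demo_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2  # dependency: extract runs before load
```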
6. Airflow Advanced Patterns
- Dynamic task mapping and TaskFlow API (see the sketch below)
- Sensors – file sensors, external task sensors, and custom sensors
- Error handling – retries, alerts, SLAs, and callbacks
- Airflow best practices – idempotency, testing, and deployment
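The same ideas in TaskFlow style, with dynamic task mapping fanning out one task instance per input at runtime (requires Airflow 2.3+; the file list is a hypothetical stand-in for something like an S3 listing):

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False)
def taskflow_mapping_demo():
    @task
    def list_files():
        # Hypothetical file list; in practice this might come from an S3 hook.
        return ["a.csv", "b.csv", "c.csv"]

    @task
    def process(filename: str):
        print(f"processing {filename}")

    # Dynamic task mapping: one mapped task instance per file at runtime.
    process.expand(filename=list_files())

taskflow_mapping_demo()
```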
7. Apache Kafka & Streaming Fundamentals
- Kafka architecture – brokers, topics, partitions, and replication
- Producers and consumers – serialization, deserialization, offsets (see the sketch below)
- Consumer groups, partition assignment, and exactly-once semantics
- Schema Registry and Avro for data contracts
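A minimal produce-and-consume round trip with the kafka-python client, assuming a broker at localhost:9092 like the one the Docker labs provide (topic name and payload are illustrative):

```python
import json

from kafka import KafkaConsumer, KafkaProducer

# Producer: serialize dicts to JSON bytes and publish to a topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 1, "amount": 9.99})
producer.flush()

# Consumer: join a consumer group and read from the earliest offset.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="order-readers",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.offset, message.value)
    break  # demo: read a single message and stop
```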
8. Spark Structured Streaming
- Structured Streaming – micro-batch and continuous processing
- Windowed aggregations – tumbling, sliding, and session windows (see the sketch below)
- State management and watermarking for late data
- Kafka to Spark Streaming to data warehouse pipeline
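A hedged sketch of the core streaming pattern: read events from Kafka, aggregate over tumbling event-time windows, and bound state with a watermark. It assumes the spark-sql-kafka connector is on the classpath and sinks to the console rather than a warehouse for brevity:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Read a stream of events from Kafka (broker address matches the Docker labs).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .select(
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp"),
    )
)

# Tumbling 5-minute windows; the watermark bounds state for late-arriving data.
counts = (
    events.withWatermark("timestamp", "10 minutes")
    .groupBy(F.window("timestamp", "5 minutes"))
    .count()
)

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```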
9. Cloud Data Warehousing – Snowflake & BigQuery
- Snowflake architecture – virtual warehouses, storage, and caching
- BigQuery architecture – serverless, slots, and partitioned tables
- Loading data – COPY INTO, external stages, and streaming ingest (see the sketch below)
- Cost optimization – warehouse sizing, clustering, and query profiling
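To make COPY INTO concrete, a sketch that bulk-loads Parquet files from a named external stage with the Snowflake Python connector. Every identifier, credential, and stage path here is a placeholder:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder connection parameters; substitute your own account details.
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

# Bulk-load Parquet files from a named external stage into a raw table.
conn.cursor().execute("""
    COPY INTO raw_orders
    FROM @landing_stage/orders/
    FILE_FORMAT = (TYPE = 'PARQUET')
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")
conn.close()
```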
10. dbt – Data Transformation
- dbt project structure – models, sources, and seeds
- Materializations – table, view, incremental, and ephemeral
- Testing – schema tests, custom tests, and data contracts
- Documentation, lineage graphs, and dbt exposures
11. Data Quality & Testing
- Great Expectations – expectations, checkpoints, and data docs (see the sketch below)
- Data quality dimensions – completeness, accuracy, consistency, timeliness
- Pipeline testing – unit tests, integration tests, and contract tests
- Data observability and anomaly detection
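The flavor of an expectation check, shown with Great Expectations' older pandas-convenience API (pre-1.0; current GX releases use a Context-and-checkpoint workflow instead, but the concept is identical):

```python
import great_expectations as ge
import pandas as pd

df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 5.00, 12.50]})

# Wrap the DataFrame so expectation methods become available on it.
gdf = ge.from_pandas(df)

# Declare expectations; each returns a result object with a `success` flag.
r1 = gdf.expect_column_values_to_not_be_null("order_id")
r2 = gdf.expect_column_values_to_be_between("amount", min_value=0, max_value=1000)

print(r1.success, r2.success)
```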
12. Data Lake & Medallion Architecture
- Data lake vs data warehouse vs data lakehouse
- Medallion architecture – bronze (raw), silver (cleaned), gold (aggregated)
- Delta Lake – ACID transactions, time travel, and schema evolution (see the sketch below)
- File formats – Parquet, ORC, Avro, and Delta comparison
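A sketch of Delta Lake's versioning and time travel in PySpark, assuming the Delta jars are available to Spark (for example via the delta-spark package); the table path is illustrative:

```python
from pyspark.sql import SparkSession

# Register the Delta extensions with the Spark session.
spark = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

df = spark.createDataFrame([(1, "widget")], ["id", "name"])
df.write.format("delta").mode("overwrite").save("/tmp/bronze/orders")

# Each append creates a new table version; ACID guarantees come from the Delta log.
spark.createDataFrame([(2, "gadget")], ["id", "name"]).write.format(
    "delta"
).mode("append").save("/tmp/bronze/orders")

# Time travel: read the table as it looked at an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/bronze/orders")
v0.show()
```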
13. Deployment & Monitoring
- Dockerizing data pipelines and services
- CI/CD for data pipelines – dbt CI, Airflow deployment
- Pipeline monitoring – SLA tracking, alerting, and runbooks
- Terraform for data infrastructure provisioning
14. Capstone Project & Interview Preparation
- End-to-end capstone project execution and presentation
- Data engineering interview patterns – SQL, Spark, system design
- Common data engineering case studies and whiteboard exercises
- Portfolio presentation and resume optimization
Hands-On Labs Included
You build these yourself – guided exercises with real tools, not passive demos.
Spark DataFrame Operations
Docker Lab · 2.5 hours
Airflow DAG – Orchestrate ETL Pipeline
Docker Lab · 2.5 hours
Kafka Producer & Consumer Pipeline
Docker Lab · 2 hours
Real-Time Pipeline – Kafka to Spark Streaming
Docker Lab · 3 hours
dbt Transformation Layer – Medallion Architecture
Docker Lab · 2.5 hours
Data Quality Pipeline with Great Expectations
Docker Lab · 2 hours
Who Is This For?
Career Switchers
Moving from another domain into tech? The structured curriculum and real-world projects bridge the gap between theory and what employers actually look for.
Working Professionals
Already in tech and looking to upskill? Deepen your expertise with production-grade labs and system design patterns used at top companies.
Ideal If You Are:
- Software developers who want to move into data engineering
- Data analysts who want to build pipelines instead of just querying data
- Career switchers with programming experience entering the data space
- BI developers who want to modernize their skills with Spark, Airflow, and dbt
Prerequisites
- Basic Python programming (functions, classes, file handling)
- SQL proficiency (SELECT, JOIN, GROUP BY, subqueries)
- A laptop with at least 16 GB RAM for Spark and Kafka Docker environments
- No prior experience with Spark, Airflow, or Kafka required
Career Support Included
We don't just teach you – we help you land the job.
Mock Interviews
Practice with real-world interview scenarios. Get feedback on technical depth, communication, and problem-solving approach.
Resume Review
One-on-one review sessions to craft a resume that highlights your projects, skills, and achievements the right way.
Portfolio Coaching
Guidance on presenting your course projects as professional portfolio pieces that stand out to hiring managers.
LinkedIn Optimization
Tips and templates to optimize your LinkedIn profile so recruiters find you and reach out.
Learn from Industry Practitioners
Our instructors are working professionals who build production systems daily. They bring real-world experience, battle-tested patterns, and the kind of practical insight that textbooks can't teach.
Course Details
| Detail | Value |
|---|---|
| Format | Live Online |
| Duration | 14 weeks |
| Schedule | 21 sessions |
| Batch Size | Max 15 students |
| Certificate | Yes, on completion |
| Lab Setup | Docker-based (runs on your laptop) |
| Price | Enquire for pricing |
Frequently Asked Questions
Will I get a job after completing this program?
Data engineering is one of the fastest-growing specializations in tech. Every company building data infrastructure or AI systems needs data engineers. Our curriculum covers the exact tools and practices hiring managers evaluate – Spark, Airflow, Kafka, dbt, and cloud warehouses. While we cannot guarantee placement, graduates are well-prepared for data engineering interviews.
Do I need experience with Spark or Kafka?
No. We teach every tool from fundamentals. You need basic Python and SQL skills, but all Spark, Airflow, Kafka, and dbt concepts are taught from the ground up.
Is this different from the Data Science course?
Yes. Data science focuses on building ML models and statistical analysis. Data engineering focuses on building the pipelines that collect, clean, and deliver data to those models. Think of it this way: data engineers build the roads, data scientists drive the cars.
Do I need cloud accounts for Snowflake or BigQuery?
Snowflake offers a free trial that covers the course labs. BigQuery offers a generous free tier. Most other labs run entirely in Docker on your local machine at zero cost.
Is 16 GB RAM really necessary?
We strongly recommend 16 GB for running Spark and Kafka clusters locally in Docker. With 8 GB you can complete most labs but may need to reduce cluster sizes for some exercises.
What if I miss a live session?
All sessions are recorded and available on the student portal within 24 hours. The instructor and TAs are available on Slack for questions.
Explore Related Courses
Continue your learning journey with these complementary courses.
Data Science & Machine Learning
Go from spreadsheet analyst to ML engineer in 12 weeks.
Cloud DevOps Engineering
Go from zero to production-ready DevOps engineer in 18 weeks.
Data Analytics Accelerator
Go from Excel user to data analyst in 8 weeks.
Ready to Start Your Data Engineering Journey?
Talk to us to learn about upcoming batches, pricing, and payment plans.