SkilDock
Ideal for career switchers · For experienced engineers

Data Engineering

Go from developer to production data engineer in 14 weeks.

Duration: 14 weeks
Sessions: 21
Labs: 14
Projects: 3

What You'll Be Able To Do

After completing this course, you will confidently:

  • Design and implement batch ETL pipelines that process millions of records with Apache Spark
  • Orchestrate complex data workflows with Apache Airflow using DAGs, sensors, and dynamic task mapping
  • Build real-time streaming pipelines with Apache Kafka and Spark Structured Streaming
  • Transform and model data using dbt with testing, documentation, and incremental materialization
  • Design data warehouse schemas in Snowflake and BigQuery using dimensional modeling techniques
  • Implement data quality checks with Great Expectations and automated pipeline monitoring
  • Apply the medallion architecture (bronze, silver, gold) for organizing data lake storage
  • Deploy data pipelines with Docker and manage infrastructure with Terraform

What You'll Build

Real portfolio projects that showcase your skills to employers.

1. Batch ETL Pipeline

Build an ETL pipeline with Airflow and Spark that ingests raw data from S3, cleans and transforms it, loads into a Snowflake data warehouse, and runs data quality checks with Great Expectations.

Apache Spark · Apache Airflow · Snowflake · Great Expectations · Docker

Interview value:

Batch ETL is the bread and butter of data engineering interviews. This project demonstrates Spark, Airflow, and data quality skills together.

2. Real-Time Streaming Pipeline

Design a streaming pipeline that consumes events from Kafka, processes them with Spark Structured Streaming, computes real-time aggregations, and writes results to both a data warehouse and a Redis cache for dashboards.

Apache Kafka · Spark Streaming · Redis · PostgreSQL · Docker

Interview value:

Streaming is among the most in-demand skills in data engineering. This project shows you can build real-time systems with proper windowing and state management.

3. Modern Data Platform (Capstone)

Build a complete data platform using medallion architecture — bronze layer (raw ingestion), silver layer (cleaned and enriched), gold layer (business-ready aggregations). Includes dbt transformations, Airflow orchestration, data quality gates, and a monitoring dashboard.

dbt · Apache Airflow · Apache Spark · Snowflake · Docker Compose

Interview value:

The capstone demonstrates platform-level thinking — the ability to design an entire data architecture, not just individual pipelines.

Course Curriculum

14 weeks of structured, hands-on learning.

Week 1: Data Engineering Fundamentals
  • Data engineering landscape — roles, responsibilities, and career paths
  • Data pipeline architectures — ETL, ELT, and streaming
  • Data modeling fundamentals — star schema, snowflake schema, OBT
  • Python for data engineering — file handling, generators, and parallelism
Lab: Python Data Pipeline — CSV to PostgreSQL (Docker Lab)
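As a taste of the generator-based pipeline pattern covered in Week 1, here is a minimal sketch: stream a CSV row by row, clean each record, and load it into a database. The lab targets PostgreSQL; this sketch substitutes SQLite from the standard library so it runs anywhere, and the `read_rows` function and `sales` table are illustrative, not part of the lab materials.

```python
import csv
import io
import sqlite3

def read_rows(fileobj):
    """Yield cleaned rows lazily, so large files never load fully into memory."""
    for row in csv.DictReader(fileobj):
        yield {"name": row["name"].strip(), "amount": float(row["amount"])}

# Stand-in for a real CSV file on disk or in S3.
raw = io.StringIO("name,amount\n alice ,10.5\nbob,3.0\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (name, amount) VALUES (:name, :amount)",
    read_rows(raw),  # the generator streams straight into the INSERTs
)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 13.5
```

Swapping `sqlite3` for `psycopg2` (and a real connection string) turns this same shape into the CSV-to-PostgreSQL lab pipeline.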
Week 2: SQL for Data Engineers
  • Advanced SQL — window functions, CTEs, and recursive queries
  • Query optimization — EXPLAIN plans, indexes, and partitioning
  • DDL design — constraints, triggers, and materialized views
  • Data quality SQL patterns — null checks, uniqueness, referential integrity
Lab: Advanced SQL — Analytics Queries (Docker Lab)
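Window functions are the centerpiece of Week 2. A quick illustration of what `PARTITION BY` plus `ORDER BY` buys you, here run against SQLite from Python so it is self-contained (the `orders` table and its rows are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, order_day INTEGER, amount REAL);
INSERT INTO orders VALUES
  ('alice', 1, 10.0), ('alice', 2, 5.0), ('bob', 1, 7.0), ('bob', 3, 2.0);
""")
# PARTITION BY restarts the window per customer; ORDER BY inside the window
# makes SUM cumulative instead of a single group total.
rows = conn.execute("""
    SELECT customer, order_day,
           SUM(amount) OVER (PARTITION BY customer ORDER BY order_day) AS running_total
    FROM orders
    ORDER BY customer, order_day
""").fetchall()
for r in rows:
    print(r)
```

Unlike `GROUP BY`, every input row survives: each order keeps its own line and gains a running total alongside it.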
Week 3: Apache Spark Fundamentals
  • Spark architecture — driver, executors, and the DAG scheduler
  • RDDs, DataFrames, and the Spark SQL engine
  • Transformations vs actions and lazy evaluation
  • Reading and writing data — CSV, Parquet, JSON, and Delta
Lab: Spark DataFrame Operations (Docker Lab)
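The transformations-vs-actions distinction from Week 3 has an exact analogue in plain Python generators, which is a useful mental model before touching Spark itself. This is an analogy, not Spark code: defining the pipeline reads no data; only the terminal "action" pulls records through.

```python
def transform(records):
    # "Transformation": composes a lazy pipeline; nothing executes yet.
    return (r * 2 for r in records if r % 2 == 0)

log = []
def tracked(n):
    """Source that records which inputs were actually read."""
    for i in range(n):
        log.append(i)
        yield i

pipeline = transform(tracked(6))
assert log == []           # lazy: building the pipeline touched no data

result = sum(pipeline)     # "Action": sum() finally drives execution
print(result)              # 0*2 + 2*2 + 4*2 = 12
```

Spark's `filter` and `map` behave like the generator expression, and `count` or `collect` play the role of `sum` here.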
Week 4: Spark Advanced — Joins, Aggregations & Optimization
  • Join strategies — broadcast, sort-merge, and shuffle hash
  • Aggregations, window functions, and UDFs in Spark
  • Partition tuning, bucketing, and predicate pushdown
  • Spark UI — understanding stages, tasks, and shuffle metrics
Lab: Spark ETL Pipeline — Large Dataset Processing (Docker Lab)
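The broadcast join from Week 4 boils down to one idea: ship the small table everywhere as a hash map and stream the large table past it, so no shuffle is needed. A toy sketch in plain Python (the `countries` lookup and order records are invented for illustration):

```python
# Small dimension table: in Spark this is what gets broadcast to every executor.
countries = {"US": "United States", "DE": "Germany"}

def stream_orders():
    """Large fact table: streamed row by row, never materialized at once."""
    yield {"order_id": 1, "country": "US"}
    yield {"order_id": 2, "country": "DE"}
    yield {"order_id": 3, "country": "FR"}  # no matching dimension row

joined = [
    {**order, "country_name": countries[order["country"]]}
    for order in stream_orders()
    if order["country"] in countries        # inner join drops non-matches
]
print(joined)
```

Each "executor" does a local hash lookup per row; contrast this with a sort-merge join, where both sides must be shuffled and sorted on the join key first.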
Week 5: Apache Airflow Fundamentals
  • Airflow architecture — scheduler, web server, workers, and metadata DB
  • DAG design — tasks, dependencies, and trigger rules
  • Operators — BashOperator, PythonOperator, and custom operators
  • Connections, variables, and XCom for task communication
Lab: Airflow DAG — Orchestrate ETL Pipeline (Docker Lab)
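At its core, the DAG scheduling covered in Week 5 is a topological sort over task dependencies. A minimal stand-in using Python's standard library (the task names mirror a typical ETL DAG; Airflow's scheduler does this at scale, with retries, trigger rules, and parallelism on top):

```python
from graphlib import TopologicalSorter

# Each key lists the tasks it depends on, the same relationship Airflow's
# extract >> transform >> quality_check >> load syntax expresses.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"quality_check"},
}
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'transform', 'quality_check', 'load']
```

With a branching graph, `TopologicalSorter` also reveals which tasks are independent and can run in parallel, which is exactly what Airflow workers exploit.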
Week 6: Airflow Advanced Patterns
  • Dynamic task mapping and TaskFlow API
  • Sensors — file sensors, external task sensors, and custom sensors
  • Error handling — retries, alerts, SLAs, and callbacks
  • Airflow best practices — idempotency, testing, and deployment
Lab: Airflow Advanced — Dynamic DAGs & Monitoring (Docker Lab)
Week 7: Apache Kafka & Streaming Fundamentals
  • Kafka architecture — brokers, topics, partitions, and replication
  • Producers and consumers — serialization, deserialization, offsets
  • Consumer groups, partition assignment, and exactly-once semantics
  • Schema Registry and Avro for data contracts
Lab: Kafka Producer & Consumer Pipeline (Docker Lab)
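Two Kafka ideas from Week 7, key-based partitioning and committed offsets, can be sketched without a broker at all. This toy model (the `Topic` and `Consumer` classes are invented for illustration, not Kafka's API) shows why same-key messages stay ordered and why a consumer resumes where it left off:

```python
from collections import defaultdict

class Topic:
    """Toy stand-in for a Kafka topic: one append-only log per partition."""
    def __init__(self, partitions=2):
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Same-key messages always hash to the same partition,
        # which is what preserves per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p

class Consumer:
    """Tracks a committed offset per partition, like a consumer group does."""
    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self, partition):
        log = self.topic.partitions[partition]
        msgs = log[self.offsets[partition]:]
        self.offsets[partition] = len(log)  # commit: next poll resumes here
        return msgs

topic = Topic()
p = topic.produce("user-1", "click")
topic.produce("user-1", "purchase")
consumer = Consumer(topic)
print(consumer.poll(p))   # ['click', 'purchase']
print(consumer.poll(p))   # [] (offset already committed)
```

A real consumer group adds coordination: partitions are divided among group members, and offsets are committed back to the broker rather than held in memory.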
Week 8: Spark Structured Streaming
  • Structured Streaming — micro-batch and continuous processing
  • Windowed aggregations — tumbling, sliding, and session windows
  • State management and watermarking for late data
  • Kafka to Spark Streaming to data warehouse pipeline
Lab: Real-Time Pipeline — Kafka to Spark Streaming (Docker Lab)
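Tumbling windows and watermarks from Week 8 reduce to simple arithmetic, which is worth seeing before Spark hides it behind `window()` and `withWatermark()`. A plain-Python sketch (window and watermark sizes are arbitrary example values):

```python
WINDOW = 10      # seconds per tumbling window
WATERMARK = 5    # tolerate events up to 5 seconds late

windows = {}     # window start time -> running event count
max_event_time = 0

def process(event_time):
    """Assign an event to its tumbling window; drop events past the watermark."""
    global max_event_time
    max_event_time = max(max_event_time, event_time)
    if event_time < max_event_time - WATERMARK:
        return False  # too late: its window may already be finalized
    start = (event_time // WINDOW) * WINDOW
    windows[start] = windows.get(start, 0) + 1
    return True

for t in [1, 3, 12, 14, 2]:   # the event at t=2 arrives 12 seconds late
    process(t)
print(windows)  # {0: 2, 10: 2}; the late event was dropped
```

The watermark is what lets a streaming engine discard per-window state: once `max_event_time - WATERMARK` passes a window's end, that window can be emitted and forgotten.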
Week 9: Cloud Data Warehousing — Snowflake & BigQuery
  • Snowflake architecture — virtual warehouses, storage, and caching
  • BigQuery architecture — serverless, slots, and partitioned tables
  • Loading data — COPY INTO, external stages, and streaming ingest
  • Cost optimization — warehouse sizing, clustering, and query profiling
Lab: Snowflake Data Loading & Query Optimization (Docker Lab)
Week 10: dbt — Data Transformation
  • dbt project structure — models, sources, and seeds
  • Materializations — table, view, incremental, and ephemeral
  • Testing — schema tests, custom tests, and data contracts
  • Documentation, lineage graphs, and dbt exposures
Lab: dbt Transformation Layer — Medallion Architecture (Docker Lab)
Week 11: Data Quality & Testing
  • Great Expectations — expectations, checkpoints, and data docs
  • Data quality dimensions — completeness, accuracy, consistency, timeliness
  • Pipeline testing — unit tests, integration tests, and contract tests
  • Data observability and anomaly detection
Lab: Data Quality Pipeline with Great Expectations (Docker Lab)
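The expectation-and-checkpoint pattern from Week 11 is easy to sketch in plain Python. These two checks (names and result shape are invented for illustration, not Great Expectations' API) cover the completeness and uniqueness dimensions, and the final gate is what a checkpoint enforces before data moves downstream:

```python
def expect_not_null(rows, column):
    """Completeness check: no missing values in the column."""
    failed = [r for r in rows if r.get(column) is None]
    return {"check": f"{column} not null", "success": not failed,
            "failed_rows": len(failed)}

def expect_unique(rows, column):
    """Uniqueness check: no duplicate values in the column."""
    values = [r[column] for r in rows]
    return {"check": f"{column} unique", "success": len(values) == len(set(values)),
            "failed_rows": len(values) - len(set(values))}

rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": None},          # fails the not-null check
    {"id": 2, "email": "c@x.com"},     # duplicate id fails the uniqueness check
]
results = [expect_not_null(rows, "email"), expect_unique(rows, "id")]
passed = all(r["success"] for r in results)
print(passed)  # False: this batch would be blocked at the quality gate
```

In a pipeline, `passed` is what decides whether the Airflow task succeeds or raises, stopping bad data before it reaches the warehouse.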
Week 12: Data Lake & Medallion Architecture
  • Data lake vs data warehouse vs data lakehouse
  • Medallion architecture — bronze (raw), silver (cleaned), gold (aggregated)
  • Delta Lake — ACID transactions, time travel, and schema evolution
  • File formats — Parquet, ORC, Avro, and Delta comparison
Lab: Medallion Architecture on Data Lake (Docker Lab)
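The bronze/silver/gold layering from Week 12 can be shown end to end with a handful of records. The data and cleaning rules here are invented for illustration; in the lab the same steps run as Spark jobs over files in the lake:

```python
# Bronze: raw events exactly as ingested, duplicates and bad records included.
bronze = [
    {"user": "alice", "amount": "10.5"},
    {"user": "alice", "amount": "10.5"},   # duplicate
    {"user": "bob", "amount": "oops"},     # unparsable amount
    {"user": "bob", "amount": "4.0"},
]

# Silver: deduplicated, types enforced, bad records filtered out.
seen, silver = set(), []
for r in bronze:
    try:
        amount = float(r["amount"])
    except ValueError:
        continue                           # in practice, quarantine for review
    key = (r["user"], r["amount"])
    if key not in seen:
        seen.add(key)
        silver.append({"user": r["user"], "amount": amount})

# Gold: business-ready aggregation, one row per user.
gold = {}
for r in silver:
    gold[r["user"]] = gold.get(r["user"], 0.0) + r["amount"]
print(gold)  # {'alice': 10.5, 'bob': 4.0}
```

Keeping bronze immutable is the key design choice: if a silver cleaning rule turns out to be wrong, the layer can be rebuilt from raw data rather than lost.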
Week 13: Deployment & Monitoring
  • Dockerizing data pipelines and services
  • CI/CD for data pipelines — dbt CI, Airflow deployment
  • Pipeline monitoring — SLA tracking, alerting, and runbooks
  • Terraform for data infrastructure provisioning
Lab: Docker + Terraform Data Pipeline Deployment (Docker Lab)
Week 14: Capstone Project & Interview Preparation
  • End-to-end capstone project execution and presentation
  • Data engineering interview patterns — SQL, Spark, system design
  • Common data engineering case studies and whiteboard exercises
  • Portfolio presentation and resume optimization
Lab: Capstone — Modern Data Platform (Docker Lab)

Hands-On Labs Included

You build these yourself — guided exercises with real tools, not passive demos.

Spark DataFrame Operations
Docker Lab · 2.5 hours · Apache Spark, Python, PySpark

Airflow DAG — Orchestrate ETL Pipeline
Docker Lab · 2.5 hours · Apache Airflow, Python, Docker

Kafka Producer & Consumer Pipeline
Docker Lab · 2 hours · Apache Kafka, Python, Docker

Real-Time Pipeline — Kafka to Spark Streaming
Docker Lab · 3 hours · Spark Streaming, Kafka, Docker

dbt Transformation Layer — Medallion Architecture
Docker Lab · 2.5 hours · dbt, PostgreSQL, Docker

Data Quality Pipeline with Great Expectations
Docker Lab · 2 hours · Great Expectations, Python, Docker

Who Is This For?

Career Switchers

Moving from another domain into tech? The structured curriculum and real-world projects bridge the gap between theory and what employers actually look for.

Working Professionals

Already in tech and looking to upskill? Deepen your expertise with production-grade labs and system design patterns used at top companies.

Ideal If You Are:

  • A software developer who wants to move into data engineering
  • A data analyst who wants to build pipelines instead of just querying data
  • A career switcher with programming experience entering the data space
  • A BI developer who wants to modernize their skills with Spark, Airflow, and dbt

Prerequisites

  • Basic Python programming (functions, classes, file handling)
  • SQL proficiency (SELECT, JOIN, GROUP BY, subqueries)
  • A laptop with at least 16 GB RAM for Spark and Kafka Docker environments
  • No prior experience with Spark, Airflow, or Kafka required

Career Support Included

We don't just teach you — we help you land the job.

Mock Interviews

Practice with real-world interview scenarios. Get feedback on technical depth, communication, and problem-solving approach.

Resume Review

One-on-one review sessions to craft a resume that highlights your projects, skills, and achievements the right way.

Portfolio Coaching

Guidance on presenting your course projects as professional portfolio pieces that stand out to hiring managers.

LinkedIn Optimization

Tips and templates to optimize your LinkedIn profile so recruiters find you and reach out.

Learn from Industry Practitioners

Our instructors are working professionals who build production systems daily. They bring real-world experience, battle-tested patterns, and the kind of practical insight that textbooks can't teach.

Course Details

Format: Live Online
Duration: 14 weeks
Schedule: 21 sessions
Batch Size: Max 15 students
Certificate: Yes, on completion
Lab Setup: Docker-based (runs on your laptop)
Price: Enquire for pricing

Frequently Asked Questions

Will I get a job after completing this program?

Data engineering is one of the fastest-growing specializations in tech. Every company building data infrastructure or AI systems needs data engineers. Our curriculum covers the exact tools and practices hiring managers evaluate — Spark, Airflow, Kafka, dbt, and cloud warehouses. While we cannot guarantee placement, graduates are well-prepared for data engineering interviews.

Do I need experience with Spark or Kafka?

No. We teach every tool from fundamentals. You need basic Python and SQL skills, but all Spark, Airflow, Kafka, and dbt concepts are taught from the ground up.

Is this different from the Data Science course?

Yes. Data science focuses on building ML models and statistical analysis. Data engineering focuses on building the pipelines that collect, clean, and deliver data to those models. Think of it this way: data engineers build the roads, data scientists drive the cars.

Do I need cloud accounts for Snowflake or BigQuery?

Snowflake offers a free trial that covers the course labs. BigQuery offers a generous free tier. Most other labs run entirely in Docker on your local machine at zero cost.

Is 16 GB RAM really necessary?

We strongly recommend 16 GB for running Spark and Kafka clusters locally in Docker. With 8 GB you can complete most labs but may need to reduce cluster sizes for some exercises.

What if I miss a live session?

All sessions are recorded and available on the student portal within 24 hours. The instructor and TAs are available on Slack for questions.

Ready to Start Your Data Engineering Journey?

Talk to us to learn about upcoming batches, pricing, and payment plans.