Job Description
What success looks like in this role:
· End-to-End Pipeline Engineering: Build and automate robust ETL/ELT pipelines using Azure Data Factory (ADF), AWS Glue, and Apache Airflow.
· Distributed Computing: Develop large-scale data processing jobs using PySpark and Scala within Databricks or EMR environments.
· Streaming & Real-time Integration: Design and implement real-time data ingestion and processing layers using Apache Kafka, Confluent, or AWS Kinesis.
· Data Lakehouse: Manage and optimize cloud storage using ADLS Gen2 and S3, implementing ACID transactions with Delta Lake or Apache Iceberg.
· Advanced Data Modeling: Design highly performant schemas for cloud data warehouses like Snowflake, Amazon Redshift, or Google BigQuery.
· Data Transformation & Quality: Use dbt (data build tool) for modeling and implement automated quality checks using Great Expectations or Soda.
· Infrastructure & CI/CD: Deploy and manage data infras...