Senior Data Engineer

Webellian Sp. z o.o.

Warsaw, Masovian
Hybrid
B2B
Data Engineering
SQL
PostgreSQL
Databricks
PySpark
Apache Airflow
Python
Data Governance
Data Integration
ETL

Hexjobs Insights

Role: Senior Data Engineer. Responsibilities include building data pipelines, maintaining Databricks workflows, and architecting PostgreSQL models. Requirements: 6+ years of experience, SQL, Databricks, Python. Hybrid work in Warsaw.

Your responsibilities

  • Design and build scalable data pipelines for ingestion, transformation, and serving of structured and unstructured data — supporting both batch and real-time AI workloads.
  • Develop and maintain Databricks-based data processing workflows: Delta Lake table management, PySpark transformations, notebook orchestration, and Unity Catalog governance.
  • Architect and optimise PostgreSQL data models: schema design, indexing strategies, partitioning, query performance tuning, and integration patterns for AI service consumption.
  • Build and maintain data orchestration workflows using Apache Airflow, Databricks Workflows, or equivalent — ensuring reliable scheduling, dependency management, and failure recovery (see the first sketch after this list).
  • Implement data quality frameworks: validation rules, anomaly detection, data contracts, and automated alerting on pipeline health and data freshness (see the second sketch after this list).
  • Design and manage feature engineering pipelines: transforming raw data into ML-ready feature sets, integrating with feature stores, and versioning feature definitions.
  • Own data integration patterns between operational PostgreSQL databases and the Databricks lakehouse: CDC (Change Data Capture), event-driven ingestion via Kafka, and batch export strategies.
  • Implement data governance standards: lineage tracking, cataloguing, access control, PII handling, data retention policies, and audit logging.
  • Collaborate with ML Engineers to design and deliver data pipelines supporting model training, batch inference, and real-time feature serving.
  • Monitor and operate data infrastructure: pipeline observability dashboards, SLA tracking, incident response, and root-cause analysis for data issues.
  • Champion Claude Code as an active daily tool for pipeline development, SQL generation, data exploration scripting, and documentation.
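
A first sketch of the orchestration work above: a minimal Apache Airflow DAG with daily scheduling, explicit task dependencies, and retry-based failure recovery. It is illustrative only: the dag_id, task names, and callables are hypothetical, and an Airflow 2.4+ deployment is assumed.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        ...  # e.g. pull a batch from an operational PostgreSQL database

    def transform():
        ...  # e.g. clean and reshape the batch with pandas or PySpark

    def load():
        ...  # e.g. write the result to a Delta Lake bronze table

    with DAG(
        dag_id="orders_daily",              # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                  # reliable scheduling
        catchup=False,
        default_args={
            "retries": 3,                   # failure recovery: retry each task
            "retry_delay": timedelta(minutes=5),
        },
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Dependency management: tasks run in order; downstream tasks do not
        # run if an upstream task ultimately fails.
        extract_task >> transform_task >> load_task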
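
And a second sketch, for the data-quality alerting above: a PySpark freshness check that fails the pipeline run when data goes stale. The table path, column name, and two-hour threshold are hypothetical assumptions, and timestamps are assumed to be stored as naive UTC.

    from datetime import datetime, timedelta

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical Delta table of cleaned orders.
    orders = spark.read.format("delta").load("/lake/silver/orders")

    # Freshness rule: the newest record must be under two hours old.
    latest = orders.agg(F.max("order_ts").alias("latest_ts")).first()["latest_ts"]

    if latest is None or latest < datetime.utcnow() - timedelta(hours=2):
        # Raising makes the orchestrator mark the run as failed and alert on it.
        raise RuntimeError(f"orders table is stale; latest record: {latest}")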

Our requirements

  • 6+ years of professional data engineering experience, with a strong track record of delivering production data pipelines at scale.
  • Expert-level SQL and strong PostgreSQL expertise: advanced query optimisation, schema design, indexing, partitioning, and understanding of MVCC and connection management.
  • Strong Databricks experience: Delta Lake, PySpark, Databricks Workflows, Unity Catalog, and performance tuning of large-scale Spark jobs.
  • Proficiency in Python for data pipeline development: pandas, PySpark, data validation libraries (Great Expectations or equivalent), and scripting for automation.
  • Experience with data orchestration frameworks: Apache Airflow, Databricks Workflows, or equivalent DAG-based scheduling tools.
  • Solid understanding of data integration patterns: CDC with Debezium or equivalent, Kafka-based event streaming, and batch ingestion strategies.
  • Hands-on experience with data lakehouse architecture: medallion architecture (Bronze/Silver/Gold), Delta Lake ACID transactions, and table optimisation (see the sketch after this list).
  • Experience implementing data quality frameworks and data contracts in production pipelines.
  • Familiarity with Azure data services: Azure Data Factory, Azure Event Hubs, Azure Data Lake Storage, or equivalent cloud-native data tooling.
  • Hands-on proficiency with Claude Code: using it daily for pipeline development, SQL authoring, data exploration, and documentation tasks.
  • Strong communication skills: able to collaborate with data consumers (ML Engineers, analysts, product teams) to understand requirements and deliver reliable data products.
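
For the lakehouse requirement above, a minimal Bronze/Silver/Gold flow on Delta Lake with PySpark. This is a sketch under stated assumptions: the storage paths, column names, and aggregation are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Bronze: raw ingested events, kept as delivered.
    bronze = spark.read.format("delta").load("/lake/bronze/orders")

    # Silver: deduplicated, typed, and filtered to valid records.
    silver = (
        bronze
        .dropDuplicates(["order_id"])
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .filter(F.col("order_id").isNotNull())
    )
    silver.write.format("delta").mode("overwrite").save("/lake/silver/orders")

    # Gold: aggregated and serving-ready, e.g. for analysts or ML features.
    gold = silver.groupBy("customer_id").agg(F.count("*").alias("order_count"))
    gold.write.format("delta").mode("overwrite").save("/lake/gold/customer_order_counts")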

What we offer

  • Contract under Polish law: B2B or Umowa o Pracę (employment contract)
  • Benefits such as private medical care, group insurance, Multisport card
  • English classes available
  • Hybrid work (at least 1 day/week on-site) in Warsaw (Mokotów)
  • Opportunity to work with excellent professionals
  • High standards of work and a focus on code quality
  • New technologies in use
  • Continuous learning and growth
  • International team
  • Pinball, PlayStation & much more (on-site)

Published: 3 days ago
Expires: in 27 days
Contract type: B2B
Work mode: Hybrid
