Mid/Senior Data Engineer

Hexjobs ATS

Mid/Senior Data Engineer

CodiLime

+3 mehr

16500 - 27500 PLN

B2B

🐍 Python

💼 B2B

🤖 Airflow

PostgreSQL

☁️ Azure Databricks

Snowflake

SQL

Elasticsearch

Apache Spark

Must have

Snowflake
dbt
Apache Spark
Apache Airflow
Azure Data Factory
OOP
Docker
Kubernetes
CI/CD
SQL
Data Lake
Python
English

Nice to have

Azure Databricks
PostgreSQL
GitHub Actions
FastAPI
ETL
API Gateway
REST API
LLM
Azure OpenAI
Agentic AI

Requirements description

Do we have a match?

As a Data Engineer, you must meet the following criteria:

Strong experience with Snowflake and DBT (must-have)
Experience with data processing frameworks, such as Apache Spark (preferably on Azure Databricks)
Experience with orchestration tools like Apache Airflow, Azure Data Factory (ADF), or similar
Experience with Docker, Kubernetes, and CI/CD practices for data workflows
Strong SQL skills, including experience with query optimization
Experience with large-scale datasets
Very good understanding of data pipeline design concepts and approaches
Experience with data lake architectures for large-scale data processing and analytics
Strong Python coding skills

- Writing clean, scalable, and testable code (unit testing)

- Understanding and applying object-oriented programming (OOP)

Experience with version control systems: Git
Good knowledge of English (minimum C1 level)

Beyond the criteria above, we would appreciate the nice-to-haves:

Experience with PostgreSQL (ideally Azure Database for PostgreSQL)
Experience with GitHub Actions for CI/CD workflows
Experience with API Gateway, FastAPI (REST, async)
Experience with Azure AI Search or AWS OpenSearch
Familiarity with developing ETL/ELT processes (a plus)
Optional but valuable: familiarity with LLMs, Azure OpenAI, or Agentic AI system

Offer description

Get to know us better

CodiLime is a software and network engineering industry expert and the first-choice service partner for top global networking hardware providers, software providers and telecoms. We create proofs-of-concept, help our clients build new products, nurture existing ones and provide services in production environments. Our clients include both tech startups and big players in various industries and geographic locations (US, Japan, Israel, Europe).

While no longer a startup - we have 250+ people on board and have been operating since 2011 we’ve kept our people-oriented culture. Our values are simple:

Act to deliver.
Disrupt to grow.
Team up to win.

The project and the team

The goal of this project is to build a centralized, large-scale business data platform for one of the biggest global consulting firms. The final dataset must be enterprise-level, providing consultants with reliable, easily accessible information to help them quickly and effectively analyze company profiles during Mergers & Acquisitions (M&A) projects.

You will be involved in building data pipelines that ingest, clean, transform, and integrate large datasets from over 10 different data sources, creating a unified database of over 300 million company records. The data must be accurate, well-structured, and optimized for low-latency queries. The platform will support multiple internal applications, enabling efficient search across massive datasets and ensuring that your work has a direct impact on the entire organization.

The data will provide company- and site-level information, including firmographics, technographics, and hierarchical relationships (e.g., GU, DU, subsidiary, site). This platform will serve as a key data backbone for consultants, providing critical metrics such as revenue, CAGR, EBITDA, number of employees, acquisitions, divestitures, competitors, industry classification, web traffic, related brands, and more. Technology stack:

Languages: Python, SQL
Data Stack: Snowflake + DBT, PostgreSQL, Elasticsearch
Processing: Apache Spark on Azure Databricks
Workflow Orchestration: Apache Airflow
Cloud Platform: Microsoft Azure

- Compute / Orchestration: Azure Databricks (Spark clusters), Azure Kubernetes Service (AKS), Azure Functions, Azure API Management.

- Database & Storage: Azure Database for PostgreSQL, Azure Cosmos DB, Azure Blob Storage

- Security & Configuration: Azure Key Vault, Azure App Configuration, Azure Container Registry (ACR)

- Search & Indexing: Azure AI Search

CI/CD: GitHub Actions
Static Code Analysis: SonarQube
AI Integration (Future Phase): Azure OpenAI

What else you should know:

Team Structure:

Data Architecture Lead
Data Engineers
Backend Engineers
DataOps Engineers
Product Owner

Work culture:

Agile, collaborative, and experienced work environment.
As this project will significantly impact the organization, we expect a mature, proactive, and results-oriented approach.
You will work with a distributed team across Europe and India.

We work on multiple interesting projects at the time, so it may happen that we’ll invite you to an interview for another project if we see that your competencies and profile are well suited for it.

More reasons to join us

Flexible working hours and approach to work: fully remotely, in the office or hybrid
Professional growth supported by internal training sessions and a training budget
Solid onboarding with a hands-on approach to give you an easy start
A great atmosphere among professionals who are passionate about their work
The ability to change the project you work on

Your responsibilities

Data Pipeline Development: Designing, building, and maintaining scalable, end-to-end data pipelines for ingesting, cleaning, transforming, and integrating large structured and semi-structured datasets
Optimizing data collection, processing, and storage workflows
Conducting periodic data refresh processes (via data pipelines)
Building a robust ETL infrastructure using SQL technologies.
Assisting with data migration to a new platform
Automating manual workflows and optimizing data delivery

show all (24)

Bericht

Veröffentlicht	vor 21 Tagen
Läuft ab	in 27 Tagen
Art des Vertrags	B2B
Quelle