Senior Platform Reliability Engineer (Remote)

DataArt Poland

Kraków (+4 other locations)

19,000-21,500 PLN / month (B2B)

bash

debian

🚢 kubernetes

linux

🐍 python

ubuntu

💼 B2B

Client

Our client is a division of a global business and financial news and information company. It is a leading market index provider and the owner and distributor of multiple financial services: a dynamic information network with data, news, and analytics covering cash, derivatives markets, money markets, government and municipal bonds, currencies, commodities, mortgages, indices, insurance, and legal information.

Join a great company, not merely an individual project

Position overview

We’re seeking a Senior Platform Reliability Engineer to keep our Kubernetes-centric provisioning and Linux estate running smoothly. You’ll coordinate fixes when OS builds or upgrades hit exceptions, working across teams to find root causes from logs and metrics and to recommend changes.

You’ll automate repetitive work (Bash/Python), strengthen runbooks and observability, and document configurations and procedures. You’ll partner with hands-on engineers and architects in a highly technical, delivery-focused environment.

Responsibilities

Operate and improve a Kubernetes-centric, open-source platform across provisioning and maintenance workflows.
Coordinate resolution of exceptions in a multi-stage (≈10) provisioning pipeline; engage the right owners with clear, actionable context.
Build and maintain automation and runbooks (Bash/Python) to reduce toil and increase reliability.
Lead triage, log analysis, and root-cause investigation to minimize downtime.
Enhance observability (metrics/logs/traces) and promote SLO-oriented practices.
Operate and tune distributed data stores (e.g., Cassandra) and platform services.
Evolve OS/network provisioning (PXE boot, Subiquity, Foreman, imaging) and server management (BMCs, multi-NIC).
Partner with platform teams to improve automation, performance, security, and cost efficiency.
Document system configurations, procedures, and changes for repeatability.

Requirements

Strong Linux administration and troubleshooting (Ubuntu/Debian preferred).
Production experience with Kubernetes (or similar orchestrator).
Hands-on network/OS provisioning (PXE, Foreman, Subiquity, imaging) and server hardware management (BMCs, multiple NICs).
Proficiency in scripting (Bash, Python) for automation and diagnostics.
Ability to debug across the stack (infrastructure, workloads, automation, networks) and deliver RCA.
Experience with distributed databases (Cassandra or similar).
Familiarity with runbooks, incident management, and SRE/reliability practices.
Clear communicator and process facilitator: knows whom to engage, what signals to collect, and how to drive issues to closure.
CI/CD and IaC mindset (Git and pipelines; Terraform/Ansible a plus).

Nice to have

Observability stacks (Prometheus, Grafana, ELK/EFK, OpenTelemetry).
Workflow systems and retry logic (Argo Workflows, Jenkins).
Python for internal tooling (Go a plus).
Distributed systems fundamentals (consistency, replication, partition tolerance).
Experience operating Cassandra at scale.
Experience with Agile development methodologies.
Experience working with foreign clients.

Published	about 18 hours ago
Expires	in 27 days
Contract type	B2B
