Site Reliability Engineer (Remote)

We're looking for a seasoned Site Reliability Engineer to support a high-performance, mission-critical risk and analytics platform used across global trading and finance environments. You'll play a key role in ensuring the stability, scalability, and observability of complex distributed systems running across hybrid cloud infrastructure.

In this role, you'll take ownership of production reliability: driving incident response, conducting root-cause analysis, improving monitoring capabilities, and delivering automation that reduces operational toil. You'll work closely with development teams, platform engineers, and service management leads to strengthen resilience, refine processes, and enhance the engineering culture around availability and performance.

This is a hands-on technical position suited to someone who thrives in high-throughput environments, communicates clearly, and enjoys solving deep engineering problems in real time.

Core Responsibilities

- Maintain and improve the reliability, uptime, and performance of distributed applications.
- Lead incident response, triage complex issues, coordinate recoveries, and deliver structured post-incident reviews.
- Enhance observability by designing and evolving monitoring, alerting, logging, and tracing frameworks.
- Drive continuous improvement across automation, deployment processes, and service stability.
- Collaborate with cross-functional teams to influence architecture, design, and operational standards.
- Support CI/CD pipelines, environment configuration, and vulnerability remediation.
- Contribute to a knowledge-driven culture through documentation, tooling, and best-practice adoption.

Required Skills & Experience

- Strong Java background with proven experience supporting or developing distributed systems.
- Observability tooling expertise (Grafana, Prometheus, Loki, OpenTelemetry, or similar).
- Hands-on experience with hybrid cloud environments, ideally with GCP or another major cloud provider.
- CI/CD and automation experience (e.g., Jenkins, Ansible).
- Solid understanding of Linux, RDBMS fundamentals, and job schedulers (e.g., Control-M or equivalents).
- Strong analytical mindset with a methodical approach to troubleshooting.
- Excellent communication skills and comfort working in Agile teams.

Published	16 days ago
Expires	in 2 months
Contract type	B2B, permanent employment