Site Reliability Engineer

Caspian One

Remote work

Cracow
B2B
Permanent position
Java
Grafana
Prometheus
OpenTelemetry
CI/CD
Jenkins
Ansible
Linux
RDBMS
Control-M

Hexjobs Insights

Seeking a Site Reliability Engineer for a risk and analytics platform to ensure uptime, manage incident response, automate processes, and enhance observability across hybrid cloud environments.

We’re looking for a seasoned Site Reliability Engineer to support a high-performance, mission-critical risk and analytics platform used across global trading and finance environments. You’ll play a key role in ensuring the stability, scalability, and observability of complex distributed systems running across hybrid cloud infrastructure.

In this role, you’ll take ownership of production reliability: driving incident response, conducting root-cause analysis, improving monitoring capabilities, and delivering automation that reduces operational toil. You’ll work closely with development teams, platform engineers, and service management leads to strengthen resilience, refine processes, and enhance the engineering culture around availability and performance.

This is a hands-on technical position suited to someone who thrives in high-throughput environments, communicates clearly, and enjoys solving deep engineering problems in real time.

Core Responsibilities

Maintain and improve the reliability, uptime, and performance of distributed applications.
Lead incident response, triage complex issues, coordinate recoveries, and deliver structured post-incident reviews.
Enhance observability by designing and evolving monitoring, alerting, logging, and tracing frameworks.
Drive continuous improvement across automation, deployment processes, and service stability.
Collaborate with cross-functional teams to influence architecture, design, and operational standards.
Support CI/CD pipelines, environment configuration, and vulnerability remediation.
Contribute to a knowledge-driven culture through documentation, tooling, and best-practice adoption.

Required Skills & Experience

Strong Java background with proven experience supporting or developing distributed systems.
Observability tooling expertise (Grafana, Prometheus, Loki, OpenTelemetry, or similar).
Hands-on experience with hybrid cloud environments, ideally with GCP or another major cloud provider.
CI/CD and automation experience (e.g., Jenkins, Ansible).
Solid understanding of Linux, RDBMS fundamentals, and job schedulers (e.g., Control-M or equivalents).
Strong analytical mindset with a methodical approach to troubleshooting.
Excellent communication skills and comfort working in Agile teams.

Published: 16 days ago
Expires: in 2 months
Contract type: B2B, permanent position