Senior Site Reliability Engineer (Remote)

DCG

Warsaw, Towarowa
26 000 - 29 500 PLN
Remote
B2B
Terraform
🚢 Kubernetes
☁️ AWS
CI/CD
GitHub
CDN
DNS
WAF
Cloudflare
Full-time

As a recruitment company, DCG understands that every business is powered by experienced professionals. Our management style and partnership approach enable us to meet your needs and provide continuous support. Due to our ongoing growth and the large number of recruitment projects we undertake for our partners, we are currently looking for:

Senior Site Reliability Engineer

Responsibilities:
Design, implement, and scale resilient infrastructure across multiple AWS accounts
Manage Kubernetes workloads with Helm, ArgoCD, and Terraform, ensuring smooth, auditable deployments
Collaborate with product and platform teams to drive SRE best practices (SLIs, SLOs, error budgets)
Improve observability with Dynatrace and open-source monitoring tools
Optimize Cloudflare WAF, caching, and routing rules to ensure secure, low-latency user experiences
Automate infrastructure, deployments, and routine tasks using GitHub Actions and scripting (Python/Bash)
Lead incident response and postmortems, turning learnings into measurable improvements
Work cross-functionally in English with international teams across Europe and the US

Requirements:
5+ years of DevOps/SRE experience managing production workloads in AWS
Strong skills in Terraform, Helm, ArgoCD, and GitHub Actions
Deep understanding of Kubernetes (EKS preferred), including autoscaling, rollout strategies, and cluster troubleshooting
Experience with cost optimization and capacity planning
Ability to build and maintain observability pipelines (logs, metrics, traces, SLOs, error budgets)
Proven ability to design fault-tolerant systems with high availability and performance
Solid understanding of CI/CD pipelines and GitOps principles
Comfortable optimizing Cloudflare rulesets and understanding DNS, WAF, and CDN flows
Hands-on experience with monitoring and alerting tools (Dynatrace, Prometheus, Grafana, etc.)
Clear English communication and ability to collaborate with distributed, multicultural teams
Strong incident response experience: on-call participation, postmortems, and RCA writing
Curious, pragmatic, and driven by reliability and continuous improvement

Nice to have:
An example you can share of defining or improving SLOs/SLIs in a way that reduced alert fatigue or downtime
Contributions to automation or observability improvements through open-source or internal tooling
Ability to automate toil and reduce operational overhead
Experience leading reliability reviews or driving postmortem culture across teams
Passion for metrics, resilience engineering, and teaching SRE concepts to others

Offer:
Private medical care
Training & learning opportunities

Published: 4 days ago
Expires in: 26 days
Contract type: B2B
Work mode: Remote