8+ years of experience in DevOps, Platform Engineering, or Site Reliability Engineering, with at least 2 years focused on MLOps/LLMOps
Deep hands-on expertise with AWS services, including Bedrock, S3, EC2, EKS, RDS/PostgreSQL, ECR, IAM, Lambda, Step Functions, and CloudWatch
Production experience managing Kubernetes workloads in EKS, including GPU workloads, autoscaling, resource quotas, and multi-tenant configurations
Proficiency with containerization and orchestration (Docker, Kubernetes), secrets management, and implementing GitOps-style deployments using Jenkins, ArgoCD, FluxCD, or similar tools
Practical understanding of deploying and scaling LLMs (e.g., GPT- and Claude-family models), including prompt engineering, latency/performance tradeoffs, and model evaluation (a minimal invocation sketch follows this list)
Strong programming skills in Python (FastAPI, Django, Pydantic, boto3, Pandas, NumPy) with solid computer science fundamentals (performance, concurrency, data structures)
Working knowledge of Machine Learning techniques and frameworks (e.g., scikit-learn, TensorFlow, PyTorch)
Experience building and operating data pipelines designed for idempotency, retries, backfills, and reproducibility
Expertise in Infrastructure as Code (IaC) using Terraform, CloudFormation, and Helm
Proven track record designing and maintaining CI/CD pipelines with GitLab CI, Jenkins, or similar tools
Observability experience with Prometheus/Grafana, Splunk, Datadog, Loki/Promtail, OpenTelemetry, and Sentry, including implementing sensible alerting strategies
Strong grasp of networking, security concepts, and Linux systems administration
Excellent communication skills and the ability to collaborate across development, QA, operations, and product teams
Self-motivated and proactive, with a strong sense of ownership and a passion for removing friction and improving the developer experience
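To make the Bedrock and boto3 expectations above concrete, here is a minimal sketch of an LLM invocation with client-side retries; the region, model ID, and prompt are illustrative placeholders, not requirements of the role:

```python
import json

import boto3
from botocore.config import Config

# Minimal sketch: invoke a Claude-family model on Bedrock with adaptive
# client-side retries. Region, model ID, and prompt are placeholders.
bedrock = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
    config=Config(retries={"max_attempts": 5, "mode": "adaptive"}),
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Summarize our deployment runbook."}],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```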
Nice to Have:
Experience with distributed compute frameworks such as Dask, Spark, or Ray
Familiarity with NVIDIA Triton, TorchServe, or other inference servers
Experience with ML experiment tracking platforms like Weights & Biases, MLflow, or Kubeflow (see the tracking sketch after this list)
FinOps best practices and cost attribution strategies for multi-tenant ML infrastructure
Exposure to multi-region and multi-cloud designs, including dataset replication strategies, compute placement, and latency optimization
Experience with LakeFS, Apache Iceberg, or Delta Lake for data versioning and lakehouse architectures
Knowledge of data transformation tools such as dbt
Experience with data pipeline orchestration tools like Airflow or Prefect
Familiarity with Snowflake or other cloud data warehouses
Understanding of responsible AI practices, model governance, and compliance frameworks
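As an illustration of the experiment-tracking item above, a minimal MLflow sketch; the tracking URI, experiment name, parameters, and metric values are hypothetical:

```python
import mlflow

# Minimal sketch of run tracking against a hypothetical MLflow server.
mlflow.set_tracking_uri("http://mlflow.internal:5000")  # placeholder endpoint
mlflow.set_experiment("llm-eval-baseline")

with mlflow.start_run(run_name="prompt-v2"):
    mlflow.log_param("model_id", "anthropic.claude-3-haiku")  # example param
    mlflow.log_param("temperature", 0.2)
    mlflow.log_metric("eval_accuracy", 0.87)  # placeholder metric
    mlflow.log_artifact("eval_report.json")   # assumes this file exists locally
```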
Offer description
The Role:
As a Senior MLOps/LLMOps Engineer, you will be at the forefront of building and scaling our AI/ML infrastructure, bridging the gap between cutting-edge large language models and production-ready systems. You will play a pivotal role in designing, deploying, and operating the platforms that power our AI-driven products, working at the intersection of DevOps, MLOps, and emerging LLM technologies.
In this role, you'll architect robust, scalable infrastructure for deploying and monitoring large language models (LLMs) such as GPT- and Claude-family models in AWS Bedrock and Azure AI Foundry, while ensuring security, observability, and reliability across multi-tenant ML workloads. You will collaborate closely with data scientists, ML engineers, platform teams, and product stakeholders to create seamless, self-serve experiences that accelerate AI innovation across the organization.
This is a hands-on leadership role that blends strategic thinking with deep technical execution. You'll own the end-to-end ML platform lifecycle, from infrastructure provisioning and CI/CD automation to model deployment, monitoring, and cost optimization. As a senior technical leader, you'll champion best practices, mentor team members, and drive a culture of continuous improvement, experimentation, and operational excellence.
Your responsibilities
Run and evolve our ML/LLM compute infrastructure on Kubernetes/EKS (CPU and GPU) for multi-tenant workloads, ensuring portability across AWS and Azure AI Foundry regions with region-aware scheduling, cross-region data access, and artifact management
Engage with platform and infrastructure teams to provision and maintain access to cloud environments (AWS, Azure), ensuring seamless integration with existing systems
Set up and maintain deployment workflows for LLM-powered applications, handling environment-specific configurations across development, staging/UAT, and production (see the configuration sketch after this list)
Build and operate GitOps-native delivery pipelines using GitLab CI, Jenkins, ArgoCD, Helm, and FluxCD to enable fast, safe rollouts and automated rollbacks
Deploy, scale, and optimize large language models (GPT, Claude, and similar) with deep consideration for prompt engineering, latency/performance tradeoffs, and cost efficiency
Operate and maintain Argo Workflows as a reliable, self-serve orchestration platform for data preparation, model training, evaluation, and large-scale batch compute
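As a sketch of the environment-specific configuration work mentioned above, assuming pydantic-settings v2; all setting names and defaults are illustrative:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

# Minimal sketch: one settings class promoted unchanged across environments,
# with values supplied by APP_-prefixed environment variables per deployment.
class AppSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="APP_")

    environment: str = "development"  # development | staging | production
    bedrock_region: str = "us-east-1"
    model_id: str = "anthropic.claude-3-haiku-20240307-v1:0"  # example ID
    max_tokens: int = 512
    request_timeout_s: float = 30.0

settings = AppSettings()  # e.g. APP_ENVIRONMENT=production overrides the default
print(settings.environment, settings.model_id)
```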