Alcor
What will you be doing?
We are open to seniors, mids, and juniors (with a correspondingly lower salary for less senior roles), but at least some initial experience with GPU performance and kernel optimization is a must-have.
Remote.
- Explore and analyze performance bottlenecks in ML training and inference.
- Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
- Implement programming solutions in C/C++ and Python.
- Deep dive into GPU performance optimizations to maximize efficiency and speed.
- Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, TensorRT. (This is optional but beneficial.)
Who are we looking for?
Qualifications:
- Strong programming skills in C/C++ and Python.
- Deep understanding of and experience in GPU performance optimizations.
- Proven experience with kernel optimizations on CUDA, ROCm, or other accelerators.
- General experience with the training and deployment of ML models.
- Experience with distributed systems development or distributed ML workloads.
- Bachelor's, Master's, or PhD degree in Computer Science, Electrical Engineering, or a related field.
Bonus Points:
- Experience with innovative OSS projects like FlashAttention, mlc-llm, vLLM, SGLang.
- Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, TensorRT.
| Published | 21 days ago |
| Expires | in 10 days |
| Employment type | Full-time |