
Senior Applied Research Engineer | Barcelona | Up to €150k
WILTSHIRE, United Kingdom
Apply by 19 Jul 2026
UK £150,000
Job Ref.: 57191
Job description
Senior Applied Research Engineer | Barcelona, Spain
We are partnered with a cutting-edge AI company shaping the future of enterprise decision-making. Founded by experienced technologists from leading research environments, the firm has developed a market-leading platform purpose-built for the structured data that underpins critical business decisions.
Backed by top-tier investors and trusted by some of the world’s largest organisations, the company helps enterprises unlock significant value by enabling more accurate, forward-looking decision-making.
You will work on novel technical challenges in large-scale model development and contribute to technology that is changing how major organisations operate. This is an opportunity to join a category-defining company at an early stage and help shape its trajectory.
Location & compensation
- Location: Barcelona, Spain
- Salary: Up to €150,000 (plus equity)
- Industry: Technology
Key responsibilities
- Profile end-to-end distributed training runs to identify bottlenecks across compute, GPU memory, and inter-GPU communication.
- Influence architectural decisions to improve efficiency and reliability of large-scale training jobs, including developing Triton/CUDA kernels when needed.
- Design and implement model scaling, parallelisation, and memory optimisation techniques for training workloads with very large context sizes.
- Collaborate closely with ML Researchers to diagnose architectural inefficiencies, ensure new research ideas scale efficiently in practice, and share internal knowledge on optimisation.
- Drive productionisation and serving of models from the research side, including improving inference efficiency via techniques such as quantisation.
Must have
- Strong understanding of modern ML architectures and large-scale training pipelines.
- Hands-on experience running distributed training jobs on multi-GPU systems.
- Advanced profiling and debugging skills covering CPU, GPU, memory usage, latency, and inter-GPU communication.
- Strong programming skills in Python.
- Experience with model scaling and parallelisation strategies, including tensor and pipeline parallelism.
Nice to have
- Familiarity with NCCL, MPI, and distributed communication primitives.
- Knowledge of PyTorch and Triton internals.
- Programming experience with C++ and CUDA.
Benefits
- Competitive compensation, including salary, equity, and comprehensive benefits.
- Relocation support for employees moving to join the team at an office location.
- A mission-driven, low-ego culture that values diversity of thought, ownership, and a bias towards action.