
Senior Applied Research Engineer | Barcelona | Up to €150k
Barcelona, Spain
Apply by 16 May 2026
€85,000 - €150,000 per annum
Job Ref.: BH-57191
Job Type: Permanent
Job Description
We are partnered with a cutting-edge AI company shaping the future of enterprise decision-making. Founded by experienced technologists from leading research environments, the firm has developed a market-leading platform purpose-built for the structured data that underpins critical business decisions. Backed by top-tier investors and trusted by some of the world’s largest organisations, the company helps enterprises unlock significant value by enabling more accurate, forward-looking decision-making.
You will work on novel technical challenges in large-scale model development and contribute to technology that is changing how major organisations operate. This is an opportunity to join a category-defining company at an early stage and help shape its trajectory.
Location & Compensation
- Location: Barcelona, Spain (hybrid working)
- Salary: Up to €150,000 (plus equity)
- Industry: Technology
Responsibilities
- Profile end-to-end distributed training runs to identify bottlenecks across compute, GPU memory, and inter-GPU communication.
- Influence architectural decisions to improve efficiency and reliability of large-scale training jobs, including developing Triton/CUDA kernels when needed.
- Design and implement model scaling, parallelisation, and memory optimisation techniques for training workloads with very large context sizes.
- Collaborate closely with ML Researchers to diagnose architectural inefficiencies, ensure new research ideas scale efficiently in practice, and share internal knowledge on optimisation.
- Drive productionisation and serving of models from the research side, including improving inference efficiency via techniques such as quantisation.
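As a flavour of the inference-efficiency work mentioned above, here is a minimal, dependency-free sketch of symmetric int8 post-training quantisation. The helper names and example values are hypothetical, purely for illustration; production systems would use a framework's quantisation tooling rather than this:

```python
# Symmetric int8 post-training quantisation, illustrated: map floats into
# [-127, 127] with a single shared scale, then dequantise back.

def quantise_int8(values):
    """Quantise a list of floats to int8 codes with a shared symmetric scale."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantise_int8(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [x * scale for x in q]

weights = [0.02, -1.5, 0.75, 3.0, -0.33]
q, scale = quantise_int8(weights)
approx = dequantise_int8(q, scale)

# Round-to-nearest bounds the error by half a quantisation step.
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

The single-scale scheme shown here trades accuracy for simplicity; per-channel scales and calibration data are the usual next refinements.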
Requirements
- Strong understanding of modern ML architectures and large-scale training pipelines.
- Hands-on experience running distributed training jobs on multi-GPU systems.
- Advanced profiling and debugging across CPU, GPU, memory usage, latency, and inter-GPU communication.
- Strong programming skills in Python.
- Experience with model scaling and parallelisation strategies, including tensor and pipeline parallelism.
- Familiarity with NCCL, MPI, and distributed communication primitives.
- Knowledge of PyTorch and Triton internals.
- Programming experience with C and CUDA.
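To give a flavour of the distributed communication primitives listed above, the pattern behind an NCCL-style all-reduce can be sketched as a toy single-process ring simulation. This is pure Python with hypothetical names, not real multi-GPU code; real jobs would call `torch.distributed.all_reduce` or NCCL directly:

```python
# Toy simulation of a ring all-reduce across p "ranks", each holding a
# vector. Phase 1 (reduce-scatter) leaves each rank with one fully summed
# chunk; phase 2 (all-gather) circulates the completed chunks.

def ring_allreduce(rank_buffers):
    """Sum-all-reduce across simulated ranks using the ring algorithm."""
    p = len(rank_buffers)
    n = len(rank_buffers[0])
    assert n % p == 0, "vector length must split evenly into p chunks"
    chunk = n // p
    bufs = [list(b) for b in rank_buffers]

    # Phase 1: reduce-scatter. In step s, rank r passes chunk (r - s) mod p
    # to its ring neighbour, which accumulates it.
    for s in range(p - 1):
        for r in range(p):
            dst = (r + 1) % p
            c = (r - s) % p
            for i in range(c * chunk, (c + 1) * chunk):
                bufs[dst][i] += bufs[r][i]

    # Phase 2: all-gather. Each rank forwards its completed chunk
    # (r + 1 - s) mod p around the ring until every rank has every chunk.
    for s in range(p - 1):
        for r in range(p):
            dst = (r + 1) % p
            c = (r + 1 - s) % p
            bufs[dst][c * chunk:(c + 1) * chunk] = bufs[r][c * chunk:(c + 1) * chunk]
    return bufs

# Four simulated ranks, each holding a 4-element vector.
ranks = [[float(r * 4 + i) for i in range(4)] for r in range(4)]
result = ring_allreduce(ranks)
expected = [sum(col) for col in zip(*ranks)]
assert all(buf == expected for buf in result)
```

The ring layout is what makes the primitive bandwidth-optimal: each rank sends roughly 2n/p elements per step regardless of the number of ranks.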
Benefits
- Competitive compensation (salary and equity) and comprehensive benefits.
- Relocation support for employees moving to join the team in an office location.
- A mission-driven, low-ego culture valuing diversity of thought, ownership, and bias towards action.