
Senior Applied Research Engineer | Barcelona | Up to €150k
Barcelona, Spain
Apply by 16 May 2026
€85,000 - €150,000 per annum
Job Ref.: BH-57191
Job Type: Permanent
Job Description
We are partnered with a cutting-edge AI company shaping the future of enterprise decision-making. Founded by experienced technologists from leading research environments, the firm has developed a market-leading platform purpose-built for the structured data that underpins critical business decisions. Backed by top-tier investors and trusted by some of the world’s largest organisations, the company helps enterprises unlock significant value by enabling more accurate, forward-looking decision-making.
You will work on novel technical challenges in large-scale model development and contribute to technology that is changing how major organisations operate. This is an opportunity to join a category-defining company at an early stage and help shape its trajectory.
Location & Compensation
- Location: Barcelona, Spain (hybrid working)
- Salary: Up to €150,000 (plus equity)
- Industry: Technology
Responsibilities
- Profile end-to-end distributed training runs to identify bottlenecks across compute, GPU memory, and inter-GPU communication.
- Influence architectural decisions to improve efficiency and reliability of large-scale training jobs, including developing Triton/CUDA kernels when needed.
- Design and implement model scaling, parallelisation, and memory optimisation techniques for training workloads with very large context sizes.
- Collaborate closely with ML Researchers to diagnose architectural inefficiencies, ensure new research ideas scale efficiently in practice, and share internal knowledge on optimisation.
- Drive productionisation and serving of models from the research side, including improving inference efficiency via techniques such as quantisation.
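As a flavour of the inference-efficiency work mentioned above, here is a minimal, dependency-free sketch of symmetric int8 post-training quantisation. The helper names and example values are hypothetical, purely for illustration; production systems would use a framework's quantisation tooling rather than this:

```python
# Symmetric int8 post-training quantisation, illustrated: map floats into
# [-127, 127] with a single shared scale, then dequantise back.

def quantise_int8(values):
    """Quantise a list of floats to int8 codes with a shared symmetric scale."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantise_int8(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [x * scale for x in q]

weights = [0.02, -1.5, 0.75, 3.0, -0.33]
q, scale = quantise_int8(weights)
approx = dequantise_int8(q, scale)

# Round-to-nearest bounds the error by half a quantisation step.
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

The single-scale scheme shown here trades accuracy for simplicity; per-channel scales and calibration data are the usual next refinements.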
Requirements
- Strong understanding of modern ML architectures and large-scale training pipelines.
- Hands-on experience running distributed training jobs on multi-GPU systems.
- Advanced profiling and debugging across CPU, GPU, memory usage, latency, and inter-GPU communication.
- Strong programming skills in Python.
- Experience with model scaling and parallelisation strategies, including tensor and pipeline parallelism.
- Familiarity with NCCL, MPI, and distributed communication primitives.
- Knowledge of PyTorch and Triton internals.
- Programming experience with C and CUDA.
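To give a flavour of the distributed communication primitives listed above, the pattern behind an NCCL-style all-reduce can be sketched as a toy single-process ring simulation. This is pure Python with hypothetical names, not real multi-GPU code; real jobs would call `torch.distributed.all_reduce` or NCCL directly:

```python
# Toy simulation of a ring all-reduce across p "ranks", each holding a
# vector. Phase 1 (reduce-scatter) leaves each rank with one fully summed
# chunk; phase 2 (all-gather) circulates the completed chunks.

def ring_allreduce(rank_buffers):
    """Sum-all-reduce across simulated ranks using the ring algorithm."""
    p = len(rank_buffers)
    n = len(rank_buffers[0])
    assert n % p == 0, "vector length must split evenly into p chunks"
    chunk = n // p
    bufs = [list(b) for b in rank_buffers]

    # Phase 1: reduce-scatter. In step s, rank r passes chunk (r - s) mod p
    # to its ring neighbour, which accumulates it.
    for s in range(p - 1):
        for r in range(p):
            dst = (r + 1) % p
            c = (r - s) % p
            for i in range(c * chunk, (c + 1) * chunk):
                bufs[dst][i] += bufs[r][i]

    # Phase 2: all-gather. Each rank forwards its completed chunk
    # (r + 1 - s) mod p around the ring until every rank has every chunk.
    for s in range(p - 1):
        for r in range(p):
            dst = (r + 1) % p
            c = (r + 1 - s) % p
            bufs[dst][c * chunk:(c + 1) * chunk] = bufs[r][c * chunk:(c + 1) * chunk]
    return bufs

# Four simulated ranks, each holding a 4-element vector.
ranks = [[float(r * 4 + i) for i in range(4)] for r in range(4)]
result = ring_allreduce(ranks)
expected = [sum(col) for col in zip(*ranks)]
assert all(buf == expected for buf in result)
```

The ring layout is what makes the primitive bandwidth-optimal: each rank sends roughly 2n/p elements per step regardless of the number of ranks.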
Benefits
- Competitive compensation (salary and equity) and comprehensive benefits.
- Relocation support for employees moving to join the team in an office location.
- A mission-driven, low-ego culture valuing diversity of thought, ownership, and bias towards action.