Senior Networking Solution Test Engineer – AI Cluster Debugging

Seniority: Senior
Posted: 3 Mar 2026 (2 months ago)

We are looking for a Senior Networking Test Engineer with strong system‑level debugging skills to join our End‑to‑End Verification team! You will work on pioneering NVLink‑, Ethernet‑ and InfiniBand‑based AI clusters, and you will own complex issues that span hardware, system software and AI workloads.

What you’ll be doing:

  • Design and review test and product requirements across the NVLink, Ethernet and InfiniBand / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.

  • Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.

  • Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.

  • Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.

  • Collaborate closely with development teams to debug NCCL, RoCE/RDMA and related networking components using logs, code inspection and targeted experiments.

  • Define tests and guide the automation team to implement robust, debuggable suites that produce actionable logs, metrics and traces.

  • Run regression, performance, functional and scale tests, analyze the results and provide clear, data‑driven reports to collaborators.

  • Profile and benchmark deep learning training and inference workloads, correlating model‑level metrics with system and network telemetry to uncover bottlenecks.

What we need to see:

  • B.A./B.Sc. in Computer Science, Electrical Engineering, or equivalent IT/Network/Systems experience.

  • 8+ years of hands‑on networking or system‑level testing and debugging on Linux.

  • Strong Linux networking and debugging skills (for example perf, tcpdump, ethtool, iproute2).

  • Proven production‑grade debugging experience: forming hypotheses, running experiments, and driving issues to root cause under pressure.

  • Expertise in host‑side NIC validation and tuning (offloads, queues, interrupts, firmware/driver interactions).

  • Strong knowledge of AI networking libraries (such as NCCL) and protocols (such as RoCE and RDMA), including performance and correctness debugging.

  • Ability to read and reason about source code (C/C++/Python or similar) and collaborate closely with developers on fixes.

  • Solid scripting and automation skills with Bash / Python / Ansible for setup, log collection, and experiment orchestration.

  • Fast learner, familiar with modern AI tools and workflows, able to adapt quickly.

  • Excellent analytical, problem‑solving and communication skills, with strong ownership and a collaborative approach.

Ways to stand out from the crowd:

  • Hands‑on debugging of collective communication libraries (for example NCCL) or large‑scale LLM training / inference clusters.

  • Experience with large cluster environments (tens to thousands of GPUs or nodes), including incident response and post‑mortem analysis.

  • Deep expertise in tuning and debugging congestion control and lossless Ethernet for AI workloads (for example DCQCN, ECN, PFC).

  • Familiarity with NVIDIA networking technologies (for example BlueField / BF3, ConnectX NICs) and their software stack and diagnostics.

  • Experience debugging issues that span multiple layers (L2/L3, transport, AI frameworks) or contributing to open‑source networking / AI systems.

At NVIDIA, we value diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We provide reasonable accommodations to ensure all individuals can participate in the job application or interview process, perform essential job functions, and receive other benefits and privileges of employment. Join us and be part of a team that's pushing the boundaries of technology and making a real impact in the world.
