Senior Systems Development Engineer
Austin, Texas
Full Time
2 hours ago
Senior Level
Engineering
Over $120K USD per year

Job Description

Senior Systems Development Engineer

Our customers’ system requirements are usually highly complex. Bringing together hardware and software systems design, Systems Development Engineering operates at the cutting edge of technology to meet them. We design and develop electronic, electro-mechanical, or systems-oriented products; conduct feasibility studies on engineering proposals; and prepare installation, operation, and maintenance specifications and instructions. We’re proud to deliver programs and products to the highest quality standards, on time and within budget. Join us to do the best work of your career and make a profound social impact as a Senior Systems Development Engineer on our Systems Development Engineering Team in Austin, Texas.

What you’ll achieve

As a Senior Systems Development Engineer, you will design, define and implement complex system requirements for customers and prepare studies and analyses of existing systems.

You will:

  • System Platform Engineering: Lead bring‑up, configuration, and validation of system platforms supporting AI workloads (servers, GPU racks, accelerators, networking fabrics); work with BIOS/UEFI, BMC, firmware, drivers, and kernel subsystems to ensure system readiness for large‑scale AI deployments; perform hardware–software co-validation of CPUs, GPUs, DPUs, NICs, accelerators, and memory subsystems under AI‑heavy workloads; validate PCIe fabric behavior, NUMA topology, and data‑path efficiency for model training and inference.
  • System Debugging & Hardware–Software Interaction: Diagnose complex issues across BIOS, firmware, OS, driver stack, container runtime, orchestration layer, and AI frameworks; analyze system logs, kernel traces, hardware event telemetry, GPU health signals, and fabric diagnostics; conduct root‑cause analysis of performance bottlenecks, training failures, model divergence, and hardware stability issues; collaborate with silicon, firmware, OS, and AI software teams to resolve issues rapidly.
  • AI Cluster & Rack‑Level Operations: Deploy and manage AI clusters: GPU servers,...
How to Apply