Principal Software Engineer, At-Scale Reliability and Fleet Intelligence — CSP Engagements

NVI*** · Santa Clara, CA, United States
Santa Clara, CA, United States · 272,000-431,250 USD/yearly · Onsite
Remuneration
272,000-431,250 USD/yearly
Location
Santa Clara, CA, United States
Visa sponsorship
Sponsors visa

Job summary

We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal point for fleet-scale reliability, working directly with engineering teams of key CSP / hyperscale customers to ensure NVI*** platforms achieve target MTBI (Mean Time Between Interruptions) in production.

Benefits

  • Familiarity with NVIDIA GPU error taxonomy (Xid errors, NVLink error counters, ...
  • The base salary range is 272,000 USD - 431,250 USD.
  • Applications for this job will be accepted at least until June 30, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI ...

Qualifications

  • Define the burn-in reliability test environment and cluster certification criteria in collaboration with quality teams, validating with customers that the criteria are meaningful
  • 15+ years of experience in systems software at datacenter scale, or in reliability engineering with a focus on at-scale challenges
  • BS or MS in Computer Science, Electrical Engineering, Statistics, or related field (or equivalent experience)
  • Experience with fleet-level telemetry and observability systems: time-series databases, anomaly detection, health scoring, event correlation
  • Experience defining or operating burn-in, stress testing, or certification frameworks for complex hardware systems.
  • Experience in fleet reliability and hardware health at a leading CSP or hyperscaler
  • Experience building health scoring or predictive failure models for accelerator or HPC infrastructure
  • Background in defining MTBI/MTBF measurement standards or certification programs for complex multi-component systems
  • Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

Responsibilities

  • In this role, you will augment NVI***'s internal software/firmware and quality teams with a dedicated CSP-facing focus.
  • What you'll be doing:
  • You will also be eligible for equity and

Skills

Communication, Leadership

Degrees

Associate, Bachelor

Industry

Automotive, Energy, Media

Company size

SMB