Principal Software Engineer, At-Scale Reliability and Fleet Intelligence — CSP Engagements
NVI*** · Santa Clara, CA, United States
Santa Clara, CA, United States · 272,000-431,250 USD/yearly · Onsite
Remuneration
272,000-431,250 USD/yearly
Location
Santa Clara, CA, United States
Visa sponsorship
Sponsors visa
Job summary
We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal point for fleet-scale reliability, working directly with engineering teams of key CSP / hyperscale customers to ensure NVI*** platforms achieve target MTBI (Mean Time Between Interruptions) in production.
Benefits
- Familiarity with NVIDIA GPU error taxonomy (Xid errors, NVLink error counters, ...
- The base salary range is 272,000 USD - 431,250 USD.
- Applications for this job will be accepted at least until June 30, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI ...
Qualifications
- Define the burn-in reliability test environment and cluster certification criteria in collaboration with quality teams, validating with customers that the criteria are meaningful
- 15+ years of experience in systems software at datacenter scale, or in reliability engineering with a focus on at-scale challenges
- BS or MS in Computer Science, Electrical Engineering, Statistics, or related field (or equivalent experience)
- Experience with fleet-level telemetry and observability systems: time-series databases, anomaly detection, health scoring, event correlation
- Experience defining or operating burn-in, stress testing, or certification frameworks for complex hardware systems
- Experience in hardware health and fleet reliability at a leading CSP or hyperscaler
- Experience building health scoring or predictive failure models for accelerator or HPC infrastructure
- Background in defining MTBI/MTBF measurement standards or certification programs for complex multi-component systems
- Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
Responsibilities
- In this role, you will augment NVI***'s internal software/firmware and quality teams with a dedicated CSP-facing focus.
- What you'll be doing:
- You will also be eligible for equity and
Skills
Communication, Leadership
Degrees
Associate, Bachelor
Industry
Automotive, Energy, Media
Company size
SMB