
Site Reliability Engineer, AI & Agentic Systems

Ser*** · Plano, TX, United States
Plano, TX, United States · 40-45 USD/hour · Hybrid
Remuneration
40-45 USD/hour
Location
Plano, TX, United States
Visa sponsorship
Sponsors visa

Job summary

Overview: As our SRE charter evolves, this role demands hands-on ownership of production reliability and troubleshooting, along with advanced skills in AI-driven and agentic automation and in performance engineering. The Site Reliability Engineer will play a critical role in ensuring the reliability, scalability, performance, and operational excellence of our platforms.

Qualifications

  • Core SRE (reliability, scalability, performance)

Responsibilities

In this role, you will:

Reliability Engineering & Production Ownership:
  • Own end-to-end reliability of large-scale, Azure-hosted production systems, ensuring high availability, fault tolerance, and graceful degradation
  • Lead hands-on incident troubleshooting, root cause analysis (RCA), and post-incident reviews with actionable follow-ups
  • Define, measure, and enforce Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets aligned with business outcomes
  • Drive proactive reliability improvements based on operational insights, failure mode analysis, and capacity planning
  • Participate in on-call rotations and take real-time ownership during production incidents

Platform & Automation Engineering:
  • Build and operate resilient, scalable services on Microsoft Azure (AKS, App Services, Functions, Event Hubs, etc.)
  • Design and maintain comprehensive observability platforms using Prometheus for metrics, Loki for log aggregation, Tempo for distributed tracing, and Grafana for dashboarding and alerting
  • Create automation to eliminate manual operational tasks and reduce Mean Time to Recovery (MTTR)
  • Implement self-healing mechanisms, automated remediation workflows, and runbook automation

Skills

Leadership

Certifications

GitHub Actions

Degrees

Associate

Work schedule

On-call, Rotation, Shift

Industry

Automotive, Energy, Media, Oil and gas

Company size

SMB