Senior Site Reliability Engineer
Job description
We are seeking a Site Reliability Engineer to help build and operate reliable, scalable cloud-native platforms and services that support Ora*** Health’s next-generation healthcare technology initiatives. This role works across platform engineering, cloud infrastructure, distributed systems, integration services, healthcare applications, and operational tooling.

The ideal candidate is passionate about reliability, automation, observability, and performance. You will collaborate closely with software engineering, product, security, operations, and customer-facing teams to design resilient systems, improve service health, respond to incidents, and support modernization efforts across healthcare workflows, interoperability, data platforms, and AI-driven capabilities.

This role bridges software engineering and operations: designing infrastructure and services for reliability, developing automation, improving monitoring and alerting, supporting incident response and root cause analysis, and contributing to secure, scalable systems running on Ora*** Cloud Infrastructure.

Design, build, test, and operate reliable cloud infrastructure, platform capabilities, and services on Ora*** Cloud Infrastructure and legacy deployment models.
Partner with software engineering teams to develop scalable, resilient services, APIs, integrations, and distributed systems.
Forecast capacity needs, analyze service trends, and take proactive steps to ensure systems can support current and future workloads.
Monitor service health, availability, latency, performance, and capacity using observability and reporting tools.
Participate in incident response, troubleshooting, root cause analysis, postmortems, and follow-up remediation.
Develop automation, scripts, and tooling to support provisioning, deployment, monitoring, metrics collection, mitigation, and remediation.
Support CI/CD, DevOps, infrastructure automation, and operational readiness practices.
Investigate and debug issues across applications, infrastructure, services, and dependencies to help teams meet service level objectives.
Identify performance bottlenecks and reliability risks, then recommend and implement improvements.
Collaborate with product managers, architects, engineers, security, operations, and customer teams to deliver secure, customer-focused healthcare solutions.
Support modernization efforts involving cloud-native architectures, healthcare interoperability, large-scale healthcare data platforms, and AI-enabled capabilities.
Communicate service health, operational risks, capacity concerns, and the potential impact of infrastructure, feature, or tooling changes.
Contribute to documentation, runbooks, incident records, operational standards, and knowledge sharing.
Participate in on-call rotations and operational support for production services.