Staff Site Reliability Engineer- Eng
UKG*** · Lowell, MA, United States
Lowell, MA, United States · 129,500-186,100 USD/yearly · Hybrid
Remuneration
129,500-186,100 USD/yearly
Location
Lowell, MA, United States
Visa sponsorship
Sponsors visa
Job summary
Job ID: STAFF017752
Employment Type: Regular
Work Style: Hybrid
Location: Lowell, MA, United States
Travel: Up to 25%
Role: Staff Site Reliability Engineer- Eng
Why UKG***: At UKG***, the work you do matters. The code you ship, the decisions you make, and the care you show a customer all add up to real impact.
Qualifications
- 5+ years of hands-on experience in software engineering, systems engineering, or cloud-based environments.
- 5+ years of experience working with public cloud platforms (e.g., GCP (preferred), AWS, or Azure).
- 5+ years of experience configuring, operating, and maintaining applications and/or systems infrastructure in a large-scale, customer-facing environment.
- Demonstrated understanding of observability best practices, including metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing.
- Experience coding in one or more higher-level programming languages (e.g., Python, Java, or C++).
- Strong working knowledge of Linux systems, including troubleshooting, performance analysis, and scripting in production environments.
- Experience with GitHub Actions and modern CI/CD practices.
- Experience building operational dashboards and alerts using observability tools.
- Experience with distributed system design and architecture.
- Hands-on experience with cloud-native applications and containerization.
- Management reserves the right to revise the job or require that other or different tasks be performed if or when circumstances change.
Responsibilities
- Engage in and improve the lifecycle of services from conception to end-of-life, including system design reviews, capacity planning, and production readiness.
- Support service, product, and engineering teams by providing common tooling and frameworks to increase availability and improve incident detection and response.
- Improve system performance, availability, and efficiency through automation, process refinement, post-incident reviews, and in-depth configuration analysis.
- Collaborate closely with engineering teams across the organization to deliver and operate reliable services.
- Increase operational efficiency, effectiveness, and service quality by treating operational challenges as software engineering problems (reducing toil).
- Guide junior team members and serve as a champion for Site Reliability Engineering best practices.
- Actively participate in incident responses, including on-call rotations and post-incident reviews, collaborating with engineering teams to restore service and reduce recurrence.
- Partner with stakeholders to influence and help drive the best possible technical and business outcomes.
- They do not present a comprehensive, detailed inventory of all [...] and [...] required for the job.
Skills
Communication, Leadership
Certifications
GitHub Actions
Degrees
Associate
Work schedule
On-call, Rotation, Shift
Travel
Up to 25%
Industry
Automotive, Energy, Oil-gas