Staff Site Reliability Engineer- Eng

UKG*** · Lowell, MA, United States

Lowell, MA, United States · 129,500-186,100 USD/yearly · Hybrid

Remuneration

129,500-186,100 USD/yearly

Location

Lowell, MA, United States

Visa sponsorship

Sponsors visa

Job summary

Job ID: STAFF017752
Employment Type: Regular
Work Style: Hybrid
Location: Lowell, MA, United States
Travel: Up to 25%
Role: Staff Site Reliability Engineer- Eng

Why UKG***: At UKG***, the work you do matters. The code you ship, the decisions you make, and the care you show a customer all add up to real impact.

Qualifications

5+ years of hands-on experience in software engineering, systems engineering, or cloud-based environments.
5+ years of experience working with public cloud platforms (e.g., GCP (preferred), AWS, or Azure).
5+ years of experience configuring, operating, and maintaining applications and/or systems infrastructure in a large-scale, customer-facing environment.
Demonstrated understanding of observability best practices, including metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing.
Experience coding in one or more higher-level programming languages (e.g., Python, Java, or C++).
Strong working knowledge of Linux systems, including troubleshooting, performance analysis, and scripting in production environments.
Experience with GitHub Actions and modern CI/CD practices.
Experience building operational dashboards and alerts using observability tools.
Experience with distributed system design and architecture.
Hands-on experience with cloud-native applications and containerization.

This list does not present a comprehensive, detailed inventory of everything required for the job.
Management reserves the right to revise the job or require that other or different tasks be performed if or when circumstances change.

Responsibilities

Engage in and improve the lifecycle of services from conception to end-of-life, including system design reviews, capacity planning, and production readiness.
Support service, product, and engineering teams by providing common tooling and frameworks to increase availability and improve incident detection and response.
Improve system performance, availability, and efficiency through automation, process refinement, post-incident reviews, and in-depth configuration analysis.
Collaborate closely with engineering teams across the organization to deliver and operate reliable services.
Increase operational efficiency, effectiveness, and service quality by treating operational challenges as software engineering problems (reducing toil).
Guide junior team members and serve as a champion for Site Reliability Engineering best practices.
Actively participate in incident response, including on-call rotations and post-incident reviews, collaborating with engineering teams to restore service and reduce recurrence.
Partner with stakeholders to influence and help drive the best possible technical and business outcomes.

Skills

Communication, Leadership

Certifications

GitHub Actions

Degrees

Associate

Work schedule

On-call, Rotation, Shift

Travel

Up to 25%

Industry

Automotive, Energy, Oil-gas