Senior/Lead SRE/Observability Engineer - Hybrid role

Salary undisclosed

Job Description

Title: Senior/Lead SRE / Observability Engineer
Location: Phoenix, AZ (3 days onsite per week)
Contract Duration: 6-12 Months + Contract
Job Type: Contract
Experience Needed: 8+ years

Roles and Responsibilities:

The Senior/Lead SRE/Observability Engineer will implement and maintain observability solutions using open-source software (OSS) tools at enterprise scale. The role involves collaborating with product and application teams to design solutions, troubleshoot application and server outages, and drive system reliability and performance. You will work closely with cloud platforms such as Azure, GCP, or AWS to extract observability data from managed services. The role also demands a strong focus on DevOps, with the ability to address technical issues in store and network systems, including point-of-sale systems.

Must Haves:
8+ years of Senior/Lead SRE/Observability experience
Experience with Grafana OSS Stack (Mimir, Loki, Tempo, Grafana Agent)
Experience with Azure, GCP, or AWS
Development experience with Java, Golang, or Python, or with OpenTelemetry SDK implementation
Ability to work with the team on design and whiteboarding
Skills expected of an Application Engineer, including Site Reliability Engineering (SRE) expertise and the ability to drive solutions effectively
Collaboration with product and application teams to understand their needs, build solutions, and address challenges
Strong DevOps experience and troubleshooting abilities
Day-to-day technical responsibilities include ensuring observability across store and network systems, addressing issues with point-of-sale systems, troubleshooting application and server outages, and using tools to set appropriate thresholds for system performance monitoring.

Skills Required:
Grafana OSS Stack (Mimir, Loki, Tempo)
Azure, GCP, or AWS
Java, Golang, or Python
OpenTelemetry SDK
Site Reliability Engineering (SRE)
DevOps experience
Troubleshooting

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.