Director of SRE - Fully Remote
Salary undisclosed
A local company is seeking a Director of Site Reliability Engineering (SRE) to lead and enhance its Azure-based infrastructure. The role is fully remote, with occasional office visits in Florida, and is ideal for a seasoned SRE leader with deep expertise in Azure Cloud, Kubernetes, and observability tools.
Responsibilities
- Architect, scale, and optimize Azure cloud environments to ensure reliability and performance.
- Lead Kubernetes operations, including cluster management and automation.
- Implement and manage Datadog and PagerDuty for monitoring, alerting, and incident response.
- Define and enforce SRE best practices to improve system resilience and operational efficiency.
- Collaborate with engineering teams to streamline CI/CD pipelines and infrastructure automation.
- Drive incident management, post-mortems, and reliability improvements.
Qualifications
- Proven experience leading SRE teams in an Azure-focused environment.
- Strong expertise in Kubernetes, including deployment, scaling, and troubleshooting.
- Hands-on experience setting up and managing Datadog and PagerDuty.
- Deep understanding of cloud infrastructure, automation, and observability tools.
- Experience with CI/CD, infrastructure as code (Terraform, Bicep), and scripting.
- Excellent problem-solving and leadership skills.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.