Director of SRE - Fully Remote

Full Time, remote
Motion Recruitment Partners, LLC
Remote On Site, United States of America

Salary undisclosed

A local company is seeking a Director of Site Reliability Engineering (SRE) to lead and enhance its Azure-based infrastructure. The role is fully remote, with occasional visits to the company's office in Florida. It is ideal for a seasoned SRE leader with deep expertise in Azure Cloud, Kubernetes, and observability tools.

Responsibilities

Architect, scale, and optimize Azure cloud environments to ensure reliability and performance.
Lead Kubernetes operations, including cluster management and automation.
Implement and manage Datadog and PagerDuty for monitoring, alerting, and incident response.
Define and enforce SRE best practices to improve system resilience and operational efficiency.
Collaborate with engineering teams to streamline CI/CD pipelines and infrastructure automation.
Drive incident management, post-mortems, and reliability improvements.

Requirements

Proven experience leading SRE teams in an Azure-focused environment.
Strong expertise in Kubernetes, including deployment, scaling, and troubleshooting.
Hands-on experience setting up and managing Datadog and PagerDuty.
Deep understanding of cloud infrastructure, automation, and observability tools.
Experience with CI/CD, infrastructure as code (Terraform, Bicep), and scripting.
Excellent problem-solving and leadership skills.

We are not accepting H-1B applicants at this time.

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
