SRE Engineer/ Senior Site Reliability Engineer/ SRE/ Site Reliability Engineer/ Sr. Azure

Salary undisclosed

Job Description:

  • Overview: The Site Reliability Engineer (SRE) plays a critical role in ensuring the reliability, scalability, and performance of Client's digital platforms and infrastructure. As part of a global team of highly skilled engineers, the SRE will work on challenging and impactful projects that directly contribute to the company's core business activities. Client is committed to fostering a culture of innovation, collaboration, and continuous learning, providing the SRE with an opportunity to grow and develop their skills while making a positive impact on the world.

Main Accountabilities:

  • Troubleshoot and resolve infrastructure issues and incidents in a timely manner.
  • Design, implement, and maintain reliable and scalable infrastructure solutions to support Client's digital platforms and applications.
  • Monitor and analyze system performance, identify potential issues, and take proactive measures to prevent outages and disruptions.
  • Collaborate with cross-functional teams, including software engineers, product managers, and operations personnel, to ensure seamless integration of infrastructure and application components.
  • Develop and implement automation scripts and tools to streamline infrastructure management tasks and improve operational efficiency.
  • Stay up to date with industry best practices and emerging technologies in the field of site reliability engineering.
  • Cooperate closely with DevOps and Cloud engineers.

Impact/Dimensions:

  • Contributes to the reliability and uptime of Client's digital platforms, which are critical for the company's global operations and customer satisfaction.
  • Works on projects that have a direct impact on Client's revenue and profitability.
  • The individual in this role will have a significant impact on the efficiency and effectiveness of Client's technology operations and will be responsible for driving continuous improvement initiatives that save the company time and money.

Key Performance Indicators (KPIs):

  • Mean Time to Repair (MTTR) for critical systems
  • System uptime and availability
  • Number of incidents and outages prevented
  • Customer satisfaction with infrastructure performance

Major Opportunities and Decisions:

  • Identifying and mitigating potential risks to infrastructure stability and performance.
  • Making decisions on infrastructure investments and resource allocation to optimize cost-effectiveness and scalability.
  • Balancing the need for innovation with the requirement for stability and reliability in infrastructure operations.

Management/Leadership:

  • Leads and mentors a team of junior SREs and infrastructure engineers.
  • Provides technical guidance to cross-functional teams on infrastructure-related matters.
  • Actively participates in shaping the company's infrastructure strategy and roadmap.

Key Relationships, Stakeholders & Interfaces (External & Internal):

  • Works closely with software engineering teams to ensure seamless integration of infrastructure and application components.
  • Development teams
  • Infrastructure teams
  • Business stakeholders
  • Vendors and partners

Knowledge and Technical Competencies:

  • Strong understanding of SRE & DevOps principles and practices.
  • Experience with CI/CD on the Azure DevOps platform.
  • Knowledge of infrastructure management tools such as Ansible, Puppet, or Chef.
  • Solid experience with containerization such as Docker and orchestration tools such as Kubernetes.
  • Solid knowledge of security in cloud and on-premises environments.
  • Proficient in scripting languages such as Python or Bash.
  • Experience with cloud computing platforms such as AWS and Azure; Google Cloud Platform is preferred.
  • Experience with monitoring software such as Datadog, Zabbix, or Kibana.
  • Hands-on experience coding, deploying, and supporting large-scale serverless architectures.
  • Infrastructure provisioning with Terraform or CloudFormation (Infrastructure as Code, IaC).
  • Experience with Linux and Windows operating systems.
  • Strong problem-solving and analytical skills.
  • Excellent communication and interpersonal skills.

Education/Experience:

  • Bachelor's degree in computer science or a related field.
  • 5+ years of experience in DevOps engineering.
  • Experience leading teams and managing projects.
  • Very good command of English.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.