Site Reliability Engineer

Salary undisclosed

Responsibilities:

Monitor and Maintain Systems: Use AWS CloudWatch, Dynatrace, and Quantum Metric to ensure the reliability, performance, and availability of software systems.
Automate Processes: Develop automated solutions for operational tasks to improve efficiency and reduce manual intervention.
Incident Management: Proactively manage incidents, ensuring minimal downtime and quick recovery.
Build Monitoring Systems: Create and maintain effective monitoring systems using AWS CloudWatch and Dynatrace that alert on early symptoms before they escalate into outages.
Collaborate with Teams: Work closely with development, IT operations, and product teams to ensure smooth system operations and continuous improvement.
Capacity Planning: Participate in system design consulting, capacity planning, and performance tuning to support future growth.
Data Analysis: Use Quantum Metric and other tools to analyze system performance and user experience data to drive improvements.

Requirements:

Experience: Proven work experience as an SRE or in a similar role, with specific experience in AWS CloudWatch, MQM (Message Queue Management), Quantum Metric, and Dynatrace.
Technical Skills: Proficiency in programming languages such as Python, Java, or C/C++. Experience with distributed storage technologies and dynamic resource management frameworks.
Problem-Solving: Strong analytical and problem-solving skills.
Communication: Excellent collaboration and communication skills.
Education: Bachelor’s degree in Computer Science or a related field.

Preferred Skills:

Experience with tools like Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
Familiarity with cloud platforms such as AWS, Azure, or Google Cloud.
Knowledge of MQM (Message Queue Management) systems and their integration with monitoring tools.
