Site Reliability Engineer

  • Full-time, onsite
  • AER Technology Solutions
  • Atlanta Metropolitan Area, United States of America
Salary undisclosed

Responsibilities:

  • Monitor and Maintain Systems: Utilize AWS CloudWatch, Dynatrace, and Quantum Metrics to ensure the reliability, performance, and availability of software systems.
  • Automate Processes: Develop automated solutions for operational tasks to improve efficiency and reduce manual intervention.
  • Incident Management: Proactively manage incidents, ensuring minimal downtime and quick recovery.
  • Build Monitoring Systems: Create and maintain effective monitoring systems using AWS CloudWatch and Dynatrace that alert on early symptoms of degradation rather than only on full outages.
  • Collaborate with Teams: Work closely with development, IT operations, and product teams to ensure smooth system operations and continuous improvement.
  • Capacity Planning: Participate in system design consulting, capacity planning, and performance tuning to support future growth.
  • Data Analysis: Use Quantum Metrics and other tools to analyze system performance and user experience data to drive improvements.

Requirements:

  • Experience: Proven work experience as an SRE or in a similar role, with specific experience in AWS CloudWatch, MQM, Quantum Metrics, and Dynatrace.
  • Technical Skills: Proficiency in programming languages such as Python, Java, or C/C++. Experience with distributed storage technologies and dynamic resource management frameworks.
  • Problem-Solving: Strong analytical and problem-solving skills.
  • Communication: Excellent collaboration and communication skills.
  • Education: Bachelor’s degree in Computer Science or a related field.

Preferred Skills:

  • Experience with tools like Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
  • Familiarity with cloud platforms such as AWS, Azure, or Google Cloud.
  • Knowledge of MQM (Message Queue Management) systems and their integration with monitoring tools.