Site Reliability Engineer

  • Full Time, onsite
  • Software Guidance & Assistance
  • On Site, United States of America
Salary undisclosed

Software Guidance & Assistance, Inc. (SGA) is searching for a Site Reliability Engineer for a contract assignment with one of our premier SaaS clients in Newton, MA.

We are seeking a skilled Site Reliability Engineer (SRE) Level 2 to join our dynamic team. The ideal candidate will have a strong technical background, excellent problem-solving skills, and a passion for enhancing system reliability and performance. You will play a crucial role in monitoring, automating, and optimizing our infrastructure to ensure the seamless operation of our services.

Responsibilities:
  • System Monitoring and Incident Response: Monitor system health, performance metrics, and availability. Respond promptly to incidents and outages, ensuring minimal downtime.
  • Infrastructure Management: Manage and optimize both cloud and on-premise infrastructure using Infrastructure as Code (IaC) tools.
  • Automation: Develop and maintain automation scripts and tools to enhance operational efficiency and reduce manual tasks.
  • Collaboration: Work closely with development teams to implement CI/CD practices and improve deployment processes.
  • Capacity Planning: Analyze usage patterns and forecast capacity needs to ensure system scalability and reliability.
  • Documentation: Create and maintain comprehensive documentation for systems, processes, and incident response protocols.
  • Security Best Practices: Implement and enforce security measures to protect infrastructure and data.
  • Post-Incident Reviews: Conduct post-mortems on incidents to identify root causes and implement corrective actions.


Required Skills:
  • Strong knowledge of Linux/Unix systems and proficiency in scripting languages (e.g., Python, Bash).
  • Familiarity with cloud platforms (e.g., AWS) and their services.
  • Experience with containers and container orchestration (e.g., Docker, Kubernetes).
  • Proficiency in using monitoring and alerting tools (e.g., Prometheus, Grafana, Nagios).
  • Experience with version control systems (e.g., Git).
  • Strong troubleshooting skills with the ability to diagnose complex system issues.
  • Excellent verbal and written communication skills for collaboration with cross-functional teams.
  • Understanding of Agile development practices and methodologies.
  • 2-4 years of experience in Site Reliability Engineering or a similar role.


SGA is a technology and resource solutions provider driven to stand out. We are a women-owned business. Our mission: to solve big IT problems with a more personal, boutique approach. Each year, we match consultants like you to more than 1,000 engagements. When we say let's work better together, we mean it. You'll join a diverse team built on these core values: customer service, employee development, and quality and integrity in everything we do. Be yourself, love what you do and find your passion at work. Please find us at .

EEO Employer: Race, Color, Sex, Sexual Orientation, Gender Identity, Religion, National Origin, Disability, Veteran Status, Age, Marital Status, Pregnancy, Genetic Information, or Other Legally Protected Status.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.