
Site Reliability Engineer - Night Shift

Salary undisclosed

GovCloud Incident Response (GIR) maintains the current infrastructure through daily alert response, smart hands, and incident management, including retrospectives and follow-up on long-term remediations.

Role Description:
  • Keep the customer-facing services available at top performance by maintaining the constant health of the supporting systems
  • Incident Management - act in key support roles during major incidents (e.g. Sev0, Sev1) and participate in the technical review of each incident for problem management
  • Problem Management - populate and participate in RCAs and hand them off to the Global Solutions team
  • Ensure that work carried out by the Site Reliability team stays in sync with the company's internal compliance policies and directives
  • Work with other technical staff to solve technical issues and customer concerns as the need arises
  • Work with and lead other members of the team in staying on top of key industry innovation and technology, and assist in team development and growth


Basic Requirements:
  • Systems engineering experience in an enterprise-scale internet service engineering or support role
  • Expertise in TCP/IP related technologies (networking protocols, network programming, etc.)
  • Expertise in CLI-based enterprise support of Unix variants (Linux/Solaris/BSD), with strong Linux/UNIX knowledge and significant exposure to Red Hat Enterprise Linux and Solaris
  • Strong understanding of monitoring security systems and administration
  • Strong communication skills (written and oral)
  • Past experience in Incident Management and good understanding of ITIL service operations
  • Willingness to work in a 24/7 team managing large data centers
  • Availability for shift work and on-call duty if required
  • Experience provisioning, operating, and running AWS/C2S-based infrastructure and systems
  • Understanding of and experience with writing scripts in Python, Go, or other languages


Preferred Qualifications:
  • Prior Chef/Puppet or automated deployment experience
  • Prior Jenkins/Bamboo/Spinnaker pipeline execution experience
  • Experience in supporting and maintaining monitoring and alerting systems
  • Experience in supporting and maintaining Java applications
  • Hands-on experience configuring and running AWS (Amazon Web Services) using the CLI/SDKs
  • Certifications in Linux+, Red Hat, and AWS
  • Experience in supporting and leading Kubernetes-based applications and services
  • Familiarity with Agile processes and DevOps
  • Experience taking part in blameless retrospectives, learning from incidents, and conducting post-incident investigations, including incident analysis as well as performance evaluations of responders
  • Working knowledge of and interest in resilience engineering, including concepts such as Safety-II (looking at how things go right instead of how things go wrong), being proactive instead of reactive, and investigating complex sociotechnical systems

Due to the citizenship requirement for this role, which supports U.S. federal, state, and/or local government customers, citizenship will be verified through two of the following REAL ID Act documents: U.S. Passport, Passport Card, REAL Driver's License, Global Entry Card, U.S. Government CAC/PIV.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.