Senior Site Reliability Engineer

Full Time, onsite
ACL Digital
On Site, United States of America

Salary undisclosed

As a lead engineer with the Retail Site Reliability Engineering team, you will be at the forefront of Cloud and Big Data technology. In this role you will establish yourself as a technical leader by working with a broad range of industry-leading technologies that help drive acceleration. The ideal candidate will have expert design and development capabilities and be positioned to contribute to a growing set of services and features for the ecosystem. This role supports highly available, business-critical applications and serves as the escalation point for complex and hard-to-define issues in both on-premises and AWS environments. We are seeking talented engineers who are well versed in DevOps technologies, automation, infrastructure orchestration, configuration management, continuous integration, and troubleshooting of complex issues, and who are not constrained by how "things are usually done".

Qualifications
Senior Site Reliability Engineer (SRE)

Location - ATL

We are looking for a Senior Site Reliability Engineer who is versed in modern reliability disciplines and can drive cross-team reliability initiatives. These initiatives include improving Delta's reliability engineering practices through greater application resiliency, higher uptime/availability, and better application performance. An ideal candidate would have prior experience implementing observability plans around logs, metrics, and traces.

YOUR RESPONSIBILITIES IN THIS ROLE
Strong experience setting SLOs / SLIs / error budgets and managing reliability for infrastructure and applications
Proficient in one or more of the following scripting languages and automation tools: JavaScript, Node.js, Python, Maven, Ansible, Bash, etc.
Experience handling large numbers of diverse systems with configuration management systems like Puppet, Chef, Ansible
Proven history of toil elimination by leveraging automation
Strong background using tools like PagerDuty for managing incidents
Strong experience with monitoring and alerting systems like Prometheus, Grafana, and Dynatrace
Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load balancing strategies
Experience in Serverless Application Framework
Experience with containerized workloads and management platforms such as Docker or Kubernetes
Familiarity with distributed systems including Microservices
Experience with infrastructure automation tools such as CloudFormation and Terraform
Understanding of CI/CD processes and experience with deployment automation tools such as CodePipeline, CodeDeploy, Jenkins, and Bamboo
Strong debugging, troubleshooting, and problem-solving skills
Effective communication, collaboration & negotiation skills with the ability to interface with various business units and third parties
Experience liaising with developers, operations staff and third-party resources
Experience with API integration projects
Ability to coach/mentor team members on multiple aspects of reliability engineering

Must Have Expertise

1. 5+ years of experience in DevOps practices

2. Hands-on experience with AWS Cloud and DevOps principles

3. 2+ years of experience working with DevOps tools (GitLab CI, AWS CodePipeline)

4. 2+ years of experience with scripting tools (Bash, Python, etc.)

5. 1+ years of experience in developing Node.js or TypeScript applications

6. 2+ years of experience in building and supporting applications in AWS and engineering applications in the AWS infrastructure using its native services

7. 1+ year of experience in AWS CDK

8. Ability to troubleshoot and resolve problems with existing AWS Cloud Controls

Nice-to-Have Expertise

1. 1+ year of experience in containerization technologies like Kubernetes, OpenShift, Docker

2. 1+ year of experience in application resiliency evaluation using AWS FIS

3. 1+ year of experience using Litmus for chaos engineering

4. Exposure to Red Hat OpenShift Service on AWS (ROSA)

ACL Digital is proud to be an Equal Employment Opportunity Employer. We are committed to diversity and inclusion regardless of age, race, color, ancestry, religion or creed, sex, national origin, sexual orientation, citizenship, marital status, disability, gender identity, veteran status or any other characteristic protected by law.

To ensure a fair and transparent hiring process, we encourage you to review the following resources:
Know Your Rights
Pay Transparency Act
IER Right to Work Document

If you are an individual with a disability and need a reasonable accommodation to assist with your job search or employment application, please contact us by completing our Accommodations for Applicants form. For any other queries, send an email to or call the ACL Digital HR Help/Accommodation at .

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
