Site Reliability Engineer

Full Time, hybrid
The Wolf Works
Hybrid, United States of America


Site Reliability Engineer (SRE)
Location: Mountain View, CA (Hybrid)
Job Type: Full-time
Compensation: Up to $85/hr

Job Description:

As a Site Reliability Engineer (SRE), you will design, implement, and maintain complex data systems that support millions of customers. You will apply Cloud Native principles and best practices to ensure high availability, security, performance, and scalability of database systems. This is a hands-on role that involves working with cutting-edge technologies and maintaining critical infrastructure.

Key Responsibilities:

Design, build, and maintain CI/CD pipelines in Jenkins.
Deploy services in Kubernetes clusters using Helm, Kustomize, and similar tools.
Implement infrastructure changes in AWS with a deep understanding of AWS services.
Participate in on-call duties for pre-production and production systems that support millions of users.
Write and review RCA (Root Cause Analysis) documentation to prevent the recurrence of incidents and share learnings.
Contribute to system upgrades, deployment automation, monitoring enhancements, and production changes.
Create operational playbooks, write how-to articles, and gain domain knowledge to drive team improvements.
Participate in FMEA (Failure Mode and Effects Analysis) testing, chaos testing, and security remediation efforts.
Share best practices for operational excellence and cost optimization.
Automate processes to reduce manual efforts and increase efficiency.
Continuously look for opportunities to increase developer velocity and productivity.

Qualifications:

Bachelor's or master's degree in Computer Science or a related technical field, or equivalent experience.
4+ years of hands-on experience with development and operations in AWS environments.
Expertise in performance monitoring, troubleshooting, and tuning.
Experience with AWS services and Cloud hosting.
Proficiency in DevOps automation using scripting languages.
Experience with programming languages such as Java, Python, or Ruby.
Knowledge of Docker, Kubernetes, and ArgoCD.
Experience with monitoring and observability tools such as Splunk, Wavefront, AppDynamics, and Prometheus, as well as distributed tracing.
