Site Reliability Engineer

job summary:

Location: Westlake, TX or Merrimack, NH



Required Skills:




  • Datadog
  • Kubernetes
  • AWS (EKS) and Azure (AKS); AWS preferred
  • On-call experience running incidents
  • Development background: Ansible, Python, Node.js, JavaScript, Jenkins, Groovy
  • Understanding of API testing tools (SoapUI, Postman)
  • Understanding of Agile Methodology
  • Experience managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef)



location: Westlake, Texas

job type: Contract

salary: $65 - $66 per hour

work hours: 8am to 5pm

education: Bachelors



responsibilities:


The Expertise and Skills We're Looking For




  • Bachelor's degree or higher in a technology related field (e.g. Engineering, Computer Science, etc.) required
  • 5-8+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale
  • Hands-on experience with Public Cloud environments, preferably AWS and Azure. Certifications a plus
  • Hands-on experience with container orchestration, preferably with Kubernetes
  • Working experience with batch processing using tools such as Control-M and Informatica
  • Ability to solve application issues on Unix/Linux with J2EE, WebSphere, Tomcat and SQL
  • Exposure to basic OS-level scripting languages such as Korn shell, Bash, and JScript
  • Familiarity with ITIL processes like Incident management, Change/Problem management
  • Balancing delivery with ad hoc workloads and re-evaluating priorities
  • Solid understanding of Cloud Computing and DevOps concepts including CI/CD pipelines
  • Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
  • Use Datadog, Catchpoint, Splunk, and Grafana for application observability and for monitoring applications and infrastructure
  • Experience with instrumentation, plus systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale
  • Proven experience maintaining the scalability and resiliency of complex environments
  • Proven experience implementing advanced observability practices and techniques at scale
  • Provide enterprise Cloud and Platform Engineering support for production environments, and participate in an on-call rotation to provide solutions
  • Experience in cloud development (AWS and Azure) and migration; experience building and operating highly resilient platforms in public cloud environments
  • Ability to triage, complete root cause analysis, and be decisive under pressure
  • Experience managing and interpreting large datasets using query languages and visualization tools
  • Proficient communication skills, with an ability to reach both technical and non-technical audiences
  • Ability to learn new software, methods, and practices and bring them to our developers
  • Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner and build and maintain effective relationships
  • Proven experience performing chaos testing to build confidence in the system's capability to withstand turbulent conditions in production
  • Understanding of API testing tools (SoapUI, Postman)
  • Understanding of Agile Methodology
  • Experience managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef)
  • Manage a large fleet of on-prem servers (including security and patching oversight)
  • Manage hundreds of SSL certificates for all applications in scope
  • Use Ansible and Python to automate day-to-day activities; web development with Django and JavaScript





qualifications:

  • Experience level: Experienced
  • Minimum 5 years of experience
  • Education: Bachelors


skills:
  • Reliability



    Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

    At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact

    Pay offered to a successful candidate will be based on several factors including the candidate's education, work experience, work location, specific job duties, certifications, etc. In addition, Randstad Digital offers a comprehensive benefits package, including health, an incentive and recognition program, and 401K contribution (all benefits are based on eligibility).

    This posting is open for thirty (30) days.


