
Software Engineering PMTS

Salary undisclosed

Job Details

(Lead/Principal/Architect) Software Engineer - Availability Engineering
Our Availability Engineering teams are responsible for driving best-in-class availability. You will work with delivery teams that deploy customer-facing and customer-supporting software across a multi-substrate engineering platform, which collectively ships hundreds of features to production every day for tens of millions of users across all industries. Our users count on our applications and platforms to be highly reliable, lightning fast, and supremely secure, and to preserve all of their customizations and integrations every time we ship. You will need deep experience with concurrency and large-scale systems, proficiency in solving real-world data management challenges, a strong understanding of how to craft highly available solutions, and a proven ability to design, develop, and optimize core back-end systems.

What you'll be doing:
  • As part of a specialist unit focused on availability and resilience, you will embed with delivery teams in a lead capacity, creating bandwidth for and prioritizing corrective and proactive availability measures.
  • You will contribute to designing, developing, debugging, and operating resilient applications and platforms deployed on distributed systems that run across thousands of compute nodes in multiple data centers.
  • You will champion resiliency best practices: observability tool integration, horizontal/vertical sizing and auto-scaling, release rollback and recovery workflows, and integration tests and validation procedures for applications running on self-hosted infrastructure as well as on public cloud platforms such as AWS, Google Cloud Platform, Azure, and Alibaba.
  • Using and contributing to open-source technology (Spinnaker, ZooKeeper, etc.).
  • Developing and leveraging Infrastructure-as-Code using Terraform.
  • Building and integrating with APIs and microservices deployed on containerization frameworks such as Kubernetes, Docker, Mesos, etc.
  • Resolving complex technical issues and driving innovations that improve system availability, resilience, and performance.
  • Balancing live runtime management, feature delivery, and retirement of technical debt.
  • Participating in the team's on-call rotation to address complex problems in real time and keep services operational and highly available.

Required Skills:
  • A related technical degree (master's preferred).
  • 15+ years of hands-on software development experience.
  • 5+ years in a Tech Lead, Principal, or Architect capacity.
  • Ability to reverse-engineer solutions through independent code and architecture review, and to envision, define, and contribute to the delivery of availability-improvement refactoring projects.
  • Mastery of one or more object-oriented languages such as Java, Golang, APEX, or Python.
  • Deep experience working with core web technologies: HTTP, JSON, REST, and XML.
  • Proficiency with databases, including Oracle or other relational and/or NoSQL solutions.
  • Experience owning and operating multiple instances of a critical service.
  • Experience running critical infrastructure services, including monitoring, alerting, logging, tracing, and reporting.
  • Subject-matter expertise in service-ownership best practices, SLO/SLI/SLA definition, and driving proactive operational awareness, plus experience with incident and problem management.
  • Thorough knowledge of Agile development methodology, with experience in both test-driven and behavior-driven development (TDD/BDD) practices.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.