Senior Site Reliability Engineer
Salary undisclosed
One of our clients in the entertainment platform space is looking for a Level 5 Reliability Engineer with a deep background in Unix/Linux systems, networking, data analysis, and operating large-scale platforms to help build, scale, automate, and maintain its globally distributed infrastructure.
Key Responsibilities
- Lead efforts to enhance system resiliency, observability, monitoring, and automation, ensuring that global operations remain scalable and reliable.
- Collect, evaluate, and interpret large volumes of server and application performance data using the Netflix Big Data platform to uncover optimization opportunities and to identify trends or anomalies that need deeper analysis.
- Provide ISP partners with technical guidance on integrating Open Connect Appliances.
- Act as a Tier 3 escalation point and take part in the on-call rotation to address platform incidents.
Required Qualifications
- At least 5 years of experience in site reliability or operational engineering roles supporting large-scale, high-performance systems with a focus on uptime and efficiency.
- Bachelor's degree in Computer Science, Electrical or Computer Engineering, or equivalent experience (preferred).
- In-depth understanding of networking and protocols including TCP/IP, BGP, DNS, TLS, and HTTP/S; familiarity with CDN and HTTP caching/proxy technologies.
- Proficient in developing and maintaining automation in languages such as Python.
- Advanced expertise in managing and troubleshooting Unix/Linux environments at scale, covering networking, storage, and OS fundamentals.
- Hands-on experience with distributed data processing tools such as Hive, Presto/Trino, or Spark SQL.
- Strong applied statistics knowledge with the ability to write code that detects anomalous system behavior.
- Some familiarity with containerization and orchestration technologies like Docker and Kubernetes.
- Effective communicator and collaborator, comfortable working with internal teams and external partners alike.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.