Senior Software Engineer/SRE - Automated Disaster Recovery

Salary undisclosed

The Team: We are the Data and Runtime Stability DR team, charged with administering the end-to-end testing of Bloomberg's datacenters for disaster recovery scenarios across the numerous services that support the applications making up Bloomberg's line of products! On any given day we're inventing, engineering, developing, building, coding, troubleshooting and maintaining a wide range of tools, monitors, frameworks, interfaces, protocols, solutions and best practices. These components stitch together a robust suite of automated and self-healing systems that manage the services that Data and Runtime provides to the rest of the firm. We improve uptime, provision and balance resources, architect and coordinate operational procedures, administer backup and recovery processes, coordinate maintenance windows, manage replication and oversee workflows.
What's in it for you: You will be part of a team that works to help meet company- and regulator-defined Disaster Testing scenarios. You will manage and support the solutions behind various disaster recovery tools, creating these applications to integrate the services they provide into the Bloomberg operational environment as well as Bloomberg products. This in-house tooling suite is required to test our datacenters in an automated, scalable and self-driven fashion, complete with the accompanying metrics and transparency tools required by internal and external clients. Tooling is expected to be written with end-to-end unit testing and continuous integration to provide the highest level of stability.
We have high-level ownership and "the classic SRE responsibilities" such as system tuning, performance analysis and the management of patches, installations, and upgrades. You'll also have immediate access to the experts who are designing and coding the Bloomberg-specific components, APIs and methods. This means insight into, and access to, the lowest levels of how Bloomberg applications interact with each other and the Runtime environment, for the purposes of both in-depth troubleshooting and enhancing stability, reliability, performance and feature set.
You need to have:
  • 4+ years of programming experience with an object-oriented programming language (Python)
  • A degree in Computer Science, Engineering or similar field of study or equivalent work experience
  • 5+ years of experience with Unix, Unix tools and shell scripting
  • Experience with Chaos Engineering
  • Deep understanding of TCP/IP networking and the OSI model
  • Experience designing and automating repeatable processes in a client/server modeled environment
  • Ability to build and maintain sophisticated, highly available, performant, and scalable mission-critical systems
  • Experience building monitors and alarms for system performance, status and stability
  • Experience with CI/CD systems and writing robust unit and system tests
We'd love to see:
  • Basic knowledge of the Rapid framework
  • Experience analyzing existing systems and identifying shortcomings with concrete ideas for improvement
  • Experience designing stable, long-lasting APIs
  • Experience with Splunk/Humio and Grafana
  • Experience with GitHub and JIRA
  • Passion for product ownership
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.