Senior Software Engineer/SRE - Automated Disaster Recovery

Full Time, onsite
Bloomberg
On Site, United States of America

Salary undisclosed

The Team: We are the Data and Runtime Stability DR team, responsible for end-to-end disaster recovery testing of Bloomberg's datacenters across the many services that support the applications making up Bloomberg's product line. On any given day we're inventing, engineering, building, coding, troubleshooting, and maintaining a wide range of tools, monitors, frameworks, interfaces, protocols, solutions, and best practices. Together, these components form a robust suite of automated, self-healing systems that manage the services Data and Runtime provides to the rest of the firm. We improve uptime, provision and balance resources, architect and coordinate operational procedures, administer backup and recovery processes, coordinate maintenance windows, manage replication, and oversee workflows.
What's in it for you: You will be part of a team that helps meet company-defined and regulatory Disaster Testing scenarios. You will manage and support the disaster recovery tools we build, integrating the services they provide into the Bloomberg operational environment and into Bloomberg products. This in-house tooling suite is required to test our datacenters in an automated, scalable, and self-driven fashion, complete with the metrics and transparency tools that internal and external clients require. Tooling is expected to be written with comprehensive unit testing and continuous integration to provide the highest level of stability.
We have high-level ownership and "the classic SRE responsibilities," such as system tuning, performance analysis, and the management of patches, installations, and upgrades. You'll also have immediate access to the experts who design and code the Bloomberg-specific components, APIs, and methods. This means insight into, and access to, the lowest levels of how Bloomberg applications interact with each other and with the Runtime environment, both for in-depth troubleshooting and for enhancing stability, reliability, performance, and feature set.
You need to have:

4+ years of programming experience with an object-oriented programming language (Python)
A degree in Computer Science, Engineering or similar field of study or equivalent work experience
5+ years of experience with Unix, Unix tools, and shell scripting
Experience with Chaos Engineering
Deep understanding of TCP/IP networking and the OSI model
Experience designing and automating repeatable processes in a client/server modeled environment
Ability to build and maintain sophisticated, highly available, performant, and scalable mission-critical systems
Experience building monitors and alarms for system performance, status and stability
Experience with CI/CD systems and writing robust unit and system tests

We'd love to see:

Basic knowledge of the Rapid framework
Experience analyzing existing systems and identifying shortcomings with concrete ideas for improvement
Experience designing stable, long-lasting APIs
Experience with Splunk/Humio and Grafana
Experience with GitHub and JIRA
Passion for product ownership

Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.