
HPC Engineer/Architect

Salary undisclosed


Bravens Inc., a wholly owned subsidiary of Ampcus Inc., is an information technology consulting and services company. Bravens is a leader in providing tailored staffing solutions across both IT and non-IT industries. We are in search of a highly motivated candidate to join our talented team and contribute to our ongoing success.

Job Title: HPC Engineer/Architect

Location(s): New York, NY (Hybrid)

Job Summary

  • Support day-to-day operations of large-scale parallel file systems.
  • Deploy and maintain Linux HPC infrastructure across multiple data centers.
  • Assist HPC engineers and architects with day-to-day operations and tickets.

Required Skills

  • Linux operating systems (RHEL/CentOS), parallel file systems (GPFS), job schedulers (LSF/Slurm)
  • Ansible, Python, shell scripting
  • GPU-based compute infrastructure (including CUDA)
  • CentOS 4.5
  • HPCC

Responsibilities

  • Design, architect, and oversee implementation of Linux-based HPC clusters and storage
  • Deploy physical hardware using HPC deployment tools and configuration and orchestration tools (Ansible)
  • Parallel file system (GPFS) performance tuning, monitoring and troubleshooting
  • Perform systems benchmarking and develop automated tests for the HPC environment to ensure the reliability and efficiency of the computational infrastructure
  • InfiniBand network maintenance and troubleshooting
  • Automate and monitor the HPC user lifecycle process
  • Slurm installation, configuration, performance tuning and troubleshooting
  • Plan, design and implement a transition from the LSF scheduler to Slurm
  • Manage the Slurm scheduler and translate research policies into scheduler configurations
  • Consult with faculty and students to develop research pipelines for use on the HPC cluster
  • Develop and maintain a user lifecycle software suite in Python and implement a CI/CD pipeline for it
  • Test and automate upgrades of critical system applications using Ansible and shell scripts.
  • Communicate effectively with clinicians, researchers, and other team members to develop technological solutions

Qualifications

  • Experience working in a large-scale, research-based HPC environment
  • Proven experience working with distributed file storage solutions (e.g., GPFS)
  • Experience with deploying and troubleshooting Linux Operating Systems (RHEL/CentOS)
  • Experience with scripting and automation (Ansible, Python, shell scripting)
  • Solid understanding of job schedulers (LSF/Slurm)
  • Experience with GPU-based compute infrastructure (including CUDA)

Years of Experience: 16-20 years

Bravens is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran status, or disability.