Senior HPC Architect

  • Full Time, onsite
  • Motion Recruitment Partners, LLC
  • On Site, United States of America
Salary undisclosed

An industry leader in the chip space is seeking a highly skilled HPC Queue Architect to design, implement, and optimize our high-performance computing (HPC) queuing systems. The ideal candidate will have a deep understanding of HPC architecture and workload management, and will play a critical role in ensuring efficient resource allocation and job scheduling across our HPC infrastructure.

Key Responsibilities:
  • Queue Management:
    • Design and implement queuing systems for optimal workload management and resource allocation.
    • Monitor and analyze queue performance, identifying bottlenecks and proposing improvements.
  • System Architecture:
    • Collaborate with HPC engineers to develop and maintain the overall architecture of the HPC environment.
    • Ensure that the queuing system integrates seamlessly with HPC resources, storage, and networking components.
  • Job Scheduling:
    • Develop and manage job scheduling policies to optimize resource utilization and minimize job wait times.
    • Implement and configure scheduling software (e.g., Slurm, PBS, Torque) to meet the needs of diverse workloads.
  • Performance Tuning:
    • Conduct performance analysis and benchmarking of queuing systems and HPC resources.
    • Provide recommendations for hardware and software upgrades to enhance system performance.
  • Collaboration and Support:
    • Work closely with researchers and users to understand their HPC needs and provide support for job submissions and troubleshooting.
    • Develop and maintain documentation for queuing systems and job scheduling processes.

Qualifications:
  • Education:
    • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Experience:
    • 3+ years of experience in HPC architecture, job scheduling, and workload management.
    • Hands-on experience with queuing systems and job schedulers in HPC environments.
  • Skills:
    • Strong understanding of HPC hardware, networking, and storage technologies.
    • Proficiency in scripting languages (e.g., Python, Bash) for automation and performance analysis.
    • Familiarity with cluster management tools and performance monitoring software.
  • Certifications:
    • Relevant certifications in HPC or cloud computing are a plus.

Personal Attributes:
  • Strong analytical and problem-solving skills.
  • Excellent communication and interpersonal abilities.
  • Ability to work collaboratively in a fast-paced, team-oriented environment.
  • Detail-oriented with a focus on optimizing performance and user experience.

What We Offer:
  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional development and continuing education.
  • A collaborative and innovative work environment.
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions, and AI may have been used to create this description. The position description has been reviewed for accuracy, and Dice believes it correctly reflects the job opportunity.