HPC Engineer - Hybrid

Posted 13 Days Ago
75063, Irving, TX
Mid level
Artificial Intelligence • Healthtech • Biotech
Where Molecular Science Meets Artificial Intelligence – Revolutionizing Cancer Care.
The Role
The HPC Engineer manages Linux HPC systems, optimizing software and hardware performance, providing user support, and ensuring data security while planning for system upgrades.
Summary Generated by Built In

Position Summary
An HPC (High Performance Computing) Engineer is responsible for implementing and maintaining HPC systems that run primarily on Linux. The work involves installing, configuring, optimizing, and troubleshooting hardware and software components within a complex cluster environment, and often requires expertise in parallel processing, network architecture, and job scheduling tools like LSF. The role also covers ensuring optimal system performance and supporting users.
Job Responsibilities

  • Installing and configuring Linux operating systems on HPC clusters, including network settings, storage systems, and parallel file systems like GPFS. 

  • Monitoring system performance, identifying bottlenecks, and tuning system parameters to maximize computational efficiency. 

  • Managing user job submissions and queues using tools like LSF or Slurm, ensuring fair allocation of computing resources.

  • Implementing security measures to protect HPC systems and data from unauthorized access. 

  • Diagnosing and resolving hardware and software issues, applying updates and patches, and performing routine system maintenance. 

  • Providing technical assistance to researchers and other users on the HPC system, including account management and application support. 

  • Forecasting future computing needs and planning for system upgrades or expansions.

Required Qualifications

  • 4 years of experience managing Linux servers; direct experience managing HPC clusters preferred.

  • Technical experience with system configuration, implementation, management and user support.

  • Strong understanding of Linux system administration

  • Expertise in parallel computing concepts and programming paradigms.

  • Knowledge of high-performance networking technologies

  • Familiarity with cluster management tools (e.g., LSF, Slurm, PBS)

  • Experience with distributed file systems (Lustre, Ceph, GPFS)

  • Scripting for automation in Python and shell languages (e.g., bash, ksh)

  • Understanding of computer architecture and performance optimization techniques

  • Strong Linux system administration skills: Expertise in Linux commands, system configuration, and troubleshooting.

  • HPC cluster knowledge: Understanding of cluster architectures, network topologies (like InfiniBand), and parallel processing concepts.

  • Job scheduling tools: Proficiency with job scheduling systems like LSF or Slurm

  • Performance analysis tools: Familiarity with tools to monitor and analyze system performance

  • Scripting languages: Ability to write scripts (e.g., Bash, Python) for automation and system management

  • Networking expertise: Understanding of network protocols, network troubleshooting, and high-speed networking technologies

  • Storage management: Knowledge of parallel file systems and data management strategies

Preferred Qualifications

  • Experience with HPC schedulers and resource managers

  • Experience writing user documentation

  • Experience developing and delivering training for users

  • Strong technical and analytical skills

  • Strong verbal and written communication skills

  • Always maintains the highest level of professionalism when interacting with internal and external customers

  • Demonstrates a high-energy, positive attitude and commitment to quality customer service

  • Contributes to a positive team environment within the center by demonstrating a strong work ethic, effectively communicating with others, and proactively anticipating center and user needs

  • Experience coordinating and running support teams

Physical Demands

  • Ability to lift, move and install HPC data center hardware and supplies.

  • Standing for extended periods while performing data center related tasks.

Training

  • All job-specific, safety, and compliance training is assigned based on the job functions associated with this employee.

Other

  • This position requires periodic travel and some evenings, weekends, and/or holidays.

  • Job may require after-hours response to emergency issues.

  • Periodically scheduled on-call duty may require after-hours response to technical emergencies not explicitly related to assigned job responsibilities.

Conditions of Employment: Individual must successfully complete the pre-employment process, which includes a criminal background check, drug screening, credit check (applicable for certain positions), and reference verification.

This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time.

 

Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, among other things, or status as a qualified individual with disability.

Top Skills

Ceph
GPFS
InfiniBand
Linux
LSF
Lustre
Python
Shell Scripting
Slurm

The Company
HQ: Irving, TX
1,700 Employees
Hybrid Workplace
Year Founded: 2008

What We Do

Caris Life Sciences was founded in 2008 with a simple but powerful purpose – to help improve the lives of as many people as possible. With transformative technologies informed by massive amounts of big data, we are revolutionizing healthcare to provide physicians and patients with the highest quality information about their disease – from detecting it early and determining how best to treat it, to developing the next wave of novel therapies.
