Software Engineer (Infrastructure)

Posted 7 Days Ago
Cambridge, Cambridgeshire, England
Mid level
Security • Cybersecurity
The Role
The Software Engineer (Infrastructure) will manage and optimize NVIDIA GPU servers and cloud environments for AI and HPC projects, focusing on server maintenance, security, performance monitoring, and collaboration with researchers and engineers.

Darktrace has more than 2,500 employees located globally. Founded by mathematicians and cyber defence experts in 2013, Darktrace is a global leader in cyber security AI, delivering complete AI-powered solutions in its mission to free the world of cyber disruption.

For over a decade, Darktrace has pioneered a proactive, AI-native approach to security. Our roots lie deep in innovation. The Darktrace AI Research Centre based in Cambridge, UK, has conducted research establishing new thresholds in cybersecurity, with technology innovations backed by over 200 patents and pending applications.

Darktrace is a global leader in cybersecurity AI, delivering the essential cybersecurity platform to protect organisations today and for an ever-changing future.

What will I be doing:

Darktrace is seeking an experienced Infrastructure Engineer to manage, maintain, and optimise a dedicated NVIDIA GPU server and cloud environments for innovation projects. Responsibilities include setting up, configuring, and maintaining the servers and software stack. A successful candidate will work directly with Darktrace researchers and software engineers, ensuring optimal performance and availability for ongoing AI and HPC (high-performance computing) projects.

This is a hybrid role, with compulsory attendance in the Cambridge office 2 days a week.

This role focuses on maintaining and optimising the Linux operating system, file systems, and software stack (CUDA, PyTorch, Python, etc.) for machine learning projects. It also covers setting up and configuring NVIDIA HGX servers (installing and updating software, managing user access, and ensuring optimal performance) and cloud infrastructure for GPU compute projects (managing access and ensuring optimal performance). Additional responsibilities include:

  • Monitoring server and application performance, identifying bottlenecks, and taking corrective actions to maintain high availability,
  • Implementing and maintaining server security, including patch management, vulnerability scanning, and intrusion detection,
  • Collaborating with network administrators, hardware engineers, and researchers to troubleshoot and resolve server and software-related issues,
  • Working closely with the project manager to ensure efficient resource allocation, server utilisation and scaling across multiple teams,
  • Collaborating with data scientists and machine learning engineers to understand their software requirements and provide guidance on best practices,
  • Assisting in training team members on the capabilities and usage of the HGX servers and the software environment,
  • Developing multi-use tooling to work with the HPC environments.

What experience do I need:

We welcome applications from engineers with strong problem-solving and creative thinking skills, excellent communication, and the ability to work in a collaborative team environment. You will be an independent thinker with a startup mindset. Technology-wise, you will have experience in system administration, preferably with a focus on HPC platforms, GPU-based servers, and machine learning software environments, as well as familiarity with AI and HPC provisioning and management, both on-premises and in the cloud. You will have experience with server virtualization technologies and containerization and be well versed in the Linux operating system. You'll also ideally have:

  • Strong knowledge of NVIDIA HGX server architectures and components,
  • Strong knowledge of AWS or Azure Cloud environments,
  • Experience with NVIDIA GPU technologies, such as NVLink, NVSwitch, and Tensor Core GPUs,
  • Experience with machine learning frameworks and libraries, such as PyTorch and associated system optimisations,
  • Experience with NAS servers,
  • Experience with data version control systems.

Benefits we offer:

  • 23 days’ holiday + all public holidays, rising to 25 days after 2 years of service,
  • Additional day off for your birthday,
  • Private medical insurance which covers you, your cohabiting partner and children,
  • Life insurance of 4 times your base salary,
  • Salary sacrifice pension scheme,
  • Enhanced family leave,
  • Confidential Employee Assistance Program,
  • Cycle to work scheme.

Top Skills

Python

The Company
Atlanta, GA
1,763 Employees
On-site Workplace
Year Founded: 2013

What We Do

Darktrace, a global leader in cyber security AI, delivers world-class technology that protects over 5,500 customers worldwide from advanced threats, including ransomware and cloud and SaaS attacks.

The company’s fundamentally different approach applies Self-Learning AI to enable machines to understand the business in order to autonomously defend it.

Headquartered in Cambridge, UK, the company has 1,500 employees and over 30 offices worldwide.

Darktrace was named one of TIME magazine’s ‘Most Influential Companies’ for 2021.
