Job Description
We are seeking a Network Engineer to design, implement, and manage high-performance networks for HPC and AI infrastructure.
Candidates will work on cutting-edge technologies, including InfiniBand, optical networking, and advanced Linux-based systems, contributing to scalable, secure, and high-availability network solutions. They should also have expertise in IP routing protocols (BGP, OSPF) and network automation (Ansible, Nornir, and Netmiko).
YOUR RESPONSIBILITIES:
- Monitor the performance and health of InfiniBand fabrics, including switches, host adapters, and nodes, using existing tools, and contribute to developing new monitoring solutions where necessary.
- Investigate and help diagnose network connectivity issues, performance bottlenecks, and component failures.
- Collaborate with cross-functional teams to support HPC clusters and ensure smooth network operation.
- Assist with the deployment and configuration of network infrastructures, including large-scale fabric installations from initial setup to operational readiness.
- Maintain and update network documentation and workflows to align with organizational standards.
- Contribute to the requirements for deployments and guide cross-functional teams during implementation.
- Develop and implement advanced monitoring tools and strategies for network performance.
- Work with senior colleagues to research technologies to improve scalability and security.
- Work with senior colleagues to support optimization initiatives, ensuring maximum efficiency, security, and performance.
- Contribute to integrating new network technologies into existing infrastructure.
- Troubleshoot complex, high-impact network issues across multiple sites.
- Support technical network incidents, working with other teams to resolve issues quickly.
- Work with your team to support network automation solutions using Ansible, Nornir, and Netmiko.
- Contribute ideas and initiatives to ensure network changes are repeatable, efficient, and error-free.
YOUR REQUIREMENTS:
- Knowledge of InfiniBand configuration and management.
- Familiarity with optical networking hardware and Linux system administration.
- Knowledge of one scripting language (e.g., Python, Bash).
- Analytical and troubleshooting skills.
- Ability to collaborate effectively in team environments.
- Willingness to travel to data centers for deployments and support.
- 2 years of experience in network engineering, with a focus on high-performance environments, is ideal but not mandatory.
- Understanding of InfiniBand, RDMA, and advanced network architectures is a bonus.
- Certifications (e.g., CCIE, NVIDIA DPU Certification) are a bonus.
Linux System & Network Security:
- Contribute to network-related Linux administration, ensuring high availability, security, and performance.
- Apply an understanding of security measures, including firewalls, VPNs, and access control policies.
HPC & InfiniBand Understanding:
- Contribute to the design and implementation of HPC network architectures, focusing on InfiniBand configurations for performance-critical environments.
- Work alongside senior colleagues on the integration and management of InfiniBand for high-throughput, low-latency computing systems.
- Provide technical input on HPC interconnect issues, optimizing performance across large clusters.
With us, you will work towards the future of HPC: from new, sustainable building methods for data centers to cooling concepts to software solutions for accelerated compute.
Your approaches count: In official exchange formats or spontaneously at the coffee machine. At Northern Data, it's the best idea that counts - not the hierarchy. We're looking forward to your input!
You make the difference in the company: Unlike in established corporations, at Northern Data you will really help shape things, from establishing new departments to optimizing processes and culture.
Best-in-class partners: The best work with Northern Data. This means a knowledge and time advantage from which your career and our customers benefit equally.
Green by heart: Sustainability is at the core of Northern Data. With us, you actively work on the carbon neutrality of datacenters worldwide. Beginning with our infrastructure and continuing with the solutions for our clients, we work towards a green future.
Home Office facts: Work flexibly from home with our international and virtual team. And of course, your hardware wishes will be fulfilled to make your ideas for next-level HPC come true.
Your wellness matters: At Northern Data we have regular wellbeing initiatives that are designed to promote wellness, diversity, inclusion, and much more, ensuring a supportive and enriching environment for our global team.
What We Do
At Northern Data Group, we believe unlimited High Performance Computing (HPC) will unlock unprecedented opportunities for research and development, business, and ultimately human progress.
We power innovation through market-leading HPC infrastructure, operating across our three business divisions: Taiga Cloud, Ardent Data Centers and Peak Mining.
Our global organization is rapidly becoming a world leader for GPU-based solutions by designing and operating ultra-efficient green HPC infrastructure.
We uniquely combine intelligent and sustainable data centers, cutting-edge hardware and self-developed software for various HPC applications including Generative AI, Machine Learning and Bitcoin Mining.
We operate from large-scale custom data centers and proprietary containerized data centers for ultimate site selection flexibility.