RunPod is pioneering the future of AI and machine learning, offering cutting-edge cloud infrastructure for full-stack AI applications. Founded in 2022, we are a rapidly growing, well-funded company with a remote-first organization spread globally. Our mission is to empower innovators and enterprises to unlock AI's true potential, driving technology and transforming industries. Join us as we shape the future of AI.
As our organization continues to push the boundaries of high-performance networking, we are seeking a full-time, remote Network Software Engineer to join our team. This technical position will be crucial in designing, implementing, and optimizing our advanced networking infrastructure. The ideal candidate will have a deep understanding of low-level networking systems and a passion for optimizing network performance at scale, along with expertise in the Linux networking stack, extensive experience with high-performance NICs, and experience writing code for network accelerators. This role offers the opportunity to work with cutting-edge networking technologies, solve complex problems at scale, and contribute to the performance and efficiency of our critical systems.
If you are passionate about pushing the boundaries of network performance in AI compute, we want to hear from you. Join our team and help shape the future of high-performance networking!
Key aspects of our Network Software Engineering approach include:
- Performance-Driven Development: Focus on writing high-performance networking code that can handle massive throughput and low latency requirements.
- Hardware-Software Co-design: Work closely with hardware capabilities, optimizing software to leverage advanced NIC features.
- Protocol Optimization: Continuously refine and enhance network deployments to meet the demands of our distributed systems.
- Scalability at the Core: Design and implement networking solutions that can scale to support thousands of nodes across multiple data centers.
- Security-First Mindset: Integrate robust security measures at every layer of the networking stack.
As a Network Software Engineer in our team, you'll be at the forefront of high-performance networking, using your deep technical skills to build and optimize systems that push the boundaries of what's possible in network performance. You'll work on challenging projects that require innovative solutions, always with an eye towards efficiency, scalability, and reliability.
Responsibilities:
- Design and implement high-performance networking software for Linux environments
- Develop and maintain software for high-performance NICs (e.g., Mellanox/NVIDIA UFM)
- Implement and optimize network protocols at OSI layers 1-4
- Design and implement secure networking solutions, including mTLS and IPsec
- Collaborate with hardware supply teams to co-design software solutions that leverage advanced NIC features
- Troubleshoot complex networking issues in large-scale distributed environments
- Participate in code reviews and contribute to the team's technical standards
- Implement networking systems that provide isolation between multi-tenant workloads
Requirements:
- Deep knowledge of the Linux networking stack and kernel internals
- Proven experience writing and optimizing code for network accelerators like XDP or VPP at scale
- Comprehensive understanding of OSI layers 1-4, including practical implementation experience
- Experience with high-performance network accelerators like XDP (eXpress Data Path) and VPP (Vector Packet Processing)
- Strong background in TLS, IPsec, and VXLAN implementation and optimization
- Proficiency in C, with a focus on high-performance, low-level programming
- Demonstrated ability to optimize network performance in large-scale, high-throughput environments
- Strong communication skills and ability to explain complex networking concepts to diverse audiences
- Successful completion of a background check
Preferred:
- Master's degree or PhD in Computer Science, Computer Engineering, or a related field
- Contributions to open-source networking projects or research publications in the field
- Experience with DPDK (Data Plane Development Kit) or similar kernel-bypass technologies
- Extensive experience working with high-performance NICs, particularly Mellanox/NVIDIA UFM
- Familiarity with SmartNIC programming and offloading techniques
- Knowledge of networking technologies like RDMA (Remote Direct Memory Access)
- Experience with network simulation and modeling tools
- Knowledge of network requirements and profiles for AI workloads (NCCL)
- Experience debugging issues in Linux containers (LXC), Docker, and virtual machines (VMs)
- Strong understanding of network peering and cost optimization for data centers
What You’ll Receive
- The competitive base pay for this position ranges from $150,000 to $200,000. Actual pay may depend on factors such as your job-related knowledge, skills, and experience.
- Stock options
- The flexibility of remote work with an inclusive, collaborative team.
- An opportunity to grow with a company that values innovation and user-centric design.
- Generous vacation policy to ensure work-life harmony and well-being.
- The chance to contribute to a company with global impact, based in the US, Canada, and Europe.
RunPod is committed to maintaining a workplace free from discrimination and upholding the principles of equality and respect for all individuals. We believe that diversity in all its forms enhances our team. As an equal opportunity employer, RunPod is committed to creating an inclusive workforce at every level. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, marital status, protected veteran status, disability status, or any other characteristic protected by law.