CodeHunter

Mid-Level Site Reliability Engineer

Posted 2 Days Ago

McLean, VA

Mid-level

Security • Software • Cybersecurity

The Role

As a Mid-Level Site Reliability Engineer, you will ensure the availability, resiliency, and scaling of our SaaS products. Responsibilities include optimizing performance, managing disaster recovery, refining DevSecOps practices, and automating CI/CD processes. You'll collaborate with DevOps Engineering to exceed SLAs and implement monitoring tools, all while driving continuous improvement in system reliability.

Summary Generated by Built In

Description

CodeHunter is a dynamic and innovative tech company specializing in cybersecurity. Our enterprise-grade malware hunting platform identifies, in seconds, unknown malware threats that current cybersecurity solutions cannot detect. By automating the analysis process, we reduce dependence on manual effort and provide actionable intelligence that protects your organization from threat actors.

We are enthusiastic about pushing the boundaries of technology to create innovative solutions that solve real-world problems. As we continue to grow, we are looking for a talented Mid-Level Site Reliability Engineer to join our team.

As a Mid-Level Site Reliability Engineer at CodeHunter, you will own the availability, resiliency, and scaling of our SaaS product offering. We need a highly skilled, purposeful, and accountable Site Reliability Engineer (SRE) to lead the charge in establishing a world-class reliability program. You will play a critical role in optimizing our systems, ensuring scalability, and maintaining the highest standards of security, availability, and performance.

You will work closely with our DevOps Engineering team to observe, measure, and deliver high-quality solutions that meet our contractual service level agreements (SLAs) with clients. This position offers an exciting opportunity to work on challenging projects, learn modern technologies, and make a foundational impact on the maturity of our service levels.

Responsibilities:

Lead the design and execution of a cutting-edge site reliability program that raises the bar for performance, scalability, and security.
Refine our DevSecOps practices, ensuring continuous improvement in monitoring, logging, and security.
Take full ownership of optimizing system performance, managing disaster recovery processes, and driving cost management for third-party cloud services (AWS, Azure).
Define SLIs and SLOs, meet or exceed SLAs to ensure system reliability, manage incidents with a sense of urgency, and conduct post-mortems to continuously improve our infrastructure.
Champion resilience and system uptime through chaos engineering, automated scaling, self-healing mechanisms, and future-proof capacity planning.
Develop and implement advanced monitoring and observability tools while actively managing error budgets to meet organizational goals.
Automate CI/CD pipelines, infrastructure as code (IaC), and configurations to streamline our development processes.
Lead the DevOps Change Control Board (CCB), setting the standard of excellence in our change management processes.
Oversee the creation and evolution of a comprehensive internal knowledge base and develop training content to ensure seamless onboarding.
Drive zero-downtime deployments, utilizing blue-green and canary deployment strategies to ensure smooth updates.
Manage and optimize cloud platforms, containers (Docker, Kubernetes), and observability tools that feed critical insights to NOC/SOC and executive-level dashboards.
Stay at the forefront of industry best practices, emerging technologies, and innovations to drive continuous improvement.

Requirements

A proven history of exceeding expectations and delivering high-impact results in site reliability, performance optimization, and system scalability.
Expertise in cloud platforms (AWS, Azure), containers (Docker, Kubernetes), and modern monitoring/observability tools.
Deep experience with automation, DevSecOps practices, and infrastructure-as-code (IaC).
Strong leadership skills, with the ability to drive change, champion excellence, and mentor others.
A proactive problem-solver with a keen focus on both long-term vision and immediate execution.
Passion for continuous learning, staying updated on industry trends, and applying best practices to deliver exceptional reliability and performance.
Significant prior experience managing uptime for cloud infrastructure that handles millions of HTTP requests per second globally.
Good knowledge of PowerShell, Python, and Bash.
5-7+ years of professional site reliability engineering experience.
Excellent problem-solving and communication skills.
Ability to work collaboratively in a team environment and delegate responsibilities to team members.

Nice-to-Have

Dynamic environment creation on demand using Terraform or similar technology.

Benefits

CodeHunter offers a creative, team-oriented, and entrepreneurial work environment. Self-starters thrive here. Our employees have the chance to be part of the organization from the ground floor and make a demonstrable impact by bringing an innovative product to the cybersecurity marketplace. CodeHunter offers best-in-class benefits, including:

401K
Health coverage
Vision and dental coverage
Company-sponsored training
Parking or metro benefits
Catered lunches
Generous PTO policy

CodeHunter is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, or any other legally protected basis, in accordance with applicable law.

Top Skills

Bash

PowerShell

Python

The Company

HQ: McLean, VA

26 Employees

On-site Workplace

What We Do

The world’s first malware hunting SaaS platform designed to detect all variations of malware, known and unknown, without the need for source code or signatures. #LetTheHuntBegin