Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
We are looking for a Lead Site Reliability Engineer to join our Cloud Infrastructure Engineering division. Cloud Infrastructure Engineering ensures the continuous availability of the technologies and systems that are the foundation of athenahealth’s services. We are directly responsible for thousands of servers and petabytes of storage, and we handle thousands of web requests per second, all while sustaining growth at a meteoric rate. We enable an operating system for the medical office that abstracts away administrative complexity, leaving doctors free to practice medicine.
But enough about us; let’s talk about you!
You’re a seasoned engineer with a passion for identifying and resolving reliability and scalability challenges. You are a curious team player, someone who loves to explore, learn, and make things better. You are excited to uncover inefficiencies in business processes, creative in finding ways to automate solutions, and relentless in your pursuit of greatness. You’re a nimble learner capable of quickly absorbing complex solutions and an excellent communicator who can help evangelize engineering excellence.
The Team:
We are a bunch of Site Reliability Engineers who are passionate about reliability, automation, and scalability. We use an agile-based framework to execute our work, ensuring we are always focused on the most important and impactful needs of the business. We support systems in both private and public cloud and make data-driven decisions about which one best suits the needs of the business. We are relentless in automating away manual, repetitive work so we can focus on projects that help move the business forward.
Job Responsibilities
Ownership of PaaS Core Infrastructure:
-
Lead the design, implementation, and maintenance of the core infrastructure for our PaaS platform, ensuring its reliability, scalability, and efficiency.
-
Oversee the performance, availability, and resiliency of key platform services and ensure they meet Service Level Agreements (SLAs).
-
Continuously improve the core infrastructure to support new features and customer demands, ensuring it can scale effectively.
Automation and Efficiency:
-
Lead automation efforts that minimize manual processes, improve operational efficiency, and enhance the developer experience.
-
Build and maintain tools, frameworks, and CI/CD pipelines that allow teams to deploy and manage PaaS services at scale.
-
Promote the use of Infrastructure as Code (IaC) and configuration management tools such as Terraform and Puppet to automate the provisioning and management of platform infrastructure on EC2 and Kubernetes.
-
Build and scale out hybrid cloud infrastructure.
Monitoring, Observability, and Incident Management:
-
Design and implement comprehensive monitoring, alerting, and observability systems to ensure the health of PaaS infrastructure.
-
Proactively identify and mitigate performance bottlenecks, scaling issues, and potential failures.
-
Lead the incident management process, driving post-incident reviews and ensuring that lessons learned are applied to improve reliability and uptime.
Security and Compliance:
-
Work closely with security teams to ensure that platform services adhere to security best practices, compliance standards, and regulatory requirements.
-
Lead vulnerability remediation efforts.
-
Conduct regular security audits and assessments, ensuring that both platform and user data remain secure.
Collaboration and Leadership:
-
Lead, mentor, and support a team of SREs and engineers, fostering a culture of continuous improvement, ownership, and operational excellence.
-
Work closely with development teams, product management, and DevOps teams to align platform infrastructure with business needs.
-
Serve as a key technical leader and advisor on platform scalability, reliability, and operational strategies.
Continuous Improvement:
-
Stay current with emerging technologies and industry trends in cloud infrastructure, platform engineering, and site reliability engineering.
-
Identify opportunities to innovate and improve PaaS offerings, advocating for improvements in both platform technology and operational practices.
Documentation and Knowledge Sharing:
-
Create and maintain comprehensive documentation for PaaS infrastructure, operational procedures, troubleshooting guides, and best practices.
-
Share knowledge with internal teams, fostering a culture of collaboration and continuous learning within the organization.
Qualifications
-
8-10 years of experience building, scaling, and supporting highly available systems and services hosted on a diverse set of hosts, e.g., physical hosts, VMs, EC2, Kubernetes, and containers.
-
4-5 years of experience managing and leading technical teams, including mentoring engineers and fostering team development.
-
Strong understanding of distributed systems, networking, and both cloud-native and on-premises hosted applications.
-
Strong experience with enterprise-grade middleware and core infrastructure, e.g., web servers (Apache), MQ, caching, and load balancers (NetScaler), hosted on a virtual machine cluster.
-
Strong expertise in configuration management tools like Puppet or Ansible.
-
Experience with Infrastructure-as-Code, Linux, VMware, and API integration. Hands-on experience with Terraform.
-
Experience with microservices architectures and containerization technologies.
-
Hands-on experience with CI/CD pipelines and automation practices.
-
Proficiency in at least one scripting or programming language (Python, Go, Ruby, etc.).
-
Expertise in the delivery, maintenance, and support of Linux systems and infrastructure.
-
Experience with cloud platforms (AWS), containerization (Docker), and orchestration (Kubernetes).
-
Familiarity with observability tools (e.g., Prometheus, Grafana, ELK stack, CloudWatch, Splunk).
-
Familiarity with telemetry and with current monitoring and visualization tools.
-
Expertise in promoting and driving system visibility to aid in the rapid detection and resolution of issues.
-
Computer Science degree or equivalent experience.
Behaviors & Abilities Required:
-
Strong leadership and mentoring abilities, with a track record of developing high-performance engineering teams.
-
Excellent problem-solving, troubleshooting, and diagnostic skills.
-
Ability to work in a cross-functional, collaborative environment.
-
Effective communication skills, with the ability to translate technical concepts to non-technical stakeholders.
About athenahealth
Here’s our vision: To create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
What’s unique about our locations?
From a historic 19th-century arsenal to a converted landmark power plant, all of athenahealth’s offices were carefully chosen to represent our innovative spirit and promote the most positive and productive work environment for our teams. Our 10 offices across the United States and India, plus numerous remote employees, all work together to modernize the healthcare experience.
Our company culture might be our best feature.
We don't take ourselves too seriously. But our work? That’s another story. athenahealth develops and implements products and services that support US healthcare: It’s our chance to create healthier futures for ourselves, for our family and friends, for everyone.
Our vibrant and talented employees — or athenistas, as we call ourselves — spark the innovation and passion needed to accomplish our goal. We continue to expand our workforce with amazing people who bring diverse backgrounds, experiences, and perspectives at every level, and foster an environment where every athenista feels comfortable bringing their best selves to work.
Our size makes a difference, too: We are small enough that your individual contributions will stand out — but large enough to grow your career with our resources and established business stability.
Giving back is integral to our culture. Our athenaGives platform strives to support food security, expand access to high-quality healthcare for all, and support STEM education to develop providers and technologists who will provide access to high-quality healthcare for all in the future. As part of the evolution of athenahealth’s Corporate Social Responsibility (CSR) program, we’ve selected nonprofit partners that align with our purpose and let us foster long-term partnerships for charitable giving, employee volunteerism, insight sharing, collaboration, and cross-team engagement.
What can we do for you?
Along with health and financial benefits, athenistas enjoy perks specific to each location, including commuter support, employee assistance programs, tuition assistance, employee resource groups, and collaborative workspaces — some offices even welcome dogs.
In addition to our traditional benefits and perks, we sponsor events throughout the year, including book clubs, external speakers, and hackathons. And we provide athenistas with a company culture based on learning, the support of an engaged team, and an inclusive environment where all employees are valued.
We also encourage a better work-life balance for athenistas with our flexibility. While we know in-office collaboration is critical to our vision, we recognize that not all work needs to be done within an office environment, full-time. With consistent communication and digital collaboration tools, athenahealth enables employees to find a balance that feels fulfilling and productive for each individual situation.
What We Do
athenahealth strives to cure complexity and simplify the practice of healthcare. Our innovative technology includes electronic health records, revenue cycle management, and patient engagement solutions that help healthcare providers, administrators, and practices eliminate friction for patients while getting paid efficiently. athenahealth partners with practices with purpose-built software backed by expertise to produce the insights needed to drive better clinical and financial outcomes. We’re inspired by our vision to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.
For more information, please visit www.athenahealth.com
Why Work With Us
We are here to make an impact on the healthcare industry at scale. We enable our diverse teams to move fast, grapple with interesting technical challenges, and innovate at every level. We are on a modernization journey and are building on the hybrid cloud. We deliver best-in-class solutions to help every patient receive the best possible care.