Lead Site Reliability Engineer

Posted 12 Days Ago
Hiring Remotely in MO
Remote
99K-183K Annually
Senior level
Healthtech
The Role
The Lead Site Reliability Engineer is responsible for managing and maintaining platform infrastructure performance, reliability, and security by utilizing SRE practices. They design Kubernetes clusters, implement Infrastructure as Code, manage container orchestration, and ensure compliance and security. Responsibilities also include monitoring, performance optimization, and mentoring junior team members.

You could be the one who changes everything for our 28 million members by using technology to improve health outcomes around the world.  As a diversified, national organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility.
 

Position Purpose:

Helps lead projects focused on managing and maintaining optimum platform infrastructure performance, reliability, and security using SRE practices, observability tools, manual and automated procedures, documentation, people and processes, and continuous integration and delivery (CI/CD) tools, processes, and designs. Develops complex services to automate monitoring activities and provide critical information to facilitate response to and resolution of performance and availability issues and incidents. Understands and advocates for standardized and scalable software tools to ensure that systems operate without interruption at optimum performance, and leads project teams throughout the deployment process. Troubleshoots and analyzes service disruptions to determine the root cause of issues and develops solutions for improved reliability.

  • Design and Architecture: Designing Kubernetes clusters tailored to the specific requirements of applications and workloads, ensuring scalability, resilience, and security.
  • Cluster Provisioning: Deploying Kubernetes clusters across various environments, including on-premises data centers and public clouds (such as AWS and Azure).
  • Infrastructure as Code (IaC): Implementing Infrastructure as Code practices using tools like Terraform, Ansible, or Kubernetes-specific tools like Helm to automate cluster provisioning and configuration.
  • Cluster Configuration: Configuring Kubernetes clusters according to best practices, including networking, storage, security policies, and resource allocation.
  • Container Orchestration: Managing containerized applications and workloads using Kubernetes primitives like Pods, Deployments, StatefulSets, Services, and DaemonSets.
  • CI/CD Integration: Integrating Kubernetes with Continuous Integration/Continuous Deployment (CI/CD) pipelines for automated application deployment, testing, and rollout, using tools such as GitLab and Terraform.
  • Monitoring and Logging: Implementing monitoring and logging solutions (e.g., Dynatrace, Splunk) to gain insights into cluster health, performance, and application behavior.
  • Security and Compliance: Ensuring Kubernetes clusters adhere to security best practices, implementing RBAC (Role-Based Access Control), network policies, and encryption mechanisms to protect sensitive data.
  • Scaling and Performance Optimization: Optimizing cluster performance and scalability by fine-tuning resource allocation, scheduling policies, and horizontal/vertical autoscaling configurations.
  • Disaster Recovery and High Availability: Designing and implementing disaster recovery plans and high availability strategies to minimize downtime and ensure business continuity.
  • Troubleshooting, Debugging and Root Cause Analysis: Diagnosing and resolving issues related to cluster performance, application deployment, networking, and security.
  • Knowledge Sharing and Mentoring: Sharing knowledge and best practices with junior team members, conducting training sessions, and providing mentorship to help grow the team's expertise in Kubernetes and related technologies.
  • Stay Updated: Keeping abreast of the latest developments in the Kubernetes ecosystem, attending conferences, and participating in forums.
  • Collaboration: Collaborating with cross-functional teams including developers, DevOps engineers, system administrators, and security professionals to ensure smooth operation of Kubernetes clusters and applications.
  • Documentation: Maintaining detailed documentation of cluster architecture, configurations, and operational procedures to facilitate knowledge transfer and troubleshooting.
  • Patching: Maintaining deployed systems with the latest releases and updating infrastructure to comply with audit regulations.
  • Performs other duties as assigned.
  • Complies with all policies and standards.

Education/Experience:

A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science).
Requires 5 – 7 years of related experience.
Or equivalent experience acquired through accomplishments of applicable knowledge, duties, scope and skill reflective of the level of this position.

Technical Skills:

  • Experience with Kubernetes: Rancher, RKE1/2, EKS, AKS, Nginx/Ingress Controllers, Calico/CNI, Security Scanning Tools
  • Experience with Linux Operating Systems: Ubuntu, Amazon Linux 2
  • Experience with Observability/Monitoring: Splunk, Dynatrace, Prometheus
  • Experience with CI/CD: Ansible, GitLab, Terraform, Helm
  • Experience with Public Cloud: AWS and Azure
  • Experience with Programming Tools: Python, API/REST, Bash
  • Experience with CLI Tools: kubectl, Docker, nerdctl, Git, AWS CLI
  • Experience with Code Repositories: auto deployments, branching with tools such as GitLab, Artifactory
  • Experience with IT Service Management Tools: ServiceNow, Atlassian tools (Jira, Confluence)

Soft Skills:

  • Intermediate - Seeks to acquire knowledge in area of specialty
  • Intermediate - Ability to identify basic problems and procedural irregularities, collect data, establish facts, and draw valid conclusions
  • Intermediate - Ability to work independently
  • Intermediate - Demonstrated analytical skills
  • Intermediate - Demonstrated project management skills
  • Intermediate - Demonstrates a high level of accuracy, even under pressure
  • Intermediate - Demonstrates excellent judgment and decision making skills
  • Intermediate - Ability to communicate and make recommendations to upper management
  • Intermediate - Ability to drive multiple projects to successful completion
  • Intermediate - Possesses technical aptitude

Pay Range: $98,900.00 - $183,100.00 per year

Centene offers a comprehensive benefits package including: competitive pay, health insurance, 401K and stock purchase plans, tuition reimbursement, paid time off plus holidays, and a flexible approach to work with remote, hybrid, field or office work schedules.  Actual pay will be adjusted based on an individual's skills, experience, education, and other job-related factors permitted by law.  Total compensation may also include additional forms of incentives.

Centene is an equal opportunity employer that is committed to diversity, and values the ways in which we are different. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or other characteristic protected by applicable law.

Qualified applicants with arrest or conviction records will be considered in accordance with the LA County Ordinance and the California Fair Chance Act.

Top Skills

Kubernetes
The Company
Columbus, GA
19,002 Employees
On-site Workplace
Year Founded: 1984

What We Do

Centene provides healthcare solutions to individuals across the United States with more than 23 million members nationwide.
