Collaboration Reliability Engineering Lead

Posted 16 Days Ago
Seattle, WA
135K-150K Annually
Mid level
Internet of Things
The Role
As the Collaboration Reliability Engineering Lead, you will manage a global team, oversee system architecture maintenance, drive incident response efforts, and enhance system reliability through automation and performance improvements. This role requires collaboration with multiple teams and the implementation of monitoring tools and strategies to ensure the reliability of advanced collaboration technologies.
Summary Generated by Built In
WHO WE ARE:

EOS IT Solutions is a Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world’s largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture. Customer satisfaction and commitment to superior quality of service are our top business priorities, along with investing in and supporting our partners and employees.
We are a truly international IT provider and are proud to deliver our services through global simplicity with trusted transparency.

WHAT YOU’LL DO:

We are seeking an experienced and technically proficient Collaboration Reliability Engineering Lead to join our team. In this role, you will support advanced collaboration technologies in a fast-paced and industry-leading environment. The ideal candidate is a highly motivated technical enthusiast with a strong foundation in IT, operations, networking, scripting, and collaboration technologies, and a passion for continuous learning.

TEAM LEADERSHIP:

    • Lead, mentor, and manage a global team of 8-12 reliability engineers.
    • Foster ownership, accountability, and collaboration within the team.
    • Develop team members' technical and professional skills through coaching and performance reviews.

SYSTEM RELIABILITY AND PERFORMANCE:

    • Oversee maintenance of highly available and scalable architecture, including but not limited to Cisco server templates, endpoints, and edge and proxy appliances.
    • Develop, present, and achieve service-level objectives (SLOs), service-level agreements (SLAs), and key performance indicators (KPIs).
    • Perform quality assurance on video conferencing infrastructure, calendar tooling, touch panel hardware, automation bots, Cisco endpoints, and call center tooling.

INCIDENT MANAGEMENT AND RESOLUTION:

    • Drive incident response, root cause analysis, and post-mortem processes to identify and address reliability issues impacting users.
    • Implement proactive monitoring, alerting, and automation to minimize downtime and improve recovery times in live production environments.
    • Serve as an escalation point for video conferencing infrastructure and network troubleshooting, maintaining up-to-date documentation and on-call runbooks.

RELIABILITY IMPROVEMENTS:

    • Identify opportunities to improve system performance and reduce operational toil.
    • Develop and implement strategies for failure testing and future-capacity planning.

CROSS-FUNCTIONAL COLLABORATION:

    • Work closely with engineering, security, networking, and third-party vendors (e.g., Cisco, Brightsign, Arista, Zoom, Webex) to resolve support cases and critical escalations.
    • Provide highly visible communications to hundreds of users regarding large-scale changes and updates.
    • Advocate for reliability-focused initiatives and communicate their value to stakeholders.

TOOLS AND AUTOMATION:

    • Leverage internal tooling to monitor, analyze, and improve system reliability.
    • Lead efforts to automate repetitive tasks, ensuring efficient system operations.

TECHNICAL REQUIREMENTS:

  • 3+ years of experience in Reliability Engineering or similar roles.
  • Health Monitoring: Experience implementing and coordinating telemetry using monitoring tools like Splunk, Grafana, and Prometheus, or similar technologies.
  • VMware Expertise: Hands-on experience with VMware from a VM deployment, lifecycle, and API/CLI perspective.
  • ITIL Knowledge: Understanding of ITIL processes, service management principles, and IT service delivery best practices.
  • Automation: Experience as an automation advocate with a history of removing operational toil via software.
  • Production Support: Experience supporting internet-facing production services and distributed systems, including deployments, on-call rotations, and incident management.

TECHNICAL SKILLS:

  • Familiarity with Bash, Python, Terraform, and REST APIs.
  • Fundamental understanding of networking protocols (e.g., HTTP, TCP/IP, WebRTC, SIP).
  • Understanding of infrastructure components (e.g., load balancers, firewalls, DNS).

ADDITIONAL KEY PRIORITIES:

  • Expertise in disaster recovery and future-capacity planning.
  • Excellent communication and interpersonal skills, with the ability to work effectively in a team-oriented environment.
  • Self-motivated and eager to learn new technologies, tools, and methodologies.

  • Experience with collaboration hardware, platforms (e.g., Zoom, Microsoft Teams, Webex), or media delivery networks.

The EOS pay range for this job is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) location, responsibilities of the job, experience, education, knowledge, skills, and abilities, as well as internal equity, market data, or other laws.


EOS is committed to creating a diverse and inclusive work environment and is proud to be an equal opportunity employer. We invite you to consider opportunities at EOS regardless of your gender; gender identity; gender reassignment; age; religious or similar philosophical belief; race; national origin; political opinion; sexual orientation; disability; marital or civil partnership status or other non-merit factor. 

Pay Range

$135,000-$150,000 USD

Top Skills

Bash
Python
Terraform
The Company
HQ: Banbridge
390 Employees
On-site Workplace

What We Do

EOS IT Solutions is a family-run Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world’s largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture.

As a top-tier partner with Cisco, Juniper, Dell, Arista, HP, Pure Storage, Palo Alto, and many others, we have grown to become a market leader in IT Supply chain, AV Installation and International Deployment. We also continue to expand our service offerings into Security, Data Centre, and Enterprise Networking. Our Managed Services simplify complex processes and improve productivity across all business functions under a single purchase order.
