Cloud Operations Lead Monitoring & AI Ops Engineer

Bangalore, Bengaluru, Karnataka
Senior level
Machine Learning • Cybersecurity
The Role
Lead the strategy and management of global network monitoring and AI Ops tools, ensuring cloud infrastructure reliability and performance.

Job Title:

Cloud Operations Lead Monitoring & AI Ops Engineer

About Skyhigh Security:

Skyhigh Security is a dynamic, fast-paced cloud company that is a leader in the security industry. Our mission is to protect the world’s data, and because of this, we live and breathe security. We value learning at our core, underpinned by openness and transparency.

Since 2011, organizations have trusted us to provide them with a complete, market-leading security platform built on a modern cloud stack. Our industry-leading suite of products radically simplifies data security through easy-to-use, cloud-based, Zero Trust solutions that are managed in a single dashboard, powered by hundreds of employees across the world. With offices in Santa Clara, Aylesbury, Paderborn, Bengaluru, Sydney, Tokyo and more, our employees are the heart and soul of our company. 

Skyhigh Security is more than a company; when you invest your career with us, we commit to investing in you. We embrace a hybrid work model, creating the flexibility and freedom you need from your work environment to reach your potential. From our employee recognition program to our 'Blast Talks' learning series and team celebrations (we love to have fun!), we strive to be an interactive and engaging place where you can be your authentic self.

Follow us on LinkedIn and Twitter @SkyhighSecurity.

Role Overview:

The Cloud Operations Lead Monitoring & AI Ops Engineer at Skyhigh Security will be responsible for leading the strategy, implementation, and management of global network monitoring tools and AI Ops solutions. This role involves ensuring the reliability, performance, and security of our cloud infrastructure through proactive monitoring, automation, and advanced analytics. The successful candidate will collaborate with engineering, operations, and security teams to enhance observability and incident response capabilities. This position is part of a fast-growing Global Tech Ops team, playing a key role in scaling and optimizing our cloud operations.

Key Responsibilities:

  • Serve as the technical lead for global monitoring and AI Ops initiatives across the Skyhigh Security product portfolio.
  • Develop and implement strategies for proactive monitoring, anomaly detection, and automated incident resolution.
  • Oversee the deployment and management of monitoring/logging tools such as Prometheus, Grafana, OpenSearch, PagerDuty AI Ops, and Kentik.
  • Ensure comprehensive observability of cloud environments, network performance, and security metrics.
  • Collaborate with engineering teams to integrate monitoring principles into the software development lifecycle (SDLC), making observability an integral part of deployments rather than a post-deployment task.
  • Define and implement best practices for monitoring high-scale cloud environments across AWS, Azure, and OCI.
  • Utilize AI Ops tools to enhance event correlation, root cause analysis, and automated remediation.
  • Analyze monitoring data to identify trends, optimize system performance, and improve alerting mechanisms.
  • Provide guidance on incident response processes and drive continuous improvement in monitoring effectiveness.
  • Maintain documentation for monitoring frameworks, configurations, and operational procedures.
  • Stay updated with industry trends, emerging AI Ops technologies, and best practices in cloud monitoring.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 7+ years of experience in cloud operations with a strong focus on global monitoring and AI Ops tools.
  • Expertise in cloud platforms (AWS, Azure, OCI) and their monitoring services.
  • Deep understanding and hands-on experience with Jira Cloud, Confluence, and Atlassian Service Management.
  • Strong knowledge of monitoring and observability platforms, including Prometheus, Grafana, OpenSearch, PagerDuty AI Ops, and Kentik.
  • Experience designing and implementing AI-driven monitoring solutions for large-scale environments.
  • Proficiency in automation and scripting (e.g., Python, Go, Bash) to enhance monitoring capabilities.
  • Strong analytical and problem-solving skills with the ability to interpret complex monitoring data.
  • Excellent leadership, collaboration, and communication skills.
  • Ability to work in a fast-paced, dynamic environment.

Preferred Qualifications:

  • Relevant certifications (e.g., AWS Certified Solutions Architect, Azure Administrator).
  • Experience with ITIL processes and best practices in incident and problem management.
  • Knowledge of cloud security monitoring and threat detection methodologies.
  • Experience in designing and modernizing monitoring tools for cloud-native and hybrid environments.
  • Understanding of network performance monitoring and optimization strategies.

Company Benefits and Perks:

We work hard to embrace diversity and inclusion and encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.

  • Retirement Plans
  • Medical, Dental and Vision Coverage
  • Paid Time Off
  • Paid Parental Leave
  • Support for Community Involvement

We're serious about our commitment to diversity, which is why we prohibit discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.

Top Skills

AWS
Azure
Bash
Confluence
Go
Grafana
Jira
Kentik
OCI
OpenSearch
PagerDuty
Prometheus
Python

The Company
HQ: Plano, Texas
3,118 Employees
On-site Workplace
Year Founded: 2022

What We Do

Trellix is a global company redefining the future of cybersecurity. The company’s open and native extended detection and response (XDR) platform helps organizations confronted by today’s most advanced threats gain confidence in the protection and resilience of their operations. Trellix’s security experts, along with an extensive partner ecosystem, accelerate technology innovation through machine learning and automation to empower over 40,000 business and government customers.


