Company Description
We are One Sutherland — a global team where everyone is working together to create great breakthrough solutions. Our workforce has thrived in an environment of diversity of thought, experience and background. We celebrate our diversity and embrace it whole-heartedly. Sutherland is an equal opportunity employer. We promote a positive work environment by conducting ourselves professionally and helping each other achieve our goal of One Sutherland Team, Playing to Win.
Sutherland was founded 38 years ago (1986). Since then, we have become a leading global provider of business process and technology management services offering an integrated portfolio of analytics-driven back office and customer-facing solutions that support the entire customer life cycle.
Job Description
Sutherland is seeking an Application and System Monitoring Engineer to take our existing CloudOps monitoring to the next level.
In this position, you will work with a multitude of modern tools and technologies to build the next generation of our monitoring system and to troubleshoot and resolve issues in our development, test, and production environments.
The ideal candidate can work in a dynamic and complex software build environment and is an energetic self-starter with a passion to build, innovate, and achieve excellence.
Subject matter expertise:
- Experience implementing predictive and detailed monitoring.
- Expert in Linux Command line.
- Design, architect and implement secure and highly available monitoring infrastructure.
- Enhanced monitoring capabilities, including:
  - Auto detection of brute force attacks in logs.
  - Detection of password attacks in logs.
- Implement a next-gen predictive monitoring solution to:
  - Detect and alert on capacity utilization of compute resources.
  - Detect and alert on any network-related issues and choke points.
- Ability to design, implement, and improve monitoring built on Grafana, Prometheus, Loki, Promtail, and Node Exporter.
- Log parsing and management.
- Configuration of alerting, push notifications to VictorOps (now Splunk On-Call), and email notifications.
- Architect, design, and implement Icinga 2 monitoring and alerting.
- Ability to monitor system metrics and log parsing.
- Ability to automate tasks using bash and / or Python scripting.
- Predictive monitoring of systems and applications.
- Familiarity with JVM internals and the use of JMX and REST for monitoring.
- Familiarity with AWS infrastructure.
- Deep understanding of Java applications, TLS, Apache.
- Automated checks of performance of system metrics in Grafana.
- Automated checks of performance of Web Applications.
- Problem-solving and troubleshooting, including performing root cause analysis to design preventative activities.
- Crafting and maintaining dashboards and reports, pulling together monitoring data across multiple platforms within the same tool as well as across multiple tools.
- Assisting with writing scripts and queries that can provide environment self-healing capabilities.
- Written, verbal, interpersonal, and presentation skills.
- Communications among technical and non-technical employees.
- A customer driven approach and good customer management skills.
- Staying abreast of the latest monitoring technology and trends.
- Adhering to configuration, release, and change management protocols.
Qualifications
- Experience with using monitoring tools in a production environment.
- 5+ years of production cloud operations experience.
- 5+ years expertise in Linux command line.
- 5+ years of using Terraform in AWS for automation. Hands-on with automation and proactive about finding opportunities to automate manual processes.
- 5+ years of strong, hands-on experience building production services in AWS. (Must Have)
- 4+ years of experience with scripting using Python and Bash.
- Ability to participate in an on-call rotation.
- Considerable knowledge of IT equipment and diagnostic tools.
- Considerable knowledge of principles and techniques of systems analysis, design, development and programming.
- Considerable knowledge of principles of information systems.
- Considerable knowledge of capabilities of computer technology.
- Knowledge of methods and procedures used to conduct detailed analysis and design of computer systems.
- Knowledge of practices and issues of system security and disaster recovery.
- Knowledge of computer operating systems.
- Considerable problem solving skills.
- Considerable logic and analytical skills.
- Considerable oral and written communication skills; interpersonal skills; considerable ability to analyze, troubleshoot and resolve data communications problems.
- Considerable ability to prepare manuals, reports, documentation and other written materials; considerable ability to identify, analyze and resolve complex business and technical problems.
Additional Information
This is a hybrid position in Bogotá, Colombia. Enjoy the benefits of joining a Great Place to Work company and working for the world's biggest technology companies.
What We Do
We make digital human™ by combining human-centered design with real-time Analytics, AI, Cognitive Technology & Automation to create exceptionally engineered Brand Experiences!
Sutherland is an experience-led digital transformation company. Our mission is to deliver exceptionally engineered experiences for customers and employees today, that continue to delight tomorrow.
For over 35 years, we have cared for our customers’ customers, delivering measurable results and accelerating growth. Our proprietary, AI-based products and platforms are built using robust IP and automation.
We are a team of global professionals, operationally effective, culturally meshed, and committed to our clients and to one another.
We call it One Sutherland. #MakeDigitalHuman