Software Support - Onsite, Lyndhurst, NJ
• GPFS
- Experience monitoring GPFS, including quota and pool sizes, and managing them effectively
- Debug GPFS data and metadata corruption issues and GPFS performance issues
• LSF
- Manage issues with NHC (node health check) monitoring from LSF
- Experience with LSF configuration, adding and removing nodes, and managing downtimes/reservations in LSF
• Puppet/Foreman
- Experience writing Puppet code and Hiera, and running it effectively on hundreds of nodes
- Manage deployments with Foreman when necessary; experience managing PXE, DHCP, DNS, kickstart scripts, and postscripts
• Icinga/Nagios
- Experience managing Icinga, setting up alerts and downtimes
• Scripting
- Experience writing Bash and Python scripts that work with REST APIs
• User support
- Experience helping users with login issues, managing LDAP/AD accounts, building software, and debugging software issues
• L3 Linux support
- Monitor logs and debug OS issues with respect to performance and best practices
• SLA
- Less than 10 minutes to respond to software alerts
- Less than 1 day to resolve the issue (depending on severity and complexity)
- Availability on weekends for Sev1 issues
Nice to have
• Experience with other scheduling / parallel filesystem technologies
• Scientific computing experience
• Weka experience
What We Do
RCC is a full-service IT firm that provides solutions for the development and support of IT infrastructure. We serve a broad range of clients, from large, complex enterprises to small and mid-sized companies. RCC is committed to developing IT solutions that enable our clients to achieve their business goals. With over two decades of IT experience, RCC has the skill set to support all of your IT needs.