Title: Incident Manager
Location: San Jose, CA
Duration: 12 Months
This position reports to the Senior Manager of Service Level Management, Cloud Operations. The individual in this position will use Service Level Management (SLM) frameworks to support ongoing incidents and drive the long-term remediation of their root causes. The Incident Manager will be responsible for maintaining detailed records of all incidents, capturing root causes, and ensuring problem resolution. The purpose of the Incident Management process is to facilitate service restoration and prevent recurrence so that we meet our high standards of service availability and performance. Success will be measured by improvement in key performance indicators for service availability and performance, and by reductions in time to react, time to recover, and the number of open or unresolved known problems.
What you'll do
Report on achieved service levels and compare them with agreed service level targets.
Capture, triage, and escalate incidents to technical teams as necessary.
Direct and manage technical resolution calls with members from various teams.
Communicate progress and resolution messages to appropriate stakeholders.
Serve on call for after-hours incidents.
Provide incident reporting, including detailed description of incident from detection through problem resolution.
Provide service availability and performance metrics to support reliable reporting.
Incident resolution and problem management: conduct Root Cause Analysis (RCA), capture action items, and conduct remediation follow-up meetings.
Track process efficacy using established Key Performance Indicators (KPIs).
Collaborate with team members to improve the incident management process and establish new KPIs as needed.
Work with an international NOC team to identify and remediate issues with the potential to disrupt service availability or performance.
Ensure the accuracy of Quality of Service and availability metrics.
What you need to succeed
5+ years' experience supporting highly available SaaS products in an incident management role.
5+ years' experience working with and improving processes within a standard ITSM framework such as ITIL.
Hands-on experience with enterprise-level helpdesk software.
Demonstrable ability to communicate clearly and effectively.
Ability to direct and manage incident responders and incident response processes.
Ability to effectively manage client & staff relationships, promptly respond to queries, ensure promises are kept, and manage expectations.
Advanced planning and organizational experience within fast-paced, dynamic business environments.
BS/BA required – business or computer science discipline.
Successful Candidate Qualities:
Team player – willing to take calculated risks
Familiar and comfortable working in a 24x7, high-availability service delivery environment
Highly supportive of the business and of its ideals and strategies
No particular bias towards specific technologies, vendors, or products
Effective at driving short-term results that are consistent with long-term goals
Skilled at operating in a matrix environment