Major Incident & Problem Manager

The Major Incident & Problem Manager will play a pivotal role in coordinating the response to Major Incidents and managing the lifecycle of Problems from initiation through to resolution. You’ll be responsible for managing the Major Incident and Problem Management processes, including related reporting, governance, training, and continual improvement. You’ll build relationships and work with colleagues across teams to drive collaborative service restoration and root cause analysis, reducing the impact of Incidents and the risk of recurrence, and enhancing the stability of our systems and environments.

You’ll be joining a growing Corporate IT department at an exciting time. If you are detail-oriented and have a passion for excellent service delivery, we encourage you to apply for this opportunity.

Responsibilities

Major Incident Management:

  • Manage Major Incidents from identification to service restoration
  • Collaborate with a variety of teams and third parties to ensure service is restored at the earliest opportunity and within SLA
  • Complete post-Major Incident activities, including production of a Post Major Incident Report and identification of lessons learnt
  • Maintain an accurate record of system availability, linked to outages or degradation to service identified as Major Incidents

Problem Management:

  • Raise reactive Problems promptly after the resolution of every Major Incident
  • Proactively identify potential Problems through trend analysis and monitoring system performance
  • Coordinate root cause analysis for all Problems, collaborating with a variety of teams and third parties to get to root cause within defined SLAs
  • Complete post-Problem resolution activities, including production of a Problem Closure Report and identification of lessons learnt
  • Maintain the Known Error database

Process Design:

  • Establish, document, and communicate the Major Incident and Problem Management processes and strategy across the organisation
  • Conduct yearly process reviews, obtain feedback from others on how the processes work, and implement improvements when required
  • Provide guidance, training, and mentorship to colleagues involved in Major Incident and Problem Management activities
  • Use existing toolsets to implement processes where possible

Integration:

  • Work with the Change Management function to enable appropriate discussions on Changes that are the cause of, or may prevent, Major Incidents and Problems
  • Work with the Service Transition function to document any Incident needs, or Known Errors, in advance of go-live of a new system
  • Work with the Asset Management function to determine any dependencies that could contribute to, or be affected by, a Major Incident or Problem
  • Work with the Desktop Support Team to gather any trends in user behaviour, or other items that may have been highlighted during Major Incident or Problem Management activities
  • Work with the Security Team to assist with the response to Security-related Major Incidents or Problem investigations
  • Adhere to other Service Management policies and processes relevant to the role

Communication:

  • Maintain timely communications regarding the status of all Major Incidents and Problem investigations to IT and Business stakeholders as required
  • Communicate any changes to the Major Incident or Problem Management processes in advance of go-live, and obtain buy-in and approval from IT and Business stakeholders as needed
  • Communicate any resourcing needs for related activities to the relevant Team Managers

Documentation:

  • Create a central repository for recording all documents, stakeholder decisions, meeting minutes, and the progress of all Major Incident and Problem Management activities
  • Oversee the creation and review of documentation relating to Major Incident and Problem Management, including standard operating procedures for workarounds and self-help guidance for end users

Risk Management:

  • Identify and document operational risks that are apparent during Major Incident response or Problem investigations
  • Contribute to effective risk management strategies and conversations to minimise the risk of disruption and ensure service continuity

Requirements

  • Minimum 3 years’ experience in a Major Incident or Problem Manager role, preferably in a large, multi-client organisation
  • Proven track record of managing Major Incidents and Problem investigations while delivering excellent customer experience
  • In-depth understanding of ITIL Service Management processes and best practices, preferably certified in ITIL Foundation
  • Excellent communication (written and verbal) and interpersonal skills, and ability to collaborate effectively across different regions and cultures
  • Ability to work well under pressure
  • Ability to think logically, analyse situations, and solve problems
  • Ability to effectively challenge when appropriate
  • Ability to champion best practice and influence others
  • Jira experience is desirable, but not essential

Lunik employees enjoy:

  • Hybrid working (3 days at home and 2 in the office)
  • Corporate pension plan
  • Free health insurance for the whole family
  • Free English and Spanish language classes
  • Free psychotherapy sessions
  • Gym membership subsidy
  • Free life insurance
  • Flexible remuneration
  • Fun socials – from weekly happy hour drinks to big seasonal events

Job Category: Corporate IT
Job Type: Full Time
Job Location: Malaga, Madrid
