Top tips for tip top post-incident reviews

One of the most important things to do after an incident is resolved is to reflect on what happened and what could be improved for next time. You might call it a post-mortem; I call it an RCA (root cause analysis).

The purpose of this exercise is to identify and make improvements to your systems and processes. We’re looking to continuously improve, people!

Below, I’ve outlined several things which are critical in getting the maximum value from your RCAs.

Timing
Set a maximum amount of time within which an RCA should take place and make sure it happens. For me? Three working days, max. The longer you leave it, the fuzzier the detail.

This also helps ensure that customer-facing copies of your review are available in a timely manner.
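
If you want to take the guesswork out of the deadline, here is a minimal Python sketch of the "three working days" rule. It assumes working days are Monday to Friday and ignores public holidays; the function name `rca_deadline` is made up for illustration.

```python
from datetime import date, timedelta


def rca_deadline(resolved_on: date, working_days: int = 3) -> date:
    """Return the date `working_days` working days after `resolved_on`.

    Assumption: working days are Monday to Friday; public holidays are ignored.
    """
    current = resolved_on
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


# An incident resolved on Thursday 2 May 2024 needs its RCA by Tuesday 7 May 2024.
print(rca_deadline(date(2024, 5, 2)))
```

An incident resolved late in the week rolls over the weekend, which is exactly when an RCA is most likely to slip.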

Post-its and collaboration
An RCA session should be as inclusive as possible. Involve stakeholders, not just technical staff! I’d recommend having at least one member of each commercial team present. I mean, who has a better idea of the impact the incident has had on customers?
This will also help bridge gaps between teams, giving each a better understanding of the other’s pain points.

Documentation
Document everything. ITIL defines the basics you should cover, but make sure you are capturing all of the information that your business needs. Record information that’ll be useful to you later, and document the root cause so you can identify trends and patterns. I like to use Confluence to document incidents, as it has a neat way to view all of my tasks and associated JIRA tickets in one place.
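
To show why a consistently documented root cause pays off, here is a rough Python sketch that counts how often each root cause turns up. The record layout, the `root_cause_trends` name and the sample incidents are all hypothetical; in practice the records would come from wherever you keep your RCA documents.

```python
from collections import Counter


def root_cause_trends(incidents):
    """Count how often each root cause appears, most frequent first."""
    return Counter(incident["root_cause"] for incident in incidents).most_common()


# Hypothetical records, made up purely for illustration.
example_incidents = [
    {"id": "INC-1", "root_cause": "expired certificate"},
    {"id": "INC-2", "root_cause": "untested deployment"},
    {"id": "INC-3", "root_cause": "expired certificate"},
]

for cause, count in root_cause_trends(example_incidents):
    print(f"{cause}: {count}")
```

This only works if root causes are recorded with the same wording each time, which is another argument for agreeing on a fixed list of categories.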

Root cause
Have a framework for identifying a root cause and stick to it. I’ve had great success with the Five Whys method, but you should find something that gets results for you and stay consistent with it. Mindtools.com has a good list of methods for reaching a root cause.

Actions
The outcome of the RCA session should include actions you can take to stop the same thing happening again. They should be SMART (specific, measurable, achievable, relevant and time-bound). Why? Because otherwise there’s very little chance the tasks will be completed.
Make sure each action has an owner, a due date and a work ticket, and that all of this is documented.
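
As a quick sanity check before you close the session, something like the following Python sketch can flag actions that are missing an owner, a due date or a ticket. The `ActionItem` class, its field names and the ticket key are assumptions for illustration, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    """One follow-up action from an RCA session (hypothetical structure)."""
    description: str
    owner: Optional[str] = None
    due_date: Optional[date] = None
    ticket: Optional[str] = None


def missing_fields(action: ActionItem) -> List[str]:
    """Return the names of the required fields that are still empty."""
    missing = []
    if not action.owner:
        missing.append("owner")
    if action.due_date is None:
        missing.append("due_date")
    if not action.ticket:
        missing.append("ticket")
    return missing


# Hypothetical example: an action with an owner but no due date or ticket yet.
draft = ActionItem(description="Add certificate expiry alerting", owner="Sam")
print(missing_fields(draft))  # ['due_date', 'ticket']
```

An action that comes back with an empty list is ready to be tracked; anything else goes back to the group before the session ends.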

No blame
This exercise only really delivers value if there is a no-blame culture. Participants need to feel comfortable being honest, and that will not happen if they think blame might be pinned on them.

We win as a team and we fail as a team. Stay constructive and document the changes you need to make to stop this happening again, as a team.

In summary, use your RCA sessions to document what happened and talk about how to improve your processes and systems. Get your framework in place and stick to it. Ensure that there is a no-blame culture and keep your actions SMART. If you’ve got all of that, you’re well on your way to a lean, mean incident management machine!