r9y-map


Project maintained by r9y-dev

Blameless Postmortems

Definition:

A blameless postmortem is a meeting or process held after an incident or outage to analyze what happened and why, and to identify ways to prevent similar incidents. It is conducted in a non-punitive environment, where the goal is to learn from mistakes and improve processes rather than to assign blame to individuals.

Key Principles:

Benefits:

Examples:

References:

Tools:

Resources:

Additional Tips:

Related Terms:

Other Related Terms:

These terms are all related to the field of Site Reliability Engineering (SRE), which is the practice of applying software engineering principles to the operation of large-scale distributed systems. SREs are responsible for ensuring that these systems are reliable, scalable, and efficient.

Prerequisites

Before you can conduct Blameless Postmortems, you need to have the following in place:

In addition to the above, you also need to have the following in place:

Once you have all of these things in place, you can begin conducting Blameless Postmortems.

What’s next?

Once you are conducting Blameless Postmortems, the next steps are to:

  1. Implement the action plan: The action plan should outline the steps needed to address the root causes of the incident and to prevent similar incidents. This may involve changes to processes, procedures, or technology.
  2. Monitor the effectiveness of the action plan: Once the action plan has been implemented, track whether it actually prevents similar incidents, using metrics such as the number of incidents, their severity, and the mean time to repair (MTTR).
  3. Adjust the action plan as needed: If the action plan is not effective, revise it by adding new steps, modifying existing steps, or removing steps that do not help.
  4. Continuously improve the Blameless Postmortem process: Treat the process itself as something to refine over time, whether by changing how it is run or by adopting new tools and techniques.
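The monitoring step can be made concrete with a small script. The following is a minimal sketch, not a prescribed implementation: the `Incident` record, its field names, the severity convention, and the `summarize` helper are all hypothetical, and MTTR is assumed here to mean the average time from detection to resolution. It compares incident count, severity breakdown, and MTTR for the periods before and after an action plan was completed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    detected_at: datetime
    resolved_at: datetime
    severity: int  # assumed convention: 1 is the most severe


def summarize(incidents, start, end):
    """Return count, per-severity counts, and MTTR for incidents detected in [start, end)."""
    window = [i for i in incidents if start <= i.detected_at < end]

    by_severity = {}
    for incident in window:
        by_severity[incident.severity] = by_severity.get(incident.severity, 0) + 1

    # MTTR is taken as the mean of (resolved - detected); None if there were no incidents.
    repair_seconds = [(i.resolved_at - i.detected_at).total_seconds() for i in window]
    mttr = timedelta(seconds=mean(repair_seconds)) if repair_seconds else None

    return {"count": len(window), "by_severity": by_severity, "mttr": mttr}


# Made-up sample data, purely for illustration.
incidents = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 12, 0), severity=1),
    Incident(datetime(2024, 1, 20, 9, 0), datetime(2024, 1, 20, 10, 0), severity=2),
    Incident(datetime(2024, 2, 10, 14, 0), datetime(2024, 2, 10, 14, 30), severity=2),
]

# Hypothetical date on which the action plan was fully implemented.
plan_completed = datetime(2024, 2, 1)

before = summarize(incidents, datetime(2024, 1, 1), plan_completed)
after = summarize(incidents, plan_completed, datetime(2024, 3, 1))

print("before:", before)
print("after: ", after)
```

With so few incidents, a difference between the two windows says little on its own; in practice the comparison is more useful over longer periods and alongside the qualitative findings from each postmortem.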

In addition to the above, you should also:

By following these steps, you can ensure that Blameless Postmortems are used to their full potential to improve the reliability and availability of your systems.