In September, a project launch was nearly derailed. It went ahead on schedule and relatively smoothly only because employees stepped in to work extra hours over evenings and weekends.
Individual incidents will inevitably happen, and one skill of a good project manager is knowing what resources are available and necessary to remediate them. As many others have written (see resources below), the goal of a blameless post-mortem is to move from putting out fires to preventing them. Ideally, blameless post-mortems should be embraced by your entire organization, but my goal is to make this process accessible to employees at any level. Taking leadership does not require holding a management position.
As we worked on resolving the incident, I documented each step involved in the resolution. Afterwards, I cleaned up and generalized that documentation into a set of procedures to help prevent incidents of that class from happening in the future.
Documentation is essential to sharing knowledge, but lists of documented procedures won’t help if your employees can’t identify which procedure applies to their situation, or if the details of the procedures are overly complicated by special cases.
Your procedures repository should:
- Be easily searchable;
- Contain clear, concise summaries not obfuscated by terms or concepts that most of your employees wouldn’t know;
- Never assume that your employees know everything that they “should” know; and
- Link to other documents (or other sections) that explore cases or issues in depth.
Ideally, the employees involved should document the incident. GitHub offers a fantastic post-mortem template for gathering basic information, with guidance on how to do so in a blameless fashion.
Employees involved in an incident already feel some degree of responsibility and shame. The best course of action is to encourage them to improve documentation and procedures, and to show appreciation when they do, acknowledging that they are in the best position to help your organization avoid incidents in the future.
I highly recommend using “pre-mortems,” as explained by Astro Teller, to explore the question “What might go wrong?” Just as your code should anticipate that users may not follow the exact path you expect them to follow, your procedures should encourage proactive anticipation of potential problems.
Knowledge-sharing is one of the most effective tools for developing capable employees, as long as they have access to a knowledge repository that follows guidelines similar to those I suggested above. When our team answers questions, either from customers or internally, we document the answers on our team wiki. We encourage people to upload those answers as rough drafts rather than highly edited pieces, to keep contributing easy. As the team develops our repository, these answers are refined, linked together, and edited. With the help of our tech writers, in just a few short weeks, our repository grew from a few simple notes into dozens of fantastic, well-organized, labeled, and refined articles, and a resource that our employees rely on.
Astro Teller from Google’s X writes about encouraging open conversations about failure, pre-mortems exploring what could go wrong, and celebrating “killing a project” as success rather than failure.
Many thanks go to our tech writers for developing our Confluence documentation structure into the valuable resource that it’s become!
(reposted from LinkedIn)