Operationalizing Your Crisis Plan

Operationalizing your crisis plan begins with practical changes that ensure you have what you need, in the form you need it, at the time you need it. For example, your broader crisis management plan will be too cumbersome for your team to scan for answers during a crisis. Playbooks, by contrast, are focused versions of the larger plan, which makes them easier to act on, test, and maintain. They're also scenario-driven and give you specific parameters, considerations, and tasks.

Once you have these critical resources created, it can be difficult to centralize them and keep track of the most current version. Make sure your runbooks, playbooks, policies, and any other crisis response documentation are linked from your incident.io service definitions and/or pinned in the relevant Slack channels so responders can find them quickly.

Crisis Classification Scheme

Waking up your Executive Crisis Leadership Team in the middle of the night should be a very rare occurrence. A classification scheme that ranks the actual or anticipated materiality of an event will help you avoid a cry-wolf scenario. At Polygon, we use our standard severity levels (S0-S3) for incidents. For true crises that go beyond a technical incident, a simple additional scale such as Low, Medium, High can be effective.

Remember that not all crises begin as crises. One may develop out of an ongoing incident, so determining your escalation thresholds ahead of time (e.g., chain halted for more than 30 minutes, bridge funds at risk, or regulatory notice received) is as important as the rankings themselves.
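
Writing the thresholds down as data makes them easier to review and test before they are needed. The following is a minimal sketch, not an incident.io feature: the `IncidentState` fields, the function names, and the constant are hypothetical, and the only values used are the example triggers listed above.

```python
from dataclasses import dataclass
from typing import List

# Example threshold from the text above; tune it to your own plan.
CHAIN_HALT_THRESHOLD_MINUTES = 30


@dataclass
class IncidentState:
    """Hypothetical snapshot of an ongoing incident."""
    chain_halted_minutes: float = 0.0
    bridge_funds_at_risk: bool = False
    regulatory_notice_received: bool = False


def escalation_reasons(state: IncidentState) -> List[str]:
    """Return every predefined threshold the incident has crossed."""
    reasons = []
    if state.chain_halted_minutes > CHAIN_HALT_THRESHOLD_MINUTES:
        reasons.append(
            f"chain halted for more than {CHAIN_HALT_THRESHOLD_MINUTES} minutes"
        )
    if state.bridge_funds_at_risk:
        reasons.append("bridge funds at risk")
    if state.regulatory_notice_received:
        reasons.append("regulatory notice received")
    return reasons


def should_escalate(state: IncidentState) -> bool:
    """An incident becomes a crisis candidate once any threshold is crossed."""
    return bool(escalation_reasons(state))
```

For example, `escalation_reasons(IncidentState(chain_halted_minutes=45))` returns a single reason for the chain halt, which the Crisis Team Leader can quote when declaring the crisis.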

Once you've defined your priorities, you can begin to automate parts of your crisis response. incident.io integrates with Slack for creating communications channels, can auto-publish status updates, and can notify stakeholders automatically. Combine this with Datadog for monitoring and alerting, and you have the operational backbone for rapid crisis response.

In a crisis, time savings are everything. Reducing the mean time to respond and reaching the right people quickly are the most critical things your team can do at the onset of a crisis.

Crisis Declaration

Does your crisis response team operate the same way in a crisis as it does in normal business situations? Your answer should be no. Operating in "crisis mode" should feel distinct: actions and decisions are amplified, the tempo is quicker, timely decisions are critical, the problems are more complex, and the risks are higher.

The Crisis Team Leader needs to signal clearly and definitively that the modes of thinking and processing have shifted. An incident.io alert to the crisis response team, a dedicated Slack channel with a clear crisis label, and an explicit verbal declaration in the Google Meet war room all serve this purpose. Declaring the response over is just as important for the transition back to normal, or to new, ways of working: resolve the crisis incident and post a final update to the relevant channels.

Crisis Response Management Operations

If you've followed along so far, you've essentially learned the operational framework for crisis response. During your response, you don't want to worry about how to contact the Crisis Team Leaders or which conference bridge you should be using or where your most up-to-date playbook is located. The operations side of things should just work.

At Polygon, our operational stack for crisis response includes:

  • incident.io — alerting, on-call scheduling, escalation policies, incident lifecycle management
  • Slack — real-time communication, dedicated incident/crisis channels
  • Google Meet — war rooms and conference calls
  • Datadog — monitoring, alerting, dashboards
  • ClickUp — follow-up action tracking

Make sure these are configured and tested before you need them. The middle of a crisis is not the time to be figuring out how to page someone or where the runbook lives.
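
One lightweight way to keep that promise is to record when each tool was last exercised in a drill and flag anything overdue. This is a rough sketch under stated assumptions: the `ToolCheck` record, the `stale_tools` helper, and the 90-day window are hypothetical illustrations, not part of any of the tools listed above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ToolCheck:
    """Hypothetical record of when a crisis tool was last tested."""
    tool: str
    purpose: str
    last_tested: Optional[date]  # None means the tool was never tested


def stale_tools(checks: List[ToolCheck], today: date,
                max_age_days: int = 90) -> List[str]:
    """Return tools never tested, or last tested before the cutoff date.

    The 90-day default is an assumed review cadence, not a recommendation
    from the source text.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [
        c.tool for c in checks
        if c.last_tested is None or c.last_tested < cutoff
    ]
```

Running a check like this on a schedule, and treating any flagged tool as a follow-up action, keeps testing from slipping until the next crisis.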