The first step in any incident response process is to determine what actually constitutes an incident. Incidents are then classified by severity using S0–S3 levels, where lower numbers indicate higher urgency. The severity level determines the response process, communication requirements, and the level of risk you're authorized to take in pursuit of resolution.

Anything classified as S0 or S1 is a major incident and triggers the full incident response process — war room, Incident Commander, Communication Commander, status page updates, and 15-minute update cadence.

Always Assume The Worst

If you are unsure which level an incident is (e.g. not sure whether it is S1 or S0), treat it as the higher severity. The middle of an incident is not the time to litigate severity levels. Assume the worst, respond accordingly, and adjust during the incident or in the postmortem.
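Because lower numbers indicate higher urgency, "assume the worst" reduces to picking the numerically lowest of the candidate levels. A minimal sketch (the function name and label format are illustrative only, not part of any tooling):

```python
# Illustrative helper only: severity labels S0-S3, lower number = more urgent.
def assume_worst(candidates: list[str]) -> str:
    """Given ambiguous candidate levels, return the most severe one."""
    return min(candidates, key=lambda level: int(level[1:]))

assume_worst(["S1", "S0"])  # -> "S0": when unsure between S1 and S0, respond as S0
```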

Can an S2 or S3 trigger full incident response?

Yes. If a lower-severity issue requires coordinated response across multiple teams, the First Responder or Incident Commander can escalate it to the full incident response process regardless of the initial severity classification.
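The escalation rule above can be sketched as a simple predicate (hypothetical names; in practice the call is made by the First Responder or Incident Commander, not automated):

```python
# Hypothetical sketch of the escalation rule, not a real system.
MAJOR_SEVERITIES = {"S0", "S1"}

def requires_full_response(severity: str, multi_team_coordination: bool = False) -> bool:
    """S0/S1 always trigger the full process; S2/S3 do when the issue
    needs a coordinated response across multiple teams."""
    return severity in MAJOR_SEVERITIES or multi_team_coordination

requires_full_response("S2")                                # -> False
requires_full_response("S2", multi_team_coordination=True)  # -> True
```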

Each severity level below is listed with its description, representative examples, and typical response.
S0 — Critical

Network halted or severe loss of core functionality. Immediate user impact. Requires immediate response.

Chain examples:
  • Bor chain halt — no blocks produced.
  • Heimdall consensus halt — no milestones, checkpoints, or span rotations.
  • Chain split — validators diverge on execution state.
  • Empty block production despite pending transaction pool.
  • Multiple block producer failures.
Application examples:
  • Customer-data-exposing security vulnerability.
  • Bridge operations completely halted (deposits and withdrawals frozen).
  • Complete outage of a critical user-facing service (Portal, Staking) affecting all users.
Full major incident response.
  • First Responder declares the incident on #incidents-pos.
  • War room created (Slack channel + Google Meet).
  • Incident Commander, Communication Commander, and BD OnCall join.
  • Status page updated on status.polygon.technology.
  • Partner Comms Bot notification sent.
  • 15-minute update cadence.
  • Public communications on X, Telegram, Discord as needed.
  • See During an Incident.
S1 — High

Significant degradation of core functionality. User impact likely. Requires urgent response.

Chain examples:
  • Heimdall consensus degradation — blocks produced but at significantly reduced rate.
  • Bor liveness degradation — block times consistently above 5 seconds.
  • Checkpoint production stopped for more than 1 hour.
  • StateSync message processing failure (deposits frozen).
  • Single block producer failure with degraded failover.
  • Network-wide block or transaction propagation degradation.
Application examples:
  • Severe performance degradation of a major service affecting most users.
  • Multi-provider RPC outage or degradation.
  • Erigon client divergence affecting multiple RPC providers.
  • Validator state divergence affecting checkpoint signing.
Full major incident response.
  • Same process as S0.
  • Incident Commander assesses whether public communications are warranted.
  • Monitor for escalation to S0.
Everything above this point (S0 and S1) is a major incident and triggers the full incident response process.
S2 — Medium

Degraded performance but functional. Limited user impact. Requires timely response.

Chain examples:
  • Single block producer failure with clean failover.
  • Localized peering issues affecting non-producer validators.
  • Localized block or transaction propagation degradation.
  • Sustained elevated gas fees (1–3 hours).
Application examples:
  • Single RPC provider outage.
  • Partial loss of functionality on Staking or Portal (some features degraded, core flows work).
  • Polygonscan degradation.
  • Performance issues affecting a subset of users.
Coordinated response without full war room.
  • First Responder assesses and notifies relevant on-call team.
  • Work on issue as top priority.
  • Liaise with engineers of affected systems to identify cause.
  • If related to a recent deployment, consider rollback.
  • Monitor for escalation to S1.
  • Escalate to full incident response if coordination across multiple teams is needed.
S3 — Low

Minor issues with minimal user impact. Monitoring and planned response.

Chain examples:
  • Individual sentry or RPC node failure (redundancy intact).
  • Minor peering issue on a single non-producer node.
  • Brief gas fee elevation (under 1 hour).
  • Observability degradation (ethstats or sensor network issues).
Application examples:
  • Cosmetic bugs or minor UI issues on Portal or Staking.
  • Polygonscan temporarily unavailable.
  • Non-critical background job failures.
  • Minor performance degradation not affecting core user flows.
Monitored response.
  • Address during normal working hours as a priority above routine tasks.
  • Monitor status and escalate if the issue worsens.
  • Create a ticket for tracking and resolution.

Severity and Communication

The severity level also determines the communication response:

| Severity | Status Page | Partner Comms Bot | Public Channels (X, Telegram, Discord) | T0 Customer Outreach |
| --- | --- | --- | --- | --- |
| S0 | Immediately | Immediately | Communication Commander decides | BD OnCall decides |
| S1 | Immediately | Immediately | Communication Commander decides | BD OnCall decides |
| S2 | If user-facing impact | As needed | Typically not | Typically not |
| S3 | No | No | No | No |
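This matrix can be encoded as a simple lookup, e.g. for a notification helper (a sketch only; the keys and phrasing mirror the table above, not any real system):

```python
# Hypothetical encoding of the severity/communication matrix.
COMMS_MATRIX = {
    "S0": {"status_page": "immediately", "partner_comms_bot": "immediately",
           "public_channels": "Communication Commander decides",
           "t0_customer_outreach": "BD OnCall decides"},
    "S1": {"status_page": "immediately", "partner_comms_bot": "immediately",
           "public_channels": "Communication Commander decides",
           "t0_customer_outreach": "BD OnCall decides"},
    "S2": {"status_page": "if user-facing impact", "partner_comms_bot": "as needed",
           "public_channels": "typically not", "t0_customer_outreach": "typically not"},
    "S3": {"status_page": "no", "partner_comms_bot": "no",
           "public_channels": "no", "t0_customer_outreach": "no"},
}

COMMS_MATRIX["S1"]["status_page"]  # -> "immediately"
```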

Changing Severity During an Incident

Severity can and should be adjusted as new information becomes available:

  • An S2 that is trending toward broader impact should be escalated to S1, triggering full incident response.
  • An S0 where impact is contained and recovery is underway can be downgraded to S1 — but never skip the postmortem.
  • The Incident Commander has authority to adjust severity at any time during an active incident.
  • All severity changes should be announced in the incident channel and reflected on the status page.
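The adjustment rules above could be sketched as follows (hypothetical data shapes; `announce` stands in for posting to the incident channel):

```python
# Illustrative sketch only: severity changes are announced and the
# status page is flagged for an update, per the rules above.
def change_severity(incident: dict, new_level: str, announce) -> None:
    """Adjust severity mid-incident, announcing the change and marking
    the status page as needing a refresh."""
    old = incident["severity"]
    incident["severity"] = new_level
    announce(f"Severity changed: {old} -> {new_level}")
    incident["status_page_stale"] = True  # all changes reflected on the status page

messages = []
incident = {"severity": "S2"}
change_severity(incident, "S1", messages.append)
# incident is now S1, the change is announced, and a status page update is flagged
```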