Organizational structures vary, but the following are general guidelines for how different functions at Polygon relate to incident response.
Generally speaking, every department should have a primary point of contact, an on-call rotation, and a clear escalation path. Organizations should always strive to minimize dependencies and empower response teams as much as possible. In novel situations, however, you may not know in advance who you will need to help. A clear system for recruiting responders from all parts of the business ensures that when the unexpected happens, responders don't waste time on manual processes or ambiguous points of contact.
On-Call Groups
| Group | Slack Handle | Coverage | Typical Involvement |
|---|---|---|---|
| First Responders | @FirstResponders | 24x7 (always staffed) | Triage, incident declaration, status updates, coordination |
| DevOps | @DevOpsOnCall | On-call rotation | Infrastructure, deployments, node operations, chain recovery |
| Applications | @ApplicationsOnCall | On-call rotation | Portal, Staking, managed services |
| Smart Contracts | @SmartContractsOnCall | On-call rotation | Bridge contracts, staking contracts, protocol-level issues |
| SecOps | @SecOpsOnCall | On-call rotation | Security incidents, vulnerability response |
| IT | @ITOnCall | On-call rotation | Internal tooling, access management, corporate infrastructure |
| Agglayer | @AgglayerOnCall | On-call rotation | Agglayer service health, cross-chain settlement issues |
| BD | @BDOnCall | On-call rotation | T0 customer outreach, partner communications |
Coming soon: additional on-call groups
On-call rotations for Sequence, Trails, BPN, and OMS are planned but not yet established. As these services mature, dedicated on-call groups and escalation paths will be added to this page.
First Responders are not a traditional on-call rotation
The First Responder rotation is staffed 24x7: someone is always actively working the shift, not just carrying a pager. First Responders are the front line for all incidents. They triage alerts, assess severity, declare incidents, and coordinate the initial response. See Different Roles for details.
Engineering
Engineers are the primary responders and subject matter experts during incident response.
At Polygon, the First Responder on duty has initial triage and assessment responsibility for all incidents. Once an incident is declared, the Incident Commander (a senior member of the core engineering team) leads the technical response. Depending on the nature of the incident, the relevant on-call engineers are pulled in:
- DevOps (@DevOpsOnCall): for infrastructure issues, such as node health, deployments, disk/compute problems, Ansible playbook execution, and chain recovery operations (Bor rewinds, Heimdall rollbacks, span rotations, block producer maintenance).
- Applications (@ApplicationsOnCall): for application-layer issues, such as Portal, Staking UI, managed RPC endpoints, and backend services.
- Smart Contracts (@SmartContractsOnCall): for protocol-level issues involving on-chain contracts, such as bridge operations, staking contract behavior, state sync failures, and checkpoint contract issues.
- Agglayer (@AgglayerOnCall): for issues with the Agglayer service, such as cross-chain settlement, proof verification, and service availability.
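The routing described in this section can be sketched as a lookup from incident category to the on-call handles to engage. This is a minimal, hypothetical Python sketch: the category names (`infrastructure`, `security`, and so on) and the `handles_to_page` helper are illustrative assumptions, not an existing Polygon tool. Only the Slack handles come from the On-Call Groups table.

```python
# Hypothetical routing table: maps an illustrative incident category to the
# on-call Slack handles to engage. Handles are taken from the On-Call Groups
# table; the category names are assumptions made for this sketch.
ROUTING: dict[str, list[str]] = {
    "infrastructure": ["@DevOpsOnCall"],
    "application": ["@ApplicationsOnCall"],
    "smart_contract": ["@SmartContractsOnCall"],
    "agglayer": ["@AgglayerOnCall"],
    "security": ["@SecOpsOnCall"],
    "internal_tooling": ["@ITOnCall"],
    "partner_impact": ["@BDOnCall"],
}


def handles_to_page(categories: list[str]) -> list[str]:
    """Return the Slack handles to engage for an incident, in order.

    First Responders are always first because they own triage for every
    incident. Unknown categories add nothing, and each handle appears once.
    """
    handles = ["@FirstResponders"]
    for category in categories:
        for handle in ROUTING.get(category, []):
            if handle not in handles:
                handles.append(handle)
    return handles


print(handles_to_page(["infrastructure", "security"]))
```

Keeping First Responders unconditionally at the head of the list mirrors their role as the front line for all incidents, and de-duplicating means repeated categories never page the same group twice.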
Business Development / Partner Relations
BD is the voice of the customer during incident response. The BD OnCall (@BDOnCall) is responsible for direct outreach to T0 customers and key partners during major incidents. They assess which partners are affected, coordinate messaging, and manage relationship-sensitive communications that go beyond public status page updates.
Account owners should understand any potential impact to their accounts and be prepared to field inbound questions from partners who discover the incident through their own monitoring or social channels.
Marketing / Communications
Marketing is the primary response team for any public relations incident. During a major incident, the Communication Commander (a member of the marketing team) handles public communications on X, Telegram, and Discord.
Marketing should be engaged in any incident of sufficient scope or severity that the company's brand or image is at risk, or where public updates need to be sent through customer communication channels.
Product Management and Design
Product Managers and Designers often help response teams make decisions when product functionality is impacted across multiple services or products. For example, if the response team has to decide which service to bring back up first, a Product Manager can help identify the one with the greatest customer impact.
Product is also involved in the postmortem process, both to prioritize follow-up actions against other work and to advise on any product changes the issue requires.
Executive Team
Clear processes for updating the executive team during a major incident help ensure that organizational leadership has the context and information it needs, and they prevent executive swoops. Additionally, while the Incident Commander has final authority during response, a major incident may occasionally require action at the highest levels of the company. For example, a senior executive may want to reach out to an impacted customer or partner to manage the relationship and assure them the issue is getting the attention it needs.
Security
The SecOps OnCall (@SecOpsOnCall) is engaged for any incident with a security dimension: vulnerability disclosures, suspected exploits, unauthorized access, or any incident where the security posture of the chain or Polygon's infrastructure is in question. Security incidents follow a separate response process with additional confidentiality and coordination requirements.
During a security incident, SecOps may need to coordinate with HR to manage an internal threat, and with Legal on disclosure obligations.
IT
The IT OnCall (@ITOnCall) handles incidents involving internal corporate infrastructure: access management, VPN, SSO, internal tooling, and any systems that responders depend on to do their jobs. If an incident is compounded by responders being unable to access the tools they need (SSH, GCP console, Slack, incident.io), IT is pulled in immediately.
Finance
Finance is most often a stakeholder during incident response, and should be kept up to date on any impacts that may affect billing, accounting, or end-of-month/quarter activities.
However, Finance should also have a clear escalation path, because some incident response actions require financial operations. Because Polygon operates a chain, resolving an incident may sometimes require sending funds to a specific address: for example, topping up a checkpoint submitter account that has run out of ETH, or funding a contract interaction needed for recovery. Finance needs to be reachable and authorized to act quickly when these situations arise.
Consider Your Entire Organization
There may be other parts of your organization that need to be part of incident response, either as responders or stakeholders. It is important to identify the different areas of your business and think through situations in which they may need to be involved, as well as ensure that anyone on-call has proper incident response training and understands their responsibilities.