Business Continuity Planning for Crypto Firms

Every crypto firm has heard the phrase "not your keys, not your coins." What fewer firms have internalised is an equally important corollary: if you cannot access your keys when it matters, the outcome is the same whether the problem was a hack, a flood, a dead battery, or a departed employee. Business continuity planning (BCP) is the organisational discipline that closes this gap. It ensures that your firm can keep functioning, or restore operations rapidly, when any disruptive event strikes.

BCP is not the same as incident response planning, which focuses on detecting, containing, and recovering from a specific security incident. BCP is the broader resilience framework that wraps around incident response and every other operational risk. It asks: if any critical function fails for any reason, what happens next, who is responsible, and how quickly can we recover? If you do not have clear answers to those questions in writing, tested and rehearsed, you do not have a business continuity plan. You have a hope.

What Is Business Continuity Planning?

Business continuity planning is the process of identifying which functions are critical to an organisation's survival, understanding what threatens them, and building documented procedures to maintain or restore them under adverse conditions. The output is a Business Continuity Plan: a living document that describes what to do, in what order, and by whom, when disruption occurs.

BCP, Disaster Recovery, and Incident Response: Key Distinctions

These three disciplines are closely related but serve different purposes:

Business Continuity Planning (BCP) is the overarching organisational framework. It covers people, processes, facilities, communications, and technology. Its goal is operational resilience across all threat types.
Disaster Recovery (DR) is a subset of BCP focused specifically on restoring technology systems and data after a failure. It defines how backups are taken, how systems are rebuilt, and how data integrity is verified after recovery.
Incident Response (IR) is the structured process for handling security incidents: detecting an intrusion, containing the breach, eradicating malware, and restoring affected systems. IR feeds into BCP when a security incident also constitutes a continuity-threatening event.

A mature organisation needs all three. BCP without DR leaves you with a plan but no technical path to recovery. DR without BCP means your systems are back online but your team does not know who should be doing what. Incident response without BCP means you can handle a breach tactically but may not have the business-level structures to survive a prolonged outage.

The Business Impact Analysis

The Business Impact Analysis (BIA) is the foundation of every BCP. It is the structured process by which an organisation identifies which business functions are critical, what the impact of their disruption would be over time, and therefore how quickly they must be restored. The BIA produces two fundamental recovery objectives:

Recovery Time Objective (RTO): The maximum acceptable length of time that a business function can remain unavailable before the disruption causes unacceptable harm. For a crypto trading desk, this might be 30 minutes. For a compliance reporting function, it might be 48 hours.
Recovery Point Objective (RPO): The maximum acceptable amount of data loss, measured in time. If a database is restored from a backup taken 4 hours before the failure, 4 hours of data are lost, which is acceptable only if the RPO is 4 hours or longer. A trading system might require an RPO of under 1 minute; a legal document repository might tolerate 24 hours.

Setting realistic RTOs and RPOs requires input from business owners, not just technical teams. The compliance function may set a generous RPO, while the treasury team demands near-zero tolerance. These objectives then drive the technical and procedural investments required to meet them.
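To make the two objectives concrete, the following minimal sketch compares a single recovery event against the targets for one function. The `RecoveryObjectives` class, the `meets_objectives` helper, and all figures are illustrative assumptions rather than part of any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets for one business function (values are illustrative)."""
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable data loss, measured in time


def meets_objectives(obj, outage_start, service_restored, last_good_backup):
    """Return (rto_met, rpo_met) for a single recovery event."""
    downtime = service_restored - outage_start
    # Data written after the last good backup and before the outage is lost.
    data_loss = outage_start - last_good_backup
    return downtime <= obj.rto, data_loss <= obj.rpo


trading = RecoveryObjectives(rto=timedelta(minutes=30), rpo=timedelta(minutes=1))
outage = datetime(2025, 3, 1, 2, 0)
result = meets_objectives(
    trading,
    outage_start=outage,
    service_restored=outage + timedelta(minutes=25),
    last_good_backup=outage - timedelta(hours=4),
)
print(result)  # (True, False): back within the RTO, but 4 hours of data lost
```

The example shows why the two objectives need separate investments: a fast restart satisfies the RTO, but only more frequent backups or replication can satisfy the RPO.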

Why Crypto Firms Need BCP More Than Traditional Financial Firms

The case for BCP in traditional finance is well-established. The case for crypto is even more compelling, for several structural reasons:

Irreversible asset losses. There are no chargebacks in blockchain transactions. If funds are sent to the wrong address during a crisis, or if private keys are lost during an outage, those assets may be permanently unrecoverable. Every BCP decision in crypto has a potential finality that traditional finance does not.
24/7 operations. Crypto markets never close. A three-hour outage at 2am on a Saturday carries the same market risk as one at any other time. BCP must account for out-of-hours incidents without relying on a full team being available.
Thin operational margins in early-stage firms. Many Web3 firms operate with small, highly specialised teams where one person often holds multiple critical functions. Key person risk is acute.
Regulatory scrutiny. Under frameworks like DORA and MiCA, documented BCP is becoming a compliance obligation rather than merely best practice.

"In traditional finance, a prolonged outage costs revenue and reputation. In crypto, it can cost assets that cannot be recovered. Business continuity planning is not optional for firms that operate with custodial or treasury responsibilities."

Crypto-Specific Continuity Risks

A BCP for a crypto firm must address threats that do not exist in the same form for traditional businesses. Understanding these risks is a prerequisite for building effective continuity strategies.

Key Person Dependency and Keyholder Unavailability

In many crypto firms, a single individual controls the private keys to significant assets, or holds the sole device needed to authorise transactions. If that person becomes ill, is in an accident, travels to a jurisdiction without internet access, or simply resigns, operations can grind to a halt. The problem is compounded when the firm's wallet architecture has not been designed with multi-signature schemes from the outset.

This is not a theoretical risk. There have been documented cases of exchange operators dying without leaving recovery procedures, resulting in permanent loss of customer funds. A robust BCP must map every critical key management function to a named individual and at least one trained successor.

Exchange Account Access and 2FA Device Loss

Many firms rely on centralised exchanges for a portion of their treasury or trading operations. These accounts are typically protected by two-factor authentication (2FA) tied to a specific mobile device. If that device is lost, stolen, or damaged during an incident, and no backup 2FA method or recovery codes are stored securely, the account may be inaccessible for days. BCP must include documented procedures for recovering access to every exchange account used in operations, including backup authentication methods and the contact details for exchange support teams.

Regulatory Suspension or Enforcement Action

Regulators can move quickly. A firm may receive notice of a licence suspension, a dawn raid, or an asset freeze with very little warning. BCP must address what happens to customer funds and operational continuity obligations in this scenario, including legal counsel contact procedures and regulatory notification requirements.

Third-Party Provider Outages

Crypto firms depend on a stack of third-party infrastructure providers whose outages can be just as disruptive as internal failures. Critical dependencies include:

Node providers and RPC endpoints: Services such as Infura, Alchemy, or QuickNode are critical infrastructure for many DeFi applications. A provider outage can make it impossible to read chain state or submit transactions.
Oracles: Price feed oracle failures can cause liquidation systems to malfunction or DeFi protocols to behave unexpectedly.
Custody providers: If a third-party custodian experiences an outage, access to custodied assets may be temporarily unavailable.
Cloud infrastructure: A major cloud availability zone outage can take down trading systems, APIs, and customer interfaces simultaneously.

BCP must include fallback arrangements for each critical third-party dependency, including pre-tested failover to alternative providers. Reviewing these dependencies is also a core part of operational risk management for Web3 firms.
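One way to keep the dependency review honest is to hold the inventory as data and flag gaps automatically. The sketch below is a hypothetical illustration: the `Dependency` fields and the example entries are assumptions, and a real inventory would carry more detail (contacts, contract terms, last test date).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dependency:
    name: str                      # e.g. "RPC provider"
    fallback: Optional[str]        # alternative provider, if any
    failover_tested: bool = False  # has the switch-over been exercised?


def continuity_gaps(dependencies):
    """List dependencies with no fallback or with an untested one."""
    gaps = []
    for dep in dependencies:
        if dep.fallback is None:
            gaps.append(f"{dep.name}: no fallback arranged")
        elif not dep.failover_tested:
            gaps.append(f"{dep.name}: failover to {dep.fallback} never tested")
    return gaps


inventory = [
    Dependency("RPC provider", fallback="secondary RPC provider", failover_tested=True),
    Dependency("Price oracle", fallback=None),
    Dependency("Cloud region", fallback="standby region", failover_tested=False),
]
for gap in continuity_gaps(inventory):
    print(gap)
```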

Reputational Crises Requiring Rapid Communications

A security breach, a protocol exploit, or a high-profile regulatory action can trigger a reputational crisis that demands immediate, coordinated public communications. Without a pre-prepared crisis communications plan, teams default to silence or improvised statements that can make the situation significantly worse.

DDoS Attacks on Infrastructure

Distributed denial-of-service attacks against trading platforms, APIs, or websites can render services unavailable to customers for extended periods. BCP must address DDoS mitigation and failover procedures, as well as the communications strategy for telling customers what is happening and when service will be restored.

Smart Contract Pause Mechanisms and Governance Delays

When an exploit is detected on a deployed smart contract, the ability to pause the contract quickly can be the difference between a contained incident and a catastrophic loss. However, many protocols have governance mechanisms that introduce delays into any on-chain action. BCP must identify which contracts have pause or emergency stop functionality, who holds the authority to invoke it, and what the governance path is when time is of the essence.

The Business Impact Analysis for Web3 Firms

The Business Impact Analysis is the analytical engine of your BCP. For a Web3 firm, it should cover the following steps.

Identifying Critical Functions

Start by cataloguing every function the organisation performs and then classifying each one by its criticality. For a typical crypto firm, critical functions include:

Trading and treasury operations: The ability to execute transactions, manage positions, and move funds. Disruption here has direct financial consequences in real time.
Customer-facing services: Exchanges, wallets, and DeFi front-ends that customers rely on. Downtime here damages trust and may trigger regulatory reporting obligations.
Key management: The processes and people responsible for private key custody, signing authority, and multi-signature governance. Failure here can lock assets permanently.
Compliance reporting: Transaction monitoring, suspicious activity reporting, and regulatory submissions. Failure here carries direct regulatory risk.
Communications and customer support: The ability to communicate with customers, partners, and regulators during an incident.

Assessing the Impact of Disruption

For each critical function, assess what happens if it is unavailable for 1 hour, 4 hours, 24 hours, and 72 hours. Impacts typically fall into four categories: financial loss, reputational damage, regulatory breach, and contractual default. Quantify these impacts where possible. A trading function offline for 4 hours during high volatility may represent millions in forgone revenue. A compliance reporting failure for 24 hours may trigger a regulatory notification requirement.
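The time-banded assessment can be turned into an upper bound for the RTO: the shortest assessed outage at which the impact becomes unacceptable. The sketch below assumes a 0 to 4 severity scale and a threshold of 3; both are illustrative choices, as are the scores themselves.

```python
from datetime import timedelta

# Severity scale is an assumption for this sketch: 0 = negligible ... 4 = severe.
UNACCEPTABLE = 3


def rto_ceiling(impact_by_horizon, threshold=UNACCEPTABLE):
    """Return the shortest outage at which impact reaches the threshold.

    The RTO should be set below this value. Returns None if no assessed
    horizon reaches the threshold.
    """
    for horizon in sorted(impact_by_horizon):
        if impact_by_horizon[horizon] >= threshold:
            return horizon
    return None


trading_impact = {
    timedelta(hours=1): 2,
    timedelta(hours=4): 4,
    timedelta(hours=24): 4,
    timedelta(hours=72): 4,
}
compliance_impact = {
    timedelta(hours=1): 0,
    timedelta(hours=4): 1,
    timedelta(hours=24): 3,
    timedelta(hours=72): 4,
}
print(rto_ceiling(trading_impact))     # 4:00:00
print(rto_ceiling(compliance_impact))  # 1 day, 0:00:00
```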

Setting RTOs and RPOs

With impact assessments in hand, assign an RTO and RPO to each critical function. These must be set by business owners, not assumed by IT. Once set, they become binding requirements for your technology and recovery investments: if the RTO for a function is 2 hours, your recovery procedures must demonstrably be capable of meeting that target.

Document these in a table that becomes a core annex of your BCP, and review them at least annually or whenever major changes to the business occur.

BCP Framework: The Core Components

A complete BCP document covers five core components. Each is essential; a plan that omits any one of them will have gaps that become apparent at the worst possible moment.

1. Business Impact Analysis

As described above: the documented assessment of critical functions, threat scenarios, and recovery objectives. The BIA is the source of truth for all subsequent planning decisions.

2. Continuity Strategies

For each critical function and its associated threat scenarios, the BCP must describe the strategy for maintaining or restoring that function. Strategies include: redundant systems, manual workarounds, backup personnel, geographic failover, and third-party arrangements. Each strategy must be specific enough to be actionable, not merely aspirational.

3. Recovery Procedures

Detailed, step-by-step runbooks for how to execute each continuity strategy. These must be written at a level of detail that allows a competent but unfamiliar person to execute them under pressure. Every procedure should include the tools required, the credentials or access needed, and the verification steps to confirm that recovery has been successful.

4. Crisis Communications Plan

A documented framework for communicating with internal stakeholders, customers, partners, and regulators during an incident. This includes pre-approved holding statements, escalation thresholds, and named spokespersons.

5. Testing and Maintenance Programme

A scheduled programme of exercises, tests, and reviews to validate that the BCP works and remains current. A plan that is never tested is not a plan; it is a document that provides false assurance.

Key Management Continuity

Key management continuity is arguably the most important and most overlooked component of a crypto firm's BCP. The irreversibility of blockchain transactions means that a key management failure can result in permanent, unrecoverable asset loss. No insurance policy, no legal action, and no technical recovery procedure can undo a transaction that was incorrectly executed or access that is permanently lost.

Multi-Signature Succession Planning

Multi-signature wallet configurations are the foundation of key management continuity. A multi-signature wallet requires M of N signatories to authorise a transaction; when M is smaller than N, no single unavailable keyholder can block signing. But a 2-of-3 multisig only provides continuity if at least two keyholders are accessible at any given time. BCP must document the succession order for each signing role: who is the primary keyholder, who are the named successors, and under what conditions does succession activate?
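A succession table can be checked mechanically against an availability scenario. The following sketch assumes each key slot has an ordered list of people able to sign with it (primary first) and that one person may fill only one slot. The names, the slot labels, and the greedy assignment are hypothetical simplifications.

```python
def reachable_signatures(succession, available):
    """Count signing slots that can be filled, using each person at most once.

    `succession` maps a key slot to its holders in succession order
    (primary first). Greedy assignment is enough for this sketch; it is not
    a general matching algorithm.
    """
    used = set()
    filled = 0
    for slot, holders in succession.items():
        for person in holders:
            if person in available and person not in used:
                used.add(person)
                filled += 1
                break
    return filled


def quorum_available(threshold, succession, available):
    """True if at least `threshold` slots have a reachable holder."""
    return reachable_signatures(succession, available) >= threshold


# Hypothetical 2-of-3 wallet; names are placeholders.
succession = {
    "key-1": ["alice", "dave"],
    "key-2": ["bob", "erin"],
    "key-3": ["carol"],
}
print(quorum_available(2, succession, available={"alice", "bob", "carol"}))  # True
print(quorum_available(2, succession, available={"carol"}))                  # False
print(quorum_available(2, succession, available={"dave", "carol"}))          # True
```

Running scenarios such as "both primaries travelling" against the table is a cheap way to find slots, like `key-3` here, that have no named successor.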

This is closely related to the broader topic of privileged access management, which covers how signing authorities are provisioned, monitored, and revoked across the organisation.

Hardware Wallet Backup and Recovery Procedures

Every hardware wallet used in operations must have a documented recovery procedure. Seed phrases must be stored in a format that survives the loss of the device itself, typically on metal seed storage in a physically secured location. The BCP must specify where recovery materials are stored, who has access to them, and under what conditions that access is authorised. For high-value custody operations, consider the use of hardware security modules (HSMs), which provide tamper-resistant key storage with enterprise-grade access controls and audit logs.

Geographic Distribution of Key Material

Concentrating all key material in a single location creates a geographic single point of failure. A fire, flood, or physical security breach at that location destroys not just the hardware but potentially the recovery materials as well. Distribute key material across at least two geographically separate, physically secured locations. For institutional operations, consider jurisdictional distribution as well, particularly where regulatory freezing of assets is a plausible risk.
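The same idea applies to locations: the signing threshold should still be reachable after any one site is lost. The sketch below is a minimal check under that assumption; the site names and key labels are placeholders.

```python
def survives_single_site_loss(threshold, key_locations):
    """Check that losing any one site still leaves `threshold` keys.

    `key_locations` maps each key (or its recovery material) to the site
    where it is stored.
    """
    if len(key_locations) < threshold:
        return False
    for lost in set(key_locations.values()):
        remaining = sum(1 for site in key_locations.values() if site != lost)
        if remaining < threshold:
            return False
    return True


concentrated = {"key-1": "site-a", "key-2": "site-a", "key-3": "site-b"}
distributed = {"key-1": "site-a", "key-2": "site-b", "key-3": "site-c"}
print(survives_single_site_loss(2, concentrated))  # False
print(survives_single_site_loss(2, distributed))   # True
```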

Dead Man's Switch Considerations

For firms where a founding individual controls significant key material, consider automated notification or escalation procedures that trigger if the individual fails to check in within a defined period. This is a sensitive subject, but the alternative is discovering the problem only when a crisis has already occurred. Legal and technical arrangements should ensure that successor keyholders can access materials in the event of prolonged incapacitation.
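The check-in logic itself is simple; the hard parts are the legal and access arrangements around it. As a sketch only, assuming a three-day reminder and a seven-day escalation (both thresholds are arbitrary examples), the decision could look like this:

```python
from datetime import datetime, timedelta


def escalation_level(last_check_in, now, warn_after, escalate_after):
    """Return 'ok', 'warn' or 'escalate' based on time since last check-in."""
    silence = now - last_check_in
    if silence >= escalate_after:
        return "escalate"  # notify successor keyholders and legal contacts
    if silence >= warn_after:
        return "warn"      # remind the keyholder through a second channel
    return "ok"


now = datetime(2025, 6, 10, 9, 0)
policy = dict(warn_after=timedelta(days=3), escalate_after=timedelta(days=7))
print(escalation_level(datetime(2025, 6, 9, 9, 0), now, **policy))  # ok
print(escalation_level(datetime(2025, 6, 6, 9, 0), now, **policy))  # warn
print(escalation_level(datetime(2025, 6, 1, 9, 0), now, **policy))  # escalate
```

A warning stage before escalation reduces false alarms from holidays or travel, which matters because an escalation may expose sensitive recovery procedures to more people.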

What Happens When a Keyholder Dies or Is Incapacitated?

This is not a morbid hypothetical. It is a risk that has resulted in permanent asset losses for multiple organisations in the crypto industry. Every firm with custodial responsibilities should have a documented legal and operational procedure for this scenario, including power of attorney arrangements, estate planning coordination, and clear succession paths that do not require a court order to execute a transaction.

Technology and Infrastructure Continuity

The technology component of a crypto BCP covers the redundancy and recovery procedures for every critical system. The goal is to eliminate single points of failure and ensure that recovery can be executed within the RTOs established in the BIA.

Redundant RPC Endpoints

Any application that relies on a single RPC endpoint to interact with blockchain networks is exposed to provider-level outages. Build redundancy by maintaining relationships with multiple node providers and implementing automatic failover at the application layer. Test this failover regularly to confirm it works as expected under real conditions, not just in theory.
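Application-layer failover can be as simple as trying endpoints in priority order. The sketch below uses only the Python standard library and the standard Ethereum JSON-RPC method `eth_blockNumber`. The endpoint URLs, the `call_with_failover` helper, and the stand-in transport are hypothetical; production code would add health checks, retries with backoff, and response cross-checking.

```python
import json
import urllib.request


def json_rpc(url, method, params=(), timeout=5):
    """Send one JSON-RPC 2.0 request over HTTP and return the result field."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": list(params)}
    ).encode()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return payload["result"]


def call_with_failover(endpoints, method, params=(), send=json_rpc):
    """Try each endpoint in priority order; return (endpoint, result)."""
    failures = []
    for url in endpoints:
        try:
            return url, send(url, method, params)
        except Exception as exc:  # timeouts, HTTP errors, RPC errors
            failures.append(f"{url}: {exc}")
    raise ConnectionError("all RPC endpoints failed: " + "; ".join(failures))


# Demo with a stand-in transport so the sketch runs without network access.
def fake_send(url, method, params):
    if "primary" in url:
        raise TimeoutError("simulated outage")
    return "0x10"


used, block = call_with_failover(
    ["https://primary.example/rpc", "https://secondary.example/rpc"],
    "eth_blockNumber",
    send=fake_send,
)
print(used, int(block, 16))  # https://secondary.example/rpc 16
```

Injecting the transport through `send` also makes the failover path itself testable, which supports the regular failover tests recommended above.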

Multi-Cloud and Multi-Region Deployments

Critical application workloads should be deployed across at least two cloud availability zones or, for the most critical functions, across two separate cloud providers. This protects against both availability zone outages and provider-wide incidents. Deployment architecture should be documented in the BCP, along with the procedures for activating failover and confirming that the standby environment is functioning correctly.

Database Backup and Recovery

Database backups must be taken at a frequency that meets the RPO for each function. Backups must be stored in a separate location from primary data (ideally in a separate cloud region or offline). Recovery from backup must be tested regularly to confirm that the backup is valid and that the restoration process works within the target RTO. Untested backups are not backups; they are unverified hopes.
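Two of these checks are easy to automate: that the gaps between backups never exceed the RPO, and that a restored copy matches a digest recorded when the backup was taken. The sketch below assumes SHA-256 digests are stored alongside each backup; the function names and sample data are illustrative.

```python
import hashlib
from datetime import datetime, timedelta


def backup_schedule_meets_rpo(backup_times, rpo):
    """True if no gap between consecutive backups exceeds the RPO."""
    ordered = sorted(backup_times)
    return all(later - earlier <= rpo for earlier, later in zip(ordered, ordered[1:]))


def restore_is_valid(restored_bytes, expected_sha256):
    """Compare the restored data's SHA-256 digest with the recorded one."""
    return hashlib.sha256(restored_bytes).hexdigest() == expected_sha256


backups = [datetime(2025, 1, 1, h) for h in (0, 4, 8, 14)]
# False: the gap from 08:00 to 14:00 is 6 hours.
print(backup_schedule_meets_rpo(backups, rpo=timedelta(hours=4)))

snapshot = b"ledger-export-v1"
recorded_digest = hashlib.sha256(snapshot).hexdigest()
print(restore_is_valid(snapshot, recorded_digest))                     # True
print(restore_is_valid(b"ledger-export-v1-corrupt", recorded_digest))  # False
```

A matching digest confirms the bytes survived; it does not confirm that the application can start from them, so a timed end-to-end restore is still needed to verify the RTO.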

Smart Contract Upgrade and Pause Procedures

For protocols with upgradeable or pausable smart contracts, the BCP must document the procedure for invoking these mechanisms under emergency conditions. This includes identifying the address or multisig wallet with upgrade or pause authority, the governance steps required, and the communication sequence for notifying users and the market. Where governance delays make on-the-fly pausing impossible, consider whether a guardian multisig with delegated emergency authority is appropriate.

DNS Failover

A DNS failure or hijacking can take down a customer-facing service even if the underlying infrastructure is fully operational. The BCP should address DNS redundancy, including the use of secondary DNS providers and procedures for rapidly updating DNS records in the event of a primary provider failure or security incident affecting domain control.

People and Process Continuity

Technology continuity without people continuity is incomplete. Even the best-designed failover architecture requires people who know how to activate it, verify it, and communicate about it. People and process continuity ensures that the human side of recovery is as robust as the technical side.

Succession Planning for Key Roles

Every critical role in the organisation should have a documented successor. For small firms, this may mean cross-training individuals to cover multiple roles under crisis conditions. The succession plan must specify not just who the successor is, but how they are notified, what authority they assume, and what access they need to perform the role.

Documented Runbooks

Every critical recovery procedure must be documented in a runbook: a step-by-step guide written at sufficient detail that a trained but unfamiliar person can execute it correctly under pressure. Runbooks should be version-controlled, stored in a location that is accessible even when primary systems are down, and reviewed regularly to ensure they remain accurate as systems evolve.

Out-of-Band Communication Channels

If your primary communication platforms are compromised or unavailable during an incident, your team needs an alternative channel to coordinate the response. Out-of-band communication options include dedicated Signal groups, offline phone trees, or pre-established secondary email accounts on a different provider from your primary. These channels should be tested periodically and every team member must know how to access them without relying on the primary system. This is a key consideration for security operations teams who need to maintain coordination during active incidents.

Emergency Access Procedures

There must be documented procedures for granting emergency access to systems when normal access mechanisms are unavailable. This might include break-glass accounts with elevated privileges that are only activated under defined conditions, with all use logged and reviewed. These procedures must be tested and must be secured against misuse.

Contractor and Vendor Continuity Requirements

If critical functions are performed by contractors or third-party vendors, those vendors must be subject to BCP requirements that align with your own recovery objectives. This means incorporating BCP and DR requirements into contracts, conducting due diligence on vendor continuity arrangements, and testing joint recovery scenarios where appropriate.

Crisis Communications Planning

How an organisation communicates during a crisis is often as consequential as how it responds operationally. Poor communications can turn a manageable incident into a reputational catastrophe that outlasts the original disruption by months or years.

Who Speaks Publicly During an Incident?

Designate named individuals as authorised spokespersons for different types of incidents. Define clear rules about who may communicate with the press, with regulators, with institutional counterparties, and with retail customers. Ensure that everyone in the organisation understands that they should not make public statements about an active incident without authorisation. Spontaneous, unauthorised disclosures are a common source of reputational damage during crypto security events.

Pre-Approved Holding Statements

Prepare holding statement templates in advance for the most likely incident scenarios: security breach, service outage, regulatory action, and key management failure. These statements should acknowledge the issue, confirm that the team is investigating, and provide a timeline for the next update, without disclosing information that could worsen the incident or create legal liability. Pre-approved language prevents teams from improvising under pressure.

Stakeholder Notification Procedures

Define notification procedures for each stakeholder group: customers, institutional partners, regulators, and internal leadership. Each group has different information needs and different notification timelines. Customers need early reassurance. Regulators need formal, accurate reporting within defined timeframes. Internal leadership needs situational awareness to make resource allocation decisions.

Regulatory Notification Timelines Under DORA and MiCA

Under DORA, financial entities must report major ICT-related incidents to competent authorities within defined timeframes: an initial notification within 4 hours of classifying the incident as major (and no later than 24 hours after becoming aware of it), an intermediate report within 72 hours of the initial notification, and a final report within one month of the latest intermediate report. DORA compliance requires firms to have pre-established reporting procedures and designated points of contact with their national competent authority.
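During an incident nobody should be doing date arithmetic by hand, so the reporting clock is worth scripting in advance. The sketch below uses the 4-hour, 72-hour, and one-month figures quoted above. Measuring every window from the classification time and treating one month as 30 days are simplifying assumptions; the regulatory technical standards define the exact reference point for each report and should be checked before relying on any such tool.

```python
from datetime import datetime, timedelta, timezone

# Simplifying assumptions: all windows run from the classification time,
# and "one month" is approximated as 30 days.
REPORTING_WINDOWS = {
    "initial notification": timedelta(hours=4),
    "intermediate report": timedelta(hours=72),
    "final report": timedelta(days=30),
}


def reporting_deadlines(classified_at):
    """Map each report to its latest submission time under the assumptions above."""
    return {name: classified_at + window for name, window in REPORTING_WINDOWS.items()}


classified = datetime(2025, 3, 3, 14, 30, tzinfo=timezone.utc)
deadlines = reporting_deadlines(classified)
for name, due in deadlines.items():
    print(f"{name}: {due:%Y-%m-%d %H:%M} UTC")
```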

Under MiCA, crypto asset service providers (CASPs) are required to notify their competent authority without undue delay of any significant operational disruption, including those affecting the availability, authenticity, integrity, or confidentiality of services. The crisis communications plan must include the contact details, reporting templates, and escalation paths required to meet these obligations reliably.

Social Media Response

During a significant incident, social media will amplify rumours and misinformation faster than your communications team can respond. The crisis communications plan should address social media monitoring, the frequency and format of public updates, and the guidelines for responding to inaccurate claims. Silence on social media during a major incident is typically interpreted as evidence of something worse than the actual situation.

Testing Your BCP

A business continuity plan that has never been tested is not a plan. It is a document that may have been accurate when it was written and is almost certainly out of date now. Testing converts a static document into a dynamic capability, and it is the only way to discover gaps before a real incident exposes them.

Tabletop Exercises

A tabletop exercise brings key personnel together to walk through a hypothetical scenario in a structured discussion format. No systems are activated; the exercise tests understanding of the plan, identifies gaps in knowledge, and surfaces conflicting assumptions about roles and responsibilities. Tabletop exercises are low-cost and low-risk and should be conducted at least annually.

Useful scenarios for crypto firm tabletops include: a private key compromise discovered on a Sunday evening, a node provider outage coinciding with high market volatility, a regulatory dawn raid on the primary office, and the unexpected death of the CEO who was the sole authorised signatory on a high-value wallet.

Functional Tests

A functional test activates specific elements of the BCP without taking primary systems offline. Examples include: testing the failover of an RPC endpoint to a secondary provider, restoring a database from backup and verifying data integrity, or activating the out-of-band communications channel and confirming that all personnel can access it. Functional tests should be conducted at least twice per year for critical functions.

Full Interruption Tests

A full interruption test genuinely takes primary systems or functions offline to verify that the recovery procedure works within the target RTO. This is the most realistic form of testing and carries the most operational risk, so it requires careful planning and execution. For critical functions, a full interruption test should be conducted at least annually, ideally outside peak operating hours.
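The testing cadence described in this section can be tracked with a small overdue check. The sketch below assumes annual tabletop and full interruption tests and approximates "twice per year" as a maximum of 183 days between functional tests; the dictionary keys and dates are hypothetical.

```python
from datetime import date, timedelta

# Maximum interval between tests, following the cadences described above
# (twice a year is approximated as every 183 days).
MAX_INTERVAL = {
    "tabletop": timedelta(days=365),
    "functional": timedelta(days=183),
    "full_interruption": timedelta(days=365),
}


def overdue_tests(last_run, today):
    """Return test types never run or last run longer ago than allowed."""
    overdue = []
    for test_type, limit in MAX_INTERVAL.items():
        last = last_run.get(test_type)
        if last is None or today - last > limit:
            overdue.append(test_type)
    return overdue


history = {"tabletop": date(2025, 2, 1), "functional": date(2024, 11, 15)}
print(overdue_tests(history, today=date(2025, 9, 1)))
# ['functional', 'full_interruption']
```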

What to Do After a Failed Test

A failed test is valuable information. Treat it as an opportunity to improve the plan, not as a reason to conceal the exercise's results. Document what failed, why it failed, and what corrective action is required. Update the plan accordingly, assign ownership of the corrective actions, and schedule a follow-up test to verify that the fix worked. An organisation that consistently passes its BCP tests without ever encountering any issues should review whether its tests are sufficiently challenging.

Regulatory Requirements

Business continuity planning is increasingly a regulatory requirement for crypto firms operating in major jurisdictions, not merely best practice. Understanding the specific obligations that apply to your firm is essential for compliance planning.

DORA Article 11: BCP and Recovery

The Digital Operational Resilience Act (DORA) has applied across the European Union since 17 January 2025 and covers a broad range of financial entities, including many crypto firms. Article 11 of DORA requires financial entities to establish, implement, and maintain ICT business continuity policies and plans. These plans must address at minimum: the conditions and procedures for activating BCP measures, backup procedures and methods for restoring systems to required service levels, procedures for activating crisis communications plans, and testing requirements.

DORA also requires that continuity plans be subject to regular review and that testing be conducted at least annually. Major incidents that trigger BCP activation must be reported to competent authorities within the timelines specified in the regulation.

MiCA Article 68: Operational Continuity

MiCA Article 68, which sets out governance arrangements for crypto asset service providers, requires them to take all reasonable steps to ensure continuity and regularity in the performance of their services, with timely resumption in the event of an interruption. Providers must maintain resources and have contingency and recovery plans in place, document these arrangements, and notify their competent authority of any significant operational disruption without undue delay.

ISO 22301: Business Continuity Management System

ISO 22301 is the international standard for business continuity management systems (BCMS). It provides a framework for implementing, maintaining, and continuously improving a BCP programme. Certification to ISO 22301 demonstrates to clients, partners, and regulators that your BCP programme meets a recognised international standard and is subject to independent audit. While not currently a regulatory requirement for most crypto firms, ISO 22301 certification provides a credible benchmark and is increasingly requested by institutional counterparties during vendor due diligence.

Frequently Asked Questions

What is business continuity planning in the context of crypto firms?

Business continuity planning (BCP) for crypto firms is the organisational framework that ensures operations can continue or be rapidly restored during and after a disruptive event. This includes hacks, regulatory actions, key loss events, data centre outages, third-party provider failures, and public crises. Unlike incident response, which focuses on detecting and containing a specific security incident, BCP covers the broader resilience posture of the entire organisation.

What is the difference between RTO and RPO?

The Recovery Time Objective (RTO) is the maximum acceptable length of time that a business function can be offline before the disruption causes unacceptable harm. The Recovery Point Objective (RPO) is the maximum acceptable amount of data loss measured in time; it defines how far back in time your recovery point can be. A trading desk might have an RTO of 30 minutes and an RPO of 5 minutes, meaning it must be back online within 30 minutes and no more than 5 minutes of transaction data can be lost.

Does DORA require crypto firms to have a business continuity plan?

Yes. DORA Article 11 requires financial entities, including many crypto asset service providers, to establish, implement, and maintain ICT business continuity policies and plans. These must cover critical functions, dependencies, recovery objectives, and testing schedules. MiCA Article 68 similarly requires crypto asset service providers to maintain business continuity arrangements, including documented procedures for restoring services and notifying competent authorities of major operational disruptions.

How should a crypto firm handle keyholder incapacitation in its BCP?

Key management continuity should be addressed through multi-signature wallet configurations where no single individual holds unilateral control, combined with documented succession procedures specifying who assumes keyholder responsibilities if a primary keyholder is unavailable. Hardware wallet seed phrases should be secured in geographically distributed, access-controlled locations. Legal arrangements such as power of attorney or estate planning guidance should also be in place so that assets are not locked permanently in the event of a keyholder's death or prolonged incapacitation.

How often should a crypto firm test its business continuity plan?

At minimum, a tabletop exercise should be conducted annually, walking key personnel through BCP scenarios without activating the full plan. Functional tests of specific recovery procedures (such as RPC failover or backup restoration) should occur at least twice per year. A full interruption test, where primary systems are genuinely taken offline to validate recovery, should be conducted at least once per year for critical functions. The plan itself should be reviewed and updated after every test, after every real incident, and whenever significant changes are made to technology or organisational structure.
