Red Team Blue Team: Adversarial Testing for Web3

Red Team and Blue Team Defined

The terms red team and blue team originate in military exercises where one force simulates an adversary and the other defends. In information security, the model maps directly: the red team attacks, the blue team defends, and the interaction between the two reveals the true security posture of an organisation.

The red team operates from the perspective of a real-world attacker. It uses the same tools, techniques, and procedures (TTPs) that sophisticated threat actors employ, working from a defined objective rather than a checklist of vulnerability classes. In the context of Web3, that objective might be exfiltrating private keys, approving a malicious transaction from a treasury multisig, or compromising a validator node. The red team does not constrain itself to the perimeter; it pursues whatever path would lead a real attacker to the goal.

The blue team is the defensive function: the security operations centre (SOC), incident responders, threat hunters, and detection engineers who are responsible for identifying and containing threats. In a formal red team exercise, the blue team typically operates without foreknowledge of the engagement, responding to red team activity as if it were a genuine attack. This blind structure is what gives the exercise its validity: a blue team that knows an exercise is underway will respond differently from one that does not.

A third concept has become increasingly important: the white team, which manages and referees the exercise, sets boundaries, holds the rules of engagement, and deconflicts activity that could cause genuine harm. In regulated environments, the white team also serves as the interface with supervisory authorities.

Why Adversarial Testing Matters in Crypto

Automated vulnerability scanners and conventional penetration tests have a well-understood limitation: they find what they are configured to look for. They identify known vulnerability patterns, misconfigurations, and CVEs. They do not discover novel attack paths that combine a phishing-susceptible developer, a weak multisig approval workflow, and an insufficiently monitored admin panel into a single kill chain.

This is precisely the kind of attack that has drained hundreds of millions from crypto firms. The Bybit incident is a reference point: the attacker did not exploit a zero-day in the blockchain itself. The attack path ran through social engineering, compromised developer tooling, and a blind spot in the approval process. A red team exercise is designed to identify and demonstrate exactly these weaknesses before an attacker does. A blue team, properly tooled and rehearsed, would have had a realistic chance of detecting the lateral movement before the transaction was signed.

"Automated tools find the vulnerabilities your developers already know about. Red teams find the attack paths your developers never imagined."

Crypto firms face a specific threat profile that makes adversarial testing particularly valuable. The assets held are liquid, pseudonymous, and irreversible. A single successful attack can result in total loss of funds with no recourse. The adversaries are sophisticated, patient, and financially motivated to an extraordinary degree. Nation-state groups such as the Lazarus Group have demonstrated willingness to invest months of reconnaissance into a single target. Standard security practices designed for conventional IT environments are insufficient against this threat model.

For more context on the specific threat landscape, see our analysis of Lazarus Group tactics and what crypto firms can do to defend against them.

What a Crypto Red Team Tests

A red team engagement in a Web3 context extends well beyond smart contract fuzzing and API testing. The objective is to simulate a credible, motivated attacker across the full attack surface, including people, processes, and technology.

Social Engineering and Phishing Campaigns

Spear phishing targeted at developers, finance teams, and executives is one of the most effective entry vectors in the crypto sector. Red teams will build realistic phishing campaigns using open-source intelligence (OSINT) on target personnel, crafting pretexts that align with the target's actual work context. This might mean a fake communication from a known DeFi protocol requesting a contract review, or an urgent message impersonating a senior internal figure requesting wallet approval. The goal is to test whether personnel recognise and report the attempt, or whether credentials, session tokens, or sensitive data are surrendered.

Physical Security Testing

For firms with hardware signing devices, cold storage facilities, or hardware security modules (HSMs), physical access represents a genuine attack vector. Red teams test whether an adversary could gain physical access to critical infrastructure through tailgating, social engineering of facilities staff, or exploitation of inadequate physical controls.

Credential Harvesting and Privilege Escalation

Red teams test whether compromised low-privilege credentials can be leveraged to escalate access toward critical systems. In a crypto firm context, this means tracing the path from an employee's workstation to administrative access over internal tooling, cloud infrastructure, or key management systems. This connects directly to privileged access management weaknesses that are endemic in the Web3 sector.

Key Management Bypass Attempts

The red team will specifically target key management processes: the workflows through which signing keys are generated, stored, used, and rotated. This includes testing whether approval workflows for multisig transactions can be manipulated, whether key material is adequately protected at rest, and whether personnel responsible for signing operations can be deceived into approving illegitimate transactions.
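One control a red team will probe is whether the transaction a signer is shown is the transaction that actually gets signed. The sketch below is a minimal illustration, assuming a hypothetical dict-based transaction payload, placeholder addresses, and a SHA-256 digest over canonical JSON. Real multisig wallets use their own hashing schemes, so the function names and fields here are assumptions for illustration, not any wallet's API.

```python
import hashlib
import json
from typing import Dict, Set, Tuple


def payload_digest(payload: Dict[str, str]) -> str:
    """Hash a canonical JSON encoding of the payload (illustrative scheme only)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def safe_to_sign(
    displayed: Dict[str, str],
    to_be_signed: Dict[str, str],
    approved_recipients: Set[str],
) -> Tuple[bool, str]:
    """Refuse to sign if the displayed and actual payloads differ,
    or if the recipient is not on the approved list."""
    if payload_digest(displayed) != payload_digest(to_be_signed):
        return False, "displayed transaction differs from payload to be signed"
    if to_be_signed.get("to") not in approved_recipients:
        return False, "recipient not on approved list"
    return True, "ok"


# Hypothetical placeholder addresses, not real ones.
approved = {"0xCOLD_WALLET"}
shown_to_signer = {"to": "0xCOLD_WALLET", "value": "100", "data": "0x"}
tampered_payload = {"to": "0xATTACKER", "value": "100", "data": "0x"}

honest_result = safe_to_sign(shown_to_signer, shown_to_signer, approved)
tampered_result = safe_to_sign(shown_to_signer, tampered_payload, approved)
```

The point of the design is that the comparison runs on a path independent of the signing interface, so a compromised front end cannot show one payload while submitting another without the mismatch being caught.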

Insider Threat Simulation

Insider threats in crypto are disproportionately damaging. Red teams will simulate a malicious or compromised insider with legitimate access, testing whether the organisation's monitoring, access controls, and process segregation would detect or contain the activity. This requires close coordination with the white team to avoid disruption to live operations.

Developer Toolchain Compromise

Supply chain attacks via compromised developer tooling have become a primary vector in Web3 attacks. Red teams test whether a malicious pull request, a compromised dependency, or a tampered build pipeline would be detected before reaching production. This requires collaboration with the organisation's engineering leadership to scope appropriately.
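A tampered dependency is one of the scenarios described above, and a common defence is to pin a cryptographic hash for every build input. The following is a minimal sketch under stated assumptions: artefacts are held in memory as bytes, and the names `verify_artifacts`, `pinned`, and the package names are hypothetical. It is not the lockfile format of any particular package manager.

```python
import hashlib
import hmac
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifacts(artifacts: Dict[str, bytes], pinned: Dict[str, str]) -> List[str]:
    """Return the names of artefacts that are unpinned or whose hash
    does not match the pinned value."""
    failures = []
    for name, content in sorted(artifacts.items()):
        expected = pinned.get(name)
        if expected is None or not hmac.compare_digest(sha256_hex(content), expected):
            failures.append(name)
    return failures


# Hypothetical build inputs: one pinned dependency that has been altered,
# one that matches its pin, and one that was never pinned.
pinned = {
    "signing-lib": sha256_hex(b"original signing-lib release"),
    "http-client": sha256_hex(b"original http-client release"),
}
artifacts = {
    "signing-lib": b"signing-lib release with injected code",
    "http-client": b"original http-client release",
    "new-helper": b"dependency added by a pull request",
}

failures = verify_artifacts(artifacts, pinned)
```

In this example `failures` contains `signing-lib` (hash mismatch) and `new-helper` (no pin). A red team exercise would test whether a build that produces such failures is actually blocked, or merely logged.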

The Blue Team's Role and Tooling

The blue team's function in a crypto firm spans detection, containment, investigation, and recovery. Its effectiveness depends on the quality of its tooling, the depth of its visibility into the environment, and the readiness of its personnel to respond under pressure.

Core blue team tooling in a Web3 context includes:

SIEM (Security Information and Event Management): Centralised log collection and correlation across cloud infrastructure, endpoints, internal applications, and blockchain monitoring feeds. The SIEM is the foundation of the blue team's situational awareness.
EDR (Endpoint Detection and Response): Real-time monitoring and response capability on endpoints, including developer workstations and administrative machines that represent high-value targets.
On-chain monitoring: Blockchain transaction monitoring tools that alert on anomalous activity, large outflows, contract interactions from unexpected addresses, and governance proposal submissions.
Identity and access monitoring: Alerting on unusual authentication patterns, privilege escalation events, and access to sensitive systems outside normal working patterns.
Threat intelligence feeds: Subscriptions to intelligence feeds relevant to the crypto sector, including known attacker wallet addresses, malicious infrastructure, and emerging TTPs.
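To make the on-chain monitoring item concrete, the sketch below shows how two of the alert conditions mentioned above, large outflows and transfers to unexpected addresses, might be expressed as a rule. It assumes a hypothetical `Transfer` record that has already been decoded from chain data, placeholder addresses, and an arbitrary threshold; a production rule set would be fed by a node or monitoring service, which is out of scope here.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Set


@dataclass(frozen=True)
class Transfer:
    """A decoded transfer (hypothetical schema for illustration)."""
    tx_hash: str
    sender: str
    recipient: str
    amount: Decimal  # whole units of the asset


def evaluate_transfer(
    transfer: Transfer,
    monitored: Set[str],
    allowlist: Set[str],
    large_outflow: Decimal,
) -> List[str]:
    """Return alert reasons for a transfer out of a monitored wallet.
    An empty list means nothing was flagged."""
    alerts: List[str] = []
    if transfer.sender.lower() not in monitored:
        return alerts  # only outflows from monitored wallets are evaluated
    if transfer.amount >= large_outflow:
        alerts.append("large_outflow")
    if transfer.recipient.lower() not in allowlist:
        alerts.append("unexpected_recipient")
    return alerts


# Placeholder addresses and threshold, stored lower-case for comparison.
MONITORED = {"0xtreasury"}
ALLOWLIST = {"0xcold_wallet", "0xexchange_hot"}
THRESHOLD = Decimal("100")

routine = Transfer("0x01", "0xTREASURY", "0xCOLD_WALLET", Decimal("5"))
suspicious = Transfer("0x02", "0xTREASURY", "0xUNKNOWN", Decimal("250"))

routine_alerts = evaluate_transfer(routine, MONITORED, ALLOWLIST, THRESHOLD)
suspicious_alerts = evaluate_transfer(suspicious, MONITORED, ALLOWLIST, THRESHOLD)
```

Returning a list of reasons rather than a single boolean lets the SIEM correlate the individual signals with endpoint and identity alerts.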

During a red team exercise, the blue team responds to genuine alerts generated by red team activity. The quality of its response, measured against pre-defined metrics, reveals whether detection rules are tuned correctly, whether alert fatigue is causing genuine signals to be missed, and whether incident response playbooks are fit for purpose.

See our detailed guide to building a security operations centre for crypto firms for a full breakdown of SOC architecture and tooling requirements.

Purple Teaming

Purple teaming is the collaborative model in which red and blue teams work together in real time rather than in isolation. In a traditional red team exercise, the red team operates covertly for weeks or months before handing over a report. The blue team then retrospectively analyses what it did and did not detect. In a purple team exercise, findings are shared as the exercise progresses.

The practical benefit is substantially faster improvement. When a red team technique succeeds undetected, the blue team immediately learns about it and works to build a detection rule. The red team then tests whether the new detection is effective, potentially adjusting its technique to evade it. This iterative loop, run over several days or weeks, delivers more security improvement per unit of investment than a traditional red team report that lands on a desk six weeks after the engagement.
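The iterative loop described above can be sketched as a replay of red team events against the current detection rules. This is an illustration only: the event fields, rule names, and techniques are hypothetical, and a real programme would express rules in its SIEM's own query language rather than as Python functions.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Rule = Callable[[Event], bool]


def undetected(events: List[Event], rules: Dict[str, Rule]) -> List[Event]:
    """Return the events that no current detection rule matches."""
    return [e for e in events if not any(rule(e) for rule in rules.values())]


# Events produced by red team techniques during the exercise (hypothetical).
red_team_events: List[Event] = [
    {"technique": "credential_phishing", "source": "identity_provider",
     "detail": "login from new device"},
    {"technique": "ci_token_theft", "source": "ci_pipeline",
     "detail": "token used from unknown runner"},
]

# The blue team's rule set at the start of the exercise.
rules: Dict[str, Rule] = {
    "new_device_login": lambda e: e["source"] == "identity_provider"
    and "new device" in e["detail"],
}

# Round 1: replay the events and record what went unnoticed.
gaps_before = undetected(red_team_events, rules)

# The blue team writes a rule for the missed technique.
rules["unknown_ci_runner"] = (
    lambda e: e["source"] == "ci_pipeline" and "unknown runner" in e["detail"]
)

# Round 2: the red team re-runs the technique to validate the new rule.
gaps_after = undetected(red_team_events, rules)
```

In this sketch the CI token theft is the only gap in the first round and is closed in the second. In practice the red team would then vary the technique to see whether the new rule is robust or merely matches one specific artefact.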

Purple teaming is increasingly the preferred model for mature security programmes in regulated crypto environments. It requires a higher degree of trust and openness between teams, but the output is measurably better detection coverage and a blue team that has rehearsed responses to techniques that are directly relevant to its threat model.

DORA TLPT Requirements

The EU Digital Operational Resilience Act (DORA) introduces a formal regulatory requirement for adversarial testing for in-scope financial entities. DORA Article 26 mandates Threat-Led Penetration Testing (TLPT) at least every three years for significant financial entities identified by their competent authority. The competent authority may also ask a specific firm to increase or reduce that frequency based on its risk profile.

TLPT under DORA is more structured and more demanding than a conventional red team exercise. Key requirements include:

Engagements must be based on current threat intelligence relevant to the specific institution.
Testing must cover production systems and live environments, not representative test environments.
External testers must meet specific competence and independence requirements.
The scope must include critical or important functions as classified under DORA's resilience framework.
Results are shared with the relevant National Competent Authority.

For crypto-asset service providers (CASPs) operating under MiCA that are also captured by DORA's scope as significant financial entities, TLPT represents a substantial compliance obligation. The cost and duration of a DORA-compliant TLPT are significantly higher than those of a standard red team engagement, partly because of the regulatory documentation requirements and partly because of the requirement to test production systems with appropriate safeguards.

Our detailed breakdown of DORA compliance requirements for crypto firms covers the full operational resilience framework, including the TLPT provisions in context.

How to Commission a Red Team Engagement

Commissioning a red team engagement requires careful scoping, provider selection, and internal preparation. A poorly scoped engagement will either fail to test the right things or cause unnecessary disruption. The following steps provide a sound basis for procurement.

Define the Objective

Start with a concrete adversarial objective rather than a list of systems to test. "Demonstrate whether an external attacker could exfiltrate treasury signing keys" is a better starting point than "test our infrastructure." The objective should reflect the most damaging realistic attack scenario for the organisation.

Define Rules of Engagement

Rules of engagement (RoE) set the boundaries of the exercise: what systems are in scope, what techniques are permitted, what is explicitly prohibited (for example, production transaction signing or denial-of-service attacks), and the escalation path if the red team encounters a genuine vulnerability mid-exercise that requires immediate remediation. The white team holds and enforces the RoE.
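Writing the RoE down in a machine-checkable form can help the white team deconflict activity quickly. The sketch below is one possible representation, assuming hypothetical system names, technique labels, and an escalation contact; it is not a standard RoE schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class RulesOfEngagement:
    """A minimal, illustrative RoE record held by the white team."""
    in_scope_systems: FrozenSet[str]
    permitted_techniques: FrozenSet[str]
    prohibited_techniques: FrozenSet[str]
    escalation_contact: str

    def check(self, system: str, technique: str) -> Tuple[bool, str]:
        """Return (allowed, reason) for a proposed red team action.
        Prohibitions are checked first so they always take precedence."""
        if technique in self.prohibited_techniques:
            return False, "technique explicitly prohibited"
        if system not in self.in_scope_systems:
            return False, "system out of scope"
        if technique not in self.permitted_techniques:
            return False, "technique not listed; ask the white team"
        return True, "ok"


# Hypothetical values for illustration.
roe = RulesOfEngagement(
    in_scope_systems=frozenset({"admin-panel", "ci-pipeline", "staff-email"}),
    permitted_techniques=frozenset({"phishing", "privilege-escalation"}),
    prohibited_techniques=frozenset({"production-signing", "denial-of-service"}),
    escalation_contact="white-team-lead",
)

phishing_check = roe.check("staff-email", "phishing")
signing_check = roe.check("admin-panel", "production-signing")
scope_check = roe.check("validator-node", "privilege-escalation")
```

Anything that is neither permitted nor prohibited is refused by default and referred to the white team, which mirrors the principle that the RoE, not the operator, decides what is in bounds.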

Select a Provider with Crypto Domain Expertise

Generic red team providers without crypto experience will miss the attack paths that matter. Look for providers who can demonstrate familiarity with key management systems, multisig workflows, on-chain governance mechanisms, and the social engineering techniques specific to the crypto sector. Request references from comparable organisations.

Internal Preparation

The blue team should not be informed of the engagement start date, but key internal stakeholders including legal, HR (for social engineering aspects), and executive leadership need to be briefed on the exercise. Establish a communications protocol to handle scenarios where the red team activity triggers a genuine incident response.

Debrief and Remediation Tracking

The end of the exercise is the beginning of the improvement cycle. A quality red team report provides not just a list of findings but a narrative of each attack path, the detection points that were missed, and prioritised remediation guidance. Track remediation progress against risk ratings and schedule a validation retest for critical findings.

Metrics and Measuring Effectiveness

Adversarial testing produces value only if the organisation can measure its effectiveness and track improvement over time. The following metrics provide the foundation for a defensible measurement framework.

Detection rate: The percentage of red team actions (techniques, movements, or objectives attempted) that generated an alert in the blue team's tooling. A detection rate below 50% on a first engagement is not unusual and should be treated as a baseline to improve from.
Mean Time to Detect (MTTD): The average time between a red team action being executed and an alert being generated. In a crypto context, where asset exfiltration can occur in seconds, MTTD is a critical metric. An MTTD measured in hours is functionally equivalent to no detection.
Mean Time to Respond (MTTR): The average time between an alert being generated and a response action being taken. This measures the efficiency of the blue team's workflow and triage process.
Alert-to-escalation ratio: The proportion of generated alerts that are escalated to an investigation, as opposed to being dismissed or missed. When alerts triggered by genuine red team activity are rarely escalated, the cause is usually alert fatigue or insufficient triage capacity.
Objectives achieved: The number of red team objectives that were fully accomplished without detection. Each achieved objective represents a complete failure of the security programme for that attack scenario, and should be treated with corresponding urgency in the remediation process.
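The first three metrics above can be computed directly from a timestamped log of red team actions. The following is a minimal sketch, assuming a hypothetical `RedTeamAction` record in which a missing timestamp means no alert or no response; the techniques and times are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class RedTeamAction:
    """One logged red team action and the blue team's reaction (hypothetical schema)."""
    technique: str
    executed_at: datetime
    alerted_at: Optional[datetime] = None    # None: no alert was generated
    responded_at: Optional[datetime] = None  # None: no response action was taken


def detection_rate(actions: List[RedTeamAction]) -> float:
    """Fraction of actions that generated an alert (0.0 to 1.0)."""
    if not actions:
        return 0.0
    return sum(1 for a in actions if a.alerted_at is not None) / len(actions)


def mean_time_to_detect(actions: List[RedTeamAction]) -> Optional[timedelta]:
    """Average delay from execution to alert, over detected actions only."""
    delays = [a.alerted_at - a.executed_at for a in actions if a.alerted_at is not None]
    return sum(delays, timedelta()) / len(delays) if delays else None


def mean_time_to_respond(actions: List[RedTeamAction]) -> Optional[timedelta]:
    """Average delay from alert to response, over actions that were responded to."""
    delays = [
        a.responded_at - a.alerted_at
        for a in actions
        if a.alerted_at is not None and a.responded_at is not None
    ]
    return sum(delays, timedelta()) / len(delays) if delays else None


t0 = datetime(2025, 1, 1, 9, 0)
actions = [
    RedTeamAction("phishing", t0,
                  t0 + timedelta(minutes=10), t0 + timedelta(minutes=40)),
    RedTeamAction("privilege escalation", t0 + timedelta(hours=1),
                  t0 + timedelta(hours=1, minutes=30), t0 + timedelta(hours=1, minutes=50)),
    RedTeamAction("lateral movement", t0 + timedelta(hours=2)),
    RedTeamAction("multisig approval manipulation", t0 + timedelta(hours=3)),
]

rate = detection_rate(actions)        # 0.5: two of four actions alerted
mttd = mean_time_to_detect(actions)   # 20 minutes
mttr = mean_time_to_respond(actions)  # 25 minutes
```

Because MTTD here averages only the detected actions, it should always be reported alongside the detection rate; a low MTTD with a low detection rate simply means the few things that were seen were seen quickly.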

These metrics should be reported to the board alongside the technical findings, framed in terms of organisational risk rather than technical detail. A CISO who can demonstrate that MTTD has fallen from four hours to twenty minutes over three successive exercises is making a concrete, measurable case for the value of the programme.

For a broader view of how operational security metrics integrate with risk management frameworks, see our guide to operational risk management for crypto firms.

Frequently Asked Questions

What is the difference between a red team and a blue team in Web3 security?

The red team acts as the attacker, simulating real adversary tactics against your infrastructure, personnel, and smart contracts. The blue team acts as the defender, detecting, containing and responding to those simulated attacks. Running both in a coordinated exercise reveals whether your detection and response capability is genuinely effective.

How does a crypto red team engagement differ from a standard penetration test?

A standard penetration test is typically narrowly scoped, time-limited, and focused on known vulnerability classes. A red team engagement is objective-led, uses threat intelligence tailored to the organisation, runs over weeks rather than days, and simulates a complete attack kill chain including social engineering, physical access, and lateral movement across the full stack.

What is purple teaming and why is it valuable?

Purple teaming combines red and blue teams working collaboratively in real time. Rather than the red team operating in isolation and handing over a report, both sides share findings as the exercise progresses, allowing the blue team to immediately tune detection rules and test whether remediations are effective.

Does DORA require adversarial testing for crypto firms?

DORA Article 26 mandates Threat-Led Penetration Testing (TLPT) for significant financial entities within scope. Crypto-asset service providers regulated under MiCA fall within DORA's scope, so those that are classed as significant must perform TLPT. TLPT is a structured form of red team exercise conducted under regulatory supervision, at least every three years.

How do you measure the effectiveness of a red team blue team exercise?

Key metrics include detection rate (percentage of red team actions detected by the blue team), mean time to detect (MTTD), mean time to respond (MTTR), alert-to-escalation ratio, and the number of attack paths that reached the objective undetected. These metrics establish a baseline and track improvement over successive exercises.
