A February 2026 paper on arXiv formalized something practitioners had started to feel but struggled to articulate: in multi-agent coding systems, how agents are structurally composed and coordinated now matters more than which model does the work.
The paper is AdaptOrch (arXiv:2602.16873). The finding is quantified and reproducible: topology-aware orchestration produces 12-23% better outcomes than static single-topology approaches across coding, reasoning, and retrieval tasks using the exact same underlying models. On SWE-bench Verified, the improvement over the best single-model baseline reaches 22.9%.
That number is larger than the performance difference between any two frontier models on most coding benchmarks. Teams still debating which LLM to standardize on (Claude vs. GPT-5 vs. Gemini) are increasingly optimizing the wrong variable. The orchestration layer is already the primary performance lever.
AdaptOrch · arXiv:2602.16873 · Feb 2026
Orchestration topology outperforms model selection.
Performance gain over single-best-model baseline · Same models, different structural composition
5-8%
Typical gain from switching between frontier models.
12-23%
Gain from topology-aware orchestration with identical models.
+22.9%
AdaptOrch on SWE-bench Verified vs single-best baseline.
Source: AdaptOrch: Task-Adaptive Multi-Agent Orchestration in the Era of LLM Performance Convergence (arXiv:2602.16873, Feb 2026).
What AdaptOrch Actually Proved
AdaptOrch introduces the Performance Convergence Scaling Law: a condition under which orchestration selection systematically outweighs model selection. The condition holds when frontier models have converged enough that the marginal gain from model choice is smaller than the gain from orchestration design.
That condition now appears to be met. As of early 2026, the spread between frontier models on SWE-bench Verified is tight enough that topology selection can produce larger gains than switching models.
AdaptOrch identifies four canonical orchestration topologies: parallel, sequential, hierarchical, and hybrid. It then routes tasks dynamically based on task dependency graphs. Coding tasks benefit more from parallel and hybrid topologies. Reasoning tasks lean sequential and hierarchical (41% and 35% of GPQA Diamond tasks route there, respectively). A single-topology system tuned for one task family underperforms on the others.
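The routing idea can be sketched from the dependency-graph framing alone. The topology names below come from the paper, but the heuristics and every function name are illustrative assumptions, not AdaptOrch's actual algorithm or API:

```python
# Hypothetical topology router over a task dependency graph.
# deps maps each subtask to the subtasks it depends on.
from collections import Counter

def choose_topology(deps: dict) -> str:
    edge_count = sum(len(d) for d in deps.values())
    if edge_count == 0:
        return "parallel"  # fully independent subtasks: fan out
    # Count how often each task is depended upon (its out-degree in the DAG).
    out = Counter(p for d in deps.values() for p in d)
    indeg_ok = all(len(d) <= 1 for d in deps.values())
    outdeg_ok = all(c <= 1 for c in out.values())
    if indeg_ok and outdeg_ok and edge_count == len(deps) - 1:
        return "sequential"  # a single linear chain of dependencies
    roots = [t for t, d in deps.items() if not d]
    if len(roots) == 1:
        return "hierarchical"  # one coordinator task fans work downward
    return "hybrid"  # mixed structure: combine patterns
```

For example, three independent subtasks (`{"a": [], "b": [], "c": []}`) route to parallel, while a strict chain (`{"a": [], "b": ["a"], "c": ["b"]}`) routes to sequential. The real system presumably uses far richer signals; the point is that topology is a function of task structure, not a fixed setting.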
Efficiency is part of the same story. AdaptOrch reports 41.8K tokens per SWE-bench instance. A leading static multi-agent alternative (MoA-3L) consumes 84.6K, more than double. Topology-aware routing removes redundant calls and cuts cost while improving outcomes.
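The reported reductions follow directly from the cited per-instance token counts; a quick check using the figures quoted above:

```python
# Reproducing the reduction percentages from the per-instance token counts
# reported for SWE-bench Verified (values as cited from the AdaptOrch paper).
adaptorch = 41.8  # K tokens per instance
baselines = {"MoA-3L": 84.6, "LLM-Blender": 61.7}

for name, tokens in baselines.items():
    saving = (tokens - adaptorch) / tokens * 100
    print(f"{name}: {saving:.0f}% fewer tokens")
# MoA-3L:      (84.6 - 41.8) / 84.6 ~ 50.6%, reported as "up to 51%"
# LLM-Blender: (61.7 - 41.8) / 61.7 ~ 32.3%, reported as 32%
```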
AdaptOrch · arXiv:2602.16873 · SWE-bench Verified
Topology-aware routing cuts token cost by up to 51%.
Topology-aware routing avoids redundant model calls - fewer tokens, same or better performance.
-51%
vs. MoA-3L (84.6K to 41.8K tokens)
-32%
vs. LLM-Blender (61.7K to 41.8K tokens)
+22.9%
SWE-bench gain while spending fewer tokens.
Source: AdaptOrch: Task-Adaptive Multi-Agent Orchestration in the Era of LLM Performance Convergence (arXiv:2602.16873, Feb 2026). Token counts per SWE-bench Verified instance.
The Scale at Which This Already Applies
Anthropic's 2026 Agentic Coding Trends Report places this orchestration result in enterprise context.
Reported adoption numbers are already at production scale: Zapier at 97% agent adoption, Rakuten running autonomous agents on a 12.5 million-line codebase with 99.9% reported accuracy in 7 hours, and TELUS reporting 500,000 engineering hours saved. Across organizations, development cycles are reported as 30-79% faster.
That scale creates pressure. Developers integrate AI into roughly 60% of work while still maintaining oversight on 80-100% of delegated tasks. Oversight load remains high even as generated volume increases, and reviewer bandwidth does not scale proportionally.
The report flags "AI-automated review systems" as a strategic priority for 2026 because manual review at 60%+ AI volume and near-full adoption is not sustainable.
Put together, AdaptOrch and Anthropic describe the same gap from opposite directions: orchestration is now the main performance variable, and the quality/oversight burden is scaling faster than manual controls.
DryRun Security · CVE-2026-21852 · SecurityScorecard STRIKE
All three layers of the agentic security stack are now compromised.
Three independent findings this week document failures at every layer of agentic coding infrastructure.
Infrastructure
Deployment and exposure
OpenClaw - 135,000 exposed instances
SecurityScorecard STRIKE reports 135,000 agentic coding instances exposed across 82 countries, including 15,000+ directly vulnerable to remote code execution.
Tool Layer
Files, config, inputs
CVE-2026-21852 - RCE via project config files
Agents that trust and execute unvalidated project configuration files create a direct path from file access to code execution.
Application
Auth, logic, patterns
DryRun - auth consistency failures across agents
Security patterns applied correctly in one PR are violated in another. Cross-PR authentication inconsistency currently goes undetected and compounds.
Patching one layer is not a security strategy. A team that addresses a single vector but ignores application consistency and deployment exposure still leaves documented failure paths open.
Sources: DryRun Security (March 2026), CVE-2026-21852 (March 2026), SecurityScorecard STRIKE OpenClaw exposure analysis (March 2026).
The Three-Layer Security Problem Has Fully Materialized
Security findings this week map to three distinct layers of the agentic coding stack.
SecurityScorecard's STRIKE team reported on OpenClaw: 135,000 publicly exposed instances across 82 countries, with 15,000+ directly vulnerable to remote code execution. In parallel, CVE-2026-21852 documented RCE through Claude Code project configuration files, while DryRun Security reported authentication consistency failures across major agents tested.
These findings map cleanly:
- Application layer: inconsistency in auth and security decisions across PRs.
- Tool layer: unsafe trust boundaries for config and file-based inputs.
- Deployment layer: exposed execution environments with weak network controls.
Fixing one layer without the other two still leaves active failure paths open.
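At the tool layer, the fix is a trust boundary: project config read from a repository is attacker-controlled input, not settings. The sketch below is illustrative only; the keys, names, and policy are hypothetical and not tied to any specific agent's config format or to the CVE above:

```python
# Illustrative tool-layer mitigation: treat project config as untrusted input.
# Allow-list the keys an agent may honor and refuse anything executable.
import json

ALLOWED_KEYS = {"model", "max_tokens", "include_paths"}  # no command/hook fields

def load_project_config(raw: str) -> dict:
    cfg = json.loads(raw)
    rejected = set(cfg) - ALLOWED_KEYS
    if rejected:
        # Reject, rather than ignore, so injection attempts are visible.
        raise ValueError(f"untrusted config keys rejected: {sorted(rejected)}")
    return cfg
```

The design choice worth noting is fail-closed: an unknown key aborts loading instead of being silently dropped, so a config that tries to smuggle in an executable hook never reaches the agent at all.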
What Engineers and Engineering Leaders Should Take From This
The AdaptOrch result is operational: if you run multi-agent coding workflows, design orchestration topology deliberately rather than defaulting to one pattern.
Parallel/hybrid topologies suit independent coding work. Sequential/hierarchical patterns suit reasoning-heavy dependency chains. Synthesis protocols should merge parallel outputs without redundant loops. These are architecture decisions, not tuning details.
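A synthesis protocol of the kind described can be sketched in a few lines: fan subtasks out concurrently, then merge results in a single pass instead of looping agents over each other's outputs. Everything here is a stand-in; `run_agent` represents a real model call:

```python
# Hypothetical parallel fan-out with a single synthesis pass.
from concurrent.futures import ThreadPoolExecutor

def run_agent(subtask: str) -> str:
    return f"patch for {subtask}"  # placeholder for a real agent call

def parallel_with_synthesis(subtasks: list) -> str:
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_agent, subtasks))
    # One merge step: deduplicate while preserving order, no re-querying loop.
    return "\n".join(dict.fromkeys(results))
```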
The Anthropic adoption data is a calibration signal, not a future forecast. Deployment at scale is already here, and with it come documented quality debt and oversight constraints.
The security picture is now a checklist: application consistency enforcement, tool input validation, and deployment boundary controls are all necessary. None is sufficient alone.
The Bigger Picture
The week ending March 14, 2026 produced a coherent set of empirical signals:
- SWE-CI: 75% of agents break working code over time.
- SWE-EVO: long-horizon performance is materially lower than benchmark expectations.
- MSR 2026: cognitive complexity growth is persistent and compounding.
- AdaptOrch: orchestration topology outperforms model selection by 12-23%.
- Anthropic's trends report: enterprise deployment is already widespread.
- OpenClaw: large numbers of RCE-vulnerable instances are live in production.
Taken together, these findings describe a transition moment. Agentic coding has moved beyond experimentation into enterprise deployment, while quality, security, and oversight infrastructure is still catching up.
The infrastructure gap is now quantified. The teams that close it with topology-aware orchestration and automated architectural constraints will scale safely. The teams that do not will scale output and debt at the same time.