Resources

Research & Insight

Analysis on agentic coding, AI development platforms, and software architecture — written for engineering leaders.

AI Coding · Architecture · Multi-Agent Systems · Security

Your Agent Will Override Its Own Security Instructions. Two Papers Prove It.

Two arXiv papers document a new failure class: agents measurably override their own system prompt instructions when trained values conflict with explicit rules. Better instructions do not fix it. External governance architecture does.

March 15, 2026 · 10 min read · 5 sources

AI Coding · Architecture · Multi-Agent Systems · Security

The Orchestration Layer Is Now the Most Important Decision in Your Stack

AdaptOrch shows a structural turning point: orchestration topology now drives larger performance gains than model choice, while enterprise-scale adoption and new security findings make architecture-level controls urgent.

March 14, 2026 · 11 min read · 6 sources

AI Coding · Architecture · Engineering Leadership

Your AI Coding Agents Are Accumulating Complexity Debt. The Data Is Now Causal.

A new MSR 2026 causal study across 401 repositories shows autonomous coding agents drive complexity debt even when velocity gains fade. The data points to a structural issue: quality degradation is compounding and requires architectural enforcement, not better prompts.

March 13, 2026 · 10 min read · 1 source

AI Coding · Benchmarks · Architecture

75% of AI Coding Agents Break Working Code Over Time. Here Is Why That Number Matters.

Three papers published yesterday measure the same underlying failure: SWE-CI finds 75% of agents break working code over time; SWE-EVO shows a 44-point gap between short-horizon and long-horizon tasks; ETH Zurich proves AGENTS.md files make things worse. The fix is infrastructure, not instructions.

March 13, 2026 · 11 min read · 9 sources

AI Coding · Multi-Agent Systems · Architecture

Anthropic Just Shipped Multi-Agent Code Review — Here's Why It's Not Enough

Claude Code Review deploys multiple AI agents to analyze every pull request. The 54% substantive comment rate is a genuine improvement. But PR-level behavioral verification cannot catch the failure mode that is actually scaling with AI adoption: architectural drift across multi-agent, multi-PR workflows.

March 11, 2026 · 8 min read · 6 sources

Benchmarks · AI Coding · Architecture

The Benchmark Illusion Is Collapsing

SWE-Bench Verified has been effectively deprecated. The research now points to a harder truth: the real problem in AI software development is not model capability — it is architectural scaffolding, repository reasoning, and system-level verification.

March 5, 2026 · 9 min read · 13 sources

AI Coding · Multi-Agent Systems · Engineering Leadership

43% of MCP Servers Are Vulnerable — The AI Agent Security Crisis Nobody Is Talking About

43% of Model Context Protocol server implementations are vulnerable to command execution attacks. MCP is being adopted as an industry standard by OpenAI and Microsoft. The attack surface for enterprise agent deployments is not a single point of failure — it is a mesh of interconnected vulnerabilities.

March 11, 2026 · 8 min read · 7 sources

AI Coding · Architecture · Benchmarks

DeepSeek V4 Has a 1 Million Token Context Window — What Does That Actually Change?

DeepSeek V4's 1M token context window means an entire enterprise codebase can fit in context with no retrieval at all. That removes one problem and makes a different one much more visible: the model can see all the code. It still has no idea what the architecture is supposed to be.

March 11, 2026 · 9 min read · 7 sources

AI Coding · Architecture · Security

No AI Coding Agent Builds Secure Applications. Here's What the Data Actually Shows.

DryRun Security's study found that no agent — Claude, Codex, or Gemini — produced a fully secure application. The real story is a structural failure: agents implement security controls locally but miss them globally, revealing what is missing from the autonomous agent stack.

March 12, 2026 · 12 min read · 8 sources