Resources
Research & Insights
Analysis of agentic coding, AI development platforms, and software architecture — written for engineering leaders.
Your Agent Will Override Its Own Security Instructions. Two Papers Prove It.
Two arXiv papers document a new failure class: agents measurably override their own system prompt instructions when trained values conflict with explicit rules. Better instructions do not fix it. External governance architecture does.
The Orchestration Layer Is Now the Most Important Decision in Your Stack
AdaptOrch shows a structural turning point: orchestration topology now drives larger performance gains than model choice, while enterprise-scale adoption and new security findings make architecture-level controls urgent.
Your AI Coding Agents Are Accumulating Complexity Debt. The Data Is Now Causal.
A new MSR 2026 causal study across 401 repositories shows autonomous coding agents drive complexity debt even when velocity gains fade. The data points to a structural issue: quality degradation is compounding and requires architectural enforcement, not better prompts.
75% of AI Coding Agents Break Working Code Over Time. Here Is Why That Number Matters.
Three papers published yesterday measure the same underlying failure: SWE-CI finds that 75% of agents break working code over time; SWE-EVO shows a 44-point gap between short-horizon and long-horizon tasks; and a study from ETH Zurich shows that AGENTS.md files make things worse. The fix is infrastructure, not instructions.
Anthropic Just Shipped Multi-Agent Code Review — Here's Why It's Not Enough
Claude Code Review deploys multiple AI agents to analyze every pull request. The 54% substantive comment rate is a genuine improvement. But PR-level behavioral verification cannot catch the failure mode that is actually scaling with AI adoption: architectural drift across multi-agent, multi-PR workflows.
The Benchmark Illusion Is Collapsing
SWE-Bench Verified has been effectively deprecated. The research now points to a harder truth: the real problem in AI software development is not model capability — it is architectural scaffolding, repository reasoning, and system-level verification.
43% of MCP Servers Are Vulnerable — The AI Agent Security Crisis Nobody Is Talking About
43% of Model Context Protocol server implementations are vulnerable to command execution attacks. MCP is being adopted as an industry standard by OpenAI and Microsoft. The attack surface for enterprise agent deployments is not a single point of failure — it is a mesh of interconnected vulnerabilities.
DeepSeek V4 Has a 1 Million Token Context Window — What Does That Actually Change?
DeepSeek V4's 1M token context window means an entire enterprise codebase can fit in context with no retrieval at all. That removes one problem and makes a different one much more visible: the model can see all the code. It still has no idea what the architecture is supposed to be.
No AI Coding Agent Builds Secure Applications. Here's What the Data Actually Shows.
DryRun Security's study found that no agent — Claude, Codex, or Gemini — produced a fully secure application. The real story is a structural failure: agents implement security locally but miss it globally, revealing what's missing from the autonomous agent stack.