
Frontier AI Models Rewrite Documents, Masking Errors That Slip Past Review

By Julian Reed • Published: May 14, 2026 • 1 MIN READ


Why frontier AI models rewrite more than they delete

As large language models grow more capable, firms are handing them document-heavy tasks on the assumption that the output stays faithful to the source. Microsoft researchers found otherwise: in a 20-step simulation across 52 professions, frontier AI models altered roughly 25% of the original text, and overall decay reached 50% by the final round. The DELEGATE-52 benchmark mimics real-world pipelines by pairing each edit with its exact inverse, so a faithful round trip should return the original document and any drift can be measured without human-written references.
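The round-trip idea can be illustrated with a minimal sketch. This is not the benchmark's own scoring code: the `similarity` metric (Python's `difflib`), the function names, and the toy string edits standing in for model calls are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Edit = Callable[[str], str]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means the texts are identical."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def round_trip_drift(original: str, pairs: List[Tuple[Edit, Edit]]) -> List[float]:
    """Apply each (forward, inverse) pair in turn and record how far the
    working document has drifted from the original after every round trip."""
    doc = original
    drift = []
    for forward, inverse in pairs:
        doc = inverse(forward(doc))  # a faithful pair returns doc unchanged
        drift.append(1.0 - similarity(original, doc))
    return drift


# Hypothetical stand-in for a model that silently rewrites a word
# while undoing an edit.
def lossy_lower(text: str) -> str:
    return text.lower().replace("quarterly", "annual")


original = "quarterly revenue rose 4% on strong demand."
faithful = (str.upper, str.lower)   # exact inverse on lowercase text
lossy = (str.upper, lossy_lower)    # inverse that introduces an error

drift = round_trip_drift(original, [faithful, lossy, faithful])
print(drift)
```

In this toy run the first round trip leaves the text intact, the second introduces a silent substitution, and the third shows that later faithful steps do not repair it: once an error enters the document, it persists through the rest of the pipeline.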

“Models aren’t aware they’re in a test; they simply try each instruction,” says Philippe Laban of Microsoft Research.

The study examined 19 systems from OpenAI, Anthropic, Google, Mistral, xAI and Moonshot. Only Python-centric tasks reached readiness scores above 98%; everything else suffered silent hallucinations or subtle rewrites that evade casual review. Giving the models generic agentic tools (file read/write and code execution) increased corruption by roughly 6%. The presence of 8-12 KB distractor files amplified errors further, a caution for enterprises leaning on retrieval-augmented generation. Reuters recently flagged similar risks in AI-driven finance workflows. Laban advises adding incremental human checks after each AI step and building narrowly scoped utilities to keep agents on target.
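One way to read the "narrowly scoped utilities" advice is a helper that lets an edit touch only a designated line range and flags any change outside it for human review. The sketch below is an assumption about what such a utility could look like, not something described in the study; `apply_scoped_edit`, `changed_outside_scope`, and the sample report lines are hypothetical.

```python
from typing import Callable, List

LineEdit = Callable[[List[str]], List[str]]


def apply_scoped_edit(lines: List[str], start: int, end: int, edit: LineEdit) -> List[str]:
    """Let `edit` see and rewrite only lines[start:end];
    everything outside that range is copied through verbatim."""
    if not 0 <= start <= end <= len(lines):
        raise ValueError("scope out of range")
    return lines[:start] + list(edit(lines[start:end])) + lines[end:]


def changed_outside_scope(before: List[str], after: List[str], start: int, end: int) -> bool:
    """Return True if any line outside before[start:end] differs in `after`,
    which is a signal to route the step to a human reviewer."""
    tail = len(before) - end  # number of lines after the scoped range
    if len(after) < start + tail:
        return True  # lines outside the scope were dropped
    return before[:start] != after[:start] or before[end:] != after[len(after) - tail:]


before = ["Title: Q3 report", "Revenue: 4.1M", "Costs: 2.0M"]


def bump_revenue(scope: List[str]) -> List[str]:
    # Hypothetical stand-in for a model call restricted to the revenue line.
    return [scope[0].replace("4.1M", "4.2M")]


scoped = apply_scoped_edit(before, 1, 2, bump_revenue)

# Simulated unscoped output in which the costs line was also altered.
unscoped = ["Title: Q3 report", "Revenue: 4.2M", "Costs: 2.1M"]

print(changed_outside_scope(before, scoped, 1, 2))
print(changed_outside_scope(before, unscoped, 1, 2))
```

The design choice here is to enforce the scope structurally rather than trust the instruction: the edit function never receives the lines it is not supposed to change, and the check gives a reviewer a cheap yes/no signal after each step.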



Analysis By Julian Reed

Senior Intel Analyst & Contributing Editor. Focused on deep-tier geopolitical and market strategies.