Vibe Coding Legacy Rewrites

Written by DeeDee Walsh | May 28, 2026 4:46:50 PM

Aikido's research on AI-generated code vulnerabilities is one of the most important conversations at Microsoft Build 2026.

At Microsoft Build 2026, Aikido is presenting one of the more important security talks of the week: Stop vulnerabilities in AI-generated code before they ship. Their research is sobering. So is everyone else's.

Veracode scanned roughly 4 million code samples and found AI-generated code contained security flaws 45% of the time. The Cloud Security Alliance put the number at 62%. Georgia Tech's Vibe Security Radar flagged confirmed issues in over 2,000 of the 5,600 vibe-coded apps it scanned. CodeRabbit found AI-generated code is 2.74x more likely to introduce XSS vulnerabilities than human-written code.

The data is converging on a clear picture: AI coding assistants produce insecure code at roughly twice the rate of humans, and developers ship it because the friction that used to catch these issues (code review, testing, dependency auditing) gets compressed out of the workflow when the loop is prompt, accept, ship.

This is legit. It is also, almost entirely, a story about greenfield apps.

Read the studies carefully and the applications being analyzed are Lovable dashboards, Cursor-built side projects, Copilot-assisted weekend builds. The rapid prototyping use case where speed is the entire point. The vulnerabilities are real. The blast radius is, for most of these apps, bounded. A vibe-coded side project leaking an API key is bad. It is recoverable.

Now imagine the same statistical pattern applied to your 25-year-old VB6 line-of-business application.

The worse version of the problem

When a developer prompts Cursor to scaffold a new app, the worst case is they ship an insecure product into a small user base, get embarrassed, and rebuild it. When a developer prompts an LLM to rewrite a legacy claims engine, ERP module, or trading system into .NET or Java, the worst case is way different:

The application holds real customer data, financial transactions, and regulated workflows, including PII, PHI, and payment card data.
It is already in production, already trusted by the business, already integrated into a dozen downstream systems.
The original codebase encodes decades of accumulated business logic (quirks, edge cases, undocumented rules) that exist nowhere except in the source code itself.
Nobody is going to rebuild it again if the rewrite fails. The team will ship what got produced, paper over the gaps, and discover the security problems in production.

Over the past 24 months, we've watched a sizable share of our modernization pipeline disappear to teams who decided to try the rewrite themselves with Copilot, Claude, or a homegrown prompt chain. In practice, we watched the LLM hit what we call the 70% wall: it produced something that looked right, ran in dev, passed surface-level tests, and then quietly broke in ways no one could trace back to the prompt.

Why legacy rewrites are structurally harder than greenfield generation

The vibe coding security research focuses on a real failure mode: LLMs don't know what secure code looks like, so they reproduce insecure patterns from their training data. That's true. It's also only half the story.

Legacy rewriting has a second, deeper failure mode that doesn't apply to greenfield work: the model has no way to verify functional equivalence to the source.

When you prompt Cursor to build a new login page, correct is whatever passes the spec in your head. When you prompt an LLM to translate a VB6 payment processing module to C#, correct is whatever does exactly what the original module did, including the undocumented rule that rejects transactions with certain prefixes during certain hours because of a 2009 fraud incident nobody currently remembers.

The LLM produces plausible C#. It can't tell you the new C# does what the old VB6 did. For a payments system or a claims engine, plausible isn't a category that exists.
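
To make that concrete, here is a minimal Python sketch of a differential check. Everything in it is hypothetical: the prefix, the hours, and both functions are stand-ins invented for illustration, and in a real project the "legacy" side would be the actual VB6 module driven through a harness, not a Python function. The point is only that a translation can keep the obvious check, lose the undocumented one, and still look fine until both versions are run on the same inputs.

```python
from itertools import product

# Hypothetical values, invented for illustration only.
BLOCKED_PREFIXES = ("TX9",)
BLOCKED_HOURS = range(1, 4)  # 01:00 through 03:59


def legacy_accepts(txn_id: str, hour: int, amount_cents: int) -> bool:
    """Stand-in for the original module's observed behavior."""
    if amount_cents <= 0:
        return False
    # The undocumented rule: it exists only in the source code.
    if txn_id.startswith(BLOCKED_PREFIXES) and hour in BLOCKED_HOURS:
        return False
    return True


def translated_accepts(txn_id: str, hour: int, amount_cents: int) -> bool:
    """A plausible-looking translation that silently dropped the prefix rule."""
    return amount_cents > 0


def find_divergences(old, new, cases):
    """Return every input on which the two implementations disagree."""
    return [case for case in cases if old(*case) != new(*case)]


# Sweep a small input grid: two transaction IDs, every hour, two amounts.
cases = list(product(["TX9001", "AB1001"], range(24), [0, 2500]))
divergences = find_divergences(legacy_accepts, translated_accepts, cases)

for case in divergences:
    print("behavior changed for", case)
```

A unit test written from the spec in someone's head would likely pass on both versions. Only the side-by-side comparison surfaces the inputs where the rewrite now accepts transactions the original rejected.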

The security implications follow directly. A rewrite that subtly changes authorization logic, drops a validation check, or reorders an audit step doesn't just introduce a CVE. It introduces a compliance violation the legal team will be discovering in deposition two years later. The vulnerability surface in a vibe-rewritten legacy app is the silent delta between what the old app enforced and what the new app forgot to enforce.
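
A reordered audit step is worth sketching, because it is a delta that comparing return values will never catch. The Python below is an illustrative assumption, not anyone's real claims code: the function names, the claim fields, and the in-memory log are all made up. It records the ordered side effects of each version and compares those traces as well as the results.

```python
def legacy_payout(claim, log):
    """Stand-in for the original flow: authorize, audit, then pay."""
    log.append(("authorize", claim["user"]))
    log.append(("audit", claim["id"]))
    log.append(("pay", claim["id"]))
    return "paid"


def rewritten_payout(claim, log):
    """Hypothetical rewrite: same result, but the audit now happens after payment."""
    log.append(("authorize", claim["user"]))
    log.append(("pay", claim["id"]))
    log.append(("audit", claim["id"]))
    return "paid"


def run_traced(fn, claim):
    """Run one implementation and capture its ordered side effects."""
    log = []
    result = fn(claim, log)
    return result, log


claim = {"id": "C-1", "user": "adjuster7"}
old_result, old_trace = run_traced(legacy_payout, claim)
new_result, new_trace = run_traced(rewritten_payout, claim)

results_match = old_result == new_result  # the check most test suites stop at
traces_match = old_trace == new_trace     # the check that exposes the reorder

print("results match:", results_match)
print("traces match:", traces_match)
```

Here the results agree and the traces do not. In a real system the trace would come from database writes, queue messages, or audit log entries rather than a Python list, but the comparison is the same idea.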

What doing this correctly actually requires

Aikido's session blueprint for vibe coding security (guardrails, supply chain scanning, runtime verification) is necessary but not sufficient when the input is a legacy codebase. Legacy modernization also requires:

Deterministic equivalence verification, not LLM self-review. The new code must be tested against the old code's actual behavior, not against the model's understanding of its behavior. The generator's confidence in its own output is the worst possible quality signal.
Checkpoint architecture between stages. Discovery, architecture, translation, and QA need to be separate gated steps with explicit verification at each, not one prompt that produces a finished application.
Specialized agents for verification, distinct from the agents doing the generation. A pipeline that uses the same generalist model end-to-end is one pipeline, not four.
Treatment of the original source as ground truth. Documentation, tribal knowledge, and developer memory are all lossy. The source code is the specification.
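
The properties above can be sketched as a gated pipeline. This is a toy Python illustration under stated assumptions, not a description of any real product's internals: the `Stage` class, the stage names, and the "one rule per line" source format are all hypothetical. What it tries to show is the shape: each stage has a producer and a separate verifier, the verifier checks against the original source, and a failed checkpoint halts the run instead of passing the problem downstream.

```python
from dataclasses import dataclass
from typing import Callable


class CheckpointFailed(Exception):
    """Raised when a stage's output does not pass its gate."""


@dataclass
class Stage:
    name: str
    produce: Callable[[dict], dict]       # the generating step
    verify: Callable[[dict, dict], bool]  # a separate check, not self-review


def run_pipeline(stages, source):
    """Run stages in order; stop at the first failed checkpoint."""
    artifacts = {"source": source}
    for stage in stages:
        output = stage.produce(artifacts)
        if not stage.verify(artifacts, output):
            raise CheckpointFailed(stage.name)
        artifacts[stage.name] = output
    return artifacts


def rules_in(source):
    """Toy ground truth: every non-empty line of the source is a rule."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def discover(artifacts):
    return {"rules": rules_in(artifacts["source"])}


def verify_discovery(artifacts, output):
    return len(output["rules"]) > 0


def translate(artifacts):
    # Simulated generator that silently drops the last rule.
    return {"rules": artifacts["discovery"]["rules"][:-1]}


def verify_translation(artifacts, output):
    # Compare against the original source, not the generator's own claim.
    return output["rules"] == rules_in(artifacts["source"])


SOURCE = "reject non-positive amounts\nreject blocked prefixes at night\n"
stages = [
    Stage("discovery", discover, verify_discovery),
    Stage("translation", translate, verify_translation),
]

try:
    run_pipeline(stages, SOURCE)
    failed_stage = None
except CheckpointFailed as exc:
    failed_stage = str(exc)

print("halted at:", failed_stage)
```

The simulated translation loses a rule, and the run stops at the translation checkpoint rather than producing a finished application with a hole in it. Real verifiers would be far heavier than a list comparison, but the gate is the part that matters.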

This is the architecture we built VELO around: Scout, Architect, Translation, and Quality/QA agents running on Microsoft Foundry, with checkpoints between every stage. But the architecture matters more than the product name. If you're modernizing legacy code with an LLM, the question is whether your pipeline has these properties, not whose logo is on it.

Build 2026 and vibe coding

Aikido's talk will be one of the most-attended security sessions at Build, and it should be. The vibe coding security problem is real and the data is unambiguous.

But the industry needs to stop treating "vibe coding a new app" and "vibe rewriting a 30-year-old line-of-business system" as the same problem. They share a failure pattern. They don't share a blast radius. And the security guidance that works for one won't save you from the other.

If you're responsible for a legacy estate and someone on your team is pitching a DIY LLM rewrite, get them to Aikido's session first. Then have the conversation about what an LLM can and can't verify when the ground truth is a quarter-century of production code.
