An AI prototype from Lovable, Cursor, Claude Code, or Replit gets you to a working demo in a weekend. It does not get you to production. The gap between "it works on my screen" and "it survives real users, real load, and a security review" is where AI-generated code quietly fails. This is the QA process we run on AI-assisted builds before they go live, along with the failure modes we see most often.
None of this is an argument against building with AI. We build with it too. It is an argument for testing the output the same way you would test any code that is about to touch real money, real data, and real users.
Shipped an AI prototype?
Book a Production-Readiness Review
AI coding tools optimise for one thing: producing something that runs and matches the prompt. They do not optimise for the things that decide whether software survives contact with users. The model has no view of your threat model, your data volumes, your edge cases, or your compliance obligations. It writes the happy path well and skips almost everything else, because nobody asked.
The result is code that demos cleanly and breaks predictably. The breakages are not random. They cluster in the same places every time, which is what makes them testable.
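As a minimal sketch of what "testable" means here, the Python below contrasts a happy-path function with a hardened one. Everything in it is a hypothetical illustration, not code from a real review: the `Account` type, the `withdraw_happy_path` and `withdraw_hardened` functions, and the rules they enforce are assumptions chosen to show one authorization check, one input check, and one failure-path check.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical account record, used only for this illustration."""
    owner_id: str
    balance_cents: int


class WithdrawalError(Exception):
    """Raised when a withdrawal request is rejected."""


def withdraw_happy_path(account: Account, amount_cents: int) -> int:
    # The kind of version that demos cleanly: correct for the one case
    # in the prompt, silent about everything nobody asked for.
    account.balance_cents -= amount_cents
    return account.balance_cents


def withdraw_hardened(account: Account, caller_id: str, amount_cents: int) -> int:
    # Authorization: only the account owner may withdraw.
    if caller_id != account.owner_id:
        raise WithdrawalError("caller does not own this account")
    # Input: reject non-integers (including bools), zero, and negatives.
    if (
        not isinstance(amount_cents, int)
        or isinstance(amount_cents, bool)
        or amount_cents <= 0
    ):
        raise WithdrawalError("amount must be a positive integer")
    # Failure path: refuse an overdraft instead of storing a negative balance.
    if amount_cents > account.balance_cents:
        raise WithdrawalError("insufficient funds")
    account.balance_cents -= amount_cents
    return account.balance_cents


# The happy-path version accepts a negative "withdrawal" and credits the account.
demo = Account(owner_id="alice", balance_cents=1000)
print(withdraw_happy_path(demo, -500))  # prints 1500
```

Each rejected case in the hardened version maps to one small test, which is why breakages that cluster in predictable places can be pinned down with a short suite.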
Here is the list we work through on every AI-assisted build, ordered by how often it bites.

"AI does not write insecure code on purpose. It writes the code you asked for and nothing you forgot to ask for. Production is the sum of everything you forgot to ask for."
This is the structure of a Wavect review. You can run a first pass yourself before you call anyone.
This is the core of our software QA service. The deliverable is not a PDF of complaints. It is a fixed, tested codebase and the test suite that keeps it fixed.
Can the AI tool close these gaps itself? Partly. An AI tool will happily add a validation check or wrap a call in error handling once you point at the spot. What it cannot do is decide where to look. It has no model of your technical debt, no memory of the order in which things were built, and no instinct for the edge case a real user will hit on day two. Finding the gaps is human work. Closing them, increasingly, is shared work. That split is exactly how we run these engagements.
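To make "wrap a call in error handling" concrete, here is one possible shape for that fix once a reviewer has pointed at the call. It is a sketch under assumptions, not a prescribed pattern: the helper name `call_with_retries`, the `UpstreamUnavailable` exception, the choice of bounded retries with exponential backoff, and the default values are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class UpstreamUnavailable(Exception):
    """Raised after every retry attempt has failed."""


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on connection errors and timeouts.

    The delay doubles after each failed attempt. When the last attempt
    fails, an explicit UpstreamUnavailable is raised with the final
    error attached as its cause, so the failure is visible to callers.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    raise UpstreamUnavailable(f"gave up after {attempts} attempts") from last_error
```

Writing the wrapper is the mechanical part. Knowing which calls in the codebase need it, and what the product should do when the retries run out, is the part that still needs a human reviewer.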
For a typical vibe-coded MVP, a focused review and hardening pass runs one to three weeks. The variance is driven by two things: how much real money or sensitive data the product touches, and how far the AI ran without supervision. A weekend prototype that handles payments and personal data needs more than a weekend of QA. A read-only internal tool needs far less. We scope it after a first look, not before.
Does an AI-built prototype ever need to be rebuilt instead of hardened? Rarely, but it happens. If the data model is fundamentally wrong, or the same broken pattern is copied across a hundred files, rebuilding the core is cheaper than patching it. We will tell you that on the first call rather than bill you for a month of patching a foundation that needs to be poured again. Honesty here is cheaper for everyone.
AI-generated code is not worse code. It is unreviewed code. The prototype that took a weekend skipped the same weeks of hardening that every production system needs, and the bill for those weeks does not disappear because a model wrote the first draft. It just moves to launch day, when it is most expensive.
Run the checklist before you put real users in front of an AI-assisted build. If the authorization, input, and failure-path sections make you nervous, that is the signal to get a second set of eyes on it before launch, not after the incident.