Kevin Riedl

13 min read · 16 Jun 2026

Technical Due Diligence Checklist for AI MVPs Before Funding

Technical due diligence on an AI MVP examines the same layers as any software review (code, infrastructure, security, team) plus a set of AI-specific checks a generalist misses: do you have an evaluation set and regression evals, are prompts and models versioned, do you log every model call, what happens when the model fails, what does an inference actually cost, and do you have the rights to the data you train or retrieve on. The single thing that separates a fundable AI MVP from a demo is evidence. Investors increasingly treat a private, versioned eval suite as the proof your AI works. "We test it by hand" fails that bar. This is the checklist to run on yourself before they run it on you.

This is an engineering view aimed at founders, with the investor's questions made explicit. Regulatory dates are current as of mid-2026. One of them, flagged below, is a trap if you plan around a delay that has not happened.

Why evidence, not a demo

Two independent findings set the bar. A Stanford study of purpose-built legal AI tools, the kind sold as accurate, still measured hallucination on more than 17 percent of benchmark queries for some products and more than 34 percent for others. And an MIT-affiliated report widely cited in 2025 found that around 95 percent of enterprise generative-AI pilots delivered no measurable bottom-line impact. The lesson for a founder raising money is blunt: a working demo proves almost nothing, and the investor knows it. What moves a round is measured evidence that your system works, does not regress, and is economically and legally sound at scale.

The AI-specific checks a generalist misses

This is the core of the post and the part a generic software review skips. For each: what to check, why it matters, and the red flag.

  1. An evaluation set. A versioned golden dataset plus a scoring rubric. Unit tests tell you green or red; they cannot tell you whether an answer was correct or faithful. Red flag: "we eyeball outputs," no golden set, no numbers.
  2. Regression evals as a CI gate. The eval suite runs on every prompt or model change before deploy. The same prompt can give different output when the model version or the input shifts, and a fix for one case can silently break another. Red flag: prompt changes ship straight to production.
  3. Model-call observability. Tracing of every model call, with token and cost accounting and the prompt and response captured. You cannot debug a bad answer you cannot reconstruct. Red flag: "we use the provider dashboard" as the whole story.
  4. Prompt and model versioning. Prompts are versioned artifacts and the model is pinned to a specific version, not called through a "latest" alias that auto-upgrades under you. Red flag: prompts hardcoded inline, model aliased to latest.
  5. A fallback when the model fails. Retries, a secondary model or provider, graceful degradation. Your uptime is now bounded by a third-party API. Red flag: one provider, one model, no timeout or degraded path, so one vendor outage is a full outage.
  6. Unit economics per inference. Cost modeled per call, then per action, then into gross margin. Agentic flows can fan one action into hundreds of calls. Red flag: no cost-per-action metric and a margin assumed to be "SaaS-like."
  7. Rights to the training and retrieval data. Documented provenance and a license or permission per source. The question is no longer "is it fair use" but "can you prove where every datum came from and that it was lawfully obtained." Red flag: scraped data of unknown origin, a RAG corpus with no usage rights.
  8. A measured hallucination rate plus guardrails. An error rate on a domain benchmark, plus retrieval grounding and output validation. Red flag: no measured rate and "RAG fixes hallucinations" stated as if solved.
  9. Model choice and lock-in. A rationale for proprietary API versus open weights, and an abstraction layer that lets you swap providers. Red flag: hard-coupled to one provider's SDK with economics that only work at today's subsidized price.
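Items 1 and 2 can be sketched in a few lines. The following is a minimal illustration, not a production harness: the `GoldenCase` shape, the substring-based rubric, the thresholds, and the two stub models are all assumptions made up for this example. A real suite would likely use a richer rubric (graded faithfulness, human-labeled references) and call your actual model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    prompt: str
    must_contain: tuple[str, ...]  # facts a correct answer has to mention


def score_case(answer: str, case: GoldenCase) -> float:
    """Crude rubric: fraction of required facts found in the answer."""
    if not case.must_contain:
        return 1.0
    text = answer.lower()
    hits = sum(1 for fact in case.must_contain if fact.lower() in text)
    return hits / len(case.must_contain)


def run_eval(model: Callable[[str], str], cases: list[GoldenCase]) -> dict[str, float]:
    """Score every golden case against one model or prompt version."""
    return {case.case_id: score_case(model(case.prompt), case) for case in cases}


def regression_gate(baseline: dict[str, float], candidate: dict[str, float],
                    min_mean: float = 0.9, tolerance: float = 0.0) -> list[str]:
    """Return failure reasons; an empty list means the change may ship."""
    if not candidate:
        return ["no eval results for candidate"]
    failures = []
    mean = sum(candidate.values()) / len(candidate)
    if mean < min_mean:
        failures.append(f"mean score {mean:.2f} below {min_mean:.2f}")
    for case_id, old in baseline.items():
        new = candidate.get(case_id, 0.0)
        if new < old - tolerance:
            failures.append(f"{case_id} regressed from {old:.2f} to {new:.2f}")
    return failures


# Hypothetical golden set and stub models standing in for real model calls.
GOLDEN = [
    GoldenCase("refund-window", "How long is the refund window?", ("30 days",)),
    GoldenCase("support-hours", "When is support open?", ("9", "17")),
]


def stub_model_v1(prompt: str) -> str:
    return {
        "How long is the refund window?": "Refunds are accepted within 30 days.",
        "When is support open?": "Support is open from 9 to 17 on weekdays.",
    }[prompt]


def stub_model_v2(prompt: str) -> str:
    # A prompt change that silently broke the refund answer.
    return {
        "How long is the refund window?": "Refunds are accepted within 14 days.",
        "When is support open?": "Support is open from 9 to 17 on weekdays.",
    }[prompt]


baseline = run_eval(stub_model_v1, GOLDEN)
candidate = run_eval(stub_model_v2, GOLDEN)
failures = regression_gate(baseline, candidate)
```

In CI, the job would exit non-zero whenever `failures` is non-empty, which is what turns the eval suite from a report into a gate. Storing `baseline` alongside the prompt version gives you the per-version results an investor will ask for.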

The handover artifacts a fundable AI MVP has ready

If these exist, diligence is fast and your valuation holds. If they live only in a founder's head, every gap becomes a discount.

| Artifact | Why diligence cares | Red flag if missing |
| --- | --- | --- |
| Architecture diagram (dated, names external deps) | Tests whether it handles 10x and reveals key-person risk | Architecture lives only in a founder's head |
| Data-flow map (follows the data, not the services) | Shows which third parties touch what data; GDPR exposure | Unknown privacy exposure the investor inherits |
| Eval reports (versioned harness, results per model and prompt) | How a claimed AI moat is verified instead of taken on faith | No objective evidence the model works or will not regress |
| Model and prompt registry | Reproducibility and rollback of any output | Production behavior cannot be reproduced |
| Runbook and incident response | Lowers key-person dependency, base compliance evidence | Unmeasured downtime risk |
| SBOM (SPDX or CycloneDX, regenerated in CI) | Surfaces copyleft contamination and unpatched CVEs | Unknown license and vulnerability exposure |
| IP chain of title (founder and contractor assignments) | The classic deal-killer; paying an invoice does not transfer IP | A departed contributor who never assigned a core module |
| Security report (recent pen test, SOC 2 or ISO 27001 if applicable) | Baseline in 2026, and it unblocks enterprise sales | Unknown breach exposure |
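To make the "model and prompt registry" row concrete, here is one possible shape for it, as a sketch under stated assumptions. The class names, the fingerprint scheme, the rule that rejects a `latest` alias, and the model identifier `example-model-2026-01-15` are all hypothetical; the point is only that every logged call carries an identifier from which the exact prompt and pinned model can be recovered.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PromptVersion:
    name: str
    template: str
    model: str          # pinned model identifier (hypothetical value below)
    temperature: float

    def fingerprint(self) -> str:
        """Stable short hash over every field that influences the output."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


class PromptRegistry:
    """In-memory registry; a real one would persist to a database or to git."""

    def __init__(self) -> None:
        self._versions: dict[str, PromptVersion] = {}

    def register(self, version: PromptVersion) -> str:
        # Assumed policy: refuse floating aliases so outputs stay reproducible.
        if version.model.endswith("latest"):
            raise ValueError(f"model must be pinned, got alias {version.model!r}")
        fp = version.fingerprint()
        self._versions[fp] = version
        return fp

    def get(self, fingerprint: str) -> PromptVersion:
        return self._versions[fingerprint]


def call_record(fingerprint: str, user_input: str, output: str,
                input_tokens: int, output_tokens: int) -> dict:
    """One log entry per model call, enough to reconstruct what produced it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_fingerprint": fingerprint,
        "user_input": user_input,
        "output": output,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }


registry = PromptRegistry()
v1 = PromptVersion("summarize", "Summarize: {text}", "example-model-2026-01-15", 0.0)
v2 = PromptVersion("summarize", "Summarize briefly: {text}", "example-model-2026-01-15", 0.0)
fp1 = registry.register(v1)
fp2 = registry.register(v2)
record = call_record(fp1, "Long contract text", "Short summary", 1200, 80)
```

Because the fingerprint changes whenever the template, model, or temperature changes, a bad answer found in the logs can be traced back to the exact version that produced it, and rolled back to the previous one.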

Data, privacy, and provenance

For an EU AI MVP this is where deals get repriced. Diligence checks your record of processing activities (GDPR Article 30), a lawful basis for training on personal data (Articles 6 and 9, with a legitimate-interest assessment on file), a data protection impact assessment before high-risk processing (Article 35), and data processing agreements with sub-processors. Note one thing founders miss: a model API that ingests your users' prompts is a sub-processor, so it needs a DPA and a no-training, zero-retention configuration, not consumer terms. The EDPB's Opinion 28/2024 also warns that a model trained on personal data is not automatically anonymous, so unlawful training data can taint the deployed product. On the EU AI Act, the live binding date for most high-risk and transparency obligations is 2 August 2026. A proposal to delay it was circulating in 2026 but is not enacted, and a compliance plan that banks on the delay is itself a red flag.

What investors actually flag

The recurring flags come from the investor and acquirer side; those sources are interested parties, so weigh them as such. They are: a thin wrapper on a single model with no workflow depth; a weak moat (the durable ones now are proprietary or permissioned data, integrations, and persistent context, not the base model); gross margin after inference cost, since inference is a real variable cost that breaks the SaaS-margin assumption; fragile retention when switching costs are low; and, increasingly, the absence of private continuous evals. On acquisition specifically, expect retention covenants on key AI engineers and indemnities tied to data-provenance representations. The vibe-coded angle (security, IP ownership, and what an acquirer checks in AI-built code) is its own checklist in our Lovable, Bolt, and Replit due diligence post, and the eval discipline that underpins items one and two is covered in our post on when LLM evals are worth building.
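The "gross margin after inference cost" flag is simple arithmetic, and it helps to have it written down before someone else does it for you. The sketch below models cost per call, then per action, then gross margin. Every number in it is an illustrative assumption (token prices, token counts, 40 calls per action, one dollar of revenue per action), not a real provider price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    usd_per_million_input: float
    usd_per_million_output: float


def cost_per_call(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Inference cost in USD for a single model call."""
    total = (input_tokens * price.usd_per_million_input
             + output_tokens * price.usd_per_million_output)
    return total / 1_000_000


def cost_per_action(price: ModelPrice, calls_per_action: int,
                    avg_input_tokens: int, avg_output_tokens: int) -> float:
    """One user-visible action may fan out into many model calls."""
    return calls_per_action * cost_per_call(price, avg_input_tokens, avg_output_tokens)


def gross_margin(revenue_per_action: float, inference_cost: float,
                 other_cogs: float = 0.0) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue_per_action - inference_cost - other_cogs) / revenue_per_action


# Hypothetical prices and workload, for illustration only.
today = ModelPrice(usd_per_million_input=3.0, usd_per_million_output=15.0)
stressed = ModelPrice(usd_per_million_input=6.0, usd_per_million_output=30.0)

action_cost_today = cost_per_action(today, 40, 2000, 500)        # about 0.54 USD
action_cost_stressed = cost_per_action(stressed, 40, 2000, 500)  # about 1.08 USD

margin_today = gross_margin(1.00, action_cost_today)        # about 0.46
margin_stressed = gross_margin(1.00, action_cost_stressed)  # about -0.08
```

In this made-up scenario a 46 percent margin turns negative when the model price doubles, which is exactly the stress test an investor runs against a "SaaS-like" margin claim.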

Kevin Riedl

"A demo proves you can get a good answer once. An eval set proves you get good answers consistently and will notice when you stop. Investors stopped being impressed by the first and started asking for the second. That shift is the whole game in AI due diligence."

Frequently Asked Questions

What is technical due diligence for an AI startup?
An investor or acquirer review of the code, infrastructure, AI systems, data flows, and team behind an AI product, verifying it works, scales, is legally clean, and is not a single-person or single-vendor liability. For AI it adds eval evidence, model and prompt versioning, inference economics, and data-rights checks that generalist software diligence skips.
What do investors check in an AI MVP?
Whether it is more than a thin wrapper on one model API, its defensibility through data or workflow depth, gross margin after inference cost, retention, and increasingly private eval results that prove production quality rather than a demo.
What eval evidence do I need before a raise?
A versioned golden and regression dataset, scored results per model and prompt version, a CI gate that blocks regressions, and a measured error or hallucination rate on a domain-representative benchmark. "We test manually" fails this bar.
How is AI due diligence different from normal software due diligence?
Normal diligence asks whether the code is good and scales. AI diligence adds whether you can reproduce any model output, whether you log and observe model calls, what an inference costs, what happens when the model fails, and whether you have the rights to the data you train or retrieve on.
Do I need an SBOM for due diligence?
Increasingly yes. A current SBOM in SPDX or CycloneDX surfaces open-source license conflicts and known vulnerabilities, and both M&A buyers and the EU Cyber Resilience Act now expect machine-readable SBOMs.
What is IP chain of title and why does it kill deals?
The documented proof that the company owns all of its IP. Copyright defaults to the author, so paying a contractor's invoice does not transfer IP. An unassigned co-founder or contractor module is a classic reason startups fail diligence before Series A.
How does GDPR affect AI due diligence in the EU?
Diligence checks your record of processing (Article 30), a lawful basis for training on personal data (Articles 6 and 9), a DPIA for high-risk processing (Article 35), and DPAs with sub-processors, including the model API that ingests your users' prompts. EDPB Opinion 28/2024 warns trained models are not automatically anonymous.
Does the EU AI Act apply to my MVP yet?
Partly. Prohibited practices and AI-literacy duties have applied since February 2025, GPAI obligations since August 2025, and most high-risk and transparency duties from 2 August 2026. A proposed delay exists but is not law, so do not plan around it.
Is technical due diligence any different in Austria?
The substance is EU-standard, GDPR and the AI Act. The Austrian specifics are an actively enforcing data protection authority that gives AI no carve-out, and aws or FFG public-funding covenants on your cap table that a later investor will want clean.
How do I prove my AI product is not just a GPT wrapper?
Show workflow depth and switching cost through integrations, proprietary or permissioned data, and persistent context, an abstraction layer that lets you swap providers, and unit economics that survive at a non-subsidized model price.
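The abstraction layer mentioned in the last answer, combined with the fallback from item 5 of the checklist, might look like the following. This is a sketch, not a reference implementation: the `ModelUnavailable` exception, the provider names, and the stub providers are hypothetical, and each real adapter would be expected to enforce its own timeout and translate vendor errors into that exception.

```python
import time
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Raised by a provider adapter on timeout, rate limit, or outage."""


Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Provider]],
    retries_per_provider: int = 2,
    backoff_seconds: float = 0.5,
    degraded_answer: str = "The assistant is temporarily unavailable.",
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, answer).

    Falls back to a fixed degraded answer when every provider fails,
    so a vendor outage becomes a degraded feature, not a full outage.
    """
    for name, provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, provider(prompt)
            except ModelUnavailable:
                # Exponential backoff between retries on the same provider.
                if attempt < retries_per_provider - 1:
                    sleep(backoff_seconds * (2 ** attempt))
    return "degraded", degraded_answer


# Stub adapters standing in for real vendor SDK calls.
primary_attempts: list[str] = []


def failing_primary(prompt: str) -> str:
    primary_attempts.append(prompt)
    raise ModelUnavailable("simulated outage")


def working_secondary(prompt: str) -> str:
    return f"ok: {prompt}"


def no_sleep(seconds: float) -> None:
    """Skip real waiting in this example."""


result = complete_with_fallback(
    "hi",
    [("primary", failing_primary), ("secondary", working_secondary)],
    sleep=no_sleep,
)
all_down = complete_with_fallback("hi", [("primary", failing_primary)], sleep=no_sleep)
```

Because the application code only sees the `Provider` callable, swapping or adding a vendor means writing one adapter instead of touching every call site. Logging the returned provider name with each call also shows how often the fallback path is actually used.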

Final thoughts

Technical due diligence on an AI MVP is not a generic code review with the word AI added. The layers that decide your round are the AI-specific ones: evals that prove the thing works and will not regress, versioning that makes any output reproducible, honest inference economics, and clean rights to your data.

The good news is that all of it is cheaper to fix before diligence than to explain during it. Build the eval set, pin the models, log the calls, get the IP chain of title signed, and have the artifacts ready in a folder. Do that and diligence becomes a formality. Skip it and every gap turns into a discount on your valuation.
