Kevin Riedl

11 min read · 5 Jul 2026

Fully Homomorphic Encryption in 2026: What Ships and What Is Still Hype

Fully homomorphic encryption has the strangest reputation in applied cryptography: simultaneously "the holy grail" and "forever ten years away". Both reputations are now wrong. FHE runs today on hundreds of millions of iPhones, checks passwords in Microsoft Edge, and settles encrypted transactions on Ethereum. It is also still three to four orders of magnitude slower than plaintext, which rules out most of what people imagine using it for. This post draws the line precisely: what FHE does in production in 2026, what the honest numbers are, and which claims to discount. The companion pragmatic guide covers when FHE is the right tool at all.

Engineering perspective, not a vendor pitch. Where a number comes from a vendor or is a roadmap projection rather than a shipped result, we label it. Reference points come from Wavect's frontier-tech and AI work.

What is FHE, and why did it suddenly get interesting?

FHE lets a server compute on encrypted data without ever decrypting it. The client encrypts the input, the server runs the computation blind, and only the client can decrypt the result. The server learns nothing, not even the answer. That is a categorically stronger promise than encryption at rest, and stronger than a trusted execution environment, because there is no hardware vendor in the trust model.
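To make "compute on encrypted data" concrete, here is a deliberately tiny, insecure Python sketch. It swaps in the integer-based construction of van Dijk, Gentry, Halevi and Vaikuntanathan (DGHV), which none of the production schemes discussed below use; the parameters and function names are illustrative assumptions. The client holds a secret odd integer p, the server only ever sees large integers, and adding or multiplying those integers computes XOR or AND of the hidden bits.

```python
import secrets

# Toy symmetric scheme in the DGHV style ("FHE over the integers").
# Parameters are tiny and INSECURE; this only shows the data flow.

def keygen(bits: int = 64) -> int:
    # Secret key: a random odd integer p with exactly `bits` bits.
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1

def encrypt(p: int, m: int, noise_bits: int = 4, q_bits: int = 128) -> int:
    # c = m + 2r + p*q: bit m hides under small even noise 2r
    # and a large random multiple of p.
    r = secrets.randbits(noise_bits)
    q = secrets.randbits(q_bits)
    return m + 2 * r + p * q

def decrypt(p: int, c: int) -> int:
    # c mod p strips p*q; mod 2 strips the even noise. Works only while
    # the accumulated noise stays below p.
    return (c % p) % 2

# Server side: ordinary integer arithmetic, no key required.
def he_xor(c1: int, c2: int) -> int:
    return c1 + c2   # noise adds; parity of the sum is m1 XOR m2

def he_and(c1: int, c2: int) -> int:
    return c1 * c2   # noise multiplies; parity of the product is m1 AND m2

p = keygen()                          # client keeps p secret
a, b = encrypt(p, 1), encrypt(p, 0)   # client encrypts two bits
x, y = he_xor(a, b), he_and(a, b)     # server computes blind
print(decrypt(p, x), decrypt(p, y))   # client decrypts: 1 0
```

Each ciphertext hides its bit under small noise m + 2r; additions add that noise and multiplications multiply it, so correctness holds only while the noise stays below p.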

It got interesting for two reasons. First, performance: a TFHE bootstrapping operation, the basic unit of unlimited encrypted computation, fell from tens of milliseconds on a CPU to under a millisecond on an NVIDIA H100. Second, credibility: Apple shipped it at consumer scale, and Zama became the field's first unicorn with a 57 million dollar Series B at a valuation above 1 billion dollars in June 2025 (Zama). Money and production deployments changed the conversation.
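Why bootstrapping is "the basic unit of unlimited encrypted computation" shows up even in a toy. The Python sketch below uses a deliberately insecure integer construction (c = m + 2r + p·q, in the DGHV style) with a tiny secret p and the noise r fixed by hand; both choices are assumptions made so that failure comes fast. Every homomorphic multiplication multiplies the hidden noise, and once it passes p, decryption returns the wrong bit. Bootstrapping, which is not implemented here and works very differently in TFHE, is the step that resets the noise.

```python
import secrets

# Deliberately tiny, insecure secret key: an odd integer p. Small on purpose
# so that the noise overflows after only a few multiplications.
P = 1_000_003

def encrypt(m: int, r: int) -> int:
    # c = m + 2r + p*q. r is passed in so the starting noise (m + 2r) is fixed;
    # the random multiple of P hides the value from the server.
    return m + 2 * r + P * secrets.randbits(64)

def decrypt(c: int) -> int:
    # Correct only while the accumulated noise is still smaller than P.
    return (c % P) % 2

def squarings_until_failure(m: int = 1, r: int = 3, max_depth: int = 10) -> int:
    """Square an encryption of bit m until decryption goes wrong.

    For a bit, m * m == m, so the plaintext never changes; a wrong result
    means the noise, which squares each round, has exceeded P.
    Returns the failing depth, or max_depth + 1 if it never fails.
    """
    c = encrypt(m, r)
    for depth in range(1, max_depth + 1):
        c = c * c  # homomorphic AND of the bit with itself
        if decrypt(c) != m:
            return depth
    return max_depth + 1

print(squarings_until_failure())  # 3: noise 7 -> 49 -> 2401 -> 5,764,801 > P
```

Real schemes choose parameters for a target multiplicative depth and bootstrap before the noise budget runs out, which is why the cost of a single bootstrap matters so much for overall performance.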

Which scheme for which job?

"FHE" is a family, and picking the wrong member is a common first mistake:

Scheme | Data model | Strength | Typical use
TFHE (CGGI) | Bits and small integers | Fast programmable bootstrapping, arbitrary logic via lookup tables | Comparisons, branching logic, encrypted smart contracts
CKKS | Approximate real numbers, SIMD-packed | Best amortized throughput for numeric workloads | Machine learning inference, statistics, analytics
BGV / BFV | Exact integers, SIMD-packed | Exact arithmetic at scale | Private lookups (PIR), exact analytics, counting

Rule of thumb: logic and comparisons want TFHE, ML wants CKKS, exact lookups want BFV. A good introduction to the schemes is the 2025 "Beginner's Textbook for Fully Homomorphic Encryption" (arXiv 2503.05136). Modern stacks increasingly switch schemes mid-computation, which is exactly what compiler projects like Google's HEIR are for.
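The "approximate" in CKKS comes from fixed-point encoding. This plaintext-only Python sketch models just that bookkeeping, with no encryption at all: real numbers are scaled by a factor Δ and rounded, a product lands at scale Δ², and rescaling divides by Δ to get back, adding a little rounding error each time. The value Δ = 2^20 is an illustrative assumption; real CKKS rescales by dropping a ciphertext-modulus prime close to Δ.

```python
DELTA = 2 ** 20  # illustrative scaling factor (assumed value)

def encode(x: float) -> int:
    # Real -> integer at scale DELTA; rounding costs up to 0.5 / DELTA.
    return round(x * DELTA)

def decode(v: int) -> float:
    # Integer at scale DELTA -> approximate real.
    return v / DELTA

def add(a: int, b: int) -> int:
    # Same scale in, same scale out: no extra error.
    return a + b

def mul_rescale(a: int, b: int) -> int:
    # a * b sits at scale DELTA**2; divide by DELTA (rounding half up)
    # to return to scale DELTA. This rounding is a fresh error source.
    return (a * b + DELTA // 2) // DELTA

x, y = 3.14159, -2.71828
approx = decode(mul_rescale(encode(x), encode(y)))
print(approx, x * y, abs(approx - x * y))  # error stays within a few multiples of 1 / DELTA
```

Each multiply-and-rescale adds error on the order of 1/Δ, which is acceptable for ML and statistics and disqualifying for exact counting or lookups; that is the practical reason the rule of thumb sends exact work to BFV and TFHE.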

What actually ships in production?

The production list is short, real, and instructive:

  • Apple Live Caller ID Lookup (iOS 18+). Your iPhone checks an unknown caller against a caller-ID provider's database without revealing the phone number to the server, using BFV-based private information retrieval. Apple open-sourced the stack as swift-homomorphic-encryption (Swift.org, 2024). Together with Enhanced Visual Search below, this is likely the largest consumer FHE footprint in existence.
  • Apple Enhanced Visual Search. Photos matches landmarks in your pictures against a server index using FHE plus differential privacy. Cryptographically excellent, and still a consent case study: Apple enabled it by default without asking, and took a justified public backlash in January 2025 (The Register). Privacy tech does not excuse skipping the opt-in.
  • Microsoft Edge Password Monitor. Checks your credentials against breach corpora homomorphically, so Microsoft never sees the password. Same architectural shape as Apple's deployment: a private set lookup.
  • Zama Protocol on Ethereum. Mainnet since December 2025, enabling encrypted token balances and confidential transfers on public chains via TFHE (Zama docs). Throughput is in the tens of transactions per second today; the published roadmap toward thousands via FPGAs and ASICs is a projection, not a shipped result.
  • Enterprise data collaboration. Duality Technologies runs homomorphic and federated analytics with healthcare partners including Dana-Farber, typically combining HE with federated learning rather than running everything under FHE.

Notice what every consumer deployment has in common: it is a private lookup against a server dataset, known in the literature as private information retrieval. Small query, bounded computation, and a use case that tolerates latency. That is the pattern that ships. Nobody is running their backend under FHE, including the companies with the most money on earth.
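The lookup pattern itself fits in a short Python sketch. It is a toy single-server PIR under stated assumptions: a tiny, insecure symmetric LWE-style encryption stands in for BFV, and every dimension, modulus and function name is illustrative rather than Apple's or Microsoft's actual protocol. The client encrypts a one-hot selection vector, the server multiplies it into its plaintext database and sums, and only the client can decrypt the selected row.

```python
import secrets

# Toy parameters: far too small for security, chosen only so the math works.
N_DIM = 16       # LWE secret dimension
Q = 2 ** 32      # ciphertext modulus
T = 256          # plaintext modulus: database entries are bytes
DELTA = Q // T   # scales a plaintext into the top bits of a ciphertext

def keygen() -> list[int]:
    return [secrets.randbelow(Q) for _ in range(N_DIM)]

def encrypt(s: list[int], m: int) -> tuple[list[int], int]:
    # b = <a, s> + e + DELTA * m (mod Q), with small noise e in [-2, 2].
    a = [secrets.randbelow(Q) for _ in range(N_DIM)]
    e = secrets.randbelow(5) - 2
    b = (sum(ai * si for ai, si in zip(a, s)) + e + DELTA * m) % Q
    return a, b

def decrypt(s: list[int], ct: tuple[list[int], int]) -> int:
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, s))) % Q
    # Rounding to the nearest multiple of DELTA strips the accumulated noise.
    return round(noisy / DELTA) % T

def server_answer(db: list[int], query: list[tuple[list[int], int]]) -> tuple[list[int], int]:
    # Homomorphic dot product: sum over i of db[i] * Enc(onehot[i]).
    # Runs on ciphertexts only; the server never learns the index.
    acc_a, acc_b = [0] * N_DIM, 0
    for value, (a, b) in zip(db, query):
        acc_a = [(x + value * y) % Q for x, y in zip(acc_a, a)]
        acc_b = (acc_b + value * b) % Q
    return acc_a, acc_b

def private_lookup(db: list[int], index: int) -> int:
    s = keygen()                                                    # client
    query = [encrypt(s, int(i == index)) for i in range(len(db))]  # client
    answer = server_answer(db, query)                              # server, blind
    return decrypt(s, answer)                                      # client

db = [17, 42, 7, 255, 0, 99, 128, 3]
print(private_lookup(db, 3))  # 255
```

The noise in the answer grows with the number of rows and the size of their values, which is one reason real deployments use much larger parameters and compress the query instead of sending one ciphertext per row.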

"Every FHE deployment that actually shipped is a narrow private lookup. The teams that fail are the ones that try to encrypt the whole backend." (Kevin Riedl)

How slow is FHE, really?

The honest numbers, as of mid-2026:

  • General overhead: roughly 1,000x to 10,000x versus the same computation in plaintext, depending on scheme and workload. Additions are cheap, multiplications and comparisons are expensive.
  • TFHE bootstrapping: single-digit milliseconds on a modern CPU core, under a millisecond on an H100-class GPU, with reports of around 189,000 bootstraps per second across an 8-GPU node (vendor figure).
  • Small-model ML inference: logistic regression, decision trees, and small neural networks run in seconds under CKKS or via Zama's Concrete ML, which converts quantized models and keeps accuracy within a few points of plaintext for 4-bit quantization (Hugging Face / Zama).
  • PIR at scale: practical today. Apple answers encrypted lookups for a large fraction of the world's iPhones with acceptable latency and server cost.
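Concrete ML's conversion step revolves around quantization, because TFHE operates on small integers. The Python sketch below is a simplified stand-in, not Concrete ML's actual algorithm: symmetric per-tensor quantization of a hypothetical four-feature linear model to signed 4-bit integers, with the input scale assumed to come from calibration data. The integer dot product is the part that would run encrypted; the float scales are applied afterwards.

```python
QMAX = 7  # signed 4-bit values, restricted here to the symmetric range [-7, 7]

def scale_for(values: list[float]) -> float:
    # Symmetric per-tensor scale: the largest magnitude maps to QMAX.
    return max(abs(v) for v in values) / QMAX

def quantize(values: list[float], scale: float) -> list[int]:
    # Round to the integer grid and clip to the 4-bit range.
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]

def float_score(w: list[float], x: list[float], bias: float) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + bias

def quantized_score(w: list[float], x: list[float], bias: float, x_scale: float) -> float:
    w_scale = scale_for(w)
    qw, qx = quantize(w, w_scale), quantize(x, x_scale)
    acc = sum(a * b for a, b in zip(qw, qx))  # integer-only: the FHE-friendly part
    return acc * w_scale * x_scale + bias

# Hypothetical model and input; x_scale assumes calibration inputs within [-2, 2].
w, bias = [0.8, -1.3, 0.25, 2.1], -0.4
x = [1.1, 0.5, -2.0, 0.7]
x_scale = 2.0 / QMAX
print(float_score(w, x, bias), quantized_score(w, x, bias, x_scale))
```

Here the score moves from about 0.8 to about 0.54, but the decision keeps its sign; measuring that drift on real data is what tells you whether 4-bit precision is enough for a given model.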

Plan with the 1,000x rule: if the plaintext computation takes a microsecond, the encrypted version takes a millisecond and is probably viable. If the plaintext version takes a second, the encrypted version takes at least 1,000 seconds, about 17 minutes (closer to three hours at 10,000x), and is not a product.
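The planning rule is simple enough to write down. This Python sketch takes the 1,000x to 10,000x overhead range from the list above; the one-second latency budget is an illustrative assumption, not a standard.

```python
OVERHEAD_LOW, OVERHEAD_HIGH = 1_000, 10_000  # range quoted in the text

def encrypted_latency(plaintext_s: float) -> tuple[float, float]:
    """Estimated encrypted runtime range, in seconds."""
    return plaintext_s * OVERHEAD_LOW, plaintext_s * OVERHEAD_HIGH

def probably_viable(plaintext_s: float, budget_s: float = 1.0) -> bool:
    # Use the optimistic end: if even 1,000x blows the budget, stop here.
    return encrypted_latency(plaintext_s)[0] <= budget_s

for t in (1e-6, 1e-3, 1.0):
    low, high = encrypted_latency(t)
    print(f"plaintext {t:g} s -> encrypted {low:g} to {high:g} s, viable: {probably_viable(t)}")
```

Judging on the optimistic end is deliberate: a workload that fails even the 1,000x estimate is not worth prototyping, while one that passes still needs a benchmark on the real scheme and parameters.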

Can you run an LLM under FHE?

No, not interactively, and this section exists because the numbers most often quoted to prove otherwise are wrong in an instructive way. The widely circulated figure of "8.2 minutes per token for GPT-2 with 25.3 GB of communication" comes from secure two-party computation research, not FHE (arXiv 2410.13060). The gigabytes-of-network-traffic tell is the giveaway: MPC burns bandwidth between parties, while FHE burns local compute with almost no communication. Conflating the two is the most common technical error in content about private AI.
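A quick back-of-envelope check shows why the traffic figure is the giveaway. The Python sketch below computes how long 25.3 GB takes to move, reading GB as 10^9 bytes and assuming 1 and 10 Gbit/s links; both are illustrative assumptions, and real MPC traffic is spread over many rounds rather than one transfer.

```python
def transfer_seconds(gigabytes: float, link_gbit_s: float) -> float:
    # Bytes -> bits (x8), then divide by the link rate.
    # Ignores round-trip latency, protocol overhead and congestion.
    return gigabytes * 8 / link_gbit_s

for link in (1, 10):
    s = transfer_seconds(25.3, link)
    print(f"{link} Gbit/s: {s:.1f} s ({s / 60:.1f} min) of pure transfer per token")
```

At 1 Gbit/s that is about 202 seconds, roughly 3.4 minutes per token spent only on the wire, a cost driven by parties exchanging data rather than by local computation.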

The actual FHE picture: GPU-accelerated research runs a GPT-2 class forward pass roughly 200x faster than CPU baselines (ICML 2025), which still leaves it far from interactive chat. Hybrid schemes (running attention layers in the clear and sensitive layers encrypted) trade privacy for speed and remain research. What works in production terms is small-model inference on genuinely sensitive data: credit scoring, medical pre-screening, fraud signals, where a few seconds of latency on a bounded model is acceptable. If you need private frontier-model inference today, the pragmatic answer is a confidential GPU (NVIDIA H100-class TEE), and we compare those trust models in the decision framework post.

Will hardware fix the overhead?

Partially, on a believable timeline:

  • Shipping today: GPUs. The under-a-millisecond bootstrap numbers above are real and reproducible, while Zama's mainnet still runs its coprocessors on CPUs at tens of transactions per second, with the GPU migration on its 2026 roadmap. GPU acceleration delivers one to two orders of magnitude over CPU execution and is the only acceleration you can buy right now.
  • 2027 and later: ASICs and exotics. DARPA's DPRIVE program seeded a wave of dedicated FHE silicon (Niobium, Duality's TREBUCHET lineage, Intel's HERACLES work), and startups raised heavily: Fabric Cryptography took a 33 million dollar Series A for a cryptographic VPU, Optalysys raised roughly 30 million dollars for photonic FHE in early 2026, and Cornami builds many-core FHE accelerators. The frequently quoted 5,000x to 17,000x speedups from this generation are simulation results and projections, not shipped silicon. Architect for GPUs now; treat ASIC timelines as upside, not plan.

Which library should you start with?

Library | Scheme focus | Language | Pick it when
TFHE-rs / Concrete (Zama) | TFHE | Rust, Python | Encrypted logic and integers; the de-facto TFHE standard with the largest community
Concrete ML (Zama) | TFHE | Python, scikit-learn-like API | Private ML inference on small models without cryptography expertise
OpenFHE (Duality and academic consortium) | All major schemes | C++ | CKKS/BGV workloads, research-grade flexibility, enterprise analytics
swift-homomorphic-encryption (Apple) | BFV | Swift | PIR-style private lookups, especially in Apple ecosystems
Lattigo (Tune Insight) | CKKS, BGV, multiparty | Go | Go shops and multiparty-HE setups
Microsoft SEAL | BFV, CKKS | C++ | Existing integrations; maintenance resumed in 2026 after a quiet stretch, but the ecosystem momentum sits with TFHE-rs and OpenFHE
HEIR (Google) | Compiler across schemes | MLIR-based | Compiling high-level code to FHE backends; the likeliest long-term abstraction layer (heir.dev)

Default picks: TFHE-rs for logic, Concrete ML for small private ML, OpenFHE for CKKS analytics, Apple's library for PIR. All are open source; the cost sits in parameter selection, noise budgeting, and performance engineering, which is where an experienced partner earns their fee.

Frequently Asked Questions

Is fully homomorphic encryption practical in 2026?
Yes, for narrow workloads: private lookups (the Apple and Microsoft pattern), small-model ML inference, and encrypted logic at tens of transactions per second. No, for general-purpose or interactive computation, where the 1,000x to 10,000x overhead still rules it out. The scoping decision is the whole game.
What is the difference between FHE and a TEE like Intel TDX or a confidential GPU?
A TEE runs plaintext computation inside hardware isolation at near-native speed, but you trust the chip vendor and the absence of side-channel attacks. FHE removes that hardware trust entirely at a cost of three to four orders of magnitude in performance. Most products that need confidential compute at scale today choose a TEE; FHE wins where no hardware trust root is acceptable.
Can FHE run ChatGPT-style models privately?
Not interactively in 2026. GPU-accelerated research has sped up encrypted GPT-2 class inference dramatically, but frontier-scale encrypted inference remains far from real-time. The often-quoted minutes-per-token figures with gigabytes of traffic actually describe MPC systems, not FHE. For private LLM inference today, confidential GPUs are the pragmatic option.
What is TFHE versus CKKS?
TFHE computes on bits and small integers with fast bootstrapping, making it ideal for comparisons, branching, and exact logic. CKKS computes on approximate real numbers with heavy SIMD packing, making it the scheme of choice for machine learning and statistics. Serious applications often combine both via scheme switching.
Who are the main FHE vendors and libraries?
Zama (TFHE-rs, Concrete, Concrete ML, and the Zama Protocol on Ethereum) leads the commercial field and became its first unicorn in 2025. OpenFHE is the reference open-source library for CKKS and BGV, Apple open-sourced its BFV stack, Google develops the HEIR compiler, and Duality and Tune Insight serve enterprise analytics.

Final thoughts

FHE in 2026 is neither grail nor vaporware. It is a specialist tool with a proven production pattern: a client encrypts a small query, a server computes blind, nobody but the user ever sees the data. Apple, Microsoft, and Zama all ship exactly that shape, and the tooling to build it (TFHE-rs, Concrete ML, OpenFHE, Apple's Swift library) is open source and usable by strong engineering teams today.

The discipline is in what you refuse to build: anything interactive, anything frontier-model-sized, anything where a database plus access control already satisfies the trust model. Scope FHE to the one computation that must stay blind, run it on GPUs, treat ASIC roadmaps as upside, and let a TEE carry the workloads FHE cannot. That is how you get the strongest privacy guarantee in cryptography into a product without the product dying of latency.
