Fully Homomorphic Encryption in 2026: What Ships and What Is Still Hype

TL;DR

FHE is neither grail nor vaporware in 2026. It ships in production wherever the workload is a narrow private lookup: Apple's Live Caller ID and Enhanced Visual Search, Microsoft Edge's Password Monitor, and Zama's encrypted-transaction mainnet on Ethereum at tens of transactions per second. The honest overhead remains roughly 1,000x to 10,000x versus plaintext, so anything interactive or frontier-model-sized is out of scope; the widely quoted minutes-per-token encrypted-LLM figures actually come from MPC research, not FHE. Scheme choice is the first decision: TFHE for logic and comparisons, CKKS for ML and statistics, BFV for exact lookups. GPU acceleration is real and shipping; Intel demonstrated HERACLES silicon in 2026, but it remains a research prototype. Start with TFHE-rs for logic or Apple's Swift library for PIR, and benchmark Poulpy, Lattigo, SEAL, OpenFHE, and GPU-native options such as FIDESlib before choosing a CKKS stack. Figures are a mid-2026 snapshot; re-check before you commit.

Fully homomorphic encryption has the strangest reputation in applied cryptography: simultaneously "the holy grail" and "forever ten years away". Both reputations are now wrong. FHE runs today on hundreds of millions of iPhones, checks passwords in Microsoft Edge, and settles encrypted transactions on Ethereum. It is also still three to four orders of magnitude slower than plaintext, which rules out most of what people imagine using it for. This post draws the line precisely: what FHE does in production in 2026, what the honest numbers are, and which claims to discount. The companion pragmatic guide covers when FHE is the right tool at all.

Engineering perspective, not a vendor pitch. Where a number comes from a vendor or is a roadmap projection rather than a shipped result, we label it. Reference points come from Wavect's frontier-tech and AI work.

What is FHE, and why did it suddenly get interesting?

FHE lets a server compute on encrypted data without ever decrypting it. The client encrypts the input, the server runs the computation blind, and only the client can decrypt the result. The server learns nothing, not even the answer. That is a categorically stronger promise than encryption at rest, and stronger than a trusted execution environment, because there is no hardware vendor in the trust model.

It got interesting for two reasons. First, performance: a TFHE bootstrapping operation, the basic unit of unlimited encrypted computation, fell from tens of milliseconds on a CPU to under a millisecond on an NVIDIA H100. Second, credibility: Apple shipped it at consumer scale, and Zama became the field's first unicorn with a $57 million Series B at a valuation above $1 billion in June 2025 (Zama). Money and production deployments changed the conversation.

Which scheme for which job?

"FHE" is a family, and picking the wrong member is a common first mistake:

Scheme	Data model	Strength	Typical use
TFHE (CGGI)	Bits and small integers	Fast programmable bootstrapping, arbitrary logic via lookup tables	Comparisons, branching logic, encrypted smart contracts
CKKS	Approximate real numbers, SIMD-packed	Best amortized throughput for numeric workloads	Machine learning inference, statistics, analytics
BGV / BFV	Exact integers, SIMD-packed	Exact arithmetic at scale	Private lookups (PIR), exact analytics, counting

Rule of thumb: logic and comparisons want TFHE, ML wants CKKS, exact lookups want BFV. A good introduction to the schemes is the 2025 "Beginner's Textbook for Fully Homomorphic Encryption" (arXiv 2503.05136). Modern stacks increasingly switch schemes mid-computation, which is exactly what compiler projects like Google's HEIR are for.
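
As a minimal sketch, the rule of thumb can be written down as a lookup. The Workload categories and the suggest_schemes helper are illustrative names of ours, not part of any FHE library, and the result is a starting point for benchmarking rather than a decision.

```python
from enum import Enum


class Workload(Enum):
    """Hypothetical workload categories, named here for illustration only."""
    LOGIC = "comparisons, branching, small-integer logic"
    NUMERIC = "ML inference, statistics, analytics"
    EXACT_LOOKUP = "PIR, exact counting"


# The rule of thumb from the scheme table, nothing more.
RULE_OF_THUMB = {
    Workload.LOGIC: "TFHE",
    Workload.NUMERIC: "CKKS",
    Workload.EXACT_LOOKUP: "BFV",
}


def suggest_schemes(workloads):
    """Return the sorted list of schemes the rule of thumb points to.

    More than one scheme in the result hints that the application may
    need scheme switching, or a compiler layer such as HEIR.
    """
    return sorted({RULE_OF_THUMB[w] for w in workloads})


if __name__ == "__main__":
    print(suggest_schemes([Workload.LOGIC]))
    print(suggest_schemes([Workload.NUMERIC, Workload.LOGIC]))
```

A workload that mixes categories returns several schemes, which is the signal to look at scheme switching before committing to a single library.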

What actually ships in production?

The production list is short, real, and instructive:

Apple Live Caller ID Lookup (iOS 18+). Your iPhone checks an unknown caller against a caller-ID provider's database without revealing the phone number to the server, using BFV-based private information retrieval. Apple open-sourced the stack as swift-homomorphic-encryption (Swift.org, 2024). Together with Enhanced Visual Search below, this is likely the largest consumer FHE footprint in existence.
Apple Enhanced Visual Search. Photos matches landmarks in your pictures against a server index using FHE plus differential privacy. Cryptographically excellent, and still a consent case study: Apple enabled it by default without asking, and took a justified public backlash in January 2025 (The Register). Privacy tech does not excuse skipping the opt-in.
Microsoft Edge Password Monitor. Checks your credentials against breach corpora homomorphically, so Microsoft never sees the password. Same architectural shape as Apple's deployment: a private set lookup.
Zama Protocol on Ethereum. Mainnet since December 2025, enabling encrypted token balances and confidential transfers on public chains via TFHE (Zama docs). Throughput is in the tens of transactions per second today; the published roadmap toward thousands via FPGAs and ASICs is a projection, not a shipped result.
Enterprise data collaboration. Duality Technologies runs homomorphic and federated analytics with healthcare partners including Dana-Farber, typically combining HE with federated learning rather than running everything under FHE.

Notice what every consumer deployment has in common: it is a private lookup against a server dataset, known in the literature as private information retrieval. Small query, bounded computation, asynchronous-tolerant latency. That is the pattern that ships. Nobody is running their backend under FHE, including the companies with the most money on earth.
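
To make the private-lookup shape concrete, here is a toy sketch in Python. It is loudly not what Apple or Microsoft run: it uses Paillier, which is only additively homomorphic (not FHE, not BFV), with tiny hard-coded primes that offer no real security, and the query grows linearly with the database, which production PIR avoids. All function names are ours. What it does show is the flow: the client encrypts a one-hot selector, the server combines ciphertexts without decrypting anything, and only the client can read the answer.

```python
import math
import secrets

# Toy key material: two well-known Mersenne primes. Far too small to be secure.
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p=P, q=Q):
    """Return (public modulus n, secret key) for textbook Paillier with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m == 1 + m*n (mod n^2), blinded by the random factor r^n.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """Recover the plaintext from ciphertext c using the secret key."""
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def client_query(n, index, size):
    """Client side: encrypt a one-hot selector so the server cannot see the index."""
    return [encrypt(n, 1 if i == index else 0) for i in range(size)]


def server_answer(n, database, query):
    """Server side: compute Enc(sum(db[i] * sel[i])) using ciphertexts only."""
    n2 = n * n
    acc = 1  # 1 is a (trivial) encryption of 0
    for value, enc_bit in zip(database, query):
        # Raising a ciphertext to a power multiplies the hidden plaintext;
        # multiplying ciphertexts adds the hidden plaintexts.
        acc = acc * pow(enc_bit, value, n2) % n2
    return acc


if __name__ == "__main__":
    n, sk = keygen()
    database = [17, 42, 99, 7]            # server-side records (small integers)
    query = client_query(n, index=2, size=len(database))
    answer = server_answer(n, database, query)  # still encrypted
    print(decrypt(n, sk, answer))         # the record at index 2
```

The server touches every record for every query, which is why the pattern suits bounded datasets and latency-tolerant lookups rather than a general backend.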

"Every FHE deployment that actually shipped is a narrow private lookup. The teams that fail are the ones that try to encrypt the whole backend."

How slow is FHE, really?

The honest numbers, as of mid-2026:

General overhead: roughly 1,000x to 10,000x versus the same computation in plaintext, depending on scheme and workload. Additions are cheap, multiplications and comparisons are expensive.
TFHE bootstrapping: single-digit milliseconds on a modern CPU core, under a millisecond on an H100-class GPU, with reports of around 189,000 bootstraps per second across an 8-GPU node (vendor figure).
Small-model ML inference: logistic regression, decision trees, and small neural networks run in seconds under CKKS or via Zama's Concrete ML, which converts quantized models and keeps accuracy within a few points of plaintext for 4-bit quantization (Hugging Face / Zama).
PIR at scale: practical today. Apple answers encrypted lookups for a large fraction of the world's iPhones with acceptable latency and server cost.

Plan with the 1,000x rule: if the plaintext computation takes a microsecond, the encrypted version takes about a millisecond and is probably viable. If the plaintext version takes a second, the encrypted version takes about 17 minutes at 1,000x, and nearly three hours at 10,000x, and is not a product.
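
The planning rule is plain multiplication, so a back-of-the-envelope estimator is a few lines. This is a sketch under the article's own assumption of a flat 1,000x to 10,000x overhead band; the function names and the budget parameter are ours, and a real estimate needs a benchmark of the actual circuit.

```python
def encrypted_latency(plaintext_seconds, overhead=1_000):
    """Estimate FHE latency as plaintext latency times a flat overhead factor."""
    return plaintext_seconds * overhead


def latency_range(plaintext_seconds, low=1_000, high=10_000):
    """Optimistic and pessimistic estimates for the 1,000x to 10,000x band."""
    return (encrypted_latency(plaintext_seconds, low),
            encrypted_latency(plaintext_seconds, high))


def fits_budget(plaintext_seconds, budget_seconds, overhead=10_000):
    """Conservative check: does the pessimistic estimate still fit the budget?"""
    return encrypted_latency(plaintext_seconds, overhead) <= budget_seconds


if __name__ == "__main__":
    for label, t in [("1 microsecond", 1e-6), ("1 second", 1.0)]:
        best, worst = latency_range(t)
        print(f"{label} plaintext -> {best:.3g} s to {worst:.3g} s encrypted")
```

Checking against the pessimistic end by default is deliberate: a one-second plaintext job lands between 1,000 and 10,000 seconds encrypted, so it fails any interactive budget either way, while a microsecond job passes a 100 ms budget even at 10,000x.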

Can you run an LLM under FHE?

No, not interactively, and this section exists because the numbers most often quoted to prove otherwise are wrong in an instructive way. The widely circulated figure of "8.2 minutes per token for GPT-2 with 25.3 GB of communication" comes from secure two-party computation research, not FHE (arXiv 2410.13060). The gigabytes of network traffic are the giveaway: MPC burns bandwidth between parties across many interactive rounds, while FHE burns compute on the server and needs little more than one round trip of ciphertexts. Conflating the two is the most common technical error in content about private AI.

The actual FHE picture: GPU-accelerated research runs a GPT-2 class forward pass roughly 200x faster than CPU baselines (ICML 2025), which still leaves it far from interactive chat. Hybrid schemes (running attention layers in the clear and sensitive layers encrypted) trade privacy for speed and remain research. What works in production terms is small-model inference on genuinely sensitive data: credit scoring, medical pre-screening, fraud signals, where a few seconds of latency on a bounded model is acceptable. If you need private frontier-model inference today, the pragmatic answer is a confidential GPU (NVIDIA H100-class TEE), and we compare those trust models in the decision framework post.

Will hardware fix the overhead?

Partially, on a believable timeline:

Shipping today: GPUs. The sub-millisecond bootstrap numbers above are real and reproducible, although Zama's mainnet still runs its coprocessors on CPUs at tens of transactions per second, with the GPU migration on the 2026 roadmap. GPU acceleration delivers one to two orders of magnitude and is the only acceleration you can buy right now.
Prototype silicon: Intel HERACLES. Intel demonstrated a fabricated 8192-way SIMD FHE accelerator at ISSCC in February 2026, reporting 1,074x to 5,547x speedups over a Xeon across seven primitives (IEEE Spectrum). That corrects two opposite mistakes: HERACLES was not stopped, and it is no longer merely a simulation. It is still a research prototype with no announced commercial availability, so architect for GPUs you can deploy now and treat ASICs as future upside.

Which library should you start with?

Library	Scheme focus	Language	Pick it when
TFHE-rs / Concrete (Zama)	TFHE	Rust, Python	Encrypted logic and integers; the de-facto TFHE standard with the largest community
Concrete ML (Zama)	TFHE	Python, scikit-learn-like API	Private ML inference on small models; not officially deprecated, but verify its release cadence and support fit before a new production commitment (official docs)
OpenFHE (consortium)	All major schemes	C++	Multi-scheme research, interoperability, and advanced features; not the automatic performance pick for CKKS analytics
Poulpy	CKKS, binary FHE	Rust	An emerging CPU-focused CKKS option with AVX2, AVX-512, and ARM backends; v0.7 added full CKKS bootstrapping, but the public API is still evolving (v0.7 release)
FIDESlib	CKKS	C++ / CUDA	Performance-sensitive server-side CKKS on NVIDIA GPUs; interoperates with OpenFHE clients and reports at least 70x faster bootstrapping than AVX-optimized OpenFHE (paper)
swift-homomorphic-encryption (Apple)	BFV	Swift	PIR-style private lookups, especially in Apple ecosystems
Lattigo (Tune Insight)	CKKS, BGV, multiparty	Go	Go shops and multiparty-HE setups
Microsoft SEAL	BFV, CKKS	C++	Existing integrations and a compact C++ stack; not deprecated, with 4.3.3 released in May 2026. Microsoft stopped publishing new NuGet packages, so .NET users must build newer packages from source (official repository)
HEIR (Google)	Compiler across schemes	MLIR-based	Compiling high-level code to FHE backends; the likeliest long-term abstraction layer (heir.dev)

Default picks: TFHE-rs for logic, Apple's library for PIR, and a workload benchmark before choosing any CKKS stack. Start a CPU CKKS bake-off with Poulpy, Lattigo, SEAL, and OpenFHE; include a GPU-native library such as FIDESlib when NVIDIA deployment is acceptable. OpenFHE remains valuable for breadth and interoperability, but specialized libraries can be one to two orders of magnitude faster on important CKKS paths. Do not choose from a generic leaderboard: match the ring size, depth, precision, bootstrapping frequency, batch size, and target hardware to your real workload.
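
A bake-off does not need much machinery. The sketch below is a minimal timing harness, assuming you wrap each library's run of your real circuit in a plain Python callable (for example via a subprocess or a binding); WorkloadProfile, bake_off, and the sleep-based stand-ins are hypothetical names of ours and do not call any actual FHE library.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class WorkloadProfile:
    """The knobs the text says to match to the real workload (illustrative fields)."""
    ring_degree: int
    mult_depth: int
    precision_bits: int
    bootstraps_per_run: int
    batch_size: int
    hardware: str


def bake_off(candidates: Dict[str, Callable[[WorkloadProfile], None]],
             profile: WorkloadProfile,
             repeats: int = 5,
             warmup: int = 1) -> Dict[str, float]:
    """Run every candidate on the same profile; return median seconds per run."""
    results = {}
    for name, run in candidates.items():
        for _ in range(warmup):          # discard cold-start effects
            run(profile)
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(profile)
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


if __name__ == "__main__":
    profile = WorkloadProfile(ring_degree=2**16, mult_depth=12, precision_bits=40,
                              bootstraps_per_run=2, batch_size=64, hardware="cpu")
    # Stand-ins only: replace each lambda with a wrapper around a real library run.
    fake_candidates = {
        "library_a": lambda p: time.sleep(0.002),
        "library_b": lambda p: time.sleep(0.010),
    }
    for name, seconds in sorted(bake_off(fake_candidates, profile).items(),
                                key=lambda kv: kv[1]):
        print(f"{name}: {seconds * 1000:.1f} ms median")
```

Using the median over several runs and passing one frozen profile to every candidate keeps the comparison about the workload you actually have, not about a library's favorite parameters.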

Frequently Asked Questions

Is fully homomorphic encryption practical in 2026?

Yes, for narrow workloads: private lookups (the Apple and Microsoft pattern), small-model ML inference, and encrypted logic at tens of transactions per second. No, for general-purpose or interactive computation, where the 1,000x to 10,000x overhead still rules it out. The scoping decision is the whole game.

What is the difference between FHE and a TEE like Intel TDX or a confidential GPU?

A TEE runs plaintext computation inside hardware isolation at near-native speed, but you trust the chip vendor and the absence of side-channel attacks. FHE removes that hardware trust entirely at a cost of three to four orders of magnitude in performance. Most products that need confidential compute at scale today choose a TEE; FHE wins where no hardware trust root is acceptable.

Can FHE run ChatGPT-style models privately?

Not interactively in 2026. GPU-accelerated research has sped up encrypted GPT-2 class inference dramatically, but frontier-scale encrypted inference remains far from real-time. The often-quoted minutes-per-token figures with gigabytes of traffic actually describe MPC systems, not FHE. For private LLM inference today, confidential GPUs are the pragmatic option.

What is TFHE versus CKKS?

TFHE computes on bits and small integers with fast bootstrapping, making it ideal for comparisons, branching, and exact logic. CKKS computes on approximate real numbers with heavy SIMD packing, making it the scheme of choice for machine learning and statistics. Serious applications often combine both via scheme switching.

Which FHE library should a new project use?

Choose by scheme and measured workload, not vendor prominence. TFHE-rs is the mature default for TFHE logic, Apple's Swift library fits BFV private lookups, and CKKS needs a bake-off: Poulpy, Lattigo, SEAL, and OpenFHE on CPU, plus a GPU-native option such as FIDESlib when appropriate. OpenFHE offers exceptional breadth, but it is not automatically the fastest CKKS implementation.

Final thoughts

FHE in 2026 is neither grail nor vaporware. It is a specialist tool with a proven production pattern: a client encrypts a small query, a server computes blind, nobody but the user ever sees the data. Apple, Microsoft, and Zama all ship exactly that shape, and strong engineering teams can build it with maintained open-source stacks today.

The discipline is in what you refuse to build: anything interactive, anything frontier-model-sized, anything where a database plus access control already satisfies the trust model. Scope FHE to the one computation that must stay blind, benchmark libraries on the exact workload and hardware, and let a TEE carry the workloads FHE cannot. That is how you get the strongest privacy guarantee in cryptography into a product without the product dying of latency.