LLM Gateways Compared 2026: LiteLLM vs OpenRouter vs Portkey vs RouteLLM

Most teams wire their product straight to one provider's SDK. It works until it doesn't. Then the provider has an outage and your app goes down with it. Then finance asks why one runaway job burned a month of budget in an afternoon. Then a new model ships that is cheaper and better for half your traffic, and switching means touching every call site. So the team starts bolting on retries, a spend cap, a second provider, a cache, and within a quarter you have built half an agent infrastructure layer badly, in your own codebase, with no one owning it.

That missing layer has a name: the LLM gateway. One endpoint in front of many providers, with fallback, caching, spend limits, and routing handled in one place. This is an engineering perspective, not a vendor pitch. The feature and pricing points below are directional and dated June 2026, drawn from each tool's public docs and site, so re-check them before you commit. The reference points come from Wavect's AI product work, where we have put a gateway in front of production traffic and lived with the tradeoffs.

Wiring an AI product to one provider?

Book Free Consultation

What does an LLM gateway actually do?

A gateway is a proxy that sits between your application and the model providers. Your code calls one endpoint, usually in the OpenAI request format, and the gateway translates and forwards the call to whichever provider should serve it. That single seam is where you get the features you would otherwise rebuild by hand:

One endpoint over many providers. Swap Claude for GPT for Gemini for an open-weight model by changing a config value, not a call site. The provider SDK lock-in disappears.
Fallback. When a provider returns an error or times out, the gateway retries against another model or provider so a single outage does not take your product down.
Caching. Identical or semantically similar requests can return a stored response and skip the model call entirely, which cuts both cost and latency on repetitive traffic.
Spend limits and keys. Per-key, per-user, or per-team budgets and rate limits, so one bad loop cannot drain the account, and so you can hand a scoped virtual key to each team.
Routing. Send the easy majority to a cheap model and the hard minority to a frontier one, on rules or on a learned policy.
Observability. One place to see every request, its cost, its latency, and its tokens, broken down by model, key, and feature.

Routing is the cost lever most people come for, and we covered the economics of it in how to cut LLM token costs in 2026. This post is about the tools that give you that lever plus the rest of the layer.

How do LiteLLM, OpenRouter, Portkey, and RouteLLM differ?

These four names come up together but they are not the same kind of thing. Two are full gateways, one is a hosted aggregator, and one is a routing research framework. Here is the shape of each, with capabilities verified against each tool's own docs in June 2026. Treat it as a snapshot and re-check before you commit.

Tool	Type	Hosting	Routing / fallback	Caching	Observability	Best for
LiteLLM	Open-source proxy (OSS + paid enterprise)	Self-host (or managed)	Yes, load-balancing and fallback across 100+ providers	Yes, incl. Redis-backed	Logs, spend tracking, integrations (Langfuse, OTel)	Teams that want full control on their own infra
OpenRouter	Hosted aggregator / marketplace	Run for you (SaaS)	Yes, provider failover	Provider-side prompt caching passthrough	Dashboard and usage analytics	Fast access to 300+ models behind one key
Portkey	Gateway + observability + guardrails (OSS core + cloud)	Self-host core, or cloud; air-gapped on enterprise	Yes, routing, fallback, retries across 1,600+ models	Yes	Deep, logs, traces, analytics, 50+ guardrails	Production teams that want guardrails and observability built in
RouteLLM	Open-source routing framework (research)	Self-host, you embed it	Routing decision only, not a full gateway	No	No	Building a cost-quality router into your own stack

LiteLLM is the open-source workhorse: a proxy you deploy yourself that normalizes 100+ providers behind one OpenAI-format endpoint and centralizes virtual keys, budgets, rate limits, fallback, and logging in a config file you version in your repo. OpenRouter is a hosted aggregator: one key, hundreds of models, run for you, with a marketplace model so you can reach a new model the day it lands without an account at every provider. Portkey is a gateway that leads with observability and guardrails; it open-sourced its core gateway in 2026 and also sells a cloud tier. RouteLLM is different in kind: an LMSYS and Berkeley research framework for training and serving the routing decision itself, not the surrounding gateway.

"Three of these are gateways and one is a router. Comparing RouteLLM to LiteLLM is comparing the brain to the body. Most teams need both, and most reach for the body first."

Self-host or hosted: which should you pick?

This is the first real decision, and it usually decides the tool. The tradeoff is control and data residency against operational burden.

Hosted (OpenRouter, Portkey cloud). You get the layer in an afternoon, with no infrastructure to run. The cost is a dependency you do not control and, for OpenRouter, data passing through a third party, which an EU team has to weigh against its data-residency posture. OpenRouter's catalog rates broadly match each provider's published rates, but check the platform and credit-card fees, which can add a meaningful percentage on small top-ups, and a BYOK fee above a monthly request threshold. Re-verify current fees before you budget.
Self-host (LiteLLM, Portkey core, RouteLLM). You run the proxy on your own infrastructure, so the data path and the upgrade cadence are yours. LiteLLM is free as open source with no usage fees, but you operate the proxy plus its Postgres and Redis, and that ops time is the real cost. For a team with DevOps capacity and a compliance requirement, this is usually the right call. For a two-person product team in a hurry, it is overhead they do not yet need.

The honest default: start hosted to prove the layer earns its place, move to self-host when data residency, cost at volume, or control forces the question. That mirrors the build-versus-buy logic we apply across AI product engagements, including work like Twinsoft AI.

How good is the cost tracking and observability?

The reason finance noticed your bill late is that the provider SDK gives you almost no view into spend by feature or team. A gateway is where that view lives, and the four tools sit at different depths.

LiteLLM tracks spend per key, user, and team, enforces budgets and rate limits, and ships logs to tools like Langfuse and OpenTelemetry. It is solid, and because it is your deployment, the data stays with you.
Portkey leads with observability. Logs, traces, and analytics are the product, and it bills on recorded logs rather than raw requests, so the pricing tracks how much you observe. If you want the dashboard, the guardrails, and the audit trail out of the box, this is the deepest of the four.
OpenRouter gives you a usage dashboard and analytics across the models you call, which is enough for many teams and requires zero setup.
RouteLLM gives you none of this. It is the routing decision, not the surrounding platform, so observability is whatever you wrap around it.

One caution we always add: spend dashboards tell you what you paid, not whether quality held. A router that quietly sends harder queries to a cheaper model will look like a win on the cost chart and a loss in production. You need an eval harness next to the gateway to catch that, and no gateway ships one. That discipline is on you.

Where does RouteLLM fit, and are the numbers real?

RouteLLM is the only one of the four that is purely about the routing decision: given a query, send it to the strong model or the cheap one. The published figures are genuinely strong, and worth citing carefully. The LMSYS and Berkeley team report that, with augmented training data, their matrix-factorization router reaches about 95% of GPT-4's performance while sending only 14% of calls to the strong model, which they put at roughly 75% cheaper than a random baseline, and over 85% cost reduction on the MT Bench evaluation.

Read those numbers as directional and as the source frames them: they come from the original RouteLLM paper, benchmarked on specific datasets (MT Bench, MMLU, GSM8K) against a GPT-4-class strong model, published 2024 and presented at ICLR 2025. Your traffic is not those benchmarks, so the savings on your workload will differ. The honest takeaway is the shape, not the exact percentage: a learned router can hold most of the quality while sending a minority of calls to the expensive model. You still have to prove it on your own eval before you trust it.

In practice you do not choose RouteLLM instead of a gateway. You can run RouteLLM as the routing brain and put a gateway around it for fallback, keys, caching, and observability, or you use a gateway's own simpler routing rules and skip the framework. RouteLLM earns its place when routing is your single biggest cost lever and rule-based routing is leaving savings on the table.

How should you choose?

Map the tool to the constraint that is actually binding you, not to the longest feature list:

Need it today, no infra to run. Reach for a hosted option. OpenRouter for breadth of models behind one key, Portkey cloud if you also want observability and guardrails from day one.
Data residency or full control matters. Self-host. LiteLLM if you want a lean, widely-used open-source proxy; Portkey's open-source core if observability and guardrails are first-class requirements.
Observability and guardrails are the priority. Portkey is built around them. The others log; Portkey makes the log the product.
Routing is your biggest cost lever. Add RouteLLM as the routing brain inside whichever gateway you picked, and prove the savings on your own eval.
You are unsure. Start with one hosted gateway, instrument it, and let two weeks of real spend-by-feature data tell you what to optimize. The data answers the question faster than the comparison table.

None of these is a permanent commitment if you keep your application talking to the OpenAI request format. That is the quiet benefit of the whole pattern: the gateway is the thing you can swap, precisely because it standardizes the seam your code depends on.

Final thoughts

An LLM gateway is the layer most teams rebuild badly before they realize it has a name. Put it in early and you get fallback, caching, spend limits, routing, and observability in one place, behind one endpoint your code can keep calling while you swap what is behind it.

The four tools are not interchangeable. LiteLLM is the open-source self-hosted workhorse. OpenRouter is the fastest hosted path to many models. Portkey leads with observability and guardrails and gives you both open-source and cloud. RouteLLM is the routing brain, not a gateway, with strong but benchmark-specific numbers you must re-prove on your own traffic. Pick on the constraint that binds you, hosted versus self-host first, then instrument it, and let your own spend and eval data, not a vendor's chart, decide what to optimize next.

Want a second opinion on your AI infra layer?