Spend stays predictable
Guardrails enforce project and team budgets, then downgrade, reroute, or block when thresholds are hit.
Spendplane Smart Router evaluates intent, latency, and budget policies in real time, then dispatches each request to the best-fit provider or local model without changing your workflow.
Decision Engine
Live route selection for one incoming request
Incoming signal
task: summarize support ticket
region: eu-central
budget mode: balanced
latency ceiling: 700ms
decision: evaluate providers
Model               Status        Latency   Est. cost
gpt-4o-mini         selected      420ms     $0.002
claude-3-5-sonnet   quality lane  690ms     $0.011
local-llama         fallback      230ms     $0.000
The router selected the lowest-cost path that still clears the configured latency budget and quality rule.
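To make that rule concrete, here is a minimal sketch of the selection step in Python. It assumes a simple policy where fallback-lane models are held in reserve for outages; the field names and lane labels are illustrative, not Spendplane's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    model: str
    lane: str          # illustrative lanes: "standard", "quality", "fallback"
    latency_ms: int
    cost_usd: float

# Candidates mirror the example above.
CANDIDATES = [
    Candidate("gpt-4o-mini", "standard", 420, 0.002),
    Candidate("claude-3-5-sonnet", "quality", 690, 0.011),
    Candidate("local-llama", "fallback", 230, 0.000),
]

def pick_route(candidates, latency_ceiling_ms, allowed_lanes):
    """Cheapest candidate that clears the latency ceiling and the quality rule."""
    eligible = [
        c for c in candidates
        if c.latency_ms <= latency_ceiling_ms and c.lane in allowed_lanes
    ]
    if not eligible:
        raise RuntimeError("no eligible route; escalate to the fallback chain")
    return min(eligible, key=lambda c: c.cost_usd)

# "balanced" read here as: accept standard and quality lanes, hold fallback in reserve.
route = pick_route(CANDIDATES, latency_ceiling_ms=700,
                   allowed_lanes={"standard", "quality"})
print(route.model)  # gpt-4o-mini: lowest cost under the 700ms ceiling
```

Note that local-llama is cheapest outright but sits in the fallback lane, which is why the $0.002 route wins under this reading of the policy.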
Latency Ledger
Scenario              Outcome                       Before            After
Classification tasks  Lower cost for routine work   Frontier model    Small/fast model
Summarization         Same UX, less spend           High-end model    Fast mid-tier model
Code review           Quality where it matters      Single provider   Policy-selected provider
Incident failover     Continuity without a deploy   Outage = errors   Failover chain
Routing selects healthy endpoints that meet your performance targets, with multi-provider resilience.
Define fallback chains so outages and rate limits do not become product incidents.
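The chain itself is declarative in Spendplane; the Python sketch below only illustrates the semantics of walking one, with placeholder callables standing in for real provider clients (none of these names are Spendplane APIs).

```python
def flaky_primary(prompt):
    # Simulate the outage / rate-limit case the chain exists to absorb.
    raise TimeoutError("provider timed out")

def sonnet(prompt):
    return f"[claude-3-5-sonnet] {prompt}"

def local_llama(prompt):
    return f"[local-llama] {prompt}"

def call_with_fallback(chain, prompt):
    """Walk an ordered fallback chain until one endpoint answers."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:          # timeout, 429, outage...
            errors.append((name, exc))    # record and try the next link
    raise RuntimeError(f"all endpoints failed: {errors}")

chain = [
    ("gpt-4o-mini", flaky_primary),
    ("claude-3-5-sonnet", sonnet),
    ("local-llama", local_llama),
]
name, reply = call_with_fallback(chain, "summarize support ticket")
print(name)  # claude-3-5-sonnet: the chain absorbed the simulated outage
```

The caller never sees the timeout; the request completes on the next healthy link, which is the "continuity without a deploy" behavior in the table above.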
How it works
1. Point tools and SDKs at Spendplane once. Keep OpenAI-compatible request shapes and existing integrations (see the sketch after these steps).
2. Policies classify the request by intent, budget, and quality targets before any provider call happens.
3. The router selects a provider or local model and applies fallback chains when conditions change.
4. Every decision is logged to the Control Plane so you can trace cost, latency, and outcomes by team and project.
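For step 1, the one-time change typically amounts to swapping the base URL in an OpenAI-compatible client. A sketch using the official openai Python SDK; the gateway URL and key shown are placeholders, not Spendplane's real endpoint:

```python
from openai import OpenAI

# Placeholder base URL and key; the real values come from your
# Spendplane deployment. Because the request shape stays
# OpenAI-compatible, existing code only changes this client config.
client = OpenAI(
    base_url="https://gateway.spendplane.example/v1",  # assumption, not the real URL
    api_key="sp-...",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # or an alias the router resolves per policy
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(resp.choices[0].message.content)
```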
Routing primitives
Smart Router should feel like an operating surface, not a feature catalog. The controls below reflect the decisions teams actually need to make live.
Routing Modes
- balanced: Route low-risk requests (summaries, extraction, classification) to cheaper models while reserving frontier models for high-impact work.
- Prefer endpoints that hit your SLA, then trade off cost versus speed based on policy instead of guesswork.
- Define backup providers and local endpoints. If a route degrades, traffic moves automatically without dropped requests.
- Hard caps per project, per team, or per key. When limits are reached, route to lower-cost models or block with a clear reason.
- One base URL across providers (OpenAI, Anthropic, Google, Mistral, and OpenAI-compatible APIs), with consistent logging and controls.
- Route sensitive or high-volume work to on-prem inference (Ollama, vLLM, or any internal gateway) alongside cloud models.
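As a rough picture of how these primitives compose, here is an illustrative policy object and guardrail check in Python. Every field name and value is an assumption made for the sketch, not Spendplane's documented policy schema.

```python
# Illustrative policy shape only; field names are assumptions.
POLICY = {
    "project": "support-tools",
    "routing_mode": "balanced",            # cheap models for low-risk intents
    "latency_ceiling_ms": 700,
    "budget": {
        "monthly_cap_usd": 500,
        "on_cap_hit": "downgrade",         # or "reroute" / "block"
    },
    "fallback_chain": ["gpt-4o-mini", "claude-3-5-sonnet", "local-llama"],
    "local_endpoints": ["http://ollama.internal:11434"],  # hypothetical host
}

def enforce_budget(policy, spent_usd, request_cost_usd):
    """Guardrail check in the request path, before any provider sees traffic."""
    if spent_usd + request_cost_usd <= policy["budget"]["monthly_cap_usd"]:
        return "allow"
    return policy["budget"]["on_cap_hit"]  # downgrade, reroute, or block

print(enforce_budget(POLICY, spent_usd=499.999, request_cost_usd=0.002))  # downgrade
```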
Put budget and routing policy where it belongs: in the request path, before providers see traffic.