Spend stays predictable
Guardrails enforce project and team budgets, then downgrade, reroute, or block when thresholds are hit.
Spendplane Smart Router evaluates intent, latency, and budget policies in real time, then dispatches each request to the best-fit provider or local model without changing your workflow.
Decision Engine
Live route selection for one incoming request
Incoming signal
task: summarize support ticket
region: eu-central
budget mode: balanced
latency ceiling: 700ms
decision: evaluate providers
Model               Status        Latency   Est. cost
gpt-4o-mini         selected      420ms     $0.002
claude-3-5-sonnet   quality lane  690ms     $0.011
local-llama         fallback      230ms     $0.000
The router selected the lowest-cost path that still clears the configured latency budget and quality rule.
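To make that rule concrete, here is a minimal sketch of the selection step in Python. It assumes a simple policy where fallback-lane models are held in reserve for outages; the field names and lane labels are illustrative, not Spendplane's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    model: str
    lane: str          # illustrative lanes: "standard", "quality", "fallback"
    latency_ms: int
    cost_usd: float

# Candidates mirror the example above.
CANDIDATES = [
    Candidate("gpt-4o-mini", "standard", 420, 0.002),
    Candidate("claude-3-5-sonnet", "quality", 690, 0.011),
    Candidate("local-llama", "fallback", 230, 0.000),
]

def pick_route(candidates, latency_ceiling_ms, allowed_lanes):
    """Cheapest candidate that clears the latency ceiling and the quality rule."""
    eligible = [
        c for c in candidates
        if c.latency_ms <= latency_ceiling_ms and c.lane in allowed_lanes
    ]
    if not eligible:
        raise RuntimeError("no eligible route; escalate to the fallback chain")
    return min(eligible, key=lambda c: c.cost_usd)

# "balanced" read here as: accept standard and quality lanes, hold fallback in reserve.
route = pick_route(CANDIDATES, latency_ceiling_ms=700,
                   allowed_lanes={"standard", "quality"})
print(route.model)  # gpt-4o-mini: lowest cost under the 700ms ceiling
```

Note that local-llama is cheapest outright but sits in the fallback lane, which is why the $0.002 route wins under this reading of the policy.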
Latency Ledger
Scenario              Outcome                       Before            After
Classification tasks  Lower cost for routine work   Frontier model    Small/fast model
Summarization         Same UX, less spend           High-end model    Fast mid-tier model
Code review           Quality where it matters      Single provider   Policy-selected provider
Incident failover     Continuity without a deploy   Outage = errors   Failover chain
Routing selects healthy endpoints that meet your performance targets, with multi-provider resilience.
Define fallback chains so outages and rate limits do not become product incidents.
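The chain itself is declarative in Spendplane; the Python sketch below only illustrates the semantics of walking one, with placeholder callables standing in for real provider clients (none of these names are Spendplane APIs).

```python
def flaky_primary(prompt):
    # Simulate the outage / rate-limit case the chain exists to absorb.
    raise TimeoutError("provider timed out")

def sonnet(prompt):
    return f"[claude-3-5-sonnet] {prompt}"

def local_llama(prompt):
    return f"[local-llama] {prompt}"

def call_with_fallback(chain, prompt):
    """Walk an ordered fallback chain until one endpoint answers."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:          # timeout, 429, outage...
            errors.append((name, exc))    # record and try the next link
    raise RuntimeError(f"all endpoints failed: {errors}")

chain = [
    ("gpt-4o-mini", flaky_primary),
    ("claude-3-5-sonnet", sonnet),
    ("local-llama", local_llama),
]
name, reply = call_with_fallback(chain, "summarize support ticket")
print(name)  # claude-3-5-sonnet: the chain absorbed the simulated outage
```

The caller never sees the timeout; the request completes on the next healthy link, which is the "continuity without a deploy" behavior in the table above.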
How it works
1. Point tools and SDKs at Spendplane once. Keep OpenAI-compatible request shapes and existing integrations (see the sketch after these steps).
2. Policies classify the request by intent, budget, and quality targets before any provider call happens.
3. The router selects a provider or local model and applies fallback chains when conditions change.
4. Every decision is logged to the Control Plane so you can trace cost, latency, and outcomes by team and project.
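For step 1, the one-time change typically amounts to swapping the base URL in an OpenAI-compatible client. A sketch using the official openai Python SDK; the gateway URL and key shown are placeholders, not Spendplane's real endpoint:

```python
from openai import OpenAI

# Placeholder base URL and key; the real values come from your
# Spendplane deployment. Because the request shape stays
# OpenAI-compatible, existing code only changes this client config.
client = OpenAI(
    base_url="https://gateway.spendplane.example/v1",  # assumption, not the real URL
    api_key="sp-...",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # or an alias the router resolves per policy
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(resp.choices[0].message.content)
```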
Routing primitives
Smart Router should feel like an operating surface, not a feature catalog. The controls below reflect the decisions teams actually need to make live.
Routing Modes
- balanced: Route low-risk requests (summaries, extraction, classification) to cheaper models while reserving frontier models for high-impact work.
- Prefer endpoints that hit your SLA, then trade off cost versus speed based on policy instead of guesswork.
- Define backup providers and local endpoints. If a route degrades, traffic moves automatically without dropped requests.
- Hard caps per project, per team, or per key. When limits are reached, route to lower-cost models or block with a clear reason.
- One base URL across providers (OpenAI, Anthropic, Google, Mistral, and OpenAI-compatible APIs), with consistent logging and controls.
- Route sensitive or high-volume work to on-prem inference (Ollama, vLLM, or any internal gateway) alongside cloud models.
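As a rough picture of how these primitives compose, here is an illustrative policy object and guardrail check in Python. Every field name and value is an assumption made for the sketch, not Spendplane's documented policy schema.

```python
# Illustrative policy shape only; field names are assumptions.
POLICY = {
    "project": "support-tools",
    "routing_mode": "balanced",            # cheap models for low-risk intents
    "latency_ceiling_ms": 700,
    "budget": {
        "monthly_cap_usd": 500,
        "on_cap_hit": "downgrade",         # or "reroute" / "block"
    },
    "fallback_chain": ["gpt-4o-mini", "claude-3-5-sonnet", "local-llama"],
    "local_endpoints": ["http://ollama.internal:11434"],  # hypothetical host
}

def enforce_budget(policy, spent_usd, request_cost_usd):
    """Guardrail check in the request path, before any provider sees traffic."""
    if spent_usd + request_cost_usd <= policy["budget"]["monthly_cap_usd"]:
        return "allow"
    return policy["budget"]["on_cap_hit"]  # downgrade, reroute, or block

print(enforce_budget(POLICY, spent_usd=499.999, request_cost_usd=0.002))  # downgrade
```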
Put budget and routing policy where it belongs: in the request path, before providers see traffic.