LLM Routing Benchmark

Measuring how routing strategies affect tail latency in LLM inference serving

Multiple Load Balancing Strategies

Benchmark and compare the Round Robin, Consistent Hashing, Least KV Cache, and Least Queue routing strategies to find which one works best for your LLM workload.
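To illustrate one of these strategies, here is a minimal sketch of Least Queue selection. The Backend type and queue-length field are assumptions for the example, not the project's actual types:

```go
package main

import "fmt"

// Backend is a hypothetical view of one inference server:
// its address and the number of requests currently queued on it.
type Backend struct {
	Addr     string
	QueueLen int
}

// leastQueue picks the backend with the fewest queued requests,
// the core of a Least Queue strategy.
func leastQueue(backends []Backend) Backend {
	best := backends[0]
	for _, b := range backends[1:] {
		if b.QueueLen < best.QueueLen {
			best = b
		}
	}
	return best
}

func main() {
	pool := []Backend{
		{Addr: "10.0.0.1:8000", QueueLen: 4},
		{Addr: "10.0.0.2:8000", QueueLen: 1},
		{Addr: "10.0.0.3:8000", QueueLen: 7},
	}
	fmt.Println(leastQueue(pool).Addr) // 10.0.0.2:8000
}
```

The other strategies differ only in the selection rule: Round Robin cycles through the pool, Consistent Hashing maps a request key onto a hash ring, and Least KV Cache compares KV-cache occupancy instead of queue depth.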

Real-Time Metrics with Prometheus

Built-in Prometheus integration tracks request counts and latency per backend, giving you the data you need to evaluate routing performance.

Pluggable & Extensible

The router is built around the loadbalancer.Router interface in Go, making it straightforward to add new routing strategies and run them against the same benchmark harness.
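A plug-in point like this typically looks as follows. The method set here is an assumption for illustration, not the project's actual loadbalancer.Router definition:

```go
package main

import "fmt"

// Router mirrors the idea of the loadbalancer.Router interface:
// any type that can pick a backend for a request is a strategy.
// The exact signature is a guess for this sketch.
type Router interface {
	// Pick returns the backend address for the next request.
	Pick(requestKey string) string
}

// RoundRobin is a minimal strategy satisfying Router.
type RoundRobin struct {
	backends []string
	next     int
}

func (r *RoundRobin) Pick(_ string) string {
	b := r.backends[r.next%len(r.backends)]
	r.next++
	return b
}

func main() {
	// Any new strategy only needs to implement Pick to be
	// benchmarked by the same harness.
	var router Router = &RoundRobin{backends: []string{"a", "b"}}
	fmt.Println(router.Pick(""), router.Pick(""), router.Pick("")) // a b a
}
```

Because every strategy satisfies the same interface, the harness can swap implementations without changing any benchmarking code.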