Run the models yourself.

06 · Inference

OpenAI-compatible serving on your GPUs, fronted by QUIC.

AIBrixKubernetesvLLMQUIC · HTTP/3

Drop-in compatible

/v1/chat/completions and /v1/audio/* — point the client you already use at the edge and go.

A QUIC front-proxy at the edge roughly halves WAN time-to-first-token versus HTTP/1 — measured, not marketed.

Models deploy per tenant on Kubernetes through the control plane, with metering and routing handled by the platform.

Self-hosted models plug straight into agents — including Qwen Omni speech-to-speech replacing an entire ASR/LLM/TTS pipeline.

0faster WAN time-to-first-token, HTTP/3 vs HTTP/1

0endpoint shape: /v1, drop-in compatible