ENGINEERING · 7 min read
How we cut time-to-first-token to 12ms
Time-to-first-token is the latency that users actually feel. Here's the routing and warm-pool work that took ours from 90ms to 12ms.
MARCUS WEBB

GUIDE · 6 min read
A field guide to streaming LLM responses
Streaming is the difference between an app that feels instant and one that feels broken. A practical guide to doing it well.
DANI OKAFOR