Once you’ve shipped a streaming endpoint, going back to waiting for the whole completion feels broken. But streaming well takes a little more than flipping stream: true. Here’s what we’ve learned.
Render tokens as they arrive
Append each delta to your buffer and paint immediately. Don’t wait for sentence or word boundaries — partial words flicker for a frame and then resolve, and users read it as “fast,” not “glitchy.”
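As a rough illustration of append-and-paint, here is a minimal TypeScript sketch. The names `createStreamBuffer` and `paint` are hypothetical, not part of any SDK. In a browser, `paint` might simply assign the text to an element's `textContent`.

```typescript
// Minimal sketch: accumulate deltas and repaint on every chunk.
// `paint` is a hypothetical callback supplied by the caller (for example,
// one that writes the text into a DOM node).
function createStreamBuffer(paint: (text: string) => void) {
  let text = "";
  return {
    // Append one delta and repaint right away, without waiting for
    // word or sentence boundaries.
    append(delta: string): void {
      if (delta.length === 0) return; // nothing new to show
      text += delta;
      paint(text);
    },
    // Everything received so far.
    text(): string {
      return text;
    },
  };
}
```

Each call to `append` repaints with the full text so far, so a partial word appears immediately and resolves as soon as the next delta lands.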
Handle backpressure
If your UI can’t paint as fast as tokens arrive, collect deltas and flush them once per animation frame instead of re-rendering on every chunk. On the server, respect the client’s read rate so a slow consumer doesn’t balloon memory.
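One way to batch on the client, sketched in TypeScript under a few assumptions: `createFrameBatcher` and its parameters are hypothetical names, and the scheduler is passed in so the same code works with `requestAnimationFrame` in a browser or with a timer elsewhere.

```typescript
// A scheduler runs the callback "later"; in a browser this would be
// (cb) => requestAnimationFrame(cb).
type Schedule = (callback: () => void) => void;

// Sketch: coalesce all deltas that arrive before the next frame into a
// single flush, so the UI renders at most once per frame.
function createFrameBatcher(
  flush: (batch: string) => void,
  schedule: Schedule,
): (delta: string) => void {
  let pending = "";
  let scheduled = false;

  return function push(delta: string): void {
    pending += delta;
    if (scheduled) return; // a flush is already queued for this frame
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const batch = pending;
      pending = "";
      if (batch.length > 0) flush(batch);
    });
  };
}
```

In a browser you would likely wire it up as `createFrameBatcher(appendToDom, (cb) => requestAnimationFrame(cb))`, where `appendToDom` is whatever function updates your view. However many chunks arrive between frames, the view updates once.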
Fail gracefully mid-stream
A stream can drop after the first token. Keep what you’ve rendered, surface a subtle retry affordance, and resume from where you left off when you can. Never blank the screen on a partial failure.
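A sketch of that behavior in TypeScript follows. The names `consumeStream` and `StartStream` are hypothetical. How you actually resume depends on your backend; here we simply assume the caller can start a new stream given the text already rendered.

```typescript
// Hypothetical: starts (or restarts) a stream of text deltas, given the
// text that has already been rendered.
type StartStream = (soFar: string) => AsyncIterable<string>;

interface StreamResult {
  text: string;      // everything rendered so far, kept even on failure
  complete: boolean; // false means the stream dropped mid-way
}

// Sketch: consume a stream, never discarding what was already painted.
async function consumeStream(
  start: StartStream,
  soFar: string,
  onText: (text: string) => void,
): Promise<StreamResult> {
  let text = soFar;
  try {
    for await (const delta of start(text)) {
      text += delta;
      onText(text);
    }
    return { text, complete: true };
  } catch {
    // The connection dropped: keep the partial text and let the caller
    // decide whether to offer a retry.
    return { text, complete: false };
  }
}
```

When `complete` is `false`, the UI can leave the partial text in place, show a retry control, and call `consumeStream` again with `result.text` as `soFar` to pick up where it stopped.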
A minimal client
With the OpenAI SDK pointed at Fortis, the whole loop is a handful of lines — iterate the stream and write each delta. See the streaming section of the docs for the full example, including abort handling.
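Until you get to the docs, here is a rough sketch of that loop in TypeScript. It is not the official example. `writeStream` and `ChatChunk` are illustrative names, and `ChatChunk` models only the one field the loop reads from a streamed chat completion chunk. The base URL, API key, and model name in the comment are placeholders you would replace with your own values.

```typescript
// The subset of a streamed chat completion chunk that this loop reads.
interface ChatChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Sketch: iterate the stream, write each delta, and return the full text.
async function writeStream(
  stream: AsyncIterable<ChatChunk>,
  write: (delta: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    if (delta) {
      full += delta;
      write(delta);
    }
  }
  return full;
}

// Hypothetical wiring with the OpenAI Node SDK (placeholders, not run here):
//   import OpenAI from "openai";
//   const client = new OpenAI({ baseURL: "<your endpoint URL>", apiKey: "<your key>" });
//   const stream = await client.chat.completions.create({
//     model: "<model name>",
//     messages: [{ role: "user", content: "Hello" }],
//     stream: true,
//   });
//   await writeStream(stream, (delta) => process.stdout.write(delta));
```

Taking the stream as a plain `AsyncIterable` keeps the loop independent of the SDK, which also makes it easy to exercise with a fake stream in tests.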
Ready to serve your first token?
Spin up an OpenAI-compatible endpoint and get your first 1M tokens free.