BLOG

Notes from the inference layer.

Announcements, engineering deep-dives, and practical guides from the team building Fortis.

ANNOUNCEMENT · 4 min read

Introducing Fortis Inference

A serverless, OpenAI-compatible inference layer that scales from your first request to billions — without managing a single GPU.

ENGINEERING · 7 min read

How we cut time-to-first-token to 12ms

Time-to-first-token is the latency that users actually feel. Here's the routing and warm-pool work that took ours from 90ms to 12ms.

MARCUS WEBB
GUIDE · 6 min read

A field guide to streaming LLM responses

Streaming is the difference between an app that feels instant and one that feels broken. A practical guide to doing it well.

DANI OKAFOR