BLOG

Notes from the inference layer.

Announcements, engineering deep-dives, and practical guides from the team building Fortis.

ANNOUNCEMENT · June 10, 2026 · 4 min read

Introducing Fortis Inference

A serverless, OpenAI-compatible inference layer that scales from your first request to billions, with no GPUs for you to manage.

ENGINEERING · May 28, 2026 · 7 min read

Time-to-first-token is the latency users actually feel. Here's the routing and warm-pool work that took ours from 90 ms to 12 ms.

GUIDE · May 12, 2026 · 6 min read

Streaming is the difference between an app that feels instant and one that feels broken. A practical guide to doing it well.