Today we’re opening up Fortis Inferenceto everyone. It’s the same platform we’ve been running internally for the last year: a serverless endpoint that takes an OpenAI-compatible request and streams tokens back with a median time-to-first-token under 15ms.
Why we built it
Running inference in production means wrestling with GPU pools, autoscaling, cold starts, and routing — none of which is the product you actually want to ship. Fortis collapses all of that into one endpoint. You change a baseURLand an apiKey, and the rest of your code stays exactly the same.
What you get on day one
- OpenAI-compatible REST API for every model in the catalog
- Token streaming with sub-15ms time-to-first-token
- Autoscale to zero — you pay per second of compute, never for idle headroom
- Global edge routing across 200+ regions
Getting started
Create a free account, grab an API key, and point your SDK at https://api.fortis.dev/v1. Your first 1M tokens are on us. The quickstart guide takes you from zero to a streaming completion in about five minutes.
We can’t wait to see what you build.
Ready to serve your first token?
Spin up an OpenAI-compatible endpoint on your first 1M tokens, free.