Serve any open model behind one endpoint with automatic batching and routing.
On-demand H100 & B200 clusters with per-second billing and zero queue time.
Fine-tune, evaluate and ship your own checkpoints from a managed registry.
Median time-to-first-token across providers serving Fortis-L 70B, measured over 10k requests.
One OpenAI-compatible endpoint. Swap the base URL and your stack just got faster.
# pip install fortis from fortis import Client client = Client(api_key="fp-•••••••") out = client.infer( model="fortis-l-70b", prompt="Explain inference at scale", stream=True, ) for tok in out: print(tok.text, end="")