Everything you need to ship your first endpoint.
- 1M tokens / mo included
- OpenAI-compatible API
- Shared inference pool
- Community support
Usage-based pricing that grows with your traffic.
- Pay only for tokens served
- Autoscale to zero
- 99.99% uptime SLA
- Priority routing
- Email & Slack support
Dedicated capacity, private models, and white-glove onboarding.
- Reserved H100 / B200 clusters
- Private model registry
- VPC & on-prem deploy
- Custom SLAs
- Dedicated solutions engineer
Transparent token pricing
Priced per 1M tokens. Input and output billed separately, metered per second of compute.
| MODEL | TYPE | CONTEXT | INPUT / 1M | OUTPUT / 1M |
|---|---|---|---|---|
| Fortis-L 70B | Text | 128K | $0.50 | $0.60 |
| Fortis-L 8B | Text | 128K | $0.08 | $0.10 |
| Fortis-Vision 34B | Vision | 64K | $0.70 | $0.80 |
| Fortis-Code 16B | Code | 256K | $0.20 | $0.25 |
| Fortis-Voice 4B | Audio | 32K | $0.12 | $0.15 |
| Fortis-Embed | Text | 8K | $0.02 | — |
Questions, answered
How does usage-based billing work?
You’re billed only for the tokens you serve, metered per second of compute. There are no seat licenses and no idle charges — endpoints autoscale to zero when traffic stops.
What counts as a token?
Roughly four characters of English text. Input and output tokens are priced separately; see the per-model rates above for exact figures.
Can I reserve dedicated capacity?
Yes. Enterprise plans include reserved H100 / B200 clusters with per-second billing, private model registries, and VPC or on-prem deployment.
Is there a free tier?
The Starter plan includes 1M tokens per month at no cost — enough to prototype and ship your first endpoint.
Start serving inference in minutes.
Spin up an OpenAI-compatible endpoint on your first 1M tokens, free. No credit card required.