
Ship your first endpoint.

The Fortis API is OpenAI-compatible — point your existing SDK at our base URL and start serving inference in minutes.

Introduction

Fortis exposes a single, OpenAI-compatible REST API for every model in the catalog. If you already use the openai SDK, you only change the base URL and the API key; everything else works unchanged.

This guide takes you from zero to a streaming completion. You’ll need a Fortis account and an API key, which you can create from the dashboard.

Authentication

Authenticate every request with a bearer token. Store your key in an environment variable — never commit it to source control.

SHELL

export FORTIS_API_KEY="sk-fortis-..."

Keys are scoped per project and can be rotated at any time. Requests without a valid key return 401 Unauthorized.
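
Reading the key from the environment can be wrapped in a small guard so a missing variable fails at startup rather than surfacing later as a 401. This is a minimal sketch; requireApiKey is a hypothetical helper name, not part of any Fortis SDK.

```typescript
// Hypothetical helper (not part of any Fortis SDK): return the key from the
// given environment map, or fail fast with a clear message if it is absent.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.FORTIS_API_KEY?.trim();
  if (!key) {
    throw new Error("FORTIS_API_KEY is not set; export it before starting the app.");
  }
  return key;
}

// Usage in a Node.js app:
//   const apiKey = requireApiKey(process.env);
```

Passing the environment in as an argument, instead of reading process.env inside the function, keeps the helper easy to test.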

Your first request

Send a chat completion with the official OpenAI SDK by overriding the base URL:

TYPESCRIPT

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.FORTIS_API_KEY,
  baseURL: "https://api.fortis.dev/v1",
});

const res = await client.chat.completions.create({
  model: "fortis-l-70b",
  messages: [{ role: "user", content: "Explain inference in one line." }],
});

console.log(res.choices[0].message.content);

That’s the whole integration. The response shape matches the OpenAI spec, so existing parsing and tooling keep working.
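
For clients that cannot use the SDK, the same call can be expressed as a plain HTTP request. The sketch below only builds the request, so it runs without network access. It assumes the endpoint is the base URL plus /chat/completions, which is the path the OpenAI SDK appends; buildChatRequest and ChatRequest are hypothetical names introduced for illustration.

```typescript
// Pieces a fetch() call needs; defined here so the sketch is self-contained.
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: build (but do not send) the HTTP request that
// corresponds to client.chat.completions.create() with one user message.
function buildChatRequest(apiKey: string, model: string, prompt: string): ChatRequest {
  const baseURL = "https://api.fortis.dev/v1"; // base URL from this guide
  return {
    // Assumption: same path the OpenAI SDK appends to its base URL.
    url: `${baseURL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // bearer token, as described under Authentication
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (needs network access and a valid key):
//   const { url, init } = buildChatRequest(apiKey, "fortis-l-70b", "Explain inference in one line.");
//   const data = await (await fetch(url, init)).json();
//   console.log(data.choices[0].message.content);
```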

Streaming responses

Set stream: true to receive tokens as they’re generated, which is ideal for chat UIs where time-to-first-token matters.

TYPESCRIPT

const stream = await client.chat.completions.create({
  model: "fortis-l-70b",
  messages: [{ role: "user", content: "Write a haiku about GPUs." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
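
Many apps need the full text once the stream ends, for example to store the reply. The loop above can be wrapped in a helper that forwards each token and also returns the concatenated result. This is a sketch under stated assumptions: collectText and DeltaChunk are hypothetical names, and DeltaChunk models only the fields the loop reads.

```typescript
// Minimal structural type covering only the fields this helper reads
// from each streamed chunk.
type DeltaChunk = {
  choices: Array<{ delta?: { content?: string | null } }>;
};

// Hypothetical helper: consume a stream of chunks, pass each non-empty token
// to an optional callback, and resolve with the concatenated text.
async function collectText(
  stream: AsyncIterable<DeltaChunk>,
  onToken?: (token: string) => void,
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    if (token) {
      if (onToken) onToken(token);
      text += token;
    }
  }
  return text;
}

// Usage with the stream created above:
//   const full = await collectText(stream, (t) => process.stdout.write(t));
```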

Choosing a model

Pass any catalog model id as model. Start with fortis-l-8b for speed and cost, then scale up to fortis-l-70b for harder tasks. See the pricing page for per-model token rates.
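
One way to make scaling up a configuration change instead of a code change is to read the model id from the environment and default to the smaller model. This is a minimal sketch; the FORTIS_MODEL variable and the resolveModel helper are assumptions made for illustration, not Fortis conventions.

```typescript
// Hypothetical helper: choose the model id from an environment map,
// defaulting to the smaller model this guide suggests starting with.
// FORTIS_MODEL is an assumed variable name, not a Fortis convention.
function resolveModel(env: Record<string, string | undefined>): string {
  const configured = env.FORTIS_MODEL?.trim();
  return configured ? configured : "fortis-l-8b";
}

// Usage:
//   const model = resolveModel(process.env);
//   await client.chat.completions.create({ model, messages });
```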

Errors & retries

The API uses standard HTTP status codes. Retry 429 and 5xx responses with exponential backoff.

HTTP

200  OK             — completion returned
401  Unauthorized   — missing or invalid API key
429  Rate limited   — back off and retry
503  Overloaded     — capacity scaling, retry shortly

Rate limits scale with your plan. Every error body includes a machine-readable code and a human-readable message.
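
The retry guidance above can be sketched as a small wrapper. This is an illustration, not Fortis's own retry logic: withRetry, isRetryable, and backoffDelayMs are hypothetical names, and the 500 ms base delay, 8 s cap, and four retries are arbitrary starting values. The wrapper reads a numeric status property from the thrown error, which is what the openai SDK's API errors carry.

```typescript
// Retry only the statuses this guide names: 429 and any 5xx.
function isRetryable(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500 && status <= 599);
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical wrapper: run `fn`, retrying retryable failures up to
// `maxRetries` times. `sleep` is injectable so tests can skip real waiting.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number } | null)?.status;
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isRetryable(status)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

// Usage:
//   const res = await withRetry(() =>
//     client.chat.completions.create({ model: "fortis-l-70b", messages }));
```

The openai SDK also accepts a maxRetries client option and retries some failures on its own, so check whether that built-in behavior already covers your case before layering a wrapper on top. Adding random jitter to the delay is a common refinement when many clients retry at once.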

Next steps

You have a working endpoint. From here:

Tune throughput and cost on the pricing page.
Reserve dedicated capacity for production workloads.
Wire structured outputs and tool calls into your app.


Questions? Reach the team in the developer Slack.