
OpenAI compatibility

Quant AI exposes an OpenAI-compatible API at /oai/v1. Point any OpenAI SDK or tool (the official Python/Node SDKs, LangChain, LiteLLM, Cline, OpenRouter-style proxies, Drupal AI, etc.) at this base URL and use your Quant API token as the API key — no adapter required.

Under the hood, Quant runs your requests on AWS Bedrock models (Anthropic Claude, Amazon Nova, and others) but returns the responses in the exact shapes the OpenAI SDKs expect.

Base URL: https://dashboard.quantcdn.io/oai/v1

Configure your client’s base_url (or “API base”) to this value. The SDK appends /chat/completions, /embeddings, and /models automatically.

Use a Quant API token (the qc_... value) as the OpenAI api_key. It is sent as a standard Authorization: Bearer <token> header.

The organisation is resolved from the token automatically. If your token is scoped to more than one organisation, select one with the OpenAI-Organization header (mapped to your Quant organisation machine name).
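
As a rough illustration of the two headers described above, the sketch below assembles the URL and headers for a request by hand. The helper name `request_parts` and the organisation value `my-org` are hypothetical; only the base URL and the header names come from this page. Nothing is sent over the network.

```python
BASE_URL = "https://dashboard.quantcdn.io/oai/v1"


def request_parts(path, token, organisation=None):
    """Return (url, headers) for a call to an /oai/v1 path, without sending it."""
    headers = {"Authorization": f"Bearer {token}"}
    if organisation is not None:
        # Only needed when the token is scoped to more than one organisation.
        headers["OpenAI-Organization"] = organisation
    return f"{BASE_URL}/{path.lstrip('/')}", headers


# "my-org" is a placeholder for a Quant organisation machine name.
url, headers = request_parts("/models", "qc_your_quant_token", organisation="my-org")
print(url)
print(headers)
```

With the official Python SDK, passing `organization="my-org"` to the `OpenAI(...)` constructor should set the same `OpenAI-Organization` header.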

| Endpoint | OpenAI method | Notes |
| --- | --- | --- |
| POST /oai/v1/chat/completions | Chat completions | Buffered and streaming (stream: true); tool/function calling |
| POST /oai/v1/embeddings | Embeddings | Single string or array of strings |
| GET /oai/v1/models | List models | Returns the model ids available to your organisation |
| GET /oai/v1/models/{model} | Retrieve model | |
# Python (official openai SDK)
from openai import OpenAI

client = OpenAI(
    api_key="qc_your_quant_token",
    base_url="https://dashboard.quantcdn.io/oai/v1",
)
resp = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about CDNs."}],
)
print(resp.choices[0].message.content)

// Node.js (official openai SDK)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "qc_your_quant_token",
  baseURL: "https://dashboard.quantcdn.io/oai/v1",
});
const resp = await client.chat.completions.create({
  model: "anthropic.claude-sonnet-4-6",
  messages: [{ role: "user", content: "Write a haiku about CDNs." }],
});
console.log(resp.choices[0].message.content);

# curl
curl https://dashboard.quantcdn.io/oai/v1/chat/completions \
  -H "Authorization: Bearer qc_your_quant_token" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic.claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
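
The examples above cover chat completions only. For the embeddings endpoint, a minimal sketch might look like the following, assuming the request and response follow the standard OpenAI embeddings shape (`input` as a string or a list of strings, vectors under `data[i].embedding`). The helper names and the sample response are illustrative and hand-written, not captured from Quant; the model id is one of the example ids given on this page.

```python
import json


def embeddings_body(texts, model="amazon.titan-embed-text-v2:0"):
    """Serialise a POST /oai/v1/embeddings body; `texts` is a str or a list of str."""
    return json.dumps({"model": model, "input": texts})


def vectors_from_response(response):
    """Return the embedding vectors ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hand-written stand-in for a response in the OpenAI embeddings shape.
sample_embeddings = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}
print(embeddings_body(["first text", "second text"]))
print(vectors_from_response(sample_embeddings))
```

With the SDK client from the earlier examples, the equivalent call is `client.embeddings.create(model=..., input=[...])`.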

Set stream: true to receive Server-Sent Events. Each event is a chat.completion.chunk object, and the stream terminates with data: [DONE] — exactly as OpenAI’s SDKs expect.

stream = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Pass stream_options={"include_usage": True} (in a raw JSON body, "stream_options": {"include_usage": true}) to receive a final usage chunk before [DONE].
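
For clients that do not use an SDK, the event stream can be consumed directly. The sketch below parses `data:` lines, stops at `[DONE]`, and picks up the usage chunk. It assumes, as in OpenAI's format, that the usage chunk carries an empty `choices` list. The function name and the sample lines are illustrative and hand-written, not captured from Quant.

```python
import json


def collect_stream(lines):
    """Concatenate delta content from SSE lines; return (text, usage or None)."""
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        # Iterating (rather than indexing [0]) tolerates an empty choices list.
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts), usage


sample_lines = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "One, "}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "two."}, "finish_reason": "stop"}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 4, "total_tokens": 9}}',
    "",
    "data: [DONE]",
]
text, usage = collect_stream(sample_lines)
print(text)
print(usage)
```

If the usage chunk does arrive with an empty `choices` list, an SDK loop that indexes `chunk.choices[0]` would need a guard such as `if chunk.choices:` once `include_usage` is enabled.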

Standard OpenAI tools are supported, including tool_choice:

resp = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-6",
    messages=[{"role": "user", "content": "What's the weather in Melbourne?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)
print(resp.choices[0].message.tool_calls)

tool_choice accepts "auto", "none", "required", or {"type": "function", "function": {"name": "..."}}. When you force a tool (required or a specific function), the tool call is returned to your client to execute — the OpenAI-standard behaviour.
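
Because tool calls are returned to the client to execute, the client needs a small dispatch step. The following is one possible sketch, assuming the assistant message uses the standard OpenAI `tool_calls` shape (an `id`, plus a `function` with a `name` and JSON-encoded `arguments`). The `get_weather` implementation, the `TOOLS` registry, and the sample message are hypothetical.

```python
import json


def get_weather(city):
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool and return the matching `role: tool` messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results


# Hand-written stand-in for an assistant message that requests a tool.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Melbourne"}'},
    }],
}
print(run_tool_calls(sample_message))
```

In the usual OpenAI flow, the assistant message and the resulting tool messages are appended to `messages` and sent in a second chat completions request so the model can produce its final answer.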

List the models available to your organisation and pass an id as the model field:

curl https://dashboard.quantcdn.io/oai/v1/models \
  -H "Authorization: Bearer qc_your_quant_token"

Model ids are Quant/Bedrock ids such as anthropic.claude-sonnet-4-6, amazon.nova-micro-v1:0, or amazon.titan-embed-text-v2:0.
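
To choose a model id programmatically, the list response can be filtered on the client. This sketch assumes the response uses the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`); the helper name and the sample response are illustrative, built from the example ids above rather than from a real call.

```python
def model_ids(response, prefix=""):
    """Return sorted model ids from a list-models response, optionally filtered by prefix."""
    return sorted(m["id"] for m in response["data"] if m["id"].startswith(prefix))


# Hand-written stand-in for a GET /oai/v1/models response.
sample_models = {
    "object": "list",
    "data": [
        {"id": "anthropic.claude-sonnet-4-6", "object": "model"},
        {"id": "amazon.titan-embed-text-v2:0", "object": "model"},
        {"id": "amazon.nova-micro-v1:0", "object": "model"},
    ],
}
print(model_ids(sample_models, prefix="amazon."))
```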

The /oai/v1 surface targets the stateless drop-in tier of the OpenAI API. The following are intentional differences:

  • Single choice: responses always contain one choice; n > 1 is not supported.
  • Ignored parameters: frequency_penalty, presence_penalty, seed, logprobs, and logit_bias are accepted but not applied.
  • finish_reason: streaming responses report stop rather than length on max-token truncation (buffered responses report length correctly).
  • Not included: image generation, audio, moderations, the Assistants/Responses APIs, and batches. Use the native Quant AI API for agents, sessions, vector database, workflows, and other stateful features.
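
When migrating an existing OpenAI integration, it may help to flag requests that rely on the differences listed above. The sketch below is a hypothetical client-side check, not part of any Quant or OpenAI library. The truncation helper is only a heuristic: it assumes the `completion_tokens` count from a usage chunk is available and treats a `stop` that coincides with an exhausted `max_tokens` budget as a probable truncation.

```python
IGNORED_PARAMS = ("frequency_penalty", "presence_penalty", "seed", "logprobs", "logit_bias")


def check_request(payload):
    """Return warnings for parameters the /oai/v1 surface handles differently."""
    warnings = []
    if (payload.get("n") or 1) > 1:
        warnings.append("n > 1 is not supported; responses contain one choice")
    for name in IGNORED_PARAMS:
        if name in payload:
            warnings.append(f"{name} is accepted but not applied")
    return warnings


def maybe_truncated(finish_reason, completion_tokens, max_tokens):
    """Heuristic: 'length', or 'stop' with the whole token budget used, suggests truncation."""
    return finish_reason == "length" or (
        finish_reason == "stop"
        and max_tokens is not None
        and completion_tokens >= max_tokens
    )


print(check_request({"model": "anthropic.claude-sonnet-4-6", "n": 2, "seed": 7}))
print(maybe_truncated("stop", completion_tokens=256, max_tokens=256))
```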

For everything beyond the drop-in tier — agents, orchestrations, skills, vector search, governance — see the AI API reference.