Chat Completions
POST /v1/chat/completions is an OpenAI-wire-compatible proxy. If you have
code that talks to the OpenAI Chat Completions API, point its base URL at AIUS,
use an aius_… bearer token, and pass an Anthropic Claude model slug. No other
code changes are needed. The provider key is held server-side; you never send one.
POST https://aius.co/api/v1/chat/completions
Request
Authorization: Bearer aius_xxxxxxxx...
Content-Type: application/json
Body
The body is forwarded to the upstream model in OpenAI Chat Completions shape.
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Say hello in one sentence." }
  ],
  "stream": false,
  "max_tokens": 256,
  "temperature": 0.7
}
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | Yes | Anthropic Claude slug, e.g. anthropic/claude-sonnet-4. Call GET /v1/models for the live list. |
| messages | array | Yes | OpenAI message objects (role + content). |
| stream | boolean | No | true streams Server-Sent Events. Default false. |
| max_tokens | integer | No | Upper bound on generated tokens. |
| temperature | number | No | Sampling temperature. |
| tools | array | No | OpenAI-style tool/function definitions for tool calling. |
Standard OpenAI Chat Completions fields (top_p, stop, tool_choice, etc.)
are forwarded upstream. The proxy does not strip them.
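As an illustration of how the fields above fit together, here is a minimal sketch that assembles a request body with a tool definition and a few pass-through fields. The helper `build_chat_request` and the `get_weather` tool are hypothetical names made up for this example. The tool definition follows the OpenAI function-calling shape, and how the upstream model treats each forwarded field is not confirmed here.

```python
import json
from typing import Any, Optional


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    tools: Optional[list[dict[str, Any]]] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Assemble a Chat Completions body; optional fields are included only when set."""
    body: dict[str, Any] = {"model": model, "messages": messages}
    if tools:
        body["tools"] = tools
    # Extra OpenAI fields (top_p, stop, tool_choice, ...) are copied through as given.
    body.update({key: value for key, value in extra.items() if value is not None})
    return body


# Hypothetical tool definition in the OpenAI function-calling shape.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = build_chat_request(
    "anthropic/claude-sonnet-4",
    [{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[weather_tool],
    tool_choice="auto",
    top_p=0.9,
    stop=None,  # left out of the body because it is None
)
print(json.dumps(request_body, indent=2))
```

The resulting dictionary can be sent as the JSON body of the POST request, or unpacked into an OpenAI SDK call.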
Non-streaming response
Returns the standard Chat Completions object:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1717123456,
  "model": "anthropic/claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello there!" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 18, "completion_tokens": 4, "total_tokens": 22 }
}
The proxy also echoes correlation headers on the response so you can tie a call
to a run-model trace: x-aius-session-id, x-aius-run-id, x-aius-step-run-id.
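One way to collect these headers for logging is sketched below. The helper `correlation_ids` is a hypothetical name, and the sample values are invented. With a real call you would pass the response's header mapping (for example `response.headers` in httpx) instead of the sample dictionary.

```python
from typing import Mapping

# Header names as listed in the text above.
CORRELATION_HEADERS = ("x-aius-session-id", "x-aius-run-id", "x-aius-step-run-id")


def correlation_ids(headers: Mapping[str, str]) -> dict[str, str]:
    """Return the correlation headers that are present, keyed by lowercase name."""
    # HTTP header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in CORRELATION_HEADERS if name in lowered}


# Made-up header values, for illustration only.
sample_headers = {
    "Content-Type": "application/json",
    "X-Aius-Session-Id": "sess_example",
    "X-Aius-Run-Id": "run_example",
}
ids = correlation_ids(sample_headers)
print(ids)
```

Headers that are absent from a response are simply omitted from the result, so the caller can log whatever subset it receives.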
Streaming response
Set "stream": true. The response is a text/event-stream of OpenAI chunk
objects, terminated by data: [DONE]:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" there!"},"finish_reason":null}]}
data: [DONE]
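To turn a stream like the one above back into the full message, a client can concatenate the `content` fragments from each chunk's `delta`. The following is a minimal sketch under that assumption. `iter_content_deltas` is a hypothetical helper name, and the sample lines are copied from the example stream rather than read from a live connection.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a sequence of SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample_lines = [
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" there!"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
message = "".join(iter_content_deltas(sample_lines))
print(message)  # Hello there!
```

The first chunk carries only the role, so it contributes no text. The function accepts any iterable of lines, including the line iterator of a streaming HTTP response.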
Examples
Non-streaming
curl -X POST https://aius.co/api/v1/chat/completions \
  -H "Authorization: Bearer aius_xxxxxxxx..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
Using the OpenAI SDK
Because the wire format matches, the official OpenAI SDKs work directly:
from openai import OpenAI

client = OpenAI(
    base_url="https://aius.co/api/v1",
    api_key="aius_xxxxxxxx...",
)

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
Streaming
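import httpx

with httpx.stream(
    "POST",
    "https://aius.co/api/v1/chat/completions",
    headers={"Authorization": "Bearer aius_xxxxxxxx..."},
    json={
        "model": "anthropic/claude-sonnet-4",
        "messages": [{"role": "user", "content": "Tell me a short story."}],
        "stream": True,
    },
    timeout=None,  # disable the client timeout so long generations are not cut off
) as r:
    r.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body
    for line in r.iter_lines():
        # Print the JSON payload of each chunk; skip blank lines and the [DONE] sentinel.
        if line.startswith("data: ") and line != "data: [DONE]":
            print(line[len("data: "):])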
List models
GET /v1/models (bearer token required) returns an OpenAI-style list of the
models the gateway will accept:
curl https://aius.co/api/v1/models \
  -H "Authorization: Bearer aius_xxxxxxxx..."
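A client might check a slug against this list before sending a request. The sketch below assumes the response follows the usual OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`), which the text implies but does not spell out. `model_ids` is a hypothetical helper, and the sample payload is illustrative rather than a captured response.

```python
from typing import Any


def model_ids(models_response: dict[str, Any]) -> list[str]:
    """Extract model slugs from an OpenAI-style model list response."""
    return [entry["id"] for entry in models_response.get("data", []) if "id" in entry]


# Illustrative payload only; the real list comes from GET /v1/models.
sample_response = {
    "object": "list",
    "data": [{"id": "anthropic/claude-sonnet-4", "object": "model"}],
}

available = model_ids(sample_response)
requested = "anthropic/claude-sonnet-4"
if requested not in available:
    raise ValueError(f"{requested} is not offered by the gateway")
print(available)
```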
When to use the run loop instead
/v1/chat/completions is a stateless proxy: you own the conversation, the tool
loop, and tool execution. If you want the server to drive the agent loop and
just hand you tool calls to execute locally, use the
run-loop WebSocket instead.
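To make "you own the conversation, the tool loop, and tool execution" concrete, here is a minimal sketch of a client-side loop. All names (`run_tool_loop`, `send`, `fake_send`, the `add` tool) are hypothetical. The message shapes follow the OpenAI tool-calling convention (an assistant message with `tool_calls`, answered by `role: "tool"` messages), on the assumption that the proxy returns them unchanged. `fake_send` stands in for the HTTP call so the example runs offline.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_tool_loop(
    send: Callable[[list[Message]], Message],
    messages: list[Message],
    tools: dict[str, Callable[..., Any]],
    max_turns: int = 5,
) -> str:
    """Call the model, run any requested tools locally, and repeat until it answers."""
    for _ in range(max_turns):
        reply = send(messages)  # the assistant message, i.e. choices[0].message
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        for call in tool_calls:
            fn = call["function"]
            result = tools[fn["name"]](**json.loads(fn["arguments"]))
            # Feed each tool result back as a "tool" message tied to its call id.
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    raise RuntimeError("tool loop did not finish within max_turns")


def fake_send(messages: list[Message]) -> Message:
    """Stand-in for the HTTP request: asks for one tool call, then answers."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": "The sum is " + messages[-1]["content"]}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'},
            }
        ],
    }


history = [{"role": "user", "content": "What is 2 + 3?"}]
answer = run_tool_loop(fake_send, history, {"add": lambda a, b: a + b})
print(answer)  # The sum is 5
```

In real use, `send` would POST the accumulated `messages` (plus the `tools` definitions) to /v1/chat/completions and return the assistant message from the first choice. Because the endpoint is stateless, the full history is resent on every turn.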