Chat inference via API Gateway (buffered responses) with multimodal support

POST
/api/v3/organizations/{organisation}/ai/chat

Sends requests to the AI API Gateway endpoint, which buffers responses. Supports text, images, videos, and documents via base64 encoding.

Execution Modes:
- Sync Mode (default): Standard JSON response, waits for completion (200 response)
- Async Mode: Set async: true for long-running tasks with polling (202 response)

Async/Durable Mode (async: true):
- Returns immediately with requestId and pollUrl (HTTP 202)
- Uses AWS Lambda Durable Functions for long-running inference
- Supports client-executed tools via the waiting_callback state
- Poll /ai/chat/executions/{requestId} for status
- Submit client tool results via /ai/chat/callback
- Ideal for complex prompts, large contexts, or client-side tools

Multimodal Support:
- Text: Simple string content
- Images: Base64-encoded PNG, JPEG, GIF, WebP (up to 25MB)
- Videos: Base64-encoded MP4, MOV, WebM, etc. (up to 25MB)
- Documents: Base64-encoded PDF, DOCX, CSV, etc. (up to 25MB)

Supported Models (Multimodal):
- Claude 4.5 Series: Sonnet 4.5, Haiku 4.5, Opus 4.5 (images, up to 20 per request)
- Claude 3.5 Series: Sonnet v1/v2 (images, up to 20 per request)
- Amazon Nova: Lite, Pro, Micro (images, videos, documents)

Usage Tips:
- Use base64 encoding for images/videos under 5-10MB
- Place media before text prompts for best results
- Label multiple media files (e.g., 'Image 1:', 'Image 2:')
- Maximum 25MB total payload size

Response Patterns:
- Text-only: Returns a simple text response when no tools are requested
- Single tool: Returns a toolUse object when the AI requests one tool
- Multiple tools: Returns a toolUse array when the AI requests multiple tools
- Auto-execute sync: Automatically executes the tool and returns the final text response
- Auto-execute async: Returns toolUse with executionId and status for polling
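A minimal sync-mode call can be sketched as follows. The base URL, organisation ID, and bearer-token auth scheme below are placeholders and assumptions, not values defined by this reference; only the path and body fields come from this page.

```python
import json
import urllib.request

# Placeholder deployment details -- assumptions, not defined by this API page.
BASE_URL = "https://api.example.com"
ORG_ID = "my-org"
TOKEN = "YOUR_API_TOKEN"

def build_chat_request(messages, model_id="amazon.nova-lite-v1:0", **extra):
    """Assemble the JSON request body documented below."""
    return {"messages": messages, "modelId": model_id, **extra}

def send_chat(body):
    """POST the buffered (sync-mode) request and return the parsed JSON."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v3/organizations/{ORG_ID}/ai/chat",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {TOKEN}"},  # assumed auth scheme
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (network call not executed here):
# reply = send_chat(build_chat_request(
#     [{"role": "user", "content": "What is the capital of Australia?"}],
#     temperature=0.7, maxTokens=4096))
```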

Authorizations

Parameters

Path Parameters

organisation
required
string

The organisation ID

Request Body required

Chat request with optional multimodal content blocks

object
messages
required

Array of chat messages. Content can be a simple string or an array of content blocks for multimodal input.

Array<object>
>= 1 items
object
role
required
string
Allowed values: user assistant system
content
required
One of:

Simple text message

string
modelId
required

Model ID. Use a multimodal-capable model (see the supported model list above) for multimodal input.

string
amazon.nova-lite-v1:0
temperature
number
default: 0.7 <= 2
maxTokens

Max tokens. Claude 4.5 supports up to 64k.

integer
default: 4096 >= 1 <= 65536
topP
number
<= 1
stream

Ignored in buffered mode; the endpoint always returns a complete response.

boolean
systemPrompt

Optional custom system prompt. When tools are enabled, this is prepended with tool usage guidance.

string
stopSequences

Custom stop sequences

Array<string>
<= 4 items
responseFormat

Structured JSON output (Claude 3.5 Sonnet v1/v2, Nova Pro)

object
type
string
Allowed values: json
jsonSchema

JSON Schema defining expected structure

object
toolConfig

Function calling configuration (Claude 3+, Nova Pro)

object
tools
Array<object>
object
toolSpec
object
name
string
description
string
inputSchema
object
json

JSON Schema for function parameters

object
autoExecute

When true, backend automatically executes tools and feeds results back to AI. For async tools (e.g., image generation), returns executionId for polling. Security: Use allowedTools to whitelist which tools can auto-execute.

boolean
allowedTools

Whitelist of tool names that can be auto-executed. Required when autoExecute is true, for security. Example: ['get_weather', 'generate_image']

Array<string>
sessionId

Optional session ID for conversation continuity. Omit to use stateless mode, include to continue an existing session.

string format: uuid
async

Enable async/durable execution mode. When true, returns 202 with pollUrl instead of waiting for completion. Use for long-running inference, client-executed tools, or operations >30 seconds.

boolean
allowedTools

Top-level convenience alias for toolConfig.allowedTools. Whitelists which tools can be auto-executed.

Array<string>
[
"get_weather",
"generate_image"
]
guardrails

AWS Bedrock guardrails configuration for content filtering and safety.

object
guardrailIdentifier

Guardrail identifier from AWS Bedrock

string
guardrailVersion

Guardrail version

string
trace

Enable guardrail trace output

string
Allowed values: enabled disabled
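Putting the fields above together, a request that attaches a base64-encoded image and whitelists one auto-executed tool might look like the sketch below. Note that this page only documents the simple-string content variant, so the content-block shape (`image`/`format`/`source`/`text` keys) is an assumption modelled on Bedrock Converse-style blocks, not something this reference specifies.

```python
import base64

def image_block(png_bytes):
    """Assumed content-block shape for an inline image (not specified on
    this page -- modelled on Bedrock Converse-style content blocks)."""
    return {
        "image": {
            "format": "png",
            "source": {"bytes": base64.b64encode(png_bytes).decode()},
        }
    }

# Media first, then the text prompt, per the usage tips above.
request_body = {
    "modelId": "amazon.nova-lite-v1:0",
    "messages": [{
        "role": "user",
        "content": [
            image_block(b"\x89PNG..."),           # truncated placeholder bytes
            {"text": "Image 1: describe this image."},
        ],
    }],
    "toolConfig": {
        "tools": [{
            "toolSpec": {
                "name": "get_weather",
                "description": "Look up current weather for a location.",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                }},
            }
        }],
        "autoExecute": True,
        # Required whenever autoExecute is true.
        "allowedTools": ["get_weather"],
    },
}
```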

Responses

200

Chat inference completed (buffered response, sync mode)

object
response

Assistant’s response message. May contain text content and/or tool use requests.

object
role
string
Allowed values: assistant
assistant
content

Text response content

string
I'll help you with that.
toolUse
One of:

Single tool request

object
toolUseId
string
abc123
name
string
get_weather
input
object
{
"location": "Sydney"
}
executionId

Present for async tools with autoExecute

string
exec_abc123def456
status

Execution status (pending/running/complete/failed) - present for async tools with autoExecute

string
pending
result

Tool execution result (only present when status='complete' for sync auto-executed tools). For async tools, poll /tools/executions/{executionId}

object
images

Base64 data URIs for images

Array<string>
s3Urls

Signed S3 URLs for downloads

Array<string>
model

Model used for generation

string
amazon.nova-pro-v1:0
requestId

Unique request identifier

string
req-abc123
finishReason

Why the model stopped generating

string
Allowed values: stop length content_filter tool_use
usage

Token usage information

object
inputTokens

Number of input tokens

integer
25
outputTokens

Number of output tokens

integer
150
totalTokens

Total tokens consumed

integer
175
Example
{
"response": {
"role": "assistant",
"content": "The capital of Australia is Canberra."
},
"model": "amazon.nova-lite-v1:0",
"requestId": "req-abc123",
"finishReason": "stop",
"usage": {
"inputTokens": 12,
"outputTokens": 8,
"totalTokens": 20
}
}
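The response patterns listed above (text-only, single tool request, multiple tool requests) can be distinguished with a small dispatcher. This sketch assumes only the parsed 200 body shape shown in the example:

```python
def classify_response(body):
    """Classify a parsed 200 response body.

    Returns ("text", str) when no tools were requested,
    ("tool", dict) for a single toolUse object, or
    ("tools", list) for a toolUse array.
    """
    msg = body["response"]
    tool_use = msg.get("toolUse")
    if tool_use is None:
        return ("text", msg["content"])   # text-only pattern
    if isinstance(tool_use, list):
        return ("tools", tool_use)        # multiple tool requests
    return ("tool", tool_use)             # single tool request

# The documented example body classifies as a text-only response:
example = {
    "response": {"role": "assistant",
                 "content": "The capital of Australia is Canberra."},
    "model": "amazon.nova-lite-v1:0",
    "requestId": "req-abc123",
    "finishReason": "stop",
    "usage": {"inputTokens": 12, "outputTokens": 8, "totalTokens": 20},
}
kind, payload = classify_response(example)
```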

202

Async execution started (when async: true in request)

object
requestId
required

Unique request identifier for polling

string
XkdVWiEfSwMEPrw=
sessionId

Session ID for conversation continuity

string
session-1769056496430
status
required

Initial execution status

string
Allowed values: queued
queued
message

Human-readable status message

string
Execution started. Poll the status endpoint for updates.
pollUrl
required

URL to poll for execution status

string
/ai/chat/executions/XkdVWiEfSwMEPrw%3D
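A client-side polling loop for the async flow might look like this. The fetcher is injected so the loop stays transport-agnostic, and the terminal status names are an assumption based on the pending/running/complete/failed values documented for auto-executed tools above; this page only guarantees the initial `queued` status.

```python
import time

def poll_execution(fetch_status, poll_url, interval=2.0, timeout=300.0):
    """Poll the execution endpoint until a terminal status is reached.

    `fetch_status` is any callable mapping a poll URL to a parsed JSON
    status body (e.g. a GET against /ai/chat/executions/{requestId}).
    The terminal names below are assumed, not specified on this page.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        body = fetch_status(poll_url)
        if body.get("status") in ("complete", "failed"):
            return body
        time.sleep(interval)
    raise TimeoutError(f"execution at {poll_url} did not finish in {timeout}s")

# Example with a fake fetcher that completes on the third poll:
responses = iter([{"status": "queued"},
                  {"status": "running"},
                  {"status": "complete", "response": {"content": "done"}}])
fake_fetch = lambda url: next(responses)
result = poll_execution(fake_fetch,
                        "/ai/chat/executions/XkdVWiEfSwMEPrw%3D",
                        interval=0.0)
```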

500

Failed to perform chat inference