Chat API Reference
Complete reference for the POST /chat endpoint — request schema, response modes (sync, streaming, async, and scheduled), authentication, threads, attachments, quoted contexts, and error handling.
Overview
The POST /chat endpoint is the primary interface for sending messages to an Interlocute node
and receiving responses. It supports four response modes: buffered JSON, real-time streaming (SSE), async (immediate background), and scheduled.
See Routes for all addressing patterns. The examples below use the alias (subdomain) form.
Authentication
The /chat endpoint supports three authentication modes:
Tenant API Key
Pass your API key as a Bearer token: Authorization: Bearer YOUR_API_KEY.
Tenant-scoped keys can access any node owned by the tenant. Node-scoped keys are restricted to a single node.
JWT Token
For browser-based or first-party integrations. The JWT is validated by middleware before the request reaches the node.
Anonymous
If the node operator has enabled anonymous chat, no credentials are required. Anonymous access is disabled by default and must be explicitly enabled per node.
403 Forbidden is returned if a node-scoped key is used against a different node.
See Auth & Keys for key management details.
Request
Send a JSON body with Content-Type: application/json.
| Field | Type | Required | Description |
|---|---|---|---|
| content | string | Yes | The user's message text. |
| threadId | string \| null | No | Omit or pass null to start a new thread. Pass an existing thread ID to continue a conversation. |
| externalCorrelationId | string \| null | No | Client-supplied alias for thread resolution, scoped per node. When provided (and threadId is null), the system looks for an existing thread with this alias. If found it is reused; if not, a new thread is created and stamped with the alias. Enables sticky sessions without persisting Interlocute thread GUIDs on your side. Max 256 chars, alphanumeric plus `-_.:/`. |
| clientMessageId | string \| null | No | Client-supplied identifier for the assistant reply. Useful for hydrating prompt metadata without fetching the full thread. |
| quotedContexts | array \| null | No | Quoted context references scoped to this turn. See Quoted Contexts below. |
| attachments | array \| null | No | File attachments (images, documents) via URL or base64 data URL. See Attachments below. |
| options | object \| null | No | Per-request options: response mode, web search, geolocation, and reasoning effort. See Options below. |
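The find-or-create semantics of externalCorrelationId can be sketched as follows. This is a hypothetical Python illustration of the resolution rules, not the server's actual implementation; resolve_thread and the thr_ ID format are invented for clarity.

```python
import uuid

def resolve_thread(store, thread_id, correlation_id):
    """Sketch of the thread-resolution rules described above.

    `store` maps externalCorrelationId -> threadId for one node.
    """
    if thread_id is not None:
        # An explicit threadId always wins; the alias is ignored.
        return thread_id
    if correlation_id is not None:
        if correlation_id in store:
            # Alias already stamped on a thread: reuse it (sticky session).
            return store[correlation_id]
        new_id = f"thr_{uuid.uuid4().hex[:8]}"
        store[correlation_id] = new_id  # stamp the alias on the new thread
        return new_id
    # Neither provided: start a fresh thread.
    return f"thr_{uuid.uuid4().hex[:8]}"
```

The practical upshot: a client can send the same externalCorrelationId on every request and never store a thread GUID itself.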
Minimal request
{
"content": "Hello! How can you help me today?"
}

Full request
{
"content": "Summarize the highlighted paragraph.",
"threadId": "thr_abc123",
"clientMessageId": "msg_client_456",
"externalCorrelationId": "session-user-42",
"quotedContexts": [
{
"sourceMessageId": "msg_789",
"sourceRole": "assistant",
"quotedText": "The deployment completed successfully at 14:32 UTC.",
"sourceTimestamp": "2025-01-15T14:32:00Z"
}
],
"attachments": [
{
"name": "screenshot.png",
"contentType": "image/png",
"url": "https://cdn.example.com/screenshots/deploy-result.png"
}
],
"options": {
"webSearchEnabled": true,
"searchContextSize": "medium"
}
}

Options
The options object provides per-request control over response delivery, model tools, and execution behaviour.
All fields are optional and default to off or null.
Response mode
| Field | Type | Default | Description |
|---|---|---|---|
| responseMode | string \| null | null | Controls how the response is delivered. "sync" or null: default synchronous. "async": immediate background processing. "scheduled": process at scheduledAtUtc. "economy": batched, low-cost (coming soon). All non-sync modes return 202 Accepted with an invocationId for polling. See Async modes. |
| scheduledAtUtc | string (ISO 8601) \| null | null | UTC time to process the message. Required when responseMode is "scheduled". Must be in the future and within 7 days. Ignored for other modes. |
Web search
When webSearchEnabled is true, the web_search_preview built-in tool is attached to the request. The model may search the web and cite results inline in its response. Only effective when the node's provider supports built-in tools (currently OpenAIResponses).
| Field | Type | Default | Description |
|---|---|---|---|
| webSearchEnabled | boolean | false | Enables the web_search_preview tool for this message. The model may search the web and include citations in its response. |
| searchContextSize | string \| null | "medium" | Controls how much web context the model retrieves per search. "low" is faster and cheaper; "high" pulls more results for comprehensive answers. Only applies when webSearchEnabled is true. |
| useGeolocation | boolean | false | When true, the user_location parameter is set on the web search tool, geo-biasing results toward the user's location. Pair with the userCity/userRegion/userCountry fields for accuracy. |
| userCity | string \| null | null | Approximate city for geo-biasing (e.g., "New York"). Typically derived from the user's browser timezone. Only used when useGeolocation is true. |
| userRegion | string \| null | null | Approximate region or state (e.g., "New York", "California"). Only used when useGeolocation is true. |
| userCountry | string \| null | null | ISO 3166-1 alpha-2 country code (e.g., "US", "GB"). Only used when useGeolocation is true. |
Providers without built-in tool support (e.g., AzureOpenAI) will silently ignore webSearchEnabled.
// Minimal web search
{
"content": "What is the current status of the SpaceX Starship program?",
"options": {
"webSearchEnabled": true
}
}
// With geo-biasing (city/region/country from browser timezone)
{
"content": "What are the best coffee shops near me?",
"options": {
"webSearchEnabled": true,
"searchContextSize": "low",
"useGeolocation": true,
"userCity": "Seattle",
"userRegion": "Washington",
"userCountry": "US"
}
}

Quoted Contexts
Quoted contexts let you reference specific content from earlier in the conversation (or from another source) to give the node precise context for the current turn. This is especially useful for "reply to this" or "explain this paragraph" interactions.
| Field | Type | Description |
|---|---|---|
| sourceMessageId | string | The ID of the message being quoted. |
| sourceRole | string | Role of the quoted message: user or assistant. |
| quotedText | string | The highlighted or selected text being quoted. |
| sourceTimestamp | string (ISO 8601) | When the original message was created. |
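A small client-side helper can validate a quoted context before sending. This is a sketch; make_quoted_context is an invented convenience, not part of any SDK.

```python
def make_quoted_context(source_message_id, source_role, quoted_text, source_timestamp):
    """Build one quotedContexts entry, validating the role up front.

    Field names match the table above.
    """
    if source_role not in ("user", "assistant"):
        raise ValueError(f"sourceRole must be 'user' or 'assistant', got {source_role!r}")
    return {
        "sourceMessageId": source_message_id,
        "sourceRole": source_role,
        "quotedText": quoted_text,
        "sourceTimestamp": source_timestamp,
    }
```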
Attachments
Attachments let you send files (images, documents, etc.) alongside your message. Each attachment supports two input modes — provide exactly one per attachment:
- url — a public HTTP(S) URL; the server fetches the file for you (easiest for Postman & API clients)
- dataUrl — an inline base64 data URL (what browser JavaScript produces from FileReader)
| Field | Type | Description |
|---|---|---|
| name | string | The file name (e.g., report.pdf). |
| contentType | string | MIME type (e.g., image/png, application/pdf). Optional for url mode (inferred from response). |
| url | string | Public HTTP(S) URL. Mutually exclusive with dataUrl. |
| dataUrl | string | Base64-encoded data URI: data:{contentType};base64,.... Mutually exclusive with url. |
| sizeBytes | integer | File size in bytes. Required for dataUrl. Optional for url (server determines). |
URL attachment example
{
"content": "What does this floor plan show?",
"attachments": [{
"name": "floorplan.png",
"contentType": "image/png",
"url": "https://cdn.example.com/floorplan.png"
}]
}

Base64 data URL example
{
"content": "Describe this screenshot.",
"attachments": [{
"name": "screenshot.png",
"contentType": "image/png",
"dataUrl": "data:image/png;base64,iVBORw0KGgo...",
"sizeBytes": 24576
}]
}

Tip: the url field is the simplest option for API clients: just paste a public link to your file, no base64 encoding needed. Pre-signed S3/GCS URLs and Azure Blob SAS tokens work too.

In url mode, the server fetches the file synchronously before processing your chat message. Keep files under 10 MB for optimal latency. Supported file types depend on the underlying model's capabilities.
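Outside the browser, a dataUrl attachment can be built from raw bytes like this. A Python sketch; make_data_url_attachment is an invented helper illustrating the format described above.

```python
import base64

def make_data_url_attachment(name, content_type, data):
    """Build a dataUrl-mode attachment from raw bytes.

    Produces a base64 data URI plus the required sizeBytes field
    (size of the original, un-encoded bytes).
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "name": name,
        "contentType": content_type,
        "dataUrl": f"data:{content_type};base64,{encoded}",
        "sizeBytes": len(data),
    }
```

Note that sizeBytes is the length of the raw file, not of the (roughly 33% larger) base64 string.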
Response modes
The /chat endpoint supports four response modes.
Sync and streaming are selected by the Accept header;
async modes are selected via options.responseMode.
Buffered JSON (default)
The default mode. The server waits for the complete response, then returns a single JSON object. No special headers are required.
curl -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"content": "Hello!"}'

Streaming (SSE)
Set Accept: text/event-stream to receive tokens as they are generated.
The response is a Server-Sent Events stream with the following event sequence:
curl -N -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"content": "Write a haiku about AI."}'

1. [META] event: sent first. Contains a JSON payload with requestId, nodeId, threadId, inputMessageId, and outputMessageId. The content field is empty in this event.
2. data: token events: each generated token is sent as a data: line. Concatenate all tokens to build the complete response.
3. [DONE] event: sent last. Indicates the stream is complete. Close your connection after receiving this event.

[ERROR] event (only on failure): sent if an error occurs during streaming. Contains a JSON payload with an error field. The stream ends after this event.
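This event sequence can be assembled client-side roughly as follows. A parsing sketch over already-received data: lines; transport and reconnection are omitted, and parse_sse_lines is an invented name.

```python
import json

def parse_sse_lines(lines):
    """Assemble a chat response from the SSE event sequence described above.

    Returns (meta_dict, full_text); raises on an [ERROR] event.
    """
    meta, tokens = None, []
    for line in lines:
        if not line.startswith("data: "):
            continue  # ignore blank keep-alives and comments
        payload = line[len("data: "):]
        if payload.startswith("[META]"):
            meta = json.loads(payload[len("[META]"):])
        elif payload == "[DONE]":
            break  # stream complete
        elif payload.startswith("[ERROR]"):
            raise RuntimeError(json.loads(payload[len("[ERROR]"):])["error"])
        else:
            tokens.append(payload)  # ordinary token: accumulate
    return meta, "".join(tokens)
```

Pair this with any SSE client that yields raw lines (browser EventSource, or a streaming HTTP library).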
Example SSE stream
data: [META]{"requestId":"req_abc","nodeId":"nd_123","threadId":"thr_456","inputMessageId":"msg_in_1","outputMessageId":"msg_out_1","content":""}
data: Silicon
data: dreams
data: awake
data: ,
data: thoughts
data: bloom
data: like
data: spring
data: [DONE]

Async modes (Async, Scheduled, Economy)
All non-sync modes are deferred in protocol terms: the server validates the request, resolves (or creates) the thread,
then returns 202 Accepted immediately with an invocationId
you can poll for status and output. What differs is when and how processing happens.
| Mode | When it runs | Use case | Status |
|---|---|---|---|
| Async | Immediately (background worker) | Avoid timeouts, fire-and-forget, batch | Available |
| Scheduled | At scheduledAtUtc |
"Send this at 9 AM", delayed campaigns | Available |
| Economy | Batched (up to 24 h) | Cost-optimized bulk processing (~50% savings) | Coming soon |
Async — immediate background processing
curl -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "Analyze this quarter's revenue trends in detail.",
"options": { "responseMode": "async" }
}'

Scheduled — process at a future time
curl -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "Good morning! Here is your daily briefing.",
"options": {
"responseMode": "scheduled",
"scheduledAtUtc": "2025-07-16T09:00:00Z"
}
}'

202 response (all async modes)
{
"disposition": "deferred",
"invocationId": "01JX4K7M2N...",
"nodeId": "nd_abc123",
"threadId": "thr_xyz789",
"status": "queued",
"responseMode": "scheduled",
"scheduledAtUtc": "2025-07-16T09:00:00Z",
"pollUrl": "nodes/nd_abc123/chit/01JX4K7M2N..."
}

The responseMode and scheduledAtUtc fields are echoed in the 202 response. For Async mode, scheduledAtUtc is omitted.
Polling for results
Use the pollUrl from the 202 response to check status.
The polling endpoint is GET /chit/{invocationId} — part of the node's
discovery surface — so no separate polling URL scheme is needed.
# JSON (default) — full invocation receipt
curl https://my-node.interlocute.ai/chit/01JX4K7M2N... \
-H "Authorization: Bearer $API_KEY"
# Plain text — single human-readable sentence
curl "https://my-node.interlocute.ai/chit/01JX4K7M2N...?format=plain" \
-H "Authorization: Bearer $API_KEY"
# Markdown — compact status table
curl "https://my-node.interlocute.ai/chit/01JX4K7M2N...?format=markdown" \
-H "Authorization: Bearer $API_KEY"
The JSON response is an invocation receipt with status
(queued, running,
completed, failed),
timing fields, and an outputs[] array containing
message IDs and thread references once processing completes.
// Completed receipt (abbreviated)
{
"invocationId": "01JX4K7M2N...",
"status": "completed",
"threadId": "thr_xyz789",
"startedAtUtc": "2025-07-15T10:00:01Z",
"completedAtUtc": "2025-07-15T10:00:08Z",
"durationMs": 7200,
"outputs": [
{ "kind": "userMessage", "outputId": "msg_in_001", "threadId": "thr_xyz789" },
{ "kind": "assistantMessage", "outputId": "msg_out_001", "threadId": "thr_xyz789" }
]
}

For Scheduled mode, don't start polling until after scheduledAtUtc.
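A polling loop against GET /chit/{invocationId} might look like this. A sketch: fetch_receipt is an injected stand-in for your HTTP call, and the backoff constants are illustrative.

```python
import time

def poll_invocation(fetch_receipt, max_attempts=30, base_delay=1.0):
    """Poll until the invocation receipt reaches a terminal status.

    `fetch_receipt` is any callable returning the receipt dict (e.g. an
    HTTP GET wrapped with your client library); injecting it keeps the
    loop transport-agnostic. Delay doubles each attempt, capped at 30 s.
    """
    delay = base_delay
    for _ in range(max_attempts):
        receipt = fetch_receipt()
        if receipt["status"] in ("completed", "failed"):
            return receipt
        time.sleep(delay)           # still queued/running: wait and retry
        delay = min(delay * 2, 30)  # exponential backoff, capped
    raise TimeoutError("invocation did not reach a terminal status")
```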
When to use async modes
Non-streaming JSON callers behind proxies → Async
Buffered JSON holds the connection open until the full response is ready. Behind Azure Front Door (240 s timeout) or typical HTTP clients (30–60 s), complex prompts risk a gateway timeout. Async mode returns in <1 second.
Webhook and serverless integrations → Async
Zapier, n8n, and cloud functions often enforce short timeouts. Submit work via Async, then poll or process asynchronously.
Delayed messages and campaigns → Scheduled
"Send this daily briefing at 9 AM", "Process this report Monday morning".
Set scheduledAtUtc and the message is held until that time.
Maximum 7 days ahead.
Batch and fire-and-forget workflows → Async
Submit multiple requests, collect invocation IDs, and poll for results later — ideal for background processing pipelines.
Cost-optimized bulk work → Economy (coming soon)
Routes to the OpenAI Batch API at ~50% cost reduction. Higher latency (up to 24 h), significantly lower cost. Ideal for offline analysis, bulk summarization, non-urgent work.
Async mode by default. The Interlocute Runtime API sits behind Azure Front Door,
which enforces a 240-second idle timeout. Complex prompts with web search, large context windows, or reasoning
models can approach this limit. Async mode eliminates the risk entirely.
Streaming (SSE) mode is not affected — it sends tokens continuously, keeping the connection alive.
Response schema
The buffered JSON response (and the [META] event in streaming mode) follows this structure:
| Field | Type | Description |
|---|---|---|
| requestId | string | Server-generated correlation ID for tracing and support. |
| nodeId | string | The node that processed this request. |
| threadId | string | The thread ID for this conversation. Save this to continue the thread in subsequent requests. |
| inputMessageId | string | The ID assigned to your user message. |
| outputMessageId | string | The ID assigned to the assistant's reply. |
| content | string | The assistant's full response text. Empty in streaming [META] events. |
| usage | object \| null | Token usage breakdown (when available). Contains inputTokens and outputTokens. |
Example response (buffered)
{
"requestId": "req_a1b2c3d4",
"nodeId": "nd_abc123",
"threadId": "thr_xyz789",
"inputMessageId": "msg_in_001",
"outputMessageId": "msg_out_001",
"content": "Hello! I'm your support assistant. I can help you with order lookups, account questions, and troubleshooting.",
"usage": {
"inputTokens": 12,
"outputTokens": 28
}
}

Thread lifecycle
Threads are the unit of conversation state. Understanding how they work helps you build multi-turn integrations.
New thread
Omit threadId (or pass null).
A new thread is created automatically. The response includes the new threadId —
save it to continue the conversation.
Continue a thread
Pass an existing threadId. The node resumes the conversation with full
history context. The thread must belong to the same tenant and node.
Validation
If the provided threadId doesn't exist, or belongs to a different
tenant or node, the request fails with 404 Not Found.
Save the threadId from the first response and pass it in all subsequent messages.
This is the standard pattern for multi-turn conversations.
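The save-and-reuse pattern can be sketched as follows. Here send stands in for a POST /chat call returning the response dict; the helper name is invented, so swap in your own HTTP client.

```python
def run_conversation(send, messages):
    """Multi-turn pattern from the lifecycle above: omit threadId on the
    first message, then echo back the threadId the server returned.
    """
    thread_id = None
    replies = []
    for content in messages:
        response = send(content, thread_id)
        thread_id = response["threadId"]  # save and reuse for the next turn
        replies.append(response["content"])
    return thread_id, replies
```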
Error responses
Errors are returned in the RFC 7807 Problem Details format:
{
"type": "https://tools.ietf.org/html/rfc9110#section-15.5.1",
"title": "Bad Request",
"detail": "Message is required.",
"status": 400
}

| Status | Meaning | Common causes |
|---|---|---|
| 400 | Bad Request | Missing content/message, empty body, or invalid JSON. |
| 401 | Unauthorized | Missing or invalid API key / JWT token (and node does not allow anonymous access). |
| 403 | Forbidden | Node-scoped API key used against a different node than the one it was issued for. |
| 404 | Not Found | Node not found, chat not enabled on this node, thread not found, or thread tenant/node mismatch. |
| 429 | Too Many Requests | Prepaid credit balance exhausted. Top up your account to resume usage. |
| 500 | Internal Server Error | Provider error or unexpected runtime failure. Include the requestId in support requests. |
Retry 5xx errors with exponential backoff. Do not retry 4xx errors without fixing the request.
During streaming, errors are delivered as [ERROR] SSE events instead of HTTP status codes (since headers have already been sent).
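That retry policy might be implemented as follows. A sketch: request_fn is an injected stand-in returning (status, body), so the policy is shown without a live endpoint.

```python
import time

def call_with_retry(request_fn, max_retries=4, base_delay=0.5):
    """Retry policy from the note above: back off on 5xx, never retry 4xx."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = request_fn()
        if status < 400:
            return body
        if 400 <= status < 500:
            # Client error: retrying the same request cannot succeed.
            raise ValueError(f"request rejected with {status}: {body}")
        if attempt == max_retries:
            break
        time.sleep(delay)  # 5xx: transient, retry after backoff
        delay *= 2
    raise RuntimeError(f"gave up after {max_retries} retries (last status {status})")
```

In a real client, log the requestId from any error body before giving up, since support requests need it.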
Credit enforcement
Interlocute uses a prepaid credit model. Before a chat request is processed:
- Reserve — credits are estimated and reserved based on message length and expected output.
- Execute — the node processes the request.
- Finalize — actual token usage is reconciled. Overestimates are refunded; underestimates are adjusted.
If your credit balance is insufficient, the request is rejected with 429 Too Many Requests
before any processing occurs. Credits are automatically refunded if the request fails or the client disconnects mid-stream.
Examples
Minimal: new thread
curl -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "Hello! What can you do?"
}'

Continue a thread with streaming
curl -N -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{
"content": "Tell me more about that.",
"threadId": "thr_abc123"
}'

Full request: attachments + quoted contexts
curl -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "What does the highlighted section of this document mean?",
"threadId": "thr_abc123",
"clientMessageId": "my-client-id-001",
"quotedContexts": [
{
"sourceMessageId": "msg_prev_reply",
"sourceRole": "assistant",
"quotedText": "Revenue grew 15% quarter-over-quarter.",
"sourceTimestamp": "2025-01-10T09:00:00Z"
}
],
"attachments": [
{
"name": "q4-report.pdf",
"contentType": "application/pdf",
"dataUrl": "data:application/pdf;base64,JVBERi0xLjQK...",
"sizeBytes": 102400
}
]
}'

Async: submit and poll
# 1. Submit (returns 202 immediately)
RESPONSE=$(curl -s -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "Generate a detailed market analysis for Q3.",
"externalCorrelationId": "batch-job-42",
"options": { "responseMode": "async" }
}')
# 2. Extract invocation ID
INVOCATION_ID=$(echo "$RESPONSE" | jq -r '.invocationId')
# 3. Poll via /chit until completed
curl https://my-node.interlocute.ai/chit/$INVOCATION_ID \
-H "Authorization: Bearer $API_KEY"

Scheduled: send a message at a future time
curl -X POST https://my-node.interlocute.ai/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"content": "Good morning! Here is your daily briefing.",
"threadId": "thr_abc123",
"options": {
"responseMode": "scheduled",
"scheduledAtUtc": "2025-07-16T09:00:00Z"
}
}'

Triggers & automated invocations
The same chat processing pipeline powers scheduled and event-driven triggers. When a trigger fires, it invokes the node's chat capability with a system-generated message. Triggers support several thread modes that control how conversations are organized:
- New thread per run — each trigger execution creates a fresh thread
- Singleton per trigger — all executions of a trigger share a single, long-lived thread
- Fixed thread ID — the trigger always targets a specific, pre-existing thread
Deferred chat, triggers, and synchronous API calls all share the same invocation receipt infrastructure.
Every execution — regardless of mode — appears in the invocations log with a source
label (api-chat, schedule,
event, etc.) and full attribution (API key name, IP, user agent).
Use the invocationId or requestId
to correlate any execution back to its source.
Which response mode should I use?
| Scenario | Recommended mode | Why |
|---|---|---|
| Interactive chat UI (browser) | Streaming (SSE) | Users see tokens appear in real-time. Connection stays alive via continuous data flow. |
| Simple server-to-server, fast prompts | Buffered JSON | Single JSON response, easy to parse. Fine when prompts complete in <30 s. |
| Server-to-server, complex/long prompts | Async | Avoids proxy timeouts (Azure Front Door: 240 s). Returns in <1 s, poll for result. |
| Webhook / Zapier / n8n | Async | Short platform timeouts. Submit, get ID, process result asynchronously. |
| Batch processing | Async | Submit N requests, collect IDs, poll all. No connections held open. |
| Delayed messages ("send at 9 AM") | Scheduled | Message is held until scheduledAtUtc. Max 7 days. |
| Mobile / unreliable network | Async | Resilient to network drops. Poll when reconnected. |
| Non-urgent bulk summarization | Economy (soon) | ~50% cost reduction via OpenAI Batch API. Up to 24 h latency. |
Next steps
- Chit API Reference — deterministic node information sheets
- API Examples — copy-paste starters in cURL, C#, and JavaScript
- Auth & Keys — credential setup and key scoping
- Triggers — scheduled and event-driven execution