interlocute.ai beta

Chat API Reference

Complete reference for the POST /chat endpoint — request schema, response modes (sync, streaming, async, and scheduled), authentication, threads, attachments, quoted contexts, and error handling.

Overview

The POST /chat endpoint is the primary interface for sending messages to an Interlocute node and receiving responses. It supports four response modes: buffered JSON, real-time streaming (SSE), async (immediate background), and scheduled.

See Routes for all addressing patterns. The examples below use the alias (subdomain) form.

Authentication

The /chat endpoint supports three authentication modes:

Tenant API Key

Pass your API key as a Bearer token: Authorization: Bearer YOUR_API_KEY. Tenant-scoped keys can access any node owned by the tenant. Node-scoped keys are restricted to a single node.

JWT Token

For browser-based or first-party integrations. The JWT is validated by middleware before the request reaches the node.

Anonymous

If the node operator has enabled anonymous chat, no credentials are required. Anonymous access is disabled by default and must be explicitly enabled per node.

Node-scoped API keys add a layer of isolation: even if a key is compromised, it can only access the single node it was issued for. A 403 Forbidden is returned if a node-scoped key is used against a different node. See Auth & Keys for key management details.

Request

Send a JSON body with Content-Type: application/json.

content (string, required)
  The user's message text.
threadId (string | null)
  Omit or pass null to start a new thread. Pass an existing thread ID to continue a conversation.
externalCorrelationId (string | null)
  Client-supplied alias for thread resolution, scoped per node. When provided (and threadId is null), the system looks for an existing thread with this alias. If found, it is reused; if not, a new thread is created and stamped with the alias. Enables sticky sessions without persisting Interlocute thread GUIDs on your side. Max 256 chars, alphanumeric plus -_.:/.
clientMessageId (string | null)
  Client-supplied identifier for the assistant reply. Useful for hydrating prompt metadata without fetching the full thread.
quotedContexts (array | null)
  Quoted context references scoped to this turn. See Quoted Contexts below.
attachments (array | null)
  File attachments (images, documents) via URL or base64 data URL. See Attachments below.
options (object | null)
  Per-request options: response mode, web search, geolocation, and reasoning effort. See Options below.

Minimal request

{
  "content": "Hello! How can you help me today?"
}

Full request

{
  "content": "Summarize the highlighted paragraph.",
  "threadId": "thr_abc123",
  "clientMessageId": "msg_client_456",
  "externalCorrelationId": "session-user-42",
  "quotedContexts": [
    {
      "sourceMessageId": "msg_789",
      "sourceRole": "assistant",
      "quotedText": "The deployment completed successfully at 14:32 UTC.",
      "sourceTimestamp": "2025-01-15T14:32:00Z"
    }
  ],
  "attachments": [
    {
      "name": "screenshot.png",
      "contentType": "image/png",
      "url": "https://cdn.example.com/screenshots/deploy-result.png"
    }
  ],
  "options": {
    "webSearchEnabled": true,
    "searchContextSize": "medium"
  }
}

Options

The options object provides per-request control over response delivery, model tools, and execution behaviour. All fields are optional and default to off or null.

Response mode

responseMode (string | null, default null)
  Controls how the response is delivered: "sync" or null selects the default synchronous mode, "async" selects immediate background processing, "scheduled" processes the message at scheduledAtUtc, and "economy" selects batched, low-cost processing (coming soon). All non-sync modes return 202 Accepted with an invocationId for polling. See Async modes.
scheduledAtUtc (string, ISO 8601 | null, default null)
  UTC time to process the message. Required when responseMode is "scheduled". Must be in the future and within 7 days. Ignored for other modes.
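As a sketch of building a valid scheduledAtUtc value on the client (the helper name and window check are ours for illustration; the server performs its own validation):

```python
from datetime import datetime, timedelta, timezone

def make_scheduled_at(delay: timedelta) -> str:
    """Build a scheduledAtUtc string, enforcing the documented window:
    it must be in the future and at most 7 days ahead.
    (Illustrative helper; the server validates independently.)"""
    if not timedelta(0) < delay <= timedelta(days=7):
        raise ValueError("scheduledAtUtc must be in the future and within 7 days")
    when = datetime.now(timezone.utc) + delay
    # Second precision with a trailing 'Z', matching the examples in this reference.
    return when.strftime("%Y-%m-%dT%H:%M:%SZ")

options = {"responseMode": "scheduled",
           "scheduledAtUtc": make_scheduled_at(timedelta(hours=2))}
```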

Web search

When webSearchEnabled is true, the web_search_preview built-in tool is attached to the request. The model may search the web and cite results inline in its response. Only effective when the node's provider supports built-in tools (currently OpenAIResponses).

webSearchEnabled (boolean, default false)
  Enables the web_search_preview tool for this message. The model may search the web and include citations in its response.
searchContextSize (string | null, default "medium")
  Controls how much web context the model retrieves per search. "low" is faster and cheaper; "high" pulls more results for comprehensive answers. Only applies when webSearchEnabled is true.
useGeolocation (boolean, default false)
  When true, the user_location parameter is set on the web search tool, geo-biasing results toward the user's location. Pair with the userCity/userRegion/userCountry fields for accuracy.
userCity (string | null, default null)
  Approximate city for geo-biasing (e.g., "New York"). Typically derived from the user's browser timezone. Only used when useGeolocation is true.
userRegion (string | null, default null)
  Approximate region or state (e.g., "New York", "California"). Only used when useGeolocation is true.
userCountry (string | null, default null)
  ISO 3166-1 alpha-2 country code (e.g., "US", "GB"). Only used when useGeolocation is true.

Web search availability depends on the node's execution profile. Nodes using providers that don't support built-in tools (such as AzureOpenAI) will silently ignore webSearchEnabled.

Minimal web search example

{
  "content": "What is the current status of the SpaceX Starship program?",
  "options": {
    "webSearchEnabled": true
  }
}

Geo-biased web search example (location fields typically derived from the browser timezone)

{
  "content": "What are the best coffee shops near me?",
  "options": {
    "webSearchEnabled": true,
    "searchContextSize": "low",
    "useGeolocation": true,
    "userCity": "Seattle",
    "userRegion": "Washington",
    "userCountry": "US"
  }
}

Quoted Contexts

Quoted contexts let you reference specific content from earlier in the conversation (or from another source) to give the node precise context for the current turn. This is especially useful for "reply to this" or "explain this paragraph" interactions.

sourceMessageId (string)
  The ID of the message being quoted.
sourceRole (string)
  Role of the quoted message: user or assistant.
quotedText (string)
  The highlighted or selected text being quoted.
sourceTimestamp (string, ISO 8601)
  When the original message was created.

Quoted contexts require the node's prompt configuration to include quoted context support. If the node does not have this capability enabled, the quoted contexts are accepted but may not influence the response.

Attachments

Attachments let you send files (images, documents, etc.) alongside your message. Each attachment supports two input modes — provide exactly one per attachment:

  • url — a public HTTP(S) URL; the server fetches the file for you (easiest for Postman & API clients)
  • dataUrl — inline base64 data URL (what browser JavaScript produces from FileReader)

name (string)
  The file name (e.g., report.pdf).
contentType (string)
  MIME type (e.g., image/png, application/pdf). Optional for url mode (inferred from the response).
url (string)
  Public HTTP(S) URL. Mutually exclusive with dataUrl.
dataUrl (string)
  Base64-encoded data URI: data:{contentType};base64,.... Mutually exclusive with url.
sizeBytes (integer)
  File size in bytes. Required for dataUrl mode; optional for url mode (the server determines it).

URL attachment example

{
  "content": "What does this floor plan show?",
  "attachments": [{
    "name": "floorplan.png",
    "contentType": "image/png",
    "url": "https://cdn.example.com/floorplan.png"
  }]
}

Base64 data URL example

{
  "content": "Describe this screenshot.",
  "attachments": [{
    "name": "screenshot.png",
    "contentType": "image/png",
    "dataUrl": "data:image/png;base64,iVBORw0KGgo...",
    "sizeBytes": 24576
  }]
}

Postman / cURL users: use the url field and simply paste a public link to your file; no base64 encoding is needed. Pre-signed S3/GCS URLs and Azure Blob SAS tokens work too.

For url mode, the server fetches the file synchronously before processing your chat message. Keep files under 10 MB for optimal latency. Supported file types depend on the underlying model's capabilities.
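For dataUrl mode, a client can build the attachment object from raw file bytes along these lines (a minimal sketch; the helper name is ours):

```python
import base64
import mimetypes

def to_attachment(name: str, data: bytes) -> dict:
    """Build a dataUrl-mode attachment dict from raw file bytes (sketch)."""
    content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "name": name,
        "contentType": content_type,
        "dataUrl": f"data:{content_type};base64,{encoded}",
        "sizeBytes": len(data),  # required for dataUrl mode
    }

# Example: the 8-byte PNG magic header stands in for real file contents.
att = to_attachment("pixel.png", b"\x89PNG\r\n\x1a\n")
```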

Response modes

The /chat endpoint supports four response modes. Sync and streaming are selected by the Accept header; async modes are selected via options.responseMode.

Buffered JSON (default)

The default mode. The server waits for the complete response, then returns a single JSON object. No special headers are required.

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello!"}'

Streaming (SSE)

Set Accept: text/event-stream to receive tokens as they are generated. The response is a Server-Sent Events stream with the following event sequence:

curl -N -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"content": "Write a haiku about AI."}'
  1. [META] event: sent first. Contains a JSON payload with requestId, nodeId, threadId, inputMessageId, and outputMessageId. The content field is empty in this event.
  2. data: token events: each generated token is sent as a data: line. Concatenate all tokens to build the complete response.
  3. [DONE] event: sent last. Indicates the stream is complete; close your connection after receiving this event.

If an error occurs during streaming, an [ERROR] event is sent. It contains a JSON payload with an error field, and the stream ends after this event.

Example SSE stream

data: [META]{"requestId":"req_abc","nodeId":"nd_123","threadId":"thr_456","inputMessageId":"msg_in_1","outputMessageId":"msg_out_1","content":""}

data: Silicon
data:  dreams
data:  awake
data: ,
data:  thoughts
data:  bloom
data:  like
data:  spring

data: [DONE]
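The framing above can be consumed with a small parser. This sketch assumes exactly the framing shown ([META] prefix, data: token lines, [DONE] terminator) and is not an official client:

```python
import json

def parse_sse(stream: str):
    """Parse the /chat SSE framing: a [META] JSON envelope, data: token
    lines, and a [DONE] terminator. Returns (meta_dict, full_text).
    Sketch only, assuming the exact framing documented above."""
    meta, tokens = None, []
    for line in stream.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload.startswith("[META]"):
            meta = json.loads(payload[len("[META]"):])
        elif payload == "[DONE]":
            break  # stream complete
        elif payload.startswith("[ERROR]"):
            raise RuntimeError(payload)  # error payload ends the stream
        else:
            tokens.append(payload)  # token text, leading spaces preserved
    return meta, "".join(tokens)
```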

Async modes (Async, Scheduled, Economy)

All non-sync modes are deferred in protocol terms: the server validates the request, resolves (or creates) the thread, then returns 202 Accepted immediately with an invocationId you can poll for status and output. What differs is when and how processing happens.

Async: runs immediately on a background worker. Use it to avoid timeouts and for fire-and-forget or batch work. (Available)
Scheduled: runs at scheduledAtUtc. Use it for delayed messages such as "send this at 9 AM" and campaigns. (Available)
Economy: batched, with up to 24 h latency. Use it for cost-optimized bulk processing (~50% savings). (Coming soon)

Async — immediate background processing

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Analyze this quarter's revenue trends in detail.",
    "options": { "responseMode": "async" }
  }'

Scheduled — process at a future time

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Good morning! Here is your daily briefing.",
    "options": {
      "responseMode": "scheduled",
      "scheduledAtUtc": "2025-07-16T09:00:00Z"
    }
  }'

202 response (all async modes)

{
  "disposition": "deferred",
  "invocationId": "01JX4K7M2N...",
  "nodeId": "nd_abc123",
  "threadId": "thr_xyz789",
  "status": "queued",
  "responseMode": "scheduled",
  "scheduledAtUtc": "2025-07-16T09:00:00Z",
  "pollUrl": "nodes/nd_abc123/chit/01JX4K7M2N..."
}

The responseMode and scheduledAtUtc fields are echoed in the 202 response. For Async mode, scheduledAtUtc is omitted.

Polling for results

Use the pollUrl from the 202 response to check status. The polling endpoint is GET /chit/{invocationId} — part of the node's discovery surface — so no separate polling URL scheme is needed.

# JSON (default) — full invocation receipt
curl https://my-node.interlocute.ai/chit/01JX4K7M2N... \
  -H "Authorization: Bearer $API_KEY"

# Plain text — single human-readable sentence
curl "https://my-node.interlocute.ai/chit/01JX4K7M2N...?format=plain" \
  -H "Authorization: Bearer $API_KEY"

# Markdown — compact status table
curl "https://my-node.interlocute.ai/chit/01JX4K7M2N...?format=markdown" \
  -H "Authorization: Bearer $API_KEY"

The JSON response is an invocation receipt with status (queued, running, completed, failed), timing fields, and an outputs[] array containing message IDs and thread references once processing completes.

// Completed receipt (abbreviated)
{
  "invocationId": "01JX4K7M2N...",
  "status": "completed",
  "threadId": "thr_xyz789",
  "startedAtUtc": "2025-07-15T10:00:01Z",
  "completedAtUtc": "2025-07-15T10:00:08Z",
  "durationMs": 7200,
  "outputs": [
    { "kind": "userMessage", "outputId": "msg_in_001", "threadId": "thr_xyz789" },
    { "kind": "assistantMessage", "outputId": "msg_out_001", "threadId": "thr_xyz789" }
  ]
}

Poll at a reasonable interval: we recommend starting at 1 second and backing off to 3–5 seconds. Most single-turn requests complete in under 15 seconds. For Scheduled mode, don't start polling until after scheduledAtUtc.
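A polling loop following that guidance might look like this (sketch; fetch stands in for an HTTP GET on the pollUrl, and the parameter names are ours):

```python
import time

def poll_invocation(fetch, max_wait_s: float = 120.0,
                    start: float = 1.0, cap: float = 5.0):
    """Poll an invocation receipt until it reaches a terminal status.
    `fetch` is any callable returning the receipt dict (e.g. a GET on the
    pollUrl from the 202 response). The interval starts at `start` seconds
    and backs off toward `cap`. Sketch only."""
    interval, waited = start, 0.0
    while waited < max_wait_s:
        receipt = fetch()
        if receipt["status"] in ("completed", "failed"):
            return receipt
        time.sleep(interval)
        waited += interval
        interval = min(interval * 1.5, cap)  # gentle exponential backoff
    raise TimeoutError("invocation did not finish within max_wait_s")
```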

When to use async modes

Non-streaming JSON callers behind proxies → Async

Buffered JSON holds the connection open until the full response is ready. Behind Azure Front Door (240 s timeout) or typical HTTP clients (30–60 s), complex prompts risk a gateway timeout. Async mode returns in <1 second.

Webhook and serverless integrations → Async

Zapier, n8n, and cloud functions often enforce short timeouts. Submit work via Async, then poll or process asynchronously.

Delayed messages and campaigns → Scheduled

"Send this daily briefing at 9 AM", "Process this report Monday morning". Set scheduledAtUtc and the message is held until that time. Maximum 7 days ahead.

Batch and fire-and-forget workflows → Async

Submit multiple requests, collect invocation IDs, and poll for results later — ideal for background processing pipelines.

Cost-optimized bulk work → Economy (coming soon)

Routes to the OpenAI Batch API at ~50% cost reduction. Higher latency (up to 24 h), significantly lower cost. Ideal for offline analysis, bulk summarization, non-urgent work.

Timeout guidance: If you are calling the buffered JSON mode (no streaming) from a server-side integration, consider using Async mode by default. The Interlocute Runtime API sits behind Azure Front Door, which enforces a 240-second idle timeout. Complex prompts with web search, large context windows, or reasoning models can approach this limit. Async mode eliminates the risk entirely.

Streaming (SSE) mode is not affected — it sends tokens continuously, keeping the connection alive.

Response schema

The buffered JSON response (and the [META] event in streaming mode) follows this structure:

requestId (string)
  Server-generated correlation ID for tracing and support.
nodeId (string)
  The node that processed this request.
threadId (string)
  The thread ID for this conversation. Save this to continue the thread in subsequent requests.
inputMessageId (string)
  The ID assigned to your user message.
outputMessageId (string)
  The ID assigned to the assistant's reply.
content (string)
  The assistant's full response text. Empty in streaming [META] events.
usage (object | null)
  Token usage breakdown (when available). Contains inputTokens and outputTokens.

Example response (buffered)

{
  "requestId": "req_a1b2c3d4",
  "nodeId": "nd_abc123",
  "threadId": "thr_xyz789",
  "inputMessageId": "msg_in_001",
  "outputMessageId": "msg_out_001",
  "content": "Hello! I'm your support assistant. I can help you with order lookups, account questions, and troubleshooting.",
  "usage": {
    "inputTokens": 12,
    "outputTokens": 28
  }
}

Thread lifecycle

Threads are the unit of conversation state. Understanding how they work helps you build multi-turn integrations.

New thread

Omit threadId (or pass null). A new thread is created automatically. The response includes the new threadId — save it to continue the conversation.

Continue a thread

Pass an existing threadId. The node resumes the conversation with full history context. The thread must belong to the same tenant and node.

Validation

If the provided threadId doesn't exist, or belongs to a different tenant or node, the request fails with 404 Not Found.

Store the threadId from the first response and pass it in all subsequent messages. This is the standard pattern for multi-turn conversations.
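The store-and-replay pattern can be captured in a few lines (sketch; send stands in for your HTTP POST to /chat, and the function name is ours):

```python
def chat_turn(send, state: dict, content: str) -> str:
    """One multi-turn step: reuse the saved threadId if present, then save
    the threadId from the response for the next turn. `send` is any
    callable that POSTs the body to /chat and returns the parsed JSON."""
    body = {"content": content}
    if state.get("threadId"):
        body["threadId"] = state["threadId"]  # continue the conversation
    response = send(body)
    state["threadId"] = response["threadId"]  # persist for subsequent turns
    return response["content"]
```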

Error responses

Errors are returned in the RFC 7807 Problem Details format:

{
  "type": "https://tools.ietf.org/html/rfc9110#section-15.5.1",
  "title": "Bad Request",
  "detail": "Message is required.",
  "status": 400
}

400 Bad Request: missing content/message, empty body, or invalid JSON.
401 Unauthorized: missing or invalid API key / JWT token (and the node does not allow anonymous access).
403 Forbidden: a node-scoped API key was used against a different node than the one it was issued for.
404 Not Found: node not found, chat not enabled on this node, thread not found, or thread tenant/node mismatch.
429 Too Many Requests: prepaid credit balance exhausted. Top up your account to resume usage.
500 Internal Server Error: provider error or unexpected runtime failure. Include the requestId in support requests.

Retry on 5xx errors with exponential backoff. Do not retry 4xx errors without fixing the request. During streaming, errors are delivered as [ERROR] SSE events instead of HTTP status codes (since headers have already been sent).
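A retry policy matching this guidance might look like the following (sketch; the function names and defaults are ours):

```python
def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry 5xx responses up to max_attempts; never retry 4xx."""
    return 500 <= status < 600 and attempt < max_attempts

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff between attempts: base * 2^attempt, capped."""
    return min(base * (2 ** attempt), cap)
```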

Credit enforcement

Interlocute uses a prepaid credit model. Before a chat request is processed:

  1. Reserve — credits are estimated and reserved based on message length and expected output.
  2. Execute — the node processes the request.
  3. Finalize — actual token usage is reconciled. Overestimates are refunded; underestimates are adjusted.

If your credit balance is insufficient, the request is rejected with 429 Too Many Requests before any processing occurs. Credits are automatically refunded if the request fails or the client disconnects mid-stream.
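The reserve/execute/finalize flow can be illustrated with a toy ledger (purely illustrative; the class and its fields are ours, and real credit accounting happens server-side):

```python
class CreditLedger:
    """Toy model of the prepaid reserve/execute/finalize flow."""

    def __init__(self, balance: int):
        self.balance = balance   # prepaid credits
        self.reserved = 0        # credits held by in-flight requests

    def reserve(self, estimate: int) -> None:
        # Step 1: reject before any processing if credits are insufficient
        # (the API returns 429 Too Many Requests in this case).
        if self.balance - self.reserved < estimate:
            raise RuntimeError("429 Too Many Requests: insufficient credits")
        self.reserved += estimate

    def finalize(self, estimate: int, actual: int) -> None:
        # Step 3: release the reservation and charge actual usage.
        # Overestimates are effectively refunded; underestimates charge more.
        self.reserved -= estimate
        self.balance -= actual
```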

Examples

Minimal: new thread

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Hello! What can you do?"
  }'

Continue a thread with streaming

curl -N -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "content": "Tell me more about that.",
    "threadId": "thr_abc123"
  }'

Full request: attachments + quoted contexts

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "What does the highlighted section of this document mean?",
    "threadId": "thr_abc123",
    "clientMessageId": "my-client-id-001",
    "quotedContexts": [
      {
        "sourceMessageId": "msg_prev_reply",
        "sourceRole": "assistant",
        "quotedText": "Revenue grew 15% quarter-over-quarter.",
        "sourceTimestamp": "2025-01-10T09:00:00Z"
      }
    ],
    "attachments": [
      {
        "name": "q4-report.pdf",
        "contentType": "application/pdf",
        "dataUrl": "data:application/pdf;base64,JVBERi0xLjQK...",
        "sizeBytes": 102400
      }
    ]
  }'

Async: submit and poll

# 1. Submit (returns 202 immediately)
RESPONSE=$(curl -s -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Generate a detailed market analysis for Q3.",
    "externalCorrelationId": "batch-job-42",
    "options": { "responseMode": "async" }
  }')

# 2. Extract invocation ID
INVOCATION_ID=$(echo $RESPONSE | jq -r '.invocationId')

# 3. Poll via /chit until completed
curl https://my-node.interlocute.ai/chit/$INVOCATION_ID \
  -H "Authorization: Bearer $API_KEY"

Scheduled: send a message at a future time

curl -X POST https://my-node.interlocute.ai/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Good morning! Here is your daily briefing.",
    "threadId": "thr_abc123",
    "options": {
      "responseMode": "scheduled",
      "scheduledAtUtc": "2025-07-16T09:00:00Z"
    }
  }'

Triggers & automated invocations

The same chat processing pipeline powers scheduled and event-driven triggers. When a trigger fires, it invokes the node's chat capability with a system-generated message. Triggers support several thread modes that control how conversations are organized:

  • New thread per run — each trigger execution creates a fresh thread
  • Singleton per trigger — all executions of a trigger share a single, long-lived thread
  • Fixed thread ID — the trigger always targets a specific, pre-existing thread

Deferred chat, triggers, and synchronous API calls all share the same invocation receipt infrastructure. Every execution — regardless of mode — appears in the invocations log with a source label (api-chat, schedule, event, etc.) and full attribution (API key name, IP, user agent). Use the invocationId or requestId to correlate any execution back to its source.

Which response mode should I use?

  • Interactive chat UI (browser) → Streaming (SSE). Users see tokens appear in real time; the connection stays alive via continuous data flow.
  • Simple server-to-server calls with fast prompts → Buffered JSON. A single JSON response that is easy to parse; fine when prompts complete in under 30 s.
  • Server-to-server calls with complex or long prompts → Async. Avoids proxy timeouts (Azure Front Door: 240 s); returns in under 1 s, then poll for the result.
  • Webhook / Zapier / n8n → Async. Platforms enforce short timeouts; submit, get an ID, and process the result asynchronously.
  • Batch processing → Async. Submit N requests, collect IDs, and poll them all; no connections are held open.
  • Delayed messages ("send at 9 AM") → Scheduled. The message is held until scheduledAtUtc; maximum 7 days ahead.
  • Mobile or unreliable networks → Async. Resilient to network drops; poll when the client reconnects.
  • Non-urgent bulk summarization → Economy (coming soon). ~50% cost reduction via the OpenAI Batch API; up to 24 h latency.

Next steps