POST /v1/chat/stream

The primary endpoint. Send a message and get the AI response streamed back as Server-Sent Events. Pass a conversation_id to keep the thread alive across requests. Available on all plans.

Endpoint

http

POST https://api.sentientone.ai/v1/chat/stream

Headers

X-Api-Key: Your platform key. See Authentication.
X-Agent-Id: UUID of the agent to invoke.
Content-Type: Always application/json.

Request body

message (required): The user's message. Appended to the conversation history and sent to the LLM along with the agent's system prompt.
conversation_id (optional): Pass an existing id to continue a thread. Omit it and a new conversation is created automatically.
prompt (optional): Override the agent's configured system prompt for this request.
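
As a rough illustration of how the headers and body fields above fit together, here is a minimal Python sketch that builds the request without sending it. The helper name build_chat_request and the API_URL constant are made up for this example; only the URL, header names, and body fields come from this page.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://api.sentientone.ai/v1/chat/stream"


def build_chat_request(
    api_key: str,
    agent_id: str,
    message: str,
    conversation_id: Optional[str] = None,
    prompt: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/stream.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    body = {"message": message}  # the only required field
    # Optional fields are left out entirely when unset, so the server
    # creates a new conversation and uses the agent's own system prompt.
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    if prompt is not None:
        body["prompt"] = prompt

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Api-Key": api_key,
            "X-Agent-Id": agent_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen returns a response object that can be iterated line by line (as bytes), which suits an event stream.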

Example request

bash

curl -N -X POST https://api.sentientone.ai/v1/chat/stream \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: sk-so-YOUR_KEY" \
  -H "X-Agent-Id: YOUR_AGENT_ID" \
  -d '{
    "message": "Get the order details for OrderId: ORD-1234"
  }'

The -N flag disables curl's output buffering so each event prints the moment it arrives.

Response (200 OK) — Server-Sent Events

The response is Content-Type: text/event-stream. Read it line-by-line; each event is a JSON payload on its own data: line. The full answer is the concatenation of every delta event's content.

text

data: {"type":"meta","conversation_id":"conv-uuid-...","trace_id":"trace-uuid-..."}

data: {"type":"sources","sources":[{"index":1,"id":"doc-...","title":"Order ORD-1234","url":"https://...","source_type":"order","score":0.91,"snippet":"shipped via FedEx..."}]}

data: {"type":"delta","content":"Order ORD-1234 "}

data: {"type":"delta","content":"is shipped via FedEx "}

data: {"type":"delta","content":"(FX-998877)."}

data: {"type":"done","conversation_id":"conv-uuid-...","trace_id":"trace-uuid-..."}

meta: First event. Carries the conversation_id and trace_id; store the conversation_id now to keep history across requests.
sources: Emitted when retrieval ran. Each source has index, id, title, url, source_type, score, and snippet.
delta: Incremental token chunk. Concatenate every content value in order to build the full reply text.
done: Final event. Carries the conversation_id and trace_id; the stream closes after it.
error: Emitted when something goes wrong. Carries an error code and message.
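
To make the event types concrete, the following Python sketch folds a stream of already-decoded text lines into one result. The names iter_events, collect_reply, and StreamError are hypothetical. Because the exact field names of the error event are not listed here, the sketch attaches the whole event to the exception instead of guessing them.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an event of type "error"."""


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of every `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and anything else are skipped
        yield json.loads(line[len("data:"):].strip())


def collect_reply(lines: Iterable[str]) -> dict:
    """Fold a whole stream into ids, sources, and the full reply text."""
    result = {"conversation_id": None, "trace_id": None, "sources": [], "text": ""}
    chunks = []
    for event in iter_events(lines):
        kind = event.get("type")
        if kind == "meta":
            result["conversation_id"] = event.get("conversation_id")
            result["trace_id"] = event.get("trace_id")
        elif kind == "sources":
            result["sources"] = event.get("sources", [])
        elif kind == "delta":
            chunks.append(event.get("content", ""))
        elif kind == "error":
            # Field names of the error payload are not assumed here.
            raise StreamError(event)
        elif kind == "done":
            break  # the stream closes after this event
    result["text"] = "".join(chunks)
    return result
```

If the lines come from a raw HTTP response they will usually be bytes, so decode them as UTF-8 before handing them to collect_reply.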

What happens server-side

1. Authenticate: X-Api-Key resolves to your account; rate limits and plan checks fire.
2. Load the agent: X-Agent-Id selects the agent config (system prompt, model, provider key, parameters).
3. Resolve conversation: If you passed conversation_id, history is loaded. Otherwise a new conversation is created.
4. Persist your message: The new user message is saved to the conversation history before any LLM work starts.
5. Call the LLM: System prompt + all prior messages + new message are sent to the provider.
6. Run MCP tools if requested: If the model asks for a tool, the platform runs it and feeds the result back (up to 8 rounds by default, configurable per agent).
7. Stream and save: Tokens stream back as delta events while they generate; the final reply is saved and the stream closes with a done event.
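
The bounded tool loop in the MCP tools step can be pictured roughly as follows. This is only a toy model of the idea, not the platform's implementation: run_with_tools, the call_model and run_tool callables, and the message and reply shapes are all invented for illustration. What the platform does once the round limit is reached is not documented here, so this sketch simply raises.

```python
from typing import Callable

MAX_TOOL_ROUNDS = 8  # default quoted in the text; configurable per agent


def run_with_tools(
    call_model: Callable[[list], dict],
    run_tool: Callable[[dict], str],
    messages: list,
    max_rounds: int = MAX_TOOL_ROUNDS,
) -> str:
    """Call the model, running requested tools for at most max_rounds rounds.

    Hypothetical shapes: call_model returns either {"content": "..."} or
    {"tool_call": {...}}; tool results are appended as "tool" messages.
    """
    rounds = 0
    while True:
        reply = call_model(messages)
        tool_call = reply.get("tool_call")
        if tool_call is None:
            return reply["content"]  # plain answer: the loop is finished
        if rounds == max_rounds:
            # Assumption: behaviour at the cap is undocumented, so fail loudly.
            raise RuntimeError("tool round limit reached")
        # Run the tool and feed its result back for the next model call.
        messages.append({"role": "tool", "content": run_tool(tool_call)})
        rounds += 1
```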

Multi-turn conversations

The first request returns a conversation_id. Pass it back on every subsequent request and the agent sees the full prior thread, subject to the same model context window as a single long conversation.

bash

# 1) Start a thread
curl -N -X POST https://api.sentientone.ai/v1/chat/stream \
  -H "X-Api-Key: sk-so-..." -H "X-Agent-Id: ..." \
  -H "Content-Type: application/json" \
  -d '{ "message": "Hi, my order ORD-1234 has not arrived." }'
# → data: {"type":"meta","conversation_id":"conv-abc123","trace_id":"..."}
#   data: {"type":"delta","content":"..."}  (repeated)
#   data: {"type":"done","conversation_id":"conv-abc123","trace_id":"..."}

# 2) Continue it
curl -N -X POST https://api.sentientone.ai/v1/chat/stream \
  -H "X-Api-Key: sk-so-..." -H "X-Agent-Id: ..." \
  -H "Content-Type: application/json" \
  -d '{ "conversation_id": "conv-abc123", "message": "What carrier is it on?" }'
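
The same two-step flow can be wrapped in a small client-side object that remembers the conversation_id between calls. The sketch below is one possible shape, not an official client: the Conversation class and the send callable (which takes the request body and returns the stream's text lines) are assumptions made so the example runs without a network.

```python
import json
from typing import Callable, Iterable, Optional

# Hypothetical transport: takes the JSON body as a dict, returns the
# response stream as an iterable of text lines.
Transport = Callable[[dict], Iterable[str]]


class Conversation:
    """Keeps the conversation_id from the first reply and reuses it."""

    def __init__(self, send: Transport):
        self._send = send
        self.conversation_id: Optional[str] = None

    def ask(self, message: str) -> str:
        body = {"message": message}
        if self.conversation_id is not None:
            # Continue the existing thread instead of starting a new one.
            body["conversation_id"] = self.conversation_id

        chunks = []
        for line in self._send(body):
            if not line.startswith("data:"):
                continue  # skip blank separator lines
            event = json.loads(line[len("data:"):])
            kind = event.get("type")
            if kind == "meta":
                self.conversation_id = event["conversation_id"]
            elif kind == "delta":
                chunks.append(event["content"])
            elif kind == "done":
                break
        return "".join(chunks)
```

In real use, send would POST the body to /v1/chat/stream with the X-Api-Key and X-Agent-Id headers and yield the decoded response lines.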

Working with the stream

Every reply arrives as Server-Sent Events — there is no single-JSON mode. See Streaming for the full event reference and a client read loop, including how to concatenate delta content into the complete answer.