SentientOne Beta v1.0.1417
API Documentation

SentientOne AI Platform

Send a single POST request with your platform API key, agent ID header, and message — get an AI-powered response from a pre-configured agent. No LLM setup needed in your code. Agents are created and managed through the SentientOne AI platform.

POST https://api.sentientone.ai/v1/chat
X-Api-Key: sk-so-your_platform_key
X-Agent-Id: your-agent-uuid

Overview

SentientOne AI is an agent orchestration platform. Instead of hardcoding LLM calls in your application, you configure agents through the SentientOne platform — each with its own provider, model, system prompt, and parameters. Your application then calls a single POST endpoint with two headers (your API key and the agent ID) plus a message body, and gets an AI response back.

Why SentientOne — request flow architecture

Configure Agents

Create specialized agents in the platform UI with system prompts, model selection, and MCP tool definitions.

One Endpoint

Call POST /v1/chat on api.sentientone.ai with X-Api-Key and X-Agent-Id headers from any language or platform.

Streaming Support

Get responses as standard JSON or real-time Server-Sent Events (SSE) for token-by-token streaming.

How It Works

1. Create an Agent in the SentientOne Platform

Log into SentientOne AI and create an agent. Configure its system prompt, choose the LLM provider (OpenAI/Anthropic), model, temperature, and provide the provider API key. Connect MCP servers so the agent can call your tools. Each agent gets a unique Agent ID — copy it from the Agents page.

2. Get Your Platform API Key

Your SentientOne platform key (sk-so-…) is found in Settings. Send it in every request as the X-Api-Key header. This is separate from the LLM provider key — the platform key authenticates your application, while each agent's LLM key is stored server-side and never exposed.

3. Call POST /v1/chat on api.sentientone.ai

From any HTTP client — send a POST request with X-Api-Key (your platform key) and X-Agent-Id (the agent UUID) headers, plus a JSON body containing message. The platform loads the agent config, injects the system prompt, runs any MCP tool calls, manages conversation history, and returns the full response.
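The three steps above collapse into a single function in application code. Here is a minimal Python sketch using the requests library; the key, agent ID, and message values are placeholders:

```python
import requests

CHAT_URL = "https://api.sentientone.ai/v1/chat"

def build_headers(api_key, agent_id):
    """Assemble the three headers required on every request."""
    return {
        "Content-Type": "application/json",
        "X-Api-Key": api_key,    # platform key from Settings → API Key
        "X-Agent-Id": agent_id,  # agent UUID from the Agents page
    }

def ask_agent(api_key, agent_id, message):
    """POST one message to the configured agent and return the parsed JSON."""
    res = requests.post(
        CHAT_URL,
        headers=build_headers(api_key, agent_id),
        json={"message": message},
        timeout=60,
    )
    res.raise_for_status()
    return res.json()

# Usage (placeholder credentials):
# reply = ask_agent("sk-so-your_api_key_here", "a1b2c3d4-...", "Hello!")
# print(reply["message"]["content"])
```

Note there is no LLM SDK anywhere in this snippet — model, prompt, and provider key all live in the agent configuration.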

Why Configure Agents?

Agents are the core abstraction that makes SentientOne AI powerful. Instead of embedding LLM configuration in your application code, you define specialized agents — each tailored for a specific task. Here's why this matters:

1. Separation of Concerns

Your application code stays clean — just a POST call. All LLM-specific logic (system prompt, model choice, temperature, API keys) lives in the agent configuration. Change the model from GPT-4o to Claude without modifying a single line of application code.

2. Task-Specific Specialization

Each agent has a focused system prompt. An "Order Agent" knows how to query order details via MCP tools. A "Product Agent" understands product catalogs. A "Support Agent" handles customer inquiries. The system prompt constrains the LLM to excel at one specific domain.

3. MCP Integration Ready

If your company exposes APIs through MCP (Model Context Protocol), you configure agents whose system prompts instruct the LLM to use those MCP tools. The agent becomes the bridge between your MCP server and any application that needs AI-powered access to your data.

4. Per-Agent API Keys & Models

Each agent carries its own LLM API key and model configuration. Use GPT-4o for complex reasoning tasks and Claude Haiku for fast classification. Different departments can use different keys for cost tracking.

5. Zero LLM Code in Your App

No OpenAI SDK, no prompt engineering, no token management in your codebase. Your app sends a message and gets a response — the platform handles everything else. This means faster development, easier testing, and no vendor lock-in at the application layer.

Authentication

All API requests require two headers: your platform API key and the agent ID. The platform key authenticates your application; the agent ID selects which configured agent handles the request.

Required Headers

X-Api-Key (string, required)

Your SentientOne platform API key (sk-so-…). Find this in Settings → API Key. Authenticates your application with the platform.

X-Agent-Id (string, required)

The UUID of the agent to invoke. Copy this from the Agents page — it's shown prominently below the agent name.

Content-Type (string, required)

Must be application/json.

bash
curl -X POST https://api.sentientone.ai/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: sk-so-your_api_key_here" \
  -H "X-Agent-Id: a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -d '{ "message": "Hello!" }'

Keep your API key secure

Never expose your API key in client-side code, public repositories, or browser network requests. Call the SentientOne API from your backend server and proxy responses to your frontend.

Header Reference

Header       | Example Value    | Where to find it
X-Api-Key    | sk-so-abc123…    | Settings → API Key
X-Agent-Id   | a1b2c3d4-…       | Agents page → Agent ID chip
Content-Type | application/json | Always required

Chat API

The primary endpoint for all agent interactions

This is the only endpoint your application needs to call. Send a message to any configured agent and receive an AI-powered response. The platform handles system prompt injection, conversation history, LLM routing, and response storage.

POST https://api.sentientone.ai/v1/chat

POST /v1/chat — Standard Response

Send a message and receive the complete AI response in a single JSON payload. Agent ID is passed as a header — no agent_id in the body.

bash
curl -X POST https://api.sentientone.ai/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: sk-so-your_api_key_here" \
  -H "X-Agent-Id: a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -d '{
    "message": "Get the order details for OrderId: ORD-1234 and UserId: USR-5678"
  }'

Request Headers

X-Api-Key (string, required)

Your SentientOne platform API key (sk-so-…). From Settings → API Key.

X-Agent-Id (string, required)

The UUID of the agent to invoke. From Agents page → Agent ID.

Content-Type (string, required)

Must be application/json.

Request Body

message (string, required)

The user's message or prompt. Appended to the conversation history and sent to the LLM along with the agent's system prompt.

conversation_id (string)

Optional. Pass an existing conversation ID to continue a multi-turn conversation with full history context. If omitted, a new conversation is created automatically.

Response (200 OK):

json
{
  "conversation_id": "conv-uuid-...",
  "message": {
    "id": "msg-uuid-...",
    "conversation_id": "conv-uuid-...",
    "role": "assistant",
    "content": "The order ORD-1234 is currently shipped via FedEx (tracking FX-998877)...",
    "token_count": 156,
    "created_at": "2026-03-28T14:30:00Z"
  },
  "tool_calls": [
    {
      "name": "get_order",
      "input": { "order_id": "ORD-1234", "user_id": "USR-5678" },
      "output": "{\"status\":\"shipped\",\"carrier\":\"FedEx\",\"tracking\":\"FX-998877\"}"
    }
  ]
}

Response Fields

conversation_id (string)

The conversation this message belongs to. Store this to continue the conversation in follow-up requests.

message.id (string)

Unique ID of the saved assistant message.

message.role (string)

Always "assistant" for responses.

message.content (string)

The AI response text. Format depends on the agent's output_type (text, json, markdown, or code).

message.token_count (number)

Total tokens used across all LLM calls in this request (including tool-use rounds).

tool_calls (array)

Only present if the agent executed MCP tools. Each entry has name, input, and output.
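In practice, most applications extract a handful of these fields. A small Python sketch — the sample payload mirrors the 200 OK example above, with illustrative values:

```python
# Sample payload shaped like the 200 OK response (values illustrative).
sample = {
    "conversation_id": "conv-abc-123",
    "message": {
        "id": "msg-1",
        "role": "assistant",
        "content": "The order ORD-1234 is currently shipped via FedEx...",
        "token_count": 156,
    },
    "tool_calls": [
        {"name": "get_order",
         "input": {"order_id": "ORD-1234"},
         "output": "{\"status\":\"shipped\"}"},
    ],
}

def summarize_response(payload):
    """Pull out the fields most applications need from a chat response."""
    return {
        "conversation_id": payload["conversation_id"],  # store for follow-ups
        "text": payload["message"]["content"],
        "tokens": payload["message"]["token_count"],
        # tool_calls is only present when the agent executed MCP tools,
        # so default to an empty list
        "tools_used": [t["name"] for t in payload.get("tool_calls", [])],
    }
```

Using `.get("tool_calls", [])` matters: the field is absent entirely, not empty, when no tools ran.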

What happens when you call this endpoint

  1. X-Api-Key is validated — resolves to your account.
  2. The agent is loaded via X-Agent-Id — its system prompt, model, provider, LLM API key, and parameters.
  3. If no conversation_id is provided in the body, a new conversation is created.
  4. Your message is saved to the conversation history.
  5. The full history (system prompt + all prior messages + new message) is sent to the LLM.
  6. If the LLM requests MCP tool calls, they are executed and results fed back (up to 8 rounds).
  7. The final response is saved to the conversation and returned with an optional tool_calls array.

Streamable HTTP

For real-time token-by-token responses, use the Streamable HTTP endpoint. Send the same headers as the standard endpoint and include Accept: text/event-stream — the server responds with a Server-Sent Events stream. Ideal for chat UIs that want to display text as it generates, or for monitoring MCP tool execution in real time.

POST /v1/chat/stream · Returns text/event-stream
bash
curl -N -X POST https://api.sentientone.ai/v1/chat/stream \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -H "X-Api-Key: sk-so-your_api_key_here" \
  -H "X-Agent-Id: a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -d '{ "message": "Get the order details for OrderId: ORD-1234" }'

Event stream format:

text
data: {"type":"meta","conversation_id":"conv-uuid-..."}

data: {"type":"tool_call","id":"tc-1","name":"get_order","input":{"order_id":"ORD-1234"}}
data: {"type":"tool_result","id":"tc-1","name":"get_order","output":"{\"status\":\"shipped\"}"}

data: {"type":"delta","content":"The order ORD-1234 is"}
data: {"type":"delta","content":" currently shipped via FedEx"}
...
data: {"type":"done","message":{"id":"msg-uuid","role":"assistant","content":"...","token_count":156}}

Event Types

meta (event)

First event. Contains conversation_id for this session. Store it for multi-turn follow-ups.

tool_call (event)

Emitted when the LLM decides to call an MCP tool. Contains id, name, and input.

tool_result (event)

Emitted after the tool executes. Contains id, name, and output. Matches the preceding tool_call by id.

delta (event)

Streamed text chunk. Concatenate all content values to build the full response.

done (event)

Final event. Contains the complete saved message object including token_count.

error (event)

Emitted on failure. Contains code and message fields matching the standard error envelope.
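A minimal Python parser for this event format can be sketched as follows; it assumes each SSE `data:` line carries one complete JSON object, as in the example stream above:

```python
import json

def parse_sse_lines(lines):
    """Turn raw SSE lines into event dicts; blank and non-data lines are skipped."""
    events = []
    for line in lines:
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

def collect_text(events):
    """Concatenate all delta chunks into the full response text."""
    return "".join(e["content"] for e in events if e.get("type") == "delta")
```

The concatenated delta text should match the `content` field of the final `done` event's message object.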

JavaScript — reading the stream

javascript
const res = await fetch("https://api.sentientone.ai/v1/chat/stream", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
    "X-Api-Key": "sk-so-your_api_key_here",
    "X-Agent-Id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  },
  body: JSON.stringify({ message: "Get the order details for OrderId: ORD-1234" }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let fullContent = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const event = JSON.parse(line.slice(6));
    if (event.type === "meta") console.log("conversation:", event.conversation_id);
    if (event.type === "tool_call") console.log("Calling tool:", event.name, event.input);
    if (event.type === "tool_result") console.log("Tool result:", event.output);
    if (event.type === "delta") { fullContent += event.content; process.stdout.write(event.content); }
    if (event.type === "done") console.log("\nTokens used:", event.message.token_count);
  }
}

Multi-Turn Conversations

The platform maintains full conversation history. To continue a conversation, pass the conversation_id from a previous response.

bash
# First message — starts a new conversation
curl -X POST https://api.sentientone.ai/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: sk-so-your_api_key_here" \
  -H "X-Agent-Id: ORDER_AGENT_UUID" \
  -d '{ "message": "Look up order ORD-1234 for user USR-5678" }'
# Response: { "conversation_id": "conv-abc-123", "message": { ... } }

# Follow-up — continues the same conversation with full context
curl -X POST https://api.sentientone.ai/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: sk-so-your_api_key_here" \
  -H "X-Agent-Id: ORDER_AGENT_UUID" \
  -d '{ "message": "What is the delivery ETA for that order?", "conversation_id": "conv-abc-123" }'

The agent sees the full conversation history, so it knows "that order" refers to ORD-1234 without you needing to repeat it. This enables natural, contextual follow-up queries.
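The same two-call pattern can be sketched in Python. Here `post` stands in for whatever HTTP helper your application uses to send the body and parse the JSON response (a hypothetical callable, not a platform API):

```python
def build_chat_body(message, conversation_id=None):
    """Build the /v1/chat body; omit conversation_id to start a new conversation."""
    body = {"message": message}
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    return body

def run_turns(post, messages):
    """Send messages in order, threading conversation_id between turns.

    `post` is any callable that takes a body dict and returns the parsed
    response dict (e.g. a thin wrapper around requests.post)."""
    conversation_id = None
    replies = []
    for message in messages:
        res = post(build_chat_body(message, conversation_id))
        conversation_id = res["conversation_id"]  # reuse on the next turn
        replies.append(res["message"]["content"])
    return replies
```

The first turn omits conversation_id, so the platform creates a conversation; every later turn passes the ID back so the agent keeps full context.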

Error Code Reference

All error responses share a consistent JSON envelope. Every error includes a machine-readable code field so you can handle each failure case precisely, independent of HTTP status.

Standard (JSON) error envelope

json
{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description"
  }
}

Streaming errors arrive as SSE events

text
data: {"type": "error", "code": "ERROR_CODE", "message": "..."}

Authentication Errors

MISSING_API_KEY · 401 · Auth · Not retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The X-Api-Key header was not included in the request.

Resolution

bash
X-Api-Key: sk-so-xxxxxxxxxxxxxxxx

INVALID_API_KEY · 401 · Auth · Not retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The provided API key does not match any active user account. The key may be mistyped or truncated, the account may be deactivated, or the key was rotated.

Resolution

Double-check the key value in your .env or secrets store. Log in to the dashboard and verify your API key under Settings → API Key. Generate a new key if the old one was rotated or revoked.

Plan & Quota Errors

QUOTA_EXCEEDED · 429 · Quota · Not retryable
Endpoints: /v1/chat, /v1/chat/stream · Extra fields: used, limit, upgrade_url

Why it happens

Your account has consumed all of its monthly API call quota.

Resolution

Wait for the quota to reset at the start of your next billing cycle, or upgrade your plan at the URL provided in upgrade_url.

RATE_LIMIT_EXCEEDED · 429 · Rate Limit · Retryable
Endpoints: /v1/chat, /v1/chat/stream (shared counter) · Extra fields: retry_after

Why it happens

The agent has exceeded its per-minute request rate limit (rate_limit_rpm). Both /v1/chat and /v1/chat/stream share the same per-agent counter. Starter plans default to 30 req/min, Pro plans to 50 req/min.

Resolution

Implement exponential backoff and retry after the number of seconds indicated in the X-RateLimit-Reset response header. Reduce request concurrency and cache responses where possible. See the Rate Limiting section for full details.

SUBSCRIPTION_CANCELED · 403 · Plan · Not retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The account owner's subscription has been canceled. API access is disabled until the subscription is reactivated.

Resolution

Reactivate the subscription from the Settings → Billing page in the dashboard, or contact support if you believe this is an error.

PLAN_RESTRICTED · 403 · Plan · Not retryable
Endpoints: /v1/chat/stream · Extra fields: upgrade_url

Why it happens

The streaming endpoint requires a Pro or Enterprise plan. Your account is on the Starter plan.

Resolution

Upgrade your plan at the URL provided in upgrade_url, or use the non-streaming /v1/chat endpoint which is available on all plans.

Request Validation Errors

MISSING_AGENT_ID · 400 · Validation · Not retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

No agent ID was provided. The API requires an X-Agent-Id request header.

Resolution

Pass the agent UUID in the request header:

bash
X-Agent-Id: 736c5015-f67e-4fe5-9932-590d9048b014

MISSING_MESSAGE · 400 · Validation · Not retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The message field in the request body was empty or missing.

Resolution

json
{ "message": "Your question here" }

VALIDATION_ERROR · 422 · Validation · Not retryable
Endpoints: All · Extra fields: details (array of field-level errors)

Why it happens

The request body failed schema validation. Common causes: sending a non-JSON Content-Type, wrong field types (e.g. a number where a string is expected), or a completely empty body when fields are required.

Resolution

Check the details array in the error response — it lists each invalid field and the reason. Ensure you are sending Content-Type: application/json.

Agent Errors

AGENT_NOT_FOUND · 404 · Agent · Not retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

No agent exists with the provided ID in the database.

Resolution

Verify the agent UUID in the dashboard under Agents. Ensure you are targeting the correct environment — staging and production have separate databases and separate agent IDs.

AGENT_INACTIVE · 403 · Agent · Not retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The agent exists but has been disabled (is_active = false). An is_active check is always performed against the database on every request — even if the agent config is otherwise cached.

Resolution

Re-enable the agent in the dashboard under Agents → Settings → Publish. If you believe this is an error, check whether the agent was deactivated by an admin or automated process.

LLM Provider Errors

These errors originate from the underlying AI provider (Anthropic or OpenAI) configured on the agent, not from the SentientOne platform itself.

LLM_AUTH_ERROR · 502 · LLM Provider · Not retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The API key stored on the agent for its AI provider (Anthropic/OpenAI) is invalid or has been revoked.

Resolution

Go to Agents → Settings → AI Provider in the dashboard and update the provider API key with a valid one from your Anthropic or OpenAI account. Contact support if you did not change the key.

LLM_RATE_LIMITED · 429 · LLM Provider · Retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The AI provider is rate-limiting requests from the agent's API key. This is a provider-level limit, entirely separate from the SentientOne API rate limit.

Resolution

Retry after a short delay — provider limits typically reset within seconds to minutes. If this happens frequently, request a rate limit increase from your AI provider account, or distribute load across multiple agent configurations with different provider keys.

LLM_UNAVAILABLE · 503 · LLM Provider · Retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The AI provider is temporarily unavailable or overloaded: Anthropic HTTP 529 (overloaded) or 503, or an OpenAI connection timeout / network failure.

Resolution

Retry with exponential backoff — provider outages are typically brief. Check the provider's status page: status.anthropic.com or status.openai.com.

LLM_ERROR · 502 · LLM Provider · Maybe retryable
Endpoints: /v1/chat, /v1/chat/stream

Why it happens

The AI provider returned an unexpected error — not a rate limit, auth, or availability issue. Common causes: unsupported model name, malformed tool/function definition, or a provider-side bug.

Resolution

Check the agent's configured model name is valid and available in your provider account. Review any custom tool definitions attached to the agent for schema issues. Contact support with your conversation_id so the underlying provider error can be investigated.

Server Errors

INTERNAL_ERROR · 500 · Server · Retryable
Endpoints: All

Why it happens

An unexpected exception occurred that was not caught by a more specific error handler. This should not happen in normal operation.

Resolution

Retry the request — transient infrastructure issues may resolve on their own. If the error persists, contact support with: the conversation_id from a previous successful response, the full request payload, and the timestamp of the failing request.

Quick Reference

Code                  | HTTP | Category     | Retryable | Action
MISSING_API_KEY       | 401  | Auth         | No        | Fix the request
INVALID_API_KEY       | 401  | Auth         | No        | Fix the key
QUOTA_EXCEEDED        | 429  | Quota        | No        | Upgrade or wait for reset
RATE_LIMIT_EXCEEDED   | 429  | Rate Limit   | Yes       | Back off and retry
SUBSCRIPTION_CANCELED | 403  | Plan         | No        | Reactivate subscription
PLAN_RESTRICTED       | 403  | Plan         | No        | Upgrade plan
MISSING_AGENT_ID      | 400  | Validation   | No        | Fix the request
MISSING_MESSAGE       | 400  | Validation   | No        | Fix the request
VALIDATION_ERROR      | 422  | Validation   | No        | Fix the request
AGENT_NOT_FOUND       | 404  | Agent        | No        | Verify agent ID
AGENT_INACTIVE        | 403  | Agent        | No        | Re-enable agent
LLM_AUTH_ERROR        | 502  | LLM Provider | No        | Fix provider key
LLM_RATE_LIMITED      | 429  | LLM Provider | Yes       | Retry with backoff
LLM_UNAVAILABLE       | 503  | LLM Provider | Yes       | Retry with backoff
LLM_ERROR             | 502  | LLM Provider | Maybe     | Contact support
INTERNAL_ERROR        | 500  | Server       | Yes       | Retry, then contact support
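One way to act on this table in client code is a small dispatch on the machine-readable `code` field. A sketch — the code sets below are taken directly from the Retryable column above:

```python
# Codes safe to retry (with backoff), per the quick reference table.
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "LLM_RATE_LIMITED", "LLM_UNAVAILABLE", "INTERNAL_ERROR"}
MAYBE_RETRYABLE = {"LLM_ERROR"}

def classify_error(body):
    """Return 'retry', 'maybe_retry', or 'fail' for an error response body."""
    code = body.get("error", {}).get("code", "")
    if code in RETRYABLE:
        return "retry"
    if code in MAYBE_RETRYABLE:
        return "maybe_retry"
    return "fail"  # not retryable — fix the request, key, or plan
```

Dispatching on `code` rather than HTTP status matters here: 429, for instance, covers both QUOTA_EXCEEDED (not retryable) and RATE_LIMIT_EXCEEDED (retryable).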

Rate Limiting

Every agent has a per-minute request limit. Both /v1/chat and /v1/chat/stream share the same counter — a request to either endpoint counts toward your agent's allowance.

  • Starter Plan: 30 req/min per agent
  • Pro Plan: 50 req/min per agent

Need higher limits? Enterprise plans support custom rate limits per agent. Contact your account manager or reach out via the dashboard.

Response Headers

Every API response includes rate limit headers so your application always knows where it stands:

Header                | Type    | Description
X-RateLimit-Limit     | integer | Maximum requests allowed per 60-second window
X-RateLimit-Remaining | integer | Requests remaining in the current window
X-RateLimit-Reset     | integer | Seconds until the current window resets

Example — normal response headers

text
HTTP/2 200
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 29
X-RateLimit-Reset: 58

Handling 429 Errors

When you exceed the rate limit, the API returns HTTP 429. The response includes the same rate limit headers plus a standard error body:

429 response headers

text
HTTP/2 429
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 42

429 response body

json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Please retry after 42 seconds."
  }
}

The X-RateLimit-Reset header tells you exactly how many seconds to wait. Always use this value instead of hard-coding a retry delay.

Retry Strategy

We recommend exponential backoff with the X-RateLimit-Reset header as the minimum wait time. Here are ready-to-use examples:

Python — retry with backoff

python
import time, requests

def chat_with_retry(base_url, headers, payload, max_retries=3):
    for attempt in range(max_retries + 1):
        res = requests.post(f"{base_url}/v1/chat", headers=headers, json=payload)

        if res.status_code != 429:
            return res.json()

        reset = int(res.headers.get("X-RateLimit-Reset", 60))
        wait = reset + (2 ** attempt)  # backoff on top of reset
        print(f"Rate limited. Retrying in {wait}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(wait)

    raise Exception("Rate limit exceeded after max retries")

JavaScript — retry with backoff

javascript
async function chatWithRetry(baseUrl, headers, payload, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(`${baseUrl}/v1/chat`, {
      method: "POST",
      headers,
      body: JSON.stringify(payload),
    });

    if (res.status !== 429) return res.json();

    const reset = parseInt(res.headers.get("X-RateLimit-Reset") ?? "60", 10);
    const wait = (reset + 2 ** attempt) * 1000;
    console.log(`Rate limited. Retrying in ${wait / 1000}s (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise((r) => setTimeout(r, wait));
  }
  throw new Error("Rate limit exceeded after max retries");
}

Best practices

Read the headers proactively

Check X-RateLimit-Remaining on every response. If it's approaching zero, throttle outgoing requests before you hit the limit.

Use the reset header, not a fixed delay

Hard-coding sleep(60) wastes time if the window resets sooner. Always read X-RateLimit-Reset for the exact wait.

Queue and batch where possible

If your application generates bursts of requests, implement a client-side queue that spaces them out evenly across the 60-second window.

Cache responses

If multiple users ask the same question, cache the agent's response on your side to avoid consuming your rate limit with duplicate calls.
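The caching advice can be sketched as a small wrapper. Here `post` stands in for your actual chat call — any hypothetical callable of the shape `(agent_id, message) -> response dict`:

```python
def make_cached_chat(post):
    """Wrap a chat call with an in-memory cache keyed by (agent_id, message).

    Repeated identical questions are served from the cache instead of
    consuming rate limit and tokens. For production you would likely add
    a TTL and a size bound (e.g. via functools.lru_cache or cachetools)."""
    cache = {}

    def cached_chat(agent_id, message):
        key = (agent_id, message)
        if key not in cache:
            cache[key] = post(agent_id, message)  # only hit the API on a miss
        return cache[key]

    return cached_chat
```

Note this is only appropriate for stateless, single-turn questions — multi-turn conversations depend on history and should not be cached this way.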

Performance & Cost Optimization

Calling an LLM directly is simple — until you need it to be fast, reliable, and affordable at scale. SentientOne handles the hardest parts of LLM integration automatically, so you get production-grade performance without building any of this yourself.

Prompt Caching

Every agent has a system prompt, tool definitions, and often a base set of instructions that are identical across every request. When you call an LLM directly, these tokens are re-processed and re-billed on every single call.

Without SentientOne

  • System prompt re-sent on every request
  • Tool schemas re-tokenised every call
  • You pay full input token cost each time
  • Requires custom caching logic per provider

With SentientOne

  • Prompt cache hits use provider-native caching (Anthropic, OpenAI)
  • Repeated system prompts and tool definitions are cached automatically
  • Up to 90% reduction in input token costs for cached prefixes
  • Zero configuration — works out of the box for all agents

Automatic Retries & Failover

LLM providers have transient failures — rate limits, overload errors, network timeouts. When you call them directly, you need to build retry logic, backoff strategies, and error classification yourself. SentientOne handles all of this transparently.

Provider-Level Retries

If Anthropic returns a 529 (overloaded) or OpenAI returns a 503, SentientOne automatically retries with exponential backoff — up to 3 attempts. Your application never sees the transient failure.

Intelligent Error Classification

Not all errors should be retried. SentientOne classifies every provider error — auth failures return immediately, rate limits wait and retry, server errors use backoff. You get the right behaviour without writing error-handling code.

Timeout Protection

Long-running LLM calls are bounded with configurable timeouts. If a provider hangs, the request is cleanly terminated and reported — your application doesn't block indefinitely.

Streaming Resilience

Streaming responses are monitored for stalls. If a stream stops producing chunks, it is detected and surfaced as an error event rather than leaving your client waiting forever.

Token Optimization

Token usage directly impacts your LLM costs. SentientOne applies several techniques to keep token consumption as low as possible without sacrificing response quality.

Smart Conversation Truncation

Long conversation histories are automatically truncated to fit within the model's context window while preserving the most recent and relevant messages. You don't need to manage context windows yourself.

Efficient Tool Definitions

MCP tool schemas are optimised before being sent to the LLM. Redundant descriptions and unnecessary metadata are stripped to reduce prompt token usage on every request.

Response Cost Tracking

Every request logs prompt tokens, completion tokens, and USD cost. You can identify expensive agents or conversations and optimise system prompts to reduce spend — data most direct-call setups never capture.

Model-Aware Encoding

Token counting and context management use the correct tokeniser for each model (cl100k for GPT-4, Claude's tokeniser for Anthropic). This avoids silent truncation or unexpected overflows that happen when using a generic token counter.

Why This Matters

Building all of this yourself is possible — but it takes significant engineering effort, ongoing maintenance, and deep familiarity with each LLM provider's quirks.

Capability                | Direct LLM Call           | SentientOne
Prompt caching            | Build per provider        | Automatic
Retry with backoff        | Build & maintain          | Built-in
Error classification      | Parse each provider       | Standardised codes
Context window management | Manual truncation         | Automatic
Token cost tracking       | DIY logging               | Per-request, per-agent
Streaming resilience      | Handle stalls yourself    | Monitored & reported
Multi-provider support    | Separate SDK per provider | One API, any model

Bottom line: SentientOne gives you production-grade LLM infrastructure from day one. You write one API call — we handle caching, retries, token management, cost tracking, and multi-provider support behind the scenes. Your team ships faster, your costs stay lower, and you don't maintain any of the plumbing.

Observability

Every agent interaction is automatically captured by SentientOne. You get full visibility into requests, responses, token usage, latency, and cost — across every conversation, every agent, every day. These are insights that would take significant engineering effort to implement independently in each application your organisation runs.

Zero instrumentation required

You don't add logging libraries, configure tracing sidecars, or write cost-calculation middleware. Every call to /v1/chat is recorded automatically — your application code stays clean.

Full Request / Response Logs

Every message sent to an agent and every response it generates is stored and viewable in the platform. Inspect the exact prompt history, system prompt injections, and LLM output for any conversation — invaluable for debugging unexpected agent behaviour.

Token Usage per Conversation

Prompt tokens, completion tokens, and total tokens are tracked per request and aggregated per agent. Spot which agents or conversation flows are consuming the most tokens and optimise system prompts accordingly.

Response Time Tracking

End-to-end latency is recorded for every request — including time spent in MCP tool calls. Identify slow agents, slow tools, or LLM provider latency spikes without instrumenting a single line of your own code.

Cost per Conversation

Based on the token counts and the per-token pricing (the agent's cost_per_1m_tokens setting), the platform calculates the exact LLM cost for every request. Roll up by agent, by day, or across your whole organisation to track AI spend precisely.

What is captured per request

Field             | Description
request_messages  | Full conversation history sent to the LLM (up to 50 messages)
response_content  | Complete LLM response text
prompt_tokens     | Tokens consumed by the input / prompt
completion_tokens | Tokens consumed by the generated response
total_tokens      | Sum across all LLM rounds including tool-use iterations
cost              | Calculated LLM cost in USD based on agent's cost_per_1m_tokens
provider / model  | LLM provider and model name used for this request
status            | success or error, with error message if applicable

Why this matters at scale

1. One place, all agents. A typical organisation running 5–20 agents across multiple teams would need separate logging, monitoring, and cost-tracking implementations per app. SentientOne centralises all of this automatically.

2. Audit and compliance. Every LLM interaction is logged with timestamps, user IDs, and conversation IDs — ready for compliance reviews, security audits, or dispute resolution.

3. Cost control. Know exactly which agents, users, or workflows are driving AI spend before your LLM bill arrives. Set up alerts or impose limits at the agent level.

4. Prompt engineering feedback loop. Compare token counts and response quality across system prompt iterations to find the most efficient and accurate configuration for each agent.

Security

Security isn't an afterthought — it's built into every layer of the SentientOne AI platform. From how we handle your API keys to how data flows between services, we follow industry-leading standards to keep your information protected.

Encryption at Rest & In Transit

All data is encrypted using AES-256 at rest and TLS 1.3 in transit. API keys, LLM provider credentials, and conversation data are never stored in plaintext.

API Key Authentication

Every request is authenticated via scoped API keys. Keys are hashed before storage, rate-limited per key, and can be rotated or revoked instantly from the dashboard.

Data Isolation

Each organization's agents, conversations, and credentials are fully isolated. Row-level security policies ensure no cross-tenant data access, even at the database layer.

Audit Logging

Every API call, agent configuration change, and authentication event is logged with timestamps and user context. Full audit trails for compliance and forensics.

Compliance & Standards

  • SOC 2 — Type II
  • GDPR — Compliant
  • ISO 27001 — Certified
  • OWASP — Top 10 Covered

LLM Provider Key Security

Your OpenAI, Anthropic, or other provider API keys are encrypted with per-organization encryption keys and stored in a dedicated secrets vault. They are only decrypted server-side at the moment of an LLM call and are never exposed in API responses — the platform returns masked values (e.g. ••••••••sk-4f2a).

No Data Training

Your conversations and agent prompts are never used to train any models. Data flows through the platform to the LLM provider and back — we don't retain, analyze, or share your content beyond what's needed to deliver the service.

Role-Based Access Control

Admins manage agents and API keys. Users interact through the chat interface. API consumers are scoped to specific agents. Each role has precisely the permissions it needs — nothing more.

Hosting & Deployment

Your data, your rules. SentientOne AI runs wherever your security and compliance requirements demand — in the cloud, on your own servers, or a hybrid of both. You choose where your data lives.

Cloud Hosted

Fastest way to get started

  • Fully managed by SentientOne — zero infrastructure to maintain
  • Auto-scaling to handle traffic spikes without config changes
  • Global CDN with edge routing for low-latency API calls
  • Automatic updates, patches, and security fixes
  • 99.9% uptime SLA with multi-region failover

On-Premise

Maximum control & compliance

  • Deploy on your own servers, VPC, or private cloud
  • Data never leaves your network — full sovereignty
  • Integrate with your existing SSO, LDAP, and IAM policies
  • Air-gapped deployment option for regulated industries
  • Custom retention policies and data residency controls

Hybrid Deployment

Need the best of both worlds? Run the agent orchestration layer in the cloud for simplicity while keeping sensitive data processing on-premise. Or use cloud for development and staging, with on-premise for production.

How every request flows

Your App (any platform) → POST /v1/chat → SentientOne AI (API gateway) → Agent Config (prompt · model · tools) → LLM + MCP Tools (OpenAI · Anthropic)

Zero AI code in your application. The response includes conversation_id, message, and tool_calls.

Data Residency

Choose where your data is stored — US, EU, APAC, or your own data center. Meet regional compliance requirements without compromising performance.

Zero-Downtime Updates

Platform updates are rolled out with blue-green deployments. No maintenance windows, no service interruptions. On-premise customers control their own update schedule.

Disaster Recovery

Automated backups, point-in-time recovery, and cross-region replication. Your agent configurations and conversation history are always recoverable.

Real-World Use Cases

Example 1: Order Management Agent

A company has an MCP server that exposes order management tools. They create an agent in the SentientOne platform that instructs the LLM to use those tools to fetch order details, delivery status, and product information.

Step 1 — Create the agent in the SentientOne platform UI:

  • Name — Order Agent
  • Provider — OpenAI
  • Model — gpt-4o
  • Temperature — 0.2
  • Output Type — JSON
  • System Prompt — "You are an Order Management Assistant connected to the company MCP server. You have access to: get_order(order_id, user_id), get_delivery_status(order_id), get_order_products(order_id). When asked about an order, extract the OrderId and UserId, call the appropriate tools, and return a JSON summary with status, items, prices, delivery estimate, and tracking number."

Step 2 — Call the agent from your application:

bash
# From your e-commerce backend, customer portal, or mobile app
curl -X POST https://api.sentientone.ai/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: sk-so-your_api_key_here" \
  -H "X-Agent-Id: ORDER_AGENT_UUID" \
  -d '{ "message": "Get full order details including delivery and products for OrderId: ORD-78923 and UserId: USR-4412" }'

Step 3 — Receive structured response:

json
{
  "conversation_id": "conv-uuid-...",
  "message": {
    "id": "msg-uuid-...",
    "conversation_id": "conv-uuid-...",
    "role": "assistant",
    "content": "{\n  \"order_id\": \"ORD-78923\",\n  \"status\": \"shipped\",\n  \"customer\": \"USR-4412\",\n  \"items\": [\n    { \"name\": \"Wireless Headphones\", \"qty\": 1, \"price\": 89.99 },\n    { \"name\": \"USB-C Cable\", \"qty\": 2, \"price\": 12.99 }\n  ],\n  \"total\": 115.97,\n  \"delivery\": {\n    \"carrier\": \"FedEx\",\n    \"tracking\": \"FX-998877\",\n    \"estimated_delivery\": \"2026-04-07\",\n    \"status\": \"in_transit\"\n  }\n}",
    "token_count": 198,
    "created_at": "2026-03-28T14:30:00Z"
  },
  "tool_calls": [
    {
      "name": "get_order",
      "input": { "order_id": "ORD-78923", "user_id": "USR-4412" },
      "output": "{\"status\":\"shipped\",\"carrier\":\"FedEx\"}"
    }
  ]
}
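Because the Order Agent's output type is JSON, message.content arrives as a JSON-encoded string that must be decoded before use. A minimal sketch, using an abridged copy of the Step 3 response:

```python
import json

# Abridged /v1/chat response from Step 3 — content is a JSON string
response = {
    "conversation_id": "conv-uuid-...",
    "message": {
        "role": "assistant",
        "content": '{"order_id": "ORD-78923", "status": "shipped", "total": 115.97}',
    },
}

# Decode the agent's structured output into a Python dict
order = json.loads(response["message"]["content"])
print(order["status"], order["total"])  # shipped 115.97
```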

Example 2: Product Lookup Agent

A separate agent focused purely on product catalog queries — using Anthropic's Claude with different MCP tools and a different system prompt.

  • Name — Product Agent
  • Provider — Anthropic
  • Model — claude-sonnet-4-20250514
  • Output Type — Text
  • System Prompt — "You are a Product Catalog Assistant. You have access to: search_products(query, category?, limit?), get_product(product_id), get_recommendations(product_id, limit?). Return comprehensive details including name, price, availability, key specs, and related products."

Call the product agent:

bash
curl -X POST https://api.sentientone.ai/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: sk-so-your_api_key_here" \
  -H "X-Agent-Id: PRODUCT_AGENT_UUID" \
  -d '{ "message": "Show me the details for product PRD-2210 and suggest similar items" }'

Integration Pattern

Here's the recommended pattern for companies integrating SentientOne AI into their stack:

Your Company Integration

Web App (React) · Mobile App (Flutter) · Internal Tools (Slack / CLI)
  ↓ POST https://api.sentientone.ai/v1/chat · X-Api-Key + X-Agent-Id
SentientOne AI Platform
  • Order Agent (GPT-4o, JSON) → MCP: get_order()
  • Product Agent (Claude, Text) → MCP: search_products()
  • Support Agent (GPT-4o, MD) → MCP: get_tickets()
  ↓
Your MCP Server (REST / gRPC APIs)

Quick Start Summary

  1. Create agents — In the SentientOne platform, configure agents with system prompts, models, and LLM keys for each use case.
  2. Copy your API key — From Settings → API Key (sk-so-…).
  3. Copy your Agent ID — From Agents page → Agent ID chip.
  4. Call from your app — POST https://api.sentientone.ai/v1/chat with X-Api-Key and X-Agent-Id headers, body {"message": "…"}.
  5. Parse the response — message.content contains the AI response; tool_calls (if present) shows any MCP tool executions.

Code Examples

Python

python
import requests

BASE_URL = "https://api.sentientone.ai"
API_KEY = "sk-so-your_api_key_here"
AGENT_ID = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"  # from Agents page

headers = {
    "Content-Type": "application/json",
    "X-Api-Key": API_KEY,
    "X-Agent-Id": AGENT_ID,
}

# Single message
response = requests.post(f"{BASE_URL}/v1/chat", headers=headers, json={
    "message": "Get order details for OrderId: ORD-1234, UserId: USR-5678"
})

data = response.json()
print(data["message"]["content"])

# Check if any MCP tools were called
if data.get("tool_calls"):
    for tc in data["tool_calls"]:
        print(f"Tool: {tc['name']}, Input: {tc['input']}")

# Follow-up in same conversation
response2 = requests.post(f"{BASE_URL}/v1/chat", headers=headers, json={
    "message": "What is the delivery ETA?",
    "conversation_id": data["conversation_id"],
})
print(response2.json()["message"]["content"])

JavaScript / TypeScript

javascript
const API_KEY = "sk-so-your_api_key_here";
const AGENT_ID = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"; // from Agents page
const BASE_URL = "https://api.sentientone.ai";

const headers = {
  "Content-Type": "application/json",
  "X-Api-Key": API_KEY,
  "X-Agent-Id": AGENT_ID,
};

// Single request
const res = await fetch(`${BASE_URL}/v1/chat`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    message: "Get order details for OrderId: ORD-1234, UserId: USR-5678",
  }),
});

const { conversation_id, message, tool_calls } = await res.json();
console.log(message.content);
if (tool_calls) console.log("Tools used:", tool_calls.map(t => t.name));

// Multi-turn follow-up
const res2 = await fetch(`${BASE_URL}/v1/chat`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    message: "What is the delivery ETA?",
    conversation_id,
  }),
});
console.log((await res2.json()).message.content);

C# / .NET

csharp
using System.Net.Http.Json;
using System.Text.Json.Serialization;

using var client = new HttpClient();
client.DefaultRequestHeaders.Add("X-Api-Key", "sk-so-your_api_key_here");
client.DefaultRequestHeaders.Add("X-Agent-Id", "a1b2c3d4-e5f6-7890-abcd-ef1234567890");

// Single request
var response = await client.PostAsJsonAsync(
    "https://api.sentientone.ai/v1/chat",
    new { message = "Get order details for OrderId: ORD-1234, UserId: USR-5678" }
);
var result = await response.Content.ReadFromJsonAsync<ChatResponse>();
Console.WriteLine(result.Message.Content);

// Multi-turn follow-up
var response2 = await client.PostAsJsonAsync(
    "https://api.sentientone.ai/v1/chat",
    new {
        message = "What is the delivery ETA?",
        conversation_id = result.ConversationId
    }
);
Console.WriteLine((await response2.Content.ReadFromJsonAsync<ChatResponse>()).Message.Content);

// Minimal response types — the API's snake_case fields need explicit mapping
class ChatMessage
{
    [JsonPropertyName("content")] public string Content { get; set; }
}

class ChatResponse
{
    [JsonPropertyName("conversation_id")] public string ConversationId { get; set; }
    [JsonPropertyName("message")] public ChatMessage Message { get; set; }
}

Best Practices

Use Specific System Prompts

The more specific the agent's system prompt, the better the responses. Include exact MCP tool names, expected input/output formats, and domain constraints. A vague prompt leads to vague answers.

One Agent Per Domain

Create separate agents for orders, products, support, etc. rather than one agent that does everything. Focused agents produce better, more reliable results and are easier to tune.

Store conversation_id

If your use case involves multi-turn interactions, persist the conversation_id from the first response. This gives the agent full context for follow-up queries without re-sending history.

Use JSON Output Type for Structured Data

When you need parseable responses (order details, product data), set the agent's output type to JSON and have the system prompt require valid JSON output. This makes JSON.parse(message.content) reliable.
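Even with a JSON output type, it is prudent to guard the decode — LLMs occasionally emit malformed output. A minimal Python sketch (the JavaScript equivalent would wrap JSON.parse in try/catch):

```python
import json

def parse_agent_json(content: str):
    """Decode message.content from a JSON-output agent.

    Returns the parsed object, or None if the model emitted invalid JSON,
    so the caller can retry the request or fall back to the raw text.
    """
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None

assert parse_agent_json('{"status": "shipped"}') == {"status": "shipped"}
assert parse_agent_json("Sorry, I could not find that order.") is None
```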

Low Temperature for Deterministic Responses

For data retrieval agents (orders, products), use a low temperature (0.1–0.3). For creative tasks or open-ended chat, use higher values (0.7–1.0). This significantly affects response consistency.

Proxy Through Your Backend

Never call the SentientOne API directly from client-side code. Route requests through your own backend server to keep your API key secure and add any additional validation or logging.

SentientOne AI Platform
Open Platform