SentientOne AI Platform
Send a single POST request with your platform API key, agent ID header, and message — get an AI-powered response from a pre-configured agent. No LLM setup needed in your code. Agents are created and managed through the SentientOne AI platform.
POST https://api.sentientone.ai/v1/chat
X-Api-Key: sk-so-your_platform_key
X-Agent-Id: your-agent-uuid

Overview
SentientOne AI is an agent orchestration platform. Instead of hardcoding LLM calls in your application, you configure agents through the SentientOne platform — each with its own provider, model, system prompt, and parameters. Your application then calls a single POST endpoint with two headers (your API key and the agent ID) plus a message body, and gets an AI response back.

Configure Agents
Create specialized agents in the platform UI with system prompts, model selection, and MCP tool definitions.
One Endpoint
Call POST /v1/chat on api.sentientone.ai with X-Api-Key and X-Agent-Id headers from any language or platform.
Streaming Support
Get responses as standard JSON or real-time Server-Sent Events (SSE) for token-by-token streaming.
How It Works
Create an Agent in the SentientOne Platform
Log into SentientOne AI and create an agent. Configure its system prompt, choose the LLM provider (OpenAI/Anthropic), model, temperature, and provide the provider API key. Connect MCP servers so the agent can call your tools. Each agent gets a unique Agent ID — copy it from the Agents page.
Get Your Platform API Key
Your SentientOne platform key (sk-so-…) is found in Settings. Send it in every request as the X-Api-Key header. This is separate from the LLM provider key — the platform key authenticates your application, while each agent's LLM key is stored server-side and never exposed.
Call POST /v1/chat on api.sentientone.ai
From any HTTP client — send a POST request with X-Api-Key (your platform key) and X-Agent-Id (the agent UUID) headers, plus a JSON body containing message. The platform loads the agent config, injects the system prompt, runs any MCP tool calls, manages conversation history, and returns the full response.
Why Configure Agents?
Agents are the core abstraction that makes SentientOne AI powerful. Instead of embedding LLM configuration in your application code, you define specialized agents — each tailored for a specific task. Here's why this matters:
1. Separation of Concerns
Your application code stays clean — just a POST call. All LLM-specific logic (system prompt, model choice, temperature, API keys) lives in the agent configuration. Change the model from GPT-4o to Claude without modifying a single line of application code.
2. Task-Specific Specialization
Each agent has a focused system prompt. An "Order Agent" knows how to query order details via MCP tools. A "Product Agent" understands product catalogs. A "Support Agent" handles customer inquiries. The system prompt constrains the LLM to excel at one specific domain.
3. MCP Integration Ready
If your company exposes APIs through MCP (Model Context Protocol), you configure agents whose system prompts instruct the LLM to use those MCP tools. The agent becomes the bridge between your MCP server and any application that needs AI-powered access to your data.
4. Per-Agent API Keys & Models
Each agent carries its own LLM API key and model configuration. Use GPT-4o for complex reasoning tasks and Claude Haiku for fast classification. Different departments can use different keys for cost tracking.
5. Zero LLM Code in Your App
No OpenAI SDK, no prompt engineering, no token management in your codebase. Your app sends a message and gets a response — the platform handles everything else. This means faster development, easier testing, and no vendor lock-in at the application layer.
Authentication
All API requests require two headers: your platform API key and the agent ID. The platform key authenticates your application; the agent ID selects which configured agent handles the request.
Required Headers
X-Api-Key (string, required) — Your SentientOne platform API key (sk-so-…). Find this in Settings → API Key. Authenticates your application with the platform.
X-Agent-Id (string, required) — The UUID of the agent to invoke. Copy this from the Agents page — it's shown prominently below the agent name.
Content-Type (string, required) — Must be application/json.
curl -X POST https://api.sentientone.ai/v1/chat \
-H "Content-Type: application/json" \
-H "X-Api-Key: sk-so-your_api_key_here" \
-H "X-Agent-Id: a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
-d '{ "message": "Hello!" }'Keep your API key secure
Never expose your API key in client-side code, public repositories, or browser network requests. Call the SentientOne API from your backend server and proxy responses to your frontend.
Header Reference
| Header | Example Value | Where to find it |
|---|---|---|
| X-Api-Key | sk-so-abc123… | Settings → API Key |
| X-Agent-Id | a1b2c3d4-… | Agents page → Agent ID chip |
| Content-Type | application/json | Always required |
Chat API
The primary endpoint for all agent interactions
This is the only endpoint your application needs to call. Send a message to any configured agent and receive an AI-powered response. The platform handles system prompt injection, conversation history, LLM routing, and response storage.
https://api.sentientone.ai/v1/chat

POST /v1/chat — Standard Response
Send a message and receive the complete AI response in a single JSON payload. Agent ID is passed as a header — no agent_id in the body.
curl -X POST https://api.sentientone.ai/v1/chat \
-H "Content-Type: application/json" \
-H "X-Api-Key: sk-so-your_api_key_here" \
-H "X-Agent-Id: a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
-d '{
"message": "Get the order details for OrderId: ORD-1234 and UserId: USR-5678"
}'

Request Headers
X-Api-Key (string, required) — Your SentientOne platform API key (sk-so-…). From Settings → API Key.
X-Agent-Id (string, required) — The UUID of the agent to invoke. From Agents page → Agent ID.
Content-Type (string, required) — Must be application/json.
Request Body
message (string, required) — The user's message or prompt. Appended to the conversation history and sent to the LLM along with the agent's system prompt.
conversation_id (string, optional) — Pass an existing conversation ID to continue a multi-turn conversation with full history context. If omitted, a new conversation is created automatically.
Response (200 OK):
{
"conversation_id": "conv-uuid-...",
"message": {
"id": "msg-uuid-...",
"conversation_id": "conv-uuid-...",
"role": "assistant",
"content": "The order ORD-1234 is currently shipped via FedEx (tracking FX-998877)...",
"token_count": 156,
"created_at": "2026-03-28T14:30:00Z"
},
"tool_calls": [
{
"name": "get_order",
"input": { "order_id": "ORD-1234", "user_id": "USR-5678" },
"output": "{\"status\":\"shipped\",\"carrier\":\"FedEx\",\"tracking\":\"FX-998877\"}"
}
]
}

Response Fields
conversation_id (string) — The conversation this message belongs to. Store this to continue the conversation in follow-up requests.
message.id (string) — Unique ID of the saved assistant message.
message.role (string) — Always "assistant" for responses.
message.content (string) — The AI response text. Format depends on the agent's output_type (text, json, markdown, or code).
message.token_count (number) — Total tokens used across all LLM calls in this request (including tool-use rounds).
tool_calls (array) — Only present if the agent executed MCP tools. Each entry has name, input, and output.
What happens when you call this endpoint
1. X-Api-Key is validated — it resolves to your account.
2. The agent is loaded via X-Agent-Id — its system prompt, model, provider, LLM API key, and parameters.
3. If no conversation_id is provided in the body, a new conversation is created.
4. Your message is saved to the conversation history.
5. The full history (system prompt + all prior messages + new message) is sent to the LLM.
6. If the LLM requests MCP tool calls, they are executed and results fed back (up to 8 rounds).
7. The final response is saved to the conversation and returned with an optional tool_calls array.
Streamable HTTP
For real-time token-by-token responses, use the Streamable HTTP endpoint. Send the same headers as the standard endpoint and include Accept: text/event-stream — the server responds with a Server-Sent Events stream. Ideal for chat UIs that want to display text as it generates, or for monitoring MCP tool execution in real time.
/v1/chat/stream — Returns text/event-stream

curl -N -X POST https://api.sentientone.ai/v1/chat/stream \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-H "X-Api-Key: sk-so-your_api_key_here" \
-H "X-Agent-Id: a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
-d '{ "message": "Get the order details for OrderId: ORD-1234" }'Event stream format:
data: {"type":"meta","conversation_id":"conv-uuid-..."}
data: {"type":"tool_call","id":"tc-1","name":"get_order","input":{"order_id":"ORD-1234"}}
data: {"type":"tool_result","id":"tc-1","name":"get_order","output":"{\"status\":\"shipped\"}"}
data: {"type":"delta","content":"The order ORD-1234 is"}
data: {"type":"delta","content":" currently shipped via FedEx"}
...
data: {"type":"done","message":{"id":"msg-uuid","role":"assistant","content":"...","token_count":156}}Event Types
meta (event) — First event. Contains conversation_id for this session. Store it for multi-turn follow-ups.
tool_call (event) — Emitted when the LLM decides to call an MCP tool. Contains id, name, and input.
tool_result (event) — Emitted after the tool executes. Contains id, name, and output. Matches the preceding tool_call by id.
delta (event) — Streamed text chunk. Concatenate all content values to build the full response.
done (event) — Final event. Contains the complete saved message object including token_count.
error (event) — Emitted on failure. Contains code and message fields matching the standard error envelope.
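If you're consuming the stream from Python, here's a minimal sketch using the requests library (key and agent ID are placeholders; it assumes each event arrives on a single data: line, as in the format above):

Python — reading the stream

import json
import requests

res = requests.post(
    "https://api.sentientone.ai/v1/chat/stream",
    headers={
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
        "X-Api-Key": "sk-so-your_api_key_here",
        "X-Agent-Id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    },
    json={"message": "Get the order details for OrderId: ORD-1234"},
    stream=True,  # keep the connection open and yield chunks as they arrive
)

full_content = ""
for line in res.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank keep-alive lines
    event = json.loads(line[len("data: "):])
    if event["type"] == "delta":
        full_content += event["content"]
        print(event["content"], end="", flush=True)
    elif event["type"] == "done":
        print("\nTokens used:", event["message"]["token_count"])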
JavaScript — reading the stream
const res = await fetch("https://api.sentientone.ai/v1/chat/stream", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "text/event-stream",
    "X-Api-Key": "sk-so-your_api_key_here",
    "X-Agent-Id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  },
  body: JSON.stringify({ message: "Get the order details for OrderId: ORD-1234" }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let fullContent = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const event = JSON.parse(line.slice(6));
    if (event.type === "meta") console.log("conversation:", event.conversation_id);
    if (event.type === "tool_call") console.log("Calling tool:", event.name, event.input);
    if (event.type === "tool_result") console.log("Tool result:", event.output);
    if (event.type === "delta") { fullContent += event.content; process.stdout.write(event.content); }
    if (event.type === "done") console.log("\nTokens used:", event.message.token_count);
  }
}

Multi-Turn Conversations
The platform maintains full conversation history. To continue a conversation, pass the conversation_id from a previous response.
# First message — starts a new conversation
curl -X POST https://api.sentientone.ai/v1/chat \
-H "Content-Type: application/json" \
-H "X-Api-Key: sk-so-your_api_key_here" \
-H "X-Agent-Id: ORDER_AGENT_UUID" \
-d '{ "message": "Look up order ORD-1234 for user USR-5678" }'
# Response: { "conversation_id": "conv-abc-123", "message": { ... } }
# Follow-up — continues the same conversation with full context
curl -X POST https://api.sentientone.ai/v1/chat \
-H "Content-Type: application/json" \
-H "X-Api-Key: sk-so-your_api_key_here" \
-H "X-Agent-Id: ORDER_AGENT_UUID" \
-d '{ "message": "What is the delivery ETA for that order?", "conversation_id": "conv-abc-123" }'The agent sees the full conversation history, so it knows "that order" refers to ORD-1234 without you needing to repeat it. This enables natural, contextual follow-up queries.
Error Code Reference
All error responses share a consistent JSON envelope. Every error includes a machine-readable code field so you can handle each failure case precisely, independent of HTTP status.
Standard (JSON) error envelope
{
"error": {
"code": "ERROR_CODE",
"message": "Human-readable description"
}
}

Streaming errors arrive as SSE events
data: {"type": "error", "code": "ERROR_CODE", "message": "..."}Authentication Errors
MISSING_API_KEY · 401 · Auth · Not retryable
Why it happens: The request was sent without an X-Api-Key header.
Resolution: Include your platform key on every request: X-Api-Key: sk-so-xxxxxxxxxxxxxxxx

INVALID_API_KEY · 401 · Auth · Not retryable
Why it happens: The X-Api-Key value does not match any active platform key — often a typo, a stale value, or a rotated key.
Resolution: Check the value in your .env or secrets store. Log in to the dashboard and verify your API key under Settings → API Key. Generate a new key if the old one was rotated or revoked.

Plan & Quota Errors
QUOTA_EXCEEDED · 429 · Quota · Not retryable · Extra fields: used, limit, upgrade_url
Why it happens: Your plan's request quota has been fully consumed.
Resolution: Wait for the quota window to reset, or upgrade your plan via the upgrade_url.

RATE_LIMIT_EXCEEDED · 429 · Rate Limit · Retryable · Extra fields: retry_after
Why it happens: Requests exceeded the agent's per-minute limit (rate_limit_rpm). Both /v1/chat and /v1/chat/stream share the same per-agent counter. Starter plans default to 30 req/min, Pro plans to 50 req/min.
Resolution: Wait for the number of seconds given in the X-RateLimit-Reset response header. Reduce request concurrency and cache responses where possible. See the Rate Limiting section for full details.

SUBSCRIPTION_CANCELED · 403 · Plan · Not retryable
Why it happens: The subscription on your account has been canceled.
Resolution: Reactivate your subscription from the dashboard to restore API access.

PLAN_RESTRICTED · 403 · Plan · Not retryable · Extra fields: upgrade_url
Why it happens: The requested capability is not included in your current plan.
Resolution: Upgrade via the upgrade_url, or use the non-streaming /v1/chat endpoint which is available on all plans.

Request Validation Errors
MISSING_AGENT_ID · 400 · Validation · Not retryable
Why it happens: The request did not include the X-Agent-Id request header.
Resolution: Pass the agent UUID in the request header:
X-Agent-Id: 736c5015-f67e-4fe5-9932-590d9048b014

MISSING_MESSAGE · 400 · Validation · Not retryable
Why it happens: The message field in the request body was empty or missing.
Resolution: Send a non-empty message in the JSON body: { "message": "Your question here" }

VALIDATION_ERROR · 422 · Validation · Not retryable · Extra fields: details (array of field-level errors)
Why it happens: The request body failed schema validation — a missing or incorrect Content-Type, wrong field types (e.g. a number where a string is expected), or a completely empty body when fields are required.
Resolution: Inspect the details array in the error response — it lists each invalid field and the reason. Ensure you are sending Content-Type: application/json.

Agent Errors
AGENT_NOT_FOUND · 404 · Agent · Not retryable
Why it happens: No agent matches the UUID supplied in X-Agent-Id, or the agent belongs to a different account.
Resolution: Verify the agent ID against the Agents page and copy it again if in doubt.

AGENT_INACTIVE · 403 · Agent · Not retryable
Why it happens: The agent has been deactivated (is_active = false). An is_active check is always performed against the database on every request — even if the agent config is otherwise cached.
Resolution: Re-enable the agent from the Agents page.
LLM Provider Errors
These errors originate from the underlying AI provider (Anthropic or OpenAI) configured on the agent, not from the SentientOne platform itself.
LLM_AUTH_ERROR · 502 · LLM Provider · Not retryable
Why it happens: The LLM provider rejected the API key configured on the agent — it may be invalid, revoked, or out of credit.
Resolution: Update the provider API key in the agent's configuration and verify it directly with the provider.

LLM_RATE_LIMITED · 429 · LLM Provider · Retryable
Why it happens: The underlying provider (OpenAI or Anthropic) rate-limited the request.
Resolution: Retry with exponential backoff — the limit is on the provider side, not the platform.

LLM_UNAVAILABLE · 503 · LLM Provider · Retryable
Why it happens: The provider is temporarily overloaded or experiencing an outage.
Resolution: Retry with exponential backoff, or check the provider's status page.

LLM_ERROR · 502 · LLM Provider · Maybe retryable
Why it happens: The provider returned an unexpected error that could not be classified.
Resolution: Retry once; if the error persists, contact support with the conversation_id so the underlying provider error can be investigated.

Server Errors
INTERNAL_ERROR · 500 · Server · Retryable
Why it happens: An unexpected error occurred inside the platform.
Resolution: Retry with backoff. If the error persists, contact support with the conversation_id from a previous successful response, the full request payload, and the timestamp of the failing request.

Quick Reference
| Code | HTTP | Category | Retryable |
|---|---|---|---|
| MISSING_API_KEY | 401 | Auth | No — Fix the request |
| INVALID_API_KEY | 401 | Auth | No — Fix the key |
| QUOTA_EXCEEDED | 429 | Quota | No — Upgrade or wait for reset |
| RATE_LIMIT_EXCEEDED | 429 | Rate Limit | Yes — Back off and retry |
| SUBSCRIPTION_CANCELED | 403 | Plan | No — Reactivate subscription |
| PLAN_RESTRICTED | 403 | Plan | No — Upgrade plan |
| MISSING_AGENT_ID | 400 | Validation | No — Fix the request |
| MISSING_MESSAGE | 400 | Validation | No — Fix the request |
| VALIDATION_ERROR | 422 | Validation | No — Fix the request |
| AGENT_NOT_FOUND | 404 | Agent | No — Verify agent ID |
| AGENT_INACTIVE | 403 | Agent | No — Re-enable agent |
| LLM_AUTH_ERROR | 502 | LLM Provider | No — Fix provider key |
| LLM_RATE_LIMITED | 429 | LLM Provider | Yes — Retry with backoff |
| LLM_UNAVAILABLE | 503 | LLM Provider | Yes — Retry with backoff |
| LLM_ERROR | 502 | LLM Provider | Maybe — Contact support |
| INTERNAL_ERROR | 500 | Server | Yes — Retry, then contact support |
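In client code, this table reduces to a simple dispatch on the code field. A minimal Python sketch (the retry policy and helper names here are illustrative, not a prescribed client):

Python — dispatching on error codes

import time
import requests

# Codes the table above marks as retryable; everything else fails fast.
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "LLM_RATE_LIMITED", "LLM_UNAVAILABLE", "INTERNAL_ERROR"}

def call_chat(base_url, headers, payload, max_retries=3):
    for attempt in range(max_retries + 1):
        res = requests.post(f"{base_url}/v1/chat", headers=headers, json=payload)
        if res.ok:
            return res.json()
        err = res.json().get("error", {})  # all errors share the JSON envelope
        if err.get("code") not in RETRYABLE or attempt == max_retries:
            raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
        # honor X-RateLimit-Reset when present, otherwise plain exponential backoff
        time.sleep(int(res.headers.get("X-RateLimit-Reset", 2 ** attempt)))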
Rate Limiting
Every agent has a per-minute request limit. Both /v1/chat and /v1/chat/stream share the same counter — a request to either endpoint counts toward your agent's allowance.
| Plan | Rate limit | Scope |
|---|---|---|
| Starter | 30 req/min | Per agent |
| Pro | 50 req/min | Per agent |
Need higher limits? Enterprise plans support custom rate limits per agent. Contact your account manager or reach out via the dashboard.
Response Headers
Every API response includes rate limit headers so your application always knows where it stands:
| Header | Type | Description |
|---|---|---|
| X-RateLimit-Limit | integer | Maximum requests allowed per 60-second window |
| X-RateLimit-Remaining | integer | Requests remaining in the current window |
| X-RateLimit-Reset | integer | Seconds until the current window resets |
Example — normal response headers
HTTP/2 200
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 29
X-RateLimit-Reset: 58

Handling 429 Errors
When you exceed the rate limit, the API returns HTTP 429. The response includes the same rate limit headers plus a standard error body:
429 response headers
HTTP/2 429
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 42

429 response body
{
"error": {
"code": "RATE_LIMIT_EXCEEDED",
"message": "Rate limit exceeded. Please retry after 42 seconds."
}
}

The X-RateLimit-Reset header tells you exactly how many seconds to wait. Always use this value instead of hard-coding a retry delay.
Retry Strategy
We recommend exponential backoff with the X-RateLimit-Reset header as the minimum wait time. Here are ready-to-use examples:
Python — retry with backoff
import time, requests

def chat_with_retry(base_url, headers, payload, max_retries=3):
    for attempt in range(max_retries + 1):
        res = requests.post(f"{base_url}/v1/chat", headers=headers, json=payload)
        if res.status_code != 429:
            return res.json()
        if attempt == max_retries:
            break  # out of retries — fall through to raise
        reset = int(res.headers.get("X-RateLimit-Reset", 60))
        wait = reset + (2 ** attempt)  # backoff on top of reset
        print(f"Rate limited. Retrying in {wait}s (attempt {attempt + 1}/{max_retries})")
        time.sleep(wait)
    raise Exception("Rate limit exceeded after max retries")

JavaScript — retry with backoff
async function chatWithRetry(baseUrl, headers, payload, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(`${baseUrl}/v1/chat`, {
      method: "POST",
      headers,
      body: JSON.stringify(payload),
    });
    if (res.status !== 429) return res.json();
    if (attempt === maxRetries) break; // out of retries — fall through to throw
    const reset = parseInt(res.headers.get("X-RateLimit-Reset") ?? "60", 10);
    const wait = (reset + 2 ** attempt) * 1000;
    console.log(`Rate limited. Retrying in ${wait / 1000}s (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise((r) => setTimeout(r, wait));
  }
  throw new Error("Rate limit exceeded after max retries");
}

Best practices
Read the headers proactively
Check X-RateLimit-Remaining on every response. If it's approaching zero, throttle outgoing requests before you hit the limit.
Use the reset header, not a fixed delay
Hard-coding sleep(60) wastes time if the window resets sooner. Always read X-RateLimit-Reset for the exact wait.
Queue and batch where possible
If your application generates bursts of requests, implement a client-side queue that spaces them out evenly across the 60-second window.
Cache responses
If multiple users ask the same question, cache the agent's response on your side to avoid consuming your rate limit with duplicate calls.
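For the caching practice in particular, a minimal in-memory sketch (swap the dict for Redis or similar in production; the helper name is illustrative):

Python — caching duplicate questions

import hashlib
import requests

_cache = {}  # in-memory; swap for Redis or similar in production

def cached_chat(base_url, headers, message):
    # Serve repeated identical questions from a local cache so duplicates
    # don't consume the per-agent rate limit.
    key = hashlib.sha256(f"{headers['X-Agent-Id']}:{message}".encode()).hexdigest()
    if key not in _cache:
        res = requests.post(f"{base_url}/v1/chat", headers=headers, json={"message": message})
        _cache[key] = res.json()
    return _cache[key]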
Performance & Cost Optimization
Calling an LLM directly is simple — until you need it to be fast, reliable, and affordable at scale. SentientOne handles the hardest parts of LLM integration automatically, so you get production-grade performance without building any of this yourself.
Prompt Caching
Every agent has a system prompt, tool definitions, and often a base set of instructions that are identical across every request. When you call an LLM directly, these tokens are re-processed and re-billed on every single call.
Without SentientOne
- System prompt re-sent on every request
- Tool schemas re-tokenised every call
- You pay full input token cost each time
- Requires custom caching logic per provider
With SentientOne
- Prompt cache hits use provider-native caching (Anthropic, OpenAI)
- Repeated system prompts and tool definitions are cached automatically
- Up to 90% reduction in input token costs for cached prefixes
- Zero configuration — works out of the box for all agents
Automatic Retries & Failover
LLM providers have transient failures — rate limits, overload errors, network timeouts. When you call them directly, you need to build retry logic, backoff strategies, and error classification yourself. SentientOne handles all of this transparently.
Provider-Level Retries
If Anthropic returns a 529 (overloaded) or OpenAI returns a 503, SentientOne automatically retries with exponential backoff — up to 3 attempts. Your application never sees the transient failure.
Intelligent Error Classification
Not all errors should be retried. SentientOne classifies every provider error — auth failures return immediately, rate limits wait and retry, server errors use backoff. You get the right behaviour without writing error-handling code.
Timeout Protection
Long-running LLM calls are bounded with configurable timeouts. If a provider hangs, the request is cleanly terminated and reported — your application doesn't block indefinitely.
Streaming Resilience
Streaming responses are monitored for stalls. If a stream stops producing chunks, it is detected and surfaced as an error event rather than leaving your client waiting forever.
Token Optimization
Token usage directly impacts your LLM costs. SentientOne applies several techniques to keep token consumption as low as possible without sacrificing response quality.
Smart Conversation Truncation
Long conversation histories are automatically truncated to fit within the model's context window while preserving the most recent and relevant messages. You don't need to manage context windows yourself. A simplified sketch of the idea appears after this list.
Efficient Tool Definitions
MCP tool schemas are optimised before being sent to the LLM. Redundant descriptions and unnecessary metadata are stripped to reduce prompt token usage on every request.
Response Cost Tracking
Every request logs prompt tokens, completion tokens, and USD cost. You can identify expensive agents or conversations and optimise system prompts to reduce spend — data most direct-call setups never capture.
Model-Aware Encoding
Token counting and context management use the correct tokeniser for each model (cl100k for GPT-4, Claude's tokeniser for Anthropic). This avoids silent truncation or unexpected overflows that happen when using a generic token counter.
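To illustrate the truncation idea only — this is not the platform's actual algorithm — a sketch that keeps the system prompt plus the newest messages that fit a token budget, with the tokenizer supplied by the caller:

Python — context-window truncation sketch

def truncate_history(messages, max_tokens, count_tokens):
    # Keep the system prompt plus the most recent messages that fit the budget.
    system, rest = messages[0], messages[1:]
    budget = max_tokens - count_tokens(system["content"])
    kept = []
    for msg in reversed(rest):  # walk newest-first
        cost = count_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return [system] + list(reversed(kept))

# usage: a rough 4-chars-per-token heuristic stands in for a real tokenizer
history = [{"role": "system", "content": "You are an order agent."},
           {"role": "user", "content": "Look up ORD-1234."},
           {"role": "assistant", "content": "Order ORD-1234 is shipped."}]
print(truncate_history(history, max_tokens=50, count_tokens=lambda s: len(s) // 4))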
Why This Matters
Building all of this yourself is possible — but it takes significant engineering effort, ongoing maintenance, and deep familiarity with each LLM provider's quirks.
| Capability | Direct LLM Call | SentientOne |
|---|---|---|
| Prompt caching | Build per provider | Automatic |
| Retry with backoff | Build & maintain | Built-in |
| Error classification | Parse each provider | Standardised codes |
| Context window management | Manual truncation | Automatic |
| Token cost tracking | DIY logging | Per-request, per-agent |
| Streaming resilience | Handle stalls yourself | Monitored & reported |
| Multi-provider support | Separate SDK per provider | One API, any model |
Bottom line: SentientOne gives you production-grade LLM infrastructure from day one. You write one API call — we handle caching, retries, token management, cost tracking, and multi-provider support behind the scenes. Your team ships faster, your costs stay lower, and you don't maintain any of the plumbing.
Observability
Every agent interaction is automatically captured by SentientOne. You get full visibility into requests, responses, token usage, latency, and cost — across every conversation, every agent, every day. These are insights that would take significant engineering effort to implement independently in each application your organisation runs.
Zero instrumentation required
You don't add logging libraries, configure tracing sidecars, or write cost-calculation middleware. Every call to /v1/chat is recorded automatically — your application code stays clean.
Full Request / Response Logs
Every message sent to an agent and every response it generates is stored and viewable in the platform. Inspect the exact prompt history, system prompt injections, and LLM output for any conversation — invaluable for debugging unexpected agent behaviour.
Token Usage per Conversation
Prompt tokens, completion tokens, and total tokens are tracked per request and aggregated per agent. Spot which agents or conversation flows are consuming the most tokens and optimise system prompts accordingly.
Response Time Tracking
End-to-end latency is recorded for every request — including time spent in MCP tool calls. Identify slow agents, slow tools, or LLM provider latency spikes without instrumenting a single line of your own code.
Cost per Conversation
Based on the token counts and the per-million-token cost (cost_per_1m_tokens) configured for each agent, the platform calculates the exact LLM cost for every request. Roll up by agent, by day, or across your whole organisation to track AI spend precisely. A worked example of the arithmetic follows the field table below.
What is captured per request
| Field | Description |
|---|---|
| request_messages | Full conversation history sent to the LLM (up to 50 messages) |
| response_content | Complete LLM response text |
| prompt_tokens | Tokens consumed by the input / prompt |
| completion_tokens | Tokens consumed by the generated response |
| total_tokens | Sum across all LLM rounds including tool-use iterations |
| cost | Calculated LLM cost in USD based on the agent's cost_per_1m_tokens |
| provider / model | LLM provider and model name used for this request |
| status | success or error, with error message if applicable |
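The cost calculation itself is simple arithmetic. A minimal sketch, assuming a single blended cost_per_1m_tokens rate on the agent (real per-direction pricing may differ):

Python — computing request cost

def request_cost_usd(prompt_tokens: int, completion_tokens: int, cost_per_1m_tokens: float) -> float:
    # cost = total tokens / 1,000,000 * the agent's configured per-million rate
    return (prompt_tokens + completion_tokens) / 1_000_000 * cost_per_1m_tokens

# e.g. 1,200 prompt + 300 completion tokens at $3.00 per 1M tokens → $0.0045
print(request_cost_usd(1200, 300, 3.00))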
Why this matters at scale
One place, all agents. A typical organisation running 5–20 agents across multiple teams would need separate logging, monitoring, and cost-tracking implementations per app. SentientOne centralises all of this automatically.
Audit and compliance. Every LLM interaction is logged with timestamps, user IDs, and conversation IDs — ready for compliance reviews, security audits, or dispute resolution.
Cost control. Know exactly which agents, users, or workflows are driving AI spend before your LLM bill arrives. Set up alerts or impose limits at the agent level.
Prompt engineering feedback loop. Compare token counts and response quality across system prompt iterations to find the most efficient and accurate configuration for each agent.
Security
Security isn't an afterthought — it's built into every layer of the SentientOne AI platform. From how we handle your API keys to how data flows between services, we follow industry-leading standards to keep your information protected.
Encryption at Rest & In Transit
All data is encrypted using AES-256 at rest and TLS 1.3 in transit. API keys, LLM provider credentials, and conversation data are never stored in plaintext.
API Key Authentication
Every request is authenticated via scoped API keys. Keys are hashed before storage, rate-limited per key, and can be rotated or revoked instantly from the dashboard.
Data Isolation
Each organization's agents, conversations, and credentials are fully isolated. Row-level security policies ensure no cross-tenant data access, even at the database layer.
Audit Logging
Every API call, agent configuration change, and authentication event is logged with timestamps and user context. Full audit trails for compliance and forensics.
Compliance & Standards
- SOC 2 Type II
- GDPR Compliant
- ISO 27001 Certified
- OWASP Top 10 Covered
LLM Provider Key Security
Your OpenAI, Anthropic, or other provider API keys are encrypted with per-organization encryption keys and stored in a dedicated secrets vault. They are only decrypted server-side at the moment of an LLM call and are never exposed in API responses — the platform returns masked values (e.g. ••••••••sk-4f2a).
No Data Training
Your conversations and agent prompts are never used to train any models. Data flows through the platform to the LLM provider and back — we don't retain, analyze, or share your content beyond what's needed to deliver the service.
Role-Based Access Control
Admins manage agents and API keys. Users interact through the chat interface. API consumers are scoped to specific agents. Each role has precisely the permissions it needs — nothing more.
Hosting & Deployment
Your data, your rules. SentientOne AI runs wherever your security and compliance requirements demand — in the cloud, on your own servers, or a hybrid of both. You choose where your data lives.
Cloud Hosted
Fastest way to get started
- Fully managed by SentientOne — zero infrastructure to maintain
- Auto-scaling to handle traffic spikes without config changes
- Global CDN with edge routing for low-latency API calls
- Automatic updates, patches, and security fixes
- 99.9% uptime SLA with multi-region failover
On-Premise
Maximum control & compliance
- Deploy on your own servers, VPC, or private cloud
- Data never leaves your network — full sovereignty
- Integrate with your existing SSO, LDAP, and IAM policies
- Air-gapped deployment option for regulated industries
- Custom retention policies and data residency controls
Hybrid Deployment
Need the best of both worlds? Run the agent orchestration layer in the cloud for simplicity while keeping sensitive data processing on-premise. Or use cloud for development and staging, with on-premise for production.
How every request flows
Your App (any platform) → SentientOne AI (API gateway) → Agent Config (prompt · model · tools) → LLM + MCP Tools (OpenAI · Anthropic)

Zero AI code in your application. The response includes conversation_id, message, and tool_calls.
Data Residency
Choose where your data is stored — US, EU, APAC, or your own data center. Meet regional compliance requirements without compromising performance.
Zero-Downtime Updates
Platform updates are rolled out with blue-green deployments. No maintenance windows, no service interruptions. On-premise customers control their own update schedule.
Disaster Recovery
Automated backups, point-in-time recovery, and cross-region replication. Your agent configurations and conversation history are always recoverable.
Real-World Use Cases
Example 1: Order Management Agent
A company has an MCP server that exposes order management tools. They create an agent in the SentientOne platform that instructs the LLM to use those tools to fetch order details, delivery status, and product information.
Step 1 — Create the agent in the SentientOne platform UI:
Step 2 — Call the agent from your application:
# From your e-commerce backend, customer portal, or mobile app
curl -X POST https://api.sentientone.ai/v1/chat \
-H "Content-Type: application/json" \
-H "X-Api-Key: sk-so-your_api_key_here" \
-H "X-Agent-Id: ORDER_AGENT_UUID" \
-d '{ "message": "Get full order details including delivery and products for OrderId: ORD-78923 and UserId: USR-4412" }'Step 3 — Receive structured response:
{
"conversation_id": "conv-uuid-...",
"message": {
"id": "msg-uuid-...",
"conversation_id": "conv-uuid-...",
"role": "assistant",
"content": "{\n \"order_id\": \"ORD-78923\",\n \"status\": \"shipped\",\n \"customer\": \"USR-4412\",\n \"items\": [\n { \"name\": \"Wireless Headphones\", \"qty\": 1, \"price\": 89.99 },\n { \"name\": \"USB-C Cable\", \"qty\": 2, \"price\": 12.99 }\n ],\n \"total\": 115.97,\n \"delivery\": {\n \"carrier\": \"FedEx\",\n \"tracking\": \"FX-998877\",\n \"estimated_delivery\": \"2026-04-07\",\n \"status\": \"in_transit\"\n }\n}",
"token_count": 198,
"created_at": "2026-03-28T14:30:00Z"
},
"tool_calls": [
{
"name": "get_order",
"input": { "order_id": "ORD-78923", "user_id": "USR-4412" },
"output": "{\"status\":\"shipped\",\"carrier\":\"FedEx\"}"
}
]
}

Example 2: Product Lookup Agent
A separate agent focused purely on product catalog queries — using Anthropic's Claude with different MCP tools and a different system prompt.
Call the product agent:
curl -X POST https://api.sentientone.ai/v1/chat \
-H "Content-Type: application/json" \
-H "X-Api-Key: sk-so-your_api_key_here" \
-H "X-Agent-Id: PRODUCT_AGENT_UUID" \
-d '{ "message": "Show me the details for product PRD-2210 and suggest similar items" }'Integration Pattern
Here's the recommended pattern for companies integrating SentientOne AI into their stack:
Your Company Integration: Web App (React) · Mobile App (Flutter) · Internal Tools (Slack / CLI)
    ↓ POST https://api.sentientone.ai/v1/chat · X-Api-Key + X-Agent-Id
SentientOne AI Platform
    ↓
Your MCP Server (REST / gRPC APIs)
Quick Start Summary
1. Create agents — In the SentientOne platform, configure agents with system prompts, models, and LLM keys for each use case.
2. Copy your API key — From Settings → API Key (sk-so-…).
3. Copy your Agent ID — From Agents page → Agent ID chip.
4. Call from your app — POST https://api.sentientone.ai/v1/chat with X-Api-Key and X-Agent-Id headers, body {"message": "…"}.
5. Parse the response — message.content contains the AI response; tool_calls (if present) shows any MCP tool executions.
Code Examples
Python
import requests

BASE_URL = "https://api.sentientone.ai"
API_KEY = "sk-so-your_api_key_here"
AGENT_ID = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"  # from Agents page

headers = {
    "Content-Type": "application/json",
    "X-Api-Key": API_KEY,
    "X-Agent-Id": AGENT_ID,
}

# Single message
response = requests.post(f"{BASE_URL}/v1/chat", headers=headers, json={
    "message": "Get order details for OrderId: ORD-1234, UserId: USR-5678"
})
data = response.json()
print(data["message"]["content"])

# Check if any MCP tools were called
if data.get("tool_calls"):
    for tc in data["tool_calls"]:
        print(f"Tool: {tc['name']}, Input: {tc['input']}")

# Follow-up in same conversation
response2 = requests.post(f"{BASE_URL}/v1/chat", headers=headers, json={
    "message": "What is the delivery ETA?",
    "conversation_id": data["conversation_id"],
})
print(response2.json()["message"]["content"])

JavaScript / TypeScript
const API_KEY = "sk-so-your_api_key_here";
const AGENT_ID = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"; // from Agents page
const BASE_URL = "https://api.sentientone.ai";

const headers = {
  "Content-Type": "application/json",
  "X-Api-Key": API_KEY,
  "X-Agent-Id": AGENT_ID,
};

// Single request
const res = await fetch(`${BASE_URL}/v1/chat`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    message: "Get order details for OrderId: ORD-1234, UserId: USR-5678",
  }),
});
const { conversation_id, message, tool_calls } = await res.json();
console.log(message.content);
if (tool_calls) console.log("Tools used:", tool_calls.map(t => t.name));

// Multi-turn follow-up
const res2 = await fetch(`${BASE_URL}/v1/chat`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    message: "What is the delivery ETA?",
    conversation_id,
  }),
});
console.log((await res2.json()).message.content);

C# / .NET
using var client = new HttpClient();
client.DefaultRequestHeaders.Add("X-Api-Key", "sk-so-your_api_key_here");
client.DefaultRequestHeaders.Add("X-Agent-Id", "a1b2c3d4-e5f6-7890-abcd-ef1234567890");

// Single request
var response = await client.PostAsJsonAsync(
    "https://api.sentientone.ai/v1/chat",
    new { message = "Get order details for OrderId: ORD-1234, UserId: USR-5678" }
);
var result = await response.Content.ReadFromJsonAsync<ChatResponse>();
Console.WriteLine(result.Message.Content);

// Multi-turn follow-up
var response2 = await client.PostAsJsonAsync(
    "https://api.sentientone.ai/v1/chat",
    new {
        message = "What is the delivery ETA?",
        conversation_id = result.ConversationId
    }
);
Console.WriteLine((await response2.Content.ReadFromJsonAsync<ChatResponse>()).Message.Content);

Best Practices
Use Specific System Prompts
The more specific the agent's system prompt, the better the responses. Include exact MCP tool names, expected input/output formats, and domain constraints. A vague prompt leads to vague answers.
One Agent Per Domain
Create separate agents for orders, products, support, etc. rather than one agent that does everything. Focused agents produce better, more reliable results and are easier to tune.
Store conversation_id
If your use case involves multi-turn interactions, persist the conversation_id from the first response. This gives the agent full context for follow-up queries without re-sending history.
Use JSON Output Type for Structured Data
When you need parseable responses (order details, product data), set the agent's output type to JSON and instruct the system prompt to return valid JSON. This makes JSON.parse(message.content) reliable.
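In Python, the same pattern with a graceful fallback (key, agent ID, and message are placeholders):

Python — parsing a JSON-typed response

import json
import requests

res = requests.post(
    "https://api.sentientone.ai/v1/chat",
    headers={
        "Content-Type": "application/json",
        "X-Api-Key": "sk-so-your_api_key_here",
        "X-Agent-Id": "ORDER_AGENT_UUID",  # an agent whose output type is JSON
    },
    json={"message": "Get order details for OrderId: ORD-1234"},
)
content = res.json()["message"]["content"]
try:
    order = json.loads(content)  # reliable when the agent is configured for JSON output
    print(order.get("status"))
except json.JSONDecodeError:
    print("Agent returned prose instead of JSON:", content)  # graceful fallback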
Low Temperature for Deterministic Responses
For data retrieval agents (orders, products), use a low temperature (0.1–0.3). For creative tasks or open-ended chat, use higher values (0.7–1.0). This significantly affects response consistency.
Proxy Through Your Backend
Never call the SentientOne API directly from client-side code. Route requests through your own backend server to keep your API key secure and add any additional validation or logging.
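A minimal backend-proxy sketch using Flask — the framework and route name are assumptions; any server stack works. The platform key lives in an environment variable and never reaches the browser:

Python — backend proxy sketch (Flask)

import os
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/api/chat")
def proxy_chat():
    body = request.get_json(force=True)
    payload = {"message": body["message"]}
    if body.get("conversation_id"):
        payload["conversation_id"] = body["conversation_id"]  # pass through multi-turn context
    upstream = requests.post(
        "https://api.sentientone.ai/v1/chat",
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": os.environ["SENTIENTONE_API_KEY"],  # server-side secret only
            "X-Agent-Id": os.environ["SENTIENTONE_AGENT_ID"],
        },
        json=payload,
    )
    # relay the platform's JSON body and status code to the frontend
    return jsonify(upstream.json()), upstream.status_code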