This document outlines the specifications for a chat completions API that supports both regular invocations and streaming responses (JSON and SSE).
| Endpoint | Method | Description | WebFlux Return Type |
|---|---|---|---|
| /chat/json | POST | For regular non-streaming responses | Mono<LlmClientOutput> |
| /chat/stream | POST | For JSON streaming responses | Flux<LlmClientOutputChunk> |
| /chat/sse | POST | For SSE streaming (Server-Sent Events) | Flux<ServerSentEvent<LlmClientOutputChunk>> |
While many LLM providers use a single endpoint with a stream parameter to handle both streaming and non-streaming requests, this approach complicates a reactive implementation: one handler would have to produce different return types and content types depending on the request body. Separate endpoints offer several advantages:
- Each endpoint has one clear return type (Mono for single responses, Flux for streams).
- /chat/json can return a Mono<LlmClientOutput> for a single complete response
- /chat/stream can return a Flux<LlmClientOutputChunk> for a stream of JSON chunks
- /chat/sse can return a Flux<ServerSentEvent<LlmClientOutputChunk>> for SSE streaming

Request headers (all endpoints):
Content-Type: application/json
Authorization: Bearer <your_api_key>
/chat/json
{
  "model": "model-name", // Optional: Model identifier (server side to provide default)
  "messages": [ // Required: Conversation history
    {
      "role": "system", // "system", "user", or "assistant"
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7 // Optional: Controls randomness
}
/chat/stream
{
  "model": "model-name", // Optional: Model identifier (server side to provide default)
  "messages": [ // Required: Conversation history
    {
      "role": "system", // "system", "user", or "assistant"
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7 // Optional: Controls randomness
}
/chat/sse
{
  // Same parameters as other endpoints
  "model": "model-name",
  "messages": [
    // conversation history
  ],
  "temperature": 0.7
}
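The `//` annotations in the request examples above are explanatory only; an actual payload has to be plain JSON without comments. As a rough illustration, a client could assemble and sanity-check the body before sending it. The helper name `buildChatRequest` and the non-empty check on `messages` are assumptions made for this sketch, not part of the specification.

```javascript
// Hypothetical helper: builds a request body usable with /chat/json,
// /chat/stream, and /chat/sse, which all accept the same parameters.
const ALLOWED_ROLES = new Set(["system", "user", "assistant"]);

function buildChatRequest(messages, { model, temperature } = {}) {
  // Assumption: "required" is interpreted here as "non-empty array".
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new TypeError("messages is required and must be a non-empty array");
  }
  for (const { role, content } of messages) {
    if (!ALLOWED_ROLES.has(role)) throw new TypeError(`unknown role: ${role}`);
    if (typeof content !== "string") throw new TypeError("content must be a string");
  }
  const body = { messages };
  if (model !== undefined) body.model = model; // optional: the server supplies a default
  if (temperature !== undefined) body.temperature = temperature; // optional
  return body;
}
```

Passing the result through `JSON.stringify` yields a body that can be sent to any of the three endpoints.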
/chat/json
{
  "id": "cmpl-123abc",
  "model": "model-name",
  "created": 1678048938,
  "message": {
    "role": "assistant",
    "content": "I'm doing well, thank you for asking! How can I help you today?"
  },
  "done": true
}
/chat/stream
Each chunk is a JSON object with the following structure. The examples below are formatted for clarity only:
{
  "message": {
    "role": "assistant",
    "content": "I'm "
  },
  "done": false,
  "index": 0
}
Subsequent chunks:
{
  "message": {
    "role": "assistant",
    "content": "doing well"
  },
  "done": false,
  "index": 1
}
The last chunk has "done": true:
{
  "message": {
    "role": "assistant",
    "content": ", thank you!"
  },
  "done": true,
  "index": 2
}
For RPC compliance, each JSON response is sent as a single line ending with a newline character (\n):
{"message":{"role":"assistant","content":"I'm "},"done":false,"index":0}\n
{"message":{"role":"assistant","content":"doing well"},"done":false,"index":1}\n
{"message":{"role":"assistant","content":", thank you!"},"done":true,"index":2}\n
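Because network reads can split the stream at arbitrary byte positions, a client cannot assume that each read contains whole lines. The following is a minimal sketch of an incremental decoder for the newline-delimited format shown above, plus a helper that joins chunk contents in `index` order. The names `createNdjsonDecoder` and `assembleContent` are illustrative and not defined by this specification.

```javascript
// Incremental decoder for newline-delimited JSON. Text that does not yet end
// in "\n" is buffered until the rest of the line arrives.
function createNdjsonDecoder() {
  let buffer = "";
  return function push(text) {
    buffer += text;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // trailing partial line, or "" if text ended on "\n"
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  };
}

// Concatenates chunk contents in index order, stopping at the chunk
// marked "done": true.
function assembleContent(chunks) {
  let content = "";
  for (const chunk of [...chunks].sort((a, b) => a.index - b.index)) {
    content += chunk.message.content;
    if (chunk.done) break;
  }
  return content;
}
```

Each call to `push` returns only the chunks completed so far, so a caller can render partial output while the stream is still open.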
/chat/sse
Each chunk is formatted as an SSE message. The example below shows the logical structure:
// First chunk
data: {
  "message": {
    "role": "assistant",
    "content": "I'm "
  },
  "done": false,
  "index": 0
}
// Second chunk
data: {
  "message": {
    "role": "assistant",
    "content": "doing well"
  },
  "done": false,
  "index": 1
}
// Third chunk
data: {
  "message": {
    "role": "assistant",
    "content": ", thank you!"
  },
  "done": false,
  "index": 2
}
// Final termination signal
data: [DONE]
The actual SSE stream follows the SSE protocol with each message on a single line followed by double newlines:
data: {"message":{"role":"assistant","content":"I'm "},"done":false,"index":0}\n\n
data: {"message":{"role":"assistant","content":"doing well"},"done":false,"index":1}\n\n
data: {"message":{"role":"assistant","content":", thank you!"},"done":false,"index":2}\n\n
data: [DONE]\n\n
The last chunk is a special data: [DONE]\n\n message to signal the end of the stream.
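For clients that read the raw stream rather than relying on a browser's built-in SSE support, the wire format above can be decoded by splitting on blank lines and collecting the `data:` fields. The sketch below assumes LF-only line endings, as in the examples; the SSE protocol also permits CR and CRLF, which this sketch does not handle. The name `createSseDecoder` is hypothetical.

```javascript
// Incremental SSE decoder (LF line endings assumed). Returns one
// { event, data } object per completed event block.
function createSseDecoder() {
  let buffer = "";
  return function push(text) {
    buffer += text;
    const blocks = buffer.split("\n\n");
    buffer = blocks.pop(); // incomplete trailing event, kept for the next call
    const events = [];
    for (const block of blocks) {
      let event = "message"; // default event name when no "event:" field is sent
      const data = [];
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
        // other fields (id, retry) and ":" comment lines are ignored in this sketch
      }
      if (data.length > 0) events.push({ event, data: data.join("\n") });
    }
    return events;
  };
}
```

A caller would stop reading once an event's `data` equals `[DONE]` and otherwise pass `data` to `JSON.parse`.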
Response headers (/chat/json):
Content-Type: application/json
Response headers (/chat/stream):
Content-Type: application/json
Transfer-Encoding: chunked
Cache-Control: no-cache
Connection: keep-alive
Response headers (/chat/sse):
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
For non-streaming requests, error responses will follow this format:
{
  "error": {
    "message": "Error message describing what went wrong",
    "type": "error_type",
    "code": "error_code"
  }
}
For JSON streaming, error responses follow the same newline-delimited format but with an error object:
{"error":{"message":"Error message describing what went wrong","type":"error_type","code":"error_code"},"done":true}\n
Note: This error shape is widely used across APIs, but it differs structurally from success responses. In a success chunk, the top-level "message" field is a JSON object; in an error, "message" is a string nested under "error". Clients need dedicated error-parsing logic to handle this difference.
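One way a client might deal with this structural difference is to check each line for an `error` key before treating it as a chunk. The sketch below is only an illustration; `ChatStreamError` and `parseStreamLine` are hypothetical names, and throwing an exception is one possible design among several.

```javascript
// Hypothetical error type carrying the fields of the "error" object.
class ChatStreamError extends Error {
  constructor({ message, type, code }) {
    super(message);
    this.name = "ChatStreamError";
    this.type = type;
    this.code = code;
  }
}

// Parses one line from /chat/stream. Success lines are returned as chunk
// objects; error lines are converted into a thrown ChatStreamError.
function parseStreamLine(line) {
  const parsed = JSON.parse(line);
  if (parsed.error !== undefined) throw new ChatStreamError(parsed.error);
  return parsed;
}
```

With this split, code that consumes chunks can always read `message.content` as a string inside an object, and error details stay on the exception.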
For SSE streaming, errors use the standard SSE protocol’s event type feature:
event: error
data: {"message":"Error message describing what went wrong","type":"error_type","code":"error_code"}\n\n
data: [DONE]\n\n
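This format complies with the SSE specification. In browser-based EventSource implementations, an event named error is dispatched to the same error listeners that receive connection failures, so a handler should check whether the event carries a data payload to tell the two cases apart.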
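For clients that decode SSE events themselves, the three kinds of event described in this document (content chunk, error, and the `[DONE]` sentinel) can be told apart by the event name and the data payload. The sketch below assumes each event has already been split into an object with `event` and `data` fields; that input shape and the name `interpretSseEvent` are assumptions for illustration.

```javascript
// Classifies one decoded SSE event. Input shape (assumed): { event, data },
// where event defaults to "message" when the server sent no "event:" field.
function interpretSseEvent({ event, data }) {
  if (event === "error") {
    // Unlike the JSON streaming format, the error fields are not nested
    // under an "error" key here.
    const { message, type, code } = JSON.parse(data);
    return { kind: "error", message, type, code };
  }
  if (data === "[DONE]") return { kind: "done" };
  return { kind: "chunk", chunk: JSON.parse(data) };
}
```

Returning a tagged result instead of throwing lets the caller decide whether an error event should abort processing or simply be logged before the final `[DONE]`.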
curl -X POST http://your-api.example/chat/json \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key" \
  -d '{
    "model": "model-name",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
curl -N -X POST http://your-api.example/chat/stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key" \
  -d '{
    "model": "model-name",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
curl -N -X POST http://your-api.example/chat/sse \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key" \
  -d '{
    "model": "model-name",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
// Note: EventSource always sends a GET request and cannot attach a request body
// or an Authorization header, so this only works if /chat/sse is reachable that way.
const eventSource = new EventSource('/chat/sse');
eventSource.onmessage = (event) => {
  if (event.data === '[DONE]') {
    console.log('Stream ended');
    eventSource.close(); // Close the connection
  } else {
    const data = JSON.parse(event.data);
    console.log(data.message.content); // Process chunk
  }
};
eventSource.onerror = (error) => {
  console.error('SSE error:', error);
  eventSource.close(); // Close on error
};
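The browser EventSource API only issues GET requests and offers no way to set an Authorization header or send a JSON body, whereas /chat/sse is specified as a POST endpoint with a bearer token. A client can instead read the SSE stream through `fetch`. The following is a sketch under those assumptions, not a reference client: `streamChatSse` is a hypothetical name, the `fetchImpl` parameter exists only so the function can be exercised without a network, and LF line endings are assumed as in the examples above.

```javascript
// Sends a POST to an SSE endpoint and invokes onChunk for every content chunk.
// Resolves when "[DONE]" arrives or the stream ends; rejects on an error event.
async function streamChatSse(url, apiKey, body, onChunk, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) return;
    buffer += decoder.decode(value, { stream: true });
    const blocks = buffer.split("\n\n");
    buffer = blocks.pop(); // incomplete event, completed by a later read
    for (const block of blocks) {
      const lines = block.split("\n");
      const data = lines
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      const isError = lines.some(
        (line) => line.startsWith("event:") && line.slice(6).trim() === "error"
      );
      if (isError) throw new Error(JSON.parse(data).message);
      if (data === "[DONE]") return;
      if (data !== "") onChunk(JSON.parse(data));
    }
  }
}
```

A call might look like `streamChatSse(url, key, { messages }, (chunk) => render(chunk.message.content))`, where `render` stands in for whatever the application does with each piece of text.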