API Reference

REST API

OpenAI-compatible REST API reference. Use any OpenAI SDK or make direct HTTP requests.

Overview

ZSE provides an OpenAI-compatible REST API. This means you can use any OpenAI SDK or existing code with minimal changes.

```text
Base URL: http://localhost:8000/v1

Endpoints:
POST /v1/chat/completions - Generate chat completions
GET  /v1/models           - List available models
GET  /health              - Health check
```

If you're using the OpenAI Python SDK, just change the `base_url` parameter to point to your ZSE server.
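If you prefer not to use an SDK, the same request can be made with only the Python standard library. This is a minimal sketch, assuming a ZSE server at the base URL above; `build_chat_request` and `send_chat_request` are illustrative helper names, not part of ZSE.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"

def build_chat_request(user_message):
    """Build a minimal payload for POST /v1/chat/completions."""
    return {
        "model": "default",
        "messages": [{"role": "user", "content": user_message}],
    }

def send_chat_request(payload):
    """POST the payload to the server and return the parsed JSON response."""
    req = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires a running ZSE server.
    reply = send_chat_request(build_chat_request("Hello"))
    print(reply["choices"][0]["message"]["content"])
```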

Authentication

By default, ZSE does not require authentication. If you start the server with `--api-key`, all requests must include the key.

```bash
# Start server with API key
zse serve model.zse --api-key sk-your-secret-key
```

Include the API key in the Authorization header:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
```
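From Python, the header only needs to be attached when the server was started with an API key. A small sketch; `auth_headers` is an illustrative helper, not part of ZSE.

```python
def auth_headers(api_key=None):
    """Return JSON request headers, adding Authorization when api_key is set."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = "Bearer " + api_key
    return headers
```

Pass the result as the headers of any request to the server; with no key, the Authorization header is simply omitted.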

Chat Completions

Generate chat completions with the /v1/chat/completions endpoint.

POST /v1/chat/completions

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```

Request Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (use "default" for single model) |
| messages | array | Yes | Array of message objects |
| temperature | float | No | Sampling temperature (0-2, default: 1.0) |
| max_tokens | int | No | Maximum tokens to generate |
| stream | bool | No | Enable streaming response |
| top_p | float | No | Nucleus sampling (0-1, default: 1.0) |
| stop | array | No | Stop sequences |

Response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709147520,
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
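The fields you usually want are the assistant's reply and the token usage. A quick sketch using the sample response above as a plain dict:

```python
# A sample chat.completion response, matching the one shown above.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of France is Paris.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

# The reply text lives under choices[0].message.content,
# and billing-style counters under usage.
reply = response["choices"][0]["message"]["content"]
total_tokens = response["usage"]["total_tokens"]
```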

Streaming

Enable streaming to receive tokens as they're generated:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

Streaming response (Server-Sent Events):

```text
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"1"},"index":0}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":", "},"index":0}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"2"},"index":0}]}
...
data: [DONE]
```
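Each `data:` line carries a JSON chunk whose `delta.content` holds the next piece of text, and the stream ends with `data: [DONE]`. A minimal sketch of parsing those lines; `iter_stream_content` is an illustrative helper, not part of ZSE.

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from SSE 'data:' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]
```

In practice you would feed it the response body line by line and print each piece as it arrives.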

Models

List available models with the /v1/models endpoint.

GET /v1/models

```bash
curl http://localhost:8000/v1/models
```

Response:

```json
{
  "object": "list",
  "data": [
    {
      "id": "Qwen/Qwen2.5-7B-Instruct",
      "object": "model",
      "created": 1709147520,
      "owned_by": "zse"
    }
  ]
}
```
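The model IDs are the `id` fields under `data`. A one-line sketch of extracting them; `model_ids` is an illustrative helper, not part of ZSE.

```python
def model_ids(models_response):
    """Return the list of model IDs from a /v1/models response dict."""
    return [m["id"] for m in models_response.get("data", [])]
```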

Health Check

Check server health with the /health endpoint.

GET /health

```bash
curl http://localhost:8000/health
```

Response:

```json
{
  "status": "healthy",
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "gpu_memory_used": "5.2 GB",
  "gpu_memory_total": "80 GB",
  "uptime": "2h 15m"
}
```

Error Handling

ZSE returns standard HTTP status codes and JSON error responses:

| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad request (invalid parameters) |
| 401 | Unauthorized (invalid API key) |
| 404 | Model not found |
| 429 | Rate limited |
| 500 | Internal server error |
Error Response:

```json
{
  "error": {
    "message": "Invalid request: 'messages' is required",
    "type": "invalid_request_error",
    "code": "missing_required_field"
  }
}
```
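On a non-200 status, the body follows the error shape above. A sketch of turning it into a readable message; `describe_error` is an illustrative helper, not part of ZSE.

```python
import json

def describe_error(status, body):
    """Format an HTTP status and ZSE error-response body as one message."""
    err = json.loads(body).get("error", {})
    return "{}: {} ({})".format(
        status, err.get("message", ""), err.get("code", "unknown"))
```

A client would typically call this when the response status is 400 or above, before deciding whether to retry (429) or fail (400, 401, 404, 500).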