REST API
OpenAI-compatible REST API reference. Use any OpenAI SDK or make direct HTTP requests.
Overview
ZSE provides an OpenAI-compatible REST API. This means you can use any OpenAI SDK or existing code with minimal changes.
```text
Base URL:  http://localhost:8000/v1

Endpoints:
  POST /v1/chat/completions - Generate chat completions
  GET  /v1/models           - List available models
  GET  /health              - Health check
```

If you're using the OpenAI Python SDK, just change the `base_url` parameter to point to your ZSE server.

Authentication

By default, ZSE does not require authentication. If you start the server with `--api-key`, all requests must include the key.
```bash
# Start server with API key
zse serve model.zse --api-key sk-your-secret-key
```

Include the API key in the Authorization header:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
```
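The same authenticated call from Python, using only the standard library (a sketch; `build_request` is an illustrative helper, not part of ZSE):

```python
import json
import urllib.request

API_KEY = "sk-your-secret-key"            # the key passed to `zse serve`
BASE_URL = "http://localhost:8000/v1"

def build_request(messages, model="default"):
    """Build an authenticated POST /v1/chat/completions request."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

# With the server running, send it and print the reply:
# with urllib.request.urlopen(build_request([{"role": "user", "content": "Hello"}])) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```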
Chat Completions

Generate chat completions with the /v1/chat/completions endpoint.
POST /v1/chat/completions
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```

Request Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (use "default" for single model) |
| messages | array | Yes | Array of message objects |
| temperature | float | No | Sampling temperature (0-2, default: 1.0) |
| max_tokens | int | No | Maximum tokens to generate |
| stream | bool | No | Enable streaming response |
| top_p | float | No | Nucleus sampling (0-1, default: 1.0) |
| stop | array | No | Stop sequences |
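Assembled as a Python dict, a request body exercising most of the table looks like this (a sketch; the values are illustrative, not recommendations):

```python
import json

payload = {
    "model": "default",        # required: model ID
    "messages": [              # required: the conversation so far
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,        # 0-2; lower is more deterministic
    "max_tokens": 256,         # cap on generated tokens
    "top_p": 1.0,              # nucleus sampling; 1.0 disables the cutoff
    "stop": ["\n\n"],          # generation halts at any of these sequences
    "stream": False,           # set True for Server-Sent Events
}
body = json.dumps(payload).encode()  # this is the POST body
```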
Response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709147520,
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```

Streaming
Enable streaming to receive tokens as they're generated:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

Streaming response (Server-Sent Events):
```text
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"1"},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":", "},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"2"},"index":0}]}

...

data: [DONE]
```
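Each `data:` line carries one JSON chunk whose `choices[0].delta.content` holds the newly generated text; the stream ends with `data: [DONE]`. A minimal client-side parser (a sketch; `parse_sse_line` is a hypothetical helper, not part of ZSE):

```python
import json

def parse_sse_line(line: bytes):
    """Return the delta text from one `data:` line, or None for blanks/[DONE]."""
    line = line.strip()
    if not line.startswith(b"data:"):
        return None                      # blank keep-alive line between events
    data = line[len(b"data:"):].strip()
    if data == b"[DONE]":
        return None                      # end-of-stream sentinel
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

# With a running server and a request that set "stream": true:
# for line in urllib.request.urlopen(req):
#     piece = parse_sse_line(line)
#     if piece:
#         print(piece, end="", flush=True)
```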
Models

List available models with the /v1/models endpoint.
GET /v1/models
```bash
curl http://localhost:8000/v1/models
```

Response:
```json
{
  "object": "list",
  "data": [
    {
      "id": "Qwen/Qwen2.5-7B-Instruct",
      "object": "model",
      "created": 1709147520,
      "owned_by": "zse"
    }
  ]
}
```
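The `id` field of each entry in `data` is what you pass as `model` in chat requests. Parsing the sample body above from Python:

```python
import json

# Sample /v1/models response body (from the example above)
sample = """
{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen2.5-7B-Instruct", "object": "model",
     "created": 1709147520, "owned_by": "zse"}
  ]
}
"""

model_ids = [m["id"] for m in json.loads(sample)["data"]]
# model_ids[0] can be used as the "model" field in /v1/chat/completions
```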
Health Check

Check server health with the /health endpoint.
GET /health
```bash
curl http://localhost:8000/health
```

Response:
```json
{
  "status": "healthy",
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "gpu_memory_used": "5.2 GB",
  "gpu_memory_total": "80 GB",
  "uptime": "2h 15m"
}
```
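For readiness probes and scripts, the endpoint is easy to poll from Python (a sketch; `is_healthy` is an illustrative helper, not part of ZSE):

```python
import json
import urllib.request

def is_healthy(base_url="http://localhost:8000", timeout=5.0):
    """True if GET /health reports status 'healthy'; False if down or unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "healthy"
    except (OSError, ValueError):
        return False  # connection refused, timeout, or non-JSON body
```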
Error Handling

ZSE returns standard HTTP status codes and JSON error responses:
| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad request (invalid parameters) |
| 401 | Unauthorized (invalid API key) |
| 404 | Model not found |
| 429 | Rate limited |
| 500 | Internal server error |
Error Response:

```json
{
  "error": {
    "message": "Invalid request: 'messages' is required",
    "type": "invalid_request_error",
    "code": "missing_required_field"
  }
}
```
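Client code can branch on the status code and surface the `error` object to users; a small formatter (a sketch; `describe_error` is a hypothetical helper, not part of ZSE):

```python
import json

def describe_error(status: int, body: str) -> str:
    """Format an API error body into a single readable line."""
    err = json.loads(body)["error"]
    return f"HTTP {status} [{err['type']}/{err['code']}]: {err['message']}"
```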