REST API
OpenAI-compatible REST API reference. Use any OpenAI SDK or make direct HTTP requests.
Overview
ZSE provides an OpenAI-compatible REST API. This means you can use any OpenAI SDK or existing code with minimal changes.
```text
Base URL:  http://localhost:8000/v1

Endpoints:
  POST /v1/chat/completions - Generate chat completions
  GET  /v1/models           - List available models
  GET  /health              - Health check
```

If you're using the OpenAI Python SDK, just change the `base_url` parameter to point to your ZSE server.

Authentication

By default, ZSE does not require authentication. If you start the server with `--api-key`, all requests must include the key.
```bash
# Start server with API key
zse serve model.zse --api-key sk-your-secret-key
```

Include the API key in the Authorization header:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
```
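The same authenticated call from Python, using only the standard library (a sketch; `build_request` is an illustrative helper, not part of ZSE):

```python
import json
import urllib.request

API_KEY = "sk-your-secret-key"            # the key passed to `zse serve`
BASE_URL = "http://localhost:8000/v1"

def build_request(messages, model="default"):
    """Build an authenticated POST /v1/chat/completions request."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

# With the server running, send it and print the reply:
# with urllib.request.urlopen(build_request([{"role": "user", "content": "Hello"}])) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```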
Chat Completions

Generate chat completions with the /v1/chat/completions endpoint.
POST /v1/chat/completions
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```

Request Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (use "default" for single model) |
| messages | array | Yes | Array of message objects |
| temperature | float | No | Sampling temperature (0-2, default: 1.0) |
| max_tokens | int | No | Maximum tokens to generate |
| stream | bool | No | Enable streaming response |
| top_p | float | No | Nucleus sampling (0-1, default: 1.0) |
| stop | array | No | Stop sequences |
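Assembled as a Python dict, a request body exercising most of the table looks like this (a sketch; the values are illustrative, not recommendations):

```python
import json

payload = {
    "model": "default",        # required: model ID
    "messages": [              # required: the conversation so far
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,        # 0-2; lower is more deterministic
    "max_tokens": 256,         # cap on generated tokens
    "top_p": 1.0,              # nucleus sampling; 1.0 disables the cutoff
    "stop": ["\n\n"],          # generation halts at any of these sequences
    "stream": False,           # set True for Server-Sent Events
}
body = json.dumps(payload).encode()  # this is the POST body
```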
Response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709147520,
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```

Streaming
Enable streaming to receive tokens as they're generated:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

Streaming response (Server-Sent Events):
```text
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"1"},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":", "},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"2"},"index":0}]}

...

data: [DONE]
```
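Each `data:` line carries one JSON chunk whose `choices[0].delta.content` holds the newly generated text; the stream ends with `data: [DONE]`. A minimal client-side parser (a sketch; `parse_sse_line` is a hypothetical helper, not part of ZSE):

```python
import json

def parse_sse_line(line: bytes):
    """Return the delta text from one `data:` line, or None for blanks/[DONE]."""
    line = line.strip()
    if not line.startswith(b"data:"):
        return None                      # blank keep-alive line between events
    data = line[len(b"data:"):].strip()
    if data == b"[DONE]":
        return None                      # end-of-stream sentinel
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

# With a running server and a request that set "stream": true:
# for line in urllib.request.urlopen(req):
#     piece = parse_sse_line(line)
#     if piece:
#         print(piece, end="", flush=True)
```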
Models

List available models with the /v1/models endpoint.
GET /v1/models
```bash
curl http://localhost:8000/v1/models
```

Response:
```json
{
  "object": "list",
  "data": [
    {
      "id": "Qwen/Qwen2.5-7B-Instruct",
      "object": "model",
      "created": 1709147520,
      "owned_by": "zse"
    }
  ]
}
```
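The `id` field of each entry in `data` is what you pass as `model` in chat requests. Parsing the sample body above from Python:

```python
import json

# Sample /v1/models response body (from the example above)
sample = """
{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen2.5-7B-Instruct", "object": "model",
     "created": 1709147520, "owned_by": "zse"}
  ]
}
"""

model_ids = [m["id"] for m in json.loads(sample)["data"]]
# model_ids[0] can be used as the "model" field in /v1/chat/completions
```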
Health Check

Check server health with the /health endpoint.
GET /health
```bash
curl http://localhost:8000/health
```

Response:
```json
{
  "status": "healthy",
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "gpu_memory_used": "5.2 GB",
  "gpu_memory_total": "80 GB",
  "uptime": "2h 15m"
}
```
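For readiness probes and scripts, the endpoint is easy to poll from Python (a sketch; `is_healthy` is an illustrative helper, not part of ZSE):

```python
import json
import urllib.request

def is_healthy(base_url="http://localhost:8000", timeout=5.0):
    """True if GET /health reports status 'healthy'; False if down or unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "healthy"
    except (OSError, ValueError):
        return False  # connection refused, timeout, or non-JSON body
```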
Error Handling

ZSE returns standard HTTP status codes and JSON error responses:
| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad request (invalid parameters) |
| 401 | Unauthorized (invalid API key) |
| 404 | Model not found |
| 429 | Rate limited |
| 500 | Internal server error |
Error Response:

```json
{
  "error": {
    "message": "Invalid request: 'messages' is required",
    "type": "invalid_request_error",
    "code": "missing_required_field"
  }
}
```
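Client code can branch on the status code and surface the `error` object to users; a small formatter (a sketch; `describe_error` is a hypothetical helper, not part of ZSE):

```python
import json

def describe_error(status: int, body: str) -> str:
    """Format an API error body into a single readable line."""
    err = json.loads(body)["error"]
    return f"HTTP {status} [{err['type']}/{err['code']}]: {err['message']}"
```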