CLI Commands

Complete reference for all ZSE command-line interface commands and options.

Overview

ZSE provides a CLI for model conversion and inspection, serving, interactive chat, and benchmarking. All commands follow the pattern:

```bash
zse <command> [options] [arguments]
```

Get help for any command:

```bash
zse --help          # List all commands
zse serve --help    # Help for a specific command
```

zse serve

Start an OpenAI-compatible API server.

```bash
zse serve <model> [options]
```

Arguments:

  • model: Model name (Hugging Face ID) or path to a .zse or .gguf file

Options:

| Option | Default | Description |
|---|---|---|
| --port | 8000 | Server port |
| --host | 127.0.0.1 | Bind address |
| --api-key | None | Require API key authentication |
| --quant | int4 | Quantization type: int4, int8, nf4, fp16 |
| --max-batch | 8 | Maximum batch size |
| --max-tokens | 4096 | Maximum output tokens |
| --offload | False | Enable layer offloading (zStream) |
| --gpu | auto | GPU device ID or "auto" |

Examples:

```bash
# Basic usage
zse serve Qwen/Qwen2.5-7B-Instruct

# Production server
zse serve qwen-7b.zse --host 0.0.0.0 --port 8080 --api-key sk-xxx

# With layer offloading for large models
zse serve Qwen/Qwen2.5-32B-Instruct --offload

# Specific GPU
zse serve model.zse --gpu 1
```
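
Because zse serve exposes an OpenAI-compatible API, a plain HTTP client is enough to query it. The following is a minimal sketch, not an official client: it assumes the server implements the standard OpenAI /v1/chat/completions route and checks the --api-key value as a Bearer token. The zse_complete function and the ZSE_BASE_URL, ZSE_API_KEY, and ZSE_MODEL variables are hypothetical names chosen for this example, and whether the server inspects the model field is not documented here.

```bash
# Sketch of a client for a running "zse serve" instance.
# ASSUMPTIONS: the server exposes the OpenAI-style /v1/chat/completions route
# and expects the --api-key value as a Bearer token. zse_complete, ZSE_BASE_URL,
# ZSE_API_KEY and ZSE_MODEL are hypothetical names, not part of ZSE.
zse_complete() {
  local prompt base_url model payload
  # Defaults mirror "zse serve": --host 127.0.0.1 --port 8000
  base_url=${ZSE_BASE_URL:-http://127.0.0.1:8000}
  model=${ZSE_MODEL:-Qwen/Qwen2.5-7B-Instruct}

  # Escape backslashes, then double quotes, so the prompt fits in a JSON string.
  # Single-line prompts only; use a JSON tool such as jq for arbitrary input.
  prompt=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  payload="{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}"

  # Build curl's arguments; add the auth header only when a key is set
  set -- -sS "$base_url/v1/chat/completions" -H "Content-Type: application/json"
  if [ -n "${ZSE_API_KEY:-}" ]; then
    set -- "$@" -H "Authorization: Bearer $ZSE_API_KEY"
  fi
  curl "$@" -d "$payload"
}

# Usage against the production example above (port 8080, key sk-xxx):
#   ZSE_BASE_URL=http://127.0.0.1:8080 ZSE_API_KEY=sk-xxx zse_complete "Explain quantum computing"
```

Building the argument list with set -- keeps the function POSIX-sh compatible, so it works in bash and in minimal shells that lack arrays.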

zse convert

Convert models to the optimized .zse format, or to GGUF with --format gguf.

```bash
zse convert <model> [options]
```

Options:

| Option | Default | Description |
|---|---|---|
| -o, --output | model.zse | Output file path |
| --quant | int4 | Quantization type |
| --format | zse | Output format: zse, gguf |

Examples:

```bash
# Convert to .zse
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b.zse

# Convert with NF4 quantization
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b-nf4.zse --quant nf4

# Convert to GGUF
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b.gguf --format gguf
```
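
When choosing a quantization level, it can help to produce one file per variant and compare them. The sketch below loops over zse convert using only the -o and --quant options documented above. It assumes zse convert accepts the same --quant values that zse serve lists (int4, int8, nf4, fp16); convert_variants is a hypothetical helper name.

```bash
# Sketch: write one .zse file per quantization type, e.g. qwen-7b-int4.zse.
# ASSUMPTION: "zse convert --quant" accepts the values listed for
# "zse serve --quant". convert_variants is a hypothetical helper name.
convert_variants() {
  local model prefix quant
  model=$1    # Hugging Face ID or local path
  prefix=$2   # output file prefix, e.g. qwen-7b
  for quant in int4 int8 nf4 fp16; do
    # Stop at the first conversion that fails
    zse convert "$model" -o "$prefix-$quant.zse" --quant "$quant" || return 1
  done
}

# Usage:
#   convert_variants Qwen/Qwen2.5-7B-Instruct qwen-7b
```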

zse chat

Interactive chat mode for quick testing.

```bash
zse chat <model> [options]
```

Options:

| Option | Default | Description |
|---|---|---|
| --prompt | None | Single prompt (non-interactive) |
| --system | None | System prompt |
| --temperature | 0.7 | Sampling temperature |

Examples:

```bash
# Interactive chat
zse chat Qwen/Qwen2.5-7B-Instruct

# Single prompt
zse chat Qwen/Qwen2.5-7B-Instruct --prompt "Explain quantum computing"

# With system prompt
zse chat model.zse --system "You are a helpful coding assistant"
```
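
The --prompt option makes zse chat usable from scripts. As a minimal sketch, the function below sends each non-empty line of a text file as a separate single prompt, optionally with a --system prompt. run_prompts is a hypothetical helper name, and the output format of zse chat in this mode is not specified in this reference.

```bash
# Sketch: send each non-empty line of a file to "zse chat --prompt".
# run_prompts is a hypothetical helper name.
run_prompts() {
  local model prompt_file system line
  model=$1
  prompt_file=$2
  system=${3:-}   # optional system prompt
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    # Redirect stdin so zse cannot consume the rest of the prompt file
    if [ -n "$system" ]; then
      zse chat "$model" --system "$system" --prompt "$line" < /dev/null
    else
      zse chat "$model" --prompt "$line" < /dev/null
    fi
  done < "$prompt_file"
}

# Usage:
#   run_prompts model.zse prompts.txt "You are a helpful coding assistant"
```

Each line starts a new zse chat process, so every prompt presumably pays the model load time again; for many prompts, querying a zse serve instance is likely the better fit.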

zse info

Display information about a model.

```bash
zse info <model>
```

Example output:

```bash
$ zse info qwen-7b.zse
Model Information
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Name: Qwen/Qwen2.5-7B-Instruct
Format: .zse
Quantization: INT4
Parameters: 7.6B
Size: 4.2 GB
Vocab Size: 152064
Context: 32768
Created: 2024-02-25
```
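
For scripting, a single value can be pulled out of the zse info report. This sketch assumes the output keeps the "Key: value" layout of the sample above; that format is not documented as stable, and zse_info_field is a hypothetical helper name.

```bash
# Sketch: print one field from the "zse info" report.
# ASSUMPTION: each field is printed as "Key: value", as in the sample above.
# zse_info_field is a hypothetical helper name.
zse_info_field() {
  local model key
  model=$1
  key=$2   # e.g. Quantization, Context, "Vocab Size"
  zse info "$model" | awk -v key="$key" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)              # ignore leading indentation
      if (index(line, key ":") == 1) {      # line starts with "Key:"
        sub(/^[^:]*:[ \t]*/, "", line)      # drop "Key:" and the spaces after it
        print line
        exit
      }
    }'
}

# Usage:
#   zse_info_field qwen-7b.zse Quantization
```

Matching on the key followed by a colon keeps "Context" from matching a longer key such as "Context Length".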

zse benchmark

Run performance benchmarks and report cold-start time, first-token latency, throughput (tokens/sec), and peak memory.

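```bash
zse benchmark <model> [options]
```

Options:

| Option | Default | Description |
|---|---|---|
| --iterations | 10 | Number of benchmark iterations |
| --warmup | 2 | Number of warm-up iterations run before measuring |
| --output | None | Save results to a JSON file |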

Example output:

```bash
$ zse benchmark qwen-7b.zse
ZSE Benchmark Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Model: qwen-7b.zse
Cold Start: 3.9s
First Token: 45ms
Tokens/sec: 82.4
Memory (Peak): 5.2 GB
```
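
To compare several files, for example the same model at different quantization levels, the throughput line can be collected from each report. This sketch assumes the report keeps the "Tokens/sec:" line shown in the sample above; for anything more structured, the --output JSON file is probably the better source, but its schema is not described here. benchmark_compare is a hypothetical helper name.

```bash
# Sketch: print "<model><TAB><tokens/sec>" for each model benchmarked.
# ASSUMPTION: the report contains a "Tokens/sec: <value>" line, as in the
# sample above. benchmark_compare is a hypothetical helper name.
benchmark_compare() {
  local model tps
  for model in "$@"; do
    # Split on ":" plus any following spaces; keep the Tokens/sec value
    tps=$(zse benchmark "$model" | awk -F':[ \t]*' '$1 ~ /Tokens\/sec/ { print $2; exit }')
    printf '%s\t%s\n' "$model" "${tps:-n/a}"   # n/a if the line is missing
  done
}

# Usage:
#   benchmark_compare qwen-7b.zse qwen-7b-nf4.zse
```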

zse hardware

Detect and display hardware information.

```bash
zse hardware
```

This command helps you understand what models your hardware can support and diagnose GPU detection issues.

Global Options

These options are available for all commands:

| Option | Description |
|---|---|
| --verbose, -v | Enable verbose output |
| --quiet, -q | Suppress non-essential output |
| --version | Show version and exit |
| --help | Show help message |