API Reference

CLI Commands

Complete reference for all ZSE command-line interface commands and options.

Overview

ZSE provides a powerful CLI for model management, serving, and inference. All commands follow the pattern:

bash

zse <command> [options] [arguments]

Get help for any command:

bash

zse --help           # List all commands
zse serve --help     # Help for specific command

zse serve

Start an OpenAI-compatible API server.

bash

zse serve <model> [options]

Arguments:

modelModel name (HuggingFace ID) or path to .zse/.gguf file

Options:

Option	Default	Description
`--port`	8000	Server port
`--host`	127.0.0.1	Bind address
`--api-key`	None	Require API key authentication
`--quant`	int4	Quantization type: int4, int8, nf4, fp16
`--max-batch`	8	Maximum batch size
`--max-tokens`	4096	Maximum output tokens
`--offload`	False	Enable layer offloading (zStream)
`--gpu`	auto	GPU device ID or "auto"

Examples:

bash

# Basic usage
zse serve Qwen/Qwen2.5-7B-Instruct
 
# Production server
zse serve qwen-7b.zse --host 0.0.0.0 --port 8080 --api-key sk-xxx
 
# With layer offloading for large models
zse serve Qwen/Qwen2.5-32B-Instruct --offload
 
# Specific GPU
zse serve model.zse --gpu 1

zse convert

Convert models to the optimized .zse format.

bash

zse convert <model> [options]

Option	Default	Description
`-o, --output`	model.zse	Output file path
`--quant`	int4	Quantization type
`--format`	zse	Output format: zse, gguf

bash

# Convert to .zse
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b.zse
 
# Convert with NF4 quantization
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b-nf4.zse --quant nf4
 
# Convert to GGUF
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b.gguf --format gguf

zse chat

Interactive chat mode for quick testing.

bash

zse chat <model> [options]

Option	Default	Description
`--prompt`	None	Single prompt (non-interactive)
`--system`	None	System prompt
`--temperature`	0.7	Sampling temperature

bash

# Interactive chat
zse chat Qwen/Qwen2.5-7B-Instruct
 
# Single prompt
zse chat Qwen/Qwen2.5-7B-Instruct --prompt "Explain quantum computing"
 
# With system prompt
zse chat model.zse --system "You are a helpful coding assistant"

zse info

Display information about a model.

bash

zse info <model>

bash

$ zse info qwen-7b.zse
 
Model Information
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Name:          Qwen/Qwen2.5-7B-Instruct
Format:        .zse
Quantization:  INT4
Parameters:    7.6B
Size:          4.2 GB
Vocab Size:    152064
Context:       32768
Created:       2024-02-25

zse benchmark

Run performance benchmarks.

bash

zse benchmark <model> [options]

Option	Default	Description
`--iterations`	10	Number of iterations
`--warmup`	2	Warmup iterations
`--output`	None	Save results to JSON file

bash

$ zse benchmark qwen-7b.zse
 
ZSE Benchmark Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Model:              qwen-7b.zse
Cold Start:         3.9s
First Token:        45ms
Tokens/sec:         82.4
Memory (Peak):      5.2 GB

zse hardware

Detect and display hardware information.

bash

zse hardware

This command helps you understand what models your hardware can support and diagnose GPU detection issues.

Global Options

These options are available for all commands:

Option	Description
`--verbose, -v`	Enable verbose output
`--quiet, -q`	Suppress non-essential output
`--version`	Show version and exit
`--help`	Show help message

← zKV

Python API →