# CLI Commands
Complete reference for all ZSE command-line interface commands and options.
## Overview
ZSE provides a powerful CLI for model management, serving, and inference. All commands follow the pattern:
```bash
zse <command> [options] [arguments]
```

Get help for any command:

```bash
zse --help          # List all commands
zse serve --help    # Help for a specific command
```

## zse serve
Start an OpenAI-compatible API server.
```bash
zse serve <model> [options]
```

**Arguments:**

- `model`: Model name (HuggingFace ID) or path to a `.zse`/`.gguf` file

**Options:**

| Option | Default | Description |
|---|---|---|
| `--port` | `8000` | Server port |
| `--host` | `127.0.0.1` | Bind address |
| `--api-key` | None | Require API key authentication |
| `--quant` | `int4` | Quantization type: `int4`, `int8`, `nf4`, `fp16` |
| `--max-batch` | `8` | Maximum batch size |
| `--max-tokens` | `4096` | Maximum output tokens |
| `--offload` | `False` | Enable layer offloading (zStream) |
| `--gpu` | `auto` | GPU device ID or `auto` |
**Examples:**

```bash
# Basic usage
zse serve Qwen/Qwen2.5-7B-Instruct

# Production server
zse serve qwen-7b.zse --host 0.0.0.0 --port 8080 --api-key sk-xxx

# With layer offloading for large models
zse serve Qwen/Qwen2.5-32B-Instruct --offload

# Specific GPU
zse serve model.zse --gpu 1
```

## zse convert
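Because the server is OpenAI-compatible, any standard HTTP client can talk to it. Below is a minimal Python sketch against the default host and port from the table above; the `/v1/chat/completions` route and the request shape follow the OpenAI chat-completions convention, and the model name is a placeholder for whatever you served:

```python
import json
import urllib.request

# Request body in the OpenAI chat-completions shape.
payload = {
    "model": "qwen-7b",  # placeholder; use the model you passed to `zse serve`
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantization in one sentence."},
    ],
    "max_tokens": 128,
}

req = urllib.request.Request(
    "http://127.0.0.1:8000/v1/chat/completions",  # default --host / --port
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-xxx",  # only needed if --api-key was set
    },
)

# With a server running, send the request and print the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```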
Convert models to the optimized .zse format.
bash
zse convert <model> [options]| Option | Default | Description |
|---|---|---|
-o, --output | model.zse | Output file path |
--quant | int4 | Quantization type |
--format | zse | Output format: zse, gguf |
```bash
# Convert to .zse
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b.zse

# Convert with NF4 quantization
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b-nf4.zse --quant nf4

# Convert to GGUF
zse convert Qwen/Qwen2.5-7B-Instruct -o qwen-7b.gguf --format gguf
```

## zse chat
Interactive chat mode for quick testing.
bash
zse chat <model> [options]| Option | Default | Description |
|---|---|---|
--prompt | None | Single prompt (non-interactive) |
--system | None | System prompt |
--temperature | 0.7 | Sampling temperature |
```bash
# Interactive chat
zse chat Qwen/Qwen2.5-7B-Instruct

# Single prompt
zse chat Qwen/Qwen2.5-7B-Instruct --prompt "Explain quantum computing"

# With system prompt
zse chat model.zse --system "You are a helpful coding assistant"
```

## zse info
Display information about a model.
```bash
zse info <model>
```

```bash
$ zse info qwen-7b.zse

Model Information
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Name:         Qwen/Qwen2.5-7B-Instruct
Format:       .zse
Quantization: INT4
Parameters:   7.6B
Size:         4.2 GB
Vocab Size:   152064
Context:      32768
Created:      2024-02-25
```

## zse benchmark
Run performance benchmarks.
bash
zse benchmark <model> [options]| Option | Default | Description |
|---|---|---|
--iterations | 10 | Number of iterations |
--warmup | 2 | Warmup iterations |
--output | None | Save results to JSON file |
```bash
$ zse benchmark qwen-7b.zse

ZSE Benchmark Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Model:         qwen-7b.zse
Cold Start:    3.9s
First Token:   45ms
Tokens/sec:    82.4
Memory (Peak): 5.2 GB
```

## zse hardware
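Results saved with `--output` can be post-processed in scripts. A sketch in Python, assuming the JSON file mirrors the fields printed above (the key names here are assumptions, and the inline values are the illustrative numbers from the sample run, not real measurements):

```python
import json

# Illustrative contents of a file written by
# `zse benchmark qwen-7b.zse --output results.json`;
# key names are assumed to mirror the console output.
results_json = """
{
  "model": "qwen-7b.zse",
  "cold_start_s": 3.9,
  "first_token_ms": 45,
  "tokens_per_sec": 82.4,
  "memory_peak_gb": 5.2
}
"""

results = json.loads(results_json)

# Rough generation time for a 512-token completion:
# first token, then the remaining 511 tokens at the steady-state rate.
gen_time_s = results["first_token_ms"] / 1000 + 511 / results["tokens_per_sec"]
print(f"{results['model']}: ~{gen_time_s:.1f}s for 512 tokens")
```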
Detect and display hardware information.
```bash
zse hardware
```

This command helps you understand which models your hardware can support and diagnose GPU detection issues.
## Global Options
These options are available for all commands:
| Option | Description |
|---|---|
| `--verbose, -v` | Enable verbose output |
| `--quiet, -q` | Suppress non-essential output |
| `--version` | Show version and exit |
| `--help` | Show help message |