Benchmarking Your ZSE Setup: Measuring Real Performance
Benchmarking Your ZSE Setup
Learn how to measure real performance metrics for your hardware.
Built-in Benchmark Command
zse benchmark qwen-7b.zse
Output:
┌────────────────────────────────────────────────┐
│ Benchmark Results: qwen-7b.zse │
├────────────────────────────────────────────────┤
│ Cold Start: 3.9s │
│ Throughput: 87.3 tok/s │
│ Time-to-First: 52ms │
│ Latency (p50): 11.4ms/tok │
│ Latency (p99): 18.2ms/tok │
│ GPU Memory: 5.2 GB │
└────────────────────────────────────────────────┘
Specific Benchmarks
Cold Start Only
zse benchmark model.zse --metric cold-start --runs 5
Throughput Test
zse benchmark model.zse --metric throughput \
--prompt-length 512 \
--output-length 256 \
--batch-sizes 1,4,8,16
Memory Profiling
zse benchmark model.zse --metric memory \
--context-lengths 1024,4096,8192,16384
Compare Configurations
Compare quantization types
zse benchmark model-nf4.zse model-int4.zse model-int8.zse
Compare context lengths
zse benchmark model.zse --sweep max-context 1024:16384:2x
Python Benchmarking
from zllm_zse import ZSE, benchmark
model = ZSE("qwen-7b.zse")
results = benchmark(
model,
prompts=["Explain quantum computing" * 10 for _ in range(100)],
max_tokens=256
)
print(f"Mean throughput: {results.throughput_mean:.1f} tok/s")
print(f"p99 latency: {results.latency_p99:.1f} ms/tok")
Hardware-Specific Expectations
Your mileage may vary based on PCIe bandwidth, CPU, and storage speed.