Benchmarks
Real measurements on A100-80GB. No inflated claims, just verified data.
All tests conducted February 2026 on Modal infrastructure. Results may vary based on hardware.
Cold Start Performance
Time from process start to the first generated token. Critical for serverless and auto-scaling deployments.
Qwen 2.5 Coder 7B (A100-80GB)
| Method | Cold Start | Speedup | Notes |
|---|---|---|---|
| bitsandbytes NF4 (first run) | 216.7s | — | Downloads + quantizes |
| bitsandbytes NF4 (warm cache) | 45.4s | baseline | Weights cached on disk |
| .zse pre-quantized | 3.9s | 11.6× | Full cold start |
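The cold-start metric above can be measured with a simple wall-clock timer around model load plus first-token generation. A minimal sketch (the `load_model` and `generate_first_token` callables are hypothetical stand-ins, not part of any real API, so the sketch runs anywhere):

```python
import time

def time_to_first_token(load_model, generate_first_token):
    """Cold start = model load time + latency of the first generated token."""
    start = time.perf_counter()
    model = load_model()
    token = generate_first_token(model)
    return time.perf_counter() - start, token

# Stand-in callables so the sketch runs without a GPU; a real run would
# load the quantized weights and call the model's generate method here.
elapsed, token = time_to_first_token(lambda: object(), lambda m: "def")
```

In a real benchmark the timer starts at process launch, so weight download, deserialization, and quantization (where applicable) are all inside the measured window.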
Qwen 2.5 Coder 32B (A100-80GB)
| Method | Cold Start | VRAM | Speedup |
|---|---|---|---|
| bitsandbytes NF4 | 120.0s | 19.25 GB | baseline |
| .zse pre-quantized | 21.4s | 35.39 GB | 5.6× |
Note: 32B .zse requires 35+ GB VRAM. Use NF4 (19.3 GB) on GPUs with less than 36 GB.
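The VRAM trade-off in the note can be encoded as a simple selection rule. A sketch using the figures from the 32B table (the `pick_format` helper and its headroom margin are illustrative assumptions, not part of any real loader):

```python
def pick_format(free_vram_gb, zse_vram_gb=35.39, nf4_vram_gb=19.25):
    """Choose a quantization format from available VRAM.

    Weight footprints come from the 32B table above; 1 GB of headroom
    is left for KV cache and activations (an assumed margin).
    """
    if free_vram_gb >= zse_vram_gb + 1.0:
        return "zse"   # fastest cold start, largest footprint
    if free_vram_gb >= nf4_vram_gb + 1.0:
        return "nf4"   # slower cold start, fits in ~20 GB
    raise ValueError("insufficient VRAM for either format")

# e.g. an A100-80GB takes .zse; a 24 GB A10G falls back to NF4
```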
Methodology
Hardware
- NVIDIA A100-80GB (primary)
- NVIDIA A10G (24 GB, Modal cloud)
- CPU: AMD EPYC / Intel Xeon
- CUDA 12.1+, PyTorch 2.1+
Test Conditions
- Cold start: fresh process, no cached weights
- Warm cache: model weights on disk, GPU free
- Memory: PyTorch memory profiler
- Throughput: average of 5 runs, 256 output tokens
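The throughput condition above (average of 5 runs, 256 output tokens) can be sketched as a small harness; on a real run you would also record peak VRAM with `torch.cuda.max_memory_allocated()`. The `generate` callable is a hypothetical stand-in for the model's generation call:

```python
import time
from statistics import mean

def benchmark_throughput(generate, n_runs=5, n_tokens=256):
    """Average tokens/sec over n_runs, matching the conditions above."""
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate(n_tokens)  # assumed to emit n_tokens output tokens
        rates.append(n_tokens / (time.perf_counter() - start))
    return mean(rates)

# Stand-in generate so the sketch runs anywhere; a real benchmark would
# call model.generate(..., max_new_tokens=n_tokens) here.
avg_tps = benchmark_throughput(lambda n: time.sleep(0.001))
```

Averaging the per-run rates (rather than one long run) smooths out clock-frequency and scheduler jitter between runs.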
Run Your Own Benchmarks
Reproduce these results on your hardware with our benchmark suite.