# ZSE Quantization Guide: NF4 vs INT4 vs INT8
Choosing the right quantization type is the key to balancing output quality, inference speed, and memory footprint.
## Available Quantization Types
### NF4 (NormalFloat4) - Default

```sh
zse convert model -o model.zse --quant nf4
```

- **Bits:** 4
- **Quality:** ★★★★☆ (best 4-bit)
- **Size:** ~0.56 GB per billion parameters
- **Use case:** most models, production deployments

NF4 uses an asymmetric, nonuniform quantization grid built from the quantiles of a normal distribution, which closely matches the roughly Gaussian distribution of trained neural-network weights.
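To make the idea concrete, here is a minimal pure-Python sketch of block-wise NF4 quantization: scale a block by its absolute maximum, then snap each value to the nearest of 16 fixed levels. The level table is the one published for NF4 (quantiles of N(0,1) rescaled to [-1, 1]); the helper functions are illustrative, not ZSE's actual kernel.

```python
# The 16 NF4 levels: quantiles of a standard normal, rescaled to [-1, 1].
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def nf4_quantize(block):
    """Quantize one block: normalize by absmax, then store each weight
    as the index (0..15) of the nearest NF4 level."""
    absmax = max(abs(w) for w in block) or 1.0
    codes = [min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
             for w in block]
    return codes, absmax

def nf4_dequantize(codes, absmax):
    """Reconstruct approximate weights from 4-bit codes and the block scale."""
    return [NF4_LEVELS[c] * absmax for c in codes]

block = [0.12, -0.5, 0.03, 0.9, -0.07, 0.0, 0.31, -0.22]
codes, scale = nf4_quantize(block)
print(codes)
print(nf4_dequantize(codes, scale))
```

Note how the grid is denser near zero, where most weights live: that is what buys NF4 its quality edge over plain INT4 at the same bit width.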
### INT4

```sh
zse convert model -o model.zse --quant int4
```

- **Bits:** 4
- **Quality:** ★★★☆☆
- **Size:** ~0.53 GB per billion parameters
- **Use case:** maximum compression, less quality-sensitive tasks
### INT8

```sh
zse convert model -o model.zse --quant int8
```

- **Bits:** 8
- **Quality:** ★★★★★ (near FP16)
- **Size:** ~1.1 GB per billion parameters
- **Use case:** when quality is critical
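Both integer types use uniform, symmetric grids; the only difference is how many levels are available (16 vs 256). A generic absmax sketch of the technique (again illustrative, not ZSE's actual kernel) shows why INT8 tracks the original weights so much more closely:

```python
def absmax_quantize(block, bits):
    """Symmetric absmax quantization to signed `bits`-bit integers:
    INT4 codes land in [-8, 7], INT8 codes in [-128, 127]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in block) / qmax or 1.0
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in block]
    return codes, scale

def absmax_dequantize(codes, scale):
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.03, 0.9, -0.07, 0.31]
for bits in (4, 8):
    codes, scale = absmax_quantize(weights, bits)
    recon = absmax_dequantize(codes, scale)
    err = max(abs(a - b) for a, b in zip(weights, recon))
    print(f"INT{bits}: max reconstruction error {err:.4f}")
```

With 16x more levels, INT8's worst-case error per block is roughly 16x smaller, which is why it is "near FP16" in practice.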
### FP16 (No Quantization)

```sh
zse convert model -o model.zse --quant fp16
```

- **Bits:** 16
- **Quality:** ★★★★★ (original)
- **Size:** ~2 GB per billion parameters
- **Use case:** fine-tuning, debugging
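The per-parameter sizes above follow from the weight bit width plus per-block scale overhead. The block sizes and scale widths in this sketch are illustrative assumptions that happen to reproduce the figures above, not ZSE's documented storage format (and unquantized layers such as embeddings and norms push real totals somewhat higher, e.g. toward ~1.1 GB for INT8):

```python
def quant_size_gb(billions_of_params, weight_bits, block_size=None, scale_bits=32):
    """Rough on-disk size in GB: payload bits plus one scale per block.
    Block sizes and scale widths are assumptions for illustration only."""
    overhead = scale_bits / block_size if block_size else 0.0
    return billions_of_params * (weight_bits + overhead) / 8

# Per billion parameters:
print(f"NF4 : ~{quant_size_gb(1, 4, block_size=64, scale_bits=32):.2f} GB")  # ~0.56
print(f"INT4: ~{quant_size_gb(1, 4, block_size=64, scale_bits=16):.2f} GB")  # ~0.53
print(f"FP16: ~{quant_size_gb(1, 16):.2f} GB")                               # ~2.00
```

Multiply by the parameter count in billions for a whole-model estimate (e.g. a 7B model at NF4 lands around 4 GB).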
## Quality Comparison (Qwen 7B)
## Recommendations

- **General use:** NF4 (best quality/size ratio)
- **Code generation:** INT8 (the higher precision helps)
- **Embeddings:** INT8 or FP16
- **Chat/creative writing:** NF4 is plenty