Inference
INT8 Quantization
Quick Answer
Quantizing model weights to 8-bit integers, achieving 4x memory reduction with minimal quality loss.
INT8 quantization stores model weights as 8-bit integers instead of 32-bit floats, a 4x memory reduction compared to float32, usually with minimal quality impact. It is practical for inference and, with more care, for training, and most inference services support it. A calibration step determines the quantization parameters (scale, and for asymmetric schemes a zero point) from representative data. INT8 sits at a sweet spot in the quality/efficiency trade-off: more aggressive quantization such as INT4 saves more memory but incurs larger quality loss. A sketch of the basic scheme follows.
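A minimal sketch of symmetric per-tensor INT8 quantization with max-abs calibration, using NumPy. The helper names (`calibrate_scale`, `quantize_int8`, `dequantize_int8`) are illustrative, not from any particular library; real inference stacks typically use per-channel scales and fused INT8 kernels.

```python
import numpy as np

def calibrate_scale(calibration_tensors):
    """Calibration: derive one scale per tensor from representative data,
    mapping the observed range [-max_abs, max_abs] onto [-127, 127]."""
    max_abs = max(np.max(np.abs(t)) for t in calibration_tensors)
    return max_abs / 127.0

def quantize_int8(x, scale):
    """float32 -> int8: round(x / scale), clipped to the int8 range."""
    q = np.round(x / scale)
    return np.clip(q, -127, 127).astype(np.int8)

def dequantize_int8(q, scale):
    """int8 -> float32 approximation of the original tensor."""
    return q.astype(np.float32) * scale

# Example: 4x memory reduction with a small round-trip error.
w = np.random.randn(1024, 1024).astype(np.float32)
scale = calibrate_scale([w])
q = quantize_int8(w, scale)
print(w.nbytes / q.nbytes)                           # 4.0 (32-bit -> 8-bit)
print(np.abs(dequantize_int8(q, scale) - w).max())   # max quantization error
```

The round-trip error per element is bounded by half the scale, which is why quality loss stays small as long as calibration data covers the tensor's real value range.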
Last verified: 2026-04-08