Inference
INT8 Quantization
Quick Answer
Quantizing model weights to 8-bit integers, achieving 4x memory reduction with minimal quality loss.
INT8 quantization stores model weights as 8-bit integers instead of 32-bit floats, a 4x memory reduction compared to float32, usually with minimal quality impact. It is practical for inference and, with more care, for training, and most inference services support it. A calibration step determines the quantization parameters (scale, and for asymmetric schemes a zero point) from representative data. INT8 sits at a sweet spot in the quality/efficiency trade-off: more aggressive quantization such as INT4 saves more memory but incurs larger quality loss. A sketch of the basic scheme follows.
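A minimal sketch of symmetric per-tensor INT8 quantization with max-abs calibration, using NumPy. The helper names (`calibrate_scale`, `quantize_int8`, `dequantize_int8`) are illustrative, not from any particular library; real inference stacks typically use per-channel scales and fused INT8 kernels.

```python
import numpy as np

def calibrate_scale(calibration_tensors):
    """Calibration: derive one scale per tensor from representative data,
    mapping the observed range [-max_abs, max_abs] onto [-127, 127]."""
    max_abs = max(np.max(np.abs(t)) for t in calibration_tensors)
    return max_abs / 127.0

def quantize_int8(x, scale):
    """float32 -> int8: round(x / scale), clipped to the int8 range."""
    q = np.round(x / scale)
    return np.clip(q, -127, 127).astype(np.int8)

def dequantize_int8(q, scale):
    """int8 -> float32 approximation of the original tensor."""
    return q.astype(np.float32) * scale

# Example: 4x memory reduction with a small round-trip error.
w = np.random.randn(1024, 1024).astype(np.float32)
scale = calibrate_scale([w])
q = quantize_int8(w, scale)
print(w.nbytes / q.nbytes)                           # 4.0 (32-bit -> 8-bit)
print(np.abs(dequantize_int8(q, scale) - w).max())   # max quantization error
```

The round-trip error per element is bounded by half the scale, which is why quality loss stays small as long as calibration data covers the tensor's real value range.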
Last verified: 2026-04-08