GGUF

Quick Answer

A file format for quantized models, supporting multiple quantization levels and efficient inference.

GGUF is a file format for storing quantized model weights in a single file, together with the metadata needed to load them (architecture, tokenizer, and hyperparameters). Developed for llama.cpp as the successor to GGML, it is designed for efficient inference on CPU with optional GPU offload. GGUF supports a range of quantization levels (e.g., Q4_K_M, Q5_K_M, Q8_0), letting users trade output quality against memory footprint, and it has become the standard distribution format for open-source models in tools such as Ollama, LM Studio, and oobabooga's text-generation-webui.
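To make the "weights plus metadata in one file" idea concrete, here is a minimal sketch that reads the fixed header at the start of a GGUF file using only Python's standard library. The field layout follows the published GGUF spec (4-byte magic, a version number, then tensor and metadata key-value counts); the function name read_gguf_header and the path "model.gguf" are illustrative, not part of any official tooling.

```python
import struct

def read_gguf_header(path: str) -> dict:
    """Read the fixed-size header at the start of a GGUF file.

    Per the GGUF spec, the file begins with:
      - 4-byte magic b"GGUF"
      - uint32 format version (little-endian)
      - tensor count and metadata key-value count
        (uint64 in version 2+, uint32 in version 1)
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        if version >= 2:
            n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        else:  # GGUF v1 used 32-bit counts
            n_tensors, n_kv = struct.unpack("<II", f.read(8))
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

if __name__ == "__main__":
    # "model.gguf" is a placeholder; point it at any downloaded GGUF model.
    print(read_gguf_header("model.gguf"))
```

The metadata key-value section that follows this header is what lets a runtime like llama.cpp load a model from the single file with no sidecar configuration.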

Last verified: 2026-04-08
