Deployment
Inference Endpoint
Quick Answer
A deployed model accessible via API for making predictions in production.
Inference endpoints expose a deployed model over HTTP/REST, abstracting away the underlying infrastructure: callers send inputs and receive predictions without knowing how or where the model runs. Endpoints can be created, scaled, and destroyed on demand, which makes them practical for real-time inference in production, though keeping one running has ongoing compute costs. They are the standard interface for serving model predictions.
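A minimal sketch of the request/response pattern an inference endpoint provides. The "model" here is a hypothetical stand-in (it just sums the inputs), and the `/predict` route and JSON schema are illustrative assumptions, not any particular platform's API; the point is that the client only sees an HTTP interface, never the model internals.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class MockEndpoint(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a deployed model server."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # A real endpoint would run model inference here; we fake a prediction.
        result = {"prediction": sum(body["inputs"])}
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # suppress per-request logging

# Spin up the mock endpoint on an ephemeral local port.
server = HTTPServer(("127.0.0.1", 0), MockEndpoint)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: just an HTTP POST with JSON in, JSON out.
url = f"http://127.0.0.1:{server.server_port}/predict"
req = Request(
    url,
    data=json.dumps({"inputs": [1, 2, 3]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    prediction = json.load(resp)["prediction"]

print(prediction)  # 6
server.shutdown()
```

Swapping the mock handler for real model code changes nothing on the client side, which is the abstraction the endpoint provides.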
Last verified: 2026-04-08