Architecture

Residual Connection

Quick Answer

A shortcut path allowing gradients and features to bypass processing layers.

Residual connections (also called skip connections) add a layer's input directly to its output, so the layer learns a residual function on top of the identity: y = x + F(x). This shortcut gives gradients an unobstructed path backward during training, which makes much deeper networks trainable, and it preserves the input signal even when the layer's output is small. In transformers, a residual connection wraps each attention and feed-forward sublayer: the sublayer's output is added back to its input before passing to the next operation. Without residuals, very deep networks suffer from vanishing or unstable gradients and are difficult to optimize, which is why residuals are considered essential for both training stability and depth.
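The core idea fits in a few lines. The sketch below is a minimal, framework-free illustration (the `sublayer` transformation and weight shapes are hypothetical, not from any specific model): the block returns `x + F(x)`, and when the sublayer's output is near zero, the block reduces to the identity, which is exactly the input-preserving property described above.

```python
import numpy as np

def sublayer(x, w):
    # Hypothetical processing layer (e.g. a small feed-forward transform).
    return np.tanh(x @ w)

def residual_block(x, w):
    # Residual connection: the input skips around the sublayer
    # and is added to its output.
    return x + sublayer(x, w)

x = np.array([[1.0, 2.0], [3.0, 4.0]])

# With zero weights the sublayer outputs zeros, so the block
# passes the input through unchanged (identity behavior).
w_zero = np.zeros((2, 2))
print(residual_block(x, w_zero))  # identical to x
```

In deep networks this identity behavior at initialization is what keeps early training stable: each block starts out as a small perturbation of the identity rather than a full re-mapping of its input.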

Last verified: 2026-04-08
