multi-head attention

Multi-head attention is a mechanism in deep learning models, most notably in transformer architectures, that allows the model to attend to different parts of the input sequence simultaneously. Rather than computing a single attention function, it runs several attention heads in parallel, each with its own learned projections of the queries, keys, and values, so that each head can focus on different positions and different representation subspaces. The outputs of the heads are concatenated and passed through a final linear projection to produce the combined attention output.
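To make the mechanism concrete, here is a minimal NumPy sketch of multi-head attention using the standard scaled dot-product formulation. The function name, the random weights, and the toy dimensions are illustrative assumptions; in a real model the projection matrices W_q, W_k, W_v, and W_o are learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, W_q, W_k, W_v, W_o):
    """x: (seq_len, d_model); W_q, W_k, W_v, W_o: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the input to queries, keys, and values, then split into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head)
    def split_heads(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ W_q)
    k = split_heads(x @ W_k)
    v = split_heads(x @ W_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                    # (heads, seq, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

# Example usage with random weights (illustrative only).
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(x, num_heads, W_q, W_k, W_v, W_o)
print(out.shape)  # (4, 8)
```

Each head operates on a d_model / num_heads slice of the projected representation, so the total computation is comparable to a single full-width attention while still letting the heads specialize.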
