Transformer
Paper: Attention Is All You Need (Vaswani et al., 2017)
- Positional Encodings: inject information about each token's position in the sequence, since the Transformer has no recurrence or convolution to track word order (a sinusoidal encoding sketch follows after this list)
- “Multi-head Attention”: lets the model attend, from each word, to the other words in the input that are most relevant to it, with several attention heads running in parallel over different representation subspaces (a minimal attention sketch follows after this list)
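
A minimal sketch of the sinusoidal positional encoding described in the paper, assuming NumPy; `max_len` and `d_model` are illustrative parameters, and the function name is my own.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(max_len)[:, None]            # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even dimensions 2i
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)
    angles = positions * angle_rates                    # pos / 10000^(2i/d_model)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even indices: sine
    pe[:, 1::2] = np.cos(angles)   # odd indices: cosine
    return pe

# Usage: add the encoding to the token embeddings before the first layer,
# e.g. embeddings = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```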
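
A minimal sketch of scaled dot-product attention and a multi-head wrapper, assuming NumPy; the random weight matrices stand in for learned projections, and the default sizes (8 heads, d_model = 512) follow the paper's base configuration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (seq_q, seq_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                                       # (seq_q, d_v)

def multi_head_attention(x, num_heads=8, d_model=512, seed=0):
    """Project x into num_heads subspaces, attend in each, concatenate, project back."""
    rng = np.random.default_rng(seed)
    d_k = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        heads.append(scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v))
    W_o = rng.standard_normal((num_heads * d_k, d_model))
    return np.concatenate(heads, axis=-1) @ W_o              # (seq_len, d_model)

# Usage: self-attention over a toy sequence of 10 token embeddings.
x = np.random.default_rng(1).standard_normal((10, 512))
out = multi_head_attention(x)
print(out.shape)   # (10, 512)
```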