Quick guide to understanding attention and transformers
Rough Timeline
Should take about one day at most.
best explanations of attention
The Attention Mechanism in Large Language Models - YouTube - A brief overview of embeddings and attention.
The math behind Attention: Keys, Queries, and Values matrices - YouTube - The math and intuition behind keys, queries, and values (K, Q, V), multi-head attention (MHA), and scaled dot-product attention.
What are Transformer Models and how do they work? - YouTube - Putting it all together.
Keys, Queries, and Values: The celestial mechanics of attention - YouTube - A quick recap.
Attention? Attention! | Lil'Log
Cross Attention | Method Explanation | Math Explained - YouTube
Transformers from scratch | peterbloem.nl
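The resources above all build up to one formula, scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. Below is a minimal pure-Python sketch of that formula for a single head, with no masking and no learned projections. It uses plain lists so every step stays visible. The function names are illustrative and are not taken from any of the linked resources.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Single-head attention without a mask.

    Q: n_q rows of length d_k, K: n_k rows of length d_k,
    V: n_k rows of length d_v. Returns n_q rows of length d_v.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Attention weights: a probability distribution over the keys.
        weights = softmax(scores)
        # Output row: weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return output


if __name__ == "__main__":
    Q = [[1.0, 0.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[10.0, 0.0], [0.0, 10.0]]
    # The query matches the first key better, so the output leans toward V[0].
    print(scaled_dot_product_attention(Q, K, V))
```

Multi-head attention runs several of these in parallel on learned linear projections of Q, K, and V, then concatenates the results. Real implementations use batched matrix operations instead of Python loops.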
tokenization
Let’s Build the GPT Tokenizer: A Complete Guide to Tokenization in LLMs – fast.ai
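The GPT tokenizer guide centers on byte-pair encoding (BPE). The procedure starts from UTF-8 bytes, repeatedly finds the most frequent adjacent pair of token ids, and replaces that pair with a new id. The sketch below shows only that training loop under simplifying assumptions: no regex pre-splitting, no special tokens, and ties broken by whichever pair was seen first. It is therefore not the exact GPT-2/GPT-4 tokenizer. The helper names are hypothetical.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count every adjacent (id, id) pair in the sequence."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn `num_merges` merges; new ids start at 256, after the raw bytes."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # sequence too short to contain any pair
        pair = max(counts, key=counts.get)  # most frequent pair, first seen wins ties
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges
```

For example, train_bpe("aaabdaaabac", 1) merges the byte pair (97, 97), which is "aa", into the new id 256.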
the intuition behind word embeddings
What Are Word Embeddings? - YouTube - An Introduction to Word Embeddings.
Word2vec from Scratch - Jake Tae
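Word2vec's skip-gram variant learns embeddings by training each center word to predict its neighbors. Below is a small pure-Python sketch of skip-gram with a full softmax over the vocabulary, trained by plain SGD on a toy corpus. The original word2vec uses negative sampling or hierarchical softmax instead. The corpus, dimension, window, and learning rate here are arbitrary assumptions, and the function names are hypothetical.

```python
import math
import random


def skipgram_pairs(tokens, window):
    """(center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def train_skipgram(tokens, dim=8, window=1, lr=0.1, epochs=50, seed=0):
    """Full-softmax skip-gram trained with plain SGD (toy sketch)."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # Two tables: input (center) embeddings and output (context) embeddings.
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    pairs = [(idx[c], idx[o]) for c, o in skipgram_pairs(tokens, window)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for c, o in pairs:
            h = w_in[c]
            # Score every vocabulary word as a possible context of the center word.
            scores = [sum(h[k] * w_out[w][k] for k in range(dim)) for w in range(n)]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            probs = [e / z for e in exps]
            total -= math.log(probs[o])  # cross-entropy for the true context word
            # Gradient of the loss w.r.t. score_w is probs[w] - 1[w == o].
            grad_h = [0.0] * dim
            for w in range(n):
                g = probs[w] - (1.0 if w == o else 0.0)
                for k in range(dim):
                    grad_h[k] += g * w_out[w][k]  # read w_out before updating it
                    w_out[w][k] -= lr * g * h[k]
            for k in range(dim):
                h[k] -= lr * grad_h[k]  # h aliases w_in[c], so this updates it in place
        losses.append(total / len(pairs))
    return w_in, idx, losses


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    embeddings, vocab_index, losses = train_skipgram(corpus)
    print(losses[0], losses[-1])  # average loss should go down over training
```

After training, the rows of w_in serve as the word embeddings. Words that appear in similar contexts (here "cat" and "dog") tend to receive similar vectors.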
the intuition behind the position encoding
How do Transformer Models keep track of the order of words? Positional Encoding - YouTube
The wonderful world of positional encoding – Bocachancla 🫦🩴
Rotary Positional Embeddings Explained | Transformer - YouTube
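The original Transformer adds fixed sinusoids to the embeddings, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). RoPE instead rotates each pair of query/key dimensions by an angle proportional to the position. The sketch below implements both in pure Python. It pairs adjacent dimensions (x0, x1), (x2, x3), ... as in the RoFormer formulation. That pairing is an assumption, since some libraries pair the first and second halves of the vector instead. The function names are hypothetical.

```python
import math


def sinusoidal_encoding(num_positions, d_model):
    """Fixed sin/cos positional encodings, one row per position."""
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            if i + 1 < d_model:
                row.append(math.cos(angle))
        pe.append(row)
    return pe


def rope(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[i], x[i+1]) by pos * base^(-i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[i], x[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])  # 2D rotation
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

The sketch makes RoPE's key property easy to check. The score between a rotated query at position m and a rotated key at position n depends only on the offset m - n. So dot(rope(q, 5), rope(k, 3)) matches dot(rope(q, 12), rope(k, 10)) up to floating-point error.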
understand the whole picture
The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time.
Attention is All You Need - Jake Tae
Some intuitions about transformers - Aryaman Arora
3Blue1Brown - Full Lecture on Transformers
Attention Is All You Need - alphaXiv blog
Attention and Augmented Recurrent Neural Networks
The Transformer Family Version 2.0 | Lil'Log
general deep learning
Calculus on Computational Graphs: Backpropagation -- colah's blog
Deep Learning, NLP, and Representations - colah's blog
Neural Networks, Manifolds, and Topology -- colah's blog
Understanding LSTM Networks -- colah's blog
Introduction to seq2seq models - Jake Tae
Introduction to tf-idf - Jake Tae
A Brief Introduction to Recurrent Neural Networks - Jake Tae
Demystifying Entropy (And More) - Jake Tae
Recommendation Algorithm with SVD - Jake Tae
LoRA - Jake Tae
Likelihood and Probability - Jake Tae
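The colah post on computational graphs works through the example e = (a + b) * (b + 1) at a = 2, b = 1. A tiny reverse-mode autodiff sketch reproduces its numbers: e = 6, de/da = 2, de/db = 5. The Node class and backward function are hypothetical names for illustration and support only addition and multiplication.

```python
class Node:
    """A scalar value in a computational graph."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # sequence of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(x + y)/dx = 1 and d(x + y)/dy = 1
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(x * y)/dx = y and d(x * y)/dy = x
        return Node(self.value * other.value, [(self, other.value), (other, self.value)])


def backward(output):
    """Reverse-mode sweep: visit nodes so every consumer comes before its inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # post-order: inputs are appended before their consumers

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule, summed over all paths


if __name__ == "__main__":
    a, b, one = Node(2.0), Node(1.0), Node(1.0)
    c = a + b
    d = b + one
    e = c * d
    backward(e)
    print(e.value, a.grad, b.grad)
```

Because b feeds into both c and d, its gradient is the sum over both paths, 1 * 2 + 1 * 3 = 5. This is the "sum over paths" idea the post builds backpropagation on.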