🟡 The most efficient way to really understand the Transformer is to read the papers themselves, not low-quality explainer posts online.
1️⃣ Attention Is All You Need
2️⃣ Improving Language Understanding by Generative Pre-Training
3️⃣ Language Models are Few-Shot Learners
4️⃣ Scaling Laws for Neural Language Models
5️⃣ Attention Is Not Explanation
6️⃣ Transformer Circuits
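The core mechanism behind paper 1️⃣ is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A minimal NumPy sketch (the function name and toy shapes here are my own illustration, not from any of the papers):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as defined in "Attention Is All You Need"."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # numerically stable row-wise softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted average of value rows

# toy example: 2 queries attending over 3 key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

Reading the equation alongside a few lines like these makes paper 1️⃣ much faster to digest than any second-hand summary.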