How Transformers Fail at Init: Rank and Entropy Collapse
Exact formulas for representation variance and gradients at transformer initialization, plus trainability diagrams showing which combinations of weight variance and residual-connection strength avoid rank collapse and entropy collapse. getnews.me/how-transformers-fail-at... #transformer #initialization
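For intuition about rank collapse, here is a minimal NumPy sketch. It is a toy experiment, not the formulas or diagrams from the linked article. Everything in it is an assumption made for illustration: a single-head, attention-only stack with no MLP, Gaussian weights of variance `sigma_w**2 / d`, per-token RMS normalization, a scalar `residual` strength, and the hypothetical helper name `collapse_metrics`. It reports how much of the token matrix is left after subtracting the mean token (near zero means every token has become the same vector, i.e. rank one), together with the mean entropy of the last layer's attention rows.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def collapse_metrics(residual, sigma_w=1.0, depth=12, n_tokens=16, d=64, seed=0):
    """Push random tokens through a randomly initialized attention-only stack.

    Returns (rel, entropy):
      rel     = ||X - mean token|| / ||X||, which is ~0 when all tokens coincide
      entropy = mean Shannon entropy of the last layer's attention rows
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_tokens, d))
    for _ in range(depth):
        # Fresh Gaussian query/key/value weights with variance sigma_w^2 / d.
        wq, wk, wv = (rng.standard_normal((d, d)) * sigma_w / np.sqrt(d)
                      for _ in range(3))
        scores = (x @ wq) @ (x @ wk).T / np.sqrt(d)
        attn = softmax(scores)
        # Residual branch scaled by `residual`, plus the attention output.
        x = residual * x + attn @ (x @ wv)
        # Per-token RMS normalization keeps the overall scale fixed.
        x = x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True))
    resid = x - x.mean(axis=0, keepdims=True)
    rel = np.linalg.norm(resid) / np.linalg.norm(x)
    entropy = -(attn * np.log(attn + 1e-12)).sum(axis=-1).mean()
    return rel, entropy

if __name__ == "__main__":
    for r in (0.0, 4.0):
        rel, ent = collapse_metrics(residual=r)
        print(f"residual={r}: rel={rel:.3e}, attention entropy={ent:.3f}")
```

With `residual=0.0` the tokens should merge into one vector and the attention rows become uniform (entropy close to `log(n_tokens)`), while a strong residual branch keeps the tokens distinct at this depth. Entropy collapse is the opposite regime, where attention rows become nearly one-hot; raising `sigma_w` in this toy pushes the entropy in that direction, though how far depends on the settings.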