Recurrent computation with Transformers by repeating layers
Add a large set of registers so the model can write to positions that don’t correspond to any specific input token
Using recurrence to achieve weak to strong generalization - YouTube
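A minimal sketch of the two ideas above, assuming a single weight-tied Transformer block that is applied repeatedly and a set of learned register vectors appended to the token sequence. All names here (`init_params`, `block`, `recurrent_forward`, `num_registers`, `num_steps`) are hypothetical, the attention is single-head and non-causal for brevity, and nothing in this sketch is taken from the referenced video.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's features to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def init_params(d_model, num_registers, seed=0):
    # One set of weights, reused at every recurrent step (assumption: full weight tying).
    rng = np.random.default_rng(seed)
    s = 1.0 / np.sqrt(d_model)
    return {
        "wq": rng.normal(0, s, (d_model, d_model)),
        "wk": rng.normal(0, s, (d_model, d_model)),
        "wv": rng.normal(0, s, (d_model, d_model)),
        "wo": rng.normal(0, s, (d_model, d_model)),
        "w1": rng.normal(0, s, (d_model, 4 * d_model)),
        "w2": rng.normal(0, 1.0 / np.sqrt(4 * d_model), (4 * d_model, d_model)),
        # Register vectors: extra positions not tied to any input token.
        "registers": rng.normal(0, 1.0, (num_registers, d_model)),
    }

def block(x, p):
    # Pre-norm single-head self-attention followed by a ReLU MLP, both residual.
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    x = x + (attn @ v) @ p["wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["w1"], 0.0) @ p["w2"]

def recurrent_forward(tokens, p, num_steps):
    # Append registers, then apply the SAME block num_steps times.
    seq_len = tokens.shape[0]
    x = np.concatenate([tokens, p["registers"]], axis=0)
    for _ in range(num_steps):
        x = block(x, p)
    # Split the result back into token positions and register positions.
    return x[:seq_len], x[seq_len:]

params = init_params(d_model=16, num_registers=8)
tokens = np.random.default_rng(1).normal(size=(5, 16))
out_tokens, out_registers = recurrent_forward(tokens, params, num_steps=4)
print(out_tokens.shape, out_registers.shape)
```

Because `num_steps` is a runtime argument rather than part of the parameters, the same weights can be unrolled for more steps at inference than were used in training, which is one plausible way to read the recurrence idea in these notes.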