- layer skipping
- gating between different modules
- sliding window attention
- full attention
- sigmoid attention
- tokenformer
- SSM variants
- Conv
Dec 22, 2024