  • layer skipping

  • gating between different modules

  • sliding window attention

  • full attention

  • sigmoid attention

  • Tokenformer

  • SSM variants

  • convolutions
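
Two of the attention variants above can be illustrated in a few lines. The sketch below is a hypothetical minimal single-head implementation (function name, `window` parameter, and shapes are assumptions, not from any specific codebase): a causal attention that optionally restricts each query to a sliding window of recent keys, and optionally replaces the softmax normalization with an element-wise sigmoid gate, as in sigmoid attention.

```python
import numpy as np

def attention(q, k, v, window=None, use_sigmoid=False):
    # q, k, v: (seq_len, d) arrays -- hypothetical single-head sketch.
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)          # (seq_len, seq_len) similarity scores

    # Causal mask; with `window` set, each query only sees the last `window` keys
    # (sliding window attention). With window=None this is full causal attention.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = j <= i
    if window is not None:
        mask &= (i - j) < window

    if use_sigmoid:
        # Sigmoid attention: independent per-key gates in (0, 1),
        # no normalization across the row.
        weights = mask * (1.0 / (1.0 + np.exp(-scores)))
    else:
        # Standard softmax over the unmasked positions.
        scores = np.where(mask, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The only difference between the softmax and sigmoid paths is whether scores compete across a row or are gated independently; the sliding window is purely a change to the mask, which is why it composes freely with either scoring rule.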