  • transformers are quadratic in input length (self-attention scores every pair of positions; see the attention sketch after this list)
    • model interactions well within a fixed context window, but can’t capture dependencies beyond the supported window
  • mamba
    • linear in sequence length
    • 5x higher inference throughput than transformers
    • outperforms transformers of the same size and matches transformers twice its size (e.g. Mamba-3B)
  • Structured State Space Sequence Models (SSMs) (S4); see the recurrence sketch after this list
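
On the quadratic cost: a minimal numpy sketch of single-head attention scores. The Q Kᵀ matrix has one entry per pair of positions, so compute and memory grow with the square of the sequence length. The shapes, random weights, and the `attention_scores` helper are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def attention_scores(x, Wq, Wk):
    """Single-head attention scores: an L x L matrix, hence O(L^2) in sequence length."""
    Q = x @ Wq                               # (L, d) queries
    K = x @ Wk                               # (L, d) keys
    return (Q @ K.T) / np.sqrt(Wq.shape[1])  # (L, L) pairwise scores

# Illustrative sizes only: sequence length L, model width d
L, d = 1000, 64
rng = np.random.default_rng(0)
x = rng.normal(size=(L, d))
scores = attention_scores(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(scores.shape)  # (1000, 1000): doubling L quadruples this matrix
```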
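
And a minimal numpy sketch of the linear-time recurrence behind S4/Mamba-style SSMs: a fixed-size hidden state is updated once per step, so cost grows linearly with sequence length. The sizes, random parameters, and the simple Euler discretization are simplifying assumptions; S4's HiPPO initialization and Mamba's input-dependent (selective) parameters are omitted.

```python
import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    """Unroll the discretized state-space recurrence over a length-L input.

    h_t = A_bar @ h_{t-1} + B_bar @ x_t   (state update)
    y_t = C @ h_t                         (readout)
    Cost is O(L): one fixed-size state update per time step.
    """
    h = np.zeros(A_bar.shape[0])  # hidden state of size N
    ys = []
    for x_t in x:
        h = A_bar @ h + B_bar @ x_t
        ys.append(C @ h)
    return np.stack(ys)

# Illustrative sizes: 1-d channel, state size N=16, sequence length L=1000
L, N = 1000, 16
rng = np.random.default_rng(0)
A = -np.diag(rng.uniform(0.1, 1.0, N))  # stable diagonal A (toy init, not S4's HiPPO matrix)
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
dt = 0.01                               # step size (Delta)

# Euler discretization for brevity (S4/Mamba use zero-order-hold / bilinear variants)
A_bar = np.eye(N) + dt * A
B_bar = dt * B

x = rng.normal(size=(L, 1))
y = ssm_scan(x, A_bar, B_bar, C)
print(y.shape)  # (1000, 1)
```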

TODO

  • continue