video-generation

  • 30B-parameter transformer
    • text-to-image and text-to-video
    • trained on O(100M) videos and O(1B) images
    • tuned with supervised fine-tuning (SFT)
  • 13B-parameter video-to-audio and text-to-audio model
    • trained on O(1M) hours of audio
  • Flow Matching
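The Flow Matching bullet can be illustrated with a minimal sketch of a training target and an Euler sampler. It assumes the simplest linear path from Gaussian noise x0 to data x1 (x_t = t*x1 + (1 - t)*x0, no minimum-noise offset) with constant target velocity x1 - x0. The function names and the toy "oracle" velocity field are hypothetical. The models in these notes may use a different path, timestep schedule, conditioning, and network, so treat this as an illustration of the objective, not their implementation.

```python
import numpy as np

def flow_matching_targets(x1, rng):
    """Build one training example per row of x1 (linear path, sigma_min = 0, an assumption)."""
    x0 = rng.standard_normal(x1.shape)          # noise sample, same shape as the data
    t = rng.uniform(size=(x1.shape[0], 1))      # one timestep per example, drawn from [0, 1)
    xt = t * x1 + (1.0 - t) * x0                # point on the straight line from noise to data
    v_target = x1 - x0                          # velocity of that line (constant in t)
    return xt, t, v_target

def flow_matching_loss(velocity_fn, x1, rng):
    """Mean squared error between predicted and target velocities."""
    xt, t, v_target = flow_matching_targets(x1, rng)
    v_pred = velocity_fn(xt, t)                 # stands in for the transformer's output
    return np.mean((v_pred - v_target) ** 2)

def euler_sample(velocity_fn, x0, num_steps=50):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data) with forward Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((x.shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

As a sanity check, if the "dataset" is a single point c, the exact velocity field is (c - x) / (1 - t). With that field the loss is zero up to floating-point error, and Euler sampling from noise lands on c. A real model would replace `velocity_fn` with a learned network and minimize `flow_matching_loss` over data batches.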