2024-10-04 - Movie Gen: A Cast of Media Foundation Models
- 30B-parameter transformer (Movie Gen Video)
- text-to-image and text-to-video generation
- trained on O(100M) videos and O(1B) images
- further tuned with supervised fine-tuning (SFT)
- 13B-parameter video-to-audio and text-to-audio model (Movie Gen Audio)
- trained on O(1M) hours of audio
- Flow Matching
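The notes only name Flow Matching as the training objective. Below is a minimal sketch of the generic objective, assuming the linear (optimal-transport) path x_t = t·x1 + (1 - (1 - σ_min)·t)·x0 from Gaussian noise x0 (t = 0) to data x1 (t = 1), whose target velocity is the constant x1 - (1 - σ_min)·x0. Plain Python lists stand in for latent tensors, `model` is a hypothetical callable (x_t, t) -> velocity, and the σ_min value is illustrative. This is not Movie Gen's training code.

```python
import random

SIGMA_MIN = 1e-5  # illustrative small constant (assumption, not from the notes)


def interpolate(x0, x1, t, sigma_min=SIGMA_MIN):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [t * b + (1 - (1 - sigma_min) * t) * a for a, b in zip(x0, x1)]


def target_velocity(x0, x1, sigma_min=SIGMA_MIN):
    """Time derivative of interpolate(); constant along the path."""
    return [b - (1 - sigma_min) * a for a, b in zip(x0, x1)]


def flow_matching_loss(model, x1, rng, sigma_min=SIGMA_MIN):
    """One-sample Monte Carlo estimate of the flow matching (velocity MSE) loss.

    `model` is a hypothetical velocity predictor: model(x_t, t) -> list of floats.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                        # t ~ Uniform[0, 1)
    xt = interpolate(x0, x1, t, sigma_min)
    v_pred = model(xt, t)
    v_true = target_velocity(x0, x1, sigma_min)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)


def euler_sample(model, x0, steps=50):
    """Generate by integrating dx/dt = model(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # never evaluates t=1 exactly
        v = model(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

At inference the learned velocity field is integrated from noise to data with an ODE solver; `euler_sample` uses the simplest fixed-step Euler scheme, and a real system may use a different solver and step schedule.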