Diffusion Models, Video Generation, Flow Matching

[2312.02116] GIVT: Generative Infinite-Vocabulary Transformers
[2411.19722] JetFormer: An Autoregressive Generative Model of Raw Images and Text
openreview.net/pdf?id=gojL67CfS8
[2412.01824] X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
[2412.01819] Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
[2412.01199] TinyFusion: Diffusion Transformers Learned Shallow