michal.i/o

Generative Models

Jan 21, 2025

Diffusion Models
Video Generation
Flow Matching - Rectified Flows

Adversarial (GANs)

Autoregressive

[2404.02905] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
[2410.10812] HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
[2412.01819] Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
[2412.04431] Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
[2412.04332] Liquid: Language Models are Scalable Multi-modal Generators
[2412.03069] TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
[2411.18447] Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
[2412.12095] Causal Diffusion Transformers for Generative Modeling
[2412.09607] Spectral Image Tokenizer
[2411.19722] JetFormer: An Autoregressive Generative Model of Raw Images and Text
GitHub - lucidrains/transfusion-pytorch: Pytorch implementation of Transfusion, “Predict the Next Token and Diffuse Images with One Multi-Modal Model”, from MetaAI

Liquid

[2412.04332] Liquid: Language Models are Scalable Multi-modal Generators

Diffusion

Diffusion Transformers

[2412.16112] CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
[2412.12391] Efficient Scaling of Diffusion Transformers for Text-to-Image Generation

Patterns

Latent Space

VAE

VQVAE

[2312.02116] GIVT: Generative Infinite-Vocabulary Transformers
[2411.19722] JetFormer: An Autoregressive Generative Model of Raw Images and Text
openreview.net/pdf?id=gojL67CfS8
[2412.01824] X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
[2412.01819] Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
[2412.01199] TinyFusion: Diffusion Transformers Learned Shallow
[2412.03177] PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
[2412.06774] Visual Lexicon: Rich Image Features in Language Space
