| 15) **Transfusion**: Predict the Next Token and Diffuse Images with One Multi-Modal Model |[**ArXiv 24**](https://arxiv.org/pdf/2408.11039)|
| 16) Scaling Diffusion Language Models via Adaptation from Autoregressive Models |[**ArXiv 24**](https://arxiv.org/pdf/2410.17891)|
| 17) Large Language Diffusion Models |[**ArXiv 25**](https://arxiv.org/pdf/2502.09992)|
| <h3 id="baseline-video-generation-models">03 Baseline Video Generation Models</h3> ||
|**Paper**|**Link**|
| 1) **ViViT**: A Video Vision Transformer |[**ICCV 21 Paper**](https://arxiv.org/pdf/2103.15691v2.pdf), [GitHub](https://github.com/google-research/scenic)|
| 21) **VideoCrafter1**: Open Diffusion Models for High-Quality Video Generation |[**ArXiv 23**](https://arxiv.org/abs/2310.19512), [GitHub](https://github.com/AILab-CVC/VideoCrafter)|
| 22) **VideoCrafter2**: Overcoming Data Limitations for High-Quality Video Diffusion Models |[**ArXiv 24**](https://arxiv.org/abs/2401.09047), [GitHub](https://github.com/AILab-CVC/VideoCrafter)|
| 23) **LVDM**: Latent Video Diffusion Models for High-Fidelity Long Video Generation |[**ArXiv 22**](https://arxiv.org/abs/2211.13221), [GitHub](https://github.com/YingqingHe/LVDM)|
| 24) **LaVie**: High-Quality Video Generation with Cascaded Latent Diffusion Models |[**ArXiv 23**](https://arxiv.org/abs/2309.15103), [GitHub](https://github.com/Vchitect/LaVie), [Project](https://vchitect.github.io/LaVie-project/)|
| 25) **PYoCo**: Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models |[**ICCV 23 Paper**](https://arxiv.org/abs/2305.10474), [Project](https://research.nvidia.com/labs/dir/pyoco/)|
| 26) **VideoFusion**: Decomposed Diffusion Models for High-Quality Video Generation |[**CVPR 23 Paper**](https://arxiv.org/abs/2303.08320)|
| 27) **Movie Gen**: A Cast of Media Foundation Models |[**Paper**](https://ai.meta.com/static-resource/movie-gen-research-paper), [Project](https://ai.meta.com/research/movie-gen/)|
| 28) Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model |[**ArXiv 25**](https://arxiv.org/pdf/2502.10248), [Project](https://github.com/stepfun-ai/Step-Video-T2V)|
| <h3 id="dataset">06 Dataset</h3> ||
| <h4 id="dataset_paper">6.1 Public Datasets</h4> ||