Commit c831ce3

Merge pull request #377 from MING-ZCH/main
[docs] update README.md & README_zh-CN.md about video generation & diffusion transformer model-related papers and fix bugs
2 parents: 91b7662 + a96a013

File tree

2 files changed: +7, -1 lines


README.md

+4, -1
@@ -212,6 +212,8 @@ We hope the community member has the following characteristics:
 | 13) **DDM**: Deconstructing Denoising Diffusion Models for Self-Supervised Learning | [**ArXiv 24**](https://arxiv.org/pdf/2401.14404v1)|
 | 14) Autoregressive Image Generation without Vector Quantization | [**ArXiv 24**](https://arxiv.org/pdf/2406.11838), [GitHub](https://github.com/LTH14/mar) |
 | 15) **Transfusion**: Predict the Next Token and Diffuse Images with One Multi-Modal Model | [**ArXiv 24**](https://arxiv.org/pdf/2408.11039)|
+| 16) Scaling Diffusion Language Models via Adaptation from Autoregressive Models | [**ArXiv 24**](https://arxiv.org/pdf/2410.17891)|
+| 17) Large Language Diffusion Models | [**ArXiv 25**](https://arxiv.org/pdf/2502.09992)|
 | <h3 id="baseline-video-generation-models">03 Baseline Video Generation Models</h3> | |
 | **Paper** | **Link** |
 | 1) **ViViT**: A Video Vision Transformer | [**ICCV 21 Paper**](https://arxiv.org/pdf/2103.15691v2.pdf), [GitHub](https://github.com/google-research/scenic) |
@@ -248,10 +250,11 @@ We hope the community member has the following characteristics:
 | 21) **VideoCrafter1**: Open Diffusion Models for High-Quality Video Generation | [**ArXiv 23**](https://arxiv.org/abs/2310.19512), [GitHub](https://github.com/AILab-CVC/VideoCrafter) |
 | 22) **VideoCrafter2**: Overcoming Data Limitations for High-Quality Video Diffusion Models | [**ArXiv 24**](https://arxiv.org/abs/2401.09047), [GitHub](https://github.com/AILab-CVC/VideoCrafter) |
 | 23) **LVDM**: Latent Video Diffusion Models for High-Fidelity Long Video Generation | [**ArXiv 22**](https://arxiv.org/abs/2211.13221), [GitHub](https://github.com/YingqingHe/LVDM) |
-| 24) **LaVie**: High-Quality Video Generation with Cascaded Latent Diffusion Models | [**ArXiv 23**](https://arxiv.org/abs/2309.15103), [GitHub](https://github.com/Vchitect/LaVie) ,[Project](https://vchitect.github.io/LaVie-project/) |
+| 24) **LaVie**: High-Quality Video Generation with Cascaded Latent Diffusion Models | [**ArXiv 23**](https://arxiv.org/abs/2309.15103), [GitHub](https://github.com/Vchitect/LaVie), [Project](https://vchitect.github.io/LaVie-project/) |
 | 25) **PYoCo**: Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | [**ICCV 23 Paper**](https://arxiv.org/abs/2305.10474), [Project](https://research.nvidia.com/labs/dir/pyoco/)|
 | 26) **VideoFusion**: Decomposed Diffusion Models for High-Quality Video Generation | [**CVPR 23 Paper**](https://arxiv.org/abs/2303.08320)|
 | 27) **Movie Gen**: A Cast of Media Foundation Models | [**Paper**](https://ai.meta.com/static-resource/movie-gen-research-paper), [Project](https://ai.meta.com/research/movie-gen/)|
+| 28) Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model| [**ArXiv 25**](https://arxiv.org/pdf/2502.10248), [Project](https://github.com/stepfun-ai/Step-Video-T2V)|
 | <h3 id="dataset">06 Dataset</h3> | |
 | <h4 id="dataset_paper">6.1 Public Datasets</h4> | |
 | **Dataset Name - Paper** | **Link** |

README_zh-CN.md

+3, -0
@@ -213,6 +213,8 @@ MiniSora 开源社区定位为由社区同学自发组织的开源社区,MiniS
 | 13) **DDM**: Deconstructing Denoising Diffusion Models for Self-Supervised Learning | [**ArXiv 24**](https://arxiv.org/pdf/2401.14404v1)|
 | 14) Autoregressive Image Generation without Vector Quantization | [**ArXiv 24**](https://arxiv.org/pdf/2406.11838), [GitHub](https://github.com/LTH14/mar) |
 | 15) **Transfusion**: Predict the Next Token and Diffuse Images with One Multi-Modal Model | [**ArXiv 24**](https://arxiv.org/pdf/2408.11039)|
+| 16) Scaling Diffusion Language Models via Adaptation from Autoregressive Models | [**ArXiv 24**](https://arxiv.org/pdf/2410.17891)|
+| 17) Large Language Diffusion Models | [**ArXiv 25**](https://arxiv.org/pdf/2502.09992)|
 | <h3 id="baseline-video-generation-models">03 Baseline Video Generation Models</h3> | |
 | **论文** | **链接** |
 | 1) **ViViT**: A Video Vision Transformer | [**ICCV 21 Paper**](https://arxiv.org/pdf/2103.15691v2.pdf), [Github](https://github.com/google-research/scenic) |
@@ -253,6 +255,7 @@ MiniSora 开源社区定位为由社区同学自发组织的开源社区,MiniS
 | 25) **PYoCo**: Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | [**ICCV 23 Paper**](https://arxiv.org/abs/2305.10474), [Project](https://research.nvidia.com/labs/dir/pyoco/)|
 | 26) **VideoFusion**: Decomposed Diffusion Models for High-Quality Video Generation| [**CVPR 23 Paper**](https://arxiv.org/abs/2303.08320)|
 | 27) **Movie Gen**: A Cast of Media Foundation Models | [**Paper**](https://ai.meta.com/static-resource/movie-gen-research-paper), [Project](https://ai.meta.com/research/movie-gen/)|
+| 28) Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model| [**ArXiv 25**](https://arxiv.org/pdf/2502.10248), [Project](https://github.com/stepfun-ai/Step-Video-T2V)|
 | <h3 id="dataset">06 Dataset</h3> | |
 | <h4 id="dataset_paper">6.1 数据集资源</h4> | |
 | **数据集名称 - 论文** | **链接** |

0 commit comments
