
Commit 7e89ffd

Alpha Ver.0.1 release
1 parent 748957b commit 7e89ffd


42 files changed, +10799 -1 lines changed

.gitignore (+5)
@@ -150,3 +150,8 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+
+# Model files
+*.bin
+*.pth
+*.safetensors

README.md (+74 -1)
@@ -1,2 +1,75 @@

# ComfyUI-AnimateAnyone-Evolved

Improved AnimateAnyone implementation that allows you to use a pose image sequence and a reference image to generate stylized video.<br>
***The current goal of this project is to achieve the desired pose2video results at 1+ FPS on GPUs equal to or better than an RTX 3080!🚀***

<br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/Test2Show-ChunLi.mp4" muted="false"></video>

## Currently Supported
- Please check the **[example workflows](./_Example_Workflow/)** for usage. You can use the [Test Inputs](./_Example_Workflow/_Test_Inputs/) to generate exactly the same results shown here. (I got the Chun-Li image from [civitai](https://civitai.com/images/3034077).)
- Supports different samplers & schedulers:
  - **DDIM**
    - 24-frame pose image sequence, `steps=20`, `context_frames=24`; takes 835.67 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/DDIM_context_frame_24.mp4" muted="false" width="320"></video>
    - 24-frame pose image sequence, `steps=20`, `context_frames=12`; takes 425.65 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/DDIM_context_frame_12.mp4" muted="false" width="320"></video>
  - **DPM++ 2M Karras**
    - 24-frame pose image sequence, `steps=20`, `context_frames=12`; takes 407.48 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/DPM++_2M_Karras_context_frame_12.mp4" muted="false" width="320"></video>
  - **LCM**
    - 24-frame pose image sequence, `steps=20`, `context_frames=24`; takes 606.56 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/LCM_context_frame_24.mp4" muted="false" width="320"></video>
    - Note:<br>*The pre-trained LCM LoRA for SD1.5 does not work well here, since the model was retrained from the SD1.5 checkpoint for quite a large number of timesteps; training a new LCM LoRA, however, is feasible.*
  - **Euler**
    - 24-frame pose image sequence, `steps=20`, `context_frames=12`; takes 450.66 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/Euler_context_frame_12.mp4" muted="false" width="320"></video>
  - **Euler Ancestral**
  - **LMS**
  - **PNDM**
- Supports adding a LoRA
  - I did this in order to insert the LCM LoRA
- Supports quite long pose image sequences
  - Tested on my RTX 3080 GPU; it can handle 120+ frame pose image sequences with `context_frames=24`
  - As long as the system can fit the whole pose image sequence into a single tensor without running out of GPU memory, the main parameter that determines GPU usage is `context_frames`, which does not correlate with the length of the pose image sequence (see the sketch after this list)
- The current implementation is adapted from [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone)
  - I tried to break it down into as many modules as possible, so the workflow in ComfyUI closely resembles the original pipeline from the AnimateAnyone paper:
  <br>![AA_pipeline.png](_Example_Workflow/_Other_Imgs/AA_pipeline.png)
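
The `context_frames` behaviour described above can be pictured with a minimal sketch: a long pose sequence is denoised in overlapping windows of `context_frames` frames, so peak GPU memory tracks the window size rather than the total sequence length. This is only an illustration of the sliding-window idea; the function name and the overlap value are assumptions for the example, not this repo's actual API.

```python
from typing import List

def sliding_context_windows(num_frames: int, context_frames: int = 24, overlap: int = 4) -> List[List[int]]:
    """Illustrative sketch: split a long pose sequence into overlapping windows.

    Only `context_frames` frames are denoised together at any time, so peak
    GPU memory is governed by the window size, not by `num_frames`.
    """
    if num_frames <= context_frames:
        return [list(range(num_frames))]
    windows, start = [], 0
    stride = context_frames - overlap
    while start < num_frames:
        end = min(start + context_frames, num_frames)
        windows.append(list(range(start, end)))
        if end == num_frames:
            break
        start += stride
    return windows

# A 120-frame pose sequence with context_frames=24: each window holds at most 24 frames.
print([(w[0], w[-1]) for w in sliding_context_windows(120, context_frames=24)])
```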

## Will Do Next
- Train an LCM LoRA for the denoising UNet (**Estimated speed-up: 5X**)
- Convert the model using [stable-fast](https://github.com/chengzeyi/stable-fast) (**Estimated speed-up: 2X**)
- Implement the components (Residual CFG) proposed in [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion?tab=readme-ov-file) (**Estimated speed-up: 2X**)
- Incorporate the implementations & pre-trained models from [Open-AnimateAnyone](https://github.com/guoqincode/Open-AnimateAnyone) & [AnimateAnyone](https://github.com/HumanAIGC/AnimateAnyone) once they are released
- Train a new model on a better dataset to improve result quality (optional, we'll see if there is any need for me to do it ;)
- Continuous research, always moving towards something better & faster🚀

## Install (Will add it to ComfyUI Manager Soon!)

1. Clone this repo into `Your_ComfyUI_root_directory\ComfyUI\custom_nodes\` and install the dependent Python packages:
   ```bash
   cd Your_ComfyUI_root_directory\ComfyUI\custom_nodes\
   git clone https://github.com/MrForExample/ComfyUI-AnimateAnyone-Evolved.git
   # requirements.txt sits inside the cloned repo
   cd ComfyUI-AnimateAnyone-Evolved
   pip install -r requirements.txt
   ```
2. Download the pre-trained models:
   - [stable-diffusion-v1-5_unet](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/unet)
   - [Moore-AnimateAnyone Pre-trained Models](https://huggingface.co/patrolli/AnimateAnyone/tree/main)
   - The models above need to be placed under the [pretrained_weights](./pretrained_weights/) folder as follows:
     ```text
     ./pretrained_weights/
     |-- denoising_unet.pth
     |-- motion_module.pth
     |-- pose_guider.pth
     |-- reference_unet.pth
     `-- stable-diffusion-v1-5
         |-- feature_extractor
         |   `-- preprocessor_config.json
         |-- model_index.json
         |-- unet
         |   |-- config.json
         |   `-- diffusion_pytorch_model.bin
         `-- v1-inference.yaml
     ```
   - Download a CLIP image encoder (e.g. [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder)) and put it under `Your_ComfyUI_root_directory\ComfyUI\models\clip_vision`
   - Download a VAE (e.g. [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse/tree/main)) and put it under `Your_ComfyUI_root_directory\ComfyUI\models\vae`
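
As an optional sanity check before running the workflow, a small script like the one below can confirm that the expected weight files are in place. This is only a convenience sketch; the paths simply mirror the layout listed above and are assumptions about a typical local setup, not part of this repo.

```python
from pathlib import Path

# Assumed locations; adjust to your own ComfyUI install.
PRETRAINED = Path("./pretrained_weights")
SD15 = PRETRAINED / "stable-diffusion-v1-5"
EXPECTED = [
    PRETRAINED / "denoising_unet.pth",
    PRETRAINED / "motion_module.pth",
    PRETRAINED / "pose_guider.pth",
    PRETRAINED / "reference_unet.pth",
    SD15 / "model_index.json",
    SD15 / "v1-inference.yaml",
    SD15 / "feature_extractor" / "preprocessor_config.json",
    SD15 / "unet" / "config.json",
    SD15 / "unet" / "diffusion_pytorch_model.bin",
]

missing = [p for p in EXPECTED if not p.exists()]
if missing:
    print("Missing files:")
    for p in missing:
        print(f"  {p}")
else:
    print("All expected pre-trained model files are present.")
```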
