# ComfyUI-AnimateAnyone-Evolved

Improved AnimateAnyone implementation that allows you to use a pose image sequence and a reference image to generate stylized video.<br>
***The current goal of this project is to achieve the desired pose2video results at 1+ FPS on GPUs that are equal to or better than an RTX 3080!🚀***

<br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/Test2Show-ChunLi.mp4" muted="false"></video>

## Currently Supported
- Please check the **[example workflows](./_Example_Workflow/)** for usage. You can use the [Test Inputs](./_Example_Workflow/_Test_Inputs/) to generate exactly the same results that I show here. (I got the Chun-Li image from [civitai](https://civitai.com/images/3034077).)
- Supports different samplers & schedulers:
  - **DDIM**
    - 24-frame pose image sequence, `steps=20`, `context_frames=24`; takes 835.67 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/DDIM_context_frame_24.mp4" muted="false" width="320"></video>
    - 24-frame pose image sequence, `steps=20`, `context_frames=12`; takes 425.65 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/DDIM_context_frame_12.mp4" muted="false" width="320"></video>
  - **DPM++ 2M Karras**
    - 24-frame pose image sequence, `steps=20`, `context_frames=12`; takes 407.48 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/DPM++_2M_Karras_context_frame_12.mp4" muted="false" width="320"></video>
  - **LCM**
    - 24-frame pose image sequence, `steps=20`, `context_frames=24`; takes 606.56 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/LCM_context_frame_24.mp4" muted="false" width="320"></video>
    - Note:<br>*The pre-trained LCM LoRA for SD1.5 does not work well here, since the model has been retrained from the SD1.5 checkpoint for quite a lot of steps; however, training a new LCM LoRA is feasible.*
  - **Euler**
    - 24-frame pose image sequence, `steps=20`, `context_frames=12`; takes 450.66 seconds to generate on an RTX 3080 GPU
    <br><video controls autoplay loop src="./_Example_Workflow/_Test_Results/Euler_context_frame_12.mp4" muted="false" width="320"></video>
  - **Euler Ancestral**
  - **LMS**
  - **PNDM**
- Supports adding LoRAs
  - I did this mainly so I could insert the LCM LoRA
- Supports quite long pose image sequences
  - Tested on my RTX 3080 GPU; it can handle pose image sequences of 120+ frames with `context_frames=24`
  - As long as the system can fit the whole pose image sequence into a single tensor without running out of GPU memory, the main parameter that determines GPU usage is `context_frames`, which does not scale with the length of the pose image sequence (see the sketch after this list).
- The current implementation is adapted from [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone)
  - I tried to break it down into as many modules as possible, so the workflow in ComfyUI closely resembles the original pipeline from the AnimateAnyone paper:
  <br>

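To make the `context_frames` behaviour above concrete, here is a minimal sketch of sliding-window context scheduling. It is an illustration rather than this repo's actual node code; the function name, the `overlap` value, and the blending detail mentioned in the comments are assumptions made for the example.

```python
# Minimal sketch (not this repo's actual code): split a long pose sequence
# into overlapping windows of `context_frames` indices, so peak GPU memory
# depends on the window size rather than the total number of frames.
from typing import List


def context_windows(num_frames: int, context_frames: int = 24,
                    overlap: int = 4) -> List[List[int]]:
    """Return overlapping index windows that cover every frame."""
    if num_frames <= context_frames:
        return [list(range(num_frames))]
    stride = context_frames - overlap
    windows, start = [], 0
    while start < num_frames:
        end = min(start + context_frames, num_frames)
        windows.append(list(range(end - context_frames, end)))
        if end == num_frames:
            break
        start += stride
    return windows


# 120 pose frames are denoised as overlapping 24-frame chunks; the real
# pipeline then blends the overlapping frames back into one latent sequence.
for window in context_windows(120, context_frames=24):
    print(window[0], "->", window[-1])
```

Since each window is denoised on its own and only the overlaps need to be merged, peak VRAM tracks `context_frames` rather than the total sequence length, which matches the behaviour described above.
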
## Will Do Next
- Train an LCM LoRA for the denoising UNet (**Estimated speed-up: 5X**)
- Convert the model using [stable-fast](https://github.com/chengzeyi/stable-fast) (**Estimated speed-up: 2X**)
- Implement the components (Residual CFG) proposed in [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion?tab=readme-ov-file) (**Estimated speed-up: 2X**)
- Incorporate the implementation & pre-trained models from [Open-AnimateAnyone](https://github.com/guoqincode/Open-AnimateAnyone) & [AnimateAnyone](https://github.com/HumanAIGC/AnimateAnyone) once they are released
- Train a new model on a better dataset to improve result quality (optional, we'll see if there is any need for me to do it ;)
- Continuous research, always moving towards something better & faster🚀

## Install (Will add it to ComfyUI Manager Soon!)

1. Clone this repo into `Your_ComfyUI_root_directory\ComfyUI\custom_nodes\` and install the required Python packages:
    ```bash
    cd Your_ComfyUI_root_directory\ComfyUI\custom_nodes\

    git clone https://github.com/MrForExample/ComfyUI-AnimateAnyone-Evolved.git

    cd ComfyUI-AnimateAnyone-Evolved
    pip install -r requirements.txt
    ```
2. Download the pre-trained models:
    - [stable-diffusion-v1-5_unet](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main/unet)
    - [Moore-AnimateAnyone Pre-trained Models](https://huggingface.co/patrolli/AnimateAnyone/tree/main)
    - The models above need to be placed under the [pretrained_weights](./pretrained_weights/) folder as follows:
    ```text
    ./pretrained_weights/
    |-- denoising_unet.pth
    |-- motion_module.pth
    |-- pose_guider.pth
    |-- reference_unet.pth
    `-- stable-diffusion-v1-5
        |-- feature_extractor
        |   `-- preprocessor_config.json
        |-- model_index.json
        |-- unet
        |   |-- config.json
        |   `-- diffusion_pytorch_model.bin
        `-- v1-inference.yaml
    ```
    - Download a CLIP image encoder (e.g. [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder)) and put it under `Your_ComfyUI_root_directory\ComfyUI\models\clip_vision`
    - Download a VAE (e.g. [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse/tree/main)) and put it under `Your_ComfyUI_root_directory\ComfyUI\models\vae` (the check script after these steps can confirm everything is in place)
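
If you want to confirm everything landed where the nodes expect it before starting ComfyUI, a short check like the one below walks the folders named above. This is only a convenience sketch added here, not part of the repo; the `comfyui_root` path is an assumption, and the CLIP image encoder / VAE entries only verify that the target folders exist, so adjust them to match what you actually downloaded.

```python
# Convenience sketch (not part of this repo): verify that the pre-trained
# weights, CLIP image encoder folder, and VAE folder are where the install
# steps above expect them.
from pathlib import Path

comfyui_root = Path(r"C:\ComfyUI")  # assumption: change to your ComfyUI root
repo_root = comfyui_root / "custom_nodes" / "ComfyUI-AnimateAnyone-Evolved"
weights_root = repo_root / "pretrained_weights"

expected = [
    weights_root / "denoising_unet.pth",
    weights_root / "motion_module.pth",
    weights_root / "pose_guider.pth",
    weights_root / "reference_unet.pth",
    weights_root / "stable-diffusion-v1-5" / "unet" / "diffusion_pytorch_model.bin",
    comfyui_root / "models" / "clip_vision",  # CLIP image encoder location
    comfyui_root / "models" / "vae",          # VAE location
]

missing = [path for path in expected if not path.exists()]
if missing:
    print("Missing:")
    for path in missing:
        print(" -", path)
else:
    print("All expected model files and folders are present.")
```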