CogVideoX1.5-5B-I2V 4090 24g Cannot run #673

Open
hjj-lmx opened this issue Jan 17, 2025 · 7 comments

hjj-lmx commented Jan 17, 2025

System Info

Ubuntu 22.04

Information

  • The official example scripts
  • My own modified scripts

Reproduction

import os

import torch
from torchao.quantization import int8_weight_only, quantize_
from transformers import T5EncoderModel
from diffusers import (
    AutoencoderKLCogVideoX,
    CogVideoXImageToVideoPipeline,
    CogVideoXTransformer3DModel,
)

quantization = int8_weight_only
model_path = os.path.join("checkpoints", "CogVideoX1.5-5B-I2V")

# Quantize each component to int8 weights
text_encoder = T5EncoderModel.from_pretrained(
    model_path, subfolder="text_encoder", torch_dtype=torch.bfloat16
)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained(
    model_path, subfolder="transformer", torch_dtype=torch.bfloat16
)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained(
    model_path, subfolder="vae", torch_dtype=torch.bfloat16
)
quantize_(vae, quantization())

# Create pipeline and run inference
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    model_path,
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

Expected behavior

Runs normally.

zRzRzRzRzRzRzR self-assigned this Jan 17, 2025

zRzRzRzRzRzRzR (Member) commented:

This int8 path actually costs more than running in BF16 directly. Plain BF16 fits in 24 GB, but you cannot use enable_model_cpu_offload; you need the more memory-efficient sequential offload used in the cli demo. A single run takes roughly 20 minutes.
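As a rough sketch of the two offload modes being contrasted here (the model path is an assumption; only one of the two offload calls should be enabled):

import torch
from diffusers import CogVideoXImageToVideoPipeline

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V", torch_dtype=torch.bfloat16
)

# Higher peak VRAM: moves whole submodules onto the GPU one at a time.
# pipe.enable_model_cpu_offload()

# What the cli demo uses: streams weights through the GPU layer by layer,
# trading speed for a much lower peak (fits in 24 GB, ~20 min per run).
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()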

hjj-lmx (Author) commented Jan 17, 2025

> This int8 path actually costs more than running in BF16 directly. Plain BF16 fits in 24 GB, but you cannot use enable_model_cpu_offload; you need the more memory-efficient sequential offload used in the cli demo. A single run takes roughly 20 minutes.

Which approach is BF16, the first demo on Hugging Face? Is the cli demo the fastest way?
Which is better, CogVideoX1.5-5B-I2V or CogVideoX-5B-I2V?

import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

prompt = "A little girl is riding a bicycle at high speed. Focused, detailed, realistic."
image = load_image(image="input.jpg")
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",
    torch_dtype=torch.bfloat16,
)

pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(
    prompt=prompt,
    image=image,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)

With this code I also get a GPU out-of-memory error on the 4090.

zRzRzRzRzRzRzR (Member) commented:

I see: you need to specify the resolution explicitly. Pass height and width, otherwise the sample defaults to 300 by 300, which is not a normal case. Take a look at our cli_demo, where every parameter is passed explicitly.

hjj-lmx (Author) commented Jan 20, 2025

> I see: you need to specify the resolution explicitly. Pass height and width, otherwise the sample defaults to 300 by 300, which is not a normal case. Take a look at our cli_demo, where every parameter is passed explicitly.

I am using cli_demo, but with CogVideoX-5B-I2V rather than CogVideoX1.5-5B-I2V, because CogVideoX1.5-5B-I2V cannot run on the 4090: it reports GPU out of memory.

Here is my code:
import os.path

import torch
from diffusers import CogVideoXImageToVideoPipeline, CogVideoXDPMScheduler


class CogVideoXModel:
    name = "CogVideoXModel"

    def __init__(self):
        model_path = os.path.join("checkpoints", "CogVideoX-5b-I2V")

        self.pipe = CogVideoXImageToVideoPipeline.from_pretrained(
            model_path,
            torch_dtype=torch.bfloat16,
        )
        self.pipe.scheduler = CogVideoXDPMScheduler.from_config(
            self.pipe.scheduler.config, timestep_spacing="trailing"
        )

        self.pipe.enable_sequential_cpu_offload()
        self.pipe.vae.enable_tiling()
        self.pipe.vae.enable_slicing()

    def inference(self, prompt, image):
        width, height = image.size
        print(f"Width: {width}, Height: {height}")
        video = self.pipe(
            height=height,
            width=width,
            prompt=prompt,
            image=image,
            use_dynamic_cfg=True,
            num_videos_per_prompt=1,
            num_inference_steps=50,
            num_frames=81,
            guidance_scale=16,
            generator=torch.Generator().manual_seed(42),
        ).frames[0]
        return video

zRzRzRzRzRzRzR (Member) commented:

For both models you have to specify num_frames, height, and width. For the 1.0 model, num_frames is 49, height is 480, and width is 720.
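
For illustration, a minimal sketch of the call being described, reusing pipe, prompt, and image from the snippets above (the other sampling parameters are carried over as assumptions):

# Explicit resolution and frame count for CogVideoX-5B-I2V (the 1.0 model);
# these three values are fixed for this model.
video = pipe(
    prompt=prompt,
    image=image,
    height=480,
    width=720,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6,
    generator=torch.Generator().manual_seed(42),
).frames[0]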

hjj-lmx (Author) commented Jan 20, 2025

> For both models you have to specify num_frames, height, and width. For the 1.0 model, num_frames is 49, height is 480, and width is 720.

What do you mean? From the docs, CogVideoX-5B-I2V has a fixed resolution of 720*480. I want to set the video resolution from the height and width of the input image; can that be done?
If I use CogVideoX-5B-I2V and pass in height and width values it errors out, and CogVideoX1.5-5B-I2V cannot be used on the 4090 at all.

zRzRzRzRzRzRzR (Member) commented:

No. CogVideoX-5B-I2V is fixed at 720*480, so you must pass width, and it must be 720, and you must pass height, and it must be 480.
For CogVideoX1.5-5B-I2V, height and width must be divisible by 16 and satisfy the table in the README; the parameters must be written and passed exactly as required there.
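
As a hedged illustration of that constraint, a hypothetical helper (check_resolution is not part of the repo, and the README table imposes further bounds this does not check):

def check_resolution(height: int, width: int) -> None:
    # Hypothetical validator for CogVideoX1.5-5B-I2V inputs: both
    # dimensions must be divisible by 16. The README table adds further
    # constraints on supported sizes that are not checked here.
    if height % 16 != 0 or width % 16 != 0:
        raise ValueError(
            f"height ({height}) and width ({width}) must both be divisible by 16"
        )

check_resolution(768, 1360)  # example values only; consult the README table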
