CogVideoX1.5-5B-I2V 4090 24g Cannot run #673

Open
hjj-lmx opened this issue Jan 17, 2025 · 7 comments

hjj-lmx commented Jan 17, 2025

System Info

Ubuntu 22.04

Information

  • The official example scripts
  • My own modified scripts

Reproduction

import os

import torch
from torchao.quantization import int8_weight_only, quantize_
from transformers import T5EncoderModel
from diffusers import (
    AutoencoderKLCogVideoX,
    CogVideoXImageToVideoPipeline,
    CogVideoXTransformer3DModel,
)

quantization = int8_weight_only
model_path = os.path.join("checkpoints", "CogVideoX1.5-5B-I2V")

# Quantize each component to int8 weights
text_encoder = T5EncoderModel.from_pretrained(
    model_path, subfolder="text_encoder", torch_dtype=torch.bfloat16
)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained(
    model_path, subfolder="transformer", torch_dtype=torch.bfloat16
)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained(
    model_path, subfolder="vae", torch_dtype=torch.bfloat16
)
quantize_(vae, quantization())

# Create pipeline and run inference
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    model_path,
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

Expected behavior

Runs normally.

zRzRzRzRzRzRzR self-assigned this Jan 17, 2025

zRzRzRzRzRzRzR (Member) commented:

This int8 path actually costs more than running in BF16 directly. Plain BF16 fits in 24 GB, but you cannot use enable_model_cpu_offload; you need the more memory-efficient sequential offload used in the cli demo. A single run takes roughly 20 minutes.
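As a rough sketch of the two offload modes being contrasted here (the model path is an assumption; only one of the two offload calls should be enabled):

import torch
from diffusers import CogVideoXImageToVideoPipeline

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V", torch_dtype=torch.bfloat16
)

# Higher peak VRAM: moves whole submodules onto the GPU one at a time.
# pipe.enable_model_cpu_offload()

# What the cli demo uses: streams weights through the GPU layer by layer,
# trading speed for a much lower peak (fits in 24 GB, ~20 min per run).
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()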

hjj-lmx (Author) commented Jan 17, 2025

> This int8 path actually costs more than running in BF16 directly. Plain BF16 fits in 24 GB, but you cannot use enable_model_cpu_offload; you need the more memory-efficient sequential offload used in the cli demo. A single run takes roughly 20 minutes.

Which approach is BF16, the first demo on Hugging Face? Is the cli demo the fastest way?
Which is better, CogVideoX1.5-5B-I2V or CogVideoX-5B-I2V?

import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

prompt = "A little girl is riding a bicycle at high speed. Focused, detailed, realistic."
image = load_image(image="input.jpg")
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",
    torch_dtype=torch.bfloat16,
)

pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(
    prompt=prompt,
    image=image,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)

With this code I also get a GPU out-of-memory error on the 4090.

zRzRzRzRzRzRzR (Member) commented:

I see: you need to specify the resolution explicitly. Pass height and width, otherwise the sample defaults to 300 by 300, which is not a normal case. Take a look at our cli_demo, where every parameter is passed explicitly.

hjj-lmx (Author) commented Jan 20, 2025

> I see: you need to specify the resolution explicitly. Pass height and width, otherwise the sample defaults to 300 by 300, which is not a normal case. Take a look at our cli_demo, where every parameter is passed explicitly.

I am using cli_demo, but with CogVideoX-5B-I2V rather than CogVideoX1.5-5B-I2V, because CogVideoX1.5-5B-I2V cannot run on the 4090: it reports GPU out of memory.

Here is my code:
import os.path

import torch
from diffusers import CogVideoXImageToVideoPipeline, CogVideoXDPMScheduler


class CogVideoXModel:
    name = "CogVideoXModel"

    def __init__(self):
        model_path = os.path.join("checkpoints", "CogVideoX-5b-I2V")

        self.pipe = CogVideoXImageToVideoPipeline.from_pretrained(
            model_path,
            torch_dtype=torch.bfloat16,
        )
        self.pipe.scheduler = CogVideoXDPMScheduler.from_config(
            self.pipe.scheduler.config, timestep_spacing="trailing"
        )

        self.pipe.enable_sequential_cpu_offload()
        self.pipe.vae.enable_tiling()
        self.pipe.vae.enable_slicing()

    def inference(self, prompt, image):
        width, height = image.size
        print(f"Width: {width}, Height: {height}")
        video = self.pipe(
            height=height,
            width=width,
            prompt=prompt,
            image=image,
            use_dynamic_cfg=True,
            num_videos_per_prompt=1,
            num_inference_steps=50,
            num_frames=81,
            guidance_scale=16,
            generator=torch.Generator().manual_seed(42),
        ).frames[0]
        return video

zRzRzRzRzRzRzR (Member) commented:

For both models you have to specify num_frames, height, and width. For the 1.0 model, num_frames is 49, height is 480, and width is 720.
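
For illustration, a minimal sketch of the call being described, reusing pipe, prompt, and image from the snippets above (the other sampling parameters are carried over as assumptions):

# Explicit resolution and frame count for CogVideoX-5B-I2V (the 1.0 model);
# these three values are fixed for this model.
video = pipe(
    prompt=prompt,
    image=image,
    height=480,
    width=720,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6,
    generator=torch.Generator().manual_seed(42),
).frames[0]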

hjj-lmx (Author) commented Jan 20, 2025

> For both models you have to specify num_frames, height, and width. For the 1.0 model, num_frames is 49, height is 480, and width is 720.

What do you mean? From the docs, CogVideoX-5B-I2V has a fixed resolution of 720*480. I want to set the video resolution from the height and width of the input image; can that be done?
If I use CogVideoX-5B-I2V and pass in height and width values it errors out, and CogVideoX1.5-5B-I2V cannot be used on the 4090 at all.

zRzRzRzRzRzRzR (Member) commented:

No. CogVideoX-5B-I2V is fixed at 720*480, so you must pass width, and it must be 720, and you must pass height, and it must be 480.
For CogVideoX1.5-5B-I2V, height and width must be divisible by 16 and satisfy the table in the README; the parameters must be written and passed exactly as required there.
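
As a hedged illustration of that constraint, a hypothetical helper (check_resolution is not part of the repo, and the README table imposes further bounds this does not check):

def check_resolution(height: int, width: int) -> None:
    # Hypothetical validator for CogVideoX1.5-5B-I2V inputs: both
    # dimensions must be divisible by 16. The README table adds further
    # constraints on supported sizes that are not checked here.
    if height % 16 != 0 or width % 16 != 0:
        raise ValueError(
            f"height ({height}) and width ({width}) must both be divisible by 16"
        )

check_resolution(768, 1360)  # example values only; consult the README table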
