add support for MammothModa2 model #336
Open: HonestDeng wants to merge 79 commits into vllm-project:main from HonestDeng:add-mammoth-moda2-support
+4,054 −3
Commits (79, all authored by HonestDeng)
cdaef14  register MammothModa2 model in registry.py
f95173e  add code skeleton
0c0b611  add skeleton of ar and dit stage
59ba5a1  constructs ar model
fb513ce  capture hidden states using hook
7baa5e5  add input processors
4f25a05  implement DiT stage
a68cdc0  remove code of capturing history hidden state
0e007c0  delete redundant code
b6c8802  implement MammothModa2ARForConditionalGeneration using qwen2
a3e28ad  delete useless entry
20a8a87  Fix MammothModa2 processor/tokenizer in spawn workers
7a40266  Fix AutoConfig mapping for Mammoth VL subconfigs
890ff4c  Load config.json successfully
0d535f6  Add minimal Mammoth text token step debug script
7371f98  Make Mammoth token-step script fail fast on missing vLLM platform
e653884  Handle OmniOutput in Mammoth compute_logits
8eab22b  Fix MammothModa2 wrapper load_weights prefix and AR LM compat
e3b7a7b  Handle vLLM passing input_ids=None in Mammoth LM
392d683  Use omni AR worker in Mammoth token-step; fix logits and OmniOutput
299fe59  Expose VL token ids on Mammothmoda2Config for mrope
7fd44f9  Add MammothModa2 Omni pipeline runner and text decode
c889d5d  Add image input support to MammothModa2 Omni example
2a8081b  Add MammothModa2 unified entry + t2i pipeline scaffold
2ea2b78  Limit MammothModa2 AR max_model_len to reduce KV cache
0c52878  Fix MammothModa2 MoE helper for 2D hidden_states
0f56070  Now we can generate image, but still bugs exist
a614fd9  insert eol token
df6c532  mammoth_moda2: build DiT condition from AR hidden states
4025a12  mammoth_moda2: wire condition into DiT stage
f4b2a2a  generation_runner: pass runtime additional_information to models
aa35552  mammoth_moda2: align gen token ids to available hidden states
559538a  mammoth_moda2: keep additional_information serializable
a6e524a  mammoth_moda2: fix DiT conditioning and RoPE freqs
000c8d6  examples: simplify MammothModa2 default prompt
677d671  transfer height and weight params
ca9e6a9  delete useless logic
887a10d  delete backward-compatible codes
d86b20a  mammoth_moda2: align ar2dit masks with upstream
fd21182  mammoth_moda2: add DiT CFG params and guidance
fac1191  examples: derive ar grid from image size
386b33c  delete backward-compatible code
73715eb  delete useless arguments
d3632c3  construct dummy run params
06e7b6c  delete useless code
cc7c2f8  move hard-code from runner
1e670d7  simplify code
41f96f8  generate eoi token
cc4f945  simplify code in ar2dit
fe238e9  delete useless file
c750217  recover arg_utils.py
27b5ce3  merge main branch
2f73e5c  Fix multimodal hooks and mrope handling
37e0950  delete Chinese comment
30761cb  simplify code
7369cc7  delete _build_dummy_mm_embeddings function
5f1d9b8  change Chinese comments to English
d81375e  refactor example
3e38344  delete useless file and rename file
1e2d343  delete useless ocnfig file
752b2a3  delete Chinese comment
f8b5849  examples: support multi-prompt t2i outputs
0b71f18  Merge upstream/main
85e6f66  fix bug in calling _build_model_kwargs_extra
dbd18a9  examples: add MammothModa2 image summary
397ae64  avoid sampling gen token
0aef6b6  merge main brach
79022c9  compute generated_len in runner
9f2377a  run pre-commit
39177ef  rename mammothmoda2_dit to mammothmoda2_dit_layer
6d8326c  revert unrelated change
f0dbd06  revert change
8092dbf  Merge remote-tracking branch 'upstream/main' into add-mammoth-moda2-s…
a468551  Restore gpu_model_runner.py to upstream/main
c3f4be9  remove redundant code
7e1699a  remove useless code in transport and embedding
e1b0e14  remove useless code in TimeEmbedding
ebd801e  remove useless code in RMSNorm
4996bd7  remove useless code in diffusion_transformer.py
18 changes: 18 additions & 0 deletions
examples/offline_inference/mammothmodal2_preview/mammoth_moda2_image_summary.yaml
@@ -0,0 +1,18 @@
stage_args:
  - stage_id: 0
    runtime:
      devices: "0"
      max_batch_size: 16
    engine_args:
      model_stage: ar
      model_arch: MammothModa2ForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_ar_worker.GPUARWorker
      scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
      max_model_len: 8192
      gpu_memory_utilization: 0.5
      enforce_eager: true
      trust_remote_code: true
      engine_output_type: text
      enable_prefix_caching: false
    final_output: true
    final_output_type: text
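This stage config is consumed by the example script that follows, via its --stage-config argument. As a minimal sketch, not part of this PR, of how such a single-stage YAML could be loaded and inspected with PyYAML, assuming the key nesting shown above:

# Hypothetical inspection snippet (assumes PyYAML is installed and the layout shown above).
from pathlib import Path

import yaml

cfg_path = Path("examples/offline_inference/mammothmodal2_preview/mammoth_moda2_image_summary.yaml")
cfg = yaml.safe_load(cfg_path.read_text())

for stage in cfg["stage_args"]:
    engine = stage["engine_args"]
    # Print the key engine settings for the single AR stage.
    print(
        f"stage {stage['stage_id']}: arch={engine['model_arch']}, "
        f"worker={engine['worker_cls']}, max_model_len={engine['max_model_len']}, "
        f"output={engine['engine_output_type']}"
    )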
147 changes: 147 additions & 0 deletions
examples/offline_inference/mammothmodal2_preview/run_mammothmoda2_image_summary.py
@@ -0,0 +1,147 @@
"""
Offline inference example: MammothModa2 image summarization (single AR stage).

Example:
    uv run python examples/offline_inference/mammothmodal2_preview/run_mammothmoda2_image_summary.py \
        --model /data/datasets/models-hf/MammothModa2-Preview \
        --image /path/to/input.jpg \
        --question "Please summarize the content of this image."
"""

from __future__ import annotations

import argparse
import os
from pathlib import Path

from PIL import Image
from vllm import SamplingParams
from vllm.multimodal.image import convert_image_mode

from vllm_omni import Omni

DEFAULT_SYSTEM = "You are a helpful assistant."
DEFAULT_QUESTION = "Please summarize the content of this image."


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="MammothModa2 image summarization (offline, AR only).")
    parser.add_argument(
        "--model",
        type=str,
        default="/data/datasets/models-hf/MammothModa2-Preview",
        help="Path to model directory or model id.",
    )
    parser.add_argument(
        "--stage-config",
        type=str,
        default=str(Path(__file__).with_name("mammoth_moda2_image_summary.yaml")),
        help="Path to stage config yaml (single-stage AR->text).",
    )
    parser.add_argument(
        "--image",
        type=str,
        required=True,
        help="Path to input image.",
    )
    parser.add_argument(
        "--question",
        type=str,
        default=DEFAULT_QUESTION,
        help="Question/instruction for the model.",
    )
    parser.add_argument(
        "--system",
        type=str,
        default=DEFAULT_SYSTEM,
        help="System prompt.",
    )
    parser.add_argument(
        "--max-tokens",
        type=int,
        default=512,
        help="Max new tokens to generate.",
    )
    parser.add_argument("--temperature", type=float, default=0.2)
    parser.add_argument("--top-p", type=float, default=0.9)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--trust-remote-code", action="store_true")
    parser.add_argument(
        "--out",
        type=str,
        default="image_summary.txt",
        help="Path to save output text.",
    )
    return parser.parse_args()


def build_prompt(system: str, question: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def main() -> None:
    args = parse_args()

    if not os.path.exists(args.image):
        raise FileNotFoundError(f"Image file not found: {args.image}")

    os.makedirs(os.path.dirname(args.out) or ".", exist_ok=True)

    pil_image = Image.open(args.image)
    image_data = convert_image_mode(pil_image, "RGB")
    prompt = build_prompt(args.system, args.question)

    omni = Omni(
        model=args.model,
        stage_configs_path=args.stage_config,
        trust_remote_code=args.trust_remote_code,
    )
    try:
        sp = SamplingParams(
            temperature=float(args.temperature),
            top_p=float(args.top_p),
            top_k=-1,
            max_tokens=int(args.max_tokens),
            seed=int(args.seed),
            detokenize=True,
        )
        outputs = omni.generate(
            [
                {
                    "prompt": prompt,
                    "multi_modal_data": {"image": image_data},
                }
            ],
            [sp],
        )
    finally:
        omni.close()

    if not isinstance(outputs, list):
        outputs = [outputs]

    lines: list[str] = []
    for stage_outputs in outputs:
        req_outputs = getattr(stage_outputs, "request_output", stage_outputs)
        req_outputs = req_outputs if isinstance(req_outputs, list) else [req_outputs]
        for ro in req_outputs:
            text = ro.outputs[0].text if getattr(ro, "outputs", None) else str(ro)
            lines.append(f"request_id: {getattr(ro, 'request_id', 'unknown')}\n")
            lines.append("answer:\n")
            lines.append(text.strip() + "\n")
            lines.append("\n")

    with open(args.out, "w", encoding="utf-8") as f:
        f.writelines(lines)

    print(f"[OK] Saved summary to: {args.out}")


if __name__ == "__main__":
    main()
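For reference, a quick hypothetical check (not part of the PR) of the chat template that build_prompt emits with the default system prompt and question; the expected output follows directly from the f-string above, with the <|vision_start|><|image_pad|><|vision_end|> span marking where the image placeholder sits in the prompt:

# Hypothetical usage of the helper defined above; expected output shown in trailing comments.
print(build_prompt(DEFAULT_SYSTEM, DEFAULT_QUESTION))
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# <|vision_start|><|image_pad|><|vision_end|>Please summarize the content of this image.<|im_end|>
# <|im_start|>assistant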