79 commits
cdaef14
register MammothModa2 model in registry.py
HonestDeng Dec 16, 2025
f95173e
add code skeleton
HonestDeng Dec 16, 2025
0c0b611
add skeleton of ar and dit stage
HonestDeng Dec 16, 2025
59ba5a1
constructs ar model
HonestDeng Dec 17, 2025
fb513ce
capture hidden states using hook
HonestDeng Dec 17, 2025
7baa5e5
add input processors
HonestDeng Dec 17, 2025
4f25a05
implement DiT stage
HonestDeng Dec 17, 2025
a68cdc0
remove code of capturing history hidden state
HonestDeng Dec 17, 2025
0e007c0
delete redundant code
HonestDeng Dec 17, 2025
b6c8802
implement MammothModa2ARForConditionalGeneration using qwen2
HonestDeng Dec 17, 2025
a3e28ad
delete useless entry
HonestDeng Dec 17, 2025
20a8a87
Fix MammothModa2 processor/tokenizer in spawn workers
HonestDeng Dec 18, 2025
7a40266
Fix AutoConfig mapping for Mammoth VL subconfigs
HonestDeng Dec 18, 2025
890ff4c
Load config.json successfully
HonestDeng Dec 18, 2025
0d535f6
Add minimal Mammoth text token step debug script
HonestDeng Dec 18, 2025
7371f98
Make Mammoth token-step script fail fast on missing vLLM platform
HonestDeng Dec 18, 2025
e653884
Handle OmniOutput in Mammoth compute_logits
HonestDeng Dec 18, 2025
8eab22b
Fix MammothModa2 wrapper load_weights prefix and AR LM compat
HonestDeng Dec 18, 2025
e3b7a7b
Handle vLLM passing input_ids=None in Mammoth LM
HonestDeng Dec 18, 2025
392d683
Use omni AR worker in Mammoth token-step; fix logits and OmniOutput
HonestDeng Dec 19, 2025
299fe59
Expose VL token ids on Mammothmoda2Config for mrope
HonestDeng Dec 19, 2025
7fd44f9
Add MammothModa2 Omni pipeline runner and text decode
HonestDeng Dec 19, 2025
c889d5d
Add image input support to MammothModa2 Omni example
HonestDeng Dec 19, 2025
2a8081b
Add MammothModa2 unified entry + t2i pipeline scaffold
HonestDeng Dec 20, 2025
2ea2b78
Limit MammothModa2 AR max_model_len to reduce KV cache
HonestDeng Dec 20, 2025
0c52878
Fix MammothModa2 MoE helper for 2D hidden_states
HonestDeng Dec 20, 2025
0f56070
Now we can generate image, but still bugs exist
HonestDeng Dec 21, 2025
a614fd9
insert eol token
HonestDeng Dec 21, 2025
df6c532
mammoth_moda2: build DiT condition from AR hidden states
HonestDeng Dec 22, 2025
4025a12
mammoth_moda2: wire condition into DiT stage
HonestDeng Dec 22, 2025
f4b2a2a
generation_runner: pass runtime additional_information to models
HonestDeng Dec 22, 2025
aa35552
mammoth_moda2: align gen token ids to available hidden states
HonestDeng Dec 22, 2025
559538a
mammoth_moda2: keep additional_information serializable
HonestDeng Dec 22, 2025
a6e524a
mammoth_moda2: fix DiT conditioning and RoPE freqs
HonestDeng Dec 22, 2025
000c8d6
examples: simplify MammothModa2 default prompt
HonestDeng Dec 22, 2025
677d671
transfer height and weight params
HonestDeng Dec 22, 2025
ca9e6a9
delete useless logic
HonestDeng Dec 22, 2025
887a10d
delete backward-compatible codes
HonestDeng Dec 22, 2025
d86b20a
mammoth_moda2: align ar2dit masks with upstream
HonestDeng Dec 22, 2025
fd21182
mammoth_moda2: add DiT CFG params and guidance
HonestDeng Dec 23, 2025
fac1191
examples: derive ar grid from image size
HonestDeng Dec 23, 2025
386b33c
delete backward-compatible code
HonestDeng Dec 23, 2025
73715eb
delete useless arguments
HonestDeng Dec 23, 2025
d3632c3
construct dummy run params
HonestDeng Dec 23, 2025
06e7b6c
delete useless code
HonestDeng Dec 23, 2025
cc7c2f8
move hard-code from runner
HonestDeng Dec 23, 2025
1e670d7
simplify code
HonestDeng Dec 23, 2025
41f96f8
generate eoi token
HonestDeng Dec 23, 2025
cc4f945
simplify code in ar2dit
HonestDeng Dec 23, 2025
fe238e9
delete useless file
HonestDeng Dec 23, 2025
c750217
recover arg_utils.py
HonestDeng Dec 24, 2025
27b5ce3
merge main branch
HonestDeng Dec 24, 2025
2f73e5c
Fix multimodal hooks and mrope handling
HonestDeng Dec 24, 2025
37e0950
delete Chinese comment
HonestDeng Dec 24, 2025
30761cb
simplify code
HonestDeng Dec 25, 2025
7369cc7
delete _build_dummy_mm_embeddings function
HonestDeng Dec 25, 2025
5f1d9b8
change Chinese comments to English
HonestDeng Dec 25, 2025
d81375e
refactor example
HonestDeng Dec 25, 2025
3e38344
delete useless file and rename file
HonestDeng Dec 25, 2025
1e2d343
delete useless config file
HonestDeng Dec 26, 2025
752b2a3
delete Chinese comment
HonestDeng Dec 26, 2025
f8b5849
examples: support multi-prompt t2i outputs
HonestDeng Dec 26, 2025
0b71f18
Merge upstream/main
HonestDeng Dec 26, 2025
85e6f66
fix bug in calling _build_model_kwargs_extra
HonestDeng Dec 26, 2025
dbd18a9
examples: add MammothModa2 image summary
HonestDeng Dec 26, 2025
397ae64
avoid sampling gen token
HonestDeng Dec 26, 2025
0aef6b6
merge main branch
HonestDeng Dec 27, 2025
79022c9
compute generated_len in runner
HonestDeng Dec 27, 2025
9f2377a
run pre-commit
HonestDeng Dec 27, 2025
39177ef
rename mammothmoda2_dit to mammothmoda2_dit_layer
HonestDeng Dec 27, 2025
6d8326c
revert unrelated change
HonestDeng Dec 27, 2025
f0dbd06
revert change
HonestDeng Dec 27, 2025
8092dbf
Merge remote-tracking branch 'upstream/main' into add-mammoth-moda2-s…
HonestDeng Dec 28, 2025
a468551
Restore gpu_model_runner.py to upstream/main
HonestDeng Dec 29, 2025
c3f4be9
remove redundant code
HonestDeng Dec 29, 2025
7e1699a
remove useless code in transport and embedding
HonestDeng Dec 29, 2025
e1b0e14
remove useless code in TimeEmbedding
HonestDeng Dec 29, 2025
ebd801e
remove useless code in RMSNorm
HonestDeng Dec 29, 2025
4996bd7
remove useless code in diffusion_transformer.py
HonestDeng Dec 29, 2025
@@ -0,0 +1,18 @@
stage_args:
  - stage_id: 0
    runtime:
      devices: "0"
      max_batch_size: 16
    engine_args:
      model_stage: ar
      model_arch: MammothModa2ForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_ar_worker.GPUARWorker
      scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler
      max_model_len: 8192
      gpu_memory_utilization: 0.5
      enforce_eager: true
      trust_remote_code: true
      engine_output_type: text
      enable_prefix_caching: false
    final_output: true
    final_output_type: text
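
For readers checking the nesting of this stage config, the sketch below mirrors a subset of it as the plain Python dict a YAML loader would be expected to produce, then runs a few sanity checks. The placement of `final_output` at stage level is an assumption about the flattened diff, and the helper `check_stage` and its checks are hypothetical illustrations, not part of vllm_omni.

```python
# Hypothetical sanity check for the single-stage config above.
# STAGE_CONFIG mirrors a subset of the YAML as a plain dict (assumed nesting);
# check_stage is an illustrative helper, not a vllm_omni API.
STAGE_CONFIG = {
    "stage_args": [
        {
            "stage_id": 0,
            "runtime": {"devices": "0", "max_batch_size": 16},
            "engine_args": {
                "model_stage": "ar",
                "model_arch": "MammothModa2ForConditionalGeneration",
                "max_model_len": 8192,
                "gpu_memory_utilization": 0.5,
                "engine_output_type": "text",
            },
            "final_output": True,
            "final_output_type": "text",
        }
    ]
}


def check_stage(stage: dict) -> list[str]:
    """Return human-readable problems; an empty list means the stage looks sane."""
    problems = []
    for key in ("stage_id", "runtime", "engine_args"):
        if key not in stage:
            problems.append(f"missing key: {key}")
    # A missing value defaults to 0.0 and is therefore reported as out of range.
    util = stage.get("engine_args", {}).get("gpu_memory_utilization", 0.0)
    if not 0.0 < util <= 1.0:
        problems.append(f"gpu_memory_utilization out of range: {util}")
    if stage.get("final_output") and "final_output_type" not in stage:
        problems.append("final_output is set but final_output_type is missing")
    return problems


for stage in STAGE_CONFIG["stage_args"]:
    print(stage["stage_id"], check_stage(stage) or "ok")
```

A check of this kind is cheap to run before launching the engine, since a mis-indented key would otherwise only surface once the stage starts.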
@@ -0,0 +1,147 @@
"""
Offline inference example: MammothModa2 image summarization (single AR stage).

Example:
    uv run python examples/offline_inference/mammothmodal2_preview/run_mammothmoda2_image_summary.py \
        --model /data/datasets/models-hf/MammothModa2-Preview \
        --image /path/to/input.jpg \
        --question "Please summarize the content of this image."
"""

from __future__ import annotations

import argparse
import os
from pathlib import Path

from PIL import Image
from vllm import SamplingParams
from vllm.multimodal.image import convert_image_mode

from vllm_omni import Omni

DEFAULT_SYSTEM = "You are a helpful assistant."
DEFAULT_QUESTION = "Please summarize the content of this image."


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="MammothModa2 image summarization (offline, AR only).")
    parser.add_argument(
        "--model",
        type=str,
        default="/data/datasets/models-hf/MammothModa2-Preview",
        help="Path to model directory or model id.",
    )
    parser.add_argument(
        "--stage-config",
        type=str,
        default=str(Path(__file__).with_name("mammoth_moda2_image_summary.yaml")),
        help="Path to stage config yaml (single-stage AR->text).",
    )
    parser.add_argument(
        "--image",
        type=str,
        required=True,
        help="Path to input image.",
    )
    parser.add_argument(
        "--question",
        type=str,
        default=DEFAULT_QUESTION,
        help="Question/instruction for the model.",
    )
    parser.add_argument(
        "--system",
        type=str,
        default=DEFAULT_SYSTEM,
        help="System prompt.",
    )
    parser.add_argument(
        "--max-tokens",
        type=int,
        default=512,
        help="Max new tokens to generate.",
    )
    parser.add_argument("--temperature", type=float, default=0.2)
    parser.add_argument("--top-p", type=float, default=0.9)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--trust-remote-code", action="store_true")
    parser.add_argument(
        "--out",
        type=str,
        default="image_summary.txt",
        help="Path to save output text.",
    )
    return parser.parse_args()


def build_prompt(system: str, question: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def main() -> None:
    args = parse_args()

    if not os.path.exists(args.image):
        raise FileNotFoundError(f"Image file not found: {args.image}")

    # Create the output directory before the model is loaded.
    os.makedirs(os.path.dirname(args.out) or ".", exist_ok=True)

    # Normalize the input image to RGB and build the chat-style prompt.
    pil_image = Image.open(args.image)
    image_data = convert_image_mode(pil_image, "RGB")
    prompt = build_prompt(args.system, args.question)

    omni = Omni(
        model=args.model,
        stage_configs_path=args.stage_config,
        trust_remote_code=args.trust_remote_code,
    )
    try:
        sp = SamplingParams(
            temperature=float(args.temperature),
            top_p=float(args.top_p),
            top_k=-1,
            max_tokens=int(args.max_tokens),
            seed=int(args.seed),
            detokenize=True,
        )
        # One request, with one SamplingParams entry for the single AR stage.
        outputs = omni.generate(
            [
                {
                    "prompt": prompt,
                    "multi_modal_data": {"image": image_data},
                }
            ],
            [sp],
        )
    finally:
        # Always release the engine, even if generation fails.
        omni.close()

    if not isinstance(outputs, list):
        outputs = [outputs]

    # Flatten stage outputs into request outputs and collect the answer text.
    lines: list[str] = []
    for stage_outputs in outputs:
        req_outputs = getattr(stage_outputs, "request_output", stage_outputs)
        req_outputs = req_outputs if isinstance(req_outputs, list) else [req_outputs]
        for ro in req_outputs:
            text = ro.outputs[0].text if getattr(ro, "outputs", None) else str(ro)
            lines.append(f"request_id: {getattr(ro, 'request_id', 'unknown')}\n")
            lines.append("answer:\n")
            lines.append(text.strip() + "\n")
            lines.append("\n")

    with open(args.out, "w", encoding="utf-8") as f:
        f.writelines(lines)

    print(f"[OK] Saved summary to: {args.out}")


if __name__ == "__main__":
    main()
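
The prompt template and the output-flattening loop in this script can be exercised without a GPU or model weights. The sketch below copies `build_prompt` from the script and factors the loop from `main()` into a helper; the name `collect_texts` and the `SimpleNamespace` stand-ins are hypothetical and only imitate the attribute shapes the script reads (`request_output`, `outputs[0].text`), not the real vllm_omni output classes.

```python
from types import SimpleNamespace


def build_prompt(system: str, question: str) -> str:
    # Same template as in the script: one image placeholder in the user turn.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def collect_texts(outputs) -> list[str]:
    """Flatten stage outputs into answer strings, mirroring the loop in main()."""
    if not isinstance(outputs, list):
        outputs = [outputs]
    texts = []
    for stage_outputs in outputs:
        # A stage output may wrap its requests in `request_output`, or be one itself.
        req_outputs = getattr(stage_outputs, "request_output", stage_outputs)
        if not isinstance(req_outputs, list):
            req_outputs = [req_outputs]
        for ro in req_outputs:
            if getattr(ro, "outputs", None):
                texts.append(ro.outputs[0].text.strip())
            else:
                # Fall back to the string form when no completions are attached.
                texts.append(str(ro).strip())
    return texts


# Hypothetical stand-ins shaped like the objects the script expects.
fake_request = SimpleNamespace(
    request_id="0",
    outputs=[SimpleNamespace(text=" A cat on a sofa. ")],
)
fake_stage = SimpleNamespace(request_output=[fake_request])

print(build_prompt("You are a helpful assistant.", "Describe the image."))
print(collect_texts([fake_stage]))
```

Keeping the flattening logic in a separate function like this would make it unit-testable, though whether the PR should adopt that split is a judgment call for the author.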