
Commit 4e5e737

wasnertobias and claude committed
Logos: Skip calibration retry for unsupported architectures and upgrade transformers
When vLLM fails with "does not recognize this architecture", skip tensor-parallel (TP) escalation retries, since more GPUs cannot fix an unsupported model type. Also upgrade transformers in the Dockerfile to support newer architectures such as gemma4 out of the box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e5674ec, commit 4e5e737

2 files changed

Lines changed: 11 additions & 3 deletions


logos/logos-workernode/Dockerfile

Lines changed: 3 additions & 1 deletion
@@ -147,7 +147,9 @@ RUN set -e \
 && if [ -n "$NCCL_PIP_SPEC" ]; then \
 pip install --no-cache-dir --force-reinstall --no-deps "$NCCL_PIP_SPEC"; \
 fi \
-# 4. FlashInfer (install AFTER torch so it picks up the correct version)
+# 4. Upgrade transformers for latest model architecture support
+&& pip install --no-cache-dir --upgrade transformers \
+# 5. FlashInfer (install AFTER torch so it picks up the correct version)
 && pip install --no-cache-dir --no-deps flashinfer-python; \
 fi
 
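The upgrade only helps if the installed transformers registers the checkpoint's `model_type`. To our understanding, transformers' `AutoConfig` raises a `ValueError` stating that the checkpoint "has model type ... but Transformers does not recognize this architecture" when that `model_type` is missing from its registry, which would explain both the phrase matched in `calibration.py` and why a newer transformers, not more GPUs, is the fix. The sketch below checks this inside a built image. It assumes the module path `transformers.models.auto.configuration_auto.CONFIG_MAPPING` (widely used but internal, so it may move between releases); the helper names are hypothetical and not part of `logos_worker_node`.

```python
import json
from pathlib import Path
from typing import Optional


def model_type_from_checkpoint(model_dir: str) -> str:
    """Return the `model_type` field of a local Hugging Face checkpoint's config.json."""
    with open(Path(model_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)["model_type"]


def transformers_knows_model_type(model_type: str) -> Optional[bool]:
    """True/False if transformers is importable, None if it is not installed.

    Assumption: CONFIG_MAPPING maps model_type strings (e.g. "bert") to config
    classes and supports `in` without importing every model module.
    """
    try:
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
    except ImportError:
        return None  # transformers is not available in this environment
    return model_type in CONFIG_MAPPING
```

Usage would be along the lines of `transformers_knows_model_type(model_type_from_checkpoint(path))`, where `path` is a local checkpoint directory; a `False` result suggests the model would hit the fatal branch regardless of `tp`.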

logos/logos-workernode/logos_worker_node/calibration.py

Lines changed: 8 additions & 2 deletions
@@ -500,9 +500,12 @@ def calibrate_model(
 stop_vllm(proc)
 time.sleep(_VRAM_SETTLE_S)
 
+error_detail = str(exc)
+if log_tail:
+    error_detail = f"{error_detail}\n{log_tail}"
 partial.error = (
     f"Model failed to start with KV cache {kv_bytes_str} on "
-    f"tp={tp}: {exc}"
+    f"tp={tp}: {error_detail}"
 )
 logger.warning(" %s", partial.error)
 return partial
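The new lines in `calibrate_model` append `log_tail` to the exception text before building `partial.error`. Our reading, which the commit message does not state, is that this lets vLLM's own log output reach the calibration result, so the substring check in `auto_calibrate_models` can see an architecture error that the exception alone would not carry. A minimal standalone sketch of that composition follows; `build_error_detail` and `format_start_failure` are hypothetical helpers, and we assume `log_tail` holds the last lines of the vLLM process output and may be `None` or empty.

```python
from typing import Optional


def build_error_detail(exc: BaseException, log_tail: Optional[str]) -> str:
    """Combine an exception message with the tail of the server log, if any."""
    detail = str(exc)
    if log_tail:  # skips both None and the empty string
        detail = f"{detail}\n{log_tail}"
    return detail


def format_start_failure(kv_bytes_str: str, tp: int, detail: str) -> str:
    """Mirror the message shape the diff uses for partial.error."""
    return (
        f"Model failed to start with KV cache {kv_bytes_str} on "
        f"tp={tp}: {detail}"
    )
```

With an exception such as `RuntimeError("vLLM exited before becoming ready")` and a log tail containing "does not recognize this architecture", the formatted message contains the phrase; without the tail it would not.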
@@ -862,7 +865,10 @@ def auto_calibrate_models(
 result = _try_calibrate(plan, **cal_kwargs)
 
 # tp escalation: if calibration failed, retry with doubled tp
-while not result.success and tp * 2 <= max_tp:
+# Skip escalation for errors that cannot be solved by more GPUs
+# (e.g. unsupported model architecture).
+_fatal = "does not recognize this architecture" in (result.error or "")
+while not result.success and not _fatal and tp * 2 <= max_tp:
     next_tp = tp * 2
     logger.info(
         " %s failed with tp=%d — retrying with tp=%d",

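The escalation guard can be exercised in isolation. The sketch below is a stand-in for the loop in `auto_calibrate_models`, not its actual body: `CalibrationResult`, `is_fatal`, `calibrate_with_escalation`, and the `FATAL_PATTERNS` tuple are hypothetical, and `try_calibrate` stands in for `_try_calibrate(plan, **cal_kwargs)` with the tensor-parallel size as its only input. It keeps the same three stop conditions as the diff: success, a fatal error, or a doubled `tp` that would exceed `max_tp`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical list of error substrings that more GPUs cannot fix.
# Only this entry comes from the commit; others would be added by hand.
FATAL_PATTERNS: Tuple[str, ...] = ("does not recognize this architecture",)


@dataclass
class CalibrationResult:
    """Hypothetical stand-in for the object returned by _try_calibrate."""
    success: bool
    error: Optional[str] = None


def is_fatal(result: CalibrationResult) -> bool:
    """True if the error text contains a pattern that no tp value can fix."""
    error = result.error or ""  # error may be None
    return any(pattern in error for pattern in FATAL_PATTERNS)


def calibrate_with_escalation(
    try_calibrate: Callable[[int], CalibrationResult],
    tp: int,
    max_tp: int,
) -> Tuple[CalibrationResult, List[int]]:
    """Retry with doubled tp until success, a fatal error, or tp would exceed max_tp.

    Returns the last result and the list of tp values attempted.
    """
    attempts = [tp]
    result = try_calibrate(tp)
    while not result.success and not is_fatal(result) and tp * 2 <= max_tp:
        tp *= 2
        attempts.append(tp)
        result = try_calibrate(tp)
    return result, attempts


if __name__ == "__main__":
    # Illustrative failures: out-of-memory is worth escalating,
    # an unsupported architecture is not.
    def oom_until_tp4(tp: int) -> CalibrationResult:
        if tp >= 4:
            return CalibrationResult(success=True)
        return CalibrationResult(success=False, error="CUDA out of memory")

    def unsupported(tp: int) -> CalibrationResult:
        return CalibrationResult(
            success=False,
            error="Transformers does not recognize this architecture",
        )

    print(calibrate_with_escalation(oom_until_tp4, tp=1, max_tp=8)[1])  # [1, 2, 4]
    print(calibrate_with_escalation(unsupported, tp=1, max_tp=8)[1])    # [1]
```

As far as the visible hunk shows, `_fatal` is computed once from the first attempt's result, while this sketch re-checks after every attempt. For an unsupported architecture the two presumably behave the same, since the configuration error does not depend on `tp` and surfaces on the first attempt. Keeping the patterns in a tuple is one way to add further non-retryable errors later without touching the loop.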