Add device-level failure #6514

Merged
merged 14 commits into from
Apr 9, 2025

yangw-dev commented Apr 8, 2025

Add a device-level failure indicator to show when a commit has failures.

vercel: demo

If the base commit has failures:

  • failure_report column: indicates the failure on the base commit
  • for models: render Failure -> right-side value

If the new commit has failures:

  • failure_report column: indicates the failure on the new commit
  • for models: render left-side value -> right side

If lcommit === rcommit:

  • failure_report column: indicates the failure
  • for models: render Failure
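
A rough sketch of the rendering rules above, in TypeScript. The `SideResult` and `CellView` shapes and the `renderCell` name are hypothetical and are not taken from the torchci code. It also assumes that a failed side is shown as the literal text Failure on whichever side failed, which the description states explicitly only for the base side.

```typescript
// Hypothetical shapes for illustration; not the actual torchci types.
interface SideResult {
  commit: string; // commit SHA for this side of the comparison
  value: number; // benchmark value reported for this side
  failed: boolean; // true if a device-level failure was recorded
}

interface CellView {
  text: string; // what the model cell displays
  failureNote: string | null; // text for the failure report, if any
}

const FAILURE = "Failure";

function renderCell(base: SideResult, next: SideResult): CellView {
  // Assumption: a failed side is displayed as "Failure" instead of its value.
  const show = (s: SideResult): string =>
    s.failed ? FAILURE : String(s.value);

  // lcommit === rcommit: a single value (or a single Failure) is shown.
  if (base.commit === next.commit) {
    return {
      text: show(base),
      failureNote: base.failed ? `failure on ${base.commit}` : null,
    };
  }

  // Different commits: left side is the base commit, right side the new one.
  const notes: string[] = [];
  if (base.failed) notes.push(`failure on base commit ${base.commit}`);
  if (next.failed) notes.push(`failure on new commit ${next.commit}`);

  return {
    text: `${show(base)} -> ${show(next)}`,
    failureNote: notes.length > 0 ? notes.join("; ") : null,
  };
}
```

Keeping the per-side decision in one small helper (`show`) means the three cases differ only in how the two sides are combined.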

UI Screenshot:
image

UI demo with tooltip
image

Next step:
add job-level failure

vercel bot commented Apr 8, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name     Status              Preview        Updated (UTC)
torchci  ✅ Ready (Inspect)  Visit Preview  Apr 8, 2025 10:09pm

facebook-github-bot added the CLA Signed label Apr 8, 2025
yangw-dev requested a review from huydhn April 8, 2025 03:35
yangw-dev marked this pull request as ready for review April 8, 2025 03:35
yangw-dev changed the title from "Ad failure" to "Add device-level failure" Apr 8, 2025
yangw-dev requested a review from guangy10 April 8, 2025 04:06

huydhn commented Apr 8, 2025

IMO, a new FAILURE_REPORT column is not worth the space it costs, because the column is supposed to be empty most of the time. Is it possible to include this information in an existing column? For example, in this dashboard, put a ⚠ in the model name column, like ⚠ meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8, and put the failure report in the title, as in https://stackoverflow.com/questions/28492167/display-text-under-cursor. What do you think?

yangw-dev (Contributor, Author) replied:

> IMO, a new FAILURE_REPORT column is not worth the space it costs, because the column is supposed to be empty most of the time. Is it possible to include this information in an existing column? For example, in this dashboard, put a ⚠ in the model name column, like ⚠ meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8, and put the failure report in the title, as in https://stackoverflow.com/questions/28492167/display-text-under-cursor. What do you think?

Yeah, I think it's fine to remove the FAILURE_REPORT column; I'll simply add an icon with a tooltip instead.
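
For illustration, the icon-plus-tooltip idea could look roughly like the sketch below. The `ModelLabel` shape and `modelLabel` function are hypothetical names, not the merged implementation. The `title` field is meant to be passed to the element's HTML `title` attribute, which browsers display as hover text.

```typescript
// Hypothetical helper for illustration; not the merged torchci code.
interface ModelLabel {
  text: string; // what the model name cell displays
  title?: string; // hover text, intended for the HTML `title` attribute
}

function modelLabel(name: string, failureReport: string | null): ModelLabel {
  // No failure: show the plain model name and no tooltip.
  if (!failureReport) {
    return { text: name };
  }
  // Failure: prefix a warning icon and expose the report as hover text.
  return { text: `⚠ ${name}`, title: failureReport };
}
```

With this approach, rows without failures take no extra space, and the report is visible only when hovering over a flagged model name.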

yangw-dev merged commit e409582 into main Apr 9, 2025
2 checks passed