Conversation

vfdev-5
Contributor

@vfdev-5 vfdev-5 commented Jul 24, 2025

This is a draft of a multi-host tutorial, based on this gist: https://gist.github.com/vfdev-5/70f695e462443685a0922e79ce0ee899 and Chris Jones' mnist_xla.py code.

cc @melissawm

@vfdev-5 vfdev-5 force-pushed the docs-learn-multi-host-tpu branch from 4527a9f to 8c0d217 Compare July 24, 2025 09:54
Contributor

@melissawm melissawm left a comment

Thank you @vfdev-5! A few very straightforward comments and one question (should we use TensorBoard or XProf for profiling?)

0 Training finished!
```

#### Profiler logs in TensorBoard
Contributor

I believe we want to use XProf instead of TensorBoard, but we should confirm.

@melissawm
Contributor

Hello @pgmoka @bhavya01 - would you mind taking a look for correctness and scope of this tutorial? If you are happy with the general idea, we can remove this from draft and address any other feedback. Thank you!

@melissawm
Contributor

Hi folks - gentle ping. If you have any feedback, we're happy to address it. Thanks!

@vfdev-5 vfdev-5 force-pushed the docs-learn-multi-host-tpu branch from 8c0d217 to aac84ec Compare August 28, 2025 08:18
@vfdev-5 vfdev-5 force-pushed the docs-learn-multi-host-tpu branch from aac84ec to 9f8c996 Compare August 28, 2025 08:21
@vfdev-5 vfdev-5 marked this pull request as ready for review August 28, 2025 08:21
2 participants