Performance regression (10-15%) in LLAMA 8B and BERT training with torch-xla v2.8.0 compared to v2.8.0-rc3 #9605

@rajkthakur

Description

🐛 Bug

We are seeing roughly a 10-15% performance reduction for LLAMA 8B and BERT training when moving from torch-xla v2.8.0-rc3 to torch-xla v2.8.0. The regression was bisected to the commit range v2.8.0-rc3...v2.8.0, specifically #9547. Rebuilding the wheels with change ad76b20 reverted restores the original performance.
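For reference, a sketch of the revert-and-rebuild workflow used to confirm the culprit commit. The checkout ref, build command, and wheel path are assumptions, not the exact commands we ran:

```shell
# Sketch only: exact build invocation may differ per environment.
git clone --recursive https://github.com/pytorch/xla.git
cd xla
git checkout v2.8.0             # assumed tag for the released wheel
git revert --no-edit ad76b20    # revert the suspect change from #9547
python setup.py bdist_wheel     # rebuild the torch-xla wheel
pip install dist/torch_xla-*.whl --force-reinstall
```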

Note: This is a separate issue that we observed after resolving the logging issue #9569.

To Reproduce

Steps to reproduce the behavior:

  1. Install the latest Neuron torch-neuronx + torch-xla + torch + torchvision, then replace torch-xla and torch with the 2.8.0 releases
  2. Run https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/bert.html#hf-bert-pretraining-tutorial
  3. Compare performance against a run with v2.8.0-rc3
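The comparison in step 3 can be expressed as a relative slowdown against the rc3 baseline. A minimal sketch, with hypothetical throughput numbers purely for illustration:

```python
def regression_pct(baseline_throughput: float, new_throughput: float) -> float:
    """Percent slowdown of the new run relative to the baseline.

    Positive values indicate a regression (new run is slower).
    """
    return (baseline_throughput - new_throughput) / baseline_throughput * 100.0


# Hypothetical numbers: an rc3 run at 100 seq/s vs a v2.8.0 run at 88 seq/s
print(round(regression_pct(100.0, 88.0), 1))  # → 12.0, within the reported 10-15% band
```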

Expected behavior

Performance on par with torch-xla v2.8.0-rc3 on all models.

Environment

  • Reproducible on XLA backend [CPU/TPU/CUDA]: Neuron
  • torch_xla version: 2.8.0
