Skip to content

Enable Optimizer Storing & Fix incomplete updates to Sharded EBC attributes in resharding #2911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

aporialiao
Copy link
Member

Summary:
Previously the dynamic sharding unit test was incomplete in truly verifying that a resharded EBC has all the attributes updated correctly. I ran into these issues when trying to enable optimizer state storing and DMP interface in D73049934

Main changes:

  1. Add in dynamic sharding unit test's are_sharded_ebc_modules_identical the private attributes for ShardedEmbeddingCollection. This method will only compare primitive types or primitive reference types and tensors, i.e. Class types will be skipped.

    1. This helped identify the gaps in current DS implementation - namely module_sharding_plan, _embedding_dims, _uncombined_embedding_names, _uncombined_embedding_dims not being updated correctly to reflect the new shard placements & order
  2. Add in updates to module_sharding_plan, _embedding_dims, _uncombined_embedding_names, _uncombined_embedding_dims in reshard API for Sharded EBC.

  3. Add in call to update Optimizer. The diff splits are not ideal, but the full optimizer unit test will be added in D73049934

Differential Revision: D73530909

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 23, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73530909

aporialiao added a commit to aporialiao/torchrec that referenced this pull request Apr 23, 2025
…ibutes in resharding (pytorch#2911)

Summary:

Previously the dynamic sharding unit test was incomplete in truly verifying that a resharded EBC has all the attributes updated correctly. I ran into these issues when trying to enable optimizer state storing and DMP interface in D73049934

Main changes:
1. Add in dynamic sharding unit test's `are_sharded_ebc_modules_identical` the private attributes for ShardedEmbeddingCollection. This method will only compare primitive types or primitive reference types and tensors
     1. This helped identify the gaps in current DS implementation - namely `module_sharding_plan`, `_embedding_dims`, `_uncombined_embedding_names`, `_uncombined_embedding_dims` not being updated correctly to reflect the new shard placements & order

2. Add in updates to `module_sharding_plan`, `_embedding_dims`, `_uncombined_embedding_names`, `_uncombined_embedding_dims` in reshard API for Sharded EBC.

3. Add in call to update Optimizer. The diff splits are not ideal, but the full optimizer unit test will be added in D73049934

Differential Revision: D73530909
aporialiao added a commit to aporialiao/torchrec that referenced this pull request Apr 23, 2025
…ibutes in resharding (pytorch#2911)

Summary:

Previously the dynamic sharding unit test was incomplete in truly verifying that a resharded EBC has all the attributes updated correctly. I ran into these issues when trying to enable optimizer state storing and DMP interface in D73049934

Main changes:
1. Add in dynamic sharding unit test's `are_sharded_ebc_modules_identical` the private attributes for ShardedEmbeddingCollection. This method will only compare primitive types or primitive reference types and tensors
     1. This helped identify the gaps in current DS implementation - namely `module_sharding_plan`, `_embedding_dims`, `_uncombined_embedding_names`, `_uncombined_embedding_dims` not being updated correctly to reflect the new shard placements & order

2. Add in updates to `module_sharding_plan`, `_embedding_dims`, `_uncombined_embedding_names`, `_uncombined_embedding_dims` in reshard API for Sharded EBC.

3. Add in call to update Optimizer. The diff splits are not ideal, but the full optimizer unit test will be added in D73049934

Differential Revision: D73530909
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73530909

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73530909

…ibutes in resharding (pytorch#2911)

Summary:
Pull Request resolved: pytorch#2911

Previously the dynamic sharding unit test was incomplete in truly verifying that a resharded EBC has all the attributes updated correctly. I ran into these issues when trying to enable optimizer state storing and DMP interface in D73049934

Main changes:
1. Add in dynamic sharding unit test's `are_sharded_ebc_modules_identical` the private attributes for ShardedEmbeddingCollection. This method will only compare primitive types or primitive reference types and tensors
     1. This helped identify the gaps in current DS implementation - namely `module_sharding_plan`, `_embedding_dims`, `_uncombined_embedding_names`, `_uncombined_embedding_dims` not being updated correctly to reflect the new shard placements & order

2. Add in updates to `module_sharding_plan`, `_embedding_dims`, `_uncombined_embedding_names`, `_uncombined_embedding_dims` in reshard API for Sharded EBC.

3. Add in call to update Optimizer. The diff splits are not ideal, but the full optimizer unit test will be added in D73049934

Differential Revision: D73530909
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73530909

aporialiao added a commit to aporialiao/torchrec that referenced this pull request Apr 23, 2025
…ibutes in resharding (pytorch#2911)

Summary:

Previously the dynamic sharding unit test was incomplete in truly verifying that a resharded EBC has all the attributes updated correctly. I ran into these issues when trying to enable optimizer state storing and DMP interface in D73049934

Main changes:
1. Add in dynamic sharding unit test's `are_sharded_ebc_modules_identical` the private attributes for ShardedEmbeddingCollection. This method will only compare primitive types or primitive reference types and tensors
     1. This helped identify the gaps in current DS implementation - namely `module_sharding_plan`, `_embedding_dims`, `_uncombined_embedding_names`, `_uncombined_embedding_dims` not being updated correctly to reflect the new shard placements & order

2. Add in updates to `module_sharding_plan`, `_embedding_dims`, `_uncombined_embedding_names`, `_uncombined_embedding_dims` in reshard API for Sharded EBC.

3. Add in call to update Optimizer. The diff splits are not ideal, but the full optimizer unit test will be added in D73049934

Differential Revision: D73530909
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants