-
Notifications
You must be signed in to change notification settings - Fork 60
Expand distributed indexing, match numpy indexing scheme #938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…y slice-indexing. UNTESTED
…sition in the index_proxy
…ays (#937) * Create ci.yaml * Update ci.yaml * Update ci.yaml * Create CITATION.cff * Update CITATION.cff * Update ci.yaml different python and pytorch versions * Update ci.yaml * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Delete pre-commit.yml * Update ci.yaml * Update CITATION.cff * Update tutorial.ipynb delete example with different split axis * Delete logo_heAT.pdf Removal of old logo * ht.nonzero() returns tuple of 1-D arrays instead of n-D arrays * Updated documentation and Unit-tests * replace x.larray with local_x * Code fixes * Fix return type of nonzero function and gout value * Made sure DNDarray meta-data is available to the tuple members * Transpose before if-branching + adjustments to accomodate it * Fixed global shape assignment * Updated changelog Co-authored-by: mtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Coquelin <[email protected]> Co-authored-by: Markus Goetz <[email protected]> Co-authored-by: Claudia Comito <[email protected]>
…pe and new split axis
…oltz-analytics/heat into 914_adv-indexing-outshape-outsplit
|
Thank you for the PR! |
|
Thank you for the PR! |
|
Thank you for the PR! |
mrfh92
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approve just to let CI matrix run.
No further checking done (!)
|
Thank you for the PR! |
|
Thank you for the PR! |
|
Thank you for the PR! |
|
Thank you for the PR! |
|
Thank you for the PR! |
| @@ -879,6 +882,641 @@ def fill_diagonal(self, value: float) -> DNDarray: | |||
|
|
|||
| return self | |||
|
|
|||
| def __process_key( | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__process_key and __process_scalar_key do not use self, so they should not be declared inside the DNDArray.
Possibly move to indexing.py?
heat/core/dndarray.py
Outdated
| @@ -879,6 +882,641 @@ def fill_diagonal(self, value: float) -> DNDarray: | |||
|
|
|||
| return self | |||
|
|
|||
| def __process_key( | |||
| arr: DNDarray, | |||
| key: Union[Tuple[int, ...], List[int, ...]], | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to define a key type to make type definitions easier.
Index = int | slice | Ellipsis | None
Indexer = Index | tuple[Index, ...]And the apply it everywhere.
| def __getitem__(self, key: Union[int, Tuple[int, ...], List[int, ...]]) -> DNDarray: | ||
| def __process_key( | ||
| arr: "DNDarray", | ||
| key: tuple[int, ...] | list[int], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should define an index, indexer type for shorter type hints
Index = int | slice | Ellipsis | None
Indexer = Index | tuple[Index, ...]| key = kst + slices + kend | ||
| else: | ||
| key = key + [slice(None)] * (self.ndim - len(key)) | ||
| from .types import bool as ht_bool, uint8 as ht_uint8 # avoid circulars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need a better solution, not sure what performance impact this could have over the long run.
| for i in range(len(key[: self.split + 1])): | ||
| if self.__key_is_singular(key, i, self_proxy): | ||
| new_split = None if i == self.split else new_split - 1 | ||
| def _normalize_index_component(comp): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for the defining the function here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should be moved to the sanitation module.
| if isinstance(key, DNDarray): | ||
| key = _normalize_index_component(key) | ||
| elif isinstance(key, (list, tuple)): | ||
| key = type(key)(_normalize_index_component(k) for k in key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double check if key is a tuple, and the normalization function just returns the list/tuple in most cases. Logic could be simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key also might always be a tuple? Need to actually check the entries of the first element.
Description
DRAFT
Intro: main points from Numpy array indexing scheme
Heat: indexing massive, memory-distributed arrays
process-local indexing
distributed indexing
Memory footprint
Scaling behaviour
Issue/s resolved: #914 #918
Changes proposed:
Type of change
Memory requirements
Performance
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
yes / no
skip ci