-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Shorten text repr for DataTree
#10139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* only show max 12 children in text ``DataTree`` repr * use ``set_options(display_max_rows=12)`` to configure this setting * always include last item in ``DataTree`` * insert "..." before last item in ``DataTree``
Thank you for working on this @jsignell !
This seems good! But how much better is it than before?
This seems a little ugly from a UI perspective. |
Looks nice @jsignell! Some thoughts:
Same feelings and no strong opinion on this. I'm slightly leaning towards a separate option, though, in order to avoid constantly toggling the value of
Could we instead compute the location of the "..." placeholder so that it is more in the "middle", likely showing more than one trailing node? It is probably better from a UX perspective (and more consistent with other reprs such as Dataset and
I agree. Maybe adding "..." placeholder rows in the html repr too and reuse the same logic for computing their location?
Would it be easy to customize the number of items in the section title rather than adding a full note? E.g., something like:
or just
|
I find that in the text repr the "..." placeholder could be a little more visible. E.g, something like:
What I also find disturbing in this example is that I don't know how easy / hard it can be implemented, but it would be nice if we keep showing all hierarchical levels in the truncated repr, e.g.,
Or if possible truncate the tree at smart locations such that it doesn't "break" the hierarchy:
|
import numpy as np
import xarray as xr
number_of_files = 700
number_of_groups = 5
number_of_variables= 10
datasets = {}
for f in range(number_of_files):
for g in range(number_of_groups):
# Create random data
time = np.linspace(0, 50 + f, 1 + 1000 * g)
y = f * time + g
# Create dataset:
ds = xr.Dataset(
data_vars={
f"temperature_{g}{i}": ("time", y)
for i in range(number_of_variables // number_of_groups)
},
coords={"time": ("time", time)},
).chunk()
# Prepare for xr.DataTree:
name = f"file_{f}/group_{g}"
datasets[name] = ds
dt = xr.DataTree.from_dict(datasets) %timeit dt._repr_html_()
# 32.9 s ± 797 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) (main)
# 540 ms ± 3.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) (PR)
%timeit dt.__repr__()
# 2.47 s ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) (main)
# 7.55 ms ± 29.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each) (PR) Quite the improvement for me, thank you @jsignell ! |
Thank you all for looking this over! Yeah I should have made it more clear what the improvement is over main. Like @Illviljan mentioned it's ~300x faster for the text repr and ~60x faster for the html repr for this case. The difference in speed up might be partially due to the different logic. In the text repr I am limiting the display to 12 nodes and in the html I am limiting each node to 12 children . The computation time will depend on which of those choices we converge on. The node limit should be more of a fixed time and the children limit will depend more on the shape of the tree.
Yeah I agree that feels like the right call.
Yeah I like that idea.
👍
Yep I can change that.
I like that idea a lot. I'll try to get that idea working. |
* create new option to control max number of children * consolidate logic to counting number of children per node * show first n/2 children then ... then last n/2 children
xarray/core/options.py
Outdated
@@ -222,6 +226,8 @@ class set_options: | |||
* ``True`` : to always expand indexes | |||
* ``False`` : to always collapse indexes | |||
* ``default`` : to expand unless over a pre-defined limit (always collapse for html style) | |||
display_max_children : int, default: 6 | |||
Maximum number of children to display for each node in a DataTree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maximum number of children to display for each node in a DataTree | |
Maximum number of children to display for each node in a DataTree. |
xarray/core/formatting_html.py
Outdated
children_html = [] | ||
for i, (n, c) in enumerate(children.items()): | ||
if ( | ||
MAX_CHILDREN is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't MAX_CHILDREN
always an int
? It's defined like that in OPTIONS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was thinking originally that it should be nullable but I didn't see any other nullable options so then I was worried that that might be an antipattern. The case for null is to allow arbitrarily large trees, which obviously has performance implications that triggered this work in the first place. All that is to say: yes. It is always an int. I'll change the code to make that explicit.
Co-authored-by: Illviljan <[email protected]>
The mypy failures look like the same ones that are on main. I need to look into the doctests. |
@jsignell https://github.com/max-sixty/pytest-accept may help if you need to recreate a bunch of them |
Thank! The doctest failures didn't end up being related to this PR so I opened #10230 |
Demo
Here is what the repr actually looks like:
Here is the html repr:
Notes
This logic is configurable via
xr.set_options(display_max_rows=)
I decide to reuse that option rather than making a new one since it felt functionally kind of similar to limiting the number of variables or coords shown on a dataset (which is whatdisplay_max_rows
is otherwise used for). I can imagine making a newdisplay_max_children
setting though and I'd likely toggle the default 12 down to 6 or something.I wasn't quite sure what the right approach was so I tried two different ideas. It might make sense to consolidate on one approach though
In the text repr:
In the html repr:
Checklist
whats-new.rst