Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to adaptor backend #298

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

nathanjmcdougall
Copy link
Contributor

To set us up for #263 and #254.

At @machow's suggestion here: #263 (comment)
I definitely agree this is definitely a nicer way of doing things.

@@ -0,0 +1,21 @@
MIT License
Copy link
Contributor Author

@nathanjmcdougall nathanjmcdougall Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically complying with MIT requires distributing the license

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

Copy link
Contributor Author

@nathanjmcdougall nathanjmcdougall Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this probably needs to be moved out of tests because I'm not sure if that gets included in the wheel build (I think it doesn't).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests directory does get included in the wheel build! I'm not entirely sure where this file should be, but it might be easier to not vendor in this package, if possible 😄

from abc import abstractmethod
from typing import TYPE_CHECKING, Any, ClassVar, TypeAlias, overload

from typing_extensions import Self

This comment was marked as resolved.

@isabelizimm
Copy link
Collaborator

isabelizimm commented Aug 20, 2024

I would prefer to not vendor in all of databackend; @machow, wdyt of putting the latest GH release on PyPI? It looks like there is just v0.0.1 on PyPI due to some sort of auth-y error in CI releasing 0.0.2? We can maybe pair through this together?

@nathanjmcdougall nathanjmcdougall force-pushed the feature/move-to-adaptor-backend branch from 43ebacc to 549679a Compare August 20, 2024 22:20
@isabelizimm
Copy link
Collaborator

isabelizimm commented Aug 27, 2024

Okay, we are able to use databackend==0.0.3 now, rather than vendoring it all in 🎉

@nathanjmcdougall nathanjmcdougall force-pushed the feature/move-to-adaptor-backend branch from 90355be to fe6092f Compare August 27, 2024 22:40
@nathanjmcdougall
Copy link
Contributor Author

Cool, I've rebased away the commit with the vendored code and added the new dependency.

@isabelizimm isabelizimm requested a review from machow December 16, 2024 22:59
@isabelizimm
Copy link
Collaborator

isabelizimm commented Dec 16, 2024

paired with @machow, he'll be taking over this PR and associated ones (#254 and #263) for review/next steps!

@nathanjmcdougall
Copy link
Contributor Author

@machow hey, it would be good if you could review this PR sometime. No great rush but I'd really love to get #249.

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so sorry for the long wait 😓 -- this is looking really good! I think the 3 small things I thought could benefit from a little tweaking are..

  • It seems like _Adaptor._d should not be typed as a class variable
  • _Adaptor.write_json() is overriden in _DfAdaptor in a surprising way (it returns a string), in order to work for data previews. I wonder if the _DFAdaptor version could be renamed to .to_json() to reflect its doing a different job.
  • save_data() is fed an object pulled from _Adaptor._d but then re-creates the adaptor. Could we allow save_data() to take an adaptor, to prevent this round tripping?

Thanks again for all the work you put into this! If it's useful, I'm happy to pick this up and finish -- especially since so much time has passed, and I've picked up the context again / probably owe it to folks 😭

_DataFrame: TypeAlias = _PandasDataFrame


class _AbstractPandasFrame(AbstractBackend):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the file _adaptors.py starts with an underscore, it seems okay for the contents to not use an underscore (e.g. AbstractPandasFrame).

(though also totally okay to punt this, since it's all internal; especially if other PRs are building on this one)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with that.



class _Adaptor:
_d: ClassVar[Any]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the use of ClassVar right here? It seems like _d is not a class variable (it's set on the instance).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah also agreed. I can't recall why I did that, it might have been some misguided thoughts about pyright behaviour.

self._d = data

@overload
def write_json(self, file: str) -> None: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior was modified to sometimes return a string, which seems to violate command-query separation (maybe to reflect the implementation in the pandas adaptor?). Can we revert so that the adaptor refactors, but does not extend the original behavior?

edit: see comment in _DFAdaptor.write_json()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally, good idea.

def write_json(self, file: str) -> None: ...
@overload
def write_json(self, file: None) -> str: ...
def write_json(self, file: str | None = None) -> str | None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think I understand -- this looks like an override of the original .write_json() method, but its job seems different (it's used for the data preview). Its job looks more like the added .shape or .columns properties.

Can we do this?:

  • rename .write_json() here to something else (maybe .to_json())
  • change the parent annotation for write_json() to always return None (not sometimes a str)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See what I've done in this commit:

daa4239

pins/drivers.py Outdated
@@ -135,6 +129,8 @@ def save_data(
# as argument to board, and then type dispatchers for explicit cases
# of saving / loading objects different ways.

adaptor = _create_adaptor(obj)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we allow obj to be either _Adaptor | Any (the typing is redundant, but maybe a helpful signal)? Then, if obj is not an adaptor, this line could create one.

I'm guessing keeping the original save_data behavior is useful for testing, but it'd be nice not to have to roundtrip by calling it on save_data(_Adaptor._d, ...) in this board method:

https://github.com/rstudio/pins-python/pull/298/files#diff-36792b1eedbe5453d2c6b58286ab65eed0c9286e94e2695a736edff37164e885R757

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you've done that here:

d0fa9c9

I will:

  • Write a test case for this
  • Refactor to take advantage of this

@nathanjmcdougall
Copy link
Contributor Author

nathanjmcdougall commented Apr 1, 2025

I'll take a look at this later today - but I'm happy for you to pick it up if you prefer. Thank you for taking the time to revisit the project :)

@machow
Copy link
Collaborator

machow commented Apr 1, 2025

I can take a quick pass right now, since I've got the basic pins stuff in mind!

@machow
Copy link
Collaborator

machow commented Apr 1, 2025

Ah, we're both pushing to this PR -- I'll leave it to you, thanks for working on this! I noticed that a doctest test fails now, and seems to be related to a more recent release of fsspec.

I'm guessing the ffspec github backend doesn't like the two // in this path anymore: pins/tests/pins-compat/df_csv/20220214T163720Z-9bfad//df_csv.csv

If it's useful, we can always ignore that for now, and handle as a separate from this PR

@nathanjmcdougall
Copy link
Contributor Author

Sorry yeah unfortunately I think we both started working on it at the same time.

Sounds good - I'm in favour of fixing it in another PR first so after I've addressed the issues here I'll work on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants