
Handle nullable types and empty partitions before Dask-ML predict #799

Closed
wants to merge 9 commits into from
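The title names two preprocessing steps to apply before pandas partitions reach a NumPy-based estimator's `predict`:

- converting nullable extension dtypes (`Int64`, `Float64`, `boolean`), which carry `pd.NA`;
- skipping empty partitions.

Below is a minimal sketch of handling one partition, assuming a scikit-learn-style model and all-numeric feature columns. It is not the code from this PR. `RowSumModel` and `predict_partition` are hypothetical names used only for illustration, not dask-sql or Dask-ML APIs.

```python
import numpy as np
import pandas as pd


class RowSumModel:
    """Hypothetical stand-in for a fitted scikit-learn-style estimator.

    Like many NumPy-based estimators, it only accepts a plain float64
    ndarray, not pandas nullable extension arrays containing pd.NA.
    """

    def predict(self, X):
        if not (isinstance(X, np.ndarray) and X.dtype == np.float64):
            raise TypeError("expected a float64 ndarray")
        # Treat NaN as missing and sum the remaining features per row.
        return np.nansum(X, axis=1)


def predict_partition(model, part: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: run model.predict on one pandas partition."""
    # Empty partition: skip the model call and return an empty result,
    # since some estimators raise on zero-row input.
    if len(part) == 0:
        return np.array([], dtype="float64")
    # Nullable dtypes hold pd.NA, which NumPy-based estimators cannot
    # consume; convert each column to float64 with NaN for missing values
    # (assumes every feature column is numeric or boolean).
    X = np.column_stack(
        [part[col].to_numpy(dtype="float64", na_value=np.nan) for col in part.columns]
    )
    return model.predict(X)


df = pd.DataFrame(
    {
        "a": pd.array([1, None, 3], dtype="Int64"),
        "b": pd.array([0.5, 1.5, None], dtype="Float64"),
    }
)
model = RowSumModel()
print(predict_partition(model, df))            # [1.5 1.5 3. ]
print(predict_partition(model, df.iloc[0:0]))  # []
```

In a Dask pipeline, a per-partition helper like this would presumably be applied with something like `ddf.map_partitions(...)`. That wiring is an assumption and is not shown here.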

Conversation

sarahyurick
Collaborator

@VibhuJawa

Re-opening #783 here.

@codecov-commenter

codecov-commenter commented Sep 22, 2022

Codecov Report

Merging #799 (929538e) into main (94f96a0) will decrease coverage by 0.34%.
The diff coverage is 36.36%.

@@            Coverage Diff             @@
##             main     #799      +/-   ##
==========================================
- Coverage   74.88%   74.54%   -0.35%     
==========================================
  Files          71       71              
  Lines        3588     3630      +42     
  Branches      748      759      +11     
==========================================
+ Hits         2687     2706      +19     
- Misses        771      786      +15     
- Partials      130      138       +8     
Impacted Files Coverage Δ
dask_sql/physical/rel/custom/predict.py 56.06% <33.33%> (-33.23%) ⬇️
dask_sql/physical/rel/custom/create_model.py 82.53% <60.00%> (-2.21%) ⬇️
dask_sql/_version.py 34.00% <0.00%> (+1.44%) ⬆️


@sarahyurick sarahyurick marked this pull request as ready for review September 23, 2022 20:58
@sarahyurick sarahyurick changed the title [WIP] Handle nullable types and empty partitions before Dask-ML predict Handle nullable types and empty partitions before Dask-ML predict Sep 23, 2022
@@ -183,7 +184,16 @@ def convert(self, rel: "LogicalPlan", context: "dask_sql.Context") -> DataContai

delayed_model = [delayed(model.fit)(x_p, y_p) for x_p, y_p in zip(X_d, y_d)]
model = delayed_model[0].compute()
model = ParallelPostFit(estimator=model)
if "sklearn" in model_class:
output_meta = np.array([])
Collaborator Author

@VibhuJawa, I wanted to ask your opinion on this check.

Collaborator

I think we should handle this in the https://github.com/dask-contrib/dask-sql/pull/832/files PR and follow dask/dask-ml#912 for a clean fix. I don't think we should fix it here.

@sarahyurick
Collaborator Author

OK, I've moved these changes into #832.

@sarahyurick sarahyurick closed this Oct 5, 2022
@sarahyurick sarahyurick deleted the null_empty_predict branch May 26, 2023 22:13
4 participants