feat: add task phash #1199

Closed
wants to merge 13 commits into from

be-marc commented Nov 5, 2024

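library(mlr3)

# Instantiate the resampling on the spam task
task = tsk("spam")
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

# Reuse the same resampling on a different, smaller task
resample(tsk("pima"), lrn("classif.rpart"), resampling)

#> Error: DataBackend did not return the queried rows correctly: 3067 requested, 501 received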

This PR would prevent errors like the one above, where the resampling tries to access rows that are not present because it was instantiated on a larger task. With this change, benchmark() and resample() check whether the resampling was instantiated on the task it is applied to. The check uses a Task$phash that excludes the features.
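A rough sketch of this kind of check, using fields that already exist in mlr3 rather than the proposed Task$phash: Resampling$task_hash stores the hash of the task passed to $instantiate(), and Task$hash is the full task hash. The helper name resampling_matches_task is hypothetical and not part of mlr3 or of this PR. Because the full hash also depends on the selected features, this stand-in would likely reject a task whose features were subset after instantiation, which is presumably why the PR proposes a hash that excludes them.

```r
library(mlr3)

# Hypothetical helper (not part of mlr3): TRUE if the resampling is
# instantiated and was instantiated on exactly this task.
# Uses the full task hash as a stand-in for the proposed Task$phash.
resampling_matches_task = function(resampling, task) {
  resampling$is_instantiated &&
    identical(resampling$task_hash, task$hash)
}

task = tsk("spam")
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

# Same task the resampling was instantiated on
same_task = resampling_matches_task(resampling, task)

# Different task: the stored hash no longer matches
other_task = resampling_matches_task(resampling, tsk("pima"))

print(c(same_task = same_task, other_task = other_task))
```

A check like this could run before any rows are queried, so a mismatch is reported up front instead of surfacing as a DataBackend error.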

This would add overhead, especially in fselect, because of the additional hash operations.

be-marc marked this pull request as draft on November 5, 2024, 07:54

be-marc commented Nov 5, 2024

If we want this, we need to check #1198


be-marc commented Nov 5, 2024

The error message has been extended with the sentence "The resampling was probably instantiated on a different task". I will close this PR for now.

be-marc closed this on Nov 5, 2024