Converters for Resampling <-> data.table #1162

mllg · 2024-09-04T07:29:26Z

No description provided.

jemus42 · 2024-09-04T12:27:58Z

Observations from implementing this for my use case, where the output of as.data.table(Resampling) is stored as CSV:

I needed to make sure the data.table is keyed appropriately (data.table::fread(..., key = c("set", "iteration"))) for the join(?) to work without a warning (x[list("train"), list(ids = list(row_id)), by = "iteration"]$ids). Maybe asserting the keys would be useful?
assert_factor(x$set, any.missing = FALSE) might be a little strict. It should suffice for set to be a character vector that passes assert_set_equal(unique(x$set), c("train", "test")). After the detour through a CSV file, set could end up as either character or factor, I guess.
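
To make the two points above concrete, here is a minimal sketch using only data.table. The toy table and the relaxed check are illustrative assumptions, not the code in this PR; only the column names (set, iteration, row_id) and the extraction expression are taken from the comment.

```r
library(data.table)

# Hypothetical resampling table as it might come back from a CSV file:
# `set` is a plain character column and no key is set yet.
x = data.table(
  set = rep(c("train", "test"), each = 4),
  iteration = rep(c(1L, 1L, 2L, 2L), times = 2),
  row_id = c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L)
)

# Relaxed check: accept character or factor, only require the two known labels.
stopifnot(
  is.character(x$set) || is.factor(x$set),
  !anyNA(x$set),
  setequal(unique(as.character(x$set)), c("train", "test"))
)

# Set the key explicitly instead of relying on how the file was read.
setkeyv(x, c("set", "iteration"))

# Keyed subset on set == "train", then collect the row ids per iteration.
train_sets = x[list("train"), list(ids = list(row_id)), by = "iteration"]$ids
```

Setting the key inside the converter would make it independent of how the caller read the file, at the cost of reordering the caller's table by reference unless it is copied first.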

berndbischl · 2024-09-05T12:25:03Z

@mllg can you please describe what this is supposed to do or solve?

berndbischl · 2024-09-05T12:34:57Z

i mean, i can see what happens in the code. i think this is good, if we test and doc this well

berndbischl · 2024-09-05T12:37:19Z

i would add a small feature like this:
a) if we go from a Resampling (instantiated) to a table, this is easy.
b) if we read from a table, we go to a ResamplingCustom (if the user gives no extra type) and we perform some very basic validity checks
c) if the user provides the type (we should at least for now support holdout, subsampling, CV, repeated CV), we convert into that type (not custom) and perform a few more validity checks
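
A rough sketch of what case b) could look like, assuming a table with the columns set, iteration and row_id. The helper name table_to_custom and the specific checks are hypothetical and are not the function added in this PR; rsmp("custom") and its instantiate(task, train_sets, test_sets) method are the existing mlr3 API.

```r
library(mlr3)
library(data.table)

# Hypothetical helper, not part of mlr3: table -> ResamplingCustom.
table_to_custom = function(x, task) {
  x = as.data.table(x)

  # Very basic validity checks on structure and row ids.
  stopifnot(
    all(c("set", "iteration", "row_id") %in% names(x)),
    setequal(unique(as.character(x$set)), c("train", "test")),
    all(x$row_id %in% task$row_ids)
  )

  iters = sort(unique(x$iteration))
  train_sets = lapply(iters, function(it) x[set == "train" & iteration == it, row_id])
  test_sets = lapply(iters, function(it) x[set == "test" & iteration == it, row_id])

  # Every iteration needs a non-empty train set and a non-empty test set.
  stopifnot(all(lengths(train_sets) > 0), all(lengths(test_sets) > 0))

  rsmp("custom")$instantiate(task, train_sets = train_sets, test_sets = test_sets)
}

# Usage with a small hand-written table on the iris task.
task = tsk("iris")
x = data.table(
  set = rep(c("train", "test"), each = 4),
  iteration = rep(c(1L, 1L, 2L, 2L), times = 2),
  row_id = c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L)
)
r = table_to_custom(x, task)
```

Case c) would need additional type-specific checks on top of this (for example, that the test sets of a CV are disjoint and jointly cover all row ids), which are not sketched here.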

jemus42 · 2024-10-25T11:11:17Z

Motivation here was something I do for the survival benchmark:

Load tasks, then either

Instantiate resamplings on tasks and write resamplings to disk in a portable format for future reproducibility (Resampling -> data.table -> CSV)

or, if resamplings are already stored on disk:

Load resampling from CSV, convert to Resampling, and use those for the benchmark.
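
The two branches of this workflow can be sketched as a round trip. This is an illustration under assumptions, not the converters from this PR: the table is assembled by hand from the public train_set() / test_set() accessors, the file path is a temporary stand-in for a CSV tracked in git, and the loaded table is turned into a ResamplingCustom.

```r
library(mlr3)
library(data.table)

task = tsk("iris")
path = tempfile(fileext = ".csv")  # stand-in for a file tracked in git

# Branch 1: instantiate once and store the result.
set.seed(1)
r = rsmp("cv", folds = 3)$instantiate(task)

# Resampling -> data.table, assembled from the public accessors.
tab = rbindlist(lapply(seq_len(r$iters), function(it) {
  rbind(
    data.table(set = "train", iteration = it, row_id = r$train_set(it)),
    data.table(set = "test", iteration = it, row_id = r$test_set(it))
  )
}))
fwrite(tab, path)

# Branch 2: load the stored table and rebuild a resampling from it.
stored = fread(path, key = c("set", "iteration"))
train_sets = stored[list("train"), list(ids = list(row_id)), by = "iteration"]$ids
test_sets = stored[list("test"), list(ids = list(row_id)), by = "iteration"]$ids
r2 = rsmp("custom")$instantiate(task, train_sets = train_sets, test_sets = test_sets)
```

Note that the rebuilt object is a custom resampling, so information such as "this was a 3-fold CV" is only implicit in the stored splits.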

That way the stored resamplings are always "the truth". This is nice because I found that instantiating resamplings with a set seed in a loop over the tasks was not so smart: excluding a task changes the RNG state for all subsequent instantiations.
Also, if you git commit the CSVs of the resamplings, you immediately see in the diff if something unexpected happens here.
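
The RNG issue can be reproduced in base R without any mlr3 code. In this sketch, draw_splits and the task sizes are made up for illustration; sample() stands in for instantiating a resampling.

```r
# Hypothetical stand-in for "instantiate a resampling per task in a loop":
# seed once, then draw a random half of the row ids for each task in turn.
draw_splits = function(task_sizes, seed = 1) {
  set.seed(seed)
  lapply(task_sizes, function(n) sample(n, size = floor(n / 2)))
}

all_tasks = c(a = 10, b = 20, c = 30)

full = draw_splits(all_tasks)                    # tasks a, b, c
reduced = draw_splits(all_tasks[c("a", "c")])    # task b excluded

# Task a is drawn first in both runs, so its split is unchanged.
same_a = identical(full$a, reduced$a)

# Task c is drawn from a different RNG state once b is skipped, so its split
# will generally differ between the two runs.
same_c = identical(full$c, reduced$c)
```

Seeding per task (for example, from a hash of the task id) would also avoid the dependence on loop order, but storing the splits has the extra benefit that changes show up in the diff.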

sebffischer · 2024-12-20T11:36:30Z

@berndbischl @mllg What's the status here?

mllg added 2 commits September 4, 2024 09:29

Converters for Resampling <-> data.table

8d58c54

Merge branch 'main' into resampling_converters

52101aa

be-marc self-assigned this Dec 20, 2024
