Closed
Description
Hello,
I'm experiencing an issue with the load distribution of parallel jobs executed with future_map. In particular, the first time I call future_map, the work runs on only 2-3 workers, while on subsequent runs of the same call it is shared evenly across all workers.
I tried to narrow it down into a reprex:
library(future)
library(tictoc)
plan(multisession, workers = 10)
tic()
res <- purrr::map(
  .x = 1:1e6, .f = ~ .x + 1
)
toc()
# 1.92 sec, not in parallel
tic()
furrr::future_map(
  .x = 1:1e6, .f = ~ .x + 1
)
toc()
# 3.462 sec on first run
microbenchmark::microbenchmark(
  {
    furrr::future_map(
      .x = 1:1e6, .f = ~ .x + 1
    )
  },
  times = 20
)
# 1.2 secs on average on consecutive runs
In my "real-world" applications, where a considerable amount of data also has to be passed to the workers, the imbalance tends to be more pronounced.
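One way to check how the elements are actually spread across the workers, rather than inferring it from CPU usage, is to record the ID of the process that handles each element. This is only a minimal sketch: the input size of 1000 and the names `pids` and `counts` are chosen for illustration and are not part of the reprex above.

```r
library(future)
library(furrr)

# Same backend as in the reprex above
plan(multisession, workers = 10)

# For each element, return the process ID of the R session that handled it
pids <- future_map_int(1:1000, ~ Sys.getpid())

# Number of elements handled per worker process
counts <- table(pids)
print(counts)

# Shut the background workers down again
plan(sequential)
```

Running this once in a fresh session and then a second time should show whether the first call really uses fewer distinct processes than later calls.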
This might be related to this previous issue:
#3
I'm working on the following system:
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Any thoughts appreciated.
Best,
Maximilian