
Unbalanced load distribution on first run #247

@mfakaehler


Hello,

I'm experiencing an issue with the load distribution of parallel jobs executed with future_map. In particular, the first time I call future_map, the work runs on only 2-3 of the workers, while on subsequent runs of the same call it is shared evenly across all of them.
I tried to narrow it down to a reprex:

library(future)
library(tictoc)

plan(multisession, workers = 10)

tic()
res <- purrr::map(
  .x = 1:1e6, .f = ~.x +1
)
toc()
# 1.92 sec, sequential (not in parallel)

tic()
furrr::future_map(
  .x = 1:1e6, .f = ~.x +1
)
toc()
# 3.462 sec on first run

microbenchmark::microbenchmark(
  {
    furrr::future_map(
      .x = 1:1e6, .f = ~.x +1
    )
  },
  times = 20
)
# 1.2 sec on average over consecutive runs

In my real-world applications, where a considerable amount of data also has to be passed to the workers, the effect tends to be more pronounced.

This might be related to an earlier issue:
#3
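One way to narrow this down further, assuming part of the first-call cost is one-off start-up (launching the background R sessions and loading furrr/purrr on them) rather than the way the work is chunked, is to run a trivial warm-up call before timing the real one. This is a hypothetical diagnostic sketch, not a known fix; it uses 2 workers instead of 10 to stay light, and the names warm, timing and res are illustrative.

```r
library(future)
library(furrr)

# Assumption: 2 workers keeps the sketch light; the reprex above used 10.
plan(multisession, workers = 2)

# Hypothetical warm-up step: one trivial task per worker, so that the
# background sessions are running and have loaded furrr/purrr before
# anything is timed. Each task returns the process ID of its worker.
warm <- future_map(seq_len(nbrOfWorkers()), ~ Sys.getpid())

# Count how many warm-up tasks each worker process handled.
print(table(unlist(warm)))

# Time the same call as in the reprex, now after the warm-up.
timing <- system.time(
  res <- future_map(1:1e6, ~ .x + 1)
)
print(timing[["elapsed"]])

# Shut down the background workers.
plan(sequential)
```

To compare directly with the first-run and steady-state timings above, set workers = 10 as in the reprex. If the timed call after the warm-up is close to the steady-state figure, that would point towards start-up cost rather than chunking as the source of the first-run slowdown.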

I'm working on the following system:

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
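The system details above give the R and OS versions but not the package versions, which are often needed to reproduce scheduling behaviour. A minimal sketch for collecting them, assuming the four packages listed are installed (parallelly is included because recent versions of future use it to launch multisession workers):

```r
# Packages whose versions are relevant to this report
# (assumption: all of them are installed).
pkgs <- c("future", "furrr", "parallelly", "purrr")

# Named character vector: package name -> installed version.
versions <- vapply(
  pkgs,
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)
print(versions)
```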

Any thoughts appreciated.

Best,
Maximilian
