Performance regression in future/globals when finding globals in a list of data frames #303

@DavisVaughan

CC @HenrikBengtsson

We've gotten a few reports of performance issues appearing in furrr over the years (#267, #268).

Nothing has changed in furrr, so I suspect that something changed in future or globals.

Here's a furrr example of a suspected regression, run with CRAN future and with an older release:

library(future)
library(furrr)

# 50,000 instances of `mtcars`
xs <- rep(list(mtcars), 50000)

# Two workers
plan(multisession, workers = 2)

# Should send 25,000 to each worker

# CRAN future 1.69.0
system.time({
  future_map(xs, identity)
})
#>    user  system elapsed 
#>   5.420   0.336   5.985

# Old future 1.33.2
system.time({
  future_map(xs, identity)
})
#>    user  system elapsed 
#>   2.153   0.237   2.739

Yes, there's a lot of data shuffling going on here, but users do this quite a bit anyway (despite our warnings against it), and it has generally been much faster than this.

It also gets much worse if you add tibbles into the mix.

library(future)
library(furrr)
library(tibble)

# 50,000 instances of `mtcars` but as a tibble
xs <- rep(list(as_tibble(mtcars)), 50000)

# Two workers
plan(multisession, workers = 2)

# Should send 25,000 to each worker

# CRAN future 1.69.0
system.time({
  future_map(xs, identity)
})
#>    user  system elapsed 
#>  10.451   0.523  11.175

# Old future 1.33.2
system.time({
  future_map(xs, identity)
})
#>    user  system elapsed 
#>   3.761   0.302   4.297

Ironically, if we naively try to strip out furrr, then CRAN future is actually faster (about 1 second elapsed versus about 3 seconds with the old version).

So this is probably going to require some investigation to see if we can create a pure future call that shows the slowdown.

library(future)

x1 <- rep(list(mtcars), 25000)
x2 <- rep(list(mtcars), 25000)

# Two workers
plan(multisession, workers = 2)

# CRAN future 1.69.0
system.time({
  f1 <- future(
    lapply(x, FUN = identity),
    globals = as.FutureGlobals(list(x = x1))
  )
  f2 <- future(
    lapply(x, FUN = identity),
    globals = as.FutureGlobals(list(x = x2))
  )
  value(list(f1, f2))
})
#>    user  system elapsed 
#>   0.547   0.216   0.999

# Old future 1.33.2
system.time({
  f1 <- future(
    lapply(x, FUN = identity),
    globals = as.FutureGlobals(list(x = x1))
  )
  f2 <- future(
    lapply(x, FUN = identity),
    globals = as.FutureGlobals(list(x = x2))
  )
  value(list(f1, f2))
})
#>    user  system elapsed 
#>   2.485   0.245   3.009
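
One possible way to narrow this down is to profile the furrr call in the main R process and compare the hot spots across the two future versions. The following is only a sketch: it assumes the extra time is spent in the main process (for example while collecting or scanning globals) rather than on the workers, which `Rprof()` does not sample. The smaller input size and the `prof_file` name are choices made for this illustration and are not part of the timings above.

```r
library(future)
library(furrr)

# Smaller input than in the timings above so the profile finishes quickly
# (assumption: the slowdown is still visible at this size)
xs <- rep(list(mtcars), 5000)

plan(multisession, workers = 2)

# Sample the call stack of the main R process while future_map() runs
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
res <- future_map(xs, identity)
Rprof(NULL)

# Functions ranked by total time, including time spent in their callees
prof <- summaryRprof(prof_file)
head(prof$by.total, 20)

# Shut down the workers again
plan(sequential)
```

Running this once per installed future version and comparing the top entries of `prof$by.total` might show which functions account for the difference.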
