Description
Dear furrr developer,
Thank you for maintaining this package. I recently tried it and found that future_map shows no obvious speedup over map in a simple linear regression setting. Below is a toy example that records how the running time changes with the number of workers.
benchmark
library(dplyr); library(furrr); library(purrr); library(tidyr)
# Create a large dataset: 50,000 copies of mtcars, one ID per copy
Data <- as_tibble(mtcars)
Data <- vctrs::vec_rep(Data, 50000)
Data$ID <- vctrs::vec_rep_each(1:50000, nrow(mtcars))
NestedData <- Data %>%
  nest(.by = ID)
# Elapsed time in seconds for each worker count
map_vec <- vector(mode = "numeric", length = 10)
future_map_vec <- vector(mode = "numeric", length = 10)
for (i in seq(10)) {
  future::plan(multisession, workers = i)
  stamp1 <- Sys.time()
  xx <- mutate(NestedData, data2 = map(data, identity))
  # Fix the units to seconds; a bare difftime may switch to minutes
  map_vec[i] <- as.numeric(difftime(Sys.time(), stamp1, units = "secs"))
  stamp2 <- Sys.time()
  xx <- mutate(NestedData, data2 = future_map(data, identity))
  future_map_vec[i] <- as.numeric(difftime(Sys.time(), stamp2, units = "secs"))
}
plots
I also noticed that, although the number of workers is set, htop did not show that many CPUs in use. I am not familiar with the details of the future_map implementation, but the low CPU utilization makes me wonder whether the slowness is due to an I/O bottleneck. If so, there may be room for improvement (for example, by avoiding unnecessary file creation, copying, or writing).
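One rough way to probe the data-movement suspicion is to measure how many bytes R has to serialize before the list-column can reach another R process. The sketch below is only an illustration: it uses a smaller copy of the benchmark data (500 groups instead of 50,000), the names `small`, `nested`, and `payload_bytes` are hypothetical, and it assumes the transfer cost is what matters; it says nothing about how furrr chunks or ships the data internally.

```r
library(tibble)
library(tidyr)

# Smaller version of the benchmark data (500 groups instead of 50,000)
small <- as_tibble(mtcars)
small <- vctrs::vec_rep(small, 500)
small$ID <- vctrs::vec_rep_each(1:500, nrow(mtcars))
nested <- nest(small, .by = ID)

# Size in bytes of the serialized list-column, i.e. what would have to be
# copied to another R process before any work can start there
payload_bytes <- length(serialize(nested$data, connection = NULL))
payload_bytes
```

Comparing this number with the time `identity` needs per element gives a feel for whether copying or computing dominates.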
Given that future_map's performance was unsatisfying in this attempt, could you share some guidance on the scenarios in which future_map gives a significant speedup?
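For contrast with the `identity` benchmark, here is a minimal sketch of the kind of workload where I would expect parallelism to pay off: each element takes noticeable time, while the data sent to the workers is tiny. `slow_task` is a hypothetical stand-in in which `Sys.sleep()` plays the role of real per-element computation, and the worker count of 2 is arbitrary.

```r
library(future)
library(furrr)
library(purrr)

# Hypothetical slow task: the sleep stands in for real per-element work
slow_task <- function(x) {
  Sys.sleep(0.1)
  x
}

inputs <- 1:20

plan(multisession, workers = 2)

# Elapsed wall-clock time in seconds for each approach
t_seq <- system.time(res_seq <- map(inputs, slow_task))[["elapsed"]]
t_par <- system.time(res_par <- future_map(inputs, slow_task))[["elapsed"]]

plan(sequential)  # shut down the background workers

c(sequential = t_seq, parallel = t_par)
```

Both calls return the same list; only the elapsed times should differ, and by how much depends on the machine and on worker startup cost.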
This might be a duplicate of issues #41, #234, and #252.
Thanks!
session info
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] shiny_1.7.5.1 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.0 readr_2.1.4 tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0 tidyr_1.3.0
[10] purrr_1.0.2 furrr_0.3.1 future_1.33.0 dplyr_1.1.3 