Understanding memory usage and performance of furrr::future_apply #260

@alessandro-peron-sdg

Description

I am facing some issues parallelizing work with furrr::future_walk().

This is the setting I am having issues with:

    rm(list = ls(all = TRUE))

    library(future)
    library(furrr)
    library(dplyr)
    library(readr)
    library(parallel)
    set.seed(123)

    # fake data: 1,000,000 numeric vectors of length 1000
    my_list <- replicate(1000000, rnorm(1000), simplify = FALSE)

    # function to parallelize
    f_to_parallelize <- function(x) {
      sum(x)
    }

    # plans to test (uncomment one at a time)
    plan(sequential)
    # plan(multisession, workers = 2)
    # plan(multisession, workers = 6)
    # plan(multisession, workers = 15)

    # future_walk() is called for side effects and returns its input
    # invisibly, so there is no need to capture the result
    future_walk(my_list, f_to_parallelize)
    
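For scale, a back-of-the-envelope figure of mine (not from the profiling run): the list holds 1e6 double vectors of length 1000, so the raw data alone is roughly 7.5 GB before R's per-vector header overhead, and under multisession each chunk handed to a worker is serialized and copied into that worker's process:

```r
# Approximate raw size of my_list: 1e6 vectors x 1000 doubles x 8 bytes each
n_vec   <- 1e6
vec_len <- 1000
approx_gb <- n_vec * vec_len * 8 / 1024^3
approx_gb  # ~7.45 GB, excluding R's per-vector header overhead
```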

When I profile memory and time for these four plans, this is what I get:

[plot mem_prof_sum: memory usage over time for the four plans]

I launched four different jobs from RStudio Server while, in a separate job, I profiled the total memory used by my user's processes to produce the graph.
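As a cheaper in-session check, one could also use base R's gc() with reset = TRUE (a sketch of mine; the allocation here is a stand-in for the future_walk() call). Note this only sees the main R session, not the worker processes, so external per-user profiling is still needed under multisession:

```r
gc(reset = TRUE)   # reset the "max used" counters before the call of interest
x <- replicate(100, rnorm(10000), simplify = FALSE)  # stand-in for future_walk()
peak <- gc()       # the "max used" column now reports this session's peak
peak
```

Wrapping the real future_walk() call the same way would show how much of the growth happens in the parent session alone.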

This is the output of sessionInfo() for the parallelization jobs:

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] readr_2.1.2 dplyr_1.1.0 furrr_0.2.3 future_1.24.0
loaded via a namespace (and not attached):
[1] rstudioapi_0.13 parallelly_1.30.0 magrittr_2.0.2 hms_1.1.1
[5] tidyselect_1.2.0 R6_2.5.1 rlang_1.1.1 fansi_1.0.2
[9] globals_0.14.0 tools_4.2.2 utf8_1.2.2 cli_3.6.0
[13] ellipsis_0.3.2 digest_0.6.29 tibble_3.1.6 lifecycle_1.0.3
[17] crayon_1.5.0 tzdb_0.2.0 purrr_1.0.1 vctrs_0.5.2
[21] codetools_0.2-18 glue_1.6.2 compiler_4.2.2 pillar_1.7.0
[25] generics_0.1.2 listenv_0.8.0 pkgconfig_2.0.3

Is this behavior normal? I did not expect the steep increase in memory under all the plans, nor the increase in run time as I add workers.

I also tested Sys.sleep(1) in parallel and got the result I expected: time decreases as I increase the number of workers.

What I am actually trying to parallelize is far more complex than this: a series of nested functions that train time series models, run inference, and write a CSV, returning nothing.

I feel like I am missing something very simple but cannot wrap my head around it. What concerns me most is the memory increase, since the real function is very memory intensive.
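One knob that may be worth trying is furrr's chunking control (a sketch assuming furrr's documented furrr_options(); the small xs list is a stand-in for the real input). With scheduling = 1, furrr creates exactly one chunk per worker, so each element is serialized and shipped once; chunk_size is the complementary option if you would rather cap elements per chunk. With a function as cheap as sum(), serialization cost dominates the actual work, which would also explain run time growing with the number of workers:

```r
library(future)
library(furrr)

plan(multisession, workers = 2)

xs <- replicate(100, rnorm(1000), simplify = FALSE)

# scheduling = 1: one future (one chunk) per worker, minimizing the number
# of serialization round trips at the cost of coarser load balancing
opts <- furrr_options(scheduling = 1, seed = TRUE)
future_walk(xs, sum, .options = opts)

plan(sequential)  # shut the workers down when done
```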
