Lots of backward-compatible improvements and extensions lie ahead. Below are some of them:
Support for controlling R options and environment variables via future() [future#480]
Termination of futures, e.g. suspend(f). Since not all backends can support termination, the design should ensure that failed termination attempts do not affect the workflow or its results [future#93, future#FR213, future.batchtools#FR27, parallelly#33]
Restarting of failed or terminated futures, e.g. reset(f) and v <- value(f). Also, automatic restarting of futures when a parallel worker dies [future#FR188, future#205, future.callr#FR11, parallelly#32]
Troubleshooting: Improve debugging of futures that produce errors, e.g. capture call frames in parallel workers and let the user inspect them via tools such as utils::browser() and utils::recover() [future#253]
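As a point of reference for the troubleshooting item, here is a minimal sketch of how errors from a future surface today: the error is relayed when value() is called and can be caught with tryCatch(). The sequential backend and the "boom" message are chosen only for illustration.

```r
library(future)

# Illustration only: the sequential backend keeps the example deterministic
plan(sequential)

# A future whose expression fails
f <- future({
  stop("boom")
})

# The error is re-signaled when the value is collected
res <- tryCatch(value(f), error = function(e) e)

# The condition object and its message are available to the caller
print(class(res))
print(conditionMessage(res))
```

The condition object reaches the caller, but this sketch does not provide interactive access to the worker's call frames, which is what the item above aims to improve.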
Performance: Add built-in speed and memory profiling tools to help developers and users identify bottlenecks and improve overall performance [future#59, future#142, future#FR437]
Marshaling: Make it possible to parallelize some of the object types that are currently “non-exportable”, e.g. objects from packages such as DBI, keras, ncdf4, rstan, sparklyr, xgboost, and xml2 [Project: marshal and R Consortium ISC Marshaling and Serialization in R Working Group]
Resources: Make it possible to declare “resources” that a future requires in order to be resolved, e.g. minimum memory requirements, access to the local file system, a specific operating system, internet access, sandboxing, etc. This will also make it possible to work with semi-persistent “sticky” globals [doFuture#FR63, future#FR181, future#FR301, future#FR346, future#FR430, future#FR450, parallelly#18]
Multiple backends: Send different futures to different backends, based on which parallel backend can provide the requested resources
Scheduling: Add generic support for queueing such that futures can be queued on the local computer, on a remote scheduler, or in a peer-to-peer scheduler among collaborators [future#FR256, future.apply#FR63, future.batchtools#FR23]
Flattening nested parallelism: Process nested futures via a common future queue instead of via nested future plans [future#FR361]
Random number generation: Support for custom, alternative RNGs other than the built-in L’Ecuyer-CMRG algorithm.
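For context, a minimal sketch of how parallel RNG is requested today with future.apply, where the future.seed argument selects parallel-safe L'Ecuyer-CMRG streams. The seed value 42L and the two-worker multisession plan are arbitrary choices for illustration.

```r
library(future.apply)

# Illustration only: two local background R sessions
plan(multisession, workers = 2)

# future.seed = <integer> gives reproducible, parallel-safe random numbers
y1 <- future_sapply(1:4, function(i) rnorm(1), future.seed = 42L)
y2 <- future_sapply(1:4, function(i) rnorm(1), future.seed = 42L)

# Same seed, same results, regardless of how elements map to workers
print(identical(y1, y2))

plan(sequential)
```

A custom RNG, as proposed above, would presumably need an equivalent way to derive independent streams per element.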
Develop a future.mapreduce package to provide an essential, infrastructure-level Map-Reduce Future API, e.g. load balancing, chunking, parallel random number generation (RNG), and early stopping. Packages such as future.apply, furrr, and doFuture currently implement their own versions of these features. Consolidating these essentials into a shared utility package will guarantee consistent behavior across all of them, together with faster deployment of bug fixes and improvements [future.apply#20, future.apply#FR32, future.apply#FR44, future.apply#59, future.apply#FR60]
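To make "chunking" concrete, here is a hand-rolled sketch using only the core future API and parallel::splitIndices(). It is not how future.apply, furrr, or doFuture implement chunking internally; X, FUN, and the choice of two chunks are hypothetical.

```r
library(future)

# Illustration only: two local background R sessions
plan(multisession, workers = 2)

X <- 1:8
FUN <- function(x) x^2

# Split the element indices into two roughly equal chunks
chunks <- parallel::splitIndices(length(X), 2)

# One future per chunk instead of one future per element
fs <- lapply(chunks, function(idx) future(lapply(X[idx], FUN)))

# Collect the per-chunk results and flatten them back to one list
vs <- unlist(value(fs), recursive = FALSE)

print(identical(vs, lapply(X, FUN)))

plan(sequential)
```

Each map-reduce package currently has to carry logic of this kind on its own, which is the duplication a shared package would remove.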
Develop more efficient chunking of large objects in map-reduce calls. This can be achieved by introducing a mechanism or a syntax to declare that part of the code should be processed prior to being parallelized, e.g. further subsetting and validation of input data before distributing them to parallel workers
Early stopping, e.g. have future_lapply(X, FUN) terminate active futures as soon as one of the FUN(X[[i]]) calls produces an error [future#FR213, future.apply#FR75]
Automatic map-reduce: Support for merging of many futures into chunks of futures to be processed more efficiently with less total overhead. This could make fs <- lapply(X, function(x) future(FUN(x))) and vs <- value(fs) automatically as efficient as future_lapply(X, FUN), removing the need for custom implementations
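The two patterns compared in this item can be run side by side today. The sketch below assumes a two-worker multisession plan and uses a hypothetical X and FUN; it only shows that both forms return the same values, not how they differ in overhead.

```r
library(future.apply)  # also attaches the future package

# Illustration only: two local background R sessions
plan(multisession, workers = 2)

X <- 1:8
FUN <- function(x) x^2

# Pattern 1: one future per element, collected with value()
fs <- lapply(X, function(x) future(FUN(x)))
vs <- value(fs)

# Pattern 2: future_lapply(), which chunks elements across workers
ys <- future_lapply(X, FUN)

print(identical(vs, ys))

plan(sequential)
```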
Robustness of foreach: Implement bug fixes and outstanding foreach feature requests in the doFuture adapter until resolved upstream. For example, add withDoRNG() and %dofuture% [doFuture#61] (solved in doFuture 0.13.0 and doFuture 1.0.0)
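Since this item is marked as solved, a short sketch of the %dofuture% operator may help. It assumes doFuture 1.0.0 or later is installed; the loop body and the two-worker plan are illustrative.

```r
library(doFuture)  # also attaches foreach and future

# Illustration only: two local background R sessions
plan(multisession, workers = 2)

# %dofuture% runs each iteration via the current future plan,
# without needing registerDoFuture()
y <- foreach(x = 1:3, .combine = c) %dofuture% {
  sqrt(x)
}

print(y)

plan(sequential)
```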
Generic support for progress updates for common map-reduce functions, e.g. y <- lapply(X, slow_fcn) %progress% TRUE and y <- X |> future_map(slow_fcn) %progress% TRUE [progressr#FR85, progressr#113]
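For comparison with the proposed %progress% syntax, this is roughly how progress updates are wired up today with progressr: the caller creates a progressor and signals it explicitly inside the mapped function. slow_fcn and X are hypothetical stand-ins.

```r
library(progressr)
library(future.apply)

# Illustration only: two local background R sessions
plan(multisession, workers = 2)

X <- 1:10
slow_fcn <- function(x) {
  Sys.sleep(0.1)  # simulate slow work
  sqrt(x)
}

with_progress({
  p <- progressor(along = X)  # one progress step per element
  y <- future_lapply(X, function(x) {
    p()                       # signal one step of progress
    slow_fcn(x)
  })
})

plan(sequential)
```

The proposed syntax would remove the need to create and call the progressor by hand.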
Below is a list of future backends that are being worked on or are planned:
future.clustermq - a backend to resolve futures on an HPC cluster via the clustermq package [future#FR204, future#FR267]
future.redis - a backend to resolve futures via a Redis queue using the redux package [future#FR151]
future.rrq - a backend to resolve futures via a Redis queue using the rrq package [future#FR151]
future.aws.lambda - a backend to resolve futures via the Amazon AWS Lambda service. A group of several people has been actively working on this since the end of 2020 [future#FR423]
future.aws.batch - a backend to resolve futures via the Amazon AWS Batch service. We have a working group of several people actively working on this [future#FR423]
future.google.cloud.functions - a backend to resolve futures via the Google Cloud Functions service. Work on this will start when a stable future.aws.lambda prototype has been established
future.p2p - a backend to resolve futures via a peer-to-peer network of trusted collaborators
future.sandbox - a backend to resolve untrusted futures via locked-down, sandboxed Linux containers (Docker, Singularity, …) without access to the host's file system or network. This can already be partially achieved by low-level container setups via parallelly
future.sparklyr - a backend to resolve futures in Spark via the sparklyr package [future#FR286, sparklyr#FR1935]
Internationalization (i18n): Make all error and warning messages translatable via R’s gettext() framework. Invite the community to provide message translations.
Consider relicensing code to a permissive license, where possible