How to parallelize for loops in R


Updated Feb 26th, 2023.

Introduction to Parallel Computing in R

Parallel computing is a technique that enables us to tackle large computational tasks by dividing them into smaller, manageable subtasks and executing them simultaneously on different processors or on processor cores of a single CPU. In R, this can be especially beneficial, as complex computations can take hours, or even days, to complete.

The Benefits of Parallel For Loops in R

A traditional base R for loop runs its iterations one after another on a single core, so the remaining cores of the computer sit idle. A parallel for loop, on the other hand, is one in which the statements within the loop can be executed simultaneously on separate cores, processors, or threads. This works when the iterations are independent of each other, that is, when no iteration needs the result of another.

Libraries to Parallelize For Loops in R

library(parallel): support for parallel computation, including random-number generation.
library(foreach): a high-level interface for parallel computation as a series of iterations.
library(doParallel): a parallel backend for foreach.
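
As a minimal sketch of how these three packages fit together, the snippet below rewrites a sequential for loop as a foreach() loop. The function slow_square() and the choice of two workers are illustrative assumptions and are not part of the analysis described in this post.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Sequential version: one iteration after another on a single core
seq_result <- numeric(4)
for (i in 1:4) {
  seq_result[i] <- slow_square(i)
}

# Parallel version: iterations are distributed across two workers
cl <- makeCluster(2)
registerDoParallel(cl)
par_result <- foreach(i = 1:4, .combine = c) %dopar% {
  slow_square(i)
}
stopCluster(cl)
```

Both loops should produce the same values. The difference is that foreach() returns its results instead of filling a vector as a side effect, which is why .combine is used to collect them.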

Running a Parallel For Loop Calculation in R

I used the following code to set up a backend for parallel computing, to export the objects and reference classes needed for the computation to the worker nodes, and to execute the loop using foreach().

Set up the backend to use multiple cores for running a parallel for loop.

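library(parallel)
library(foreach)
library(doParallel)

# Count physical cores rather than logical (hyper-threaded) ones
totalCores <- detectCores(logical = FALSE)

if (.Platform$OS.type == "windows") {
  # Windows cannot fork processes, so use socket-based (PSOCK) workers
  cl <- makeCluster(max(1, totalCores %/% 2), type = "PSOCK")
} else {
  # Assign the cluster to cl here too, leaving one core free for the system
  cl <- makeCluster(max(1, totalCores - 1), type = "FORK")
}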

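On Windows the only available backend type is PSOCK, which starts each worker as a fresh R session. On UNIX-based systems FORK is usually more efficient, because the workers are forked copies of the main process and inherit its objects and loaded packages.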
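
The practical difference between the two backend types can be sketched as follows. The variable threshold is a hypothetical object defined only for this illustration, and the snippet assumes it is run at the top level of an R session.

```r
library(parallel)

threshold <- 10  # hypothetical object defined in the main session

# PSOCK workers are fresh R sessions, so 'threshold' is not defined there
cl <- makeCluster(2, type = "PSOCK")
psock_sees <- unlist(clusterEvalQ(cl, exists("threshold")))
stopCluster(cl)

# FORK workers are copies of the main process and inherit its objects
# (FORK is not available on Windows, so this part is skipped there)
if (.Platform$OS.type != "windows") {
  cl <- makeCluster(2, type = "FORK")
  fork_sees <- unlist(clusterEvalQ(cl, exists("threshold")))
  stopCluster(cl)
}
```

This is why objects have to be exported explicitly when a PSOCK cluster is used, while a FORK cluster can usually see them without any extra step.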

Export the objects and reference classes needed for computation to the worker nodes using clusterExport().

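# Copy the image data and the feature functions to every worker
clusterExport(cl, c("img3", "nuclei", "computeFeatures", "computeFeatures.moment", "computeFeatures.shape", "computeFeatures.basic", "computeFeatures.haralick"))
# Register the cluster as the backend used by %dopar%
registerDoParallel(cl)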
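
To illustrate what clusterExport() does, here is a small sketch with made-up objects. The names scale_factor and rescale are hypothetical stand-ins for the image data and feature functions used above, and the envir argument is set explicitly only so that the snippet also works when it is not run from the global environment.

```r
library(parallel)

# Hypothetical objects standing in for the data and the feature function
scale_factor <- 3
rescale <- function(x) x * scale_factor

cl <- makeCluster(2, type = "PSOCK")

# Copy both objects into the global environment of every worker
clusterExport(cl, c("scale_factor", "rescale"), envir = environment())

# Each worker can now call rescale(), which finds scale_factor locally
scaled <- parSapply(cl, 1:4, function(x) rescale(x))
stopCluster(cl)
```

On a PSOCK cluster, leaving out the clusterExport() call would typically cause the workers to report that rescale could not be found.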

Perform parallel for loop calculation in R using foreach().

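# One iteration per frame of img3; results are bound together column-wise
data <- foreach(i = seq_len(dim(img3)[3]), .packages = "parallel", .combine = cbind) %dopar% {
  computed_ft <- computeFeatures(nuclei, img3[,,i], xname = "Pt_", refnames = "_")
  cbind(computed_ft[,12])  # keep only the 12th feature column
}

# Shut down the workers once the computation has finished
stopCluster(cl)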
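
Because the loop above depends on image data that is not shown here, the following self-contained sketch reproduces the same pattern on a toy array. The array stack and the use of rowMeans() are assumptions made for illustration and do not correspond to the real feature computation.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical 3D array standing in for an image stack: 4 x 5 pixels, 3 frames
stack <- array(seq_len(4 * 5 * 3), dim = c(4, 5, 3))

cl <- makeCluster(2)
registerDoParallel(cl)

# One iteration per frame; each returns the row means of that frame,
# and .combine = cbind collects them as the columns of a matrix
frame_means <- foreach(i = seq_len(dim(stack)[3]), .combine = cbind) %dopar% {
  rowMeans(stack[, , i])
}

stopCluster(cl)
dim(frame_means)  # 4 rows (one per pixel row), 3 columns (one per frame)
```

Iterating over the third dimension keeps each task independent, so the frames can be processed in any order and on any worker.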

The Limitations of Base R for Parallel Computing

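Most base R functions run on a single core and are not designed to take advantage of multiple processors. Parallelizing them is not free either: starting the workers and transferring data between processes adds overhead, which can sometimes outweigh the benefits of parallel processing, especially when each iteration is short.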

[Figure: CPU and memory usage during for loop calculation in R]

[Figure: For loop execution time depending on number of cores]

When the number of workers is too high, they may compete for the same resources, such as memory, CPU, or I/O bandwidth. This can lead to reduced performance as the workers are waiting for access to the resources they need.
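
One way to find a reasonable number of workers is to time the same workload with different cluster sizes. The sketch below assumes a made-up CPU-bound function, busy_task(), and arbitrary worker counts. The timings depend entirely on the machine, so no expected values are given.

```r
library(parallel)

# Hypothetical CPU-bound task used only for timing
busy_task <- function(i) sum(sqrt(seq_len(2e5)))

# Time the same 16 tasks with 1, 2 and 4 workers
worker_counts <- c(1, 2, 4)
elapsed <- sapply(worker_counts, function(n) {
  cl <- makeCluster(n)
  on.exit(stopCluster(cl))  # always release the workers
  system.time(parLapply(cl, 1:16, busy_task))[["elapsed"]]
})

data.frame(workers = worker_counts, seconds = elapsed)
```

If the elapsed time stops decreasing, or starts increasing, as workers are added, the extra workers are most likely competing for the same resources.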

Switching to the FORK backend gave a statistically significant improvement in the performance of the parallel for loop run in R on an M1 processor.

[Figure: For loop execution time depending on backend type on M1 processor]

[Figure: For loop execution time on Mac in parallel in R]

Conclusion

Parallelisation is a powerful tool for making complex computations in R more efficient. By dividing a task into smaller subtasks, we can use multiple cores of our computer to work on the problem simultaneously. However, it’s important to choose the number of workers carefully, as resource contention, overhead, and operating system limitations can all impact performance.

February 5, 2023


Maxim Bespalov
