Updated Feb 26th, 2023.
Introduction to Parallel Computing in R
Parallel computing is a technique that enables us to tackle large computational tasks by dividing them into smaller, manageable subtasks and executing them simultaneously on different processors or on the cores of a single CPU. In R, this can be especially beneficial, as complex computations can take hours, or even days, to complete.
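As a minimal sketch of that idea, assuming only the `parallel` package that ships with R, the example below splits the indices 1 to 10 into three subtasks with `splitIndices()`, sums the squares of each chunk on a worker process, and adds the partial results together. The workload `sum_of_squares()` is a hypothetical stand-in for a real computation.

```r
library(parallel)

# Hypothetical workload: sum of squares over one chunk of indices
sum_of_squares <- function(idx) sum(idx^2)

n <- 10
chunks <- splitIndices(n, 3)          # divide 1..n into 3 contiguous subtasks

cl <- makeCluster(2)                  # 2 PSOCK workers (available on every OS)
partial <- parLapply(cl, chunks, sum_of_squares)  # run the subtasks in parallel
stopCluster(cl)                       # release the worker processes

total <- Reduce(`+`, partial)         # combine the partial results
print(total)                          # 385
```

The split-compute-combine pattern is the same whatever the subtask is; only `sum_of_squares()` would change.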
The Benefits of Parallel For Loops in R
Base R for loops run their iterations one after another on a single core, so on a multi-core machine most of the available processing power sits idle. A parallel for loop, on the other hand, distributes independent iterations across separate cores, processors, or threads so that they can run simultaneously.
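To make the contrast concrete, here is a minimal sketch, assuming the iterations are independent of one another. The same hypothetical `slow_square()` function is applied first with a plain `for` loop and then with `parLapply()` from the `parallel` package, which hands the iterations to separate worker processes. `Sys.sleep()` only simulates real work, and the worker count of 4 is arbitrary.

```r
library(parallel)

# Hypothetical per-iteration task; Sys.sleep() simulates real work
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

n <- 8

# Sequential: a single core processes every iteration in turn
seq_time <- system.time({
  seq_res <- numeric(n)
  for (i in seq_len(n)) seq_res[i] <- slow_square(i)
})

# Parallel: the iterations are distributed over 4 worker processes
cl <- makeCluster(4)
par_time <- system.time({
  par_res <- unlist(parLapply(cl, seq_len(n), slow_square))
})
stopCluster(cl)

stopifnot(identical(seq_res, par_res))  # same answers, different schedule
print(rbind(sequential = seq_time[["elapsed"]],
            parallel   = par_time[["elapsed"]]))
```

The elapsed times depend on the machine, but because each iteration is independent, the parallel version can overlap the waits instead of paying for them one after another.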
Libraries to Parallelize For Loops in R
- library(parallel): support for parallel computation, including random-number generation.
- library(foreach): a high-level interface for parallel computation as a series of iterations.
- library(doParallel): a parallel backend for foreach.
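The random-number support in `parallel` matters as soon as the iterations draw random numbers. In this minimal sketch, `clusterSetRNGStream()` gives each worker its own L'Ecuyer-CMRG stream derived from a single seed, so re-seeding reproduces the same parallel draws. The seed `123` and the two-worker cluster are arbitrary choices for illustration.

```r
library(parallel)

cl <- makeCluster(2)

# Seed independent, reproducible random-number streams on all workers
clusterSetRNGStream(cl, iseed = 123)
draws1 <- parSapply(cl, 1:4, function(i) runif(1))

# Re-seeding with the same value reproduces the same draws
clusterSetRNGStream(cl, iseed = 123)
draws2 <- parSapply(cl, 1:4, function(i) runif(1))

stopCluster(cl)
print(identical(draws1, draws2))  # TRUE
```

Without a per-worker stream, workers can end up with correlated or non-reproducible random numbers, which makes simulation results hard to check.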
Running a Parallel For Loop Calculation in R
I used the following code to set up a backend for parallel computing, to export the objects and functions needed for the computation to the worker nodes, and to execute the loop using foreach().
- Set up a backend that uses multiple cores for running a parallel for loop.
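library(parallel)
library(foreach)
library(doParallel)
# Number of physical cores; detectCores() can return NA, so fall back to 2
totalCores <- detectCores(logical = FALSE)
if (is.na(totalCores)) totalCores <- 2
if (.Platform$OS.type == "windows") {
  # PSOCK: each worker is a fresh R session; objects must be exported
  cl <- makeCluster(max(1, floor(totalCores / 2)), type = "PSOCK")
} else {
  # FORK: workers are copies of the current session (not available on Windows)
  cl <- makeCluster(max(1, totalCores - 1), type = "FORK")
}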
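On Windows only the PSOCK backend is available, and each PSOCK worker starts as a fresh R session that must be sent its data explicitly. On UNIX-based systems (Linux, macOS), FORK is usually more efficient, because the workers are forked copies of the current R session and share its memory until they modify it.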
- Export the objects and functions needed for the computation to the worker nodes using clusterExport().
clusterExport(cl, c("img3", "nuclei", "computeFeatures", "computeFeatures.moment", "computeFeatures.shape", "computeFeatures.basic", "computeFeatures.haralick"))
registerDoParallel(cl)
- Perform the parallel for loop calculation in R using foreach().
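# Workers load EBImage, which provides computeFeatures()
data <- foreach(i = seq_len(dim(img3)[3]), .packages = "EBImage", .combine = cbind) %dopar% {
  computed_ft <- computeFeatures(nuclei, img3[,,i], xname = "Pt_", refnames = "_")
  cbind(computed_ft[, 12])  # keep the 12th feature; .combine = cbind adds one column per frame
}
stopCluster(cl)  # release the worker processes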
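The image data used above (`img3`, `nuclei`) are not available here, so the following is a self-contained sketch of the same three steps using only base `parallel`. It assumes a synthetic 3-D array and a hypothetical `frame_features()` function in place of `computeFeatures()`. `parSapply()` plays the role of `foreach(..., .combine = cbind) %dopar%`: each iteration returns a vector, and the results become the columns of a matrix. Because a PSOCK worker starts as an empty R session, `clusterExport()` has to ship the array and the function first; on a FORK cluster they are already visible and the export is harmless.

```r
library(parallel)

set.seed(1)
# Hypothetical stand-in for img3: 5 frames of 8 x 8 pixels
img <- array(runif(8 * 8 * 5), dim = c(8, 8, 5))

# Hypothetical stand-in for computeFeatures(): one value per pixel row of frame i
frame_features <- function(i) rowMeans(img[, , i])

# Step 1: set up the backend (FORK is unavailable on Windows)
n_workers <- 2
cl <- if (.Platform$OS.type == "windows") {
  makeCluster(n_workers, type = "PSOCK")
} else {
  makeCluster(n_workers, type = "FORK")
}

# Step 2: export the data and the function (required for PSOCK)
clusterExport(cl, c("img", "frame_features"))

# Step 3: one column per frame, like .combine = cbind
features <- parSapply(cl, seq_len(dim(img)[3]), frame_features)
stopCluster(cl)

print(dim(features))  # 8 5
```

`clusterExport()` looks the names up in the global environment by default; when the objects live inside a function, pass `envir = environment()` so the local copies are found.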
The Limitations of Base R for Parallel Computing
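Most base R functions run on a single core and are not designed to take advantage of multiple processors. Parallelising them also has a cost: starting the workers, copying data to them, and collecting the results. When each task is small, this overhead can outweigh the benefits of parallel processing.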
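The overhead is easy to observe with a deliberately tiny task. In this sketch (the task `sqrt()` and the input size are arbitrary choices), sending 2,000 very cheap calls to workers with `clusterApply()`, which makes one round trip per element, is typically slower than a plain `lapply()`. `parLapply()` reduces the cost by sending one chunk per worker. Timings vary by machine, so none are claimed here.

```r
library(parallel)

x <- seq_len(2000)

cl <- makeCluster(2)

t_seq   <- system.time(r_seq   <- lapply(x, sqrt))            # no parallel overhead
t_each  <- system.time(r_each  <- clusterApply(cl, x, sqrt))  # one message per element
t_chunk <- system.time(r_chunk <- parLapply(cl, x, sqrt))     # one chunk per worker

stopCluster(cl)

# All three approaches give the same answer; only the cost differs
stopifnot(identical(r_seq, r_each), identical(r_seq, r_chunk))
print(c(lapply       = t_seq[["elapsed"]],
        clusterApply = t_each[["elapsed"]],
        parLapply    = t_chunk[["elapsed"]]))
```

For work this cheap, the sequential version usually wins. Parallelism pays off only when each task does enough computation to cover the cost of communicating with a worker.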


When the number of workers is too high, they may compete for the same resources, such as memory, CPU, or I/O bandwidth. This reduces performance, because the workers spend time waiting for access to the resources they need.
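One simple way to act on this is to cap the worker count. The helper below, `choose_workers()`, is hypothetical and not part of any package. It takes the number of physical cores reported by `detectCores(logical = FALSE)`, leaves one core free for the rest of the system, and never starts more workers than there are tasks. It is a heuristic and does not guard against memory or I/O contention.

```r
library(parallel)

# Hypothetical helper: choose a conservative number of workers
choose_workers <- function(n_tasks, cores = detectCores(logical = FALSE)) {
  if (is.na(cores)) cores <- 1L      # detectCores() can return NA
  max(1L, min(n_tasks, cores - 1L))  # keep one core free, never exceed the tasks
}

choose_workers(3, cores = 8)    # 3: only three tasks to run
choose_workers(100, cores = 8)  # 7: one core left for the OS
choose_workers(10, cores = 1)   # 1: always at least one worker
```

The result can be passed straight to `makeCluster()`, e.g. `makeCluster(choose_workers(n_tasks))`.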
Switching to the FORK backend produced a statistically significant improvement in the performance of the parallel for loop run in R on an M1 processor.
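A comparison like this could be set up along the following lines on a UNIX-like system (FORK is not available on Windows). This is a sketch under stated assumptions: each backend is timed several times on the same hypothetical workload `work()`, and the two samples of elapsed times are compared with `wilcox.test()`. The workload, the five repetitions, and the choice of test are all assumptions, and the numbers will differ between machines.

```r
library(parallel)

# Hypothetical workload: a moderately expensive computation per task
work <- function(i) sum(sort(runif(1e5)))

# Time one complete run (cluster start-up included) for a given backend
time_backend <- function(type, n_workers = 2, n_tasks = 8) {
  start <- Sys.time()
  cl <- makeCluster(n_workers, type = type)
  parLapply(cl, seq_len(n_tasks), work)
  stopCluster(cl)
  as.numeric(difftime(Sys.time(), start, units = "secs"))
}

if (.Platform$OS.type != "windows") {
  reps  <- 5
  psock <- replicate(reps, time_backend("PSOCK"))
  fork  <- replicate(reps, time_backend("FORK"))
  print(summary(psock))
  print(summary(fork))
  # Rank-based test: do the two backends' run times differ?
  print(wilcox.test(psock, fork))
}
```

Including cluster start-up in each timing is deliberate, since launching fresh PSOCK sessions is one of the costs a FORK cluster avoids.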


Conclusion
Parallelisation is a powerful tool for making complex computations in R more efficient. By dividing a task into smaller subtasks, we can use multiple cores of our computer to work on the problem simultaneously. However, it’s important to choose the number of workers carefully, as resource contention, communication overhead, and operating system limitations (such as FORK being unavailable on Windows) can all impact performance.