How to parallelize for loops in R

Updated Feb 26th, 2023.

Introduction to Parallel Computing in R

Parallel computing is a technique that enables us to tackle large computational tasks by dividing them into smaller, manageable subtasks and executing them simultaneously on different processors or on processor cores of a single CPU. In R, this can be especially beneficial, as complex computations can take hours, or even days, to complete.
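Before deciding how many workers to start, it can help to check how many cores R can see. A minimal sketch using detectCores() from the parallel package. Logical cores include hyper-threads while physical cores do not, and the physical count (logical = FALSE) may not be honoured on every platform and can come back as NA.

```r
library(parallel)

# Logical cores count hyper-threads; physical cores do not.
# Either value may be NA if R cannot determine it on this platform.
logical_cores  <- detectCores(logical = TRUE)
physical_cores <- detectCores(logical = FALSE)

cat("Logical cores: ", logical_cores, "\n")
cat("Physical cores:", physical_cores, "\n")
```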

The Benefits of Parallel For Loops in R

Base R for loops run sequentially: each iteration starts only after the previous one has finished, and all of them run on a single core, so on a multi-core machine the remaining cores sit idle. In a parallel for loop, by contrast, the iterations are distributed across separate cores or processors and executed simultaneously. This only works when each iteration does not depend on the result of another.
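A minimal sketch of the difference, assuming independent iterations. Here slow_square() is a hypothetical stand-in for real work, and parLapply() from the parallel package plays the role of the parallel loop; the choice of 2 workers is arbitrary.

```r
library(parallel)

# Hypothetical stand-in for real work: each call depends only on i,
# so the iterations are independent and can run in any order
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

n <- 8

# Sequential base R for loop: one iteration at a time, on one core
res_seq <- numeric(n)
for (i in seq_len(n)) {
  res_seq[i] <- slow_square(i)
}

# The same iterations spread across 2 worker processes
cl <- makeCluster(2)
res_par <- unlist(parLapply(cl, seq_len(n), slow_square))
stopCluster(cl)
```

Both versions produce the same vector; only the way the iterations are scheduled differs.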

Libraries to Parallelize For Loops in R

  • library(parallel): support for parallel computation, including random-number generation.
  • library(foreach): a high-level interface for parallel computation as a series of iterations.
  • library(doParallel): a parallel backend for foreach.
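A minimal sketch of how the three packages fit together, using a toy loop body (sqrt(i)) rather than real work: parallel starts the workers and provides clusterSetRNGStream() for independent, reproducible random-number streams, foreach describes the loop, and doParallel connects the two so that %dopar% runs on the cluster. The 2 workers and the seed 123 are arbitrary choices.

```r
library(parallel)
library(foreach)
library(doParallel)

cl <- makeCluster(2)      # two PSOCK worker processes
registerDoParallel(cl)    # %dopar% now runs on cl

# %do% evaluates the loop body sequentially; %dopar% uses the registered backend
seq_res <- foreach(i = 1:4, .combine = c) %do% sqrt(i)
par_res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

# Give each worker its own L'Ecuyer-CMRG random-number stream derived from one seed,
# so resetting the seed reproduces the same draws on every worker
clusterSetRNGStream(cl, iseed = 123)
draws_a <- unlist(clusterEvalQ(cl, runif(2)))
clusterSetRNGStream(cl, iseed = 123)
draws_b <- unlist(clusterEvalQ(cl, runif(2)))

stopCluster(cl)
registerDoSEQ()  # point foreach back at sequential execution
```

For random numbers drawn inside a %dopar% loop body itself, the doRNG package is the usual way to get reproducible results.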

Running a Parallel For Loop Calculation in R

I used the following code to set up the parallel backend, export the objects and functions needed for the computation to the worker nodes, and execute the loop using foreach().

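  1. Set up the backend to use multiple cores for running a parallel for loop.
library(parallel)
library(foreach)
library(doParallel)

# Number of physical cores; detectCores() returns NA if it cannot tell,
# so fall back to a conservative 2
totalCores <- detectCores(logical = FALSE)
if (is.na(totalCores)) totalCores <- 2

if (.Platform$OS.type == "windows") {
  # Windows supports only PSOCK workers; use half of the physical cores
  cl <- makeCluster(max(1, floor(totalCores / 2)), type = "PSOCK")
} else {
  # Unix-like systems can fork the current R process; leave one core free
  cl <- makeCluster(max(1, totalCores - 1), type = "FORK")
}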

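On Windows only the PSOCK backend is available, because FORK relies on the Unix fork() system call. On Unix-like systems (Linux, macOS) FORK is usually more efficient: each worker is a copy of the master R process and shares its memory until it writes to it, so data does not have to be serialised and sent to the workers.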
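A small sketch of the practical difference, using a hypothetical global object big_vector: a fresh PSOCK worker cannot see it until it is exported with clusterExport(), whereas FORK workers inherit it from the master process. The FORK part only runs on Unix-like systems.

```r
library(parallel)

big_vector <- 1:10   # hypothetical object living in the master R session

# PSOCK workers are fresh R sessions: the object is missing until exported
cl_psock <- makeCluster(2, type = "PSOCK")
seen_before <- unlist(clusterEvalQ(cl_psock, exists("big_vector")))
clusterExport(cl_psock, "big_vector")
seen_after <- unlist(clusterEvalQ(cl_psock, exists("big_vector")))
stopCluster(cl_psock)

# FORK workers (Unix-like systems only) are copies of the master process,
# so they already have big_vector without any export
seen_fork <- NA
if (.Platform$OS.type == "unix") {
  cl_fork <- makeCluster(2, type = "FORK")
  seen_fork <- unlist(clusterEvalQ(cl_fork, exists("big_vector")))
  stopCluster(cl_fork)
}
```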

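  2. Export the objects and functions needed for the computation to the worker nodes using clusterExport(), then register the cluster as the foreach backend with registerDoParallel(). With a FORK cluster the workers already share these objects, so the export is optional there.
# img3, nuclei and the EBImage computeFeatures* functions are sent to every worker
clusterExport(cl, c("img3", "nuclei", "computeFeatures", "computeFeatures.moment", "computeFeatures.shape", "computeFeatures.basic", "computeFeatures.haralick"))
# route foreach's %dopar% to this cluster
registerDoParallel(cl)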
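  3. Perform the parallel for loop calculation in R using foreach(), then stop the cluster.
# one iteration per frame (3rd dimension) of img3; .packages loads EBImage on each worker
data <- foreach(i = seq_len(dim(img3)[3]), .packages = "EBImage", .combine = cbind) %dopar% {
  computed_ft <- computeFeatures(nuclei, img3[,,i], xname = "Pt_", refnames = "_")
  cbind(computed_ft[,12])  # keep feature column 12 as this iteration's column
}
stopCluster(cl)  # release the worker processes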
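Because img3, nuclei and EBImage may not be at hand, here is a self-contained sketch of the same foreach(..., .combine = cbind) pattern on hypothetical data: a random 5 x 5 x 4 array (img_stack) stands in for the image stack and row_features(), a thin wrapper around rowMeans(), stands in for computeFeatures(). Each iteration returns one vector, and cbind turns them into a matrix with one column per frame, in loop order. getDoParWorkers() from foreach confirms that the registered cluster is being used.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical stand-in for img3: 4 random 5 x 5 frames stacked along dimension 3
set.seed(1)
img_stack <- array(runif(5 * 5 * 4), dim = c(5, 5, 4))

# Hypothetical stand-in for computeFeatures(): one value per row of a frame
row_features <- function(frame) rowMeans(frame)

cl <- makeCluster(2)
registerDoParallel(cl)
n_workers <- getDoParWorkers()   # number of workers %dopar% will use

# One iteration per frame; cbind() makes each iteration's vector a column
features <- foreach(i = seq_len(dim(img_stack)[3]), .combine = cbind) %dopar% {
  row_features(img_stack[, , i])
}

stopCluster(cl)
registerDoSEQ()
dim(features)  # 5 rows x 4 columns
```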

The Limitations of Base R for Parallel Computing

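Most base R functions are single-threaded: they do not spread their work over multiple cores on their own. Running them in parallel with the packages above adds overhead (starting the worker processes, copying data to them, and collecting the results), and for short or cheap iterations this overhead can outweigh the benefit of parallel processing.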
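A rough sketch of that overhead with a deliberately trivial task, a hypothetical double_it() that multiplies by 2: clusterApply() sends each element to a worker as a separate task, while parLapply() splits the input into one chunk per worker. Timings depend entirely on the machine, so none are quoted here; the one-task-per-element version would typically be expected to be the slowest.

```r
library(parallel)

x <- 1:2000
double_it <- function(v) v * 2   # trivial task: almost no work per element

cl <- makeCluster(2)

# Sequential baseline
t_seq <- system.time(r_seq <- sapply(x, double_it))[["elapsed"]]
# One task per element: 2000 round trips between the master and the workers
t_elem <- system.time(r_elem <- unlist(clusterApply(cl, x, double_it)))[["elapsed"]]
# One chunk per worker: parLapply() splits x into length(cl) pieces
t_chunk <- system.time(r_chunk <- unlist(parLapply(cl, x, double_it)))[["elapsed"]]

stopCluster(cl)
print(c(sequential = t_seq, per_element = t_elem, chunked = t_chunk))
```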

[Figure: CPU and memory usage during for loop calculation in R]
[Figure: For loop execution time depending on number of cores]

When the number of workers is too high, they may compete for the same resources, such as memory, CPU, or I/O bandwidth. This reduces performance, because the workers spend time waiting for access to the resources they need instead of computing.
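One way to see this effect on your own machine is to time the same workload with an increasing number of workers. A minimal sketch, assuming a hypothetical CPU-bound heavy_task() and 16 tasks; the worker counts 1, 2 and 4 are capped at the number of physical cores, and cluster start-up is excluded from the timing.

```r
library(parallel)

# Hypothetical CPU-bound workload
heavy_task <- function(i) sum(sqrt(seq_len(2e5) + i))

physical <- detectCores(logical = FALSE)
max_workers <- if (is.na(physical)) 1L else physical
worker_counts <- unique(pmin(c(1L, 2L, 4L), max_workers))

# Elapsed seconds to run 16 tasks with each worker count
timings <- sapply(worker_counts, function(w) {
  cl <- makeCluster(w)
  on.exit(stopCluster(cl))
  system.time(parLapply(cl, seq_len(16), heavy_task))[["elapsed"]]
})
names(timings) <- paste0(worker_counts, "_workers")
print(timings)
```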

Switching to the FORK backend gave a statistically significant improvement in the performance of the parallel for loop in R on an Apple M1 processor.

[Figure: For loop execution time depending on backend type on M1 processor]
[Figure: For loop execution time on Mac in parallel in R]
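A sketch of how such a backend comparison could be run, assuming a hypothetical CPU-bound heavy_task() and 5 repetitions per backend. Cluster start-up is included in each timing because it is one of the places where PSOCK and FORK differ. The FORK runs and the Welch t.test() are skipped on Windows.

```r
library(parallel)

# Hypothetical CPU-bound workload
heavy_task <- function(i) sum(sqrt(seq_len(2e5) + i))

# Elapsed seconds to start a cluster of the given type and run the loop on it
time_backend <- function(type, workers = 2, n_tasks = 16) {
  start <- proc.time()[["elapsed"]]
  cl <- makeCluster(workers, type = type)
  on.exit(stopCluster(cl))
  parLapply(cl, seq_len(n_tasks), heavy_task)
  proc.time()[["elapsed"]] - start
}

backends <- if (.Platform$OS.type == "unix") c("PSOCK", "FORK") else "PSOCK"
reps <- 5

# Matrix of timings: one row per repetition, one column per backend
timings <- sapply(backends, function(b) replicate(reps, time_backend(b)))
print(colMeans(timings))

# Welch two-sample t-test on the repeated timings (Unix-like systems only)
if ("FORK" %in% backends) {
  print(t.test(timings[, "FORK"], timings[, "PSOCK"]))
}
```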

Conclusion

Parallelisation is a powerful tool for making complex computations in R more efficient. By dividing a task into smaller subtasks, we can use multiple cores of our computer to work on the problem simultaneously. However, it is important to choose the number of workers carefully, as resource contention, overhead, and operating system limitations can all impact performance.


Posted
February 5, 2023
by
Maxim Bespalov
