Running R on multiple cores, Mac OS

If you do something computationally intensive, such as fitting a hierarchical/mixed effects model with random slopes in the lme4 package, you might find that R takes hours and sometimes even days just to tell you that it didn’t converge. In my struggles with R, I figured out a way to run several models at a time on several CPU cores. Here is how I did it.

A single R session, including one started from R.app, runs on just one CPU core at a time in Mac OS. But if you run R from the command line, you can start several separate R processes, and the operating system will spread them across different cores:

  1. Open Terminal. (Macintosh HD>Applications>Utilities>Terminal.app).
  2. Start screen by typing screen at the command prompt.
  3. Start R by typing R at the command prompt in the screen-emulated terminal. You might have to hit space to get to the prompt itself; check the screen manual for more.
  4. Paste in your R commands from wherever you keep them. Alternatively, run an R script using the source() command. Here’s a small example:

    setwd("/blah/blah/blah/place_you_want_your_output/")
    exp = read.csv("your_dataframe.csv") # make sure it's in the working directory
    library(lme4)
    Sys.time() # prints the time at which R starts fitting the model
    model1 <- lmer(blah blah) # placeholder: this is your huge fully crossed model
    Sys.time() # prints the time at which the fit finished
    save(model1, file = "model1.Rda") # keep the fitted model for later inspection

  5. Since R can take a while to fit an lmer model (I've had models run for 91 hours before failing to converge!), you might want to let R run in the background while you are doing other things. Running R in screen allows you to do that. Disconnect from the screen while R is running by hitting Ctrl+A and then Ctrl+D.
  6. You can reconnect to the R screen by entering screen -R at the command line.
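
The detach-and-reconnect steps above can also be scripted, so that R starts in a screen session that is already detached. The following is only a minimal sketch, not part of my original workflow: the session name model1 and the script name fit_model1.R are placeholders I made up, and it assumes your R commands are saved in a script that Rscript can run. By default it only prints the command it would run; set RUN=1 to actually launch it.

```shell
#!/bin/sh
# Sketch: start R in a detached, named screen session instead of typing
# Ctrl+A, Ctrl+D by hand. "model1" and "fit_model1.R" are placeholder names.

# Print the screen command for one session.
#   -d -m  start the session already detached
#   -S     give the session a name so it is easy to find again
screen_command() {
    printf 'screen -dmS %s Rscript %s\n' "$1" "$2"
}

# Dry run by default: show the command. Set RUN=1 to execute it.
start_session() {
    if [ "${RUN:-0}" = "1" ]; then
        screen -dmS "$1" Rscript "$2"
    else
        screen_command "$1" "$2"
    fi
}

start_session model1 fit_model1.R

# Later, from any terminal:
#   screen -ls          # list running sessions
#   screen -r model1    # reattach to the session named model1
```

A session started this way ends as soon as the script finishes, so the script itself should save its results, as the save() call in the example above does.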

(These instructions were current as of R 2.14 on Mac OS 10.6.8; my iMac has a 3.06 GHz Intel Core i3 processor and 4 GB of 1333 MHz RAM. If you know that something has changed, please tell me!)

Once your .Rda file is saved, you can load it into R with load("model1.Rda") and inspect the model using summary(model1). If you get a message about non-convergence, use the model you did get to decide which random slopes to remove. Here is how to decide:

sort(sapply(ranef(model1)$subject, sd))
sort(sapply(ranef(model1)$word, sd))

In this example, subject and word are the grouping factors in the model; substitute your own. Take the random effect term with the smallest standard deviation out of the model and try running the model again.

Since there is a chance that your next model won't converge either, you can run multiple instances of R on the same Mac by repeating the steps above, from opening Terminal through disconnecting from the screen, for each model. When you run the screen -R command, you'll see that you have multiple screens running; connect to each of them separately by passing the screen ID number you see to screen -r.
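
If you have several candidate models, the repetition can be put in a loop. This is a rough sketch under some assumptions: each model lives in its own R script (the names fit_model1.R and so on are hypothetical), and each script saves its own output. It names each screen session after its script and warns if you ask for more R processes than the machine reports cores. As written it only prints the commands; set RUN=1 to launch them.

```shell
#!/bin/sh
# Sketch: one detached screen session per model script.
# The script names used below are hypothetical placeholders.

# Number of cores the OS reports; fall back to 1 if that is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
case "$cores" in
    ''|*[!0-9]*) cores=1 ;;
esac

# Derive a session name from a script name, e.g. fit_model2.R -> fit_model2
session_name() {
    basename "$1" .R
}

# Launch (or, by default, just print) one screen session per script.
launch_all() {
    if [ "$#" -gt "$cores" ]; then
        echo "note: $# models requested but only $cores core(s) reported" >&2
    fi
    for script in "$@"; do
        name=$(session_name "$script")
        if [ "${RUN:-0}" = "1" ]; then
            screen -dmS "$name" Rscript "$script"
        else
            echo "screen -dmS $name Rscript $script"
        fi
    done
}

launch_all fit_model1.R fit_model2.R fit_model3.R
```

Naming the sessions means you can reattach with, say, screen -r fit_model2 rather than looking up the ID number.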

You can of course connect to your Mac remotely using SSH and connect to the R-running screens to check on whether the models are still running, or use top to check how much CPU % your various instances of R are using.
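
As an alternative to watching top, you can ask ps for a one-off snapshot of just the R processes. This is a small sketch, assuming that the R process shows up with a command name that is either R or a path ending in /R; the filter_r helper is a name I made up for illustration.

```shell
#!/bin/sh
# Sketch: show PID, CPU % and command for running R processes only.

# Keep the header line plus any row whose command is "R" or ends in "/R".
filter_r() {
    awk 'NR == 1 || $3 ~ /(^|\/)R$/'
}

# -A: all processes; -o: choose the columns to print.
ps -A -o pid,pcpu,comm | filter_r
```

If each model is keeping a core busy, you should see one row per R process with a CPU % close to 100.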
