Being consistent in how you write your R code is important for readability. While this helps other scholars read your code, you’ll also have to come back to your old code at some point, and you’ll want to be able to quickly and easily figure out what you were thinking years ago. Lots of people and organizations have written their own style guides, and you can see a few of them below:
You don’t need to exactly follow every single rule in these lists. What’s important is to develop a set of practices that works for you, that you can stick with across all of your code. The rest of this document illustrates my personal style guide. It’s similar to those above, but not identical.
I begin R scripts with a header block that provides important information about the script’s purpose, followed by some basic housekeeping tasks. If you’re working in .Rmd notebooks instead of .R scripts, you’ll need to either put these in code chunks or wrap them in HTML comments (<!-- -->), because the text portion of an RMarkdown notebook does not recognize # as a comment character (it’s used to define headings). Sidenote: I’m often very bad about keeping the updated date up to date, which is yet another reason why version control software can be so important.
###############################
## author: Rob Williams ##
## contact: [email protected] ##
## project: 787 labs ##
## created: June 12, 2017 ##
## updated: August 24, 2017 ##
###############################
###################
## R style guide ##
###################
## clear environment
rm(list = ls())
## set working directory
setwd("~/Dropbox/UNC/TA/787 Fall 2017/Lab/Lab 1")
## set seed for replication
set.seed(4245)
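Setting the seed matters because it makes any random draws later in the script repeat exactly from run to run. A minimal check, reusing the seed from the header above (first_draw and second_draw are just illustrative names):

```r
## draw three values from a standard normal after setting the seed
set.seed(4245)
first_draw <- rnorm(3)

## resetting the same seed reproduces exactly the same draws
set.seed(4245)
second_draw <- rnorm(3)

identical(first_draw, second_draw)  # TRUE
```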
Next I have a section where I load packages that the script requires to run. The #### at the end of the comment line creates a header that you can use to quickly navigate the script in RStudio (code chunk titles function the same way in notebooks).
## load packages ####
library(ggmap)
Use <- instead of = for variable assignment to avoid confusion with arguments to functions.
z <- 3
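Inside a function call, = only matches a value to a named argument, while <- performs a real assignment. The sketch below makes that difference visible. check.assignment() is a hypothetical helper written for illustration, and median() is used only because its first argument happens to be named x.

```r
check.assignment <- function() {
  ## = only matches the vector to median()'s x argument
  median(x = c(4, 8, 15))
  made_by_equals <- exists('x', envir = environment(), inherits = FALSE)

  ## <- assigns x in this function's environment, then passes it along
  median(x <- c(4, 8, 15))
  made_by_arrow <- exists('x', envir = environment(), inherits = FALSE)

  c(equals = made_by_equals, arrow = made_by_arrow)
}

check.assignment()  # equals = FALSE, arrow = TRUE
```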
Consistency in naming objects lets you quickly distinguish between different types of objects. I use underscores (_) for variables and periods (.) for functions. Some programming languages use CamelCase for certain object types, but it can be difficult to read with longer variable names, so you should avoid it. Additionally, RStudio will highlight other instances of example_variable after you highlight one, but won’t for example.function.
## R style examples ####
## use underscores in variable names
my_variable <- 17
## use periods in function names
my.function <- function(x) {
  print(x)
}
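Once you adopt this convention, it is easy to check mechanically. The sketch below assumes exactly the rule stated here (no periods in variable names, no underscores in function names); find.style.violations() is a hypothetical helper, not part of any package, and demo_env is a scratch environment created only for the example.

```r
find.style.violations <- function(env = globalenv()) {
  obj_names <- ls(envir = env)

  ## flag which objects are functions
  is_fun <- vapply(obj_names,
                   function(nm) is.function(get(nm, envir = env)),
                   logical(1))

  ## variables should not contain periods;
  ## functions should not contain underscores
  bad_vars <- obj_names[!is_fun & grepl('.', obj_names, fixed = TRUE)]
  bad_funs <- obj_names[is_fun & grepl('_', obj_names, fixed = TRUE)]

  c(bad_vars, bad_funs)
}

## try it on a scratch environment
demo_env <- new.env()
assign('my_variable', 17, envir = demo_env)
assign('my.variable', 17, envir = demo_env)
assign('my.function', function(x) print(x), envir = demo_env)
assign('my_function', function(x) print(x), envir = demo_env)

find.style.violations(demo_env)  # "my.variable" "my_function"
```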
One final guideline for naming objects is to never use existing object names. R lets you reassign almost any name, so you can overwrite base functions or values, which may cause problems later on. For example, T is built-in shorthand for TRUE, but nothing stops you from reassigning it:
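## T starts out as TRUE
T
## this assignment silently masks the built-in value
T <- 'yellow'
## T now returns 'yellow'
T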
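If you do mask a built-in like this, removing your copy restores it, and exists() is a quick way to check whether a name is already taken before you use it. A short sketch (masked_value and restored_value are illustrative names):

```r
## assigning to T masks base R's T until you remove it
T <- 'yellow'
masked_value <- identical(T, TRUE)  # FALSE

## removing the global copy lets base R's T show through again
rm(T)
restored_value <- identical(T, TRUE)  # TRUE

## check whether a name is already in use before claiming it
exists('mean')  # TRUE
```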
Place spaces around all operators.
x <- (352 + 7) / 45
The exception I make is for exponentiation.
x_sq <- x^2
Commas are followed by a space.
fake_data <- c(1, 2, x, x_sq)
This holds even when indexing multidimensional objects.
fake_mat <- cbind(c(1, 2, 3), c(4, 5, 6))
fake_mat[, 1]
fake_mat[1, ]
fake_mat[2, 2]
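Leaving a slot empty before or after the comma means "keep everything along that dimension." Annotating what each index above returns makes the pattern easier to see:

```r
fake_mat <- cbind(c(1, 2, 3), c(4, 5, 6))

## empty row slot: every row of column 1
fake_mat[, 1]   # 1 2 3

## empty column slot: every column of row 1
fake_mat[1, ]   # 1 4

## both slots filled: a single element
fake_mat[2, 2]  # 5
```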
Left parentheses that are not part of a function call (loops, custom functions, conditionals) have a leading space.
if (x <= 10) print('this is an arbitrary comparison')
Opening curly braces should never go on their own line; closing curly braces should always go on their own line, unless they are followed by an else statement. An else statement should never go on its own line. I leave blank lines before and after the contents within curly braces, but this is personal preference.
for (i in 1:5) {
  print(i)
}
for (i in 1:5) {
  if (i %% 2 == 0) {
    print(paste(i, 'is even'))
  } else {
    print(paste(i, 'is odd'))
  }
}
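The rule about else is more than taste. At the top level of a script, R treats a line that ends in a closing brace as a complete statement, so an else on the next line is a syntax error. Inside braces (a function body, for instance) R keeps reading, which is why the mistake can go unnoticed until the code is moved. The sketch below checks this with parse(); parses.ok() is a hypothetical helper written for illustration.

```r
## returns TRUE if the code string parses, FALSE on a syntax error
parses.ok <- function(code) {
  result <- tryCatch(parse(text = code), error = function(e) NULL)
  !is.null(result)
}

## else on its own line after a closing brace
else_alone <- 'if (TRUE) {\n  1\n}\nelse {\n  2\n}'

## else on the same line as the closing brace
else_joined <- 'if (TRUE) {\n  1\n} else {\n  2\n}'

parses.ok(else_alone)   # FALSE
parses.ok(else_joined)  # TRUE

## wrapped in braces, R keeps reading and the same layout parses
parses.ok(paste0('{\n', else_alone, '\n}'))  # TRUE
```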
RStudio can soft-wrap code that runs past the width of your script pane, but it’s better to limit lines to 80 characters so your code stays readable in other programs. If a line of code exceeds 80 characters, you can move to a new line after any operator or comma. RStudio will even helpfully indent the next line. You can turn on a margin line in RStudio to help keep yourself to this limit (Preferences > Code > Display > Show margin).
states <- c('Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
            'Colorado', 'Connecticut', 'Delaware', 'District of Columbia',
            'Florida', 'Georgia', 'Hawaii')
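Where you break the line matters. R keeps reading only when a line ends with something incomplete, like a trailing operator, comma, or open parenthesis, so a break placed before an operator silently splits one expression into two. The sketch below shows that, plus a hypothetical find.long.lines() helper (not an RStudio feature) that flags lines over 80 characters in a script file; the temporary script is created only for the example.

```r
## breaking after + tells R the expression continues
total_good <- 1 + 2 +
  3

## breaking before + ends the statement at the line break;
## the second line is evaluated on its own as unary plus 3
total_bad <- 1 + 2
  + 3

total_good  # 6
total_bad   # 3

## return the numbers of lines longer than limit characters
find.long.lines <- function(path, limit = 80) {
  code_lines <- readLines(path, warn = FALSE)
  which(nchar(code_lines) > limit)
}

## try it on a throwaway script with one 85-character line
tmp_script <- tempfile(fileext = '.R')
writeLines(c('x <- 1', strrep('y', 85)), tmp_script)
find.long.lines(tmp_script)  # 2
```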
I don’t actually care about tabs vs. spaces, but the spaces people are right. Luckily, RStudio has your back here, and actually inserts two spaces whenever you hit the ‘tab’ key (open up an R script in TextEdit or Notepad sometime to check this out). What is important is keeping your indentation consistent within nested curly braces. Again, RStudio has a helpful feature you can turn on to give you a visual guideline of what your indentation should be (Preferences > Code > Display > Show indent guides).
for (i in 1:10) {
  for (j in 1:5) {
    i * j
  }
}
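Because the inner loop above computes the product of i and j without storing it, nothing visible happens when it runs. A variant that saves each product makes the nested indentation easier to follow (products is an illustrative name):

```r
## an empty 10 x 5 matrix to hold the results
products <- matrix(NA_integer_, nrow = 10, ncol = 5)

for (i in 1:10) {
  for (j in 1:5) {
    ## each level of nesting gets two more spaces
    products[i, j] <- i * j
  }
}

products[3, 4]  # 12
```

For this particular calculation, outer(1:10, 1:5) produces the same values without a loop.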
Be sure to comment your code! You want others and future-you to be able to actually understand what everything does. I put descriptive comments on their own line above the commands they refer to, and notes-to-myself type comments after the command they refer to.
## load non-state actor data from Cunningham et al., 2009
NSA <- read.delim('http://privatewww.essex.ac.uk/~ksg/data/nsa_v3.4_21November2013.asc')
## subset relevant observations
NSA <- NSA[8:11, ] # Iran
I always close with an end-of-script block to let the reader know that nothing has been omitted.
###################
## end of script ##
###################
Take a script that you’ve written previously and apply this style guide to it. You don’t have to follow my style exactly, but your code should be internally consistent, and you should think about developing your own style that you can use when writing code going forward. While you’ll submit an RMarkdown notebook for most labs, this time just submit your original script file and the updated version with a consistent style throughout.