Lab 2a. R & RStudio

Published

August 1, 2022

This Lab Accompanies the Following Lecture

Ignorance more frequently begets confidence than does knowledge.

— Charles Darwin

My mind seems to have become a kind of machine for grinding general laws out of large collections of facts.

— Charles Darwin

Download and Install R and RStudio

R and RStudio are separate programs and need to be installed individually. Follow the instructions on the Posit website.

RStudio can be seen as the vehicle body, seats, dashboard, and all other bells and whistles you might find in a car. R is the engine. RStudio does not work without R. The analyses, graphics, etc. are done with R (running inside RStudio) and not RStudio.

For those of you who will be using your personal laptops, please ensure that you install R and RStudio before the Lab on Monday afternoon. If you have any issues with the installation, please let us know so that we can assist you. We will continue with the instructions below during the Lab session once you have installed R and RStudio.

Setting up the Workspace

General Settings

Before we start using RStudio (which is a code editor and environment that runs R) let’s first set it up properly. Find the ‘Tools’ (‘Preferences’) menu item, navigate to ‘Global Options’ (‘Code Editing’) and select the tick boxes as shown in Figure 1 below.

Figure 1: RStudio preferences.

Customising Appearance

RStudio is highly customisable. Under the Appearance tab under ‘Tools’/‘Global Options’ you can see all of the different themes that come with RStudio. We recommend choosing a theme with a black background (e.g. Chaos) as this will be easier on your eyes and your computer. It is also good to choose a theme with a sufficient amount of contrast between the different colours used to denote different types of objects/values in your code. Refer to Figure 2.

Figure 2: Appearance settings.

The Rproject

A very nifty way of managing workflow in RStudio is through the built-in functionality of the Rproject. We do not need to install any packages or change any settings to use these. Creating a new project is a very simple task, as well. For this course we will be using the Intro_R_Workshop.Rproj file you downloaded with the course material so that we are all running identical projects. This will prevent a lot of issues by ensuring we are doing things by the same standard. Better yet, an Rproject integrates seamlessly into version control software (e.g. GitHub) and allows for instant world class collaboration on any research project. To initialise the ‘Intro_R_Workshop’ project on your machine please find where you saved Intro_R_Workshop.Rproj file and click on it. We will cover the concepts and benefits of an Rproject more as we move through the course.

Installing Packages

The most common functions used in R are contained within the base package; this makes R useful ‘out of the box.’ However, there is extensive additional functionality that is being expanded all the time through the use of packages. Packages are simply collections of code called functions that automate complex mathematical or statistical tasks. One of the most useful features of R is that users are continuously developing new packages and making them available for free. You can find a comprehensive list of available packages on the CRAN website. There are currently (2022-04-29) 18907 packages available for R!

If the thought of searching for and finding R packages is daunting, a good place to start is the R Task View page. This page curates collections of packages for general tasks you might encounter, such as Experimental Design, Meta-Analysis, or Multivariate Analysis. Go and have a look for yourself, you might be surprised to find a good explanation of what you need.

After clicking ‘Tools’/‘Install Packages’, type in the package name tidyverse in the ‘Packages’ text box (note that it is case sensitive) and select the Install button. The Console will run the code needed to install the package, and then provide some commentary on the installation of the package and any of its dependencies (i.e., other R packages needed to run the required package).

The installation process makes sure that the functions within the packages contained within the tidyverse are now available on your computer, but to avoid potential conflicts in the names of functions, it will not load these automatically. To make R ‘know’ about these functions in a particular session, you need either to load the package via ticking the checkbox for that package in the Packages tab, or execute:

library(tidyverse)

To prepare ourselves for the week ahead, let us also install the following packages. Here I demonstate the command line approach to achieve the same thing that can be done via the menu:

install.packages("rmarkdown")
install.packages("tidyverse")
install.packages("bindrcpp")
install.packages("ggpubr")
install.packages("magrittr")
install.packages("boot")
install.packages("ggsn")
install.packages("scales")
install.packages("maps")
install.packages("ggmap")
install.packages("lubridate")
install.packages("bindrcpp")

Since we will develop the habit of doing all of our analyses from R scripts, it is best practice to simply list all of the libraries to be loaded right at the start of your script. Comments may be used to remind your future-self (to quote Hadley Wickham) what those packages are for.

Copying code from RStudio

Here you saw RStudio execute the R code needed to install (using install.packages()) and load (using library()) the package, so if you want to include these in one of your programs, just copy the text it executes. Note that you need only install the current version of a package once, but it needs to be loaded at the beginning of each R session.

Lab 2
  1. Why is it best practice to include packages you use in your R program explicitly?

The Panes of RStudio

RStudio has four main panes each in a quadrant of your screen ((RStudio_panes?)): Source Editor 🅐, Console 🅑, Workspace Browser 🅒 (and History), and Plots 🅓 (and Files, Packages, Help). These can also be adjusted under the ‘Preferences’ menu. Note that there might be subtle differences between RStudio installations on different operating systems. We will discuss each of the panes in turn.

Figure 3: RStudio window panes.

Source Editor 🅐

Generally we will want to write programs longer than a few lines. The Source Editor can help you open, edit and execute these programs. Let us open a simple program:

  1. Use Windows Explorer (Finder on Mac) and navigate to the file BONUS/the_new_age.R.

  2. Now make RStudio the default application to open .R files (right click on the file Name and set RStudio to open it as the default if it isn’t already)

  3. Now double click on the file – this will open it in RStudio in the Source Editor in the top left pane.

Note .R files are simply standard text files and can be created in any text editor and saved with a .R (or .r) extension, but the Source editor in RStudio has the advantage of providing syntax highlighting, code completion, and smart indentation. You can see the different colours for numbers and there is also highlighting to help you count brackets (click your cursor next to a bracket and push the right arrow and you will see its partner bracket highlighted). We can execute R code directly from the Source Editor. Try the following (for Windows machines; for Macs replace Ctrl with Cmd):

  • Execute a single line (Run icon or Ctrl+Enter). Note that the cursor can be anywhere on the line and one does not need to highlight anything — do this for the code on line 2
  • Execute multiple lines (Highlight lines with the cursor, then Run icon or Ctrl+Enter) — do this for line 3 to 6
  • Execute the whole script (Source icon or Ctrl+Shift+Enter)

Now, try changing the x and/or y axis labels on line 18 and re-run the script.

Now let us save the program in the Source Editor by clicking on the file symbol (note that the file symbol is greyed out when the file has not been changed since it was last saved).

At this point, it might be worth thinking a bit about what the program is doing. R requires one to think about what you are doing, not simply clicking buttons like in some other software systems which shall remain nameless for now… Scripts execute sequentially from top to bottom. Try and work out what each line of the program is doing and discuss it with your neighbour. Note, if you get stuck, try using R’s help system; accessing the help system is especially easy within RStudio — see if you can figure out how to use that too.

The #

The hash (#) tells R not to run any of the text on that line to the right of the symbol. This is the standard way of commenting R code; it is VERY good practice to comment in detail so that you can understand later what you have done.

Console 🅑

This is where you can type code that executes immediately. This is also known as the command line. Throughout the notes, we will represent code for you to execute in R as a different font.

Type it in!

Although it may appear that one could copy code from this PDF into the Console, you really shouldn’t. The first reason is that you might unwittingly copy invisible PDF formatting errors into R, which will make the code fail. But more importantly, typing code into the Console yourself gives you the practice you need, and allows you to make (and correct) your own errors. This is an invaluable way of learning and taking shortcuts now will only hurt you in the long run.

Entering code in the command line is intuitive and easy. For example, we can use R as a calculator by typing into the Console (and pressing Enter after each line). Note that the output for every line of calculation (e.g. 6 * 3) is indicated by R>, as we see here:

6 * 3
[1] 18
R> [1] 18
5 + 4
[1] 9
R> [1] 9
2 ^ 3
[1] 8
R> [1] 8

Note that spaces are optional around simple calculations.

We can also use the assignment operator <- to assign any calculation to a variable so we can access it later (the = sign would work, too, but it’s bad practice to use it… and we’ll talk about this as we go):

a <- 2
b <- 7
a + b
[1] 9
R> [1] 9

To type the assignment operator (<-) push the following two keys together: alt -. There are many keyboard shortcuts in R and we will introduce them as we go along.

Spaces are also optional around assignment operators. It is good practice to use single spaces in your R scripts, and the alt - shortcut will do this for you automagically. Spaces are not only there to make the code more readable to the human eye, but also to the machine. Try this:

d<-2
d < -2
[1] FALSE

Note that the first line of code assigns d a value of 2, whereas the second statement asks R whether this variable has a value less than 2. When asked, it responds with FALSE. If we hadn’t used spaces, how would R have known what we meant?

Another important question here is, is R case sensitive? Is A the same as a? Figure out a way to check for yourself.

Lab 2 (continue)
  1. What are the values after each hashed statement in the following? Use the RStudio Console to determine these values:
mass <- 48
age <- 78
mass <- mass * 2.0 # mass? 
age <- age - 17 # age? m
mass_index <- mass / age # mass_index?
Lab 2 (continue)
  1. Use R to calculate some simple mathematical expressions entered. Assign the value of 40 to x and assign the value of 23 to y. Make z the value of x - y Display z in the console.

We can create a vector in R by using the combine c() function:

apples <- c(5.3, 3.8, 4.5)

A vector is a one-dimensional array (i.e., a list of numbers), and this is the simplest form of data used in R (you can think of a single value in R as just a very short vector). We’ll talk about more complex (and therefore more powerful) types of data structures as we go along.

If you want to display the value of apples type:

apples
[1] 5.3 3.8 4.5
R> [1] 5.3 3.8 4.5

Finally, there are default functions in R for nearly all basic statistical analyses, including mean() and sd() (standard deviation):

mean(apples)
[1] 4.533333
R> [1] 4.533333
sd(apples)
[1] 0.7505553
R> [1] 0.7505553
Variable names

It is best not to use c as the name of a value or array. Why? What other words might not be good to use?

Or try this:

round(sd(apples), 2)
[1] 0.75
R> [1] 0.75
Lab 2 (continue)
  1. What did we do above? What can you conclude from those functions?

(Lab 2 continues in Lab 2b.)

RStudio supports the automatic completion of code using the Tab key. For example, type the three letters app and then the Tab key. What happens?

The code completion feature also provides brief in-line help for functions whenever possible. For example, type mean() and press the Tab key.

The RStudio Console automagically maintains a ‘history’ so that you can retrieve previous commands, a bit like your Internet browser or Google. On a blank line in the Console, press the up arrow, and see what happens.

If you wish to review a list of your recent commands and then select a command from this list you can use Ctrl+Up to review the list (Cmd+Up on the Mac). If you prefer a ‘bird’s eye’ overview of the R command history, you may also use the RStudio History pane (see below).

The Console title bar has a few useful features:

  1. It displays the current R working directory (more on this later)

  2. It provides the ability to interrupt R during a long computation (a stop sign will appear whilst code is running)

  3. It allows you to minimise and maximise the Console in relation to the Source pane using the buttons at the top-right or by double-clicking the title bar)

Environment and History Panes 🅒

The Environment pane is very useful as it shows you what objects (i.e., dataframes, arrays, values and functions) you have in your environment (workspace). You can see the values for objects with a single value and for those that are longer R will tell you their class. When you have data in your environment that have two dimensions (rows and columns) you may click on them and they will appear in the Source Editor pane like a spreadsheet.

You can then go back to your program in the Source Editor by clicking its tab or closing the tab for the object you opened. Also in the Environment is the History tab, where you can see all of the code executed for the session. If you double-click a line or highlight a block of lines and then double-click those, you can send it to the Console (i.e., run them).

Typing the following into the Console will list everything you’ve loaded into the Environment:

ls()
[1] "a"          "age"        "apples"     "b"          "d"         
[6] "mass"       "mass_index"
R> [1] "a"        "apples"   "b"        "pkgs_lst" "url"

What do we have loaded into our environment? Did all of these objects come from one script, or more than one? How can we tell where an object was generated?

Files, Plots, Packages, Help, and Viewer Panes 🅓

The last pane has a number of different tabs. The Files tab has a navigable file manager, just like the file system on your operating system. The Plot tab is where graphics you create will appear. The Packages tab shows you the packages that are installed and those that can be installed (more on this just now). The Help tab allows you to search the R documentation for help and is where the help appears when you ask for it from the Console.

Methods of getting help from the Console include…

?mean

…or:

help(mean)

We will go into this in more detail in the next session.

To reproduced Figure 4 in the Plot tab, simply copy and paste the following code into the Console:

library(tidyverse)
x <- seq(0, 2, by = 0.01)
y <- 2 * sin(2 * pi * (x - 1/4))
ggplot() +
  geom_point(aes(x = x, y = y), shape = 21, col = "indianred", fill = "steelblue")+
  theme_linedraw()
Figure 4: An easy figure.

Reuse

Citation

BibTeX citation:
@online{j._smit2022,
  author = {J. Smit, Albertus},
  title = {Lab 2a. {R} \& {RStudio}},
  date = {2022-08-01},
  url = {http://tangledbank.netlify.app/BDC334/02a-r_rstudio.html},
  langid = {en}
}
For attribution, please cite this work as:
J. Smit A (2022) Lab 2a. R & RStudio. http://tangledbank.netlify.app/BDC334/02a-r_rstudio.html.