Associated material
Module: Module 01 - Introducing R and RStudio
Readings:
Before we start
This web site provides two kinds of materials – module notes and zoom
notes. The module pages cover the content of each online session in
detail. They often provide additional in-depth discussion and examples.
You can use the module notes to revise and extend lesson content on your
own at any time. The zoom notes contain an outline of the material
covered in each module and coding exercises that you can work through
after the lesson to solidify your understanding and build your skills.
We can work through these exercises together in each week’s face-to-face
practical session.
At the top right-hand corner of each notes page is a “Code” button.
This toggles showing/hiding the code used to create the page as an
RMarkdown file (we cover RMarkdown in more depth in module/zoom notes 4). When working
through the zoom notes, you can use this button to hide the exercise
solutions if you prefer to tackle them first on your own.
R and RStudio
R is a programming language. RStudio is an integrated development
environment (IDE) which provides a graphical interface (among other
things) for interacting with R.
The RStudio screen is divided into panels.
Panels of RStudio
- console
- source
- environment/history
- file/plot/help/viewer
Scripts
Scripts are text files that contain R code. Scripts provide a
persistent record of the steps we perform in our data analysis, and can
be used to recreate what we have done without having to retype.
Scripts can be either:
R Script - plain text document that contains R commands
RMarkdown - a plain text document that has text areas and code areas.
RMarkdown files are processed/compiled to an output format such as html
or pdf which contains formatted text, code, and the results of the code
embedded in the document. We’ll cover this in more depth in Module/Handout 4.
R Syntax
General syntax info:
- R is case sensitive (e.g. weight and Weight are different entities).
- Spaces around operators are ignored in assignments and expressions. For
example, 8+3 and 8 + 3 are functionally equivalent. However, generous use
of white space is encouraged to improve readability.
- Variable and function names in R conventionally use snake case. All
letters are lower case, and words are separated by underscores (e.g.
annual_mean_rainfall).
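The first two points above can be checked directly in the console. This is a minimal sketch; the variable names and values are made up for illustration.

```r
# Names that differ only in case refer to different objects
weight <- 70
Weight <- 82
weight == Weight
## [1] FALSE

# Spacing around an operator does not change the result
8+3 == 8 + 3
## [1] TRUE
```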
Mathematical operators
Addition: +
Subtraction: -
Multiplication: *
Division: /
Exponentiation: ^
Modulo (remainder): %%
# Addition
1 + 2
## [1] 3
# Subtraction
3 - 6
## [1] -3
# Multiplication
4 * 2
## [1] 8
# Division
12 / 3
## [1] 4
# Exponentiation
2 ^ 5
## [1] 32
# Modulo
5 %% 2
## [1] 1
Data types
R has three primary types of data:
- numeric
- character
- logical (boolean)
Numeric data can be integer (whole numbers) or “double” (numbers that can
have a decimal part). R stores a number as a double by default, even when
it is written without a decimal part.
Character data items are composed of one or more letters, numbers,
or (permitted) symbols. In programming jargon, these are called “strings”.
Strings are defined in R by enclosing them in quotation marks
(e.g. "Emperor Penguin").
Booleans are logical data. They can take on only the values TRUE or
FALSE, and are used for logical operations.
R functions (see discussion below) accept input data of specific
type(s). Applying a function or operation to the “wrong” type of data
will cause an error.
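The sketch below shows one way to inspect these types with typeof, and what happens when an operation receives the wrong type. The L suffix (which asks R for an integer) and tryCatch (which captures the error so the code can keep running) are not covered in this module; they are used here only for illustration.

```r
typeof(42.5)
## [1] "double"
typeof(42)                  # no decimal part, but still stored as a double
## [1] "double"
typeof(42L)                 # the L suffix requests an integer
## [1] "integer"
typeof("Emperor Penguin")
## [1] "character"
typeof(TRUE)
## [1] "logical"

# Adding a number to a string is an error, because "8" is character data.
# tryCatch() captures the error instead of stopping the script.
result <- tryCatch("8" + 3, error = function(e) e)
inherits(result, "error")
## [1] TRUE
```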
Variables
Variables are named objects in which we store data. The stored values
can be referenced/used later in our code.
Variable names must:
- Start with a letter
- Contain only letters, numbers, . (full stop) and _ (underscore)
Variable names should always be descriptive of their contents. This
will help you remember what data they hold when they are used in the
code.
Assignment operators
In R, <- and = are both used for assignment. <- is used to assign a value
to a variable. = is used inside a function call to assign a value to a
function input.
The right-hand side of an assignment statement is evaluated first.
The result is then assigned to the variable on the left-hand side. Thus
the statement x <- x * 2 means “First multiply the current value of x by
2. Take the result of that operation and assign it as the new value of x.”
Note that variables only contain the last thing that was
assigned to them. Assigning a value to a variable overwrites
any existing value.
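As a small illustration of the points above, the following sketch assigns, updates, and then overwrites a variable. The name x and its values are arbitrary.

```r
x <- 5        # assign the value 5 to x
x <- x * 2    # the right-hand side is evaluated first, then stored in x
x
## [1] 10

x <- "ten"    # assigning again overwrites the previous value
x
## [1] "ten"

# Inside a function call, = matches a value to a named input
round(3.14159, digits = 2)
## [1] 3.14
```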
Functions
Functions are encapsulated, named chunks of code. We
call a function by typing its name, followed by (). We
provide any required data inputs (called function arguments in
this context) by placing them between the round brackets. When a
function is called, R executes the code which it encapsulates and
returns the result.
For example, if we want to find the square root of a number, we can
use the sqrt function and provide the number as input:
sqrt(64)
## [1] 8
Many functions require multiple pieces of input data. Multiple
arguments are placed between the round brackets, separated by
commas.
Each function argument has a name. Arguments can be identified by
name, or by ordinal position within the round brackets.
# by position
round(3.142, 1)
## [1] 3.1
# by name
round(x = 3.142, digits = 1)
## [1] 3.1
To see what arguments a function accepts, call the args function,
passing in the function name:
args(round)
## function (x, digits = 0)
## NULL
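In the args output, digits = 0 indicates that digits has a default value, so it can be left out of the call. The short sketch below, reusing the round example, shows the default in action and shows that named arguments can be given in any order.

```r
# digits is omitted, so its default value of 0 is used
round(3.142)
## [1] 3

# named arguments can appear in any order
round(digits = 2, x = 3.142)
## [1] 3.14
```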
R contains a great many powerful predefined functions. In one of our
later modules, we will learn how to write our own custom functions.
Comparison Operators
Comparison operators evaluate to either TRUE or FALSE.
Comparators:
- equality: ==
- not equal to: !=
- greater than: >
- greater than or equal to: >=
- less than: <
- less than or equal to: <=
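A few comparisons in action, as a minimal sketch. The variable rainfall_mm and its value are hypothetical.

```r
rainfall_mm <- 12.5

rainfall_mm == 12.5
## [1] TRUE
rainfall_mm != 12.5
## [1] FALSE
rainfall_mm > 10
## [1] TRUE
rainfall_mm <= 10
## [1] FALSE

# Comparisons also work on strings, and they are case sensitive
"kiwi" == "Kiwi"
## [1] FALSE

# The result of a comparison is logical data
typeof(rainfall_mm > 10)
## [1] "logical"
```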
Logical operators
We’ll cover these in more depth in the Selecting and Filtering Module.
Complex Data
Vectors
A vector is a complex data structure that holds multiple data
elements. Vectors are homogeneous – all their elements
must be of the same data type.
To create a vector, use the function c(), the combine function. Items
are separated by commas.
some_letters <- c("a", "b", "c")
some_letters
## [1] "a" "b" "c"
some_numbers <- c(2, 4, 6)
some_numbers
## [1] 2 4 6
We can see the data type of a vector using typeof:
typeof(some_letters)
## [1] "character"
typeof(some_numbers)
## [1] "double"
Subsetting by index
To access individual items in a vector, use [], the index operator, as
shown below. Place the item’s ordinal position between the square
brackets. Select multiple items by placing a vector of positions between
the square brackets.
some_numbers
## [1] 2 4 6
# pull out the second item
some_numbers[2]
## [1] 4
some_letters
## [1] "a" "b" "c"
# pull out items 1 and 3
some_letters[c(1,3)]
## [1] "a" "c"
We will see examples of more complex data structures and their use in
later modules.
Getting Help
To access the R documentation for a function, call help(), passing in
the function name, or type ? followed by the name of the function (no
intervening space).
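For instance, either of the following lines should open the documentation for the round function used earlier. How the page is displayed (Help panel, browser, or plain text) depends on how R is being run.

```r
# Both lines request the same help page
help(round)
?round
```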
Language documentation tends to be terse, and targeted to the
advanced user. Frequently you will want more detailed explanation than
is available from the built-in help. We recommend Google (my most common
search is “how to … in R”) or one of the many useful textbooks
available through the University library (see, for example, https://otago.primo.exlibrisgroup.com/permalink/64OTAGO_INST/qef3lj/alma9926179377401891).
Projects
Projects within RStudio provide a mechanism to organise your work. A
Project is a directory on your computer where you gather related code,
documentation, data, and outputs so that everything needed to recreate
an analysis is “bundled together”.
To create a new Project:
- File -> New Project -> New Directory -> New Project
- Choose where you want it to live and give it a name
- Click “Create Project”
It is useful to add some subfolders to your Project folder. A common
organisation is to have separate subfolders for your scripts, input
files, output files, and documentation.
You can create subfolders by using the “New Folder” button in the
Files panel or using the dir.create()
function, as shown
below.
dir.create("scripts")
dir.create("data")
dir.create("docs")
dir.create("outputs")
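Calling dir.create() on a folder that already exists produces a warning. The sketch below shows one way to avoid this, using the showWarnings argument, and checks the result with dir.exists(). It works in a temporary location so that it does not touch your real files; the folder name demo_project is hypothetical.

```r
# A throwaway project folder inside R's temporary directory (hypothetical name)
project_dir <- file.path(tempdir(), "demo_project")
dir.create(project_dir, showWarnings = FALSE)

# file.path() builds the path to each subfolder inside the project
dir.create(file.path(project_dir, "scripts"), showWarnings = FALSE)
dir.create(file.path(project_dir, "data"), showWarnings = FALSE)

# Confirm that a subfolder now exists
dir.exists(file.path(project_dir, "data"))
## [1] TRUE
```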
Module 01 Exercises
1. Create a vector of 5 numbers and assign it to a variable.
2. Find out the length of the vector.
3. Divide the entire vector by 2 and store the result into a variable
called div_2. Explain what happens when you perform a mathematical
operation on a vector.
4. Calculate the minimum, maximum, mean, and standard deviation for the
div_2 vector. Round each result to 2 decimal places.
5. Create and assign a vector of at least 4 animal names into animals.
6. Compute the number of characters in each item. Use only one line of
code.
7. Extract the first and fourth animal into a new variable.
8. Remove the third animal from your original animals vector. There are
multiple syntactically legal ways to achieve this in R, but some are more
elegant than others. (Hint: what does using a negative index do?)
9. Create a vector that has three copies of this updated animals vector.
Use only one line of code.
10. Combine your animal and number vectors together into a new variable
called coerced. Run typeof on this vector. Explain why the types of some
elements have been changed.
Example solutions
my_numbers <- c(12, 63, 3, 7, 84)
length(my_numbers)
## [1] 5
div_2 <- my_numbers / 2
div_2
## [1] 6.0 31.5 1.5 3.5 42.0
# When a mathematical operator is applied to a vector, the operation is
# performed on each individual vector element, and the whole set of new values
# is returned.
# Note the range of syntactic options. Strive for a balance between clarity and
# parsimony.
# minimum
min_number <- min(div_2)
round(min_number, digits = 2)
## [1] 1.5
# maximum
max_number <- max(div_2)
round(max_number, 2)
## [1] 42
# mean
round(mean(div_2), digits = 2)
## [1] 16.9
# standard deviation
round(sd(div_2), 2)
## [1] 18.57
animals <- c("lion", "tiger", "snake", "beetle", "turtle")
animals
## [1] "lion" "tiger" "snake" "beetle" "turtle"
nchar(animals)
## [1] 4 5 5 6 6
two_animals <- animals[c(1,4)]
two_animals
## [1] "lion" "beetle"
animals <- animals[-3]
animals
## [1] "lion" "tiger" "beetle" "turtle"
animals3 <- c(animals, animals, animals)
animals3
## [1] "lion" "tiger" "beetle" "turtle" "lion" "tiger" "beetle" "turtle"
## [9] "lion" "tiger" "beetle" "turtle"
my_numbers
## [1] 12 63 3 7 84
typeof(my_numbers)
## [1] "double"
animals
## [1] "lion" "tiger" "beetle" "turtle"
typeof(animals)
## [1] "character"
coerced <- c(my_numbers, animals)
typeof(coerced)
## [1] "character"
coerced
## [1] "12" "63" "3" "7" "84" "lion" "tiger" "beetle"
## [9] "turtle"
# The numeric elements have been coerced to type character so that
# the vector remains homogeneous.

Comments
The # symbol denotes a comment. R will ignore all text following the #
on the same line. Comments should explain the logic of the code. This
helps other people understand your code; it also helps you remember your
thinking when you revisit your own code in the future. Thorough
commenting is valued so highly in software development that many
programmers write their comments first, then fill in the code following
the structure defined by the comments.
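A short sketch of commented code follows. The variable names and the temperature value are made up for illustration.

```r
# Convert a temperature reading from Fahrenheit to Celsius
temp_f <- 68                       # reading in degrees Fahrenheit
temp_c <- (temp_f - 32) * 5 / 9    # subtract 32, then scale by 5/9
temp_c
## [1] 20
```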