Basic principle: STOP whenever you have an **error message**: it’s useless to continue!

A large majority of commands use an arrow “<-”. The following line

a <- b

means that the software will put value b inside variable a.
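As a small illustration of assignment (the variable names and values below are hypothetical, chosen only for this sketch):

```r
b <- 5     # store the value 5 in variable b
a <- b     # copy the current value of b into a
b <- 10    # changing b afterwards does not change a
a          # typing the name shows the content: a still holds 5
```

The arrow copies the value at the moment the line is run; a does not stay linked to b.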

You need to install a package only once, but you need to activate it each time you start a new R session. The hash sign (#) is used to append comments to the code.

```
if (!require("tidyverse")) install.packages('tidyverse') # This line to install, if it has not already been done.
library(tidyverse) # This line to activate. Note: quotes are unnecessary here.
```

R works in one particular folder, called the working directory. You can set it in the Files pane in RStudio or with the setwd() function. To see the current working directory, type getwd().
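A minimal sketch of checking and changing the working directory. The temporary folder returned by tempdir() is used here only as a stand-in for your own project folder:

```r
old_wd <- getwd()     # remember the current working directory
setwd(tempdir())      # switch to another folder (here: R's temporary folder)
getwd()               # shows the new working directory
setwd(old_wd)         # switch back to where we started
```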

There are two major kinds of items in R: the functions that you are going to use (like in Excel: sum(), min(), etc.) and the variables that you will manipulate. There is a **MAJOR** difference between the two! In terms of code, there is only one small (but important!) difference: functions work with *round* brackets () and data variables work with *square* brackets [].

For a function, for instance the square root function sqrt(), there is usually an argument inside the round brackets: it is the element on which the function will work (a few functions, such as getwd(), take none). sqrt(5) will produce the square root of five. For a variable, the numbers inside the square brackets will relate to indexing (more on that below).
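The contrast between the two kinds of brackets can be sketched as follows. The vector x is a hypothetical example, and c() simply groups the numbers together:

```r
x <- c(4, 9, 16)   # a variable holding three numbers
sqrt(x)            # round brackets: apply the function to x, giving 2 3 4
x[2]               # square brackets: second element of x, giving 9
sqrt(x[2])         # both together: square root of the second element, giving 3
```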

Importing data is usually done directly in the user interface, or with packages like *openxlsx* or *readxl* (to import Excel files) with the functions read.xlsx() or read_excel(), respectively. The basic case: test_data <- read.xlsx("MyFile.xlsx") or test_data <- read_excel("MyFile.xlsx").

This stores your data into the test_data variable. This assumes that the Excel file “MyFile.xlsx” exists in your working directory.
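If no Excel file is at hand, the same import pattern can be tried with base R's CSV functions. The sketch below is only a stand-in under stated assumptions: it writes a small made-up table to a temporary file and reads it back with read.csv() instead of read_excel().

```r
tmp_file <- tempfile(fileext = ".csv")    # hypothetical file, stands in for "MyFile.xlsx"
toy <- data.frame(id = 1:3, score = c(10, 20, 30))   # made-up data
write.csv(toy, tmp_file, row.names = FALSE)          # create the file on disk
file.exists(tmp_file)                     # check that the file is really there
test_data <- read.csv(tmp_file)           # same pattern: variable <- import_function(path)
test_data
```

Checking file.exists() first is a quick way to find out whether an import error comes from a wrong path or working directory.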

You can create data from scratch, using the colon operator for instance.

`1:10`

` [1] 1 2 3 4 5 6 7 8 9 10`

`3:17`

` [1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17`

More generally, the c() function concatenates and encapsulates numbers (or text):

`c(2,5,7)`

`[1] 2 5 7`

`c(1:6,12:20)`

` [1] 1 2 3 4 5 6 12 13 14 15 16 17 18 19 20`

`c("R", " is ", "awesome")`

`[1] "R" " is " "awesome"`

Another way to create data is to stack vectors with the row-bind and column-bind functions rbind() and cbind().

`rbind(c(2,5,7),c(3,1,8)) `

```
     [,1] [,2] [,3]
[1,]    2    5    7
[2,]    3    1    8
```

`cbind(c(2,5,7),c(3,1,8)) `

```
     [,1] [,2]
[1,]    2    3
[2,]    5    1
[3,]    7    8
```

You can also fill in matrices:

```
m <- matrix(1:20, nrow = 4)
m2 <- matrix(1:20, nrow = 4, byrow = TRUE) # Two ways to fill: by column (default) or by row
m
```

```
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
```

`m2`

```
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
```

R is great for generating random data.

`runif(10) # uniform distribution: 10 samples`

` [1] 0.06688124 0.82541667 0.28137797 0.11063758 0.53751033 0.38189383 0.08916188 0.86453049 0.61392825 0.68673409`

`rnorm(20) # Gaussian distribution (parameters could be specified, see online manual): 20 data points`

```
[1] 1.58870173 0.86578773 0.13029713 -0.97993171 -1.64267374 -0.40144518 0.43771547 -0.16043997 -1.27166377 1.09629147
[11] 0.51176613 1.41954010 -0.06755030 -0.14813283 0.40070418 0.36982084 0.70984827 0.49217370 0.69688766 0.02829686
```
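Random draws change on every call. If reproducible numbers are needed, base R's set.seed() fixes the state of the generator. In the sketch below, the seed value 42 and the parameters passed to rnorm() are arbitrary choices:

```r
set.seed(42)                   # arbitrary seed, any integer works
x1 <- runif(3)                 # three uniform draws
set.seed(42)                   # reset the generator to the same seed
x2 <- runif(3)                 # the same three draws again
identical(x1, x2)              # TRUE: same seed, same numbers
rnorm(5, mean = 10, sd = 2)    # Gaussian draws with a chosen mean and standard deviation
```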

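Datasets often mix text and numbers. R can do that too, with data frames. Let’s create one with the data.frame() function. We use the round() function, which rounds numbers to the nearest integer by default. The %>% pipe (available once the tidyverse is loaded) passes the result on its left to the function on its right.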

```
nb_gender <- 7 # Number of people of each gender
Gender <- rep(c("Male"),nb_gender) # nb_gender men in total
Weight <- rnorm(nb_gender, mean = 70, sd = 8) %>% round() # in kilos
Height <- rnorm(nb_gender, mean = 178, sd = 10) %>% round() # in cm
Age <- rnorm(nb_gender, mean = 40, sd = 7) %>% round()
data <- data.frame(Gender,Weight,Height,Age) # data with only men
Gender <- rep(c("Female"),nb_gender) # nb_gender women in total
Weight <- rnorm(nb_gender, mean = 60, sd = 8) %>% round() # in kilos
Height <- rnorm(nb_gender, mean = 167, sd = 10) %>% round() # in cm
Age <- rnorm(nb_gender, mean = 40, sd = 7) %>% round()
data <- rbind(data, data.frame(Gender,Weight,Height,Age)) # grouping women with men
data
```

You can use rownames() or colnames() to get or set the names of rows or columns: colnames(data).

You can obtain the dimensions of a matrix or data frame with the dim() function: dim(data) returns the number of rows, then the number of columns. Each dimension can be obtained separately with nrow() and ncol(). For vectors, the number of elements can be found with the length() function.

`dim(data) # Be careful with this one: rows first, then columns`

`[1] 14 4`

`nrow(data) # Number of rows`

`[1] 14`

`ncol(data) # Number of columns`

`[1] 4`

`length(3:35) # Number of elements (best used for a vector)`

`[1] 33`

In R, it is useful to perform tests. For instance, given the sequence 1:12, we want to know which values are strictly greater than 6. The simple command 1:12>6 will provide the answer: the statement is false for the first six elements (1 to 6) and true for the last six (7 to 12).

`1:12>6`

` [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE`
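The result of a test is an ordinary vector of TRUE/FALSE values, so it can be stored and passed to other functions. A short sketch using the base R functions which() and sum(), with hypothetical variable names:

```r
x <- 1:12
test <- x > 6     # logical vector: FALSE for 1 to 6, TRUE for 7 to 12
which(test)       # positions where the test is TRUE: 7 8 9 10 11 12
sum(test)         # TRUE counts as 1, so this is the number of TRUE values: 6
x == 6            # equality is tested with a double equal sign
```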

Accessing the values of a variable can be done with the square brackets [] thanks to indexing. For instance, the value in the third row and second column of data is data[3,2].

When columns have names, it is possible to use them to isolate a particular column with the dollar $ operator:

`data$Age`

` [1] 42 29 34 37 45 38 45 40 54 46 36 35 44 28`

Another way to proceed is to leave the row index empty: since Height is the third column of data, data[,3] returns all of the third column, exactly like data$Height. Likewise, data[3,] will return all of the third row.

`data[,3] # Third column`

` [1] 165 184 188 177 158 180 174 168 154 169 151 171 173 163`

`data[3,] # Third row`
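The different ways of indexing can be compared side by side on a tiny table. The data frame df and its values are made up for this sketch:

```r
df <- data.frame(Name   = c("Ann", "Bob", "Cid"),
                 Age    = c(31, 45, 28),
                 Height = c(165, 180, 172))
df[2, 3]        # second row, third column: 180
df$Age          # column by name: 31 45 28
df[, 3]         # third column, all rows: 165 180 172
df[, "Height"]  # same column, selected by its name in square brackets
df[2, ]         # second row, all columns
```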

You can extract data with boolean vectors! For instance, if we want to select the people who are older than 42 years old: simple!

`data$Age>42`

` [1] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE`

will provide a TRUE/FALSE vector that flags the corresponding rows. To extract the data, you just need to select the right rows and all columns:

`data[data$Age>42,]`

Only the TRUE rows are kept. As we will see, the filter() function of the tidyverse does just that.
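A self-contained sketch of the same idea on a made-up table (df and its values are hypothetical). It also shows that several conditions can be combined with the & operator:

```r
df <- data.frame(Name = c("Ann", "Bob", "Cid", "Dee"),
                 Age  = c(31, 45, 28, 52))
keep <- df$Age > 42                 # FALSE TRUE FALSE TRUE
older <- df[keep, ]                 # keep rows 2 and 4, all columns
older$Name                          # "Bob" "Dee"
df[df$Age > 30 & df$Age < 50, ]     # both conditions must hold: Ann and Bob
```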

Writing to data frames, vectors, or matrices can be done with the arrow operator:

```
data[3,2] <- 99
data[c(7,9),3] <- 166 # Replace 2 cells at a time! Seventh and ninth row on the third column.
data[c(6,8),3] <- c(199,177) # Same, but with 2 different values.
data # CHECK where the new values are!
```
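The same writing rules apply to plain vectors, and the position can also be given by a test. A small sketch with a hypothetical vector v:

```r
v <- c(10, 20, 30, 40, 50)
v[2] <- 99               # replace one element
v[c(4, 5)] <- c(0, 1)    # replace two elements with two different values
v[v > 50] <- 50          # replace every element above 50 (here only the 99)
v                        # 10 50 30 0 1
```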

Unlike in Excel, the data is not directly shown in R. You have to ask for it! To see the content of a variable, you have to type its name and press ENTER.

The head() function shows the first 6 lines and the tail() function shows the last 6 lines.

`head(data, 8) # First n lines, with n = 6 by default`

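The summary() function very often gives useful (statistical) information. Note that since R 4.0, text columns such as Gender are kept as character rather than converted to factors by default, so their summary may show the length and class instead of counts.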

`summary(data) # Descriptive statistics`

```
    Gender      Weight          Height           Age
 Male  :7   Min.   :48.00   Min.   :151.0   Min.   :28.00
 Female:7   1st Qu.:59.25   1st Qu.:165.2   1st Qu.:35.25
            Median :65.50   Median :170.0   Median :39.00
            Mean   :68.07   Mean   :171.9   Mean   :39.50
            3rd Qu.:71.75   3rd Qu.:177.0   3rd Qu.:44.75
            Max.   :99.00   Max.   :199.0   Max.   :54.00
```

The best package for date management is lubridate. Dates can be converted with the base R function as.Date(), and years, months and days can be retrieved using lubridate's year(), month() and day() functions.

```
if(!require(lubridate)){install.packages("lubridate")}
library(lubridate)
d <- as.Date("2000-04-08")
year(d) # Gives the year
```

`[1] 2000`

`month(d) # Gives the month`

`[1] 4`

`day(d) # Gives the day`

`[1] 8`

`make_date(year = 2017, month = 6, day = 12) # Creates a date with specified YMD`

`[1] "2017-06-12"`
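Once text has been converted to dates, arithmetic works directly on them. The sketch below uses only base R (no lubridate needed), and the two dates are arbitrary examples:

```r
d1 <- as.Date("2000-04-08")
d2 <- as.Date("2000-05-01")
d2 - d1                 # time difference between the two dates: 23 days
as.numeric(d2 - d1)     # the same difference as a plain number: 23
d1 + 30                 # adding days gives a new date: "2000-05-08"
format(d1, "%Y")        # extract the year as text: "2000"
d1 < d2                 # dates can be compared like numbers: TRUE
```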

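Filtering rows is incredibly easy via filter() from the tidyverse. The %in% operator can be useful when testing for several values. The diamonds dataset used below ships with ggplot2, which is loaded along with the tidyverse.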

`filter(data, Age > 42) # All people older than 42`

`filter(data, Gender == "Male", Weight > 70) # All guys heavier than 70 `

`filter(diamonds, cut %in% c("Fair", "Good")) # Diamonds with Fair or Good cut`

`filter(diamonds, color %in% c("E", "F")) # Diamonds with E or F color`
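To see what %in% does on its own, here is a base R sketch on a made-up vector (color_codes and wanted are hypothetical names). The operator returns one TRUE/FALSE per element on its left:

```r
color_codes <- c("D", "E", "F", "G", "E")
wanted <- c("E", "F")
color_codes %in% wanted                   # FALSE TRUE TRUE FALSE TRUE
color_codes[color_codes %in% wanted]      # "E" "F" "E"
color_codes == "E" | color_codes == "F"   # same test written with "or": longer for many values
```

Inside filter(), the test is used the same way: only the rows where it is TRUE are kept.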