Basic principle: STOP whenever you get an error message. It is useless to continue until it is fixed!
A large majority of commands use an arrow “<-”. The following line

a <- b

means that R will put the value of b inside the variable a.
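A minimal illustration of assignment, with toy values chosen only for this example: the arrow stores a value, and typing a variable's name afterwards prints it.

```r
b <- 5   # Store the value 5 in the variable b
a <- b   # Put the value of b inside the variable a
a        # Typing the name prints the value: [1] 5
```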

1 - BASICS

Packages

You need to install a package only once, but you need to load (activate) it each time you start a new R session. The hash sign # is used to add comments to the code: everything after it on a line is ignored by R.

if (!require("tidyverse")) install.packages("tidyverse") # Installs the package only if it is not already installed
library(tidyverse)                                       # Loads (activates) the package. Note: quotes are unnecessary here.

Working directory

R works in one particular folder, called the working directory. You can set it from the Files pane in RStudio (More > Set As Working Directory), or with the setwd() function. To see the current working directory, type getwd().
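A short sketch of checking and changing the working directory from code. The temporary folder returned by tempdir() is only an illustrative target; in practice you would pass the path of your own project folder to setwd().

```r
old_wd <- getwd()   # Remember the current working directory
print(old_wd)
setwd(tempdir())    # Switch to R's temporary folder (illustrative choice)
print(getwd())      # The working directory has changed
setwd(old_wd)       # Switch back to where we started
```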

Variables vs functions

There are two major kinds of objects in R: the functions that you are going to use (like in Excel: sum(), min(), etc.) and the variables that you will manipulate. There is a MAJOR difference between the two! In terms of syntax, there is only one small (but important!) difference: functions work with round brackets () and data variables work with square brackets [].
For a function, for instance the square root function sqrt(), the round brackets usually contain at least one argument: the element on which the function will work. sqrt(5) will produce the square root of five. (A few functions, such as getwd(), need no argument, but the brackets are still required.) For a variable, the numbers inside the square brackets relate to indexing (more on that below).
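To make the difference concrete, here is a small example with a toy vector x (a name chosen just for this illustration): round brackets call a function on its argument, square brackets pick elements out of a variable.

```r
sqrt(5)             # Function call: prints 2.236068
x <- c(10, 20, 30)  # A variable holding three numbers
x[2]                # Indexing: the second element, 20
x[c(1, 3)]          # Several elements at once: 10 30
```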

Importing data

This is usually done directly in the user interface (in RStudio, with the Import Dataset button of the Environment pane), or with packages like openxlsx or readxl (to import Excel files), using their read.xlsx() or read_excel() function respectively. The basic case: test_data <- read.xlsx("MyFile.xlsx") or test_data <- read_excel("MyFile.xlsx").
This stores your data in the test_data variable. It assumes that the Excel file "MyFile.xlsx" exists in your working directory and that the corresponding package has been loaded with library().
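If readxl or openxlsx is not installed yet, the same workflow can be tried with a CSV file and base R's read.csv(), which needs no extra package. This is a swapped-in technique for illustration only: the file name and its content are hypothetical, and the file is written to a temporary folder first so that the example is self-contained.

```r
csv_path <- file.path(tempdir(), "MyFile.csv")   # Hypothetical file location
write.csv(data.frame(x = 1:3, y = c(2.5, 4.1, 3.3)), csv_path, row.names = FALSE)
file.exists(csv_path)                             # TRUE: the file is where we expect it
test_data <- read.csv(csv_path)                   # Store the file content in test_data
test_data
```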

2 - CREATING DATA

Simple sequences

You can create data from scratch, using the colon operator for instance.

1:10
 [1]  1  2  3  4  5  6  7  8  9 10
3:17
 [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
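The colon operator always steps by 1, and it also counts down when the first number is the larger one. For other step sizes, base R's seq() function is the usual tool; a few examples:

```r
5:1                          # Counting down: 5 4 3 2 1
seq(0, 1, by = 0.25)         # Step of 0.25: 0.00 0.25 0.50 0.75 1.00
seq(2, 20, length.out = 4)   # 4 evenly spaced values: 2 8 14 20
```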

More generally, the c() function combines numbers (or text) into a single vector:

c(2,5,7)
[1] 2 5 7
c(1:6,12:20)
 [1]  1  2  3  4  5  6 12 13 14 15 16 17 18 19 20
c("R", " is ", "awesome")
[1] "R"       " is "    "awesome"
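One thing worth knowing about c(): a vector holds a single type, so mixing numbers and text turns everything into text. A small check with class():

```r
v <- c(1, 2, "three")   # Numbers mixed with text
v                       # "1" "2" "three": the numbers became text
class(v)                # "character"
class(c(2, 5, 7))       # "numeric"
```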

Another way to combine data is to use the row-bind and column-bind functions rbind() and cbind(), which assemble vectors into a matrix, as rows or as columns:

rbind(c(2,5,7),c(3,1,8)) 
     [,1] [,2] [,3]
[1,]    2    5    7
[2,]    3    1    8
cbind(c(2,5,7),c(3,1,8)) 
     [,1] [,2]
[1,]    2    3
[2,]    5    1
[3,]    7    8
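rbind() stacks its vectors as rows and cbind() places them side by side as columns, so each result is the transpose of the other. The sketch below checks this with base R's t() (transpose) function; v1 and v2 are simply names given here to the two vectors used above.

```r
v1 <- c(2, 5, 7)
v2 <- c(3, 1, 8)
rows <- rbind(v1, v2)      # 2 x 3 matrix; row names taken from the variable names
cols <- cbind(v1, v2)      # 3 x 2 matrix; column names taken from the variable names
identical(t(rows), cols)   # TRUE: transposing one gives the other
```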

You can also create matrices directly with matrix(), which fills them with a sequence of values:

m <- matrix(1:20, nrow = 4)                 # Filled column by column (the default)
m2 <- matrix(1:20, nrow = 4, byrow = TRUE)  # Filled row by row
m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
m2
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
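Since matrix() fills column by column by default, the byrow = TRUE version above holds the same values as the transpose of a 5-row matrix filled the default way. A quick check, which also shows that matrix() infers the number of columns (20 values / 4 rows = 5 columns):

```r
m <- matrix(1:20, nrow = 4)                 # Filled column by column
m2 <- matrix(1:20, nrow = 4, byrow = TRUE)  # Filled row by row
ncol(m)                                     # 5 columns, inferred from 20 / 4
identical(m2, t(matrix(1:20, nrow = 5)))    # TRUE
```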

R is great for generating random data.

runif(10) # uniform distribution: 10 samples
 [1] 0.06688124 0.82541667 0.28137797 0.11063758 0.53751033 0.38189383 0.08916188 0.86453049 0.61392825 0.68673409
rnorm(20) # Gaussian (normal) distribution, 20 data points: mean = 0 and sd = 1 by default, both can be changed
 [1]  1.58870173  0.86578773  0.13029713 -0.97993171 -1.64267374 -0.40144518  0.43771547 -0.16043997 -1.27166377  1.09629147
[11]  0.51176613  1.41954010 -0.06755030 -0.14813283  0.40070418  0.36982084  0.70984827  0.49217370  0.69688766  0.02829686
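Because these numbers are random, your output will differ from the one shown. A sketch of two common adjustments: setting the mean and standard deviation of rnorm(), and calling base R's set.seed() so that the same random numbers come out each time (the seed value 123 is arbitrary).

```r
set.seed(123)                         # Fix the random number generator
x1 <- rnorm(5, mean = 170, sd = 10)   # 5 values around 170, spread 10
set.seed(123)                         # Same seed again...
x2 <- rnorm(5, mean = 170, sd = 10)
identical(x1, x2)                     # TRUE: the draws are reproducible
```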

Dataframes

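Datasets often mix text and numbers. R can handle that too, with data frames. Let's create one with the data.frame() function. We use the round() function, which rounds numbers to the nearest integer, and the pipe operator %>% (available once the tidyverse is loaded), which passes the value on its left as the first argument of the function on its right: x %>% round() is the same as round(x).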

nb_gender <- 7                                                # Number of people of each gender
Gender <- rep("Male", nb_gender)                              # nb_gender men in total
Weight <- rnorm(nb_gender, mean = 70, sd = 8) %>% round()     # in kilos
Height <- rnorm(nb_gender, mean = 178, sd = 10) %>% round()   # in cm
Age <- rnorm(nb_gender, mean = 40, sd = 7) %>% round()        # in years
data <- data.frame(Gender, Weight, Height, Age, stringsAsFactors = TRUE)   # data with only men; Gender stored as a factor
Gender <- rep("Female", nb_gender)                            # nb_gender women in total
Weight <- rnorm(nb_gender, mean = 60, sd = 8) %>% round()     # in kilos
Height <- rnorm(nb_gender, mean = 167, sd = 10) %>% round()   # in cm
Age <- rnorm(nb_gender, mean = 40, sd = 7) %>% round()        # in years
data <- rbind(data, data.frame(Gender, Weight, Height, Age, stringsAsFactors = TRUE))  # grouping women with men
data
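A smaller, fixed data frame can be handy for experimenting without random numbers. A minimal sketch with hypothetical values; str() shows that each column keeps its own type, text for one and numbers for the others:

```r
people <- data.frame(Name = c("Ann", "Bob", "Cid"),   # Hypothetical toy data
                     Age = c(31, 45, 50),
                     Height = c(165, 180, 175))
str(people)   # One line per column, with its type
```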

You can use rownames() or colnames() to get or set the names of rows or columns: colnames(data).
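colnames() both reads and rewrites the names: assigning a vector of new names to it renames the columns, and rownames() works the same way for rows. A toy example (the data frame and the new names are illustrative only):

```r
df <- data.frame(a = 1:2, b = 3:4)
colnames(df)                          # "a" "b"
colnames(df) <- c("First", "Second")  # Rename both columns
rownames(df) <- c("row1", "row2")     # Name the rows too
df
```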

Dimensions

You can obtain the dimensions of a matrix or data frame with the dim() function: dim(data) returns the number of rows, then the number of columns. Each dimension can be obtained separately with nrow() and ncol(). For vectors, the number of elements is given by the length() function.

dim(data)  # Number of rows, then number of columns
[1] 14  4
nrow(data) # Number of rows
[1] 14
ncol(data) # Number of columns
[1] 4
length(3:35) # Number of elements (best used for a vector)
[1] 33
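dim() only applies to objects with rows and columns: on a plain vector it returns NULL, which is why length() is the right tool there. On a matrix, length() counts every cell. A quick check:

```r
v <- 3:35
dim(v)                        # NULL: a vector has no rows or columns
length(v)                     # 33
m <- matrix(1:20, nrow = 4)
dim(m)                        # 4 5
length(m)                     # 20 = 4 * 5 cells
```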

Boolean (TRUE/FALSE) data

In R, it is useful to perform tests. For instance, given the sequence 1:12, we want to know which values are strictly greater than 6. The simple command 1:12 > 6 provides the answer: the statement is FALSE for the first six elements (1 to 6) and TRUE for the last six (7 to 12).

1:12>6
 [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
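TRUE/FALSE vectors combine well with other functions: sum() counts the TRUE values (each TRUE counts as 1), which() returns the positions of the TRUE values, and & (and) or | (or) combine several tests. A sketch on the same sequence:

```r
x <- 1:12
sum(x > 6)       # 6 values are greater than 6
which(x > 6)     # Their positions: 7 8 9 10 11 12
x > 3 & x < 6    # TRUE only for 4 and 5
x < 2 | x > 11   # TRUE only for 1 and 12
```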

3 - HANDLING DATA IN PURE R

Extracting data

Accessing the values of a variable can be done with the square brackets [] thanks to indexing. For instance, the value in the third row and second column of data is data[3,2].
When columns have names, you can use them to isolate a particular column with the dollar $ operator:

data$Age
 [1] 42 29 34 37 45 38 45 40 54 46 36 35 44 28

Another way to proceed is to leave the row index empty: since Height is the third column of data, data[,3] gives the same result as data$Height, i.e. all of the third column. Likewise, data[3,] returns all of the third row.

data[,3] # Third column
 [1] 165 184 188 177 158 180 174 168 154 169 151 171 173 163
data[3,] # Third row

You can extract data with boolean vectors! For instance, to select the people who are older than 42, start with a simple test:

data$Age>42
 [1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE

This test gives one TRUE/FALSE value per row (not row numbers). To extract the data, you just need to select the TRUE rows and all columns:

data[data$Age>42,]

Only the TRUE rows are kept. As we will see, the filter() function of the tidyverse does just that.
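The same logic works on any data frame. A minimal base R sketch on hypothetical toy data, with two tests combined by &:

```r
people <- data.frame(Name = c("Ann", "Bob", "Cid", "Dee"),   # Hypothetical toy data
                     Age = c(31, 45, 50, 44),
                     Weight = c(58, 82, 68, 71))
people$Age > 42                                  # FALSE TRUE TRUE TRUE
people[people$Age > 42, ]                        # Rows of Bob, Cid and Dee, all columns
people[people$Age > 42 & people$Weight > 70, ]   # Both tests TRUE: Bob and Dee
```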

Writing / Replacing values

Values in data frames, vectors or matrices can be overwritten by combining indexing with the arrow operator:

data[3,2] <- 99
data[c(7,9),3] <- 166        # Replace 2 cells at a time! Seventh and ninth rows of the third column.
data[c(6,8),3] <- c(199,177) # Same, but with 2 different values. 
data                         # CHECK where the new values are!
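The same indexing-plus-arrow pattern works on vectors, including with a TRUE/FALSE test inside the brackets to overwrite every matching value at once. A toy sketch:

```r
x <- c(3, 8, 1, 9)
x[2] <- 100     # Replace one element: 3 100 1 9
x[x > 5] <- 0   # Replace every value greater than 5 by 0
x               # 3 0 1 0
```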

Seeing data

Unlike in Excel, the data is not directly shown in R. You have to ask for it! To see the content of a variable, you have to type its name and press ENTER.
The head() function shows the first lines of a data frame and the tail() function shows the last lines (6 by default).

head(data, 8) # First 8 lines (n = 6 by default)
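head() and tail() work on vectors as well as data frames, and their second argument sets how many elements to show. A quick illustration on the sequence 1:20:

```r
head(1:20)      # 1 2 3 4 5 6 (6 by default)
tail(1:20, 3)   # 18 19 20
```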

The summary() function very often gives useful (statistical) information:

summary(data) # Descriptive statistics
    Gender      Weight          Height           Age       
 Male  :7   Min.   :48.00   Min.   :151.0   Min.   :28.00  
 Female:7   1st Qu.:59.25   1st Qu.:165.2   1st Qu.:35.25  
            Median :65.50   Median :170.0   Median :39.00  
            Mean   :68.07   Mean   :171.9   Mean   :39.50  
            3rd Qu.:71.75   3rd Qu.:177.0   3rd Qu.:44.75  
            Max.   :99.00   Max.   :199.0   Max.   :54.00  
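summary() reports both the median and the mean, and comparing them can reveal outliers. A small sketch on a made-up vector where one extreme value pulls the mean far above the median:

```r
s <- summary(c(1, 2, 3, 4, 100))
s               # Median 3, Mean 22: the value 100 pulls the mean up
s[["Median"]]   # 3
s[["Mean"]]     # 22
```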

Date management

A popular package for date management is lubridate. Text can be converted to dates with base R's as.Date() function, and years, months and days can be retrieved with lubridate's year(), month() and day() functions.

if(!require(lubridate)){install.packages("lubridate")}
library(lubridate)
d <- as.Date("2000-04-08")
year(d)  # Gives the year
[1] 2000
month(d) # Gives the month
[1] 4
day(d)   # Gives the day
[1] 8
make_date(year = 2017, month = 6, day = 12) # Creates a date with specified YMD
[1] "2017-06-12"
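Base R can already do simple arithmetic on Date objects: subtracting two dates gives a number of days, and adding a number moves a date forward by that many days. A short sketch (format() with "%Y" is a base R alternative to lubridate's year()):

```r
d1 <- as.Date("2000-04-08")
d2 <- as.Date("2000-05-08")
d2 - d1                # Time difference of 30 days
as.numeric(d2 - d1)    # 30
d1 + 7                 # "2000-04-15"
format(d1, "%Y")       # "2000"
```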

4 - HANDLING DATA WITH THE TIDYVERSE

Manipulation

Filtering rows is very easy with filter(): it keeps the rows for which all the conditions are TRUE (conditions separated by commas must all hold). The %in% operator is useful when testing for several values. The diamonds dataset used below comes with ggplot2, which is part of the tidyverse.

filter(data, Age > 42)                        # All people older than 42
filter(data, Gender == "Male", Weight > 70)   # All men heavier than 70 kg
filter(diamonds, cut %in% c("Fair", "Good"))  # Diamonds with Fair or Good cut
filter(diamonds, color %in% c("E", "F"))      # Diamonds with E or F color
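For comparison, base R's subset() function (a swapped-in alternative, not part of the tidyverse) accepts the same kind of conditions as filter(), combined here explicitly with &. The sketch also shows what %in% does on its own: it checks, for each value on its left, whether it belongs to the set on its right. The data frame is hypothetical toy data.

```r
people <- data.frame(Gender = c("Female", "Male", "Male", "Female"),   # Hypothetical toy data
                     Weight = c(58, 82, 68, 71))
c("E", "G", "F") %in% c("E", "F")                # TRUE FALSE TRUE
subset(people, Gender == "Male" & Weight > 70)   # Base R counterpart of filter(people, Gender == "Male", Weight > 70)
subset(people, Weight %in% c(58, 71))            # Rows whose Weight is 58 or 71
```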