1 Introduction

This notebook presents the first empirical study from the paper "Forking paths in empirical studies", available on SSRN.
The idea of the paper is to decompose any empirical analysis into a series of steps, which are pompously called mappings in the paper.
Each step matters and can have an important impact on the outcome of the study.
Because of that, we argue that the researcher should, if possible, keep track of all the possible modelling options and present the distribution of outcomes (e.g., t-statistics or p-values) across all configurations.
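To make the idea concrete, here is a minimal sketch in base R of what "a distribution of outcomes across configurations" can look like. Everything in it is illustrative: the data are simulated, and the two design choices (a winsorization level `winsor_q` and a sample start `start`) as well as the helper `run_path` are hypothetical names chosen for this toy example, not the mappings used in the paper.

```r
# Toy illustration on synthetic data (NOT the predictor data used later).
set.seed(42)
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- 0.1 * toy$x + rnorm(n)

# Two hypothetical design choices; each row of the grid is one "path".
configs <- expand.grid(
  winsor_q = c(0, 0.01, 0.05),  # winsorization level applied to the predictor
  start    = c(1, 51, 101)      # first observation kept in the sample
)

# Run the whole pipeline for one configuration; return the slope t-statistic.
run_path <- function(winsor_q, start) {
  d <- toy[start:n, ]
  bounds <- quantile(d$x, c(winsor_q, 1 - winsor_q))
  d$x <- pmin(pmax(d$x, bounds[1]), bounds[2])  # clip the predictor at the bounds
  fit <- lm(y ~ x, data = d)
  summary(fit)$coefficients["x", "t value"]
}

# One t-statistic per configuration, then a summary of their distribution.
configs$t_stat <- mapply(run_path, configs$winsor_q, configs$start)
summary(configs$t_stat)
```

With only two choices the grid has 9 paths; each additional step multiplies that number, which is why reporting the whole distribution is more informative than reporting a single path.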

Below, some code chunks are shown because they may be insightful, while others are hidden (especially those that produce plots).
It is easy to access all code content by clicking on the related buttons.

2 Data

First, we load the libraries and test the data import.
The data, downloaded from Amit Goyal's website, is stored in a GitHub repo.
It was used in the follow-up paper "A Comprehensive Look at the Empirical Performance of Equity Premium Prediction II".
To avoid downloading the data again later in the notebook, we import it once here, with one data frame per sheet (monthly, quarterly & yearly data).

library(tidyverse)   # Data wrangling & plotting
library(readxl)      # To read excel files
library(zoo)         # For data imputation
library(DescTools)   # For winsorization
library(sandwich)    # HAC estimator
library(lmtest)      # Statistical inference
library(furrr)       # Parallel computing
library(viridis)     # Color palette
library(patchwork)   # Graph layout
library(xtable)      # LaTeX exports
library(reshape2)    # List management
library(stabledist)  # For stable distributions
library(ptsuite)     # For tail estimation

loadWorkbook_url <- function(sheet) { # Function that downloads the data from online file
    url <- "https://github.com/shokru/coqueret.github.io/blob/master/files/misc/PredictorData2021.xlsx?raw=true"
    temp_file <- tempfile(fileext = ".xlsx")
    download.file(url = url, destfile = temp_file, mode = "wb", quiet = TRUE)
    read_excel(temp_file, sheet = sheet)
}

data_month <- loadWorkbook_url(1)    # Dataframe for monthly data
data_quarter <- loadWorkbook_url(2)  # Dataframe for quarterly data
data_year <- loadWorkbook_url(3)     # Dataframe for annual data

3 Mappings

Next, we code the modules. They correspond to the \(f_j\) mappings in the paper.
The chaining of mappings follows the scheme below: