1 Introduction

This notebook presents the first empirical study of the paper "Forking paths in empirical studies", available on SSRN.
The paper has had several rounds of review, which is why some of the material may not be present in some versions of the article. We provide everything for the sake of completeness.


The idea of the paper is to decompose any empirical analysis into a series of steps, which are pompously called mappings in the paper.
Each step matters and can have an important impact on the outcome of the study.
Because of that, we argue that the researcher should, if possible, keep track of all the possible modelling options and present the distribution of outcomes (e.g., t-statistics or p-values) across all configurations.
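To make the idea concrete, here is a minimal sketch on simulated data, written in base R so that it runs on its own. The design choices (`winsor_level`, `sample_start`), the helper `run_path` and the toy data are hypothetical and are not the options studied in the paper. The sketch only shows the mechanics: enumerate all configurations, run the same regression along each path, and collect the resulting t-statistics.

```r
set.seed(42)                                    # Reproducible toy data
n <- 200
toy <- data.frame(x = rnorm(n))
toy$y <- 0.1 * toy$x + rnorm(n)                 # Weak true effect of x on y

# Hypothetical design choices (NOT the paper's actual options)
configs <- expand.grid(
  winsor_level = c(0, 0.01, 0.05),              # Share winsorized in each tail of x
  sample_start = c(1, 51, 101),                 # Index of the first observation kept
  stringsAsFactors = FALSE
)

# One path: apply the choices, run the regression, return the slope's t-statistic
run_path <- function(winsor_level, sample_start, data) {
  d <- data[sample_start:nrow(data), ]          # Subsample choice
  bounds <- quantile(d$x, probs = c(winsor_level, 1 - winsor_level))
  d$x <- pmin(pmax(d$x, bounds[1]), bounds[2])  # Winsorize the predictor
  fit <- lm(y ~ x, data = d)
  summary(fit)$coefficients["x", "t value"]
}

# Evaluate every configuration (3 x 3 = 9 paths)
configs$t_stat <- mapply(run_path, configs$winsor_level, configs$sample_start,
                         MoreArgs = list(data = toy))

summary(configs$t_stat)                         # Distribution of outcomes across paths
```

Reporting the whole vector `configs$t_stat` (or a histogram of it), rather than the single most favourable value, is the kind of transparency the paper argues for.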

Below, most code chunks are hidden to ease readability.
It is easy to access all code content by clicking on the related buttons.

2 Data

First, we load the libraries and test the data import.
The data, downloaded from Amit Goyal's website, is stored on a GitHub repo.
It was used in the follow-up paper A Comprehensive Look at the Empirical Performance of Equity Premium Prediction II.
To avoid downloading the data again in later chunks, we import it only once here, for the three sheets (monthly, quarterly and yearly data).

library(tidyverse)   # Data wrangling & plotting
library(readxl)      # To read excel files
library(zoo)         # For data imputation
library(DescTools)   # For winsorization
library(tictoc)      # CPU time estimation
library(sandwich)    # HAC estimator
library(lmtest)      # Statistical inference
library(furrr)       # Parallel computing
library(xtable)      # LaTeX exports
library(patchwork)   # Plot combination
library(reshape2)    # List management

loadWorkbook_url <- function(sheet) { # Function that downloads the data from online file
  url <- "https://github.com/shokru/coqueret.github.io/blob/master/files/misc/PredictorData2021.xlsx?raw=true"
  temp_file <- tempfile(fileext = ".xlsx")
  download.file(url = url, destfile = temp_file, mode = "wb", quiet = TRUE)
  read_excel(temp_file, sheet = sheet)
}

data_month <- loadWorkbook_url(1)    # Dataframe for monthly data
data_quarter <- loadWorkbook_url(2)  # Dataframe for quarterly data
data_year <- loadWorkbook_url(3)     # Dataframe for annual data
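Note that `loadWorkbook_url` fetches the file on each call, so the three lines above trigger three downloads. If this is a concern, one possible variant is sketched below. The helper `download_once` is hypothetical (it is not part of the original notebook): it only downloads when no local copy exists, so all sheets can then be read from the same file.

```r
# Hypothetical helper: download a remote file only if no local copy exists yet
download_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile, mode = "wb", quiet = TRUE)
  }
  destfile                                      # Return the local path in all cases
}

# Possible usage (commented out because it requires network access and readxl):
# data_url <- "https://github.com/shokru/coqueret.github.io/blob/master/files/misc/PredictorData2021.xlsx?raw=true"
# local_copy <- download_once(data_url, file.path(tempdir(), "PredictorData2021.xlsx"))
# data_month   <- readxl::read_excel(local_copy, sheet = 1)
# data_quarter <- readxl::read_excel(local_copy, sheet = 2)
# data_year    <- readxl::read_excel(local_copy, sheet = 3)
```

The local copy lives in the session's temporary directory, so it is downloaded again in a fresh R session.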

3 Mappings

Next, we code the modules. They correspond to the \(f_j\) mappings in the paper.
The chaining of mappings follows the scheme below: