Time-series & cryptocurrencies

WARNING: this notebook relies on several R packages. Whenever one is called (for instance via a call such as "library(torch)"), this implies that the package has been installed, e.g., via install.packages("torch"). Hence, make sure the following packages are installed!

# install.packages("tidyverse") 
# install.packages(c("forecast", "quantmod", "crypto2", "RcppRoll", "rugarch", "fGarch"))
# install.packages(c("fable", "tsibble", "feasts", "keras", "highcharter", "plotly"))
library(tidyverse)

Crash course on time-series

Definitions & notations

Time-series are random variables indexed by time. Usually, they are written \(x_t\), or \(X_t\) or \(y_t\) or similar notations of this kind. The index \(t\) refers to time. The essential question in time-series is to determine the law of motion of the process, as well as its characteristics.

Most often, the process integrates some form of randomness, which is usually written \(e_t\), \(\epsilon_t\) or \(\varepsilon_t\). There are two common simplifying assumptions that are made with respect to these quantities, which are sometimes referred to as innovations or residuals:

  1. they are normally distributed (Gaussian law);
  2. they are independent from each other.

NOTE: in almost all models, innovations have zero mean, \(\mathbb{E}[e_t]=0\), hence in the case of Gaussian innovations, only the variance \(\sigma^2\) is specified. We will work under these (standard) assumptions below. Let’s generate a series of \(e_t\) and plot it.

N <- 120
sigma <- 0.2
x <- rnorm(N, mean = 0, sd = sigma)
data.frame(t = 1:N, x = x) |>
  ggplot(aes(x = t, y = x)) + geom_line()

mean(x)
[1] 0.02531184
sd(x)
[1] 0.2129533

Examples

The random walk

The simplest case is the white noise: \[x_t= e_t,\]

but this is not very interesting. A second important case is linked to the notion of random walk: \[x_t=x_{t-1}+e_t,\] and, iterating, we see that we have \(x_t=\sum_{s=0}^t e_s\) - where we assume a starting point at \(t=0\). Notably, we have \[\mathbb{E}[x_t]=\mathbb{E}\left[\sum_{s=0}^t e_s\right]=\sum_{s=0}^t\mathbb{E}\left[ e_s\right]=0.\]

Moreover, if the innovations are independent,

\[\mathbb{V}[x_t]=\mathbb{V}\left[\sum_{s=0}^t e_s\right]=\sum_{s=0}^t\mathbb{V}\left[ e_s\right]=t\sigma^2.\]

Hence, the variance increases linearly with time. If perturbations are not independent, things are a bit more complicated and the variance will depend on all the auto-covariances.

The covariance between two points in time is given by:

\[\mathbb{C}ov[x_t,x_{t+h}]=t\sigma^2, \quad h \ge 0.\] This is because the innovation terms are independent, hence the only non-null contributions are the \(t\) terms of the form \(\mathbb{C}ov[e_s,e_s]=\sigma^2\).

Hence, we see that both the variance and the auto-covariance depend on \(t\), i.e., on the present time. This makes the analysis complicated because, depending on the moment we consider, the distribution of \(x_t\) changes.
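Before simulating a single path, here is a quick Monte-Carlo sketch (using the N and sigma defined above) that checks the linear growth of the variance, \(\mathbb{V}[x_t]=t\sigma^2\); the number of paths is an arbitrary choice.

n_paths <- 5000                                                       # Number of simulated paths
paths   <- replicate(n_paths, cumsum(rnorm(N, mean = 0, sd = sigma))) # N x n_paths matrix of random walks
emp_var <- apply(paths, 1, var)                                       # Empirical variance at each date
plot(1:N, emp_var, type = "l", xlab = "t", ylab = "variance")
lines(1:N, (1:N) * sigma^2, col = "red")                              # Theoretical t * sigma^2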

Let’s simulate one trajectory.

set.seed(42) # This freezes the random number generator
x <- cumsum(rnorm(N, mean = 0, sd = sigma))
tibble(t = 1:N, x = x) |>
  ggplot(aes(x = t, y = x)) + geom_line()

The auto-regressive model

Let us introduce a small modification to the above process and consider: \[x_t=a+\rho x_{t-1}+e_t, \quad a \in \mathbb{R}, \quad |\rho|< 1.\] This process is called the Auto-Regressive model of order one and is written AR(1). Iterating, we have: \[x_t = a + \rho (a + \rho x_{t-2}+e_{t-1})+e_{t}\] and, iterating to infinity, \[x_t=a \sum_{k=0}^\infty\rho^k+ \sum_{k=0}^\infty \rho^ke_{t-k}=\frac{a}{1-\rho}+ \sum_{k=0}^\infty \rho^ke_{t-k},\] which explains why we need \(|\rho|<1\).

Again, we have

\[\mathbb{E}[x_t]=\frac{a}{1-\rho}, \] and when innovations are independent, \[\mathbb{V}[x_t]=\mathbb{V}\left[\sum_{k=0}^\infty \rho^ke_{t-k} \right]=\sum_{k=0}^\infty \mathbb{V}\left[\rho^ke_{t-k} \right]=\frac{\sigma^2}{1-\rho^2}.\]

In this case, we see that the mean and variance do not depend on \(t\)! (This follows directly from the infinite-series representation above.)

Moreover,

\[\mathbb{C}ov[x_t,x_{t+h}]=\mathbb{C}ov\left[\sum_{k=0}^\infty \rho^ke_{t-k} ,\sum_{j=0}^\infty \rho^je_{t+h-j} \right]=\sum_{k=0}^\infty \sum_{j=0}^\infty \rho^{k+j}\mathbb{C}ov\left[ e_{t-k}, e_{t+h-j} \right].\] Again, we need to focus on the cases when the time indices are equal, i.e., \(j=k+h\), which gives

\[\mathbb{C}ov[x_t,x_{t+h}]=\sigma^2 \sum_{k=0}^\infty \rho^{2k+h}=\sigma^2 \rho^h \sum_{k=0}^\infty \rho^{2k}=\sigma^2\frac{\rho^h}{1-\rho^2}.\] Again, this does not depend on the time index \(t\). If innovations are Gaussian, since Gaussian distributions are entirely defined by their first two moments, the full (unconditional) distribution of the process is invariant through time. This very convenient property is called stationarity, and it comes in several flavors (have a look online!). It is very suitable for analysis because we then know that the object we study (the underlying distribution) is stable in time.

The above quantity, which measures the link between different realizations of the same variable (at different points in time) is very important. Often, the covariance is scaled by the variance to yield the auto-correlation of the process. Relatedly, there is the auto-correlogram, which is defined, for stationary series, as

\[\gamma(h)=\mathbb{C}orr[x_t,x_{t+h}].\] For the AR(1), \(\gamma(h)=\rho^h\).

More generally, autoregressive models of higher orders (AR(p)) have memory that goes back further in time: \[x_t=a+\sum_{k=1}^p\rho_k x_{t-k}+e_t, \quad a \in \mathbb{R},\]

Even more generally, there are ARMA(p,q) processes, defined as \[x_t=a+\sum_{k=1}^p\rho_k x_{t-k} +\sum_{j=1}^q\delta_je_{t-j}+e_t, \quad a \in \mathbb{R},\]

but more on these is left to your curiosity.
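As a quick illustration (a sketch, not part of the original notes), an ARMA(1,1) can be simulated with arima.sim(), which we also use just below for AR(1) processes; the coefficients 0.6 and 0.3 are arbitrary choices.

x_arma <- arima.sim(list(order = c(1, 0, 1), ar = 0.6, ma = 0.3), n = N) # ARMA(1,1) with rho = 0.6, delta = 0.3
plot(x_arma, type = "l")                                                 # One simulated trajectory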

Let’s now simulate two AR(1) processes. We refer to the arima.sim() documentation for details on the implementation. The order is \((p,d,q)\), as in the ARMA(p,q) process (the \(d\) is for integration, a notion outside the scope of this course).

x_05 <- arima.sim(list(order = c(1,0,0), ar=.05), n = N)
x_95 <- arima.sim(list(order = c(1,0,0), ar=.95), n = N)
tibble(time = 1:N, x_05 = x_05, x_95 = x_95) |>
  pivot_longer(-time, names_to = "process", values_to = "values") |>
  ggplot(aes(x = time, y = values, color = process)) + geom_line() +
  theme_bw() +
  scale_color_manual(values = c("#DD9911", "#0066AA"))

One of the processes (x_05, with \(\rho=0.05\)) looks like plain white noise; the other (x_95, with \(\rho=0.95\)) evolves more slowly and oscillates much less.

Let’s have a look at the auto-correlogram of the series.

library(forecast) # For the ggAcf() function
ggAcf(x_95) + theme_bw() + 
  geom_function(fun = function(x) exp(x*log(0.95)), color = "#11DD33")

The values we extract from the sample are not exactly those predicted by the model. This comes from the size of the sample, which is too small. As it increases, we would get closer to a perfect fit (you can try!).
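As a sketch of this convergence, we can simulate a much longer AR(1) series and compare its empirical ACF with the theoretical \(\rho^h\) (the sample size of 100,000 is an arbitrary choice).

x_long   <- arima.sim(list(order = c(1, 0, 0), ar = .95), n = 100000) # Long AR(1) sample
acf_long <- acf(x_long, lag.max = 10, plot = FALSE)$acf[-1]           # Empirical ACF, lags 1 to 10
round(cbind(empirical = acf_long, theoretical = 0.95^(1:10)), 3)      # Much closer, though not identical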

ARCH models

In practice, simple autoregressive models can be too limited because some variables are not exactly stationary. To illustrate this statement, let’s have a look at a volatility index, the VIX. NOTE: usually, volatility is defined as the standard deviation of past returns. The VIX is slightly different, as it is based on forward-looking option prices. Still, it is empirically linked to the traditional volatility.

library(quantmod)
library(highcharter)
min_date <- "2000-01-01"                 # Starting date
max_date <- "2023-01-27"                 # Ending date
getSymbols("^VIX", src = 'yahoo',        # The data comes from Yahoo Finance
           from = min_date,              # Start date
           to = max_date,                # End date
           auto.assign = TRUE, 
           warnings = FALSE)
[1] "^VIX"
VIX <- VIX |> as.data.frame() |>
  rownames_to_column("date") |>
  mutate(date = as.Date(date))
head(VIX,7)                                   # Have a look at the result!
date VIX.Open VIX.High VIX.Low VIX.Close VIX.Volume VIX.Adjusted
2000-01-03 24.36 26.15 23.98 24.21 0 24.21
2000-01-04 24.94 27.18 24.80 27.01 0 27.01
2000-01-05 27.98 29.00 25.85 26.41 0 26.41
2000-01-06 26.68 26.71 24.70 25.73 0 25.73
2000-01-07 25.14 25.17 21.72 21.72 0 21.72
2000-01-10 21.89 22.49 21.36 21.71 0 21.71
2000-01-11 21.98 22.63 21.69 22.50 0 22.50
vix_chart <- VIX |> select(VIX.Close)
rownames(vix_chart) <- VIX |> pull(date)
highchart(type = "stock") %>%
    hc_title(text = "Evolution of the VIX") %>%
    hc_add_series(as.xts(vix_chart)) %>%
    hc_tooltip(pointFormat = '{point.y:.3f}') 

Hence, we see that the series exhibits moments of extreme activity and that its distribution is not the same through time. This is referred to as "volatility clustering".
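As a small complementary sketch (not in the original code), the autocorrelogram of the VIX level confirms this persistence.

ggAcf(na.omit(vix_chart$VIX.Close), lag.max = 60) + theme_bw() # Slowly decaying ACF of the VIX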

Therefore, we also need to introduce models that allow for this type of feature. In 1982, Robert Engle proposed such a model, called ARCH, which was generalized in 1986 to GARCH models. As we show below, there is a direct link with auto-regressive processes!

The idea is to work with the innovations \(e_t\) and to specify them a bit differently, namely as \(e_t=\sigma_t z_t\), where \(\sigma_t\) is a time-varying standard deviation and \(z_t\) is a simple white noise. What matters is that the volatility term is modelled as \[\sigma^2_t = a+\sum_{j=1}^pd_j\sigma^2_{t-j} + \sum_{k=1}^qb_ke^2_{t-k},\] hence, the value of \(\sigma_t^2\) depends both on its own past and on past squared innovations.
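To see what such a process looks like, here is a minimal hand-coded simulation of a GARCH(1,1), using the notation above (the parameter values are arbitrary choices).

set.seed(42)
n    <- 1000                          # Length of the simulated series
a    <- 0.0001                        # Constant term
b1   <- 0.10                          # Loading on past squared innovations
d1   <- 0.85                          # Loading on past variance
z    <- rnorm(n)                      # White noise z_t
sig2 <- numeric(n)
e    <- numeric(n)
sig2[1] <- a / (1 - b1 - d1)          # Initialize at the unconditional variance
e[1]    <- sqrt(sig2[1]) * z[1]
for (t in 2:n) {
  sig2[t] <- a + d1 * sig2[t - 1] + b1 * e[t - 1]^2
  e[t]    <- sqrt(sig2[t]) * z[t]
}
plot(e, type = "l")                   # Clusters of volatility should be visible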

References

For time-series

There are many resources available online. We single out two below:

For cryptocurrencies

Application to cryptocurrencies

For simplicity, we will work with daily data. Higher frequencies can be obtained outside the crypto space via the highfrequency package, or by playing with dedicated APIs (e.g., with binance/coinmarketcap packages).

Fetching data

We will use the crypto2 package below. If need be: install it.

library(lubridate)
library(crypto2)

First, let’s have a look at the list of available coins… which are numerous!

coins <- crypto_list() 

You can have a look at the info for the coins via the code below.

c_info <- crypto_info(coin_list = coins, limit = 30)
❯ Scraping crypto info
Scraping  https://web-api.coinmarketcap.com/v1/cryptocurrency/info?id=1,2,3,4,5,6,8,10,13,16,18,25,35,42,43,45,52,53,56,61,64,66,67,69,72,74,77,78,80,83  with  142  characters!
❯ Processing crypto info

Next, we can download historical quotes.

coin_symb <- c("BTC", "ETH", "DOGE", "USDT", "BNB", "USDC", "XRP")
coin_hist <- crypto_history(coins |> filter(symbol %in% coin_symb),
                            start_date = "20170101",
                            end_date = "20230325")
❯ Scraping historical crypto data
❯ Processing historical crypto data
coin_hist <- coin_hist |>  # Timestamps are at midnight, hence we add a day.
  mutate(date = 1 + as.Date(as.POSIXct(timestamp, origin="1970-01-01")))

And plot them. NOTE: mind the log-scale for the y-axis.
Also, we use plotly below for a better experience.

library(plotly)

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
g <- coin_hist |>
  ggplot(aes(x = date, y = close, color = name)) + geom_line() +
  scale_y_log10() + theme_bw() + xlab("")
ggplotly(g)

Now, the problem with some of these series is that they are NOT stationary. At the beginning of the sample, their values are much lower than those at the end of the sample. This is one of the reasons why, in finance, we look at returns: \[r_t=\frac{p_t-p_{t-1}}{p_{t-1}},\] which are simply relative price changes.
Let’s compute them below. We recall that prices are gathered in the close column.

coin_hist <- coin_hist |> 
  group_by(name) |>
  mutate(return = close / lag(close) - 1 )

And have a look at their distributions.

coin_hist |>
  ggplot(aes(x = return, fill = name)) + geom_histogram() +
  facet_wrap(vars(name), scales = "free") + theme_bw() +
  theme(axis.title = element_blank())

There seem to be outliers (to the left or right of the most common values)!
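To quantify these outliers, here is a quick sketch that reports the most extreme daily returns per coin.

coin_hist |>
  na.omit() |>
  group_by(name) |>
  summarise(min_return = min(return), max_return = max(return)) # Extreme daily moves per coin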

Estimations

Ok, so now let’s try to link what we have seen in the theoretical part with the data. This step is called estimation. Basically, if we think that the data follows \(x_t=bx_{t-1}+e_t\), we need to find \(b\)!

Returns

First, let’s have a look at the autocorrelograms of the returns.

crypto_acf <-
  coin_hist |> 
  na.omit() %>%
  split(.$name) %>%                       # One data frame per coin
  map(~acf(.$return, plot = F)) %>%       # ACF of returns, coin by coin
  map_dfr(
    ~data.frame(lag = .$lag, 
                acf = .$acf,
                ci = qnorm(0.975)/sqrt(.$n.used)  # 95% confidence band
    ), .id = "name") 
crypto_acf |> 
  filter(lag > 0, lag < 6) |>
  ggplot(aes(x = lag, y = acf, fill = name)) + 
  geom_col(position = "dodge") + theme_bw()

But recall that some of the coins are stablecoins (USDT, USDC)!
Overall, however, there does not seem to be a strong memory pattern. Hence, daily returns appear to be mostly white noise.
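As a complementary sketch, a Ljung-Box test on Bitcoin returns gives a more formal check of this "white noise" impression (Box.test() comes with base R).

coin_hist |>
  filter(name == "Bitcoin") |>
  na.omit() |>
  pull(return) |>
  Box.test(lag = 10, type = "Ljung-Box") # H0: no serial correlation up to lag 10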

Realized volatility

Then, let’s have a look at volatility. It’s measured as

\[v_t=\sqrt{\frac{1}{T-1}\sum_{s=1}^T(r_{t-s}-\bar{r})^2},\] where \(\bar{r}\) is the mean return in the sample of \(T\) observations. Below, we use an optimized (C++-based) function to compute it as a rolling standard deviation over 20 days.

library(RcppRoll)
coin_hist <- coin_hist |>
  group_by(name) |>
  na.omit() |>
  mutate(real_vol_20 = roll_sd(return, n = 20, fill = NA, na.rm = T))
coin_hist |>
  filter(name == "Bitcoin") |>
  pivot_longer(all_of(c("close", "real_vol_20")), names_to = "series", values_to = "value") |>
  ggplot(aes(x = date, y = value, color = series)) + geom_line() + theme_bw() +
  facet_wrap(vars(series), nrow = 2, scales = "free")# + scale_y_log10()
Warning: Removed 19 rows containing missing values (`geom_line()`).

Now let’s have a look at the autocorrelation of the volatility: it is highly persistent.

crypto_acf <-
  coin_hist |> 
  na.omit() %>%
  split(.$name) %>% 
  map(~acf(.$real_vol_20, plot = F)) %>% 
  map_dfr(
    ~data.frame(lag = .$lag, 
                acf = .$acf,
                ci = qnorm(0.975)/sqrt(.$n.used)
    ), .id = "name") 
crypto_acf |> 
  filter(lag > 0, lag < 15) |>
  ggplot(aes(x = lag, y = acf, fill = name)) + 
  geom_col(position = "dodge", alpha = 0.8) + theme_bw() +
  scale_fill_viridis_d(option = "magma", end = 0.9, begin = 0.1)

GARCH estimation

There are many packages for GARCH estimation. We propose two: rugarch and fGarch.

library(rugarch)
library(fGarch)

In fGarch, the variance equation of the GARCH(1,1) model is:

\[\sigma^2_t = \omega + \alpha e^2_{t-1} + \beta \sigma^2_{t-1}.\] In the output, there are some additional parameters that allow for a more general model:

  • shape is the shape parameter of the conditional distribution.
  • skew is the skewness parameter of the conditional distribution.
  • delta is the exponent (delta) of the variance recursion.
garchFit(formula = ~garch(1,1), 
         data = coin_hist |> 
           filter(name == "Bitcoin") |> 
           na.omit() |>
           pull(return)) 

Series Initialization:
 ARMA Model:                arma
 Formula Mean:              ~ arma(0, 0)
 GARCH Model:               garch
 Formula Variance:          ~ garch(1, 1)
 ARMA Order:                0 0
 Max ARMA Order:            0
 GARCH Order:               1 1
 Max GARCH Order:           1
 Maximum Order:             1
 Conditional Dist:          norm
 h.start:                   2
 llh.start:                 1
 Length of Series:          2255
 Recursion Init:            mci
 Series Scale:              0.04020018

Parameter Initialization:
 Initial Parameters:          $params
 Limits of Transformations:   $U, $V
 Which Parameters are Fixed?  $includes
 Parameter Matrix:
                     U           V     params includes
    mu     -0.56577614   0.5657761 0.05657761     TRUE
    omega   0.00000100 100.0000000 0.10000000     TRUE
    alpha1  0.00000001   1.0000000 0.10000000     TRUE
    gamma1 -0.99999999   1.0000000 0.10000000    FALSE
    beta1   0.00000001   1.0000000 0.80000000     TRUE
    delta   0.00000000   2.0000000 2.00000000    FALSE
    skew    0.10000000  10.0000000 1.00000000    FALSE
    shape   1.00000000  10.0000000 4.00000000    FALSE
 Index List of Parameters to be Optimized:
    mu  omega alpha1  beta1 
     1      2      3      5 
 Persistence:                  0.9 


--- START OF TRACE ---
Selected Algorithm: nlminb 

R coded nlminb Solver: 

  0:     3074.0720: 0.0565776 0.100000 0.100000 0.800000
  1:     3072.8588: 0.0565791 0.0899618 0.104616 0.799261
  2:     3071.6500: 0.0565807 0.0890091 0.113466 0.805848
  3:     3070.5888: 0.0565831 0.0786451 0.117301 0.806547
  4:     3070.1457: 0.0565886 0.0738245 0.119106 0.816350
  5:     3069.6613: 0.0565955 0.0699463 0.111191 0.823052
  6:     3069.5469: 0.0566059 0.0697466 0.103940 0.831416
  7:     3069.3578: 0.0566291 0.0619124 0.102089 0.839009
  8:     3069.3234: 0.0566293 0.0626464 0.102319 0.839517
  9:     3069.3101: 0.0566402 0.0629910 0.101604 0.839091
 10:     3069.3094: 0.0566405 0.0631617 0.101564 0.839207
 11:     3069.3080: 0.0566462 0.0630812 0.101398 0.839201
 12:     3069.3068: 0.0566616 0.0631353 0.101201 0.839447
 13:     3069.3051: 0.0566998 0.0628995 0.100884 0.839757
 14:     3069.3042: 0.0567443 0.0628645 0.100771 0.840032
 15:     3069.3036: 0.0567915 0.0628381 0.100667 0.840032
 16:     3069.2974: 0.0580855 0.0628640 0.100099 0.840539
 17:     3069.2968: 0.0582250 0.0625147 0.100020 0.840897
 18:     3069.2966: 0.0583666 0.0626168 0.100177 0.840714
 19:     3069.2966: 0.0583788 0.0626341 0.100165 0.840629
 20:     3069.2965: 0.0583851 0.0626616 0.100168 0.840648
 21:     3069.2965: 0.0583978 0.0626412 0.100120 0.840684
 22:     3069.2962: 0.0586564 0.0626042 0.100145 0.840705
 23:     3069.2962: 0.0589148 0.0627866 0.100026 0.840590
 24:     3069.2961: 0.0589246 0.0626123 0.0999245 0.840894
 25:     3069.2961: 0.0589191 0.0626087 0.100001 0.840823
 26:     3069.2961: 0.0589111 0.0626286 0.0999955 0.840802
 27:     3069.2961: 0.0589041 0.0626419 0.100010 0.840779
 28:     3069.2961: 0.0589050 0.0626392 0.100005 0.840783

Final Estimate of the Negative LLH:
 LLH:  -4178.012    norm LLH:  -1.852777 
          mu        omega       alpha1        beta1 
0.0023679905 0.0001012283 0.1000053730 0.8407829550 

R-optimhess Difference Approximated Hessian Matrix:
                 mu        omega        alpha1         beta1
mu     -1773500.423     -1209415     -2088.352     -3757.959
omega  -1209414.780 -30217659139 -23901412.544 -36761870.026
alpha1    -2088.352    -23901413    -33060.998    -37498.318
beta1     -3757.959    -36761870    -37498.318    -52055.985
attr(,"time")
Time difference of 0.007368088 secs

--- END OF TRACE ---


Time to Estimate Parameters:
 Time difference of 0.04829717 secs

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 1), data = pull(na.omit(filter(coin_hist, 
    name == "Bitcoin")), return)) 

Mean and Variance Equation:
 data ~ garch(1, 1)
<environment: 0x11b9f8000>
 [data = pull(na.omit(filter(coin_hist, name == "Bitcoin")), return)]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1       beta1  
0.00236799  0.00010123  0.10000537  0.84078296  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     0.0023680   0.0007512    3.152  0.00162 ** 
omega  0.0001012   0.0000178    5.686 1.30e-08 ***
alpha1 0.1000054   0.0149331    6.697 2.13e-11 ***
beta1  0.8407830   0.0207521   40.516  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 4178.012    normalized:  1.852777 

Description:
 Sun Apr 23 18:44:01 2023 by user:  

Now, let’s do the same with rugarch.

spec <- ugarchspec()
ugarchfit(data = coin_hist |> 
            filter(name == "Bitcoin") |> 
            na.omit() |>
            pull(return), 
          spec = spec)

*---------------------------------*
*          GARCH Model Fit        *
*---------------------------------*

Conditional Variance Dynamics   
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model  : ARFIMA(1,0,1)
Distribution    : norm 

Optimal Parameters
------------------------------------
        Estimate  Std. Error  t value Pr(>|t|)
mu      0.002355    0.000744  3.16358 0.001558
ar1    -0.507177    0.565593 -0.89672 0.369871
ma1     0.492086    0.571224  0.86146 0.388985
omega   0.000101    0.000018  5.67344 0.000000
alpha1  0.099950    0.014962  6.68036 0.000000
beta1   0.840838    0.020820 40.38513 0.000000

Robust Standard Errors:
        Estimate  Std. Error  t value Pr(>|t|)
mu      0.002355    0.000817   2.8817 0.003955
ar1    -0.507177    0.302363  -1.6774 0.093469
ma1     0.492086    0.304918   1.6138 0.106564
omega   0.000101    0.000034   3.0127 0.002589
alpha1  0.099950    0.027631   3.6173 0.000298
beta1   0.840838    0.034147  24.6240 0.000000

LogLikelihood : 4178.258 

Information Criteria
------------------------------------
                    
Akaike       -3.7005
Bayes        -3.6852
Shibata      -3.7005
Hannan-Quinn -3.6949

Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
                        statistic   p-value
Lag[1]                      4.894 2.695e-02
Lag[2*(p+q)+(p+q)-1][5]     8.024 8.541e-09
Lag[4*(p+q)+(p+q)-1][9]    10.105 8.312e-03
d.o.f=2
H0 : No serial correlation

Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
                        statistic p-value
Lag[1]                     0.1105  0.7396
Lag[2*(p+q)+(p+q)-1][5]    1.9035  0.6414
Lag[4*(p+q)+(p+q)-1][9]    2.8555  0.7826
d.o.f=2

Weighted ARCH LM Tests
------------------------------------
            Statistic Shape Scale P-Value
ARCH Lag[3]     1.078 0.500 2.000  0.2992
ARCH Lag[5]     2.446 1.440 1.667  0.3808
ARCH Lag[7]     2.733 2.315 1.543  0.5654

Nyblom stability test
------------------------------------
Joint Statistic:  0.9403
Individual Statistics:              
mu     0.28365
ar1    0.16215
ma1    0.16032
omega  0.09247
alpha1 0.17441
beta1  0.13575

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic:         1.49 1.68 2.12
Individual Statistic:    0.35 0.47 0.75

Sign Bias Test
------------------------------------
                   t-value   prob sig
Sign Bias           0.3367 0.7364    
Negative Sign Bias  0.3295 0.7418    
Positive Sign Bias  0.6874 0.4919    
Joint Effect        1.8233 0.6099    


Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
  group statistic p-value(g-1)
1    20     264.5    3.604e-45
2    30     295.7    5.775e-46
3    40     304.1    8.974e-43
4    50     326.2    1.325e-42


Elapsed time : 0.162442 

The results are not exactly the same! One reason is that the default ugarchspec() includes an ARMA(1,1) mean model (see "Mean Model: ARFIMA(1,0,1)" above), whereas garchFit estimated a constant mean.
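To get closer to the fGarch output, here is a sketch of a constant-mean sGARCH(1,1) specification in rugarch; the constant mean and pure GARCH(1,1) variance mirror the garchFit call above.

spec_const <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),   # Plain GARCH(1,1) variance
  mean.model     = list(armaOrder = c(0, 0), include.mean = TRUE), # Constant mean, no ARMA terms
  distribution.model = "norm"
)
ugarchfit(data = coin_hist |> filter(name == "Bitcoin") |> na.omit() |> pull(return),
          spec = spec_const)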

Forecasting

Of course, when it comes to cryptos, one of the most interesting (and hardest) topics is prediction. To show how tough this is, we will rely on the fable package.
It is highly integrated and performs many analyses automatically. Below, we use an ARIMA model for the forecast; see the reference for the parameters: https://fable.tidyverts.org/reference/ARIMA.html. Beyond ARIMA, the available methods include ETS (exponential smoothing state space), TSLM (time-series linear model, \(y_t=a+bx_t\)), and NNETAR (neural network autoregression, \(y_t=f(y_{t-1})\)).

Below, we let the function pick the optimal orders via pdq(). But orders can also be specified (see below).

library(fable)
library(tsibble)
library(feasts)
coin_hist |>
  na.omit() |>
  filter(symbol == "ETH") |>
  tsibble(index = date) |>
  model(
    arima = ARIMA(return ~ 1 + pdq()),
    snaive = SNAIVE(return)
  ) |>
  forecast(h = 15) |>
  autoplot() + theme_bw() + facet_grid(vars(.model))

Let’s have a look at the estimated coefficients from the models.

coin_hist |>
  na.omit() |>
  filter(symbol == "ETH") |>
  tsibble(index = date) |>
  model(
    arima = ARIMA(return ~ 1 + pdq()),
    snaive = SNAIVE(return)
  ) |>
  coef()
.model term estimate std.error statistic p.value
arima ar1 -0.5343421 0.0178175 -29.9896975 0.0000000
arima sar1 -0.0252163 0.0210651 -1.1970613 0.2314085
arima constant 0.0000123 0.0013710 0.0089548 0.9928560

Note: sar is an auto-regressive component with seasonality.
We can specify a particular set of orders p, d and q.

coin_hist |>
  na.omit() |>
  filter(symbol == "ETH") |>
  tsibble(index = date) |>
  model(
    arima = ARIMA(return ~ 1 + pdq(1,0,1)),
    snaive = SNAIVE(return)
  ) |>
  coef()
.model term estimate std.error statistic p.value
arima ar1 -0.7941743 0.1482942 -5.355396 0.0000001
arima ma1 0.7680584 0.1548280 4.960718 0.0000008
arima constant 0.0066511 0.0020068 3.314320 0.0009332
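To decide between the automatic and the manual specification, here is a sketch using glance() (from fabletools, loaded with fable), which reports information criteria for each fitted model; the model names arima_auto and arima_101 are our own labels.

coin_hist |>
  na.omit() |>
  filter(symbol == "ETH") |>
  tsibble(index = date) |>
  model(
    arima_auto = ARIMA(return ~ 1 + pdq()),        # Automatic order selection
    arima_101  = ARIMA(return ~ 1 + pdq(1, 0, 1))  # Manually specified orders
  ) |>
  glance() |>
  select(.model, AIC, BIC)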

Prediction with LSTM (RNN)

First, because we are using an M1 Mac, we must resort to a trick to make tensorflow work. See the following post if you are also on a recent Mac. We also build the time-series from which to learn.

library(reticulate)
use_condaenv("r-tensorflow") # This is for M1 Macs with special conda environment
library(TSLSTM)

data_coin <- coin_hist |> 
  select(timestamp, symbol, close, volume) |>
  filter(symbol %in% c("BTC", "ETH")) |>
  group_by(symbol) |>
  mutate(date = as.Date(timestamp),
         return = close / lag(close) - 1,
         vol_change = volume / lag(volume) - 1) |>
  ungroup() |>
  select(date, symbol, return, vol_change) |>
  pivot_wider(names_from = "symbol", values_from = c("return", "vol_change")) |>
  mutate(return_ETH = lead(return_ETH)) |> # We forward the ETH return, as we want to predict it
  na.omit()
Adding missing grouping variables: `name`
train_coin <- data_coin |> filter(date < "2022-06-30")
test_coin <- data_coin |> filter(date > "2022-06-30")

The idea is to predict ETH return with past BTC return, past ETH volume change and past BTC volume change.

library(keras)

Attaching package: 'keras'
The following object is masked from 'package:fabletools':

    new_model_class
model_rnn <- keras_model_sequential() %>% 
  layer_gru(units = 9,                                     # Nb units in hidden layer
            batch_input_shape = c(1, nrow(train_coin), 3), # Dimensions = tricky part!
            activation = 'tanh',                           # Activation function
            return_sequences = TRUE) %>%                   # Return all the sequence
  layer_dense(units = 1)                                   # Final aggregation layer
model_rnn %>% compile(
  loss = 'mean_squared_error',                             # Loss = quadratic
  optimizer = optimizer_rmsprop(),                         # Backprop
  metrics = c('mean_absolute_error')                       # Output metric MAE
)

Onto training!

fit_rnn <- model_rnn %>% 
  fit(x = train_coin |>                                           # Training features   
        select(return_BTC, vol_change_BTC, vol_change_ETH) |> 
        data.matrix() |>
        array(dim = c(1, nrow(train_coin), 3)),        
      y = train_coin |> pull(return_ETH) |> 
        array(dim  = c(1,nrow(train_coin))),                      # Training labels
      epochs = 7,                                                 # Number of rounds
      batch_size = 1 ,                                            # Length of sequences
      verbose = 0)                                                # No comments
plot(fit_rnn) + theme_bw() + geom_point(color = "black") + geom_line()

Now, unfortunately, we cannot use the fitted model as such because the dimensions between the training and testing sets are not the same. We have to create a new model and import the old model’s weights…

new_model <- keras_model_sequential() %>% 
  layer_gru(units = 9, 
            batch_input_shape = c(1,              # New dimensions
                                  nrow(test_coin), 
                                  3), 
            activation = 'tanh',                      # Activation function
            return_sequences = TRUE) %>%              # Return the full sequence
  layer_dense(units = 1)                              # Output dimension
new_model %>% keras::set_weights(keras::get_weights(model_rnn))

We are now ready.

preds_rnn <- predict(new_model, 
        test_coin |> 
          select(return_BTC, vol_change_BTC, vol_change_ETH) |> 
          data.matrix() |>
          array(dim = c(1, nrow(test_coin), 3)))
mean(abs(preds_rnn - test_coin |> pull(return_ETH) ))
[1] 0.04693831

Note: because of the random initialization of the weights, the result is itself random.
To be compared with the training metric…
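As an additional point of comparison, a naive benchmark (sketch): always predicting a zero ETH return yields the following test MAE. If the RNN does not clearly beat this, it adds little value.

mean(abs(test_coin |> pull(return_ETH))) # MAE of the zero-return forecast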