# install.packages("tidyverse")
# install.packages(c("forecast", "quantmod", "crypto2", "RcppRoll", "rugarch", "fGarch"))
# install.package(c("fable", "tsibble", "feasts", "keras", "highcharter", "plotly"))
Time-series & cryptocurrencies
WARNING: this notebook relies on several R packages. Whenever one is called (for instance via a call such as “library(torch)”), this implies that the package has been installed, e.g., via “install.packages(”torch”)”. Hence, make sure the following packages are installed!
library(tidyverse)
Crash course on time-series
Definitions & notations
Time-series are random variables indexed by time. Usually, they are written \(x_t\), or \(X_t\) or \(y_t\) or similar notations of this kind. The index \(t\) refers to time. The essential question in time-series is to determine the law of motion of the process, as well as its characteristics.
Most often, the process integrates some form of randomness, which is usually written \(e_t\), \(\epsilon_t\) or \(\varepsilon_t\). There are two common simplifying assumptions that are made with respect to these quantities, which are sometimes referred to as innovations or residuals:
- they can be normally distribution (with Gaussian law);
- they are independent from each-other.
NOTE: in almost all models, innovations have zero means, \(\mathbb{E}[e_t]=0\), hence in the case of Gaussian innovations, only the variance \(\sigma^2\) is specified. We will work under these (standard) assumptions below. Let’s generate a series of $e_t and plot it below.
<- 120
N <- 0.2
sigma <- rnorm(N, mean = 0, sd = sigma)
x data.frame(t =1:N, x = x) |>
ggplot(aes(x = t, y = x)) + geom_line()
mean(x)
[1] 0.02531184
sd(x)
[1] 0.2129533
Examples
The random walk
The simplest case is the white noise: \[x_t= e_t,\]
but this is not very interesting. A second important case is linked to the notion of random walk: \[x_t=x_{t-1}+e_t,\] and, iterating, we see that we have \(x_t=\sum_{s=0}^t e_s\) - where we assume a starting point at \(t=0\). Notably, we have \[\mathbb{E}[x_t]=\mathbb{E}\left[\sum_{s=0}^t e_s\right]=\sum_{s=0}^t\mathbb{E}\left[ e_s\right]=0.\]
Moreover, if the innovations are independent,
\[\mathbb{V}[x_t]=\mathbb{V}\left[\sum_{s=0}^t e_s\right]=\sum_{s=0}^t\mathbb{V}\left[ e_s\right]=t\sigma^2.\]
Hence, the variance increases with time - linearly. If perturbations are not independent, then it’s a bit more complicated and the variance will depend on all auto-covariances.
The covariance between two points in time is given by:
\[\mathbb{C}ov[x_t,x_{t+h}]=t\sigma^2, \quad h \ge 0.\] This is because all innovation terms are independent, hence the only ones that are non-null are the \(t\) terms of the form \(\mathbb{C}ov[x_s,x_s]=\sigma\).
Hence, we see that both the variance and auto-covariance depend on \(t\), say, the present time. This makes analysis complicated, because, depending on the moment we consider, the distribution of \(x_t\) will change.
Let’s simulate one trajectory.
set.seed(42) # This freezes the random number generator
<- cumsum(rnorm(N, mean = 0, sd = sigma))
x tibble(t = 1:N, x = x) |>
ggplot(aes(x = t, y = x)) + geom_line()
The auto-regressive model
Let us introduce a small modification to the above process and consider: \[x_t=a+\rho x_{t-1}+e_t, \quad a \in \mathbb{R}, \quad |\rho|< 1.\] This process is called the Auto-Regressive model of order one and is written AR(1). Iterating, we have: \[x_t = a + \rho (a + \rho x_{t-2}+e_{t-1})+e_{t}\] and, until infinity… \[x_t=a \sum_{k=0}^\infty\rho^k+ \sum_{k=0}^\infty \rho^ke_{t-k}=\frac{a}{1-\rho}+ \sum_{k=0}^\infty \rho^ke_{t-k},\] which explains why we need \(|\rho|<1\).
Again, we have
\[\mathbb{E}[x_t]=\frac{a}{1-\rho}, \] and when innovations are independent, \[\mathbb{V}[x_t]=\mathbb{V}\left[\sum_{k=0}^\infty \rho^ke_{t-k} \right]=\sum_{k=0}^\infty \mathbb{V}\left[\rho^ke_{t-k} \right]=\frac{\sigma^2}{1-\rho^2}.\]
In this case, we see that the mean and variance do not depend on \(t\)! (obvious given the infinite series)
Moreover,
\[\mathbb{C}ov[x_t,x_{t+h}]=\mathbb{C}ov\left[\sum_{k=0}^\infty \rho^ke_{t-k} ,\sum_{j=0}^\infty \rho^je_{t-j+h} \right]=\sum_{k=0}^\infty \sum_{j=0}^\infty \mathbb{C}ov\left[ \rho^ke_{t-k}, \rho^je_{t-j+h} \right]\] Again, we need to focus on the cases when the time indices are equal, so \[\mathbb{C}ov[x_t,x_{t+h}]=\sum_{k=0}^\infty \sum_{j=h}^\infty \mathbb{C}ov\left[ \rho^ke_{t-k}, \rho^{j-h}e_{t-j} \right]=\sum_{k=0}^\infty \sum_{j=h}^\infty \rho^{k+j-h}\mathbb{C}ov[e_{t-k},e_{t-j} ]\]
Thus, for \(j=k\), we get \[\mathbb{C}ov[x_t,x_{t+h}]=\sigma \sum_{j=h}^\infty \rho^{2j-h}=\sigma^2 \rho^h \sum_{j=0}^\infty \rho^{2j}=\sigma^2\frac{\rho^h}{1-\rho^2}.\] Again, this does not depend on the time index \(t\). If innovations are Gaussian, as Gaussian distributions are entirely defined by their first two moments, this means that the full (unconditional) distribution of the process is invariant. This very convenient property is called stationarity, and comes in various flavors (have a look on the net!). It is very suitable for analysis because now we know that the object that we study (the underlying distribution) is stable in time.
The above quantity, which measures the link between different realizations of the same variable (at different points in time) is very important. Often, the covariance is scaled by the variance to yield the auto-correlation of the process. Relatedly, there is the auto-correlogram, which is defined, for stationary series, as
\[\gamma(h)=\mathbb{C}orr[x_t,x_{t+h}].\] For the AR(1), \(\gamma(h)=\rho^h\).
More generally, autoregressive models of higher orders (AR(p)) have memory that goes back further in time: \[x_t=a+\sum_{k=1}^p\rho_k x_{t-k}+e_t, \quad a \in \mathbb{R},\]
Even more generally, there are ARMA(p,q) processes, defined as \[x_t=a+\sum_{k=1}^p\rho_k x_{t-k} +\sum_{j=1}^q\delta_je_{t-j}+e_t, \quad a \in \mathbb{R},\]
but more in these is left to your curiosity.
Let’s simulate two such processes. We refer to the ARIMA page for details on the implementation. The order is \((p,d,q)\), like in the ARMA(p,q) process (the \(d\) is for integration, a notion out of the scope of this course).
<- arima.sim(list(order = c(1,0,0), ar=.05), n = N)
x_05 <- arima.sim(list(order = c(1,0,0), ar=.95), n = N)
x_95 tibble(time = 1:N, x_05 = x_05, x_95 = x_95) |>
pivot_longer(-time, names_to = "process", values_to = "values") |>
ggplot(aes(x = time, y = values, color = process)) + geom_line() +
theme_bw() +
scale_color_manual(values = c("#DD9911", "#0066AA"))
One of the processes seems to be just white noise, the other evolves more slowly and oscillates much less.
Let’s have a look at the auto-correlogram of the series.
library(forecast) # For the ggAcf() function
ggAcf(x_95) + theme_bw() +
geom_function(fun = function(x) exp(x*log(0.95)), color = "#11DD33")
The values we extract from the sample are not exactly those predicted from the model. This comes from size of the sample, which is too small. As it increases, we would obtain a perfect fit (you can try!).
ARCH models
In practice simple autoregressive models can be too limited because some variables are not exactly stationary. To illustrate this statement, let’s have a look at a volatility index, the VIX. NOTE: usually, the volatility is the standard deviation of past returns. The VIX is slightly different, as it is based on forward-looking option prices. Still, it is linked (empirically) to the traditional volatility.
library(quantmod)
library(highcharter)
<- "2000-01-01" # Starting date
min_date <- "2023-01-27" # Ending date
max_date getSymbols("^VIX", src = 'yahoo', # The data comes from Yahoo Finance
from = min_date, # Start date
to = max_date, # End date
auto.assign = TRUE,
warnings = FALSE)
[1] "^VIX"
<- VIX |> as.data.frame() |>
VIX rownames_to_column("date") |>
mutate(date = as.Date(date))
head(VIX,7) # Have a look at the result!
date | VIX.Open | VIX.High | VIX.Low | VIX.Close | VIX.Volume | VIX.Adjusted |
---|---|---|---|---|---|---|
2000-01-03 | 24.36 | 26.15 | 23.98 | 24.21 | 0 | 24.21 |
2000-01-04 | 24.94 | 27.18 | 24.80 | 27.01 | 0 | 27.01 |
2000-01-05 | 27.98 | 29.00 | 25.85 | 26.41 | 0 | 26.41 |
2000-01-06 | 26.68 | 26.71 | 24.70 | 25.73 | 0 | 25.73 |
2000-01-07 | 25.14 | 25.17 | 21.72 | 21.72 | 0 | 21.72 |
2000-01-10 | 21.89 | 22.49 | 21.36 | 21.71 | 0 | 21.71 |
2000-01-11 | 21.98 | 22.63 | 21.69 | 22.50 | 0 | 22.50 |
<- VIX |> select(VIX.Close)
vix_chart rownames(vix_chart) <- VIX |> pull(date)
highchart(type = "stock") %>%
hc_title(text = "Evolution of the VIX") %>%
hc_add_series(as.xts(vix_chart)) %>%
hc_tooltip(pointFormat = '{point.y:.3f}')
Hence, we see that the series exhibits some moments of extreme activity and its distribution is not the same through time. This is referred to as “clusters of volatility”.
Therefore, we also need to introduce models that allow for this type of feature. In 1982, Robert Engle proposed such a model, called ARCH, which was generalized in 1986 to GARCH models. As we show below, there is a direct link with auto-regressive processes!
The idea is to work with the innovations, \(e_t\) and to specify them a bit differently, i.e., like this: \(e_t=\sigma_t z_t\), where \(\sigma_t\) is going to be a changing standard deviation and \(z_t\) is a simple white noise. What matters is that the vol term is modelled as \[\sigma^2_t = a+\sum_{j=1}^pd_j\sigma^2_{t-j} + \sum_{k=1}^qb_ke^2_{t-k},\] hence, the value of \(\sigma_t^2\) depends both on its past and on the past of squared innovations.
References
For time-series
There many resources available online. We single out two below:
For cryptocurrencies
Application to cryptocurrencies
For simplicity, we will work with daily data. Higher frequencies can be obtained outside the crypto space via the highfrequency package. Or playing with dedicated APIs (e.g. with packages: binance/coinmarketcap).
Fetching data
We will use the crypto2 package below. If need be: install it.
library(lubridate)
library(crypto2)
First, let’s have a look at the list of available coins… which are numerous!
<- crypto_list() coins
You can have a look at the info for the coins via the commented code below.
<- crypto_info(coin_list = coins, limit = 30) c_info
❯ Scraping crypto info
Scraping https://web-api.coinmarketcap.com/v1/cryptocurrency/info?id=1,2,3,4,5,6,8,10,13,16,18,25,35,42,43,45,52,53,56,61,64,66,67,69,72,74,77,78,80,83 with 142 characters!
❯ Processing crypto info
Next, we can download historical quotes.
<- c("BTC", "ETH", "DOGE", "USDT", "BNB", "USDC", "XRP")
coin_symb <- crypto_history(coins |> filter(symbol %in% coin_symb),
coin_hist start_date = "20170101",
end_date = "20230325")
❯ Scraping historical crypto data
❯ Processing historical crypto data
<- coin_hist |> # Timestamps are at midnight, hence we add a day.
coin_hist mutate(date = 1 + as.Date(as.POSIXct(timestamp, origin="1970-01-01")))
And plot them. NOTE: mind the log-scale for the y-axis.
Also, we use plotly below for a better experience.
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
<- coin_hist |>
g ggplot(aes(x = date, y = close, color = name)) + geom_line() +
scale_y_log10() + theme_bw() + xlab("")
ggplotly(g)
Now, the problem with some of these series is that they are NOT stationary. At the beginning of the sample, their values are much lower to those at th end of the sample. This is one of the reasons why in finance, we look at returns: \[r_t=\frac{p_t-p_{t-1}}{p_{t-1}},\] which are simply relative price changes.
Let’s compute them below. We recall that prices are gathered in the close column.
<- coin_hist |>
coin_hist group_by(name) |>
mutate(return = close / lag(close) - 1 )
And have a look at their distributions.
|>
coin_hist ggplot(aes(x = return, fill = name)) + geom_histogram() +
facet_wrap(vars(name), scales = "free") + theme_bw() +
theme(axis.title = element_blank())
There seem to be outliers (to the left or right of the most common values)!
Estimations
Ok, so now let’s try to link what we have seen in the theoretical part with the data. This step is called estimation. Basically, if we think that the data follows \(x_t=bx_{t-1}+e_t\), we need to find \(b\)!
Returns
First, let’s have a look a autocorrelograms (of returns).
<-
crypto_acf |>
coin_hist na.omit() %>%
split(.$name) %>%
map(~acf(.$return, plot = F)) %>%
map_dfr(
~data.frame(lag = .$lag,
acf = .$acf,
ci = qnorm(0.975)/sqrt(.$n.used)
.id = "name")
), |>
crypto_acf filter(lag > 0, lag < 6) |>
ggplot(aes(x = lag, y = acf, fill = name)) +
geom_col(position = "dodge") + theme_bw()
But recall that some of the coins are stable!
Overall, however, there does not seem to be a strong memory pattern. Hence, daily returns appear to be mostly white noise.
Realized volatility
Then, let’s have a look at volatility. It’s measured as
\[v_t=\sqrt{\frac{1}{T-1}\sum_{s=1}^T(r_{t-s}-\bar{r})^2},\] where \(\bar{r}\) is the mean return in the sample of \(T\) observations. Below, we use an optimized (C++-based function) to compute it via rolling standard deviation over 20 days.
library(RcppRoll)
<- coin_hist |>
coin_hist group_by(name) |>
na.omit() |>
mutate(real_vol_20 = roll_sd(return, n = 20, fill = NA, na.rm = T))
|>
coin_hist filter(name == "Bitcoin") |>
pivot_longer(all_of(c("close", "real_vol_20")), names_to = "series", values_to = "value") |>
ggplot(aes(x = date, y = value, color = series)) + geom_line() + theme_bw() +
facet_wrap(vars(series), nrow = 2, scales = "free")# + scale_y_log10()
Warning: Removed 19 rows containing missing values (`geom_line()`).
Now let’s have a look a the autocorrelation of the volatility: it’s highly persistent.
<-
crypto_acf |>
coin_hist na.omit() %>%
split(.$name) %>%
map(~acf(.$real_vol_20, plot = F)) %>%
map_dfr(
~data.frame(lag = .$lag,
acf = .$acf,
ci = qnorm(0.975)/sqrt(.$n.used)
.id = "name")
), |>
crypto_acf filter(lag > 0, lag < 15) |>
ggplot(aes(x = lag, y = acf, fill = name)) +
geom_col(position = "dodge", alpha = 0.8) + theme_bw() +
scale_fill_viridis_d(option = "magma", end = 0.9, begin = 0.1)
GARCH estimation
There are many packages for GARCH estimation. We propose two: ruGARCH and fGARCH.
library(rugarch)
library(fGarch)
In fGARCH, the model is:
\[s_t = \omega + \alpha e_{t-1} + \beta s_{t-1}.\] In the output, there are some additional parameters that allow to propose a more general model:
- shape is the shape parameter of the conditional distribution.
- skew is the skewness parameter of the conditional distribution.
- delta a numeric value, the exponent delta of the variance recursion.
garchFit(formula = ~garch(1,1),
data = coin_hist |>
filter(name == "Bitcoin") |>
na.omit() |>
pull(return))
Series Initialization:
ARMA Model: arma
Formula Mean: ~ arma(0, 0)
GARCH Model: garch
Formula Variance: ~ garch(1, 1)
ARMA Order: 0 0
Max ARMA Order: 0
GARCH Order: 1 1
Max GARCH Order: 1
Maximum Order: 1
Conditional Dist: norm
h.start: 2
llh.start: 1
Length of Series: 2255
Recursion Init: mci
Series Scale: 0.04020018
Parameter Initialization:
Initial Parameters: $params
Limits of Transformations: $U, $V
Which Parameters are Fixed? $includes
Parameter Matrix:
U V params includes
mu -0.56577614 0.5657761 0.05657761 TRUE
omega 0.00000100 100.0000000 0.10000000 TRUE
alpha1 0.00000001 1.0000000 0.10000000 TRUE
gamma1 -0.99999999 1.0000000 0.10000000 FALSE
beta1 0.00000001 1.0000000 0.80000000 TRUE
delta 0.00000000 2.0000000 2.00000000 FALSE
skew 0.10000000 10.0000000 1.00000000 FALSE
shape 1.00000000 10.0000000 4.00000000 FALSE
Index List of Parameters to be Optimized:
mu omega alpha1 beta1
1 2 3 5
Persistence: 0.9
--- START OF TRACE ---
Selected Algorithm: nlminb
R coded nlminb Solver:
0: 3074.0720: 0.0565776 0.100000 0.100000 0.800000
1: 3072.8588: 0.0565791 0.0899618 0.104616 0.799261
2: 3071.6500: 0.0565807 0.0890091 0.113466 0.805848
3: 3070.5888: 0.0565831 0.0786451 0.117301 0.806547
4: 3070.1457: 0.0565886 0.0738245 0.119106 0.816350
5: 3069.6613: 0.0565955 0.0699463 0.111191 0.823052
6: 3069.5469: 0.0566059 0.0697466 0.103940 0.831416
7: 3069.3578: 0.0566291 0.0619124 0.102089 0.839009
8: 3069.3234: 0.0566293 0.0626464 0.102319 0.839517
9: 3069.3101: 0.0566402 0.0629910 0.101604 0.839091
10: 3069.3094: 0.0566405 0.0631617 0.101564 0.839207
11: 3069.3080: 0.0566462 0.0630812 0.101398 0.839201
12: 3069.3068: 0.0566616 0.0631353 0.101201 0.839447
13: 3069.3051: 0.0566998 0.0628995 0.100884 0.839757
14: 3069.3042: 0.0567443 0.0628645 0.100771 0.840032
15: 3069.3036: 0.0567915 0.0628381 0.100667 0.840032
16: 3069.2974: 0.0580855 0.0628640 0.100099 0.840539
17: 3069.2968: 0.0582250 0.0625147 0.100020 0.840897
18: 3069.2966: 0.0583666 0.0626168 0.100177 0.840714
19: 3069.2966: 0.0583788 0.0626341 0.100165 0.840629
20: 3069.2965: 0.0583851 0.0626616 0.100168 0.840648
21: 3069.2965: 0.0583978 0.0626412 0.100120 0.840684
22: 3069.2962: 0.0586564 0.0626042 0.100145 0.840705
23: 3069.2962: 0.0589148 0.0627866 0.100026 0.840590
24: 3069.2961: 0.0589246 0.0626123 0.0999245 0.840894
25: 3069.2961: 0.0589191 0.0626087 0.100001 0.840823
26: 3069.2961: 0.0589111 0.0626286 0.0999955 0.840802
27: 3069.2961: 0.0589041 0.0626419 0.100010 0.840779
28: 3069.2961: 0.0589050 0.0626392 0.100005 0.840783
Final Estimate of the Negative LLH:
LLH: -4178.012 norm LLH: -1.852777
mu omega alpha1 beta1
0.0023679905 0.0001012283 0.1000053730 0.8407829550
R-optimhess Difference Approximated Hessian Matrix:
mu omega alpha1 beta1
mu -1773500.423 -1209415 -2088.352 -3757.959
omega -1209414.780 -30217659139 -23901412.544 -36761870.026
alpha1 -2088.352 -23901413 -33060.998 -37498.318
beta1 -3757.959 -36761870 -37498.318 -52055.985
attr(,"time")
Time difference of 0.007368088 secs
--- END OF TRACE ---
Time to Estimate Parameters:
Time difference of 0.04829717 secs
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 1), data = pull(na.omit(filter(coin_hist,
name == "Bitcoin")), return))
Mean and Variance Equation:
data ~ garch(1, 1)
<environment: 0x11b9f8000>
[data = pull(na.omit(filter(coin_hist, name == "Bitcoin")), return)]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1 beta1
0.00236799 0.00010123 0.10000537 0.84078296
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu 0.0023680 0.0007512 3.152 0.00162 **
omega 0.0001012 0.0000178 5.686 1.30e-08 ***
alpha1 0.1000054 0.0149331 6.697 2.13e-11 ***
beta1 0.8407830 0.0207521 40.516 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
4178.012 normalized: 1.852777
Description:
Sun Apr 23 18:44:01 2023 by user:
Now, with rugarch…
<- ugarchspec()
spec ugarchfit(data = coin_hist |>
filter(name == "Bitcoin") |>
na.omit() |>
pull(return),
spec = spec)
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Conditional Variance Dynamics
-----------------------------------
GARCH Model : sGARCH(1,1)
Mean Model : ARFIMA(1,0,1)
Distribution : norm
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 0.002355 0.000744 3.16358 0.001558
ar1 -0.507177 0.565593 -0.89672 0.369871
ma1 0.492086 0.571224 0.86146 0.388985
omega 0.000101 0.000018 5.67344 0.000000
alpha1 0.099950 0.014962 6.68036 0.000000
beta1 0.840838 0.020820 40.38513 0.000000
Robust Standard Errors:
Estimate Std. Error t value Pr(>|t|)
mu 0.002355 0.000817 2.8817 0.003955
ar1 -0.507177 0.302363 -1.6774 0.093469
ma1 0.492086 0.304918 1.6138 0.106564
omega 0.000101 0.000034 3.0127 0.002589
alpha1 0.099950 0.027631 3.6173 0.000298
beta1 0.840838 0.034147 24.6240 0.000000
LogLikelihood : 4178.258
Information Criteria
------------------------------------
Akaike -3.7005
Bayes -3.6852
Shibata -3.7005
Hannan-Quinn -3.6949
Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
statistic p-value
Lag[1] 4.894 2.695e-02
Lag[2*(p+q)+(p+q)-1][5] 8.024 8.541e-09
Lag[4*(p+q)+(p+q)-1][9] 10.105 8.312e-03
d.o.f=2
H0 : No serial correlation
Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
statistic p-value
Lag[1] 0.1105 0.7396
Lag[2*(p+q)+(p+q)-1][5] 1.9035 0.6414
Lag[4*(p+q)+(p+q)-1][9] 2.8555 0.7826
d.o.f=2
Weighted ARCH LM Tests
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 1.078 0.500 2.000 0.2992
ARCH Lag[5] 2.446 1.440 1.667 0.3808
ARCH Lag[7] 2.733 2.315 1.543 0.5654
Nyblom stability test
------------------------------------
Joint Statistic: 0.9403
Individual Statistics:
mu 0.28365
ar1 0.16215
ma1 0.16032
omega 0.09247
alpha1 0.17441
beta1 0.13575
Asymptotic Critical Values (10% 5% 1%)
Joint Statistic: 1.49 1.68 2.12
Individual Statistic: 0.35 0.47 0.75
Sign Bias Test
------------------------------------
t-value prob sig
Sign Bias 0.3367 0.7364
Negative Sign Bias 0.3295 0.7418
Positive Sign Bias 0.6874 0.4919
Joint Effect 1.8233 0.6099
Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
group statistic p-value(g-1)
1 20 264.5 3.604e-45
2 30 295.7 5.775e-46
3 40 304.1 8.974e-43
4 50 326.2 1.325e-42
Elapsed time : 0.162442
The results are not exactly the same!
Forecasting
Of course, when it comes to cryptos, one of the most interesting (and hardest) topic is prediction. To show how tough this is, we will rely on the fable package.
It’s highly integrated and performs many analyses automatically. Below, we use an ARIMA model for the forecast, see the reference for the parameters: https://fable.tidyverts.org/reference/ARIMA.html Beyond ARIMA, the methods available are ETS (exponential smoothing state space), TSLM (time-series linear model, \(y_t=a+bx_t\)), NNETAR (neural network autoregression, \(y_t=f(y_{t-1})\)).
Below, we leave the function pick the optimal orders via the function pdq(). But orders can be specified (see below).
library(fable)
library(tsibble)
library(feasts)
|>
coin_hist na.omit() |>
filter(symbol == "ETH") |>
tsibble(index = date) |>
model(
arima = ARIMA(return ~ 1 + pdq()),
snaive = SNAIVE(return)
|>
) forecast(h = 15) |>
autoplot() + theme_bw() + facet_grid(vars(.model))
Let’s have a look at the estimated coefficients from models.
|>
coin_hist na.omit() |>
filter(symbol == "ETH") |>
tsibble(index = date) |>
model(
arima = ARIMA(return ~ 1 + pdq()),
snaive = SNAIVE(return)
|>
) coef()
.model | term | estimate | std.error | statistic | p.value |
---|---|---|---|---|---|
arima | ar1 | -0.5343421 | 0.0178175 | -29.9896975 | 0.0000000 |
arima | sar1 | -0.0252163 | 0.0210651 | -1.1970613 | 0.2314085 |
arima | constant | 0.0000123 | 0.0013710 | 0.0089548 | 0.9928560 |
Note: sar is an auto-regressive component with seasonality.
We can specify a particular set of orders p, d and q.
|>
coin_hist na.omit() |>
filter(symbol == "ETH") |>
tsibble(index = date) |>
model(
arima = ARIMA(return ~ 1 + pdq(1,0,1)),
snaive = SNAIVE(return)
|>
) coef()
.model | term | estimate | std.error | statistic | p.value |
---|---|---|---|---|---|
arima | ar1 | -0.7941743 | 0.1482942 | -5.355396 | 0.0000001 |
arima | ma1 | 0.7680584 | 0.1548280 | 4.960718 | 0.0000008 |
arima | constant | 0.0066511 | 0.0020068 | 3.314320 | 0.0009332 |
Prediction with LSTM (RNN)
First, because we are using an M1 Mac, we must resort to a trick to make tensorflow work. See the following post if you are also on a recent Mac. We also build the time-series from which to learn.
library(reticulate)
use_condaenv("r-tensorflow") # This is for M1 Macs with special conda environment
library(TSLSTM)
<- coin_hist |>
data_coin select(timestamp, symbol, close, volume) |>
filter(symbol %in% c("BTC", "ETH")) |>
group_by(symbol) |>
mutate(date = as.Date(timestamp),
return = close / lag(close) - 1,
vol_change = volume / lag(volume) - 1) |>
ungroup() |>
select(date, symbol, return, vol_change) |>
pivot_wider(names_from = "symbol", values_from = c("return", "vol_change")) |>
mutate(return_ETH = lead(return_ETH)) |> # We forward the ETH return, as we want to predict it
na.omit()
Adding missing grouping variables: `name`
<- data_coin |> filter(date < "2022-06-30")
train_coin <- data_coin |> filter(date > "2022-06-30") test_coin
The idea is to predict ETH return with past BTC return, past ETH volume change and past BTC volume change.
library(keras)
Attaching package: 'keras'
The following object is masked from 'package:fabletools':
new_model_class
<- keras_model_sequential() %>%
model_rnn layer_gru(units = 9, # Nb units in hidden layer
batch_input_shape = c(1, nrow(train_coin), 3), # Dimensions = tricky part!
activation = 'tanh', # Activation function
return_sequences = TRUE) %>% # Return all the sequence
layer_dense(units = 1) # Final aggregation layer
%>% compile(
model_rnn loss = 'mean_squared_error', # Loss = quadratic
optimizer = optimizer_rmsprop(), # Backprop
metrics = c('mean_absolute_error') # Output metric MAE
)
Onto training!
<- model_rnn %>%
fit_rnn fit(x = train_coin |> # Training features
select(return_BTC, vol_change_BTC, vol_change_ETH) |>
data.matrix() |>
array(dim = c(1, nrow(train_coin), 3)),
y = train_coin |> pull(return_ETH) |>
array(dim = c(1,nrow(train_coin))), # Training labels
epochs = 7, # Number of rounds
batch_size = 1 , # Length of sequences
verbose = 0) # No comments
plot(fit_rnn) + theme_bw() + geom_point(color = "black") + geom_line()
Now, unfortunately, we cannot use the fitted model as such because the dimensions between the training and testing sets are not the same. We have to create a new model and import the old model’s weights…
<- keras_model_sequential() %>%
new_model layer_gru(units = 9,
batch_input_shape = c(1, # New dimensions
nrow(test_coin),
3),
activation = 'tanh', # Activation function
return_sequences = TRUE) %>% # Return the full sequence
layer_dense(units = 1) # Output dimension
%>% keras::set_weights(keras::get_weights(model_rnn)) new_model
We are now ready.
<- predict(new_model,
preds_rnn |>
test_coin select(return_BTC, vol_change_BTC, vol_change_ETH) |>
data.matrix() |>
array(dim = c(1, nrow(test_coin), 3)))
mean(abs(preds_rnn - test_coin |> pull(return_ETH) ))
[1] 0.04693831
Note: the result, because of the randomization of weights, is also random.
To be compared with the training metric…