This the companion notebook for the session on Neural Networks applied to factor investing.
The implementation is done using Keras and the Tensorflow environment, both of which run on Python.
These three items need to be installed on the compuer before you proceed.
In the financial applications below, the choice of dependent variable is not optimal. It was chosen to minimise code complexity and ease understanding of basic principles. Results may seem disappointing, but they are realistic! Neural networks are no magic wands in asset management. It take experience/experimentation to make them deliver good results. That’s up to you!
\(\rightarrow\) ABOUT THE NOTEBOOK:
We start slowly with plots of activation functions. All of them, except the absence of activation (linear/identity function) and the ReLu, are bounded, either in [0,1] or [-1,1].
library(tidyverse)
# Definition of functions below
ReLu <- function(x){return(pmax(0,x))}
Heaviside <- function(x){return((sign(x)+1)/2)}
Sigmoid <- function(x){return(1/(1+exp(-x)))}
Identity <- function(x){return(x)}
# The graph
ggplot(data.frame(x = c(-1.5,1.5)), aes(x)) +
stat_function(fun = Identity, aes(color = "Identity")) +
stat_function(fun = ReLu, aes(color = "ReLu"), linetype = 2, size = 1.2) +
stat_function(fun = Heaviside, aes(color = "Heaviside")) +
stat_function(fun = Sigmoid, aes(color = "Sigmoid")) +
stat_function(fun = tanh, aes(color = "TanH")) +
coord_fixed(0.6) + theme_light() +
theme(legend.position = c(0.8,0.2))
Data preparation takes time, but proper formatting is imperative. From data, we create data_f, which has normalised data: each feature has a uniform distribution at each date.
load("data.RData")
tick <- unique(data$Tick) # Vector of ticker symbols
data <- data %>% arrange(Date, Tick) # Order the data
data <- data %>%
select(-ESG_rank) %>% # Remove ESG: not enough points!
group_by(Tick) %>%
mutate(F_Return = lead(Close) / Close - 1) %>% # Forward monthly return
ungroup() %>% # Remove grouping
na.omit() # Remove missing data
We created returns above. Below, we rescale the features.
normalise <- function(v){ # This is a function that 'uniformalises' a vector
v <- v %>% as.matrix()
return(ecdf(v)(v))
}
data_f <- data %>% # Creating data with normalised values
select(-Close) %>% # Closing prices are not predictors
group_by(Date) %>% # Normalisation takes place for each given date
mutate_if(is.numeric, normalise) %>% # Apply normalisation on numerical columns
ungroup() %>% # Ungroup
arrange(Date, Tick) # Just making sure the order the same as before
data_f$F_Return <- data$F_Return # Keep original returns: better for interpretation
Then, further formatting is required! We prepare the data so that it fits as Keras inputs.
features <- c("Mkt_Cap", "P2B", "Vol_1M", "D2E", "Prof_Marg") # Predictor columns
split_date <- "2015-06-15" # Splitting point train versus test
train <- data_f %>% filter(Date < split_date) # Training data
test <- data_f %>% filter(Date > split_date) # Testing data
# Below, we split these sets into features versus labels
train_features <- select(train, all_of(features)) %>% as.matrix() # Matrix format is important
train_labels <- train$F_Return
test_features <- select(test, all_of(features)) %>% as.matrix() # Matrix format is important
test_labels <- test$F_Return
Next step, the model architecture and the training details. We keep low dimensions for the layers to reduce computational cost and also because a large number of parameters would be:
if(!require(keras)){install.packages("keras")} # Install Keras
# The user must also install Python & Tensorflow
library(keras)
# install_keras() # To complete installation
model <- keras_model_sequential()
model %>% # This defines the structure of the network, i.e. how layers are organized
layer_dense(units = 4, activation = 'relu', input_shape = ncol(train_features)) %>%
layer_dense(units = 3, activation = 'sigmoid') %>%
layer_dense(units = 1) # No activation means linear activation: f(x) = x.
model %>% compile( # Model specification
loss = 'mean_squared_error', # Loss function
optimizer = optimizer_rmsprop(), # Optimisation method (weight updating)
metrics = c('mean_absolute_error') # Output metric
)
summary(model) # Model architecture
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_2 (Dense) (None, 4) 24
## ________________________________________________________________________________
## dense_1 (Dense) (None, 3) 15
## ________________________________________________________________________________
## dense (Dense) (None, 1) 4
## ================================================================================
## Total params: 43
## Trainable params: 43
## Non-trainable params: 0
## ________________________________________________________________________________
The number of parameters is indeed equal to \[\mathcal{N}=\left(\sum_{l=1}^L(U_{l-1}+1)U_l\right)+U_L+1,\] where \(U_l\) is the number of units of layer l and \(U_0\) is the number of features. Here, \[(5+1)\times 4 + (4+1)\times 3+ (3+1) = 28 + 15 + 3 + 1 = 43.\]
Finally, training and testing!
fit <- model %>% fit(train_features, # Training features
train_labels, # Training labels
epochs = 10, batch_size = 64, # Training parameters
validation_data = list(test_features, test_labels) # Test data
)
plot(fit) + theme_light() # Plot, evidently!
mean(abs(test_labels)) # Error if zero prediction
## [1] 0.05318341
Training errors are usually smaller, but sometimes that may not be the case. The agnostic bet should reach an absolute error of 5.3% on average. The performance here is not great.
An important step: extracting predicted values! We extract the first output of the test set below.
predict(model, test_features[1:3,]) # The transpose is purely technical - but required!
## [,1]
## [1,] 0.007683896
## [2,] 0.010230683
## [3,] 0.012292013
This is the prediction from the neural network for the first test instance.
Below, we discuss four options (among so many!):
We discuss these four extensions below.
Sometimes, it is hard to know what an appropriate number of epochs might be. Luckily, it is possible to stop the training when the learning process stagnates. This feature is embedded inside a callback function in Keras, as is shown below. In the code below, we stop training when validation loss (val_loss) stops decreasing for more than 3 epochs.
fit <- model %>% fit(train_features, # Training features
train_labels, # Training labels
epochs = 10, batch_size = 32, # Training parameters
validation_data = list(test_features, test_labels), # Test data
verbose = 0, # No comments
callbacks = list(callback_early_stopping(
monitor = "val_loss", # Monitor validation loss
min_delta = 0.01, # Improvement threshold
patience = 3, # Nb epochs for no improvmt
verbose = 0 # No warnings
)
)
)
plot(fit) + theme_light() # Plot
Indeed, the training stops early!
VERY IMPORTANT NOTE: the model does not reinitiate the weights. It starts back with the fitted values obtain during the previous training! Hence, this new round of training only marginally improves on the previous one. We have simply changed the batch size, so this makes sense: no additional data was provided to the model.
We then turn to weight initialisation and penalisation. In Keras, weights can either be set to (small) random values or to constant values. We refer to the dedicated reference page for more details on the subject: https://keras.rstudio.com/reference/index.html#section-initializers
Moreover, weights can be penalised. This is done easily in the objective function: \[O=\sum_{i=1}^I \text{loss}(y_i,\tilde{y}_i) + \sum_{n=1}^N\left( \lambda_n^1 \| \mathbb{w}_n \|_1+\lambda_n^2\| \mathbb{w}_n \|_2^2\right),\]
where \(\mathbb{w}_n\) are the weights of unit \(n\) and \(\lambda_n^1\) and \(\lambda_n^2\) are the scaling factors related to the \(L^1\) and \(L^2\) norms of weights. Note: another option is to directly constrain the norm of weights, see: https://keras.rstudio.com/reference/constraints.html.
We provide one example below. Note that fixed weight initialisations have the advantage of reproducibility. Random seed are not easy to manage given the two underlying systems (R and Python).
Finally and for the sake of exhaustiveness, we mention the idea of dropout which is another technique used to avoid overfitting. The dropout line applies to the layer that follows it.
model_options <- keras_model_sequential()
model_options %>% # This defines the structure of the network, i.e. how layers are organized
layer_dense(units = 8, activation = 'relu', input_shape = ncol(train_features)) %>%
layer_dropout(rate = 0.25) %>% # Dropout: 25% of units (1/4) will be omitted
layer_dense(units = 4, activation = 'sigmoid', kernel_initializer = "random_normal") %>%
layer_dense(units = 3, # A layer with many options
activation = 'relu', # Activation: classic
bias_initializer = initializer_constant(0.2), # Bias initialisation
kernel_regularizer = regularizer_l2(0.01)) %>% # Weight regularisation
layer_dense(units = 1, activation = 'tanh')
model_options %>% compile( # Model specification
loss = 'mean_squared_error', # Loss function
optimizer = optimizer_rmsprop(), # Optimisation method (weight updating)
metrics = c('mean_absolute_error') # Output metric
)
summary(model_options) # Model structure
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_6 (Dense) (None, 8) 48
## ________________________________________________________________________________
## dropout (Dropout) (None, 8) 0
## ________________________________________________________________________________
## dense_5 (Dense) (None, 4) 36
## ________________________________________________________________________________
## dense_4 (Dense) (None, 3) 15
## ________________________________________________________________________________
## dense_3 (Dense) (None, 1) 4
## ================================================================================
## Total params: 103
## Trainable params: 103
## Non-trainable params: 0
## ________________________________________________________________________________
Below, we train this model and make a quick prediction.
fit <- model_options %>% fit(train_features, # Training features
train_labels, # Training labels
epochs = 10, batch_size = 64, # Training parameters
validation_data = list(test_features, test_labels) # Test data
)
plot(fit) + theme_light() +
scale_color_manual(values = c("#EE9911", "#1199BB")) +# Plot, evidently!
theme(legend.position = c(0.8,0.8))
A comment here: training error decreases rapidly, but not testing error, which remains relatively constant (and below training error). The chances of high generalisation ability for the model are slim.
The prediction syntax is unchanged.
predict(model_options, test_features[1:3,]) # The transpose is purely technical - but required!
## [,1]
## [1,] 0.01127903
## [2,] 0.01587060
## [3,] 0.01737709
We now turn to classification. We again resort to a ternary coding of returns. We start by preparing the data. Unlike xgboost, Keras requires one-hot encoding.
r_thresh <- 0.02 # Return threshold
data_f$FC_Return <- data_f$F_Return # Duplicate return
data_f$FC_Return[data_f$F_Return <= (-r_thresh)] <- 0 # Low return
data_f$FC_Return[(data_f$F_Return > (-r_thresh)) & (data_f$F_Return < r_thresh)] <- 1 # Soso return
data_f$FC_Return[data_f$F_Return >= r_thresh] <- 2 # High return
train_class <- data_f %>% filter(Date < split_date) # Training data
test_class <- data_f %>% filter(Date > split_date) # Testing data
# Below, we split these sets into features versus labels
train_features_class <- select(train_class, features) %>% as.matrix()
train_labels_class <- model.matrix(~0+as.factor(train_class$FC_Return)) # One-hot encoding
test_features_class <- select(test_class, features) %>% as.matrix()
test_labels_class <- model.matrix(~0+as.factor(test_class$FC_Return)) # One-hot encoding
We then build a classification model. Several changes occur in the code:
model_class <- keras_model_sequential()
model_class %>% # This defines the structure of the network, i.e. how layers are organized
layer_dense(units = 4, activation = 'relu', input_shape = ncol(train_features_class)) %>%
layer_dense(units = 3, activation = 'softmax') # Softmax for classification
model_class %>% compile( # Model specification
loss = 'categorical_crossentropy', # Loss function
optimizer = optimizer_rmsprop(), # Optimisation method (weight updating)
metrics = c('categorical_accuracy') # Output metric
)
summary(model_class) # Model structure
## Model: "sequential_2"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_8 (Dense) (None, 4) 24
## ________________________________________________________________________________
## dense_7 (Dense) (None, 3) 15
## ================================================================================
## Total params: 39
## Trainable params: 39
## Non-trainable params: 0
## ________________________________________________________________________________
And finally, the training!
fit <- model_class %>% fit(train_features_class, # Training features
train_labels_class, # Training labels
epochs = 10, batch_size = 32, # Training parameters
validation_data = list(test_features_class, test_labels_class) # Test data
)
plot(fit) + theme_light() # Plot, evidently!
predict(model_class, t(test_features_class[1,])) # Prediction
## [,1] [,2] [,3]
## [1,] 0.328903 0.2590789 0.4120181
colMeans(test_labels_class) # Class imbalance
## as.factor(test_class$FC_Return)0 as.factor(test_class$FC_Return)1
## 0.3073529 0.2759804
## as.factor(test_class$FC_Return)2
## 0.4166667
We get a vector of probabilities for each class. The learning pattern is close to a plateau, which is again disappointing.
A whole section is dedicated to recurrent networks because they are much more tougher to handle. For simplicity, we will implement a Gated Recurrent Unit (GRU) model.
The implementation is slightly more complicated for a RNN because:
First, we must reshape the inputs, using arrays.
nb_ticks <- length(tick) # Nb stocks
nb_chars <- length(features) # Nb features
nb_dates_train <- nrow(train_features) / nb_ticks # Nb training dates (size of training sample)
nb_dates_test <- nrow(test_features) / nb_ticks # Nb testing dates
# Below, we morph the inputs in array/tensor format
train_features_rnn <- array(train_features, dim = c(nb_ticks, nb_dates_train, nb_chars))
test_features_rnn <- array(test_features, dim = c(nb_ticks, nb_dates_test, nb_chars))
train_labels_rnn <- matrix(train_labels, nrow = nb_ticks) %>%
array(dim = c(nb_ticks, nb_dates_train,1))
test_labels_rnn <- matrix(test_labels, nrow = nb_ticks) %>%
array(dim = c(nb_ticks, nb_dates_test,1))
The dimensions are crucial. In Keras, they are defined for RNNs as:
The problem is that these choices are binding when turning to the forecasting stage (as of Jan. 2019). Typically, when the training sample has a given number of timesteps, the testing sample must also have the same amount. Since we are aiming at one step-ahead prediction, it will be more convenient to set timesteps = 1.
This is not a big issue for two reasons: first, Keras models are mutable, hence calling fit() several times will update the training of the model. Second, there is a feature in Keras RNN that allows to take values from a previous batch and consider them as starting points in the current batch. This option is activated when stateful = TRUE in the model arguments.
The solution we employ is thus to loop the training across the different assets and to use batch sizes equal to the number of chronological points in the sample. Nevertheless, to avoid undesired overlapping of hidden states when switching from one batch (i.e., asset) to another while training, we resort to the command reset_states() after each batch.
model <- keras_model_sequential() %>%
layer_gru(units = 16, # Nb units in hidden layer
batch_input_shape = c(nb_dates_train, 1, nb_chars), # Dimensions = tricky part!
activation = 'relu', # Activation function
stateful = TRUE) %>% # Passing last state to nxt batch
layer_dense(units = 1) # Final aggregation layer
model %>% compile(
loss = 'mean_squared_error', # Loss = quadratic
optimizer = optimizer_rmsprop(), # Backprop
metrics = c('mean_absolute_error') # Output metric MAE
)
for(i in 1:nb_ticks){ # Loop over assets
model %>% fit(array(train_features_rnn[i,,], dim = c(nb_dates_train, 1, nb_chars)),
train_labels_rnn[i,,], # Training labels
epochs = 10, # Number of rounds
batch_size = nb_dates_train, # Length of sequences
verbose = 0 # Do not display progress bar
)
model %>% reset_states() # Reset hidden states in the network when changing asset
}
In the above example, we roll the training via the stateful option. It could be possible to avoid doing this by inverting the batch size with the timesteps, but the label size must be handled differently. Because the batch size would be one, the label should consist of only one point (the last one) and the architecture would be somewhat different.
For a detailed account of all the properties of GRU networks in Keras, we recommend the reference page: https://keras.rstudio.com/reference/layer_gru.html
Below, we create a new model with the parameters of the one we just have trained. Given the input requirements of Keras, we choose (!) to work with a double loop. Indeed the handling of batch sizes is not straightforward, we thus proceed with single batches, one step at a time.
new_model <- keras_model_sequential() %>%
layer_gru(units = 16, batch_input_shape = c(1, 1, nb_chars), # NEW BATCH SIZE: ONE ITEM ONLY!
activation = 'relu', # Activation function
stateful = TRUE) %>% # Passing last state to next batch
layer_dense(units = 1)
new_model %>% set_weights(get_weights(model))
forecasts <- matrix(0, nrow = nb_dates_test, ncol = nb_ticks)
for(j in 1:nb_ticks){ # Loop on stocks
for(i in 1:nb_dates_test){ # Loop on test dates
test_sample <- test_features_rnn[j,i,] %>% array(dim = c(1, 1, nb_chars)) # Temp test sample
forecasts[i,j] <- predict(new_model, test_sample, batch_size = 1) # Prediction
}
}
forecasts <- forecasts %>% data.frame() # Reformatting the results
colnames(forecasts) <- tick
head(forecasts) # Displaying the results
## AAPL BA BAC C CSCO CVS CVX
## 1 0.01949294 0.1238271 0.06203436 0.1339080 0.12036975 0.11682026 0.01977338
## 2 0.02365539 0.1459534 0.09767368 0.1354819 0.10002992 0.11945204 0.03210038
## 3 0.03345459 0.1409788 0.10650982 0.1172802 0.11369056 0.10393780 0.03851758
## 4 0.04446844 0.1326413 0.12706353 0.1234715 0.10177889 0.09102554 0.05417507
## 5 0.04890548 0.1214298 0.11671874 0.1258701 0.09734003 0.07978674 0.04877361
## 6 0.05294538 0.1139696 0.12207721 0.1205379 0.11232569 0.07107325 0.05154252
## D DIS F GE HD IBM
## 1 0.08462397 0.04617634 0.06466951 -0.02943616 0.05651163 0.005473061
## 2 0.08870284 0.03499522 0.07626037 -0.06653591 0.05382570 0.018538706
## 3 0.08739338 0.05773829 0.06301197 -0.07642568 0.05148355 0.038852725
## 4 0.08094107 0.06522466 0.05196173 -0.05488770 0.04876354 0.045998901
## 5 0.07125250 0.05381625 0.05262602 -0.04663152 0.04824727 0.049476229
## 6 0.07674345 0.05238795 0.02854374 -0.04711435 0.03413342 0.054076016
## INTC JNJ JPM K MCK MRK
## 1 0.08360042 0.10979138 0.03852692 0.13752125 0.09829006 0.03465381
## 2 0.10941762 0.08284923 0.05040085 0.14351591 0.09505129 0.02757927
## 3 0.11979502 0.07167646 0.05541964 0.13038389 0.09392039 0.03120726
## 4 0.11808260 0.05501388 0.06449872 0.11614393 0.09699351 0.05998521
## 5 0.10988586 0.04390071 0.06273219 0.09954669 0.10594976 0.06630570
## 6 0.10702608 0.03607112 0.06193262 0.08743370 0.10409075 0.06360394
## MSFT ORCL PFE PG T UNH
## 1 0.0124983741 0.10099673 0.0008221548 0.012940611 0.028254710 0.03116279
## 2 -0.0082691582 0.09977308 0.0101430742 0.011836113 0.020050183 0.05357419
## 3 -0.0185379088 0.08653473 0.0289592668 -0.003246208 0.005723811 0.06629346
## 4 0.0008776253 0.07808002 0.0332365818 -0.001323950 -0.014161185 0.06943801
## 5 0.0237668119 0.07187674 0.0361789837 0.001643203 -0.024276286 0.06797642
## 6 0.0427880175 0.08039701 0.0450083539 0.007681711 -0.034014985 0.06659734
## UPS VZ WFC WMT XOM
## 1 0.07078810 0.003322459 0.01036216 0.04370237 -0.030562453
## 2 0.09271787 -0.004052998 0.02455194 -0.01037045 -0.002536726
## 3 0.10198719 -0.009198244 0.03961971 -0.03069826 0.022028757
## 4 0.10231321 -0.013529894 0.05484089 -0.04052212 0.023464072
## 5 0.09630442 -0.019007489 0.05764509 -0.02398769 0.017434955
## 6 0.08929661 -0.019625962 0.05295432 -0.02582015 0.022202801
In the above table, the figures can serve as input for investment decisions. In the next section, we implement one such NN-based strategy.
We now turn to the embedding of neural networks in portfolio backtesting. We plan to compare a NN-based strategy to the usual equally-weighted (EW) portfolio. We use the neural network to forecast future returns and form an EW portfolio of the ten stocks with the highest predictions. All parameters of the NN are hard-coded in the weights function (which is pretty bad!). The predictors are the same as above and gathered in the variable features.
weights_multi <- function(train_data, test_data, j){
N <- test_data$Tick %>% unique() %>% length() # Number of stocks
features <- c("Mkt_Cap", "P2B", "Vol_1M", "D2E", "Prof_Marg") # Hard-coded
if(j == 1){ # j = 1 => EW
return(rep(1/N,N))
}
if(j == 2){ # j = 2 => NN-based
model <- keras_model_sequential() # We directly define the model
model %>%
layer_dense(units = 4, activation = 'relu', input_shape = length(features)) %>%
layer_dense(units = 3, activation = 'sigmoid', kernel_initializer = "random_normal") %>%
layer_dense(units = 1) # Linear activation: f(x) = x
model %>% compile( # Model specification
loss = 'mean_squared_error', # Loss function
optimizer = optimizer_rmsprop(), # Optimisation method (weight updating)
metrics = c('mean_absolute_error') # Output metric
)
# Below, we quickly format the data
train_features <- select(train_data, features) %>% as.matrix()
train_labels <- train_data$F_Return
# And now, it's ready for its training purpose!
fit <- model %>% fit(train_features, # Training features
train_labels, # Training labels
epochs = 5, batch_size = 256, # Training parameters
verbose = 0
)
test_features <- select(test_data, features) %>% as.matrix() # Test input
pred <- predict(model, test_features)
return((pred>quantile(pred, 2/3))/10) # Ten largest values, equally-weighted
}
}
Then we fix the backtesting canvas, with a focus on dates, essentially.
sep_date <- as.Date("2013-01-01") # This date separates in-sample vs out-of-sample
t_oos <- data$Date[data$Date > sep_date] %>% # Out-of-sample dates (i.e., testing set)
unique() %>%
as.Date(origin = "1970-01-01")
Tt <- length(t_oos) - 1 # Nb of dates, avoid T = TRUE
nb_port <- 2 # Nb of portfolios
portf_weights <- array(0, dim = c(Tt, nb_port, length(tick))) # Store weights
portf_returns <- matrix(0, nrow = Tt, ncol = nb_port) # Store returns
We stop the backtest before the last date because on this date, there is no forward return. Also, we try to avoid to set a value to T, as it is reserved to the short version of the boolean TRUE.
We can now turn to the dynamic implementation. Given the computation time for one neural network, the iteration over all out-of-sample dates will take time even though the dataset is rather small. The usual solution is to resort to GPU computing. Neural nets are notoriously slow, compared to tree methods.
for(t in 1:(length(t_oos)-1)){ # Stop before the last date: no fwd return!
if(t%%12==0){print(t_oos[t])} # Just checking the date status
train_data <- data_f %>% filter(Date < t_oos[t]) # Expanding window: all past values!!!
test_data <- data_f %>% filter(Date == t_oos[t]) # Current values
realized_returns <- data %>% # Computing returns via
filter(Date == t_oos[t]) %>% # Filtering
select(F_Return) # Selecting
for(j in 1:nb_port){
portf_weights[t,j,] <- weights_multi(train_data, test_data, j) # Hard-coded params, bad!
portf_returns[t,j] <- sum(portf_weights[t,j,] * realized_returns)
}
}
## [1] "2013-12-31"
## [1] "2014-12-31"
## [1] "2015-12-31"
## [1] "2016-12-30"
## [1] "2017-12-29"
## [1] "2018-12-31"
## [1] "2019-12-31"
## [1] "2020-12-31"
We recycle the perf_met() function we already worked with in previous sessions. It is sourced from a separate file in the working directory.
source("perf_met.R") # Import external code/script
returns <- data %>% # Compute return matrix: start from original data
filter(Date %in% t_oos) %>% # Keep out-of-sample dates
select(Date, Tick, F_Return) %>% # Keep 3 attributes
spread(key = Tick, value = F_Return) # Shape in matrix format
perf_met_multi(portf_returns = portf_returns, # Performance metrics (using imported function)
weights = portf_weights,
asset_returns = returns,
t_oos = t_oos,
strat_name = c("EW", "NN"))
## avg_ret vol Sharpe_ratio VaR_5 turn
## EW 0.010228461 0.03932889 0.2600750 -0.05506972 0.03884991
## NN 0.009584362 0.04532412 0.2114627 -0.06855076 1.32234583
On paper, the enhanced NN-based strategy seems comparable to the EW benchmark, at least in terms of Sharpe ratio. The big issue in optimised portfolios is always turnover: asset rotation is much too high. This is a classical problem in quantitative portfolio management. There are many ways to solve this drawback and the core ideas are either to smooth the allocation scheme or to enforce hard constraints (e.g., min/max weights).