[Paper] A Backtesting Protocol in the Era of Machine Learning

A Backtesting Protocol in the Era of Machine Learning is a very recent paper that (re-)states the danger of blindly applying machine learning in quantitative finance: essentially, there is not enough data for powerful machine learning models to roam free of structure and economic hypotheses.

The paper lists several pitfalls, such as:

  • the selection bias,
  • not discounting discoveries for multiple testing (cf. my implementation of Lopez de Prado's deflated Sharpe ratio),
  • picking the data transformations that yield the best results, without checking that the results are robust to small changes in those transformations,
  • assuming cross-validation is as effective in quant finance as elsewhere,
  • ignoring trading costs and fees,
  • ignoring structural changes and overcrowding,
  • tweaking the model once in production,
  • reaching for complex models when simple ones can do the job,
  • aiming at good results instead of good science.
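
To make the multiple-testing pitfall from the list above concrete, here is a minimal sketch of a Bonferroni correction applied to a backtested Sharpe ratio. It is not the procedure from the paper nor the deflated Sharpe ratio; it is the simplest adjustment I know of, under stated assumptions: returns are roughly i.i.d. normal so that the t-statistic is approximately the annualized Sharpe ratio times the square root of the number of years, and the function names (`sharpe_to_t_stat`, `two_sided_p_value`, `bonferroni_p_value`) are my own, chosen for illustration.

```python
import math


def sharpe_to_t_stat(annual_sharpe: float, n_years: float) -> float:
    """Approximate t-statistic of the mean return.

    Assumption: i.i.d. returns, so t ~ annualized Sharpe * sqrt(years).
    """
    return annual_sharpe * math.sqrt(n_years)


def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def bonferroni_p_value(t_stat: float, n_trials: int) -> float:
    """Bonferroni-adjusted p-value.

    Multiplies the single-test p-value by the number of strategies
    (or parameter sets) that were tried, capped at 1.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be >= 1")
    return min(1.0, n_trials * two_sided_p_value(t_stat))


if __name__ == "__main__":
    # Hypothetical backtest: Sharpe ratio of 1.0 over 10 years,
    # selected as the best of 50 variants that were tried.
    t = sharpe_to_t_stat(annual_sharpe=1.0, n_years=10.0)
    print("t-statistic:", round(t, 3))
    print("naive p-value:", two_sided_p_value(t))
    print("adjusted p-value (50 trials):", bonferroni_p_value(t, n_trials=50))
```

With these hypothetical numbers, the strategy looks highly significant when tested alone, but the adjusted p-value rises above the usual 5% threshold once the 50 trials are accounted for. Bonferroni is known to be conservative when the trials are correlated, which is usually the case for variants of the same strategy, so I would read it as a rough sanity check rather than a final verdict.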