This trading strategy backtesting guide presents a backtesting approach, along with guidelines on how to avoid overfitting and which metrics to include in performance reports.
Backtesting refers to testing a predictive model or trading system using historical data. Traders use backtesting to test strategy ideas, to compare strategy performance across different markets and time frames, and to determine optimal input parameter values for their systems.
Trading strategies and parameters are evaluated by feeding a set of historical data (open/high/low/close prices, technical analysis calculations, options greeks, etc.) to a custom backtesting application, R script, or Excel spreadsheet, and then evaluating the resulting strategy performance using a set of metrics.
In this article we’ll focus on applying backtesting to so-called “system trading” – a trading approach in which traders develop, test, and run automated rule-based trading algorithms and evaluate strategy performance based on concrete data.
Why Do We Still Need to Backtest?
Everyone knows that past results do not guarantee future performance. However, while a good backtest performance result is not by itself sufficient to guarantee trading profits (and what is?) – it is nevertheless one of the necessary tests that traders will put their systems through in order to consider them fit for live trading.
To ensure long-term success in the markets, traders need not only to generate and apply original trading ideas, but also to keep coming up with fresh ones on a regular basis. A system trader may have to go through dozens of prospective strategies before finding one that works. And then, how does one pick the optimal set of indicators, input parameters, and markets to apply the strategy to? Should you use a 20-period Bollinger Band, 30, or 50? Do you go with 2 standard deviations, 2.5, or 3? Should you use the same standard deviation value for both the upper and lower bands, or take the current price trend into account and set a higher (lower) value on the corresponding band? Go with a 14, 30, 50, or 100-bar moving average? SMA, EMA, WMA? What mechanisms should you use to detect volatility levels? These are simple examples of the type of questions a trading system developer will have to answer.
To make things worse, no trading strategy, no matter how good, remains profitable forever. The profitability window generally stays open only for a short period, typically between a few weeks and a few months. After that, a number of things may cause the strategy to stop performing: market conditions change, a trending market turns range-bound or vice versa, other traders find the same profit opportunity and close it, HFT trades mess things up, your broker catches on to your strategy and starts front-running you (it does happen – always make sure you use a reputable and regulated broker), and so on. Regardless of the reason, a trader who wants to stay profitable for an extended period must keep changing strategies and adapting to new conditions. That means system traders need to be constantly looking for and testing new strategies. It might take weeks or months to develop a new working strategy. In the meantime, an alternative can be to modify a proven existing strategy by tweaking input parameters, adding a new “secret sauce” rule, or simply applying it to a different market. All of the above can be time-consuming, considering the sheer number of input parameter combinations that need to be evaluated and tested.
Backtesting proves useful for a couple of reasons. First, it provides concrete performance data for side-by-side strategy comparison. It reduces guesswork and enables traders to apply the scientific method to trading. Second, automated backtesting is a great time-saving tool. A good backtesting tool provides a way to iterate over thousands of parameter combinations and find the best-performing ones. This process can be repeated on a daily basis to ensure that a strategy stays fine-tuned using the most up-to-date data.
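As a rough illustration of iterating over parameter combinations, the sketch below sweeps a small grid for a toy moving-average crossover rule. Everything in it is an assumption made for the example: the `backtest_sma_cross` and `sweep` functions, the grid values, and the price series are hypothetical, and the rule ignores commissions and slippage.

```python
from itertools import product

def sma(prices, n, i):
    """Simple moving average of the n prices ending at index i (inclusive)."""
    return sum(prices[i - n + 1 : i + 1]) / n

def backtest_sma_cross(prices, fast, slow):
    """Toy long-only rule: hold one unit while the fast SMA is above the slow SMA.

    Returns total P/L in price units; costs are deliberately ignored here.
    """
    pnl = 0.0
    for i in range(slow - 1, len(prices) - 1):
        if sma(prices, fast, i) > sma(prices, slow, i):
            pnl += prices[i + 1] - prices[i]  # P/L of holding over the next bar
    return pnl

def sweep(prices, fast_values, slow_values):
    """Evaluate every (fast, slow) pair with fast < slow, best result first."""
    results = []
    for fast, slow in product(fast_values, slow_values):
        if fast < slow:
            results.append(((fast, slow), backtest_sma_cross(prices, fast, slow)))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Made-up price series, for illustration only
prices = [100, 101, 103, 102, 105, 107, 106, 109, 111, 110, 113, 115]
ranked = sweep(prices, fast_values=[2, 3], slow_values=[4, 5, 6])
best_params, best_pnl = ranked[0]
```

The top-ranked pair is only the best fit to this particular data set, so it should be treated as a candidate to validate rather than as a finished answer.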
One of the biggest challenges in backtesting is curve fitting (also known as “overfitting”). Since backtesting makes it easy to tweak parameters until your strategy performs perfectly, you often end up over-optimizing the strategy to the specific data set you happened to be testing the system on. Overfitting tends to happen more frequently when a strategy uses a large number of parameters and indicators. With a higher number of available variables it becomes easier to generate a curve that fits historical performance perfectly.
There are several ways to mitigate the risk of overfitting:
- Keep the number of input parameters reasonable. If you use more than 2-3 price-based technical analysis indicators, you probably need to get rid of some of them. Price-based indicators tend to duplicate each other’s signals with varying degrees of delay, and adding more than a couple is redundant.
- Test your strategy on several distinct sets of data. One approach often used in machine learning is to split your historical data set into two parts: training (about 60-70% of available data) and validation (the other 30-40%). The training data set is used for testing and parameter optimization. When you think you have found sufficiently good parameter values, run them on the validation data set and compare performance results. If performance on the training data set is significantly better than on the validation data set, you have an overfitting problem: you over-optimized the strategy to work perfectly on the training set, but not on any other data.
- Test on historical data from different market instruments. If your strategy is genuinely profitable, it should not only perform well on AAPL or S&P 500 futures; it should show at least comparable results on other contracts/symbols. Similarly, if you trade Forex, don’t test your strategy only on EUR/USD, even if that is the only pair you intend to trade live. Test it on a few other pairs and see if performance results are more or less similar. If performance varies significantly, dig deeper to figure out the cause. Again, the main culprit is likely to be overfitting.
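The training/validation split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the split is chronological (earliest bars for training, most recent for validation), and both the `looks_overfit` helper and its 0.5 tolerance are hypothetical choices, not an established rule.

```python
def split_history(bars, train_fraction=0.7):
    """Split chronologically: the earliest bars train, the most recent validate."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(bars) * train_fraction)
    return bars[:cut], bars[cut:]

def looks_overfit(train_score, validation_score, tolerance=0.5):
    """Flag a result whose validation score is below `tolerance` times the
    training score. The 0.5 default is an arbitrary value for illustration."""
    if train_score <= 0:
        return True  # nothing worth validating
    return validation_score < tolerance * train_score

# Stand-in for 100 historical bars; real data would be price records
bars = list(range(100))
train, validation = split_history(bars, train_fraction=0.7)
```

The same score (for example, annualized profit or a risk-adjusted ratio) has to be used on both sets for the comparison to mean anything.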
Commissions and Slippage
A trading strategy performance report that does not include commissions and slippage cannot be taken seriously. The whole idea of backtesting is to test how a strategy would perform during live trading using real money, and commissions and slippage are two unavoidable realities of trading.
Optimizing a strategy with commissions and slippage included reflects the reality of trading and prevents nasty surprises, such as finding out that your strategy, while super-profitable in an idealized backtesting environment, performs horribly in live trading.
Strategies that generate a large number of trades will obviously accumulate large commission and slippage costs.
Calculating commission is straightforward: find out what your broker charges per trade and multiply that amount by the number of trades.
Slippage is a bit trickier. First of all, there are several main causes of slippage: bid-ask spreads, market volatility, and (lack of) liquidity for low-volume instruments. Calculating slippage accurately can become a laborious task. The good news is that you probably don’t need to do it. In practice, approximating slippage is all that’s needed to generate performance reports that reflect live trading performance closely enough. One simple approach we recommend is to simulate slippage by adjusting every entry and exit trade price by a few ticks against your direction. The number of slippage ticks should usually be one of the input parameters for your strategy, just like the commission amount.
Keep in mind: slippage can have varying degrees of impact on your strategy depending on what type of orders you use and what markets you trade. Strategies using market orders will experience higher slippage, while those using limit orders will experience lower slippage. Similarly, slippage will be less of an issue in high-volume liquid markets than in low-volume slow ones.
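A minimal sketch of the tick-based slippage adjustment and per-trade commission described above might look like this. The function names, the `point_value` parameter, and the example numbers are assumptions for illustration; the sketch also assumes a long round trip in which the entry and the exit each count as one commissioned trade.

```python
def fill_price(price, side, slippage_ticks, tick_size):
    """Worsen a fill by `slippage_ticks`: buys fill higher, sells fill lower."""
    adjustment = slippage_ticks * tick_size
    if side == "buy":
        return price + adjustment
    if side == "sell":
        return price - adjustment
    raise ValueError("side must be 'buy' or 'sell'")

def round_trip_pnl(entry, exit_, quantity, slippage_ticks, tick_size,
                   commission_per_trade, point_value=1.0):
    """Net P/L of a long round trip: buy at entry, sell at exit.

    Commission is charged once per trade, so twice per round trip.
    """
    buy = fill_price(entry, "buy", slippage_ticks, tick_size)
    sell = fill_price(exit_, "sell", slippage_ticks, tick_size)
    gross = (sell - buy) * quantity * point_value
    return gross - 2 * commission_per_trade

# Hypothetical contract: 0.25 tick size, 50 currency units per point
ideal = round_trip_pnl(100.0, 101.0, 1, 0, 0.25, 0.0, point_value=50.0)
realistic = round_trip_pnl(100.0, 101.0, 1, 2, 0.25, 2.5, point_value=50.0)
```

With these made-up numbers, the idealized trade earns 50.0 while the same trade with two ticks of slippage per fill and a 2.5 commission per trade loses 5.0, which is the kind of gap the section warns about.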
Strategy Performance Reports
A performance report should include a number of metrics that describe trading system performance, expected returns, and, more importantly, expected risk.
As a trading system developer you will often find yourself comparing multiple strategy performance reports for different parameter values, trading instruments, time frames, and time periods. For example, you may have access to ten years of stock prices, but only one or two years of S&P futures data. In order to compare apples to apples, it is helpful to standardize the metrics presented in performance reports. All expected profit and loss figures, both absolute dollar amounts and percentage values, should be annualized.
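The article does not prescribe an annualization method, so the sketch below makes two assumptions: percentage returns are annualized by geometric compounding, and dollar P/L is annualized as a simple per-year average. The function names are hypothetical.

```python
def annualized_return(total_return, years):
    """Compound annual growth rate for a total return earned over `years`.

    total_return is a fraction, e.g. 0.21 for +21%.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    return (1.0 + total_return) ** (1.0 / years) - 1.0

def annualized_pnl(total_pnl, years):
    """Average dollar P/L per year over the test period."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_pnl / years

# A +21% result over two years compounds to roughly +10% per year
two_year = annualized_return(0.21, 2)
# 30,000 earned over 18 months is 20,000 on a per-year basis
per_year = annualized_pnl(30000, 1.5)
```

Putting a ten-year stock backtest and an eighteen-month futures backtest on the same per-year footing is what makes their reports comparable.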
Drawdown is the difference, at any given time, between equity value at that time, and the maximum equity generated by the strategy up to that point in time.
Drawdowns are a measure of risk, and managing risk should be the primary objective of a trading strategy developer, much more important than profit generation. Your first priority should always be to “stay alive” and preserve your capital, and only then to increase it.
A performance report must always include drawdown statistics such as the longest drawdown and the biggest loss due to a drawdown, measured both as a dollar amount and as a percentage of initial account size.
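The drawdown statistics above can be computed from an equity curve in a single pass. The sketch below is illustrative: `drawdown_stats` is a hypothetical name, the equity values are made up, and drawdown length is counted in bars since the last equity peak, which is one possible reading of "longest drawdown".

```python
def drawdown_stats(equity, initial_balance):
    """Return (max drawdown amount, max drawdown as % of initial balance,
    longest drawdown in bars) for a list of equity values."""
    peak = equity[0]
    max_dd = 0.0
    longest = 0
    current = 0  # bars elapsed since the last equity peak
    for value in equity:
        if value >= peak:
            peak = value
            current = 0
        else:
            current += 1
            longest = max(longest, current)
            max_dd = max(max_dd, peak - value)  # distance below the running peak
    return max_dd, max_dd * 100.0 / initial_balance, longest

# Made-up equity curve starting from a 10,000 account
equity = [10000, 10500, 10200, 9900, 10400, 10800, 10600]
dd_amount, dd_pct, dd_bars = drawdown_stats(equity, initial_balance=10000)
```

For this curve the worst drawdown is 600 (from the 10,500 peak down to 9,900), which is 6% of the initial balance, and the longest drawdown lasts three bars.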
There are dozens of different performance ratios used to measure trading strategy performance. In fact, an entire series of articles can be written on them (and many already have been). We find that using a couple of ratios which are widely accepted and understood will usually be sufficient for strategy performance assessment. Ratios must be risk-adjusted so that they reflect the risks of running a strategy as opposed to only its profit generation potential.
Again, you can find detailed descriptions of many different ratios in various books, blog posts, and white papers available online and in print. At some point we may write a more detailed post on the subject, but for now we’ll limit ourselves to describing two ratios that we have been using on performance reports for our clients:
Sharpe Ratio divides the average return of an investment (strictly, its average return in excess of the risk-free rate) by the standard deviation of its returns. The standard deviation is taken as a measure of the investment’s risk. A higher Sharpe Ratio suggests more returns at lower risk. But the standard deviation also counts variations above the average return. Most people welcome those and only worry about below-average returns.
In short, it measures average return adjusted for risk, where risk is measured by standard deviation (i.e. volatility). Higher volatility brings the Sharpe ratio down.
Value interpretation guidelines: over 1.0: good, over 2.0: very good, over 3.0: awesome.
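A minimal Sharpe ratio sketch follows. It rests on assumptions the article does not spell out: returns are per-period fractions, the optional `risk_free_rate` is expressed per period, the sample standard deviation is used, and the result is annualized by multiplying by the square root of the number of periods per year.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a list of per-period returns."""
    excess = [r - risk_free_rate for r in returns]
    stdev = statistics.stdev(excess)  # sample standard deviation
    if stdev == 0:
        raise ValueError("returns have zero volatility; ratio is undefined")
    return statistics.mean(excess) / stdev * math.sqrt(periods_per_year)

# Same average return, different volatility (made-up yearly returns)
steady = sharpe_ratio([0.01, 0.02, 0.03], periods_per_year=1)
choppy = sharpe_ratio([-0.03, 0.02, 0.07], periods_per_year=1)
```

Both series average 2% per period, but the more volatile one scores far lower, which is the effect described above.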
Calmar ratio uses maximum drawdown (decline from a historical peak; see the Wikipedia article “Drawdown (economics)”) instead of standard deviation as a measure of risk. So the Calmar Ratio is an investment’s average annual return (usually measured over a 3-year period, though it does not have to be) divided by its maximum drawdown in the same period. A higher Calmar Ratio suggests more returns at lower risk.
Because big drawdowns bring the Calmar ratio down, it is a valuable risk-adjusted performance indicator.
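The Calmar ratio can be sketched from an equity curve as shown below. Two conventions here are assumptions rather than the article's definition: the return is the compound annual growth rate over the period, and maximum drawdown is expressed as a fraction of the running equity peak. Function names and sample values are hypothetical.

```python
def max_drawdown_fraction(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def calmar_ratio(equity, years):
    """Compound annual return divided by maximum drawdown over the same period."""
    annual_return = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    mdd = max_drawdown_fraction(equity)
    if mdd == 0:
        raise ValueError("no drawdown in this period; ratio is undefined")
    return annual_return / mdd

# Made-up two-year equity curve: +21% overall with a 25% dip along the way
equity = [100.0, 120.0, 90.0, 121.0]
ratio = calmar_ratio(equity, years=2)
```

Here a roughly 10% annual return is set against a 25% maximum drawdown, giving a ratio of about 0.4.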
Other Useful Metrics
Other metrics useful for getting a better insight into performance of a strategy and learning what to expect if/when you launch for live trading:
- Total number of trades.
Reflects how active your strategy is: how many trades you should expect to see when you run it.
- Expected Profit/Loss per trade.
This, combined with the total number of trades, is one of the most important metrics. It shows how much profit (or loss) your strategy is expected to generate over a period of time.
Calculating expected trade Profit/Loss amount:
Exp P/L = (Avg Profit * Pct Win Trades) + (Avg Loss * Pct of Losing Trades)
- Exp P/L = expected profit or loss per trade
- Avg Profit = average profit per winning trade, expressed as a currency amount
- Pct Win Trades = percent of winning trades (round-trip), used in the formula as a fraction (e.g. 40% = 0.40)
- Avg Loss = average loss per losing trade, expressed as a negative currency amount (so that the second term reduces the total)
- Pct of Losing Trades = percent of losing trades (round-trip), used in the formula as a fraction
- Number of winning trades & pct of all.
This and the next metric are important for setting expectations and managing stress levels when you watch your strategy running live. Depending on your personality, if this number is too low (and the number of losing trades is too high), you may not be comfortable with the strategy, even if the expected P/L per trade turns overall performance in your favor over time.
- Number of losing trades and pct of all
- Average Winning trade P/L. Again, important for setting expectations.
- Average Losing trade P/L
- Total commissions.
Both dollar amount and as percent of starting account balance.
Watch this statistic: if the total amount of commissions and slippage (next) is too high, it can ruin the overall performance of an otherwise profitable strategy.
- Slippage. Dollar amount and as percent of starting account balance
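Several of the metrics in the list above, including the expected P/L per trade formula, can be derived from a plain list of round-trip results. The sketch below is illustrative only: `trade_stats` and its dictionary keys are hypothetical names, losses are assumed to be recorded as negative amounts, and breakeven trades (exactly zero) are counted in the total but not as wins or losses.

```python
def trade_stats(trade_pnls):
    """Summarize a list of round-trip P/L amounts (losses are negative)."""
    if not trade_pnls:
        raise ValueError("no trades to summarize")
    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p < 0]
    total = len(trade_pnls)
    pct_win = len(wins) / total
    pct_loss = len(losses) / total
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0  # negative number
    return {
        "total_trades": total,
        "winning_trades": len(wins),
        "losing_trades": len(losses),
        "pct_win": pct_win,
        "pct_loss": pct_loss,
        "avg_win": avg_win,
        "avg_loss": avg_loss,
        # Exp P/L = (Avg Profit * Pct Win Trades) + (Avg Loss * Pct of Losing Trades)
        "expected_pnl": avg_win * pct_win + avg_loss * pct_loss,
    }

# Four made-up round trips: two winners of 300, two losers of 100
stats = trade_stats([300.0, -100.0, 300.0, -100.0])
```

For these four trades the expected P/L per trade works out to 100.0, which matches the total P/L of 400 divided by the number of trades.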
This is an introductory article on trading strategy backtesting. We hope it provided some useful insights and tips on trading strategy development and backtesting. We intend to continue posting more articles on the subject, so please check back regularly. In the meantime, please check out our other posts and the hand-picked book selections we post at the end of each article. Each book we recommend is one that we have read ourselves and found to contain useful information for traders and system developers.
Trading Geeks provides consulting services in trading strategy and software development for independent traders, partnerships, and hedge funds.