Trading Glass

Biases in Backtesting

Trading Intelligence

9 min read

Avoid survivorship bias, overfitting, and lookahead bias that make backtest results lie to you before you go live.

© 2026 Trading Glass. All rights reserved.


Your backtest results might be lying to you — here’s how to spot it and fix it before going live.

Introduction

Backtesting is critical.

But bad backtesting?

It’s worse than no testing at all — because it gives you false confidence.

Most traders don’t lose because they’re lazy. They lose because they over-trusted a strategy that looked great in hindsight — but was built on invisible flaws.

Three forces conspire to inflate every backtest: (1) you only ever publish the strategies that worked on historical data (selection), (2) you tune until they work (overfitting), (3) your cost model understates real execution friction. The result is a backtest Sharpe distribution centered well above your live distribution. Understanding why is what separates testing from theatre.

Prereqs: comfort with in-sample vs out-of-sample, basic Monte Carlo, Sharpe ratio. Module path: this lesson covers the structural errors that make a backtest lie. The next lesson, Edge Degradation, covers what happens to honest edges over time. Outliers covers the third lie: a single fat tail masquerading as skill.


The 4 Most Common Backtest Biases


1. What is lookahead bias?

Using future information that wasn’t actually available at the time of trade.

Examples:

  • Entering based on the close of a candle — before it’s actually closed
  • Calculating moving average crossovers using the current, unclosed bar's price — the cross only becomes valid after the bar closes; using its in-progress value is lookahead
  • Using signals from a candle that hasn’t fully formed

Why it's dangerous: Your entries and exits appear “accurate,” but they’re unrealistically perfect — because you’re cheating time.

How to fix it:

  • Only act on closed candles (use bar replay or time-based logic)
  • Avoid functions that reference “future bars” in code
  • Simulate entries realistically (e.g. next bar open, bid/ask spreads)
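
The fixes above can be sketched in a few lines. This is a minimal illustration, not a full backtester: the `Bar` type, the `crossover_entries` function, and the window lengths are hypothetical names and values chosen for the example. The signal on bar i uses only closes up to and including bar i, and the fill is taken at the open of bar i+1.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Bar:
    open: float
    close: float


def sma(values: Sequence[float], n: int) -> float:
    """Simple moving average of the last n values."""
    return sum(values[-n:]) / n


def crossover_entries(bars: List[Bar], fast: int = 3, slow: int = 5) -> List[Tuple[int, float]]:
    """Long entries for a fast/slow SMA cross, without lookahead.

    A cross is detected on bar i using closed bars only (closes[0..i]).
    The entry is recorded on bar i + 1 at that bar's open, never at the
    close of the signal bar itself.
    """
    closes = [b.close for b in bars]
    entries = []
    # Stop one bar early: the last bar has no "next open" to fill on.
    for i in range(slow, len(bars) - 1):
        prev_fast, prev_slow = sma(closes[:i], fast), sma(closes[:i], slow)
        cur_fast, cur_slow = sma(closes[:i + 1], fast), sma(closes[:i + 1], slow)
        if prev_fast <= prev_slow and cur_fast > cur_slow:
            entries.append((i + 1, bars[i + 1].open))
    return entries
```

A real simulation would also add spread and slippage to the fill price; that is left out here to keep the timing logic visible.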

2. What is overfitting in trading strategies?

Creating a strategy that performs well only on past data — but fails in real-time.

Symptoms:

  • Too many filters (volume spike + RSI + MA + pattern + moon phase)
  • Perfect equity curve in one market — but breaks in others
  • Strategy only works on one pair, one timeframe, one year

Why it's dangerous: You’re not discovering an edge; you’re memorizing noise. Bailey & López de Prado (2014) showed that with as few as 7 trials at the standard 5% level, the probability of at least one false positive exceeds 30% (for independent trials, 1 − 0.95^7 ≈ 0.30). Their Deflated Sharpe Ratio adjusts your reported Sharpe for the number of trials run.
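
To make the Deflated Sharpe Ratio concrete, here is a minimal sketch that follows the formula as it is commonly stated from that paper. Treat it as an illustration under assumptions, not a reference implementation: the function names are hypothetical, Sharpe values are per-period (not annualized), and `sharpe_variance` is the variance of the Sharpe ratios across all the trials you ran.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """Approximate expected best Sharpe among n_trials strategies with zero true edge."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    a = _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    b = _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(sharpe: float, n_obs: int, n_trials: int,
                          sharpe_variance: float,
                          skew: float = 0.0, kurtosis: float = 3.0) -> float:
    """Probability that the observed Sharpe beats the best-of-N noise benchmark.

    skew and kurtosis describe the strategy's return distribution
    (0 and 3 correspond to normal returns).
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe ** 2)
    z = (sharpe - benchmark) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)
```

The same observed Sharpe scores lower as `n_trials` grows, which is the point: the more variants you tried, the higher the bar the winner has to clear.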

How to fix it:

  • Test across multiple instruments & time periods
  • Keep your rules simple and robust
  • Apply Combinatorially Symmetric Cross-Validation (CSCV) and report Probability of Backtest Overfitting (PBO). At PBO > 0.5 your "best" strategy is more likely overfit than not
  • Walk-forward (anchored): fit on [t0, t0+12m], test on [t0+12m, t0+15m]. Then move the test window forward 3 months, re-fit on all data from t0 up to the new test start, and repeat. Report only the concatenated test equity. A fit must never include its own test slice or any later data
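
The walk-forward schedule can be sketched as a small generator. This assumes "anchored" means the fit window always starts at t0 and grows by one test length each step; the function name and the month-index representation are hypothetical choices for the example.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int]  # half-open [start, end), in months since t0


def anchored_walk_forward(total_months: int,
                          initial_fit: int = 12,
                          test_len: int = 3) -> Iterator[Tuple[Window, Window]]:
    """Yield (fit_window, test_window) pairs.

    The fit window is anchored at month 0 and ends where the test window
    begins, so a fit never sees its own test slice or anything after it.
    """
    fit_end = initial_fit
    while fit_end + test_len <= total_months:
        yield (0, fit_end), (fit_end, fit_end + test_len)
        fit_end += test_len


# Example: 21 months of data gives three folds; only the test windows
# (months 12-15, 15-18, 18-21) would be concatenated into the reported equity.
folds = list(anchored_walk_forward(21))
```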

[Figure: Equity Curve Simulator. Equity (axis from $10.0k to $34.8k) plotted against number of trades (0 to 200). Final: $34,281 (+242.8%).]

3. What is survivorship bias?

Only testing systems or assets that still exist — ignoring those that failed or changed drastically.

Examples:

  • Backtesting an index-style universe using today's constituents (e.g. current S&P 500 members) and projecting their history backward — winners are over-represented because losers were delisted
  • Note: cost mismodeling (slippage, spread, fees, black-swan blowups) is often lumped in with survivorship, but it is a distinct bias and needs its own fix

Why it’s dangerous: Your universe is pre-filtered for winners, so historical returns are inflated. You’re also assuming the conditions that created your edge will always exist.

How to fix it:

  • Use complete historical datasets, not just what exists now
  • Include “dead” assets in portfolio-level testing
  • Simulate volatility regimes, liquidity drops, and widening spreads
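
One way to include "dead" assets is to select the universe as of each historical date rather than from today's member list. The sketch below assumes you have membership records with add and removal dates; the `Membership` type, the function name, and the symbols in the example are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Membership:
    symbol: str
    added: date
    removed: Optional[date]  # None means still a member today


def universe_as_of(memberships: List[Membership], as_of: date) -> List[str]:
    """Symbols that were members on as_of, including ones delisted later."""
    return sorted(
        m.symbol
        for m in memberships
        if m.added <= as_of and (m.removed is None or as_of < m.removed)
    )


# Hypothetical records: BBB was removed in 2012 and is absent from today's list.
records = [
    Membership("AAA", date(2000, 1, 1), None),
    Membership("BBB", date(2000, 1, 1), date(2012, 6, 1)),
    Membership("CCC", date(2015, 3, 1), None),
]
```

A backtest that trades on 2010-01-04 would then draw from AAA and BBB, not from the current list of AAA and CCC.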

4. What is data-snooping (multiple-testing) bias?

If you test 100 strategies at the 5% significance level, ~5 will look "good" by pure chance.

This is the bias that makes most public backtests garbage. Every parameter you sweep, every variation you tweak, every chart you eyeball is another silent trial — and the more trials you run, the higher the probability that something looks like edge purely from noise.
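
The arithmetic behind this is short. The sketch below assumes every trial is independent and has zero true edge, which is a simplification (real parameter sweeps are correlated); the function names are illustrative.

```python
def prob_any_false_positive(n_trials: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_trials zero-edge strategies passes at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_trials


def expected_false_positives(n_trials: int, alpha: float = 0.05) -> float:
    """Average number of zero-edge strategies that pass at level alpha."""
    return n_trials * alpha


# With 100 trials you expect about 5 spurious "winners", and the chance of
# seeing at least one is above 99%.
p_seven = prob_any_false_positive(7)
p_hundred = prob_any_false_positive(100)
```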

How to fix it:

  • Track the number of trials honestly (every parameter combination, every variant counts)
  • Apply the Deflated Sharpe Ratio (Bailey & López de Prado, 2014) — it adjusts your reported Sharpe for the number of trials
  • Use CSCV (Combinatorially Symmetric Cross-Validation) to estimate the Probability of Backtest Overfitting (PBO). PBO > 0.5 → your "best" strategy is more likely overfit than not
  • Tools: the mlfinlab library (from Hudson & Thames, implementing methods from López de Prado's work), or hand-roll CSCV in ~50 lines of Python
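
A hand-rolled CSCV can look roughly like the sketch below. It is a simplified reading of the procedure, not a validated implementation: the function names are hypothetical, Sharpe is the plain mean over standard deviation per block set, ties in rank are counted in the best strategy's favor, and PBO is taken as the share of splits where the in-sample winner lands at or below the out-of-sample median (logit ≤ 0).

```python
import math
from itertools import combinations
from statistics import mean, stdev
from typing import List, Sequence


def sharpe(returns: Sequence[float]) -> float:
    sd = stdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0


def pbo_cscv(returns_by_strategy: List[List[float]], n_blocks: int = 4) -> float:
    """Estimate the Probability of Backtest Overfitting.

    returns_by_strategy[s][t] is the return of strategy s in period t.
    """
    if n_blocks < 2 or n_blocks % 2:
        raise ValueError("n_blocks must be an even number >= 2")
    n_strats = len(returns_by_strategy)
    block_len = len(returns_by_strategy[0]) // n_blocks
    if block_len * (n_blocks // 2) < 2:
        raise ValueError("not enough observations per split")
    blocks = [range(b * block_len, (b + 1) * block_len) for b in range(n_blocks)]

    logits = []
    # Every way of choosing half the blocks as in-sample; the rest is out-of-sample.
    for is_blocks in combinations(range(n_blocks), n_blocks // 2):
        is_idx = [t for b in is_blocks for t in blocks[b]]
        oos_idx = [t for b in range(n_blocks) if b not in is_blocks for t in blocks[b]]
        is_sr = [sharpe([r[t] for t in is_idx]) for r in returns_by_strategy]
        oos_sr = [sharpe([r[t] for t in oos_idx]) for r in returns_by_strategy]
        best = max(range(n_strats), key=is_sr.__getitem__)
        # Rank of the in-sample winner out-of-sample: 1 = worst, n_strats = best.
        rank = sum(1 for s in oos_sr if s <= oos_sr[best])
        omega = rank / (n_strats + 1)
        logits.append(math.log(omega / (1 - omega)))
    return sum(1 for value in logits if value <= 0) / len(logits)
```

Feed it the full set of variants you tried, not just the finalists; dropping the losers before running CSCV understates PBO.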

Other Biases to Watch For

Bias type | Description | Fix
Selection bias | Only testing your “favorite” trades | Include every trade in your data sample
Cherry picking | Manually excluding ugly outcomes | Log every result, good or bad
Optimism bias | Assuming you’ll always get filled at ideal prices | Simulate slippage and order book depth realistically
Anchoring bias | Refusing to retest or abandon old systems | Let data guide decisions, not nostalgia

Best Practices for Honest Backtesting

1. Use realistic assumptions

  • Apply a cost floor before judging edge: crypto perp ≈ 5 bps fee + 2–5 bps slippage round-trip; equities ≈ half-spread + 0.1·σ·√(size/ADV). If your edge dies under realistic costs, it was never edge
  • Account for execution delay (e.g. do not assume a fill exactly at the candle close)
  • Simulate partial fills for large size

Cost-floor models by asset class. Apply these before judging edge.

Asset class | Fee | Slippage | Total round-trip floor
Crypto perp | ~5 bps | 2 to 5 bps | 7 to 10 bps
Equities | ~half-spread | 0.1 * sigma * sqrt(size/ADV) | Spread plus impact term
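
The cost floors above translate directly into code. This is a sketch under stated assumptions: the function names are hypothetical, sigma is taken to be volatility expressed in bps, size and ADV are in the same units (shares or notional), and the source does not say whether the equity term is one-way or round-trip, so check that against your own convention.

```python
import math


def crypto_perp_round_trip_bps(fee_bps: float = 5.0, slippage_bps: float = 3.5) -> float:
    """Round-trip cost floor for a crypto perp (fee plus slippage), in bps."""
    return fee_bps + slippage_bps


def equity_cost_bps(half_spread_bps: float, sigma_bps: float,
                    size: float, adv: float, impact_coef: float = 0.1) -> float:
    """Half-spread plus a square-root impact term, in bps."""
    return half_spread_bps + impact_coef * sigma_bps * math.sqrt(size / adv)


def net_edge_bps(gross_edge_bps: float, cost_bps: float) -> float:
    """Edge per trade after costs; a negative value means the edge does not survive."""
    return gross_edge_bps - cost_bps


# Example: a 6 bps gross edge on a crypto perp is negative after the default floor.
surviving_edge = net_edge_bps(6.0, crypto_perp_round_trip_bps())
```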

2. Separate in-sample and out-of-sample periods

  • Train your strategy on one period
  • Validate it on a completely different one. If performance holds across both, the strategy is more robust

3. Keep your strategy as simple as possible

“A system is only as good as its worst assumption.”

Fewer moving parts = less overfitting risk. The simpler a strategy is, the easier it is to test, improve, and trust. (See Outliers and Their Impact on Metrics for how a single bar can create the illusion of a fitted edge, and Sharpe Ratio & Sortino Ratio for the metric most degraded by these biases.)


FAQ

What is lookahead bias in backtesting?

Lookahead bias is using future information that wasn't actually available at the time of the simulated trade — for example, acting on a candle's close price before that candle has fully closed. It produces unrealistically perfect entries that vanish in live trading.

What is overfitting in trading strategies?

Overfitting is creating a strategy that performs well only on past data because it has memorized noise rather than discovered structure. Bailey & López de Prado (2014) showed that with as few as 7 trials at the standard 5% level, the probability of a false positive exceeds 30%.

How much should I discount my backtest Sharpe ratio?

Working heuristic: discount your backtest Sharpe by 30–50% before believing it. Even with rigorous IS/OOS, regime change and execution friction take roughly that bite. If your strategy is unprofitable after the haircut, it has no edge — only fitting.
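
As a quick worked example of the heuristic (the function name is hypothetical, and the 30–50% band is the rule of thumb above, not a derived quantity):

```python
from typing import Tuple


def haircut_sharpe(backtest_sharpe: float,
                   low_cut: float = 0.30,
                   high_cut: float = 0.50) -> Tuple[float, float]:
    """Return the (pessimistic, optimistic) Sharpe after the haircut band."""
    return backtest_sharpe * (1 - high_cut), backtest_sharpe * (1 - low_cut)


# A backtest Sharpe of 2.0 becomes a planning range of roughly 1.0 to 1.4.
pessimistic, optimistic = haircut_sharpe(2.0)
```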


Final Thought

Most failed traders didn’t skip testing. They trusted flawed testing.

Your system’s performance is only as reliable as the integrity of your backtest. As a working heuristic, discount your backtest Sharpe by 30–50% before believing it. If your strategy is unprofitable after the haircut, it has no edge, only fitting.

Backtest Sharpe haircut: 30 to 50%

Even with rigorous IS/OOS, regime change and execution friction take roughly this bite out of reported Sharpe. Apply the haircut before you judge edge.

Pre-trust checklist (5 items). Before you bet a dollar on a backtest, run this:

  1. IS/OOS split with no peeking (re-running OOS after a poor result silently turns it into IS)
  2. Costs floored at realistic exchange numbers (≈ 7–10 bps round-trip for crypto perps)
  3. Tested across >1 instrument and >1 regime
  4. Parameter count << degrees of freedom in the data
  5. Sharpe deflated for trial count (DSR), and PBO < 0.5 from CSCV

Fail any one → assume your edge is an artifact.


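Further reading:

  • Bailey, Borwein, López de Prado, Zhu (2014), Pseudo-Mathematics and Financial Charlatanism: the effects of backtest overfitting on out-of-sample performance.
  • Bailey & López de Prado (2014), The Deflated Sharpe Ratio: the DSR paper cited above.
  • Bailey, Borwein, López de Prado, Zhu (2017), The Probability of Backtest Overfitting: the CSCV/PBO paper.
  • Bessembinder (2018), Do Stocks Outperform Treasury Bills?: the canonical survivorship-bias study.
  • López de Prado (2018), Advances in Financial Machine Learning, ch. 11–14.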