Every strategy gets tested with our 8-Level Validation Framework.

Here is exactly how we decide which strategies pass.

The Problem We Are Solving

Most backtests are built to look good — not to be honest.

Most trading platforms show you a backtest. A backtest is a simulation of how a strategy would have performed on historical data, usually presented as a chart. The chart usually looks great because the strategy was built on that exact data.

The uncomfortable truth: more than half of published trading strategies fail in live markets. Not because markets changed. Because the backtest was flawed from the start — too little data, hidden costs, or the strategy was tuned until it happened to look good on that particular piece of history.

A strategy that looks good on paper is not the same as a strategy that works in the real world. Our job is to close that gap.

The Core Idea

Three questions every strategy must answer.

Before trusting any strategy, you need honest answers to three questions.

The Question	What it means
Is the backtest honest?	Was this built on enough real data, with real costs, without cherry-picking the best time period?
Is the edge real?	Does this strategy have a genuine, provable reason to make money — or did it just get lucky?
Does it hold up?	Does it work on data it has never seen before — and can a real account survive its worst moments?

The 8 Levels of Validation are how we answer those questions. Each level is an independent test. A strategy must clear all 8 to earn a PASS.

The 8 Levels

Levels run in order from fastest to most rigorous. A strategy that fails an early level does not proceed.

Level 1: Enough Data

The question it answers

Do we have enough history to trust the numbers?

Why this matters

Imagine a coin that came up heads 7 times in a row. Does that mean the coin always lands heads? No — 7 flips is not enough to know. Trading strategies work the same way. A strategy that looks brilliant after 6 months of testing might simply have been lucky. Level 1 sets a minimum threshold: we need enough independent data points before any result is meaningful.

What we check

The number of active trading periods in the backtest. For daily strategies, a minimum of 30 active days. For monthly strategies, a minimum of 10 active months.
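
As an illustration only, the threshold check above could be sketched as follows. The function name is hypothetical, and the sketch assumes a period counts as "active" when its return is non-zero, which is a simplification of "the strategy held a position".

```python
# Minimum active periods per strategy frequency, taken from the text above.
MIN_ACTIVE_PERIODS = {"daily": 30, "monthly": 10}


def has_enough_data(period_returns, frequency):
    """Return True if the backtest has enough active trading periods.

    Assumption: a period is 'active' when its return is non-zero.
    """
    active = sum(1 for r in period_returns if r != 0.0)
    return active >= MIN_ACTIVE_PERIODS[frequency]
```

A long setup period shows up here as a run of zero-return periods that do not count toward the minimum.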

What failure looks like

A strategy tested on only a few months of data, or a strategy with a very long setup period that consumes most of the available history before the first trade.

Level 2: Real Costs

The question it answers

Does it still work when you add trading fees?

Why this matters

Every time you buy or sell, you pay a cost — the spread between the buy price and the sell price, plus any commissions. A strategy that trades frequently can look great before costs and terrible after. Level 2 applies realistic transaction costs to every trade in the backtest and asks a simple question: is there still a profit left?

What we check

We deduct a small, realistic cost on every day the strategy makes a trade. The net return after all costs must still be positive.
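
A minimal sketch of this deduction, under stated assumptions: the cost of 0.1% per trading day is a hypothetical placeholder (the text above only says "small" and "realistic"), and the function names are illustrative.

```python
def net_return_after_costs(daily_returns, traded, cost_per_trade_day=0.001):
    """Compound daily returns after subtracting a cost on each trading day.

    cost_per_trade_day is a hypothetical 0.1%; it is not a figure from the text.
    `traded` is a list of booleans, True on days the strategy made a trade.
    """
    growth = 1.0
    for r, did_trade in zip(daily_returns, traded):
        net = r - cost_per_trade_day if did_trade else r
        growth *= 1.0 + net
    return growth - 1.0


def passes_real_costs(daily_returns, traded, cost_per_trade_day=0.001):
    # The level passes only if a profit is left after all costs.
    return net_return_after_costs(daily_returns, traded, cost_per_trade_day) > 0.0
```

With these assumptions, a strategy earning 0.05% a day while trading every day ends up negative, which is the "consuming its own edge in fees" case.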

What failure looks like

A high-frequency strategy that trades constantly, consuming its own edge in fees. Or a strategy with a paper-thin margin that disappears the moment real-world frictions are applied.

Level 3: Proven Signal

The question it answers

Is there genuine statistical evidence the strategy works?

Why this matters

Even a strategy with positive returns after costs might just have been lucky. Level 3 applies a statistical test to determine whether the returns are genuinely above zero, or whether the same result could plausibly have appeared by chance. Think of it as asking: if the strategy had no real edge and we ran it on 1,000 different random markets, how often would we see returns this good by accident?

What we check

A one-tailed statistical test on the full return history, requiring the positive result to be statistically significant.
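
The text does not name the test or the significance level, so the following is only one plausible sketch: a one-sided test on the mean return using a normal approximation, with an assumed 5% threshold. Function names are hypothetical.

```python
import math
import statistics


def one_tailed_p_value(returns):
    """Approximate p-value for H0: mean return <= 0 against H1: mean return > 0.

    Uses a normal approximation to the test statistic, which is reasonable
    only for larger samples; a t-distribution is stricter on small ones.
    """
    n = len(returns)
    mean = statistics.fmean(returns)
    std_err = statistics.stdev(returns) / math.sqrt(n)
    z = mean / std_err
    return 1.0 - statistics.NormalDist().cdf(z)


def passes_proven_signal(returns, alpha=0.05):
    # alpha = 0.05 is an assumed significance level, not stated in the text.
    return one_tailed_p_value(returns) < alpha
```

The test is one-tailed because only positive returns count as evidence; a strategy that is significantly negative should not pass.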

What failure looks like

A strategy where the returns, while positive, are too small or too inconsistent to be distinguishable from random noise given the amount of data available.

Level 4: Consistent Over Time

The question it answers

Did it work in multiple different periods — not just one?

Why this matters

Markets change. A strategy that worked brilliantly from 2010 to 2020 may have simply been riding a decade-long bull market. Level 4 breaks the backtest into multiple rolling five-year windows and checks whether the strategy was profitable in most of them — not just the overall period.

What we check

The strategy is tested across all rolling five-year windows in the data. At least 60% of those windows must show a profit.
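
A rough sketch of the rolling-window check, assuming yearly returns as input and windows that advance one year at a time (the step size is an assumption; the text only gives the five-year length and the 60% threshold).

```python
def rolling_window_pass_rate(yearly_returns, window=5):
    """Fraction of rolling windows whose compounded return is positive."""
    n_windows = len(yearly_returns) - window + 1
    if n_windows < 1:
        raise ValueError("need at least one full window of data")
    profitable = 0
    for start in range(n_windows):
        growth = 1.0
        for r in yearly_returns[start:start + window]:
            growth *= 1.0 + r
        if growth > 1.0:
            profitable += 1
    return profitable / n_windows


def passes_consistency(yearly_returns, window=5, min_pass_rate=0.60):
    # At least 60% of the windows must show a profit.
    return rolling_window_pass_rate(yearly_returns, window) >= min_pass_rate
```

One exceptional year followed by steady small losses makes only the windows containing that year profitable, so the pass rate falls well under 60%.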

What failure looks like

A strategy whose entire backtest performance is driven by one exceptional period while losing money in every other window. That is not a strategy. That is a lucky era.

Level 5: Beats Random

The question it answers

Does it outperform a purely random trading strategy?

Why this matters

This is one of the most powerful tests in the framework. We take the strategy's trading signals and compare them against 1,000 randomly generated signals applied to the same price data. If the real strategy doesn't consistently outperform random entry and exit points, the signal mechanism is not adding value.

What we check

The strategy's return must rank in the top 5% of 1,000 random strategies tested on the same data.
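
How the random signals are generated is not specified above. One common choice, used in this sketch as an assumption, is to shuffle the strategy's own positions so every random strategy has the same time in the market. Names are illustrative.

```python
import random


def total_return(returns, positions):
    """Compound return of holding `positions` (e.g. 1 = long, 0 = flat)."""
    growth = 1.0
    for r, pos in zip(returns, positions):
        growth *= 1.0 + r * pos
    return growth - 1.0


def percentile_vs_random(returns, positions, n_random=1000, seed=0):
    """Share of random strategies that the real strategy beats outright."""
    rng = random.Random(seed)
    actual = total_return(returns, positions)
    shuffled = list(positions)
    beaten = 0
    for _ in range(n_random):
        rng.shuffle(shuffled)  # assumption: random = permuted real signals
        if actual > total_return(returns, shuffled):
            beaten += 1
    return beaten / n_random


def passes_beats_random(returns, positions, top_fraction=0.05):
    # Top 5% means beating at least 95% of the random strategies.
    return percentile_vs_random(returns, positions) >= 1.0 - top_fraction
```

Shuffling keeps exposure constant, so the comparison isolates the timing of the signals rather than how often the strategy is invested.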

What failure looks like

A strategy that ranks below the 95th percentile. One that scores at the 50th percentile or below did no better than random trading.

Level 6: Works in All Markets

The question it answers

Does it hold up in rising, falling, and sideways markets?

Why this matters

Markets spend time in three environments: rising (bull), falling (bear), and going sideways. A strategy that only makes money when markets are rising is not a strategy; it is simply exposure to the market. Level 6 tests whether the strategy generates value across these environments rather than in just one.

What we check

The backtest is divided into bull, bear, and sideways periods. The strategy must show positive performance in at least two of the three environments.
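
A minimal sketch of the two-of-three rule. It assumes each period already carries a regime label; how periods are classified as bull, bear, or sideways is not described above and is left outside the sketch.

```python
def performance_by_regime(returns, regimes):
    """Compound returns separately for each labelled market regime."""
    growth = {"bull": 1.0, "bear": 1.0, "sideways": 1.0}
    for r, regime in zip(returns, regimes):
        growth[regime] *= 1.0 + r
    return {name: g - 1.0 for name, g in growth.items()}


def passes_all_markets(returns, regimes, min_positive=2):
    # Positive performance is required in at least two of the three regimes.
    by_regime = performance_by_regime(returns, regimes)
    positive = sum(1 for value in by_regime.values() if value > 0.0)
    return positive >= min_positive
```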

What failure looks like

A strategy that makes money only in bull markets and gives it back in bear and sideways markets. This is the most common failure mode for strategies that look good on paper but fail when they're needed most.

Level 7: Risk Is Acceptable

The question it answers

Are the returns good enough and the losses manageable?

Why this matters

A strategy that doubles your money over five years sounds great, until you learn it also cut your account in half along the way. Most people cannot hold through a 50% loss. Level 7 evaluates both the quality of returns and the severity of the worst losses, using four measures that institutional investors rely on to compare strategies across asset classes.

What we check

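Sharpe ratio (return per unit of volatility), Calmar ratio (annual return relative to the worst drawdown), CVaR (conditional value at risk: the average loss in the worst-case tail), and a Monte Carlo stress test simulating thousands of possible futures.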
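
For readers who want to see how the first three measures are typically computed, here is a hedged sketch using standard textbook definitions. It assumes daily returns, a zero risk-free rate, 252 trading days per year, and a 5% tail for CVaR; none of these settings or any pass thresholds come from the text, and the Monte Carlo stress test is not shown.

```python
import math
import statistics


def sharpe_ratio(daily_returns, periods_per_year=252):
    # Risk-free rate assumed to be zero for simplicity.
    mean = statistics.fmean(daily_returns)
    return mean / statistics.stdev(daily_returns) * math.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Largest peak-to-trough fall in the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar_ratio(daily_returns, periods_per_year=252):
    """Annualised return divided by maximum drawdown (undefined with no drawdown)."""
    growth = math.prod(1.0 + r for r in daily_returns)
    annualised = growth ** (periods_per_year / len(daily_returns)) - 1.0
    return annualised / max_drawdown(daily_returns)


def cvar(daily_returns, tail=0.05):
    """Average return over the worst `tail` share of days (negative for losses)."""
    ordered = sorted(daily_returns)
    k = max(1, int(len(ordered) * tail))
    return statistics.fmean(ordered[:k])
```

The 50% loss described above corresponds to a maximum drawdown of 0.5 in this sketch, however strong the final return turns out to be.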

What failure looks like

Either the returns are too weak to justify active management, or the worst losses are so deep that most real traders would abandon the strategy before it recovers.

Level 8: Passes the Future Test

The question it answers

Does it work on data it was not built on?

Why this matters

This is the final and most important test. We split the historical data into two parts: training data (the past) and testing data (a period the strategy has never seen). The strategy is built using only the training data, then evaluated on the testing data. This simulates what actually happens when you trade a strategy live — the future is always data the strategy has never seen.

What we check

We roll this train-then-test procedure across multiple windows. The performance on unseen testing data must be at least 50% as strong as on the training data.
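
The rolling procedure could be sketched like this. The window lengths, the performance score, and the way windows are combined are not given above, so the sketch takes scores as input and assumes simple averaging across windows; all names are hypothetical.

```python
def walk_forward_splits(n_periods, train_len, test_len):
    """Yield (train_range, test_range) index pairs that roll forward in time."""
    start = 0
    while start + train_len + test_len <= n_periods:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len  # the next window begins where this test began


def passes_future_test(window_scores, min_ratio=0.50):
    """window_scores: list of (train_score, test_score) pairs, one per window.

    Assumption: scores are averaged across windows before comparing.
    """
    avg_train = sum(train for train, _ in window_scores) / len(window_scores)
    avg_test = sum(test for _, test in window_scores) / len(window_scores)
    if avg_train <= 0.0:
        return False  # no positive training performance to retain
    return avg_test / avg_train >= min_ratio
```

Each test range sits strictly after its training range, so the strategy is never scored on data that was available when it was built.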

What failure looks like

A strategy that performs brilliantly on the data it was built on, but collapses when tested on new data. This is overfitting — the single most common reason great-looking backtests fail in real trading.

The Verdict

After all 8 levels, every strategy carries one of three statuses — set automatically by code, not by human judgment.

Status	What it means	Suitable for
PASS	All 8 levels cleared. The backtest is honest, the edge is proven, and the risk profile is acceptable.	Live trading consideration with appropriate position sizing.
ADMISSIBLE	Close but not there yet. One or more non-critical levels did not pass. The strategy shows genuine promise but has documented weaknesses.	Educational study and paper trading. Understand the failure before committing capital.
DID NOT PASS	A critical level failed, or too many tests failed. The strategy has a fundamental flaw that cannot be fixed by minor adjustments.	We document the failure honestly and move on. Failures are data — they tell us what does not work.
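
The table does not say which levels are critical or how many failures count as too many, so the sketch below takes both as caller-supplied parameters rather than guessing them. It only illustrates how the three statuses could follow mechanically from per-level results.

```python
def verdict(level_results, critical_levels, max_failures):
    """Map per-level pass/fail results to one of the three statuses.

    level_results: dict of level number (1-8) -> bool.
    critical_levels and max_failures are hypothetical parameters; the text
    does not say which levels are critical or how many failures are too many.
    """
    failed = {level for level, passed in level_results.items() if not passed}
    if not failed:
        return "PASS"
    if failed & set(critical_levels) or len(failed) > max_failures:
        return "DID NOT PASS"
    return "ADMISSIBLE"
```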

Every strategy on QS Lab — whether it passed or failed — carries a full validation report. You can see exactly which levels passed, which failed, and by how much.

What We Cannot Promise

Honesty over comfort.

Validation means the backtest was constructed and evaluated honestly. It does not mean the future will cooperate.

—A strategy can pass all 8 levels and still lose money.
—Past performance is not a guarantee of future results.
—Backtests are not the same as live trading.
—Our validation is a starting point, not a finish line.

We cannot guarantee performance. But we can guarantee that every strategy you see on QS Lab earned its place.

How QS Lab Is Different

Most platforms	QS Lab
Show you a backtest and leave you to trust it	Show you a validated backtest — and exactly how it was validated
Apply the same statistics to every strategy	Apply the right tests for each strategy type
Emphasise performance — hide the risk	Require every strategy to include a failure narrative
Binary pass/fail with no explanation	Graduated scoring with a public report
Set and forget	Ongoing monitoring as live data accumulates

Ready to see what passes?

Get Early Access

Early access. No cost. No credit card. Your input shapes what we build.