From Data to Edge | Trading Glass

Journaling is only powerful if it leads to decisions. Here’s how to turn your data into focused, confident upgrades.

Introduction

Journal review is the structured process of converting recorded trades into testable hypotheses about your edge — and rejecting the ones the data cannot support. It happens at three levels: system EV, setup segmentation, and execution behavior. This lesson walks all three, and names the statistical traps at each.

Prereq: this lesson assumes you have completed Trader Journaling OS and have setup tags, execution scores, and emotion tags in your log. If those columns do not exist, do that lesson first — this one will be useless without them.

You’ve logged 200 trades. You open the spreadsheet. Within five minutes you will have invented three theories the data does not support. This lesson is the protocol that stops that.

You have tags, metrics, EVs, win rates
You know which setups are performing
You’ve spotted discipline errors and emotional triggers

Now what?

This lesson shows you how to review your data like a strategist and make adjustments without overreacting or falling into perfectionism.

Three Traps This Lesson Is Built to Avoid

Before you start, name the failure modes:

Small-sample storytelling: “I do better in the morning” with N=4 is not a finding; it is a coin flip.
Confounded segments: emotion tags correlate with outcome because you assigned them after the trade, when you already knew the result.
p-hacking yourself: slice the journal 20 ways and, at a 5% significance threshold, you should expect about one slice to look like edge by chance alone.

The rest of the lesson is built around these three traps. If a finding cannot survive them, it does not earn a rule change.
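The p-hacking trap can be put into numbers. The sketch below assumes each slice is judged at a 5% significance threshold and that the slices are independent; neither assumption comes from the lesson, and the function names are illustrative. The Bonferroni correction shown is one common, conservative way to tighten the threshold, not something this lesson prescribes.

```python
def prob_false_edge(num_slices: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_slices` no-edge slices passes a
    test at level `alpha` purely by luck (assumes independent slices)."""
    return 1.0 - (1.0 - alpha) ** num_slices


def bonferroni_alpha(num_slices: int, alpha: float = 0.05) -> float:
    """Per-slice threshold that keeps the overall false-positive rate
    at or below `alpha` (Bonferroni correction)."""
    return alpha / num_slices


for k in (1, 5, 20):
    print(f"{k:>2} slices: P(at least one fake edge) = {prob_false_edge(k):.2f}, "
          f"per-slice threshold = {bonferroni_alpha(k):.4f}")
```

With 20 slices the chance of at least one spurious "edge" comes out at roughly 64% under these assumptions, which is why a single good-looking slice should not earn a rule change on its own.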

The 3 Levels of Journaling Review

Level 1 – System Check

“Is my edge still statistically intact?”

Ask:

Is EV still positive over the last 50–100 trades?
Is win rate within your expected range?
Is your drawdown within tolerance?

If yes: stay the course. If not: time to investigate deeper.
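The three Level 1 questions can be answered from nothing more than a list of per-trade R outcomes. This is a minimal sketch: `system_check` is a hypothetical helper, drawdown is measured in R on the cumulative curve of the window (an assumption), and your own expected win-rate range and drawdown tolerance still have to be compared against the output by hand.

```python
from statistics import mean


def system_check(r_outcomes, window=50):
    """Summarize the most recent `window` trades (R multiples, oldest first)."""
    recent = r_outcomes[-window:]
    if not recent:
        raise ValueError("no trades to review")

    ev = mean(recent)                                   # expectancy per trade, in R
    win_rate = sum(1 for r in recent if r > 0) / len(recent)

    # Max drawdown: largest peak-to-trough drop of the cumulative R curve.
    equity = peak = max_drawdown = 0.0
    for r in recent:
        equity += r
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, peak - equity)

    return {"n": len(recent), "ev": ev,
            "win_rate": win_rate, "max_drawdown_r": max_drawdown}


# Made-up sample outcomes, for illustration only.
print(system_check([1.0, -1.0, -1.0, 2.0, 0.5, -1.0, 1.5], window=50))
```

If fewer than `window` trades exist, the function simply uses what is there and reports `n`, so a thin sample is visible rather than hidden.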

Level 2 – Setup Filtering

“Which trades are carrying my performance?”

Sort journal by setup tag (Excel/Google Sheets pivot table on setup_tag x R_outcome; Notion grouped-by-tag view; or df.groupby('setup_tag')[['R','win']].agg(['mean','count','sem']) in pandas — the sem column is your noise floor):

Setup	Win Rate	EV	Trades	N adequate?	Decision
Liquidity Sweep	52%	+0.6R	45	yes (at least 30)	scale
Trend Continuation	40%	-0.2R	33	yes (at least 30)	pause + forward-track
VWAP Fade	65%	+0.8R	29	borderline	hold, do not scale

Rule: do not drop or scale a setup with fewer than 30 closed trades, and require the EV difference between two setups to exceed roughly 2× the standard error before acting. At N=33, a 40% win rate has a standard error of about 8.5 percentage points, so a 95% interval runs from roughly 23% to 57%. That “underperformer” may be a coin flip away from your “winner.”
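The arithmetic behind this rule is short enough to check yourself. The sketch below uses the binomial standard error for a win rate, and reads "2× the standard error" as twice the standard error of the difference between the two setups' EVs; that reading is an assumption, and the SEM values in the example are invented for illustration.

```python
import math


def win_rate_se(win_rate: float, n: int) -> float:
    """Binomial standard error of an observed win rate over n trades."""
    return math.sqrt(win_rate * (1.0 - win_rate) / n)


def ev_gap_is_actionable(ev_a, sem_a, n_a, ev_b, sem_b, n_b,
                         min_trades=30, k=2.0):
    """True only if both setups have enough trades AND the EV gap
    exceeds k times the standard error of the difference."""
    if min(n_a, n_b) < min_trades:
        return False
    se_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)
    return abs(ev_a - ev_b) > k * se_diff


se = win_rate_se(0.40, 33)
print(f"SE of a 40% win rate at N=33: {se:.3f}")
print(f"approx. 95% interval: {0.40 - 1.96 * se:.2f} to {0.40 + 1.96 * se:.2f}")

# Hypothetical SEMs (0.20R and 0.25R) for the two larger setups in the table.
print(ev_gap_is_actionable(0.6, 0.20, 45, -0.2, 0.25, 33))
```

The SEM inputs would come from your own journal, for example from the `sem` column of the grouped summary.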

Pause underperforming setups (do not delete them — forward-track on paper). Scale setups whose edge survives the sample-size test.
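If you would rather not depend on a spreadsheet or pandas, the same per-setup summary can be built with the Python standard library. This is a rough equivalent of the grouped view described above, assuming each trade is a `(setup_tag, R)` pair; `summarize_setups` and the 30-trade default are illustrative, with the threshold taken from the rule in this section.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def summarize_setups(trades, min_trades=30):
    """Group (setup_tag, r) pairs and report EV, win rate, count and SEM."""
    by_setup = defaultdict(list)
    for setup_tag, r in trades:
        by_setup[setup_tag].append(r)

    summary = {}
    for tag, rs in by_setup.items():
        n = len(rs)
        # SEM needs at least two trades; sample stdev (n - 1) matches pandas' default.
        sem = stdev(rs) / math.sqrt(n) if n > 1 else float("nan")
        summary[tag] = {
            "n": n,
            "ev": mean(rs),
            "win_rate": sum(1 for r in rs if r > 0) / n,
            "sem": sem,
            "n_adequate": n >= min_trades,
        }
    return summary


# Made-up trades, for illustration only.
sample = [("Liquidity Sweep", 1.2), ("Liquidity Sweep", -1.0),
          ("VWAP Fade", 0.8), ("Liquidity Sweep", 0.9)]
for tag, row in summarize_setups(sample).items():
    print(tag, row)
```

The `n_adequate` flag is there so that a setup with a handsome EV but too few trades is visibly marked before any pause or scale decision.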

Level 3 – Execution & Behavior Check

“Am I the reason my system isn’t working?”

This level is the most informative and the most easily abused. Emotion and execution tags are usually assigned after you know the outcome, so they encode the result, not the cause. Treat Level 3 as hypothesis generation, not proof: a finding here becomes a forward-tracked rule (tag before entry next week) before it earns the right to change your system.

Filter by:

Execution Score (Perfect vs Hesitant vs Impulse)
Error Tags
Emotion Tags

You may find:

Metric	What it measures	Example
Perfect execution EV	Per-trade expectancy when the plan was followed exactly.	+0.7R
Impulse trade EV	Per-trade expectancy when discipline broke down.	-0.6R
Losers from poor discipline	Share of losing trades driven by behavioral leak rather than setup failure.	80%

Caveat: emotion and execution tags are self-assigned after you know the outcome. Losers feel impulsive in retrospect; winners feel disciplined. Before concluding “execution is my edge,” check whether tags were entered before the trade closed, and whether impulse trades cluster on specific setups or sessions — confounders, not causes.

Your edge may well be in your execution habits, but Level 3 is also the level most likely to mislead you. Treat every Level 3 finding as a hypothesis to forward-track for the next 30 trades, not as a conclusion.
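Two of the checks in the caveat can be run directly on the journal. The sketch below assumes each trade is a dict with the hypothetical fields `setup_tag`, `execution`, `r`, and `tagged_before_close`; your column names will differ, and the last field only exists if you record when a tag was entered.

```python
from collections import Counter, defaultdict
from statistics import mean


def ev_by_execution(trades, pre_tagged_only=True):
    """EV and count per execution score, optionally ignoring tags
    that were entered after the outcome was known."""
    buckets = defaultdict(list)
    for t in trades:
        if pre_tagged_only and not t["tagged_before_close"]:
            continue
        buckets[t["execution"]].append(t["r"])
    return {score: (mean(rs), len(rs)) for score, rs in buckets.items()}


def impulse_share_by_setup(trades):
    """Fraction of each setup's trades tagged 'Impulse'. A lopsided split
    suggests the setup, not the impulse, may be driving the result."""
    totals = Counter(t["setup_tag"] for t in trades)
    impulse = Counter(t["setup_tag"] for t in trades
                      if t["execution"] == "Impulse")
    return {setup: impulse[setup] / totals[setup] for setup in totals}


# Made-up journal rows, for illustration only.
journal = [
    {"setup_tag": "VWAP Fade", "execution": "Perfect", "r": 1.0, "tagged_before_close": True},
    {"setup_tag": "VWAP Fade", "execution": "Impulse", "r": -1.0, "tagged_before_close": False},
    {"setup_tag": "Trend Continuation", "execution": "Impulse", "r": -0.5, "tagged_before_close": True},
]
print(ev_by_execution(journal))
print(impulse_share_by_setup(journal))
```

If the execution-score gap shrinks sharply once post-outcome tags are excluded, that is a sign the tags were encoding the result rather than explaining it.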

Weekly Review Framework (15–30 min)

Performance Overview

EV / Win rate for past 20–50 trades
Drawdown range
Most/least profitable setups

Execution Analysis

How many trades followed your plan 100%?
Which mistakes repeated? (Late entry, early exit, overtrading?)

Setup Filter

Tag 1–2 setups to “pause”
Tag 1–2 to “prioritize” next week

Micro-Adjustment Plan — each adjustment names the segment that triggered it

“Trend Continuation EV is –0.2R over 33 trades, and 11 of 13 losers entered within the first 90s of the bar → require 5-min close confirmation on Trend Continuation only”
“Stop taking setups during low liquidity (sub-1m volume bucket) — that bucket is –0.4R over 41 trades”
“After 2 consecutive losses, trade only my A+ setup (tilt cluster: trades 3–6 after a loss streak run –0.5R over 22 occurrences)”
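A segment like the tilt cluster in the last example can be pulled out of a chronological list of R outcomes. This sketch takes one possible reading of that segment: any trade entered when the preceding `streak` trades were all losses. The function name and the choice to treat breakeven trades as non-losses are assumptions.

```python
from statistics import mean


def split_by_loss_streak(r_outcomes, streak=2):
    """Split chronological R outcomes into trades taken right after
    `streak` or more consecutive losses, and all other trades."""
    after_streak, other = [], []
    losses_in_row = 0
    for r in r_outcomes:
        if losses_in_row >= streak:
            after_streak.append(r)
        else:
            other.append(r)
        # Update the streak counter with this trade's result.
        losses_in_row = losses_in_row + 1 if r < 0 else 0
    return after_streak, other


# Made-up outcomes, for illustration only.
after, rest = split_by_loss_streak([-1, -1, -0.5, 1, 1, -1, -1, 2])
print("after streak:", after, "EV:", mean(after))
print("other trades:", rest, "EV:", mean(rest))
```

As with any segment, the count in `after_streak` matters as much as its EV; a handful of occurrences is a hypothesis to forward-track, not a rule.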

✍️ Journal Reflection Prompt

What was my biggest mistake this week?
What’s the one habit that would improve everything if I fixed it?

Optional: A/B Test Changes with Journaling

Instead of replacing your strategy:

Test changes in parallel.

Split setups: “original” vs “adjusted”
Track both for at least 100 trades per arm
Let performance confirm (or deny) your change

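With 30–50 trades per side you can only detect very large effects (an EV gap of roughly 0.4R or more). Smaller, realistic improvements (0.05–0.15R) need at least 200 trades per arm, and often many more when per-trade R outcomes are widely dispersed; below that you are flipping coins.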
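A standard two-sample power calculation gives a feel for these thresholds. The sketch below uses a normal approximation with a two-sided 5% test and 80% power, and assumes a per-trade standard deviation `sd_r` of 1R in both arms. All of these are assumptions rather than figures from the lesson, and the answer moves a lot with `sd_r`, so plug in the value measured from your own journal.

```python
import math
from statistics import NormalDist


def _z_total(alpha: float, power: float) -> float:
    """Sum of the critical value and the power quantile of a standard normal."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def min_detectable_gap(n_per_arm, sd_r=1.0, alpha=0.05, power=0.80):
    """Smallest EV gap (in R) an A/B test with n trades per arm can
    reliably detect, under the stated assumptions."""
    return _z_total(alpha, power) * sd_r * math.sqrt(2 / n_per_arm)


def trades_per_arm(gap_r, sd_r=1.0, alpha=0.05, power=0.80):
    """Trades needed in each arm to detect an EV gap of `gap_r`."""
    return math.ceil(2 * (_z_total(alpha, power) * sd_r / gap_r) ** 2)


for n in (30, 50, 100, 200):
    print(f"{n:>3} trades per arm -> detectable gap ~ {min_detectable_gap(n):.2f}R")
for gap in (0.15, 0.40):
    print(f"gap of {gap:.2f}R -> about {trades_per_arm(gap)} trades per arm")
```

If your R outcomes are tighter than 1R (for example, fixed targets and stops), the required counts fall; if they are wider, the counts rise, which is the practical reason to run this before committing to a test length.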

This avoids overfitting and gives you data-based evolution. (For the underlying problem in machine-learning backtests, see López de Prado, Advances in Financial Machine Learning, ch. 11–12 — the same multiple-testing trap applies to a trader running 20 segment splits on the same journal.)

FAQ

Should I drop a trading setup with negative EV?

Pause it, do not delete it. A negative EV over only ~30 trades may still be noise — pull the setup from live trading and forward-track it on paper for another 30 before deciding.

What is the difference between a system review and an execution review?

A system review (Level 1) asks whether your overall edge is statistically intact across the last 50–100 trades. An execution review (Level 3) asks whether you are following the system — segmenting trades by execution score, error tags, and emotion tags to surface behavioral leaks.

How long should a weekly trading review take?

15–30 minutes, run as a fixed five-step framework: performance overview, execution analysis, setup filter, micro-adjustment plan, and a journaling reflection prompt.

What metrics should I review weekly?

EV and win rate over your last 20–50 trades, drawdown range, your most and least profitable setups, and the share of trades that followed your plan 100%.

After one cycle of findings, feed them into Building a Tiered Risk Model so position size reflects the segments you trust. When Level 3 findings concern tilt or impulse patterns, revisit Behavioral Risk Management.

Final Thought

Your trading journal is a lab — not a graveyard of mistakes.

The discipline is not in finding patterns. It is in killing the ones that fail the sample-size test, even when the story is good. One pause, one prioritize, one forward-tracked rule per week — and a willingness to reverse any of them next week if 30 more trades disagree.