EDUCATION

What Surprised Us This Month: Counterintuitive Patterns in Our Data

Two asset categories — stocks and commodities — badly underperformed our naive baseline this month, and the margins were wide enough to demand explanation.

BY SHAKER ABADY

PUBLISHED APR 18 · 7 MIN READ

EDUCATION · APRIL 18, 2026

When the Model Expects One Thing and Gets Another

Every quantitative framework carries embedded assumptions. Ours are no different. We build naive baselines (simple, rules-based expectations of how a category should behave given its history) and then track how far reality drifts from those expectations. Most of the time, the drift is noise. This month, it wasn't. Two categories in our live signal and backtest system deviated from their naive baselines by margins that cleared our 15-percentage-point threshold for material anomaly: stocks and commodities. Neither moved in the direction we would have anticipated. Both underperformed, and both did so by enough that we owe our readers a direct accounting of what happened, what it might mean, and, critically, what we genuinely don't know yet.

Stocks365 Research · ML-Discovered Pattern

▲ BULLISH SETUP: 76.8% historical win rate

Setup conditions (all must hold):

is_long > 0.00
atr_pct > 6.52
dist_sma50 ≤ -0.94
williams_r > -94.67

Signals tested: 1,611 · Coverage: 1.5% · ML confidence: 54%

Based on real backtest data from Stocks365

This report does not attempt to manufacture a tidy narrative around messy data. It presents the numbers as they are, examines the most plausible structural explanations, and flags the limits of what a small-sample observation can actually tell us. Honesty here is not a disclaimer. It is the methodology.

The Numbers That Broke Our Priors

Let's start with what the data actually says, because the raw figures are striking on their own terms.

Category	Signals (n)	Win Rate	Avg Return	Deviation from Naive Baseline (pp)
Stocks	8	25.0%	-1.05%	-25.0
Commodities	6	16.7%	-2.19%	-33.3

In the stocks category, we ran 8 signals this month. Two of those resolved as wins. Six did not. That produces a win rate of exactly 25.0%: one quarter of trades moved in the anticipated direction. The average return across all 8 positions came in at -1.05%. Against our naive baseline, that represents a -25.0 percentage point deviation. To be clear about what that deviation figure means: it is not a return figure. It is the gap between the win rate our baseline model predicted and the win rate we actually observed. In an eight-signal sample, a single trade moves the win rate by 12.5 points, so a 25-point shortfall in a category as broadly diversified as equities is not something we can attribute to one bad trade or one bad sector week. It is systemic within this sample.

Commodities told a sharper story. Six signals. One resolved as a win. That's a win rate of 16.7% — or more viscerally, one-in-six. The average return across those six positions was -2.19%, the worst absolute figure in the dataset. The deviation from naive baseline stood at -33.3 percentage points. If the stocks number raised a flag, the commodities number raised a different kind of alert entirely. A one-in-six win rate in a category that includes instruments with idiosyncratic supply-demand dynamics, geopolitical sensitivity, and seasonal factors suggests that either our signals are picking up false patterns in the current regime, or the regime itself has shifted in a way our baseline hasn't yet priced in.
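The win and loss counts behind those win rates can be recovered directly from the table. The short sketch below does that arithmetic; the function name `win_loss_counts` is illustrative and not part of the Stocks365 system, and the only inputs are the sample sizes and win rates reported above.

```python
from fractions import Fraction

# (signals, reported win rate in percent), taken from the table above.
reported = {"Stocks": (8, 25.0), "Commodities": (6, 16.7)}


def win_loss_counts(signals: int, win_rate_pct: float) -> tuple[int, int]:
    """Recover whole-number win and loss counts from a reported win rate."""
    wins = round(signals * win_rate_pct / 100)
    return wins, signals - wins


for name, (n, rate) in reported.items():
    wins, losses = win_loss_counts(n, rate)
    # Fraction reduces wins/n to lowest terms, e.g. 2/8 becomes 1/4.
    print(f"{name}: {wins} wins, {losses} losses ({Fraction(wins, n)} of signals won)")
```

With samples this small, each category's win rate is pinned to a single whole-number outcome: two wins in eight for stocks, one win in six for commodities.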

Structural Explanations Worth Taking Seriously

We resist the temptation to assign a single cause. But several structural explanations are worth thinking through carefully, even if we cannot validate them with the data in hand.

On stocks: A 25% win rate means our directional signals were wrong three times for every one time they were right. One plausible explanation is mean-reversion resistance — a market environment in which momentum signals or value-based entries keep failing because the underlying price action is being driven by macro flows rather than security-level fundamentals. When liquidity rotates at the index level, individual stock signals lose predictive power, and win rates compress toward or below the coin-flip threshold. Our signals fell well below that threshold. That's notable.

A second explanation involves signal crowding. If our model shares structural logic with a large portion of systematic strategies running similar factor exposures, then in a month where institutional capital is reallocating, those shared signals get hit simultaneously. We cannot confirm this from our data alone, but it is consistent with the magnitude of the underperformance.

On commodities: The -33.3 percentage point deviation from naive baseline is harder to explain away. Commodities markets are typically less crowded at the systematic signal level than equities, which makes a crowding explanation less satisfying. More plausible here is a volatility regime change — specifically, a period where spot and futures prices behaved in ways inconsistent with the historical patterns our backtest was trained on. Whether that reflects dollar-driven pressure, supply disruption noise, or positioning extremes unwinding, we don't know. What we do know is that a 16.7% win rate on six signals is an outlier that demands we revisit the assumptions baked into our commodities baseline.

It is also worth noting that commodities signals, by nature, tend to carry wider return distributions than equity signals. In a six-trade sample, a single large loser can dominate the average return, although it moves the win rate by only one trade (16.7 points). The -2.19% average return across those six trades, while negative, is not catastrophic in absolute terms. Combined with five losses in six trades, it is more consistent with repeated small losses than with one catastrophic outlier. That pattern is arguably more informative than a single blowup would be.

What This Data Does Not Support

It would be easy to over-read a dataset this small. We are not going to do that. There are several conclusions that the data explicitly does not support, and naming them is as important as naming the findings themselves.

This is not statistically significant. With n=8 in stocks and n=6 in commodities, we are well below any sample size that would let us claim statistical significance. A 25% win rate on 8 trades could occur by chance with a true underlying win rate of 50%: under that assumption, two or fewer wins in eight trades happens about 14% of the time (37 in 256). The probability is low but non-negligible. We are not claiming these numbers prove anything. We are claiming they are worth watching.

We cannot generalize to asset classes broadly. Our signals are specific in their construction. Poor performance in our stocks signals this month says nothing definitive about equities as an asset class, and poor performance in our commodities signals says nothing definitive about commodity markets in aggregate. These are our signals, in our framework, over this period.

We do not have enough data to distinguish a regime shift from a bad month. This is the most important caveat. One month of underperformance, even sharp underperformance, can happen within a normally functioning strategy. We need more observations before we can say with confidence that something structural has changed. We don't know yet. That is an honest statement, not a hedge.

The baseline deviation figures do not measure alpha loss. A -25 or -33.3 point deviation from naive baseline tells us the signals are missing their targets. It does not tell us whether a passive or benchmark approach would have performed better over the same period. We are measuring internal signal accuracy, not relative performance against an index.
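The chance-occurrence point in the first caveat can be checked with an exact binomial calculation. The sketch below assumes independent signals and a true win rate of 50%, which is the illustrative figure used in that caveat and not a stated property of the signals. The helper `binom_cdf` is a hypothetical name introduced here.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Assumption: a true win rate of 50%, as in the caveat above.
P_TRUE = 0.5

stocks_tail = binom_cdf(2, 8, P_TRUE)       # 2 or fewer wins in 8 signals
commodities_tail = binom_cdf(1, 6, P_TRUE)  # 1 or fewer wins in 6 signals

print(f"Stocks:      P(wins <= 2 of 8) = {stocks_tail:.3f}")
print(f"Commodities: P(wins <= 1 of 6) = {commodities_tail:.3f}")
```

Under these assumptions the tail probabilities are 37/256 for stocks and 7/64 for commodities. Neither falls below the conventional 5% level, which is consistent with treating the month as worth watching and not as proof.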

What Traders Should Actually Do With This

Data like this is most useful when it changes behavior at the margin. Here are the specific, actionable considerations we'd recommend, calibrated to the actual evidence rather than to the worst-case interpretation of it.

Reduce position sizing in both categories until the signal quality clarifies. This is not a call to exit. It is a call to acknowledge uncertainty with position sizing. When win rates drop this far below baseline expectations — 25% in stocks, 16.7% in commodities — the expected value of any individual trade in that category declines meaningfully. Smaller positions preserve optionality and reduce the cost of being wrong during a potentially transitional period.

Monitor the next 10-15 signals in each category with particular attention. The signal quality question will not resolve itself in a week. But the next month of observations, added to this month's data, will begin to tell us whether we are looking at random variance or something more persistent. Traders who track this systematically will be better positioned to act decisively when the picture clarifies — in either direction.

Do not abandon the framework on the basis of this data alone. This is worth stating explicitly, because the instinct after a bad month is sometimes to over-correct. Two categories with small sample sizes underperforming their baselines is not a reason to scrap a quantitative system. It is a reason to examine it carefully, ask hard questions, and watch what comes next. Reactive abandonment of process is how discretionary errors compound quantitative ones.

For commodities specifically: Given the more severe deviation (-33.3 points) and worse absolute returns (-2.19%), we would apply tighter risk parameters here than in stocks. If we are in a regime where commodities signals are generating one win per six trades, even reduced-size exposure may not be justified until we see evidence of recovery. Selective avoidance of new commodities signals for the next two to three weeks is a defensible posture, not a permanent one.
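One way to make the monitoring recommendation above concrete is to track an interval around the observed win rate and watch it narrow as signals accumulate. The sketch below uses the Wilson score interval, a standard choice for small samples. It is offered as an illustration and is not necessarily how Stocks365 tracks signal quality. The function name and the 1.96 multiplier (roughly 95% coverage) are assumptions introduced here.

```python
from math import sqrt


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = wins / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# This month's counts from the results table: (wins, signals).
observed = {"Stocks": (2, 8), "Commodities": (1, 6)}

for name, (wins, n) in observed.items():
    low, high = wilson_interval(wins, n)
    print(f"{name}: {wins}/{n} wins, ~95% interval {low:.1%} to {high:.1%}")
```

On this month's counts both intervals are wide and still contain 50%. Passing cumulative counts (this month's signals plus later ones) to the same function shows whether the interval tightens around a low win rate or drifts back toward the baseline.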

Methodology: What We Measured and Where the Edges Are

Our signal system generates directional calls across asset categories based on a combination of quantitative factors built and tested on historical data. The naive baseline represents the expected win rate for each category under average conditions — it is not a random 50/50 coin flip; it is calibrated to the historical behavior of our specific signal type in each category. Deviations are calculated as the difference between actual win rate and baseline win rate, in percentage points. The data reported here covers signals resolved within the current monthly measurement window as of April 18, 2026. Total sample sizes are small: 8 signals in stocks, 6 in commodities. These are not large enough to support strong inferential claims. We report them because the deviations crossed our internal materiality threshold of 15 percentage points, not because we believe they are definitive. Average return figures reflect the mean percentage return across all signals in each category, including both winners and losers, and are not annualized. This is preliminary data. Treat it accordingly.
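The deviation and materiality rules described above reduce to a few lines of arithmetic. The sketch below is a minimal reading of them, with two assumptions flagged: the baseline win rates are not published, so they are backed out from the reported figures, and "crossed" is read as an absolute deviation at or above 15 points. The function names are hypothetical.

```python
MATERIALITY_THRESHOLD_PP = 15.0  # internal threshold stated in the methodology

# (actual win rate in percent, reported deviation in percentage points).
reported = {"Stocks": (25.0, -25.0), "Commodities": (16.7, -33.3)}


def implied_baseline_pct(actual_pct: float, deviation_pp: float) -> float:
    """Back out the baseline: deviation = actual - baseline."""
    return round(actual_pct - deviation_pp, 1)


def is_material(deviation_pp: float,
                threshold_pp: float = MATERIALITY_THRESHOLD_PP) -> bool:
    """Assumption: material means |deviation| at or above the threshold."""
    return abs(deviation_pp) >= threshold_pp


for name, (actual, dev) in reported.items():
    print(f"{name}: implied baseline {implied_baseline_pct(actual, dev)}%, "
          f"material={is_material(dev)}")
```

Both categories back out to an implied baseline of 50.0%. That figure is an inference from the reported numbers, not one the methodology states directly.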

SHAKER ABADY

Editor-in-Chief & Founder at Stocks365. 10+ years in financial markets, technical analysis, and algorithmic trading. Oversees editorial standards and platform content quality.