The math behind every screen.
A self-contained derivation of every model used in this Lab. Equations are rendered in KaTeX and link directly to the modules where they run live. Read top-to-bottom or skip — the sections are independent.
Why a pairs trade has a chance
The premise: two assets driven by a shared stochastic factor will, in equilibrium, walk together. If we can identify a stable linear combination that mean-reverts, we can fade short-term divergences and earn a small return that does not require a directional view.
A single asset price is, to a first approximation, a martingale. The right thing to bet on is not its level but a deviation from a relationship. Engle and Granger (1987) formalised the idea: two processes that are individually $I(1)$ (integrated of order 1) can have a linear combination that is $I(0)$ (stationary). The vector $(1, -\beta)$ is then called the cointegration vector and the residual $\varepsilon_t = A_t - \alpha - \beta B_t$ is the spread we trade.
Sector neutrality is the practitioner's lever for keeping the relationship stable: if we pick A and B from inside the same industry, both prices share a sector factor and most macroeconomic shocks cancel. What is left to bet on is the idiosyncratic disagreement between A and B — usually a small, mean-reverting process.
Cointegration via the Engle-Granger two-step
Estimate the long-run coefficient by OLS, then test the residuals for a unit root using ADF — but compare the test statistic to cointegration-specific critical values, not standard Dickey-Fuller.
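Step 1. OLS regression in levels:
$$A_t = \alpha + \beta B_t + \varepsilon_t$$
Under cointegration the OLS estimator $\hat\beta$ is super-consistent: it converges to the true long-run coefficient at rate $T$ rather than the usual $\sqrt{T}$, even though the regressors are I(1). That is what makes a simple rolling regression a viable real-time estimator.
Step 2. ADF on the residuals $\hat\varepsilon_t = A_t - \hat\alpha - \hat\beta B_t$:
$$\Delta\hat\varepsilon_t = \gamma\,\hat\varepsilon_{t-1} + \sum_{i=1}^{p}\phi_i\,\Delta\hat\varepsilon_{t-i} + u_t$$
Under $H_0: \gamma = 0$ the residual has a unit root and A and B are not cointegrated. The test statistic is the t-ratio on $\hat\gamma$. Critical values come from MacKinnon (2010) response surfaces, which differ from the standard Dickey-Fuller table because $\hat\varepsilon_t$ is estimated, not observed:
$$\tau(T) = \tau_\infty + \frac{\tau_1}{T} + \frac{\tau_2}{T^2} + \frac{\tau_3}{T^3}$$
For two variables with a constant (case "c"), the asymptotic 5% critical value is about $-3.34$, versus about $-2.86$ for the plain ADF. The shift is real and ignoring it inflates false-positive cointegration calls. The Lab uses the cointegration CVs in cointegration.ts and the standard ADF CVs in adf.ts.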
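As a rough illustration of Step 1, the sketch below regresses A on B with an intercept and returns the residual series that Step 2 would test. The name `olsFit` and its return shape are assumptions made for this example; they are not taken from cointegration.ts.

```typescript
// Hypothetical helper: OLS of A on B with an intercept, returning the residual spread.
interface OlsFit {
  alpha: number;
  beta: number;
  residuals: number[];
}

function olsFit(a: number[], b: number[]): OlsFit {
  if (a.length !== b.length || a.length < 3) {
    throw new Error("need two equal-length series with at least 3 points");
  }
  const n = a.length;
  const meanA = a.reduce((s, v) => s + v, 0) / n;
  const meanB = b.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (b[i] - meanB) * (a[i] - meanA);
    sxx += (b[i] - meanB) * (b[i] - meanB);
  }
  if (sxx === 0) throw new Error("regressor is constant");
  const beta = sxy / sxx; // slope = Cov(A, B) / Var(B)
  const alpha = meanA - beta * meanB; // intercept
  const residuals = a.map((v, i) => v - alpha - beta * b[i]);
  return { alpha, beta, residuals };
}
```

The residuals are what the unit-root regression runs on; the critical values applied to that statistic must still be the cointegration ones.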
ADF, intuition first
The Augmented Dickey-Fuller test asks one question: when the spread drifts away from its mean, does it pull itself back?
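Consider an AR(1) process $x_t = \rho\,x_{t-1} + \varepsilon_t$. Subtract $x_{t-1}$ from both sides:
$$\Delta x_t = (\rho - 1)\,x_{t-1} + \varepsilon_t = \gamma\,x_{t-1} + \varepsilon_t$$
If $|\rho| < 1$ then $\gamma < 0$: a high $x_{t-1}$ implies a negative expected change next bar, so the spring pulls. If $\rho = 1$ then $\gamma = 0$: no pull, random walk. The augmented form adds lagged differences to soak up serial correlation in the noise:
$$\Delta x_t = \alpha + \gamma\,x_{t-1} + \sum_{i=1}^{p}\phi_i\,\Delta x_{t-i} + \varepsilon_t$$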
The Lab uses MacKinnon's (1996) sample-size adjusted critical values, so a 60-bar window and a 1200-bar window are not held to the same threshold.
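To make the "does the spring pull?" question concrete, here is a minimal sketch of the un-augmented Dickey-Fuller regression (a constant, no lagged differences). It returns the pull coefficient and its t-ratio only; mapping the t-ratio to a p-value needs the MacKinnon tables and is left out. The function name is an assumption for this example, not the API of adf.ts.

```typescript
// Sketch only: Dickey-Fuller regression with a constant and no lagged differences (p = 0).
function dickeyFullerT(x: number[]): { gamma: number; tStat: number } {
  const n = x.length - 1; // number of usable (lag, difference) pairs
  if (n < 3) throw new Error("need at least 4 observations");
  const lag: number[] = [];
  const diff: number[] = [];
  for (let t = 1; t <= n; t++) {
    lag.push(x[t - 1]);
    diff.push(x[t] - x[t - 1]);
  }
  const meanLag = lag.reduce((s, v) => s + v, 0) / n;
  const meanDiff = diff.reduce((s, v) => s + v, 0) / n;
  let sxx = 0;
  let sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (lag[i] - meanLag) * (lag[i] - meanLag);
    sxy += (lag[i] - meanLag) * (diff[i] - meanDiff);
  }
  const gamma = sxy / sxx; // negative when the series pulls back toward its mean
  const intercept = meanDiff - gamma * meanLag;
  let ssr = 0;
  for (let i = 0; i < n; i++) {
    const e = diff[i] - intercept - gamma * lag[i];
    ssr += e * e;
  }
  const stdErr = Math.sqrt(ssr / (n - 2) / sxx); // OLS standard error of gamma
  return { gamma, tStat: gamma / stdErr };
}
```

A strongly negative t-ratio is evidence of mean reversion; a value near zero is what a random walk tends to produce.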
Hedge ratio: OLS, rolling, Kalman
Three estimators for the same β, each with a different view of how stable the relationship is over time.
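Static OLS. The all-sample point estimate. Useful as a baseline, but it assumes the cointegration coefficient is constant, which is rarely true once you cross months of data:
$$\hat\beta = \frac{\sum_t (B_t - \bar B)(A_t - \bar A)}{\sum_t (B_t - \bar B)^2}$$
Rolling OLS. A window of length $w$ slides forward; we recompute $\hat\beta_t$ using only observations in the window $[t - w + 1,\, t]$. Easy to reason about, breaks gracefully when the relationship shifts, and is what the Lab uses by default.
Kalman filter. Treat the hedge ratio itself as a state that evolves slowly. Let $\theta_t = (\alpha_t, \beta_t)^\top$ with state and observation equations:
$$\theta_t = \theta_{t-1} + \omega_t, \qquad \omega_t \sim N(0, Q), \quad Q = \frac{\delta}{1-\delta}\,I$$
$$A_t = H_t\,\theta_t + v_t, \qquad H_t = (1,\; B_t), \quad v_t \sim N(0, R)$$
The standard recursion produces a posterior mean and covariance for the state at every bar. The process-noise scale $\delta$ controls adaptivity: a small $\delta$ keeps β nearly constant; a larger $\delta$ lets it follow regime shifts but at the cost of estimation noise. Predict-then-update:
$$\hat\theta_{t|t-1} = \hat\theta_{t-1|t-1}, \qquad P_{t|t-1} = P_{t-1|t-1} + Q$$
$$e_t = A_t - H_t\,\hat\theta_{t|t-1}, \qquad S_t = H_t P_{t|t-1} H_t^\top + R, \qquad K_t = P_{t|t-1} H_t^\top S_t^{-1}$$
$$\hat\theta_{t|t} = \hat\theta_{t|t-1} + K_t\,e_t, \qquad P_{t|t} = (I - K_t H_t)\,P_{t|t-1}$$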
Elliott, van der Hoek & Malcolm (2005) is the textbook starting point for a state-space approach to mean-reverting spreads; Chan (Algorithmic Trading, 2013) gives the implementation that this Lab follows.
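A minimal sketch of the predict-then-update recursion for a two-element state (intercept and hedge ratio), written out with scalars instead of a matrix library. The process noise is set to δ/(1−δ) on the diagonal in the style of Chan's implementation; the function name, the default `delta` and `obsVar`, and the unit-variance prior are assumptions for illustration, not values from the Lab.

```typescript
interface KalmanStep {
  alpha: number;
  beta: number;
  error: number; // one-step-ahead forecast error
}

// Sketch: random-walk state (alpha, beta), observation a_t = alpha + beta * b_t + noise.
function kalmanHedge(a: number[], b: number[], delta = 1e-4, obsVar = 1e-3): KalmanStep[] {
  if (a.length !== b.length) throw new Error("series must have equal length");
  const q = delta / (1 - delta); // process-noise variance on each state element
  let alpha = 0;
  let beta = 0;
  // symmetric 2x2 state covariance; the unit prior is an assumption
  let p00 = 1;
  let p01 = 0;
  let p11 = 1;
  const out: KalmanStep[] = [];
  for (let t = 0; t < a.length; t++) {
    // predict: the state mean is unchanged, the covariance grows by Q
    p00 += q;
    p11 += q;
    // innovation with H = (1, b_t)
    const h1 = b[t];
    const error = a[t] - (alpha + beta * h1);
    const ph0 = p00 + p01 * h1; // first element of P H^T
    const ph1 = p01 + p11 * h1; // second element of P H^T
    const s = ph0 + h1 * ph1 + obsVar; // H P H^T + R
    // Kalman gain and state update
    const k0 = ph0 / s;
    const k1 = ph1 / s;
    alpha += k0 * error;
    beta += k1 * error;
    // covariance update P = P - K (H P), using symmetry of P
    p00 -= k0 * ph0;
    p01 -= k0 * ph1;
    p11 -= k1 * ph1;
    out.push({ alpha, beta, error });
  }
  return out;
}
```

Raising `delta` makes the filter track shifts in β faster at the cost of a noisier estimate, which is the trade-off described above.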
The z-score signal
A single-line trading rule: enter when the spread is far from its rolling mean in standard-deviation units, exit when it returns.
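With $x_t$ the spread and $\mu_t$, $\sigma_t$ its rolling mean and standard deviation, the signal is $z_t = (x_t - \mu_t)/\sigma_t$. A long-spread entry triggers when $z_t \le -z_{\text{entry}}$ and unwinds when $z_t \ge -z_{\text{exit}}$; the short-spread rule is the mirror image. Hard stops cap the worst case: a stop-loss when $|z_t| \ge z_{\text{stop}}$ (the relationship has probably broken) and a time-stop after $T_{\max}$ bars (the mean reversion was supposed to have happened by now). The Backtest Studio surfaces all four levers as sliders.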
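The rule can be sketched as a rolling z-score plus a small state machine over the four levers (entry, exit, stop-loss, time-stop). The function names, the use of the sample standard deviation, and the choice to stay flat while |z| is beyond the stop level are assumptions for this example.

```typescript
// Rolling z-score of the spread; NaN until the window is full.
function rollingZ(spread: number[], window: number): number[] {
  return spread.map((v, t) => {
    if (t + 1 < window) return NaN;
    const w = spread.slice(t + 1 - window, t + 1);
    const m = w.reduce((s, x) => s + x, 0) / window;
    const sd = Math.sqrt(w.reduce((s, x) => s + (x - m) * (x - m), 0) / (window - 1));
    return sd > 0 ? (v - m) / sd : NaN;
  });
}

// Position per bar: +1 long spread, -1 short spread, 0 flat.
function zPositions(z: number[], entry: number, exit: number, stopZ: number, maxBars: number): number[] {
  let pos = 0;
  let held = 0; // bars spent in the current trade
  return z.map((zt) => {
    if (isNaN(zt)) return pos; // no signal yet: keep the current state
    if (pos !== 0) {
      held++;
      const reverted = pos === 1 ? zt >= -exit : zt <= exit;
      if (reverted || Math.abs(zt) >= stopZ || held >= maxBars) {
        pos = 0;
        held = 0;
      }
    } else if (Math.abs(zt) < stopZ) {
      if (zt <= -entry) pos = 1;
      else if (zt >= entry) pos = -1;
    }
    return pos;
  });
}
```

For example, `zPositions(rollingZ(spread, 60), 2, 0.5, 4, 30)` would trade a 60-bar z-score with entry at 2, exit at 0.5, a stop at 4 and a 30-bar time-stop; those numbers are placeholders, not the Lab's defaults.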
Ornstein-Uhlenbeck, half-life, and 'fast enough to trade'
Why some cointegrated pairs are still useless: their spread mean-reverts on a timescale that exceeds your patience.
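The continuous-time OU process is the canonical mean-reverting SDE:
$$dx_t = \kappa(\mu - x_t)\,dt + \sigma\,dW_t$$
Discretise at unit step and run an AR(1) regression on the spread: $x_t = a + b\,x_{t-1} + \varepsilon_t$. Then $\kappa = -\ln b$ and the half-life of mean reversion is
$$t_{1/2} = \frac{\ln 2}{\kappa} = -\frac{\ln 2}{\ln b}$$
A spread with a half-life of about five daily bars closes half its gap in a week and is highly tradable; a spread whose half-life runs to a couple of hundred bars takes nearly a year and dies on costs. The Pair Lab plots the half-life and the Portfolio module screens out pairs whose half-life exceeds a user threshold.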
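A small sketch of the half-life estimate: fit the AR(1) slope by OLS, convert it to a mean-reversion speed, and report ln 2 divided by that speed. The function name and the convention of returning an infinite half-life when the slope falls outside (0, 1) are assumptions for this example.

```typescript
// Sketch: half-life of mean reversion from an AR(1) fit x_t = a + b * x_{t-1} + noise.
function ouHalfLife(spread: number[]): { b: number; kappa: number; halfLife: number } {
  const n = spread.length - 1;
  if (n < 2) throw new Error("need at least 3 observations");
  let sumLag = 0;
  let sumCur = 0;
  for (let t = 1; t <= n; t++) {
    sumLag += spread[t - 1];
    sumCur += spread[t];
  }
  const meanLag = sumLag / n;
  const meanCur = sumCur / n;
  let sxx = 0;
  let sxy = 0;
  for (let t = 1; t <= n; t++) {
    sxx += (spread[t - 1] - meanLag) * (spread[t - 1] - meanLag);
    sxy += (spread[t - 1] - meanLag) * (spread[t] - meanCur);
  }
  const b = sxy / sxx;
  // outside (0, 1) the log is undefined or the series does not decay smoothly
  if (!(b > 0 && b < 1)) return { b, kappa: NaN, halfLife: Infinity };
  const kappa = -Math.log(b);
  return { b, kappa, halfLife: Math.LN2 / kappa };
}
```

A series that halves every bar, such as 16, 8, 4, 2, 1, 0.5, gives a slope of 0.5 and a half-life of one bar.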
Optimal entry/exit: Bertram bands
Given the OU parameters and a fixed cost, what entry threshold maximises expected profit per unit time?
Bertram (2010) derives a closed-form expression for expected return per unit time as a function of the entry level $a$ for a symmetric OU strategy. Measuring the spread from its mean, entering at $-a$, exiting at $+a$ and paying a round-trip cost $c$:
$$\mu(a) = \frac{2a - c}{\mathbb{E}[\mathcal{T}(a)]}$$
where $\mathcal{T}(a)$ is the duration of one full trade cycle. The expected first-passage time involves imaginary error functions, but the qualitative result is intuitive: if costs are small, the optimum is a tight band close to the mean; as costs rise, the optimal band widens. The Lab plots a numerical scan instead of the special-function form so the intuition is preserved without machinery.
Risk parity sizing across pairs
Each pair contributes the same fraction of total portfolio variance. Sizing pairs by capital almost always overweights the noisiest one.
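Maillard, Roncalli & Teiletche (2010) define the equal-risk-contribution (ERC) portfolio as the $w$ with $w_i \ge 0$ and $\sum_i w_i = 1$ satisfying:
$$w_i\,(\Sigma w)_i = w_j\,(\Sigma w)_j \quad \text{for all } i, j$$
where $\Sigma$ is the covariance matrix of the pair returns. For uncorrelated strategies this collapses to inverse-volatility weighting $w_i \propto 1/\sigma_i$; for correlated strategies the iterative solver in risk/sizing.ts finds the fixed point. The Lab reports both.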
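One simple way to find equal-risk-contribution weights is a multiplicative fixed-point update: scale each weight down when its risk contribution is above average, renormalise, and repeat. The sketch below assumes every strategy has a positive risk contribution throughout, and it is not claimed to be the scheme used in risk/sizing.ts; convergence is typical for well-behaved covariance matrices but is not guaranteed in general.

```typescript
// Sketch of an ERC solver; assumes w_i * (cov w)_i stays positive for every i.
function ercWeights(cov: number[][], maxIter = 500, tol = 1e-12): number[] {
  const n = cov.length;
  let w: number[] = [];
  for (let i = 0; i < n; i++) w.push(1 / n); // start from equal weights
  for (let iter = 0; iter < maxIter; iter++) {
    const current = w;
    // risk contribution of each strategy: w_i * (cov w)_i
    const rc = current.map((wi, i) => wi * cov[i].reduce((s, cij, j) => s + cij * current[j], 0));
    const avg = rc.reduce((s, v) => s + v, 0) / n;
    // shrink weights whose contribution is above average, grow the others
    const raw = current.map((wi, i) => wi * Math.sqrt(avg / rc[i]));
    const total = raw.reduce((s, v) => s + v, 0);
    const next = raw.map((v) => v / total);
    const change = next.reduce((m, v, i) => Math.max(m, Math.abs(v - current[i])), 0);
    w = next;
    if (change < tol) break;
  }
  return w;
}
```

With a diagonal covariance matrix the first update already lands on inverse-volatility weights, which matches the uncorrelated special case.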
β-hedging vs dollar-neutrality
Two notions of 'neutral'. They are not the same and the difference matters during a sell-off.
Dollar-neutral: equal dollar long, equal dollar short. Gross exposure is 100%, net dollar exposure is zero. Simple and symmetric, but the residual is exposed to whatever β-loading the two legs have to the broader market.
β-hedged: for every $1 long in A, short $β in B, where β is the cointegration coefficient (often ≈ 1 for sector pairs but not always). Net market exposure is approximately zero, even if dollar exposure is asymmetric. Preferable when the two legs have different market betas.
Both modes are available in the Backtest Studio. The Portfolio page reports residual β-to-market for the full book of pairs and flags drift away from sector neutrality.
Amihud illiquidity
Even a perfect signal is worthless if the market cannot absorb your order. Amihud (2002) gives the cleanest single-number proxy.
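For asset $i$ over $D$ days, with daily return $r_{i,d}$ and daily dollar volume $\mathrm{DVOL}_{i,d}$:
$$\mathrm{ILLIQ}_i = \frac{1}{D}\sum_{d=1}^{D}\frac{|r_{i,d}|}{\mathrm{DVOL}_{i,d}}$$
Higher ILLIQ means a one-dollar trade moves the price more, i.e., a thinner book. The Portfolio module screens out pairs whose worse leg sits above a percentile threshold, on the principle that any edge gets eaten by impact and slippage in illiquid names.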
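The Amihud measure is the average of absolute daily return divided by that day's dollar volume. A minimal sketch, assuming simple returns from closing prices and skipping days with non-positive volume (both choices, and the function name, are assumptions for this example):

```typescript
// Sketch: Amihud illiquidity = mean over days of |return| / dollar volume.
function amihudIlliq(close: number[], dollarVolume: number[]): number {
  if (close.length !== dollarVolume.length) throw new Error("series must have equal length");
  let total = 0;
  let days = 0;
  for (let d = 1; d < close.length; d++) {
    if (dollarVolume[d] <= 0 || close[d - 1] <= 0) continue; // skip unusable days
    const ret = close[d] / close[d - 1] - 1; // simple daily return
    total += Math.abs(ret) / dollarVolume[d];
    days++;
  }
  if (days === 0) throw new Error("no valid days");
  return total / days;
}
```

The raw number is tiny for liquid names, so it is usually compared across assets or ranked by percentile instead of being read in isolation.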
Walk-forward methodology
The minimum acceptable backtest discipline: parameters chosen on data the strategy never sees during execution.
Pure in-sample testing is worthless because every parameter — the z-entry, the lookback, the stop-loss — is implicitly chosen with knowledge of the future. Walk-forward fixes this by splitting the sample into a sequence of (training, test) windows. Models are fit on training only; performance is recorded on test only; the window rolls forward.
A second-best, lighter alternative — and the default in the Lab — is rolling estimation: the hedge ratio and z-score moments are re-estimated on every bar using only data up to that bar. This produces an honest equity curve at the cost of some statistical inefficiency in the early part of the sample.
Do & Faff (2010, 2012) showed that pairs strategies identified by Gatev et al. delivered shrinking returns post-2002 once realistic costs and walk-forward selection were imposed. The Lab enforces both by default.
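The (training, test) window sequence can be sketched as a generator of index ranges. The fold layout below, a fixed-length training window that rolls forward by one test window at a time, is one common choice and an assumption here; the interface and function names are hypothetical.

```typescript
// Half-open index ranges: train = [trainStart, trainEnd), test = [testStart, testEnd).
interface WalkForwardFold {
  trainStart: number;
  trainEnd: number;
  testStart: number;
  testEnd: number;
}

// Sketch: rolling walk-forward folds over n bars, stepping by the test length.
function walkForwardFolds(n: number, trainLen: number, testLen: number): WalkForwardFold[] {
  if (trainLen <= 0 || testLen <= 0) throw new Error("window lengths must be positive");
  const folds: WalkForwardFold[] = [];
  for (let start = 0; start + trainLen + testLen <= n; start += testLen) {
    folds.push({
      trainStart: start,
      trainEnd: start + trainLen,
      testStart: start + trainLen, // test begins right where training ends
      testEnd: start + trainLen + testLen,
    });
  }
  return folds;
}
```

Because each test range starts exactly at the end of its training range, no parameter is ever fitted on a bar it is later evaluated on.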
Johansen and the VECM — when two assets is not enough
Engle-Granger asks 'is there a single cointegration vector, given a fixed regressor?' Johansen asks 'how many cointegration vectors exist among k variables?' For k = 2 they should agree; for k ≥ 3 only Johansen is correct.
Stack the levels into $y_t = (y_{1,t}, \dots, y_{k,t})^\top$ and write the vector autoregression in error-correction form (VECM):
$$\Delta y_t = \Pi\,y_{t-1} + \sum_{i=1}^{p-1}\Gamma_i\,\Delta y_{t-i} + \varepsilon_t$$
The rank of $\Pi$ equals the number of cointegration relations. Decomposing $\Pi = \alpha\beta^\top$, the columns of $\beta$ are cointegration vectors; the rows of $\alpha$ are loadings: how fast each variable adjusts back toward equilibrium. The latter is the practical pay-off: Johansen tells you which leg leads and which leg lags.
Johansen's trace test compares
$$\lambda_{\text{trace}}(r) = -T\sum_{i=r+1}^{k}\ln\bigl(1 - \hat\lambda_i\bigr)$$
to Osterwald-Lenum (1992) critical values. The eigenvalues $\hat\lambda_1 \ge \dots \ge \hat\lambda_k$ come from a generalised eigenvalue problem on residual second-moment matrices; the Lab implements the 2-variable case in johansen.ts and reports both the trace and max-eigenvalue statistics on the Methods page.
KPSS — the test with the opposite null
A pair that rejects the unit-root null under ADF is consistent with stationarity, but a rejection alone does not prove it. KPSS reverses the null and asks for direct evidence of stationarity.
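Kwiatkowski, Phillips, Schmidt & Shin (1992) construct a Lagrange-multiplier statistic
$$\eta = \frac{1}{T^2\,\hat\sigma^2}\sum_{t=1}^{T} S_t^2, \qquad S_t = \sum_{i=1}^{t}\hat e_i$$
where $\hat e_i$ are the residuals from regressing the spread on a constant and $\hat\sigma^2$ is the long-run variance estimated with a Newey-West kernel. Under the null of stationarity, $\eta$ is bounded; under a unit-root alternative, the cumulative sum drifts and the statistic explodes. Critical values for the level-stationarity case: 0.347 (10%), 0.463 (5%), 0.739 (1%). The cleanest pair-trading workflow uses ADF and KPSS together: a pair worth trading both rejects the unit-root null (ADF) and fails to reject the stationarity null (KPSS).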
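A compact sketch of the level-stationarity KPSS statistic: demean the series, accumulate partial sums, and scale by a long-run variance built with Bartlett (Newey-West) weights. The lag count is left to the caller; the function name and that design choice are assumptions for this example.

```typescript
// Sketch: KPSS statistic for the null of stationarity around a constant level.
function kpssLevel(x: number[], lags: number): number {
  const T = x.length;
  if (T < 3) throw new Error("need at least 3 observations");
  const mean = x.reduce((s, v) => s + v, 0) / T;
  const e = x.map((v) => v - mean); // residuals from a regression on a constant
  // sum of squared partial sums of the residuals
  let cum = 0;
  let sumS2 = 0;
  for (let t = 0; t < T; t++) {
    cum += e[t];
    sumS2 += cum * cum;
  }
  // long-run variance: lag-0 variance plus Bartlett-weighted autocovariances
  let lrv = 0;
  for (let t = 0; t < T; t++) lrv += e[t] * e[t];
  lrv /= T;
  for (let j = 1; j <= lags; j++) {
    let gammaJ = 0;
    for (let t = j; t < T; t++) gammaJ += e[t] * e[t - j];
    lrv += (2 * (1 - j / (lags + 1)) * gammaJ) / T;
  }
  return sumS2 / (T * T * lrv);
}
```

A series that flips sign every bar keeps its partial sums small and yields a tiny statistic, while a steadily trending series produces a value well above the 1% critical value.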
Variance ratio — Lo & MacKinlay
A different angle on the same question: do increments compound like a random walk, or do they cancel?
Lo & MacKinlay (1988) propose
$$\mathrm{VR}(q) = \frac{\operatorname{Var}(x_t - x_{t-q})}{q\,\operatorname{Var}(x_t - x_{t-1})}$$
Under a random walk, $\mathrm{VR}(q) = 1$. A value below 1 implies negative serial correlation, which is what we want from a tradable spread. The Lab reports the heteroskedasticity-robust z-statistic across multiple horizons $q$, because a single horizon can be misleading when the spread mean-reverts at a specific timescale.
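The point estimate of the variance ratio can be sketched directly from overlapping q-bar differences. This version uses plain population variances with no small-sample bias correction and does not compute the heteroskedasticity-robust z-statistic; the function name is an assumption for this example.

```typescript
// Sketch: variance of q-bar changes relative to q times the variance of 1-bar changes.
function varianceRatio(x: number[], q: number): number {
  if (q < 2 || x.length <= q) throw new Error("need q >= 2 and more than q observations");
  const variance = (d: number[]): number => {
    const m = d.reduce((s, v) => s + v, 0) / d.length;
    return d.reduce((s, v) => s + (v - m) * (v - m), 0) / d.length;
  };
  const diffs = (k: number): number[] => {
    const out: number[] = [];
    for (let t = k; t < x.length; t++) out.push(x[t] - x[t - k]);
    return out;
  };
  return variance(diffs(q)) / (q * variance(diffs(1)));
}
```

Increments that cancel each other push the ratio toward zero, while increments that repeat their sign push it above one.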
Hurst exponent — the self-similarity slope
A geometric view of persistence. R/S plots are simple, robust, and intuitive.
Take the spread, split it into non-overlapping chunks of length $n$, compute the rescaled range $R/S$ for each. Then $\mathbb{E}[R/S] \propto n^H$, and $H$ is the slope of $\log(R/S)$ on $\log n$:
$$\log \mathbb{E}\bigl[(R/S)_n\bigr] = \log c + H \log n$$
- $H = 0.5$: random walk.
- $H < 0.5$: anti-persistent / mean-reverting (what we want).
- $H > 0.5$: trending, which is bad news for a pairs strategy.
Hurst is best used as a sanity check: tradable spreads almost always show $H < 0.5$; if your "cointegrated" pair returns 0.55, something is wrong.
CUSUM — when the relationship breaks
Brown, Durbin & Evans (1975) — the simplest tool for catching a regime change before the strategy bleeds out.
Compute the standardised cumulative sum of recursive residuals, $W_t = \frac{1}{\hat\sigma}\sum_{j=k+1}^{t} w_j$, where $w_j$ are the recursive residuals and $k$ is the number of regressors. Under the null of stable parameters, $W_t$ is approximately Brownian motion; bands grow linearly. An excursion outside the bands means the parameter you assumed was constant has shifted. The Lab plots the CUSUM with bands at the 5% level on the Methods page. The "Broken" pair (PYRE-VALE) is constructed specifically so the bands are crossed at the regime break.
Distance method — the original empirical recipe
Gatev, Goetzmann & Rouwenhorst (2006) — no parametric assumptions, no β, no ADF. Just normalise prices and trade divergences.
Take a 12-month formation window, normalise both prices to start at 1, and compute the sum of squared distances $\mathrm{SSD} = \sum_t (\tilde A_t - \tilde B_t)^2$ between the normalised prices $\tilde A_t$ and $\tilde B_t$. Pick the minimum-SSD pair (or use a ranked list) and trade in the next 6-month window. Open when the normalised spread diverges by more than two formation-period standard deviations and close when it returns to zero.
The strength of the method is robustness: there are no estimated parameters to over-fit. Its weakness, traced cleanly by Do & Faff (2010, 2012), is that it implicitly assumes a hedge ratio of $\beta = 1$; it under-performs when $\beta \neq 1$ and is sensitive to the choice of normalisation reference point. The Lab runs it head-to-head with the cointegration approach on the Strategies page.
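The formation-period ranking statistic is short enough to sketch in full: rebase each price series to 1 at the start of the window and sum the squared gaps. The function name is an assumption for this example.

```typescript
// Sketch: sum of squared distances between two price series rebased to start at 1.
function normalisedSsd(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) throw new Error("need two equal-length, non-empty series");
  if (a[0] === 0 || b[0] === 0) throw new Error("first price must be non-zero");
  let ssd = 0;
  for (let t = 0; t < a.length; t++) {
    const gap = a[t] / a[0] - b[t] / b[0]; // normalised spread at bar t
    ssd += gap * gap;
  }
  return ssd;
}
```

Ranking every candidate pair by this value and keeping the smallest ones reproduces the selection step; the trading thresholds are then set from the standard deviation of the same normalised spread.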
Avellaneda-Lee s-score — the equilibrium standardisation
Where the rolling z-score asks 'is the spread far from its recent mean?', the s-score asks 'is the spread far from its OU equilibrium?'
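Fit an Ornstein-Uhlenbeck process to the spread, recover $(\kappa, m, \sigma_{\text{eq}})$ with $\sigma_{\text{eq}} = \sigma/\sqrt{2\kappa}$, and define
$$s_t = \frac{x_t - m}{\sigma_{\text{eq}}}$$
Avellaneda & Lee (2010) propose opening at $|s_t| > 1.25$ and closing long positions at $s_t > -0.50$ and short positions at $s_t < 0.75$. The big advantage: signals are calibrated to the SDE, not to short-window noise. The big disadvantage: when $(\kappa, m, \sigma)$ drift, the s-score keeps using stale parameters and either over- or under-trades. The Strategies page runs this side-by-side with the rolling z-score on the same pair.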
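A sketch of the s-score computed from a discrete AR(1) fit of the spread: the equilibrium mean is the intercept divided by one minus the slope, and the equilibrium standard deviation is the residual standard deviation divided by the square root of one minus the squared slope. The function name and the decision to throw when the slope is outside (0, 1) are assumptions for this example.

```typescript
// Sketch: s-score of the latest spread value from an AR(1) fit x_t = a + b * x_{t-1} + noise.
function ouSScore(spread: number[]): { b: number; m: number; sigmaEq: number; s: number } {
  const n = spread.length - 1;
  if (n < 3) throw new Error("need at least 4 observations");
  let sumLag = 0;
  let sumCur = 0;
  for (let t = 1; t <= n; t++) {
    sumLag += spread[t - 1];
    sumCur += spread[t];
  }
  const meanLag = sumLag / n;
  const meanCur = sumCur / n;
  let sxx = 0;
  let sxy = 0;
  for (let t = 1; t <= n; t++) {
    sxx += (spread[t - 1] - meanLag) * (spread[t - 1] - meanLag);
    sxy += (spread[t - 1] - meanLag) * (spread[t] - meanCur);
  }
  const b = sxy / sxx;
  if (!(b > 0 && b < 1)) throw new Error("spread is not mean-reverting in this window");
  const a = meanCur - b * meanLag;
  let ssr = 0;
  for (let t = 1; t <= n; t++) {
    const e = spread[t] - a - b * spread[t - 1];
    ssr += e * e;
  }
  const m = a / (1 - b); // equilibrium level
  const sigmaEq = Math.sqrt(ssr / (n - 2) / (1 - b * b)); // equilibrium standard deviation
  const s = (spread[n] - m) / sigmaEq;
  return { b, m, sigmaEq, s };
}
```

Unlike a rolling z-score, the centre and scale here come from the fitted process, so they only change when the fit is refreshed.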
Risk metrics past Sharpe — VaR, CVaR, Ulcer, Pain, Sterling
Sharpe assumes Gaussian returns, equal sensitivity to upside and downside, and ignores path. Each of the metrics below relaxes a different assumption.
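VaR / CVaR. $\mathrm{VaR}_\alpha$ is the α-quantile of the return distribution; $\mathrm{CVaR}_\alpha$ (Rockafellar-Uryasev, 2002) is the average loss in the tail beyond it. CVaR is a coherent risk measure; VaR is not.
Ulcer Index (Martin & McCann 1989) measures depth and duration of drawdowns:
$$\mathrm{UI} = \sqrt{\frac{1}{N}\sum_{t=1}^{N} D_t^2}$$
where $D_t$ is the percentage drawdown from the running peak at bar $t$.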
Pain ratio = annual return / Ulcer Index. Sterling ratio = annual return divided by the average annual maximum drawdown — smoother than Calmar, which depends on a single worst-DD point.
The Risk Lab computes all of these alongside the standard Sharpe / Sortino / Calmar trio and a stationary-bootstrap CI on the Sharpe.
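Two of these metrics are easy to sketch from first principles: historical VaR and CVaR from sorted returns, and the Ulcer Index from an equity curve. The tail-size convention (the lowest `floor(alpha * n)` returns, at least one) and the function names are assumptions for this example; drawdowns are expressed as fractions here, not percentages.

```typescript
// Sketch: historical VaR (the alpha-quantile return) and CVaR (mean of the tail at or below it).
function histVarCvar(returns: number[], alpha: number): { valueAtRisk: number; cvar: number } {
  if (returns.length === 0) throw new Error("no returns");
  const sorted = returns.slice().sort((p, q) => p - q); // ascending, worst first
  const k = Math.max(1, Math.floor(alpha * sorted.length)); // number of tail observations
  const tailObs = sorted.slice(0, k);
  const cvar = tailObs.reduce((s, v) => s + v, 0) / k;
  return { valueAtRisk: tailObs[k - 1], cvar };
}

// Sketch: Ulcer Index = root-mean-square drawdown from the running peak.
function ulcerIndex(equity: number[]): number {
  if (equity.length === 0) throw new Error("empty equity curve");
  let peak = equity[0];
  let sumSq = 0;
  for (let t = 0; t < equity.length; t++) {
    if (equity[t] > peak) peak = equity[t];
    const dd = (equity[t] - peak) / peak; // zero at new highs, negative below them
    sumSq += dd * dd;
  }
  return Math.sqrt(sumSq / equity.length);
}
```

Because the Ulcer Index squares every bar's drawdown, a long shallow drawdown and a short deep one can score similarly, which is the depth-and-duration behaviour described above.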
Stationary bootstrap — honest standard errors for serially-correlated returns
Naive bootstrap on serially-correlated returns under-estimates standard errors. The block bootstrap fixes this; the stationary bootstrap goes further by randomising block lengths.
Politis & Romano (1994) draw block lengths from a geometric distribution with mean $1/p$, paste blocks together to length $n$ (the original sample size), and re-compute the statistic of interest on each resample. Random block lengths keep the resampled series stationary, so for strong-mixing returns the implied distribution of the Sharpe is asymptotically valid even when returns are not iid.
A 95% bootstrap CI on the Sharpe is the simplest, most under-used antidote to data-mined backtests. A "Sharpe of 1.2" with a 95% CI of [−0.4, +2.5] is not what it looks like.
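A sketch of the stationary bootstrap for a per-bar (un-annualised) Sharpe ratio. At every step the resampler either continues the current block or, with probability `p`, jumps to a new random start, which gives geometric block lengths with mean 1/p; blocks wrap around the end of the sample. The small linear congruential generator is only there to make the example reproducible, and all names and the percentile-interval choice are assumptions for this example.

```typescript
// Seeded uniform generator on [0, 1) (simple LCG, for reproducibility only).
function makeLcg(seed: number): () => number {
  let state = Math.abs(Math.floor(seed)) % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Per-bar Sharpe: mean over sample standard deviation (0 if the series is constant).
function sharpeRatio(r: number[]): number {
  const m = r.reduce((s, v) => s + v, 0) / r.length;
  const sd = Math.sqrt(r.reduce((s, v) => s + (v - m) * (v - m), 0) / (r.length - 1));
  return sd > 0 ? m / sd : 0;
}

// Sketch: 95% percentile interval for the Sharpe via the stationary bootstrap.
function stationaryBootstrapCi(returns: number[], p: number, nResamples: number, seed: number): { lo: number; hi: number } {
  if (returns.length < 2 || !(p > 0 && p <= 1)) throw new Error("need >= 2 returns and 0 < p <= 1");
  const rand = makeLcg(seed);
  const n = returns.length;
  const stats: number[] = [];
  for (let rep = 0; rep < nResamples; rep++) {
    const sample: number[] = [];
    let idx = Math.floor(rand() * n);
    while (sample.length < n) {
      sample.push(returns[idx]);
      // with probability p start a new block, otherwise continue (wrapping around)
      idx = rand() < p ? Math.floor(rand() * n) : (idx + 1) % n;
    }
    stats.push(sharpeRatio(sample));
  }
  stats.sort((x, y) => x - y);
  return {
    lo: stats[Math.floor(0.025 * (nResamples - 1))],
    hi: stats[Math.ceil(0.975 * (nResamples - 1))],
  };
}
```

A call such as `stationaryBootstrapCi(returns, 0.1, 2000, 42)` would use blocks of ten bars on average; the right `p` depends on how long the serial correlation in the returns persists.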
Capacity and execution — the cost curve nobody puts in their backtest
Almgren-Chriss style square-root impact: doubling the order doesn't double the per-share cost; it multiplies it by ~1.4.
Real execution costs scale as $c\,\sigma\sqrt{Q/V}$, where $Q$ is the order size, $V$ is daily volume, $\sigma$ is daily volatility and $c$ is a market constant near 1 (Almgren et al. 2005). The Lab's default flat-bps cost is a first approximation; the Portfolio page surfaces a per-pair capacity proxy of 1% of mean dollar volume on the worse leg, and the Theory page recommends imposing a square-root impact term once the strategy is sized to a real book.
Half-life and capacity interact: a fast-mean-reverting pair lets you turn over inventory inside the impact-decay window, so realised costs converge toward the spread crossing rather than the full impact curve.
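The square-root impact term is a one-liner; the sketch below returns the cost as a fraction of price, so multiplying by 10,000 gives basis points. The function name and the default constant of 1 are assumptions for this example.

```typescript
// Sketch: square-root impact cost as a fraction of price.
// orderSize and dailyVolume must be in the same units (shares or dollars).
function sqrtImpactCost(orderSize: number, dailyVolume: number, dailyVol: number, c = 1): number {
  if (orderSize < 0 || dailyVolume <= 0) throw new Error("need orderSize >= 0 and dailyVolume > 0");
  return c * dailyVol * Math.sqrt(orderSize / dailyVolume);
}
```

For an order of 1% of daily volume in a name with 2% daily volatility this gives 0.002, i.e. 20 bps per unit traded; doubling the order raises that to roughly 28 bps, not 40.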
Where this Lab sits — and what it deliberately does not do
An honest map of scope.
The Lab covers, end to end:
- Single-pair cointegration via Engle-Granger, ADF and Johansen.
- Three hedge-ratio estimators (static OLS, rolling OLS, Kalman).
- Two complementary stationarity tests (ADF, KPSS) and two persistence diagnostics (VR, Hurst).
- Two regime-stability tools (rolling ADF p-value, Brown-Durbin-Evans CUSUM).
- Three trading recipes (cointegration z-score, distance method, OU s-score).
- Walk-forward backtesting with realistic costs, drawdown halt, time-stops.
- Risk-parity, β-hedged, dollar-neutral sizing.
- Full extended-risk panel + stationary-bootstrap CIs on Sharpe.
It explicitly does not yet cover:
- Copula-based pair dependence (Liew & Wu 2013, Krauss & Stübinger 2017).
- Hidden Markov / regime-switching models for breakdown detection.
- Cross-sectional PCA on a real equity universe (Avellaneda-Lee in full).
- Multi-leg baskets via Johansen with k ≥ 3 (the trace test scales, the rest needs UI).
- EM-based Kalman tuning for δ.
- Live Almgren-Chriss execution scheduling.
Each of these is a natural next module — the Lab's structure separates math, data and UI cleanly so a pull request adding e.g. copula pair-trading is local to two files.
References
The papers and books cited above, in alphabetical order.
- Almgren, R., Thum, C., Hauptmann, E. & Li, H. (2005). Direct estimation of equity market impact. Risk 18, 57–62.
- Amihud, Y. (2002). Illiquidity and stock returns: cross-section and time-series effects. Journal of Financial Markets 5, 31–56.
- Avellaneda, M. & Lee, J.-H. (2010). Statistical arbitrage in the U.S. equities market. Quantitative Finance 10, 761–782.
- Bertram, W. K. (2010). Analytic solutions for optimal statistical arbitrage trading. Physica A 389, 2234–2243.
- Brown, R. L., Durbin, J. & Evans, J. M. (1975). Techniques for testing the constancy of regression relationships over time. JRSS-B 37, 149–192.
- Chan, E. (2013). Algorithmic Trading: Winning Strategies and Their Rationale. Wiley.
- Do, B. & Faff, R. (2010). Does simple pairs trading still work? Financial Analysts Journal 66, 83–95.
- Do, B. & Faff, R. (2012). Are pairs trading profits robust to trading costs? Journal of Financial Research 35, 261–287.
- Elliott, R., van der Hoek, J. & Malcolm, W. (2005). Pairs trading. Quantitative Finance 5, 271–276.
- Engle, R. & Granger, C. (1987). Co-integration and error correction: representation, estimation, and testing. Econometrica 55, 251–276.
- Gatev, E., Goetzmann, W. & Rouwenhorst, G. (2006). Pairs trading: performance of a relative-value arbitrage rule. Review of Financial Studies 19, 797–827.
- Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Trans. ASCE 116, 770–808.
- Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12, 231–254.
- Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian VAR models. Econometrica 59, 1551–1580.
- Krauss, C. (2017). Statistical arbitrage pairs trading strategies: Review and outlook. Journal of Economic Surveys 31, 513–545.
- Kwiatkowski, D., Phillips, P. C. B., Schmidt, P. & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics 54, 159–178.
- Liew, R. Q. & Wu, Y. (2013). Pairs trading: a copula approach. Journal of Derivatives & Hedge Funds 19, 12–30.
- Lo, A. W. & MacKinlay, A. C. (1988). Stock market prices do not follow random walks. RFS 1, 41–66.
- MacKinnon, J. (1996). Numerical distribution functions for unit root and cointegration tests. Journal of Applied Econometrics 11, 601–618.
- MacKinnon, J. (2010). Critical values for cointegration tests. Queen's Economics Working Paper 1227.
- Maillard, S., Roncalli, T. & Teiletche, J. (2010). The properties of equally weighted risk contribution portfolios. Journal of Portfolio Management 36, 60–70.
- Martin, P. G. & McCann, B. B. (1989). The Investor's Guide to Fidelity Funds. Wiley.
- Osterwald-Lenum, M. (1992). A note with quantiles of the asymptotic distribution of the maximum likelihood cointegration rank test statistics. Oxford Bulletin of Economics and Statistics 54, 461–472.
- Politis, D. N. & Romano, J. P. (1994). The stationary bootstrap. JASA 89, 1303–1313.
- Rockafellar, R. T. & Uryasev, S. (2002). Conditional value-at-risk for general loss distributions. Journal of Banking & Finance 26, 1443–1471.
- Stübinger, J. & Endres, S. (2018). Pairs trading with a mean-reverting jump-diffusion model on high-frequency data. Quantitative Finance 18, 1735–1751.
- Vidyamurthy, G. (2004). Pairs Trading: Quantitative Methods and Analysis. Wiley.