Correlation Engine

The correlation engine answers a single question: across a peer set, which financial metrics co-move most strongly with total shareholder return? EqtyTrk computes both Pearson and Spearman correlations for every metric in the library, against a caller-selected TSR target variant, and returns them ranked by absolute Pearson magnitude. The target variant controls how much of the return signal reflects market-wide or sector-wide forces versus company-specific outcomes.

Statistical estimators

Pearson correlation

For a metric vector x and TSR vector y across n peers:

$$ r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} $$

Pearson measures the strength of the linear relationship. $r^2$ (reported as r_squared) gives the fraction of variance in y explained by x under a linear model.

Spearman correlation

$$ \rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)} $$

where $d_i$ is the rank difference between $x_i$ and $y_i$. Spearman is equivalent to Pearson applied to the ranked data. It is more robust to outliers and captures monotone non-linear relationships. For equity fundamentals — where extreme outliers in leverage or valuation multiples are common — Spearman is often the more interpretable number. When ranks contain ties, scipy's implementation handles them via average-rank Pearson rather than the simplified no-ties form above.

p-value

The p-value reported is the two-tailed p-value for the Pearson coefficient under the null hypothesis $H_0: r = 0$. The test statistic is:

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} \sim t_{n-2} $$

EqtyTrk uses scipy.stats.pearsonr for this computation, which returns both r and the two-tailed p-value simultaneously. A p-value below 0.05 is conventionally read as "the observed correlation is unlikely to be a sampling artifact at n data points," but peer sets are small (typically 20–80 names) and multiple comparisons are implicit (60+ metrics tested simultaneously). Interpret p-values as a signal-to-noise guide, not a formal significance claim.
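The two estimator calls can be sketched as follows. The metric and TSR vectors here are made-up illustrative data, not EqtyTrk output; only the scipy.stats calls match what the text describes.

```python
# Illustrative sketch: both estimators on a toy 6-peer metric/TSR pair.
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.array([0.10, 0.15, 0.22, 0.30, 0.45, 0.80])  # metric across 6 peers
y = np.array([0.02, 0.05, 0.06, 0.09, 0.12, 0.14])  # TSR across the same peers

r, p = pearsonr(x, y)       # linear correlation and two-tailed p-value
rho, p_s = spearmanr(x, y)  # rank correlation (handles ties via average ranks)

# y rises monotonically (but not linearly) with x, so Spearman is exactly 1
# while Pearson is below 1 -- the gap illustrates why both are reported.
print(r, rho, r**2)
```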

Missing-data handling

Peers with either a missing metric value or a missing TSR are dropped from that metric's correlation calculation. Each metric therefore has its own effective sample size n, which is reported in the n field of CorrelationRow. A metric with n<2 is skipped entirely.
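The pairwise-deletion rule above can be sketched as a small helper. This is an assumed illustration of the behavior described, not the actual EqtyTrk code:

```python
# Minimal sketch of per-metric pairwise deletion: a peer missing either
# value drops out of that metric's calculation only, so each metric
# carries its own effective sample size n.
import math

def paired(xs, ys):
    """Return (x, y) pairs where both values are present and non-NaN."""
    return [(x, y) for x, y in zip(xs, ys)
            if x is not None and y is not None
            and not (math.isnan(x) or math.isnan(y))]

metric = [1.2, None, 3.1, 4.0, float("nan")]
tsr    = [0.05, 0.02, None, 0.10, 0.07]
pairs = paired(metric, tsr)
n = len(pairs)        # effective sample size for this metric
print(n, pairs)       # → 2 [(1.2, 0.05), (4.0, 0.1)]
```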

Six target variants

The target parameter in a CorrelateRequest selects how the TSR outcome is constructed before computing correlations. All six variants share the same underlying raw TSR (dividend-reinvested, using ex-date close prices); they differ in what benchmark is subtracted.

tsr — raw total shareholder return

No adjustment. Each peer's raw TSR over the selected window is used directly. This is the least noise-filtered view: a bull or bear market for the sector will dominate the correlation signal, making it hard to isolate company-specific drivers.

etsr — excess return vs. index benchmark

$$ \mathrm{eTSR}_i = \mathrm{TSR}_i - \mathrm{TSR}_{\text{benchmark}} $$

The benchmark is determined by the selected index: SPY for sp500, IWB for r1000, etc. (via benchmarks.py). The caller can override with an explicit benchmark ticker. This removes the market-wide return component but leaves sector-level variation in the signal.
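A minimal sketch of the subtraction, assuming a simple ticker-to-benchmark mapping. The mapping and return values are illustrative; they are not taken from benchmarks.py:

```python
# Hypothetical sketch of the index-excess adjustment. The mapping below
# mirrors the examples in the text (SPY for sp500, IWB for r1000) but is
# an assumption, not the actual benchmarks.py contents.
BENCHMARK_FOR_INDEX = {"sp500": "SPY", "r1000": "IWB"}

def excess_tsr(peer_tsr, benchmark_tsr):
    """eTSR_i = TSR_i - TSR_benchmark; missing peers stay missing."""
    return {t: (v - benchmark_tsr if v is not None else None)
            for t, v in peer_tsr.items()}

peers = {"AAA": 0.25, "BBB": -0.05, "CCC": None}
etsr = excess_tsr(peers, benchmark_tsr=0.10)  # e.g. the index returned 10%
```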

etsr_sector — excess return vs. sector ETF

$$ \mathrm{eTSR}_i = \mathrm{TSR}_i - \mathrm{TSR}_{\text{sector ETF}} $$

Each peer's return is benchmarked against the TSR of the corresponding GICS sector ETF (e.g., XLK for Information Technology, XLV for Health Care). Sector ETF TSRs are computed the same way as peer TSRs (dividend-reinvested over the same window). Peers whose sector cannot be resolved, or whose sector ETF has no price data in the window, are set to None and excluded from the correlation.

Compared with the simple index benchmark, this variant isolates company-specific return more cleanly, because sector rotation can drive large index-relative spreads that have nothing to do with individual company fundamentals.

etsr_peer_median — excess return vs. peer-median

$$ \mathrm{eTSR}_i = \mathrm{TSR}_i - \widetilde{\mathrm{TSR}}_{\text{sector, in peer set}} $$

Rather than using an external ETF, this variant subtracts the median TSR of same-sector peers within the analysis set. Peers with no sector tag are grouped under an _unknown bucket and benchmarked against the median of that no-sector group. The fallback to the overall peer-set median only fires when the _unknown bucket is empty. The reference benchmark is internal to the peer set, so it adapts to whatever companies happen to be included.
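The per-sector grouping can be sketched like this. It is an assumed illustration of the description above (the fallback branch is elided, since in this toy version every peer lands in some bucket):

```python
# Sketch of the peer-median variant: subtract the median TSR of same-sector
# peers within the analysis set; untagged peers share an "_unknown" bucket.
from statistics import median
from collections import defaultdict

def etsr_peer_median(tsr, sectors):
    """tsr: ticker -> TSR; sectors: ticker -> sector tag (may be absent)."""
    by_sector = defaultdict(list)
    for t, v in tsr.items():
        by_sector[sectors.get(t, "_unknown")].append(v)
    med = {s: median(vs) for s, vs in by_sector.items()}
    return {t: v - med[sectors.get(t, "_unknown")] for t, v in tsr.items()}

tsr = {"AAA": 0.10, "BBB": 0.20, "CCC": 0.30, "DDD": 0.00}
sectors = {"AAA": "Tech", "BBB": "Tech", "CCC": "Tech"}  # DDD is untagged
e = etsr_peer_median(tsr, sectors)  # Tech median is 0.20; DDD vs itself
```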

sector_neutral — both sides sector-demeaned

The most fully controlled variant: both the TSR outcome and every metric driver are demeaned by their per-sector median within the peer set.

$$ \tilde{y}_i = y_i - \tilde{y}_{\text{sector of } i} \qquad\qquad \tilde{x}_{ij} = x_{ij} - \tilde{x}_{j,\,\text{sector of } i} $$

This asks: "After removing all variation that can be explained by sector membership, what metric differences explain TSR differences?" It is the closest the engine comes to a within-industry regression, and is most useful for identifying company-specific operational drivers when the peer set spans multiple sectors.
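Demeaning both sides can be sketched with numpy. The sector labels and values are invented for illustration; the real engine operates on the full metric library, not one column:

```python
# Illustrative sketch of per-sector median demeaning, applied to both the
# metric column and the TSR column before correlating.
import numpy as np

sector = np.array(["T", "T", "H", "H", "H"])
metric = np.array([10.0, 14.0, 3.0, 5.0, 7.0])
tsr    = np.array([0.10, 0.20, 0.00, 0.05, 0.10])

def demean_by_sector(values, labels):
    """Subtract each sector's median from its members."""
    out = values.astype(float).copy()
    for s in np.unique(labels):
        mask = labels == s
        out[mask] -= np.median(values[mask])
    return out

m_tilde = demean_by_sector(metric, sector)  # within-sector metric spread
y_tilde = demean_by_sector(tsr, sector)     # within-sector TSR spread
```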

etsr_capm — CAPM alpha (β-adjusted excess return)

$$ \alpha_i = \mathrm{TSR}_i - \hat{\beta}_i \times \mathrm{TSR}_{\mathrm{SPY}} $$

Each peer's TSR is adjusted by a per-peer market beta, allowing defensive names (low beta) to receive smaller adjustments and high-beta names larger ones. Rankings relative to the simple etsr variant shift accordingly. Beta estimation is described in the CAPM Alpha methodology essay.
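The adjustment itself is a one-liner once betas exist. The betas below are illustrative inputs, not estimates produced by the methodology the essay describes:

```python
# Hedged sketch of the beta-adjusted variant: a low-beta name gets a
# smaller market subtraction than a high-beta one.
def capm_alpha(tsr, betas, market_tsr):
    """alpha_i = TSR_i - beta_i * TSR_market (SPY in the text)."""
    return {t: v - betas[t] * market_tsr for t, v in tsr.items()}

alphas = capm_alpha(
    {"DEF": 0.08, "HIG": 0.25},   # raw TSRs (illustrative)
    {"DEF": 0.6, "HIG": 1.4},     # assumed per-peer betas
    market_tsr=0.10,              # e.g. SPY returned 10% over the window
)
# DEF keeps most of its 8% (0.08 - 0.06); HIG gives back 14% (0.25 - 0.14)
```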

Lead-lag mode

By default (lag_years = 0), the engine uses contemporaneous correlations: the latest available FY metrics for each peer, correlated against TSR over the period window ending today. This is the fast path — metrics are read from the pre-computed company_metrics_cache.

When lag_years > 0 (accepted values: 1, 2, 3), the engine operates in lead-lag mode:

  • Metrics are observed at the historical snapshot point today − lag_years.
  • TSR is computed over the window from that same observation point through today.
  • The period parameter is overridden to equal lag_years so the windows align.

This tests whether fundamentals at the observation point were predictive of subsequent returns — the forward-return convention used in academic studies of return predictability (Welch & Goyal, 2008). Metrics at time t that correlate strongly with TSR[t,t+k] are leading indicators rather than coincident ones.

Lead-lag metrics are computed on-demand from raw XBRL facts (not the cache), using the FY-end nearest to the observation date within a ±180-day window. This is the slow path, but lead-lag analysis is most meaningful for tighter peer sets (sub-industry, sector) where per-CIK fact loops stay within API Gateway's 30-second wall time.
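The window alignment above can be sketched with simple date arithmetic. This is an assumed simplification: the real engine searches for the FY-end nearest the observation date within ±180 days, which is elided here, and the Feb-29 edge case is ignored:

```python
# Sketch of lead-lag window alignment: metrics observed at today - lag_years,
# TSR measured from that same point through today.
from datetime import date

def leadlag_windows(today, lag_years):
    assert lag_years in (1, 2, 3)            # accepted values per the text
    obs = today.replace(year=today.year - lag_years)  # observation point
    return obs, (obs, today)                 # metrics at obs; TSR over [obs, today]

obs, (start, end) = leadlag_windows(date(2024, 6, 30), lag_years=2)
# metrics from the FY-end nearest 2022-06-30; TSR over 2022-06-30..2024-06-30
```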

Ranking

The engine computes correlations for every metric in the library and sorts by $|r_{\text{Pearson}}|$ descending. The caller controls how many ranked rows to return via the top parameter (default 10). All metrics — including those with no statistically meaningful result — are evaluated; the ranking simply reflects which metrics co-move most with the chosen target, sign-agnostic.
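The ranking step is a plain sign-agnostic sort. The field names below are invented for illustration; they are not the actual CorrelationRow schema:

```python
# Minimal sketch of the ranking: sort by absolute Pearson r, descending,
# then truncate to the caller's `top`. A strong negative correlation
# outranks a weaker positive one.
rows = [
    {"metric": "roe",        "pearson": 0.42},
    {"metric": "net_margin", "pearson": -0.61},
    {"metric": "leverage",   "pearson": 0.05},
]
top = sorted(rows, key=lambda r: abs(r["pearson"]), reverse=True)[:2]
print([r["metric"] for r in top])   # the -0.61 ranks first
```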

Winsorization

Pearson is fragile to outliers — a single post-merger goodwill spike or NOL-distorted ROE can flip a metric's rank. EqtyTrk applies symmetric per-vector winsorization before computing $r_{\text{Pearson}}$ and $\rho_{\text{Spearman}}$:

$$ x_i' = \max\!\big(\min(x_i,\, q_{1-\alpha}(x)),\, q_{\alpha}(x)\big) $$

The x vector is clipped to its own $\alpha$ and $1-\alpha$ percentiles, and the y vector to its own; pairs are not jointly conditioned. The default level is $\alpha = 0.01$, matching Welch & Goyal's convention for cross-sectional finance regressions. The toggle in the UI maps on → 0.01, off → 0.0 (no clipping). Below 5 paired observations, winsorization is skipped — the percentile estimates aren't meaningful at small n.

This affects Pearson noticeably; Spearman is already rank-based and largely insensitive (clipping only touches the tail values, which already had the most extreme ranks).
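A sketch of the clipping, assuming numpy's default percentile interpolation (the exact quantile convention in _winsorize_pair may differ). The 10% level below is only so the effect is visible at n = 5; the documented default is 1%:

```python
# Symmetric per-vector winsorization: clip each vector to its own
# alpha / 1-alpha percentiles; vectors are treated independently.
import numpy as np

def winsorize(x, alpha=0.01):
    x = np.asarray(x, dtype=float)
    if len(x) < 5 or alpha <= 0.0:   # skipped below 5 paired observations
        return x
    lo, hi = np.percentile(x, [100 * alpha, 100 * (1 - alpha)])
    return np.clip(x, lo, hi)

x = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])   # one extreme outlier
xw = winsorize(x, alpha=0.10)                # outlier pulled toward the pack
```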

Limitations

  • Small n. Peer sets of 20–80 names make sampling variance large. A correlation of 0.3 with n=20 has a wide confidence interval; the same 0.3 with n=80 is much tighter. The reported n and p-value help calibrate confidence, but no automated threshold is applied.
  • Multiple comparisons. Testing 60+ metrics against the same target vector implicitly increases the probability of spurious high correlations. EqtyTrk does not apply a Bonferroni or FDR correction. Users should treat the top-ranked metrics as hypotheses to investigate, not as confirmed causal relationships.
  • Contemporaneous vs. causal. The default (lag_years = 0) correlation is contemporaneous. High correlation between, say, net margin and TSR over a shared window could mean that margin drove returns, that rising stock prices funded margin expansion (reverse causality), or that both were driven by a common macro factor. Lead-lag mode partially addresses this but does not eliminate endogeneity.
  • Sector ETF coverage. etsr_sector requires ETF price data in the DB. Sectors without an ingested ETF produce None TSR values for their constituent peers.

Implementation notes

  • correlate_pairs() and rank_metrics_by_correlation() in src/eqtytrk/correlation/analyzer.py
  • Uses scipy.stats.pearsonr() (returns Pearson r and two-tailed p-value) and scipy.stats.spearmanr() (returns Spearman ρ)
  • Six target variants implemented in src/eqtytrk/api/routers/analysis.py: _apply_etsr_sector(), _apply_etsr_peer_median(), _apply_etsr_capm(), _apply_sector_demean_metrics()
  • Lead-lag logic in the same file; uses _historical_fy_end_for() to pick the FY-end nearest start_d within ±180 days
  • CorrelateRequest / CorrelateResponse schemas in src/eqtytrk/api/schemas.py
  • _winsorize_pair() helper in src/eqtytrk/api/routers/analysis.py (1% default, applied after target adjustment, before Pearson/Spearman)

References

  • Welch, I. & Goyal, A. (2008). "A Comprehensive Look at the Empirical Performance of Equity Premium Prediction." Review of Financial Studies, 21(4), 1455–1508.
  • Fama, E.F. & French, K.R. (2015). "A five-factor asset pricing model." Journal of Financial Economics, 116(1), 1–22.

EqtyTrk methodology reference. Data from SEC EDGAR.