Peer Construction

Every correlation analysis and every DuPont benchmarking result in EqtyTrk depends on which companies are treated as peers. The platform exposes five peer-construction methods, ranging from the tightest (same GICS sub-industry) to the broadest (entire index). Choosing the right method is a tradeoff between comparability and sample size.

The five methods

The method is passed as the method field in a CorrelateRequest. The literal values are:

| Method | Description |
|---|---|
| sub_industry | Same GICS sub-industry (8-digit code) as the subject; tightest comparability |
| sector | Same GICS sector (11-sector classification); wider set, lower precision |
| hybrid | Scored blend of sector match and size proximity; top 30 by composite score |
| index_membership | All constituents of the selected index; broadest, least filtered |
| size_band | Index constituents within ±50% of the subject's market cap |
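
As a rough illustration of how the five literals might be constrained in a request model, here is a minimal sketch. The class name `CorrelateRequestSketch` and the `ticker` field are hypothetical; only the `method` and `index` fields and the five literal values come from this page, and the real `CorrelateRequest` in src/eqtytrk/api/schemas.py may be declared differently.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# The five literal values listed in the table above.
PeerMethod = Literal[
    "sub_industry", "sector", "hybrid", "index_membership", "size_band"
]


@dataclass
class CorrelateRequestSketch:
    """Hypothetical stand-in for CorrelateRequest (not the real schema)."""

    ticker: str  # assumed field name, for illustration only
    method: PeerMethod = "sub_industry"  # sub_industry is the documented API default
    index: Optional[str] = None  # None lets the engine choose an index

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal types at runtime, so check by hand.
        if self.method not in get_args(PeerMethod):
            raise ValueError(f"unknown peer method: {self.method!r}")
```

Validating against `get_args(PeerMethod)` keeps the accepted values in one place, so a sixth method would only need to be added to the `Literal`.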

sub_industry

The sub-industry method starts with every constituent of the chosen index and filters to the subset sharing the subject's GICS sub-industry. It applies two automatic fallbacks:

  1. If the subject has no sub-industry tag in companies or index_constituents, it widens to peers_by_sector.
  2. If fewer than three peers share the sub-industry after filtering, it widens to sector to avoid a one- or two-name peer set where correlations would be meaningless.

This is the API default and generally the right starting point for operator-level analysis.
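
The two fallbacks described in this section can be sketched as follows. This is an illustration under stated assumptions, not the code in src/eqtytrk/peers/engine.py: the `Constituent` record and the function name are hypothetical, and excluding the subject from its own peer set is assumed here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_SUB_INDUSTRY_PEERS = 3  # threshold from the second fallback


@dataclass(frozen=True)
class Constituent:
    """Hypothetical row shape for an index constituent."""

    ticker: str
    sector: Optional[str]
    sub_industry: Optional[str]


def peers_with_fallback(
    subject: Constituent, constituents: List[Constituent]
) -> Tuple[str, List[str]]:
    """Return (method actually used, peer tickers)."""
    # Assumption: the subject is never its own peer.
    others = [c for c in constituents if c.ticker != subject.ticker]
    sector_peers = [
        c.ticker
        for c in others
        if c.sector is not None and c.sector == subject.sector
    ]

    # Fallback 1: the subject has no sub-industry tag.
    if subject.sub_industry is None:
        return "sector", sector_peers

    sub_peers = [
        c.ticker for c in others if c.sub_industry == subject.sub_industry
    ]
    # Fallback 2: fewer than three sub-industry peers.
    if len(sub_peers) < MIN_SUB_INDUSTRY_PEERS:
        return "sector", sector_peers
    return "sub_industry", sub_peers
```

Returning the method that was actually used lets a caller see when the peer set was widened, which matters when interpreting the resulting correlations.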

sector

Filters index constituents to the subject's GICS sector. The sector is resolved through a two-tier lookup: companies.sector first, then index_constituents.sector (sourced from iShares CSV files or Wikipedia) when the companies row is absent or unclassified.

hybrid

Scores every index constituent by a weighted combination of two factors:

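$$\text{score} = 0.6 \times \text{sector\_match} + 0.4 \times \text{size\_proximity}$$

Where:

$$\text{sector\_match} = \begin{cases} 1 & \text{same GICS sector} \\ 0 & \text{otherwise} \end{cases}$$

$$\text{size\_proximity} = \max\left(0,\ 1 - \frac{\left|\log_{10}\left(mc_{\text{peer}} / mc_{\text{subject}}\right)\right|}{2}\right)$$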

The log-ratio in size_proximity means a peer at 10× (or one-tenth of) the subject's market cap scores 0.5 on this axis; at 100× or one-hundredth (two decades of log-scale distance) it scores 0. The method returns the top 30 candidates by composite score, dropping peers with missing market cap or sector (those contribute 0 on the respective axis).
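
The scoring formula above can be checked numerically with a short sketch. The function names, the `(ticker, sector, market_cap)` candidate shape, and the ticker tie-break are assumptions made for illustration. A missing market cap or sector is scored as 0 on its axis here rather than removed outright.

```python
import math
from typing import List, Optional, Tuple

SECTOR_WEIGHT = 0.6
SIZE_WEIGHT = 0.4
TOP_N = 30

# Hypothetical candidate shape: (ticker, sector, market_cap).
Candidate = Tuple[str, Optional[str], Optional[float]]


def size_proximity(mc_peer: Optional[float], mc_subject: float) -> float:
    """1 at equal size, 0.5 at one decade apart, 0 at two decades or more."""
    if mc_peer is None or mc_peer <= 0 or mc_subject <= 0:
        return 0.0  # missing or unusable market cap contributes 0
    return max(0.0, 1.0 - abs(math.log10(mc_peer / mc_subject)) / 2.0)


def hybrid_score(
    same_sector: bool, mc_peer: Optional[float], mc_subject: float
) -> float:
    sector_match = 1.0 if same_sector else 0.0
    return SECTOR_WEIGHT * sector_match + SIZE_WEIGHT * size_proximity(
        mc_peer, mc_subject
    )


def rank_hybrid(
    candidates: List[Candidate],
    subject_sector: str,
    mc_subject: float,
    top_n: int = TOP_N,
) -> List[Tuple[str, float]]:
    scored = [
        (
            ticker,
            hybrid_score(
                sector is not None and sector == subject_sector, mc, mc_subject
            ),
        )
        for ticker, sector, mc in candidates
    ]
    # Highest score first; ties broken by ticker (an assumption) for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

For example, a same-sector peer at 10× the subject's market cap scores 0.6 + 0.4 × 0.5 = 0.8, while a different-sector peer of identical size scores 0.4. With the fixed weights, any same-sector peer (score at least 0.6) therefore ranks above every different-sector peer (score at most 0.4).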

index_membership

No filtering beyond index membership. The broadest peer set; appropriate when the subject is in a sub-industry with few constituents. Returns all index members except the subject itself, using the latest as_of_date snapshot in index_constituents.
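
A minimal sketch of the latest-snapshot selection, assuming each index_constituents row can be reduced to an `(index_name, ticker, as_of_date)` tuple. Only `as_of_date` is named on this page; the other two field names and the function name are hypothetical.

```python
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical row shape: (index_name, ticker, as_of_date).
Row = Tuple[str, str, date]


def peers_by_index_membership_sketch(
    rows: Iterable[Row], index_name: str, subject: str
) -> List[str]:
    in_index = [r for r in rows if r[0] == index_name]
    if not in_index:
        return []  # index not loaded
    # Use only the most recent snapshot for this index.
    latest = max(r[2] for r in in_index)
    return sorted(
        r[1] for r in in_index if r[2] == latest and r[1] != subject
    )
```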

size_band

Subset of index_membership filtered to peers whose market cap falls within ±50% of the subject's market cap:

$$\text{lower} = mc_{\text{subject}} \times 0.5, \qquad \text{upper} = mc_{\text{subject}} \times 1.5$$

The band_pct parameter defaults to Decimal("0.5"). Peers with no market cap are excluded.
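
The band arithmetic can be sketched with `Decimal` to match the documented default. The function names and the ticker-to-market-cap mapping are hypothetical, and treating the bounds as inclusive is an assumption; the page does not say whether the real filter is inclusive.

```python
from decimal import Decimal
from typing import Dict, List, Optional, Tuple


def size_band_bounds(
    mc_subject: Decimal, band_pct: Decimal = Decimal("0.5")
) -> Tuple[Decimal, Decimal]:
    """Return (lower, upper) = mc * (1 - band_pct), mc * (1 + band_pct)."""
    return mc_subject * (1 - band_pct), mc_subject * (1 + band_pct)


def peers_by_size_band_sketch(
    subject: str,
    market_caps: Dict[str, Optional[Decimal]],
    band_pct: Decimal = Decimal("0.5"),
) -> List[str]:
    mc_subject = market_caps.get(subject)
    if mc_subject is None:
        return []  # assumption: no band can be built without the subject's cap
    lower, upper = size_band_bounds(mc_subject, band_pct)
    return [
        ticker
        for ticker, mc in market_caps.items()
        # Peers with no market cap are excluded; bounds assumed inclusive.
        if ticker != subject and mc is not None and lower <= mc <= upper
    ]
```

With a subject market cap of 100 and the default band, the bounds are 50 and 150, so peers at 50 and 150 are kept while one at 151 is not.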

GICS classification source

GICS data enters the system through two channels:

  1. iShares index CSVs: the index_constituents table is populated by scraping the iShares ETF holdings files (one file per index: sp500, sp400, sp600, r1000, r2000, nasdaq100). Each row carries the GICS sector and sub-industry as tagged by BlackRock. This is the primary source for S&P 500 constituents.
  2. companies table: companies.sector and companies.sub_industry are set during ingestion when EDGAR or another authoritative source provides them. These columns are authoritative when present but are often NULL.

All peer-sector lookups (in peers_by_sector, peers_by_sub_industry, and the correlate endpoint) follow the same fallback: companies.sector → index_constituents.sector. This ensures that tickers not yet fully ingested are still classified for correlation purposes, provided they appear in index_constituents.
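
The fallback order amounts to a coalesce over the two sources. The sketch below models each table as a plain ticker-to-sector mapping, which is a simplification; `resolve_sector` is a hypothetical name, and treating an empty string as unclassified is an assumption.

```python
from typing import Mapping, Optional


def resolve_sector(
    ticker: str,
    companies_sector: Mapping[str, Optional[str]],
    constituents_sector: Mapping[str, Optional[str]],
) -> Optional[str]:
    """Prefer companies.sector; fall back to index_constituents.sector."""
    primary = companies_sector.get(ticker)
    if primary:  # present and non-empty, so authoritative
        return primary
    # The companies row is absent or unclassified.
    return constituents_sector.get(ticker)
```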

Index selection and market-cap recommendations

Each peer method operates within a single index. When the caller does not specify index in the request, EqtyTrk selects the most appropriate index based on the subject's market cap:

| Market cap | Preferred indexes |
|---|---|
| ≥ $15B (large/mega) | sp500, r1000 |
| $2B–$15B (mid) | sp400, r1000 |
| < $2B (small) | sp600, r2000 |

If the preferred index is not available in the database, the engine steps up to a larger-cap index until it finds one. This logic lives in src/eqtytrk/peers/recommend.py.
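
One way to read the table and the step-up rule is sketched below. The thresholds and index names come from the table; the function names are hypothetical, and the exact step-up order (try the subject's own tier first, then each larger-cap tier in turn) is an assumption about how recommend_index_by_market_cap() behaves.

```python
from typing import Iterable, List, Optional

LARGE_CAP_FLOOR = 15_000_000_000  # $15B
MID_CAP_FLOOR = 2_000_000_000  # $2B

# Ordered from the smallest-cap tier to the largest-cap tier.
TIER_PREFERENCES: List[List[str]] = [
    ["sp600", "r2000"],  # small: < $2B
    ["sp400", "r1000"],  # mid: $2B to $15B
    ["sp500", "r1000"],  # large/mega: >= $15B
]


def tier_of(market_cap: float) -> int:
    if market_cap >= LARGE_CAP_FLOOR:
        return 2
    if market_cap >= MID_CAP_FLOOR:
        return 1
    return 0


def preferred_indexes(market_cap: float) -> List[str]:
    return TIER_PREFERENCES[tier_of(market_cap)]


def recommend_index_sketch(
    market_cap: float, available: Iterable[str]
) -> Optional[str]:
    loaded = set(available)
    # Try the subject's own tier, then step up through larger-cap tiers.
    for names in TIER_PREFERENCES[tier_of(market_cap):]:
        for name in names:
            if name in loaded:
                return name
    return None  # nothing suitable is loaded
```

Under this reading, a $1B subject with only sp500 loaded resolves to sp500, while a $20B subject with only sp600 loaded resolves to nothing, since the sketch never steps down.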

Dual-class ticker fallback

EDGAR's company_tickers.json sometimes lists only one share class for dual-class issuers. When the requested ticker does not resolve through EDGAR's normalization (dot-to-dash substitution), the endpoint queries companies_aliases, a small seed table that maps the unlisted class to the parent CIK. The current seed covers five S&P 500 members:

| Alias | Canonical CIK |
|---|---|
| BF.B | Brown-Forman class B |
| BRK.B | Berkshire Hathaway class B |
| FOXA | Fox class A |
| GOOGL | Alphabet class A |
| NWS | News Corp class B |

Because companies, company_metrics_cache, and xbrl_facts are keyed by CIK, resolving to the parent CIK gives immediate access to all ingested facts and metrics for that entity, regardless of which share class was requested.
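
The resolution order described in this section can be sketched as a two-step lookup. Both tables are modeled as plain dictionaries, and the function names, the upper-casing step, and keying the alias table by the ticker as written are assumptions. The usage example uses made-up tickers and placeholder CIK strings, not real identifiers.

```python
from typing import Mapping, Optional


def normalize_ticker(ticker: str) -> str:
    """Dot-to-dash substitution, e.g. 'XYZ.B' becomes 'XYZ-B'."""
    return ticker.strip().upper().replace(".", "-")


def resolve_cik(
    ticker: str,
    edgar_tickers: Mapping[str, str],
    aliases: Mapping[str, str],
) -> Optional[str]:
    # Step 1: try the EDGAR ticker map with the normalized symbol.
    cik = edgar_tickers.get(normalize_ticker(ticker))
    if cik is not None:
        return cik
    # Step 2: fall back to the alias seed table (assumed keyed as written).
    return aliases.get(ticker.strip().upper())


# Usage with placeholder data (hypothetical tickers and CIK strings).
edgar_map = {"XYZ-A": "cik-placeholder-1", "QRS": "cik-placeholder-2"}
alias_map = {"XYZ.B": "cik-placeholder-1"}
parent_cik = resolve_cik("XYZ.B", edgar_map, alias_map)
```

Here the class B symbol is missing from the EDGAR map, so it resolves through the alias table to the same CIK as the class A listing.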

Limitations

  • Point-in-time index composition. The peer set reflects the latest as_of_date in index_constituents. Companies that were recently added or removed are not handled retroactively; a historical look-back analysis could be biased by survivorship if the peer set includes companies that entered the index after the measurement window.
  • sub_industry fallback to sector. The automatic widening when fewer than three sub-industry peers exist is heuristic. For genuinely narrow sub-industries (e.g., a lone space-launch company in the index), sector peers may not be meaningfully comparable.
  • hybrid scores are ordinal, not cardinal. The 60/40 sector-to-size weighting is fixed and not calibrated from empirical data. Different weightings would produce different top-30 lists.
  • Missing market cap. The pipeline cannot compute a market cap for dual-class tickers whose share-count XBRL tags are not on the ingestion whitelist. size_band and hybrid degrade gracefully (size_band drops the peer; hybrid scores size at 0); index_membership, sector, and sub_industry are unaffected.

Implementation notes

  • peers_by_sub_industry(), peers_by_sector(), peers_by_hybrid(), peers_by_index_membership(), peers_by_size_band() in src/eqtytrk/peers/engine.py
  • recommend_index_by_market_cap() in src/eqtytrk/peers/recommend.py
  • companies_aliases table: src/eqtytrk/migrations/007_companies_aliases.sql
  • Method literal type declared in CorrelateRequest in src/eqtytrk/api/schemas.py

EqtyTrk methodology reference. Data from SEC EDGAR.