Peer Construction
Every correlation analysis and every DuPont benchmarking result in EqtyTrk depends on which companies are treated as peers. The platform exposes five peer-construction methods, ranging from the tightest (same GICS sub-industry) to the broadest (entire index). Choosing the right method is a tradeoff between comparability and sample size.
The five methods
The method is passed as the method field in a CorrelateRequest. The literal values are:
| Method | Description |
|---|---|
| sub_industry | Same GICS sub-industry (8-digit code) as the subject; tightest comparability |
| sector | Same GICS sector (11-sector classification); wider set, lower precision |
| hybrid | Scored blend of sector match and size proximity; top 30 by composite score |
| index_membership | All constituents of the selected index; broadest, least filtered |
| size_band | Index constituents within ±50% of the subject's market cap |
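As an illustration, the five literals could be modelled as a typing.Literal and checked before a request is built. This is a sketch only: the real CorrelateRequest schema is not reproduced here, and both the validate_method helper and the ticker field are hypothetical names introduced for the example.

```python
from typing import Literal, get_args

# The five method literals listed in the table above.
PeerMethod = Literal[
    "sub_industry", "sector", "hybrid", "index_membership", "size_band"
]


def validate_method(value: str) -> str:
    """Return value unchanged if it is a known method literal, else raise.

    Hypothetical helper; the actual validation belongs to CorrelateRequest.
    """
    if value not in get_args(PeerMethod):
        raise ValueError(f"unknown peer method: {value!r}")
    return value


# Hypothetical payload: only the "method" field is documented above;
# "ticker" is an assumed field name used for illustration.
payload = {"ticker": "AAPL", "method": validate_method("sub_industry")}
```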
sub_industry
The sub-industry method starts with every constituent of the chosen index and filters to the subset sharing the subject's GICS sub-industry. It applies two automatic fallbacks:
- If the subject has no sub-industry tag in companies or index_constituents, it widens to peers_by_sector.
- If fewer than three peers share the sub-industry after filtering, it widens to sector to avoid a one- or two-name peer set where correlations would be meaningless.
This is the API default and generally the right starting point for operator-level analysis.
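The two fallbacks can be sketched roughly as follows. The Constituent type and the sub_industry_peers function are hypothetical stand-ins, not the signature of peers_by_sub_industry(), and excluding the subject from its own peer set is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_SUB_INDUSTRY_PEERS = 3  # threshold described above


@dataclass(frozen=True)
class Constituent:
    """Hypothetical in-memory view of one index constituent."""
    ticker: str
    sector: Optional[str]
    sub_industry: Optional[str]


def sub_industry_peers(
    subject: Constituent, index: List[Constituent]
) -> List[Constituent]:
    # Assumption: the subject is never its own peer.
    others = [c for c in index if c.ticker != subject.ticker]
    sector_peers = [
        c for c in others
        if subject.sector is not None and c.sector == subject.sector
    ]

    # Fallback 1: the subject has no sub-industry tag, so widen to sector.
    if subject.sub_industry is None:
        return sector_peers

    tight = [c for c in others if c.sub_industry == subject.sub_industry]

    # Fallback 2: fewer than three sub-industry peers, so widen to sector.
    if len(tight) < MIN_SUB_INDUSTRY_PEERS:
        return sector_peers
    return tight
```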
sector
Filters index constituents to the subject's GICS sector. Falls back through the same two-tier lookup: companies.sector first, then index_constituents.sector (sourced from iShares CSV files or Wikipedia) when the companies row is absent or unclassified.
hybrid
Scores every index constituent by a weighted combination of two factors, a sector match and a size proximity, using a fixed 60/40 sector-to-size weighting.
The log-ratio in size_proximity means a peer at 10x the subject's market cap scores 0.5 on this axis; at 100x (two decades of log-scale distance) it scores 0. Returns the top 30 candidates by composite score. Peers with a missing market cap or sector are not removed outright; the missing attribute contributes 0 on its axis, which pushes them down the ranking.
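A minimal sketch of a score consistent with the description above is shown here. The exact formula is inferred rather than quoted: the linear decay over two decades is fitted to the two stated anchor points (0.5 at 10x, 0 at 100x), and the binary sector match, the symmetric treatment of smaller peers, and all function names are assumptions.

```python
import math
from typing import Iterable, List, Optional, Tuple

SECTOR_WEIGHT = 0.6  # fixed 60/40 sector-to-size weighting
SIZE_WEIGHT = 0.4
TOP_N = 30


def size_proximity(subject_cap: float, peer_cap: Optional[float]) -> float:
    """1.0 at equal size, 0.5 at 10x, 0.0 at 100x or beyond (assumed shape)."""
    if peer_cap is None or peer_cap <= 0 or subject_cap <= 0:
        return 0.0  # a missing market cap contributes 0 on this axis
    decades = abs(math.log10(peer_cap / subject_cap))
    return max(0.0, 1.0 - decades / 2.0)


def hybrid_score(
    subject_sector: Optional[str],
    subject_cap: float,
    peer_sector: Optional[str],
    peer_cap: Optional[float],
) -> float:
    # Assumption: sector match is binary; a missing sector scores 0.
    same_sector = peer_sector is not None and peer_sector == subject_sector
    sector_match = 1.0 if same_sector else 0.0
    return (SECTOR_WEIGHT * sector_match
            + SIZE_WEIGHT * size_proximity(subject_cap, peer_cap))


def top_hybrid_peers(
    subject_sector: Optional[str],
    subject_cap: float,
    candidates: Iterable[Tuple[str, Optional[str], Optional[float]]],
    n: int = TOP_N,
) -> List[str]:
    """candidates holds (ticker, sector, market_cap) tuples."""
    scored = [
        (hybrid_score(subject_sector, subject_cap, sector, cap), ticker)
        for ticker, sector, cap in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ticker for _, ticker in scored[:n]]
```

Under these assumptions a same-sector peer with no market cap scores 0.6, which still outranks an equal-sized peer from another sector at 0.4.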
index_membership
No filtering beyond index membership. The broadest peer set; appropriate when the subject is in a sub-industry with few constituents. Returns all index members except the subject itself, using the latest as_of_date snapshot in index_constituents.
size_band
Subset of index_membership filtered to peers whose market cap falls within ±50% of the subject's market cap, that is, between (1 − band_pct) and (1 + band_pct) times the subject's market cap.
The band_pct parameter defaults to Decimal("0.5"). Peers with no market cap are excluded.
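The band test could look like the sketch below. The in_size_band name is hypothetical, and treating both bounds as inclusive is an assumption.

```python
from decimal import Decimal
from typing import Optional


def in_size_band(
    subject_cap: Decimal,
    peer_cap: Optional[Decimal],
    band_pct: Decimal = Decimal("0.5"),
) -> bool:
    """True if peer_cap lies within +/- band_pct of subject_cap."""
    if peer_cap is None:
        return False  # peers with no market cap are excluded
    lower = subject_cap * (Decimal(1) - band_pct)
    upper = subject_cap * (Decimal(1) + band_pct)
    # Assumption: both bounds are inclusive.
    return lower <= peer_cap <= upper
```

With the default band, a subject at $100B admits peers between $50B and $150B.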
GICS classification source
GICS data enters the system through two channels:
- iShares index CSVs: the index_constituents table is populated by scraping the iShares ETF holdings files (one file per index: sp500, sp400, sp600, r1000, r2000, nasdaq100). Each row carries the GICS sector and sub-industry as tagged by BlackRock. This is the primary source for S&P 500 constituents.
- companies table: companies.sector and companies.sub_industry are set during ingestion when EDGAR or another authoritative source provides them. These columns are authoritative when present but are often NULL.
All sector lookups (in peers_by_sector, peers_by_sub_industry, and the correlate endpoint) follow the same fallback: companies.sector → index_constituents.sector. This ensures that even tickers not yet fully ingested still classify correctly for correlation purposes.
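The fallback order can be sketched with two plain mappings standing in for the tables. The resolve_sector function and its arguments are hypothetical; the real lookups query the database.

```python
from typing import Mapping, Optional


def resolve_sector(
    ticker: str,
    companies_sector: Mapping[str, Optional[str]],
    constituents_sector: Mapping[str, Optional[str]],
) -> Optional[str]:
    """Try companies.sector first, then index_constituents.sector."""
    primary = companies_sector.get(ticker)
    if primary:  # row present and classified (not NULL or empty)
        return primary
    # Row absent or unclassified: fall back to the index snapshot.
    return constituents_sector.get(ticker) or None
```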
Index selection and market-cap recommendations
Each peer method operates within a single index. When the caller does not specify index in the request, EqtyTrk selects the most appropriate index based on the subject's market cap:
| Market cap | Preferred indexes |
|---|---|
| ≥ $15B (large/mega) | sp500, r1000 |
| $2B–$15B (mid) | sp400, r1000 |
| < $2B (small) | sp600, r2000 |
If the preferred index is not available in the database, the engine steps up to a larger-cap index until it finds one. This logic lives in src/eqtytrk/peers/recommend.py.
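A rough sketch of the selection and step-up logic follows. It is not the code in recommend.py: the recommend_index name, the tier table layout, and the order in which larger tiers are tried are assumptions, and the market cap is assumed to be a non-negative USD amount.

```python
from typing import List, Optional, Set, Tuple

# Tiers ordered from small-cap to large-cap:
# (minimum market cap in USD, preferred indexes in priority order).
TIERS: List[Tuple[float, List[str]]] = [
    (0, ["sp600", "r2000"]),
    (2_000_000_000, ["sp400", "r1000"]),
    (15_000_000_000, ["sp500", "r1000"]),
]


def recommend_index(market_cap: float, available: Set[str]) -> Optional[str]:
    """Pick the first available preferred index, stepping up by tier."""
    # Find the highest tier whose floor the subject's market cap reaches.
    start = max(i for i, (floor, _) in enumerate(TIERS) if market_cap >= floor)
    # Try the subject's own tier first, then progressively larger-cap tiers.
    for _, preferred in TIERS[start:]:
        for name in preferred:
            if name in available:
                return name
    return None  # nothing suitable is loaded
```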
Dual-class ticker fallback
EDGAR's company_tickers.json sometimes lists only one share class for dual-class issuers. When the requested ticker does not resolve through EDGAR's normalization (dot-to-dash substitution), the endpoint queries companies_aliases, a small seed table that maps the unlisted class to the parent CIK. The current seed covers five S&P 500 members:
| Alias | Canonical issuer (parent CIK) |
|---|---|
| BF.B | Brown-Forman class B |
| BRK.B | Berkshire Hathaway class B |
| FOXA | Fox class A |
| GOOGL | Alphabet class A |
| NWS | News Corp class B |
Because companies, company_metrics_cache, and xbrl_facts are keyed by CIK, resolving to the parent CIK gives immediate access to all ingested facts and metrics for that entity, regardless of which share class was requested.
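The resolution order described in this section could be sketched as below, with two dictionaries standing in for the EDGAR ticker map and the companies_aliases table. The function names are hypothetical, the CIK strings in the usage example are placeholders rather than real identifiers, and keying aliases by the dotted form is an assumption based on the table above.

```python
from typing import Mapping, Optional


def normalize_ticker(ticker: str) -> str:
    """Dot-to-dash substitution, e.g. 'BRK.B' becomes 'BRK-B'."""
    return ticker.upper().replace(".", "-")


def resolve_cik(
    ticker: str,
    edgar_tickers: Mapping[str, str],
    aliases: Mapping[str, str],
) -> Optional[str]:
    """Resolve via EDGAR first, then fall back to the alias seed table."""
    normalized = normalize_ticker(ticker)
    if normalized in edgar_tickers:
        return edgar_tickers[normalized]
    # Assumption: aliases are keyed by the ticker as written in the table.
    return aliases.get(ticker.upper())


# Placeholder data for illustration only; these are not real CIK values.
edgar = {"GOOG": "CIK-ALPHABET"}
seed = {"GOOGL": "CIK-ALPHABET", "BRK.B": "CIK-BERKSHIRE"}
```

Here resolve_cik("GOOGL", edgar, seed) and resolve_cik("GOOG", edgar, seed) both land on the same parent CIK, which is what gives either share class access to the same facts and metrics.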
Limitations
- Point-in-time index composition. The peer set reflects the latest as_of_date in index_constituents. Companies that were recently added or removed are not handled retroactively; a historical look-back analysis could be biased by survivorship if the peer set includes companies that entered the index after the measurement window.
- sub_industry fallback to sector. The automatic widening when fewer than three sub-industry peers exist is heuristic. For genuinely narrow sub-industries (e.g., a lone space-launch company in the index), sector peers may not be meaningfully comparable.
- hybrid scores are ordinal, not cardinal. The 60/40 sector-to-size weighting is fixed and not calibrated from empirical data. Different weightings would produce different top-30 lists.
- Missing market cap. A market cap cannot be computed for dual-class tickers whose share-count XBRL tags differ from those the ingestion pipeline whitelists. size_band and hybrid fall back gracefully (dropping the peer or scoring size at 0, respectively); index_membership and sector/sub_industry are unaffected.
Implementation notes
- peers_by_sub_industry(), peers_by_sector(), peers_by_hybrid(), peers_by_index_membership(), peers_by_size_band() in src/eqtytrk/peers/engine.py
- recommend_index_by_market_cap() in src/eqtytrk/peers/recommend.py
- companies_aliases table: src/eqtytrk/migrations/007_companies_aliases.sql
- Method literal type declared in CorrelateRequest in src/eqtytrk/api/schemas.py