Database Schema
The database uses a long (melted) format also known as EAV (Entity-Attribute-Value). This design was chosen because different companies report different metrics, making a wide-column approach impractical at scale.
Schema: finance
All tables live in the finance schema on Neon PostgreSQL.
active_tickers
Tracks which of the 106K tickers are valid/active on Yahoo Finance.
Column |
Type |
Description |
|---|---|---|
|
VARCHAR(20) |
Stock ticker symbol (e.g., “AAPL”, “MSFT”) |
|
VARCHAR |
Company name |
|
VARCHAR |
Exchange code (e.g., “NYQ”, “NMS”) |
|
VARCHAR |
Industry category |
|
VARCHAR |
Country of origin |
|
BOOLEAN |
Whether Yahoo API returns data for this ticker |
|
TIMESTAMP |
When this record was last updated |
Financial Statement Tables
All four tables (income_stmt, cash_flow, balance_sheet, financials)
share the same schema:
Column |
Type |
Description |
|---|---|---|
|
VARCHAR |
Stock ticker symbol |
|
VARCHAR |
“annual” or “quarterly” |
|
DATE |
Financial report date |
|
VARCHAR |
Metric name (e.g., “annualTotalRevenue”) |
|
DOUBLE PRECISION |
Numeric value (parsed from B/M/K suffixes) |
|
TIMESTAMP |
When this row was inserted |
ER Diagram
┌──────────────────────┐
│ active_tickers │
├──────────────────────┤ ┌──────────────────────┐
│ ticker (PK) │───────▶│ income_stmt │
│ name │ ├──────────────────────┤
│ exchange │ │ ticker │
│ category_name │ │ frequency │
│ country │ │ report_date │
│ is_active │ │ metric │
│ upsert_datetime │ │ value │
└──────────────────────┘ │ insert_datetime │
│ └──────────────────────┘
│
├────────────────────▶ cash_flow (same schema)
│
├────────────────────▶ balance_sheet (same schema)
│
└────────────────────▶ financials (same schema)
Example Queries
Get latest revenue for all tickers:
SELECT ticker, report_date, value
FROM finance.income_stmt
WHERE metric = 'annualTotalRevenue'
ORDER BY ticker, report_date DESC;
Find top 10 companies by net income:
SELECT DISTINCT ON (ticker) ticker, value
FROM finance.income_stmt
WHERE metric = 'annualNetIncome'
AND frequency = 'annual'
ORDER BY ticker, report_date DESC, value DESC
LIMIT 10;
Count tickers per exchange:
SELECT exchange, COUNT(*) as count
FROM finance.active_tickers
WHERE is_active = true
GROUP BY exchange
ORDER BY count DESC;