Finance Data Pipeline

_images/architecture_diagram.svg

An automated financial data pipeline that collects stock financial statements from Yahoo Finance for 106,000+ tickers worldwide and stores them in a cloud-hosted PostgreSQL database.


Key Features

  • 106K+ Global Tickers validated against Yahoo Finance API

  • 4 Financial Statement Types: Income Statement, Cash Flow, Balance Sheet, Financials

  • Multi-threaded Execution for high throughput (configurable concurrency)

  • Incremental Updates: New tickers first, then refresh stale data


Pipeline Flow

_images/pipeline_flow.svg

Quick Start

# Install dependencies
pip install -r finance/requirements.txt

# Set up environment
export PG_NEON_FINANCE_URL="your-connection-string"

# Run active tickers check
python -m finance.src.run_active_tickers_check --mode single --threads 30

# Run financial data ETL
python -m finance.src.run_financial_etl --table income_stmt --max-batches 5

What Problem Does It Solve?

📈 Historical Data Accumulation

Yahoo Finance only shows the last 4 quarters/years. This pipeline builds a growing historical database over time.

🔍 SQL Access to Financial Data

Query thousands of companies at once with standard SQL instead of manual scraping.

🎯 Cross-Company Filtering

Filter and rank companies by any financial metric (revenue, EPS, margins, etc.).