finance.src package

Submodules

finance.src.file_generator module

finance.src.postgres_interface module

Module for interacting with Postgres databases.

It contains generic methods for interacting with Postgres databases, regardless of the data they contain.

class finance.src.postgres_interface.PostgresInterface[source]

Bases: object

Class to interact with postgres databases

create_engine() Engine[source]

Method that creates the engines used to connect to the Postgres databases

Returns:

dictionary with the engines to connect to the databases

Return type:

dict

create_table_object(table_name: str, engine: Engine, schema: str = 'stocks') Table[source]

Method to create a table object

Parameters:
  • table_name (str) – name of the table to create the object for

  • engine (sqlalchemy.engine.Engine) – engine to connect to the database

  • schema (str) – schema of the table (default: stocks)

Returns:

table object

Return type:

sqlalchemy.Table

insert_batch(table: Table, batch: list, conn: Connection) None[source]

Method to insert a batch of data into a table

Parameters:
  • table (sqlalchemy.Table) – table to insert data into

  • batch (list) – list of tuples with the data to insert into the table

  • conn (Connection) – connection used to execute the insert

Return type:

None
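
As a rough illustration of the batch shape described above (a list of tuples inserted in a single call), the sketch below uses the standard-library sqlite3 module as a stand-in for the Postgres connection. The prices table, its columns, and this standalone insert_batch function are hypothetical and are not the package's implementation.

```python
import sqlite3


def insert_batch(conn: sqlite3.Connection, table: str, batch: list) -> None:
    """Insert a list of equally sized tuples into `table` in one transaction."""
    if not batch:
        return  # nothing to insert
    placeholders = ", ".join("?" for _ in batch[0])
    # A table name cannot be bound as a parameter, so it must come from trusted code.
    with conn:  # commits on success, rolls back on error
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", batch)


# Hypothetical table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ticker TEXT, close REAL)")
insert_batch(conn, "prices", [("AAA", 10.5), ("BBB", 20.0)])
row_count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
```

Wrapping the whole batch in one transaction means a failing row leaves the table unchanged, which is usually the behaviour wanted for bulk loads.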

migrate_dbs(batch_size: int = 5000, tap_cloud_provider: str = 'NEON', target_cloud_provider: str = 'AVN') None[source]

Method to migrate one database to another. It is intended to be used only once, to migrate data to a target database.

Parameters:
  • batch_size (int) – number of rows to insert in each batch (default: 5000)

  • tap_cloud_provider (str) – cloud provider of the tap database (default: NEON)

  • target_cloud_provider (str) – cloud provider of the target database (default: AVN)

Return type:

None
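
The role of batch_size can be sketched as a loop that reads a fixed number of rows from the tap database and writes them to the target before fetching the next batch. The example below is an assumption-laden illustration that uses two in-memory sqlite3 databases in place of the NEON and AVN databases; migrate_table and the tickers table are hypothetical names.

```python
import sqlite3


def migrate_table(source, target, table, batch_size=5000):
    """Copy all rows of `table` from source to target, batch_size rows at a time."""
    cursor = source.execute(f"SELECT * FROM {table}")
    copied = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break  # source exhausted
        placeholders = ", ".join("?" for _ in batch[0])
        with target:  # one transaction per batch
            target.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", batch
            )
        copied += len(batch)
    return copied


# Two in-memory databases stand in for the tap and target databases.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE tickers (ticker TEXT)")
with source:
    source.executemany(
        "INSERT INTO tickers VALUES (?)", [(f"T{i}",) for i in range(12)]
    )

copied = migrate_table(source, target, "tickers", batch_size=5)  # batches of 5, 5, 2
target_count = target.execute("SELECT COUNT(*) FROM tickers").fetchone()[0]
```

Committing per batch keeps memory use bounded and limits how much work is lost if the migration is interrupted.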

read_table_to_df(table: str, schema: str = 'stocks') DataFrame[source]

Method to read a table into a dataframe

Parameters:

  • table (str) – name of the table to read

  • schema (str) – schema of the table (default: stocks)

Returns:

dataframe with the data from the table

Return type:

pd.DataFrame

finance.src.s3_interface module

Module to interact with s3

class finance.src.s3_interface.S3Interface[source]

Bases: object

Class to interact with s3

backup_tables(bucket_name: str = 'ahnazary-finance-prod')[source]

Method to backup tables to s3

get_bucket_names() list[source]

Method to get the names of the buckets in s3

Returns:

list with the names of the buckets in s3

Return type:

list

finance.src.schedule_jobs module

Module that schedules the jobs in the CI/CD pipeline that are used to update the database.

This module contains methods and classes that can be reused across all the different jobs in the CI/CD pipeline.

class finance.src.schedule_jobs.ScheduleJobs(provider: str, table_name: str, frequency: str = 'annual', batch_size: int = 500)[source]

Bases: object

Class that schedules the jobs in the CI/CD pipeline that are used for updating the database and extracting data from yfinance

get_tickers_batch_yf_object(tickers_list: list) list[yfinance.ticker.Ticker][source]

Method to get a batch of yfinance tickers from a list of tickers

Parameters:

tickers_list (list) – list of ticker symbols to create the yfinance tickers from

run_pipeline()[source]

Main method that each of the jobs in the CI/CD pipeline runs. It includes steps such as:

  • getting a batch of tickers to update from valid_tickers table

  • extracting data from yfinance for each ticker

  • inserting the data into the database

  • updating the validity of the tickers in valid_tickers table
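
The four steps above can be sketched as a small orchestration function. This is only an illustration of the control flow: the callables passed in are hypothetical stand-ins for the database and yfinance calls, and marking a ticker as invalid when no data comes back is an assumption, not a confirmed detail of the implementation.

```python
from typing import Callable, List, Optional


def run_pipeline(
    get_batch: Callable[[], List[str]],
    extract: Callable[[str], Optional[list]],
    insert: Callable[[list], None],
    mark_invalid: Callable[[List[str]], None],
) -> None:
    """Run the four pipeline steps with caller-supplied functions."""
    records, invalid = [], []
    for ticker in get_batch():      # step 1: tickers to update
        rows = extract(ticker)      # step 2: fetch data for one ticker
        if rows:
            records.extend(rows)
        else:
            invalid.append(ticker)  # nothing came back for this ticker
    if records:
        insert(records)             # step 3: write to the database
    if invalid:
        mark_invalid(invalid)       # step 4: update the validity flags


# In-memory fakes replace the database and the yfinance API.
fake_data = {"AAA": [("AAA", 1.0)], "BBB": []}
inserted, flagged = [], []
run_pipeline(
    get_batch=lambda: list(fake_data),
    extract=lambda ticker: fake_data[ticker],
    insert=inserted.extend,
    mark_invalid=flagged.extend,
)
```

Passing the steps in as functions keeps the sketch testable without a database or network access.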

tickers_to_query(table_name: str, engine: Engine, frequency: str = 'annual') list[source]

TODO: make this docs better

TODO: make the query better

Method to get a batch of tickers from valid_tickers table that have not been inserted into other main tables (financials, balance_sheet, cashflow, etc.).

Parameters:
  • table_name (str) – name of the table to get the tickers from

  • engine (sqlalchemy.engine.Engine) – engine used to connect to the database; determines whether the local or the Neon database is queried

  • frequency (str) – frequency of the data (either annual or quarterly)

The query is equivalent to the following (shown for the cashflow table):

    select valid_tickers.ticker, valid_tickers.cashflow_annual_available, subquery.max_insert_date
    from stocks.valid_tickers
    left join (
        select cashflow.ticker, max(cashflow.insert_date) as max_insert_date
        from stocks.cashflow
        group by cashflow.ticker
    ) as subquery on valid_tickers.ticker = subquery.ticker
    where valid_tickers.cashflow_annual_available
    order by subquery.max_insert_date asc

Returns:

list of tickers

Return type:

list
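
To make the behaviour of the query easier to follow, the sketch below runs an equivalent statement against a small in-memory sqlite3 database. The sample rows are invented, the stocks schema prefix is dropped because the example does not set up a schema, and SQLite is only a stand-in for Postgres.

```python
import sqlite3

QUERY = """
SELECT valid_tickers.ticker,
       valid_tickers.cashflow_annual_available,
       subquery.max_insert_date
FROM valid_tickers
LEFT JOIN (
    SELECT cashflow.ticker, MAX(cashflow.insert_date) AS max_insert_date
    FROM cashflow
    GROUP BY cashflow.ticker
) AS subquery ON valid_tickers.ticker = subquery.ticker
WHERE valid_tickers.cashflow_annual_available
ORDER BY subquery.max_insert_date ASC
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE valid_tickers (ticker TEXT, cashflow_annual_available INTEGER);
CREATE TABLE cashflow (ticker TEXT, insert_date TEXT);
INSERT INTO valid_tickers VALUES ('AAA', 1), ('BBB', 1), ('CCC', 1), ('DDD', 0);
INSERT INTO cashflow VALUES
    ('AAA', '2023-01-01'), ('AAA', '2024-01-01'), ('BBB', '2022-06-01');
""")

# DDD is filtered out by the availability flag. CCC has no cashflow rows, so
# the left join gives it a NULL max_insert_date. BBB sorts before AAA because
# its latest insert is older.
# Note: SQLite places NULLs first in ascending order, whereas PostgreSQL places
# them last by default, so the position of CCC differs between the two engines.
tickers = [row[0] for row in conn.execute(QUERY)]
```

The left join is what keeps tickers that have never been inserted in the result; an inner join would drop CCC entirely.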

update_validy_in_valid_tickers_table(ticker: list, validity: bool = False)[source]

Method that gets a list of tickers and updates their validity in the valid_tickers table

finance.src.utils module

This module contains utility functions for the finance package.

finance.src.utils.are_incremental(input_list: list)[source]
finance.src.utils.custom_logger(logger_name: str, log_level: int = 30)[source]

Creates a custom logger.

Parameters:
  • logger_name (str) – Name of the logger.

  • log_level (int) – Log level (default: 30, i.e. logging.WARNING).

Returns:

A custom logger.

Return type:

logging.Logger
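
A minimal sketch of what such a helper might look like is shown below, using only the standard logging module. The handler, the message format, and the duplicate-handler guard are assumptions made for illustration; only the signature and the default level of 30 come from the documentation above.

```python
import logging


def custom_logger(logger_name: str, log_level: int = logging.WARNING) -> logging.Logger:
    """Return a named logger with a stream handler and the given level."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(log_level)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = custom_logger("finance_example")
logger.warning("emitted: WARNING meets the default threshold")
logger.info("suppressed: INFO (20) is below WARNING (30)")
```

With the default level, only messages at WARNING or above are emitted; passing logging.DEBUG (10) would enable everything.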

finance.src.yahoo_ticker module

Module to perform operations on the yahoo finance API data (tickers)

class finance.src.yahoo_ticker.Ticker(countries: str | List[str] = None, chunksize: int = 20, frequency: Literal['annual', 'quarterly'] = 'annual', schema: str = 'stocks')[source]

Bases: object

extract_tickers_data(ticker: Ticker, table_name: str, table_columns: list) DataFrame[source]

Method that gets the data from the yfinance API and transforms it into a list of tuples

Parameters:
  • ticker (yf.Ticker) – The ticker or stock to update

  • table_name (str) – The name of the table that the ticker data is extracted for

  • table_columns (list) – The columns of the table

Return type:

pd.DataFrame

flush_records(table_name: str, records: list)[source]

Method to flush records to a table

Parameters:
  • table_name (str) – The name of the table to flush the records to

  • records (list) – The records to flush to the table

get_columns_names(table_name: str)[source]

Method that returns the column names of a table

get_currency_code(ticker: str) str[source]

Method that gets the currency code of a ticker from valid_tickers table in the database

Parameters:

ticker (str) – The ticker symbol

Returns:

The currency code of the ticker

Return type:

str

get_data_df(table_name: str, frequency: str, ticker: Ticker)[source]

Method that returns a df based on the name of the table and frequency

Parameters:
  • table_name (str) – The name of the table that is going to be filled

  • frequency (str) – The frequency of the data to be extracted, either annual or quarterly

  • ticker (yf.Ticker) – The ticker to extract the data for

Returns:

The dataframe with the data

Return type:

pd.DataFrame

load_valid_tickers(sink_table: str) List[str][source]

Method to load the valid tickers from the database based on the validity status of the tickers in the valid_tickers table

Parameters:

sink_table (str) – The name of the table to load the tickers from

Returns:

A list of the valid tickers

Return type:

List[str]

update_tickers_list_table()[source]

Method to update the tickers_list table in Postgres. It reads all the data in the Excel file in the data directory (all available tickers) and inserts it into the database.

update_validity_status(table_name: str, tickers: list[str], availability: bool = False)[source]

Method that takes a list of tickers and updates their validity status for a specific criterion (e.g. balance_sheet_annual_available) in the valid_tickers table. For example, if a ticker has no balance sheet data for the quarterly frequency, the balance_sheet_quarterly_available column in the valid_tickers table is set to False.

Parameters:
  • table_name (str) – The name of the table that the tickers were supposed to be updated in

  • tickers (list[str]) – The tickers that were supposed to be updated

  • availability (bool) – The validity status of the tickers for the specific criterion (default: False)

Return type:

None
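
The column naming pattern mentioned above (for example balance_sheet_quarterly_available) suggests that the flag column is derived from the table name and the frequency. The sketch below illustrates that idea with sqlite3 as a stand-in for Postgres. The naming rule, the explicit conn and frequency arguments, and the ALLOWED_COLUMNS whitelist are assumptions for illustration and differ from the documented signature.

```python
import sqlite3

# Hypothetical whitelist: column names cannot be bound as SQL parameters,
# so they are validated before being placed into the statement.
ALLOWED_COLUMNS = {
    "balance_sheet_annual_available",
    "balance_sheet_quarterly_available",
    "cashflow_annual_available",
    "cashflow_quarterly_available",
}


def update_validity_status(conn, table_name, frequency, tickers, availability=False):
    """Set the availability flag for the given tickers in valid_tickers."""
    column = f"{table_name}_{frequency}_available"  # assumed naming convention
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown availability column: {column}")
    if not tickers:
        return
    placeholders = ", ".join("?" for _ in tickers)
    with conn:  # commit on success, roll back on error
        conn.execute(
            f"UPDATE valid_tickers SET {column} = ? WHERE ticker IN ({placeholders})",
            [int(availability), *tickers],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE valid_tickers (ticker TEXT, balance_sheet_quarterly_available INTEGER)"
)
conn.executemany("INSERT INTO valid_tickers VALUES (?, ?)", [("AAA", 1), ("BBB", 1)])

# AAA returned no quarterly balance sheet data, so its flag is cleared.
update_validity_status(conn, "balance_sheet", "quarterly", ["AAA"])
flags = dict(
    conn.execute("SELECT ticker, balance_sheet_quarterly_available FROM valid_tickers")
)
```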

Module contents