Project Structure

Source Layout

src/progridpy/
|-- __init__.py
|-- py.typed
|-- common/                # Shared types, config, and registry base
|   |-- config.py          # S3BucketConfig (bucket name env vars)
|   |-- registry.py        # DataRegistry[T_DataType, T_Definition] generic base
|   +-- types.py           # ISO and FileLocation StrEnums
|-- iso/                   # ISO client implementations
|   |-- base.py            # ISOBase[RawT, ProcessedT] abstract class
|   |-- miso/
|   |   |-- client.py      # MISO client (download, process, upload)
|   |   |-- constants.py   # MISO-specific constants
|   |   |-- registry.py    # Raw and processed data registries
|   |   +-- types.py       # MISORawDataType, MISOProcessedDataType enums
|   |-- ercot/
|   |   |-- api/           # ERCOT Public API client and endpoint parser
|   |   |-- client.py      # ERCOT client
|   |   |-- registry.py    # Data registries with ProcessingBindings
|   |   +-- types.py       # ERCOTRawDataType, ERCOTProcessedDataType enums
|   +-- spp/
|       |-- client.py      # SPP client (gold-standard reference)
|       |-- helpers.py     # SPP-specific processing utilities
|       |-- registry.py    # Registry with ProcessingBindings and JoinModes
|       +-- types.py       # SPPRawDataType, SPPProcessedDataType, ReaderType, JoinMode
|-- metrics/               # Trading performance analytics
|   |-- schema.py          # ISOTradeSchema -- adapts ISO DataFrames to canonical columns
|   |-- series.py          # CanonicalSeries -- daily time series wrapper
|   |-- calculator.py      # 50+ metrics in Risk, Return, Ratio, Drawdown, Streak dataclasses
|   |-- engine.py          # MetricsEngine -- public facade (.compute(), .metrics_df, .plots)
|   |-- iso_registry.py    # ISO_REGISTRY dict and register_iso() helper
|   |-- visualization.py   # Plotly chart builders (MetricsPlots)
|   +-- dashboard.py       # StreamlitDashboard -- interactive KPI cards, heatmaps, distributions
|-- aws/
|   +-- s3.py              # S3Handler -- concurrent uploads/downloads, hive path parsing
+-- utils/
    |-- downloader.py      # FileDownloader -- parallel chunked HTTP downloads
    |-- parser.py          # Date parsing, path utilities
    |-- display.py         # Terminal display helpers
    +-- enverus_api/       # EnverusClient, registry, reshape utilities

Module Overview

common/ -- Shared Foundations

Defines the ISO and FileLocation enums used across the project. S3BucketConfig centralizes S3 bucket names (overridable via environment variables). DataRegistry is a generic base class that maps StrEnum data types to frozen DataDefinition instances, providing lookup, listing, and ordered-unique collection methods that each ISO registry inherits.
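
For orientation, a stripped-down sketch of that pattern; the DataDefinition fields and the method names here are illustrative assumptions, not the actual API:

from dataclasses import dataclass
from enum import StrEnum


@dataclass(frozen=True)
class DataDefinition:
    # Illustrative fields; the real class carries per-dataset metadata.
    dir_name: str
    file_suffix: str


class DataRegistry[T_DataType: StrEnum, T_Definition: DataDefinition]:
    # Each ISO registry subclass defines its own _definitions mapping.
    _definitions: dict[T_DataType, T_Definition] = {}

    @classmethod
    def get(cls, data_type: T_DataType) -> T_Definition:
        return cls._definitions[data_type]

    @classmethod
    def list_types(cls) -> list[T_DataType]:
        return list(cls._definitions)

    @classmethod
    def dir_names(cls) -> list[str]:
        # Ordered-unique collection over one definition field.
        return list(dict.fromkeys(d.dir_name for d in cls._definitions.values()))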

iso/ -- ISO Clients

Each ISO is a package containing a client.py, types.py, and registry.py. All clients inherit from ISOBase, a PEP 695 generic abstract class parameterized by raw and processed data type enums:

class ISOBase[RawT: StrEnum, ProcessedT: StrEnum](ABC):
    ...

ISOBase provides concrete S3 key builders, hive-partitioned output path construction, and timezone-aware helpers. Subclasses implement five abstract methods (collected in the sketch after this list):

  • download_raw_data() -- fetch from the ISO source or S3
  • upload_raw_data() -- push raw files to S3
  • process_raw_data() -- transform raw data to standardized hive-partitioned output
  • download_processed_data() -- fetch processed parquet from S3
  • upload_processed_data() -- push processed data to S3
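
A minimal sketch of that abstract surface, assuming simplified signatures (the parameter lists and return types are guesses; the src parameter mirrors the download_raw_data(src=s3) edge in the data-flow diagram below):

from abc import ABC, abstractmethod
from enum import StrEnum
from pathlib import Path


class ISOBase[RawT: StrEnum, ProcessedT: StrEnum](ABC):
    # Concrete S3 key builders, hive path construction, and timezone
    # helpers are omitted from this sketch.

    @abstractmethod
    def download_raw_data(self, data_type: RawT, src: str = "iso") -> list[Path]: ...

    @abstractmethod
    def upload_raw_data(self, data_type: RawT) -> None: ...

    @abstractmethod
    def process_raw_data(self, data_type: RawT) -> list[Path]: ...

    @abstractmethod
    def download_processed_data(self, data_type: ProcessedT) -> list[Path]: ...

    @abstractmethod
    def upload_processed_data(self, data_type: ProcessedT) -> None: ...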

Registry-driven processing (SPP pattern): each raw data type declares ProcessingBinding entries that specify the target processed dataset, join mode (NODE_LEFT, TIME_LEFT, TIME_FULL, etc.), and output columns. Processing loads the scaffold binding first, then iterates remaining bindings and joins according to the declared mode.
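
A condensed sketch of how the declared modes could drive that join loop; the binding fields, the node/timestamp key columns, and the use of pandas are illustrative assumptions:

from dataclasses import dataclass
from enum import StrEnum

import pandas as pd


class JoinMode(StrEnum):
    NODE_LEFT = "node_left"  # left join on the node column
    TIME_LEFT = "time_left"  # left join on the timestamp column
    TIME_FULL = "time_full"  # full outer join on the timestamp column


@dataclass(frozen=True)
class ProcessingBinding:
    raw_source: str                  # an SPPRawDataType member in the real registry
    join_mode: JoinMode
    output_columns: tuple[str, ...]  # assumed to include the join key


def apply_bindings(frames: dict[str, pd.DataFrame],
                   bindings: list[ProcessingBinding]) -> pd.DataFrame:
    # The first binding is the scaffold: it establishes the output frame.
    scaffold, *rest = bindings
    out = frames[scaffold.raw_source][list(scaffold.output_columns)]
    for b in rest:
        right = frames[b.raw_source][list(b.output_columns)]
        key = "node" if b.join_mode is JoinMode.NODE_LEFT else "timestamp"
        how = "outer" if b.join_mode is JoinMode.TIME_FULL else "left"
        out = out.merge(right, on=key, how=how)
    return out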

metrics/ -- Trading Analytics Pipeline

The metrics module follows a layered pipeline:

Schema --> Series --> Calculator --> Engine --> Visualization / Dashboard

  • Schema (schema.py) -- ISOTradeSchema adapts ISO-specific DataFrames to canonical column names.
  • Series (series.py) -- CanonicalSeries wraps daily gain, cumulative gain, and drawdown arrays.
  • Calculator (calculator.py) -- MetricsCalculator computes 50+ metrics across five slots=True dataclasses: RiskMetrics, ReturnMetrics, RatioMetrics, DrawdownMetrics, StreakMetrics.
  • Engine (engine.py) -- MetricsEngine is the public facade exposing .compute(), .metrics_df, and .plots.
  • ISO Registry (iso_registry.py) -- ISO_REGISTRY maps ISO names to schemas; register_iso() adds new ISOs.
  • Visualization (visualization.py) -- MetricsPlots builds Plotly charts (cumulative PnL, drawdowns, distributions).
  • Dashboard (dashboard.py) -- StreamlitDashboard renders KPI cards, heatmaps, box plots, and time-frame filters.
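
A usage sketch of the facade: only compute(), metrics_df, and plots are documented above; the constructor arguments and the plot-method name are assumptions, and the import path follows the source layout:

import pandas as pd

from progridpy.metrics.engine import MetricsEngine

trades = pd.read_parquet("spp_trades.parquet")  # hypothetical input file

engine = MetricsEngine(trades, iso="spp")  # constructor arguments assumed
engine.compute()                           # runs Schema -> Series -> Calculator
print(engine.metrics_df)                   # flat DataFrame of metric values
fig = engine.plots.cumulative_pnl()        # Plotly figure; method name assumed
fig.show()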

aws/ -- S3 Integration

S3Handler is a context manager wrapping a boto3.Session, with concurrent upload/download support, configurable multipart thresholds, and adaptive retries. extract_date_from_hive_path() parses year=YYYY/month=MM/day=DD segments from S3 keys into datetime objects.
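
The parsing half is small enough to sketch; the exact regex and the None fallback are assumptions:

import re
from datetime import datetime

_HIVE_DATE = re.compile(r"year=(\d{4})/month=(\d{2})/day=(\d{2})")


def extract_date_from_hive_path(key: str) -> datetime | None:
    # Pull the partition date out of a hive-style S3 key.
    m = _HIVE_DATE.search(key)
    if m is None:
        return None
    return datetime(*map(int, m.groups()))


extract_date_from_hive_path("iso=spp/dataset=nodal/year=2024/month=01/day=15/data.parquet")
# datetime.datetime(2024, 1, 15, 0, 0)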

utils/ -- Utilities

  • FileDownloader -- parallel, chunked HTTP downloader with retry logic and Rich progress bars (technique sketched after this list).
  • parser.py -- parse_datetime() for flexible date string parsing, path resolution helpers.
  • display.py -- terminal formatting utilities.
  • enverus_api/ -- EnverusClient for Enverus Power data, with a registry of available datasets and reshape utilities for wide/long transformations.
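
FileDownloader's exact API is not documented here; as a rough illustration of the technique, a parallel chunked HTTP download loop can be as small as the following (retry logic and Rich progress bars omitted):

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import requests


def _download_one(url: str, dest: Path, chunk_size: int = 1 << 20) -> Path:
    # Stream one file to disk in fixed-size chunks.
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with dest.open("wb") as fh:
            for chunk in resp.iter_content(chunk_size):
                fh.write(chunk)
    return dest


def download_all(urls: list[str], out_dir: Path, workers: int = 8) -> list[Path]:
    # One thread per in-flight file; each file streams in 1 MiB chunks.
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(_download_one, url, out_dir / url.rsplit("/", 1)[-1])
                   for url in urls]
        return [f.result() for f in futures]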

Data Flow

The following diagram illustrates the end-to-end data pipeline from ISO source to analytics dashboard.

flowchart LR
    subgraph Sources
        ISO_SRC["ISO Source<br/>(MISO, ERCOT, SPP)"]
        S3_RAW["S3 Raw Bucket"]
    end

    subgraph Processing
        RAW["Raw Data<br/>(CSV, Excel)"]
        PROC["Processed Data<br/>(Hive Parquet)"]
    end

    subgraph Analytics
        SCHEMA["ISOTradeSchema"]
        SERIES["CanonicalSeries"]
        CALC["MetricsCalculator"]
        ENGINE["MetricsEngine"]
    end

    subgraph Output
        CSV_OUT["CSV / HTML Export"]
        DASH["Streamlit Dashboard"]
    end

    ISO_SRC -- "download_raw_data()" --> RAW
    S3_RAW -- "download_raw_data(src=s3)" --> RAW
    RAW -- "upload_raw_data()" --> S3_RAW
    RAW -- "process_raw_data()" --> PROC
    PROC -- "upload_processed_data()" --> S3_PROC["S3 Processed Bucket"]
    PROC --> SCHEMA
    SCHEMA --> SERIES
    SERIES --> CALC
    CALC --> ENGINE
    ENGINE --> CSV_OUT
    ENGINE --> DASH

Hive Storage Layout

Processed data is stored in hive-partitioned Parquet:

{dir_name}/year=YYYY/month=MM/day=DD/data.parquet

For example:

iso=spp/dataset=nodal/year=2024/month=01/day=15/data.parquet

Raw data follows a flat naming convention:

{iso}/{dir_name}/{YYYYMMDD}_{suffix}.{ext}

For example:

ercot/da_spp/20240115_da_spp.csv
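
Both conventions are straightforward to generate; the two helpers below are hypothetical names that simply render the templates above:

from datetime import date


def hive_output_path(dir_name: str, d: date) -> str:
    # {dir_name}/year=YYYY/month=MM/day=DD/data.parquet
    return f"{dir_name}/year={d.year}/month={d.month:02d}/day={d.day:02d}/data.parquet"


def raw_file_name(iso: str, dir_name: str, d: date, suffix: str, ext: str) -> str:
    # {iso}/{dir_name}/{YYYYMMDD}_{suffix}.{ext}
    return f"{iso}/{dir_name}/{d:%Y%m%d}_{suffix}.{ext}"


hive_output_path("iso=spp/dataset=nodal", date(2024, 1, 15))
# 'iso=spp/dataset=nodal/year=2024/month=01/day=15/data.parquet'
raw_file_name("ercot", "da_spp", date(2024, 1, 15), "da_spp", "csv")
# 'ercot/da_spp/20240115_da_spp.csv'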