## Project Structure

### Source Layout
```
src/progridpy/
|-- __init__.py
|-- py.typed
|-- common/              # Shared types, config, and registry base
|   |-- config.py        # S3BucketConfig (bucket name env vars)
|   |-- registry.py      # DataRegistry[T_DataType, T_Definition] generic base
|   +-- types.py         # ISO and FileLocation StrEnums
|-- iso/                 # ISO client implementations
|   |-- base.py          # ISOBase[RawT, ProcessedT] abstract class
|   |-- miso/
|   |   |-- client.py    # MISO client (download, process, upload)
|   |   |-- constants.py # MISO-specific constants
|   |   |-- registry.py  # Raw and processed data registries
|   |   +-- types.py     # MISORawDataType, MISOProcessedDataType enums
|   |-- ercot/
|   |   |-- api/         # ERCOT Public API client and endpoint parser
|   |   |-- client.py    # ERCOT client
|   |   |-- registry.py  # Data registries with ProcessingBindings
|   |   +-- types.py     # ERCOTRawDataType, ERCOTProcessedDataType enums
|   +-- spp/
|       |-- client.py    # SPP client (gold-standard reference)
|       |-- helpers.py   # SPP-specific processing utilities
|       |-- registry.py  # Registry with ProcessingBindings and JoinModes
|       +-- types.py     # SPPRawDataType, SPPProcessedDataType, ReaderType, JoinMode
|-- metrics/             # Trading performance analytics
|   |-- schema.py        # ISOTradeSchema -- adapts ISO DataFrames to canonical columns
|   |-- series.py        # CanonicalSeries -- daily time series wrapper
|   |-- calculator.py    # 50+ metrics in Risk, Return, Ratio, Drawdown, Streak dataclasses
|   |-- engine.py        # MetricsEngine -- public facade (.compute(), .metrics_df, .plots)
|   |-- iso_registry.py  # ISO_REGISTRY dict and register_iso() helper
|   |-- visualization.py # Plotly chart builders (MetricsPlots)
|   +-- dashboard.py     # StreamlitDashboard -- interactive KPI cards, heatmaps, distributions
|-- aws/
|   +-- s3.py            # S3Handler -- concurrent uploads/downloads, hive path parsing
+-- utils/
    |-- downloader.py    # FileDownloader -- parallel chunked HTTP downloads
    |-- parser.py        # Date parsing, path utilities
    |-- display.py       # Terminal display helpers
    +-- enverus_api/     # EnverusClient, registry, reshape utilities
```
## Module Overview

### common/ -- Shared Foundations
Defines the ISO and FileLocation enums used across the project. S3BucketConfig
centralizes S3 bucket names (overridable via environment variables). DataRegistry is a
generic base class that maps StrEnum data types to frozen DataDefinition instances,
providing lookup, listing, and ordered-unique collection methods that each ISO registry
inherits.
### iso/ -- ISO Clients
Each ISO is a package containing a client.py, types.py, and registry.py.
All clients inherit from ISOBase, a PEP 695 generic abstract class parameterized by
raw and processed data type enums:
ISOBase provides concrete S3 key builders, hive-partitioned output path construction,
and timezone-aware helpers. Subclasses implement five abstract methods:
- `download_raw_data()` -- fetch from the ISO source or S3
- `upload_raw_data()` -- push raw files to S3
- `process_raw_data()` -- transform raw data to standardized hive-partitioned output
- `download_processed_data()` -- fetch processed parquet from S3
- `upload_processed_data()` -- push processed data to S3
Registry-driven processing (SPP pattern): each raw data type declares
ProcessingBinding entries that specify the target processed dataset, join mode
(NODE_LEFT, TIME_LEFT, TIME_FULL, etc.), and output columns. Processing loads the
scaffold binding first, then iterates remaining bindings and joins according to the
declared mode.
### metrics/ -- Trading Analytics Pipeline
The metrics module follows a layered pipeline:
| Layer | Responsibility |
|---|---|
| Schema (`schema.py`) | `ISOTradeSchema` adapts ISO-specific DataFrames to canonical column names |
| Series (`series.py`) | `CanonicalSeries` wraps daily gain, cumulative gain, and drawdown arrays |
| Calculator (`calculator.py`) | `MetricsCalculator` computes 50+ metrics across five `slots=True` dataclasses: `RiskMetrics`, `ReturnMetrics`, `RatioMetrics`, `DrawdownMetrics`, `StreakMetrics` |
| Engine (`engine.py`) | `MetricsEngine` is the public facade exposing `.compute()`, `.metrics_df`, and `.plots` |
| ISO Registry (`iso_registry.py`) | `ISO_REGISTRY` maps ISO names to schemas; `register_iso()` adds new ISOs |
| Visualization (`visualization.py`) | `MetricsPlots` builds Plotly charts (cumulative PnL, drawdowns, distributions) |
| Dashboard (`dashboard.py`) | `StreamlitDashboard` renders KPI cards, heatmaps, box plots, and time-frame filters |
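The registry layer's extension hook can be illustrated with a self-contained sketch (the schema fields and function signature are assumptions; only the `ISO_REGISTRY` and `register_iso()` names come from this document):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ISOTradeSchema:
    """Hypothetical canonical-column mapping for one ISO."""
    date_col: str
    gain_col: str


# Registry mapping ISO names to their schemas
ISO_REGISTRY: dict[str, ISOTradeSchema] = {}


def register_iso(name: str, schema: ISOTradeSchema) -> None:
    """Add a new ISO's schema so engine-level code can look it up by name."""
    ISO_REGISTRY[name.lower()] = schema


# Register a demo ISO with invented column names
register_iso("SPP", ISOTradeSchema(date_col="trade_date", gain_col="daily_gain"))
```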
### aws/ -- S3 Integration
S3Handler is a context manager wrapping a boto3.Session, with concurrent
upload/download support, configurable multipart thresholds, and adaptive retries.
extract_date_from_hive_path() parses year=YYYY/month=MM/day=DD segments from S3 keys
into datetime objects.
### utils/ -- Utilities
- `FileDownloader` -- parallel, chunked HTTP downloader with retry logic and Rich progress bars.
- `parser.py` -- `parse_datetime()` for flexible date string parsing, plus path resolution helpers.
- `display.py` -- terminal formatting utilities.
- `enverus_api/` -- `EnverusClient` for Enverus Power data, with a registry of available datasets and reshape utilities for wide/long transformations.
## Data Flow
The following diagram illustrates the end-to-end data pipeline from ISO source to analytics dashboard.
```mermaid
flowchart LR
    subgraph Sources
        ISO_SRC["ISO Source<br/>(MISO, ERCOT, SPP)"]
        S3_RAW["S3 Raw Bucket"]
    end
    subgraph Processing
        RAW["Raw Data<br/>(CSV, Excel)"]
        PROC["Processed Data<br/>(Hive Parquet)"]
    end
    subgraph Analytics
        SCHEMA["ISOTradeSchema"]
        SERIES["CanonicalSeries"]
        CALC["MetricsCalculator"]
        ENGINE["MetricsEngine"]
    end
    subgraph Output
        CSV_OUT["CSV / HTML Export"]
        DASH["Streamlit Dashboard"]
    end
    ISO_SRC -- "download_raw_data()" --> RAW
    S3_RAW -- "download_raw_data(src=s3)" --> RAW
    RAW -- "upload_raw_data()" --> S3_RAW
    RAW -- "process_raw_data()" --> PROC
    PROC -- "upload_processed_data()" --> S3_PROC["S3 Processed Bucket"]
    PROC --> SCHEMA
    SCHEMA --> SERIES
    SERIES --> CALC
    CALC --> ENGINE
    ENGINE --> CSV_OUT
    ENGINE --> DASH
```
## Hive Storage Layout
Processed data is stored as hive-partitioned Parquet, with keys segmented by
`year=YYYY/month=MM/day=DD`. Raw data, by contrast, follows a flat file-naming
convention without partition directories.
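As a hypothetical illustration (the dataset and file names below are invented; only the `year=/month=/day=` partition scheme is documented here):

```
# processed: hive-partitioned Parquet
lmp/year=2024/month=03/day=07/data.parquet

# raw: flat naming, no partition directories
lmp_2024-03-07.csv
```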