Development Roadmap¶
This page summarizes the design decisions that shaped hydro-param during its initial development sprint (February--March 2026). Each theme groups related design documents by area of concern. The full design documents are in docs/plans/, and each entry in the tables below links to the corresponding document on GitHub.
Core Pipeline¶
The core pipeline is the model-agnostic engine at the center of hydro-param. It resolves target fabrics, fetches source datasets, computes intersection weights, runs zonal statistics, and writes a Standardized Internal Representation (SIR) to disk. Work in this theme established the SIR contract---normalized variable names, units, and file naming conventions---so that downstream model plugins can consume pipeline output without knowledge of how it was produced. Later additions introduced memory optimization, manifest-based resume, and specialized processing pathways for derived categorical variables that require pixel-level classification before zonal aggregation.
| Date | Design | Summary |
|---|---|---|
| 2026-02-23 | SIR Normalization | Standardized variable names and units at pipeline output boundary |
| 2026-02-23 | Pipeline Memory Optimization | STAC query reuse and memory-efficient batch processing |
| 2026-02-24 | SIR Temporal Normalization | Extended SIR normalization to temporal (multi-year) datasets |
| 2026-02-24 | Pipeline Resilience | Manifest-based resume, pre-fetch, network timeout handling |
| 2026-02-28 | SIR Variable Naming Fix | Year-suffixed SIR variable name resolution |
| 2026-02-28 | SIR Dataset Prefix | Dataset name prefix in SIR filenames for disambiguation |
| 2026-03-02 | Shared Classification Module | USDA texture triangle as shared classification module |
| 2026-03-02 | Derived Categorical Pipeline | Pixel-level multi-source classification before zonal stats |
pywatershed Plugin¶
The pywatershed plugin is hydro-param's primary consumer of pipeline output. It implements all 14 derivation steps required to produce the ~100 static parameters and 3 forcing time series that pywatershed (USGS NHM-PRMS in Python) needs to run. This theme covers the plugin architecture itself---the DerivationContext protocol, formatter separation, and standalone Phase 2 execution---as well as individual derivation step designs for soils, soltab, routing, waterbody overlay, PET, transpiration, forcing generation, and soil texture classification. The config schema evolved through four major versions to reach the current consumer-oriented layout with explicit static_datasets, forcing, and climate_normals sections.
| Date | Design | Summary |
|---|---|---|
| 2026-02-25 | Plugin Architecture | Plugin protocol, DerivationContext, formatter separation |
| 2026-02-25 | Steps 5, 9, 14 | Soils (step 5), soltab (step 9), calibration seeds (step 14) |
| 2026-02-25 | Forcing Generation | Step 7 temporal forcing: per-variable SIR, unit conversion, CBH format |
| 2026-02-25 | PET and Transpiration | Steps 10--11: Jensen-Haise PET and transpiration timing from climate normals |
| 2026-02-26 | Waterbody Overlay | Step 6: NHDPlus waterbody spatial overlay for hru_type and dprst_frac |
| 2026-02-26 | Routing Parameters | Step 12: Muskingum routing coefficients from segment geometry |
| 2026-02-28 | Decouple pywatershed Run | Decouple pywatershed run from Phase 1 pipeline (Phase 2 standalone) |
| 2026-02-28 | Temporal DerivationContext | Wire temporal SIR data into DerivationContext |
| 2026-02-28 | Forcing Regrouping | Per-variable forcing detection with reverse SIR lookup |
| 2026-03-01 | Config Redesign v4.0 | Consumer-oriented config with static_datasets, forcing, climate_normals |
| 2026-03-02 | Soil Texture Triangle | USDA soil texture triangle classifier for soil_type derivation |
| 2026-03-02 | pywatershed Compatibility | pywatershed v2.0 runtime compatibility layer |
| 2026-03-04 | soil_rechr_max_frac | soil_rechr_max_frac from gNATSGO AWC ratio (aws0_30 / aws0_100) |
Data Access¶
hydro-param supports five data access strategies spanning STAC catalogs, local GeoTIFFs, and OPeNDAP endpoints. This theme covers the integration of the Geospatial Fabric v1.1 (GFv1.1) dataset, which required a dedicated download CLI for its ~15 GB of ScienceBase-hosted rasters, a local_tiff processing pathway, and a user-local dataset registry overlay so that site-specific data paths do not pollute the bundled registry.
| Date | Design | Summary |
|---|---|---|
| 2026-03-06 | GFv1.1 Download CLI | GFv1.1 ScienceBase download CLI (~15 GB, fault-tolerant) |
| 2026-03-06 | GFv1.1 Raster Integration | GFv1.1 raster integration via local_tiff strategy |
| 2026-03-09 | GFv1.1 Registry Overlay | User-local dataset registry overlay for GFv1.1 |
Validation and QA¶
Validation work ensures that hydro-param's derived parameters match authoritative reference values. The soltab valid-range fix corrected a bounds error that clipped solar radiation tables. The parameter audit cross-referenced all ~100 PRMS parameters against source code to verify derivation categories. The NHM cross-check compared Delaware River Basin output against the National Hydrologic Model reference parameterization, leading to fixes for elevation statistics, centroid computation, canopy density units, and snow depletion curve assignment.
| Date | Design | Summary |
|---|---|---|
| 2026-03-02 | Soltab Valid Range | Fix soltab valid_range from [0, 1000] to [0, 2000] Langleys |
| 2026-03-05 | Parameter Audit Design | Source-code cross-reference of all ~100 parameters |
| 2026-03-05 | Parameter Audit Findings | Audit findings: parameter inventory with derivation categories |
| 2026-03-10 | GFv1.1 Validation Plan | GFv1.1 static parameter validation against NHM reference |
| 2026-03-10 | NHM Cross-Check | NHM reference cross-check: elevation median, representative_point, CV_INT fix |
Infrastructure¶
Infrastructure work improved the developer and user experience without changing parameterization logic. The UX audit addressed 15 gaps in CLI messages, error handling, and input validation. Registry YAMLs were bundled into the Python package so that installations are self-contained. A config schema audit relocated lookup tables and added waterbody_path support. Stale backward-compatibility code was removed, saving 167 lines. The themed datasets design introduced category-grouped pipeline configs for clearer organization.
| Date | Design | Summary |
|---|---|---|
| 2026-02-27 | Pre-Release UX Audit | 15-gap UX audit: CLI messages, error handling, validation |
| 2026-02-27 | Pipeline Template | Comprehensive pipeline template with all dataset categories |
| 2026-02-27 | Bundle Registry in Package | Bundle dataset registry YAMLs in package via importlib.resources |
| 2026-02-28 | Config Schema Audit | Config schema audit: waterbody_path, lookup table relocation |
| 2026-03-01 | Stale Code Cleanup | Remove dead backward compatibility code (-167 lines) |
| 2026-03-10 | Themed Datasets Config | Themed pipeline config: datasets grouped by category dict |
Open and Planned Work¶
The following items are designed but not yet implemented, or are tracked as open issues:
- Grid processing pathway --- polygon targets use gdptools; grid targets will use xesmf/rioxarray for raster-on-raster operations
- Transparent data caching --- pooch-style, library-managed transparent cache to replace manual download management
- Derived-raster pathway (#200) --- pixel-level raster math before zonal stats (DerivedContinuousSpec)
- PRMS legacy formatter (#92) --- pyPRMS-based text output format for classic PRMS input files
- NextGen hydrofabric slopes (#100) --- flowpath slopes from NextGen fabric for routing parameters
- Subsurface flux rescaling (#154) --- needs GLHYMPS data source for bedrock permeability
- Nearest-neighbor gap-fill (#73) --- temporal SIR features missing grid coverage
Last updated: 2026-03-11