
Development Roadmap

This page summarizes the design decisions that shaped hydro-param during its initial development sprint (February to March 2026). Each theme groups related design documents by area of concern. The full design documents are in docs/plans/, and each entry in the tables below links to the corresponding document on GitHub.


Core Pipeline

The core pipeline is the model-agnostic engine at the center of hydro-param. It resolves target fabrics, fetches source datasets, computes intersection weights, runs zonal statistics, and writes a Standardized Internal Representation (SIR) to disk. Work in this theme established the SIR contract (normalized variable names, units, and file naming conventions) so that downstream model plugins can consume pipeline output without knowing how it was produced. Later additions introduced memory optimization, manifest-based resume, and a specialized pathway for derived categorical variables, which require pixel-level classification before zonal aggregation.
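To make the SIR contract concrete, here is a minimal sketch of normalization at the pipeline output boundary, assuming a per-dataset rule table that maps each source variable to a canonical SIR name, canonical units, and a conversion function. The `SirRule`, `normalize_variable`, and `sir_filename` names, the example datasets, and the `<dataset>_<variable>[_<year>].nc` filename pattern are hypothetical illustrations, not hydro-param's actual identifiers.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class SirRule:
    """How one source variable maps onto the SIR (hypothetical schema)."""
    sir_name: str                      # canonical SIR variable name
    sir_units: str                     # canonical SIR units
    convert: Callable[[float], float]  # source units -> SIR units


# Hypothetical rules keyed by (dataset, source variable name).
SIR_RULES = {
    ("example_dem", "elev_ft"): SirRule("elevation", "m", lambda ft: ft * 0.3048),
    ("example_climate", "tmax_f"): SirRule(
        "tmax", "degC", lambda f: (f - 32.0) * 5.0 / 9.0
    ),
}


def normalize_variable(dataset: str, source_name: str,
                       values: Sequence[float]) -> tuple[str, str, list[float]]:
    """Rename a source variable and convert its values to SIR units."""
    rule = SIR_RULES.get((dataset, source_name))
    if rule is None:
        raise KeyError(f"no SIR rule for {dataset}:{source_name}")
    return rule.sir_name, rule.sir_units, [rule.convert(v) for v in values]


def sir_filename(dataset: str, sir_name: str, year: Optional[int] = None) -> str:
    """Dataset prefix disambiguates sources; a year suffix marks temporal output."""
    stem = f"{dataset}_{sir_name}"
    if year is not None:
        stem += f"_{year}"
    return f"{stem}.nc"  # NetCDF extension assumed for illustration
```

For example, `normalize_variable("example_climate", "tmax_f", [32.0, 212.0])` returns `("tmax", "degC", [0.0, 100.0])`, so a downstream plugin only ever sees `tmax` in degrees Celsius, whatever the source reported.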

| Date | Design | Summary |
|---|---|---|
| 2026-02-23 | SIR Normalization | Standardized variable names and units at pipeline output boundary |
| 2026-02-23 | Pipeline Memory Optimization | STAC query reuse and memory-efficient batch processing |
| 2026-02-24 | SIR Temporal Normalization | Extended SIR normalization to temporal (multi-year) datasets |
| 2026-02-24 | Pipeline Resilience | Manifest-based resume, pre-fetch, network timeout handling |
| 2026-02-28 | SIR Variable Naming Fix | Year-suffixed SIR variable name resolution |
| 2026-02-28 | SIR Dataset Prefix | Dataset name prefix in SIR filenames for disambiguation |
| 2026-03-02 | Shared Classification Module | USDA texture triangle as shared classification module |
| 2026-03-02 | Derived Categorical Pipeline | Pixel-level multi-source classification before zonal stats |
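Classifying before aggregating matters because classifying a zone's mean inputs can disagree with the class that actually covers most of the zone. The sketch below illustrates this under stated assumptions: each zone has already been reduced to a list of source pixels with their intersection weights, and `toy_classify` is a deliberately simple two-class threshold standing in for a real classifier such as the USDA texture triangle. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence

Pixel = tuple[float, float]  # (sand %, clay %) for one source pixel


def class_fractions(pixels: Sequence[Pixel], weights: Sequence[float],
                    classify: Callable[[Pixel], Hashable]) -> dict[Hashable, float]:
    """Classify each pixel first, then sum intersection weights per class."""
    totals: dict[Hashable, float] = defaultdict(float)
    for px, w in zip(pixels, weights):
        totals[classify(px)] += w
    wsum = sum(weights)
    return {cls: w / wsum for cls, w in totals.items()}


def majority_class(fractions: dict[Hashable, float]) -> Hashable:
    """Class covering the largest weighted share of the zone."""
    return max(fractions, key=fractions.get)


def toy_classify(px: Pixel) -> str:
    """Toy rule for illustration only: 'sandy' if sand >= 50 %, else 'clayey'."""
    sand, _clay = px
    return "sandy" if sand >= 50.0 else "clayey"
```

With pixels `[(90, 5), (55, 20), (10, 60)]` and weights `[0.3, 0.3, 0.4]`, the pixel-level result is 60 % "sandy", so the zone's majority class is "sandy". Averaging first gives a weighted mean sand content of 47.5 %, which `toy_classify` labels "clayey".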

pywatershed Plugin

The pywatershed plugin is hydro-param's primary consumer of pipeline output. It implements all 14 derivation steps required to produce the ~100 static parameters and 3 forcing time series that pywatershed (USGS NHM-PRMS in Python) needs to run. This theme covers the plugin architecture itself (the DerivationContext protocol, formatter separation, and standalone Phase 2 execution) as well as the designs of individual derivation steps for soils, soltab, routing, waterbody overlay, PET, transpiration, forcing generation, and soil texture classification. The config schema went through four major versions before reaching the current consumer-oriented layout, with explicit static_datasets, forcing, and climate_normals sections.
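The following is a minimal sketch of how a derivation step might consume pipeline output through a context object, assuming a `DerivationContext` that exposes SIR variables by name and collects derived parameters. The context fields, the `DerivationStep` protocol, `run_steps`, and the `SoilRechrMaxFrac` class are hypothetical. Only the aws0_30 / aws0_100 ratio comes from the soil_rechr_max_frac entry in the table; the clipping to [0, 1] and the fallback of 1.0 when aws0_100 is zero are assumptions.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class DerivationContext:
    """Hypothetical container for SIR output plus parameters derived so far."""
    sir: dict[str, list[float]]  # SIR variable -> per-HRU values
    params: dict[str, list[float]] = field(default_factory=dict)


class DerivationStep(Protocol):
    name: str

    def run(self, ctx: DerivationContext) -> dict[str, list[float]]:
        """Return newly derived parameters keyed by parameter name."""
        ...


class SoilRechrMaxFrac:
    """soil_rechr_max_frac as aws0_30 / aws0_100, clipped to [0, 1] (assumed)."""
    name = "soil_rechr_max_frac"

    def run(self, ctx: DerivationContext) -> dict[str, list[float]]:
        out = []
        for a30, a100 in zip(ctx.sir["aws0_30"], ctx.sir["aws0_100"]):
            # Assumed fallback when the 0-100 cm profile stores no water.
            ratio = a30 / a100 if a100 > 0 else 1.0
            out.append(min(max(ratio, 0.0), 1.0))
        return {self.name: out}


def run_steps(ctx: DerivationContext, steps: list[DerivationStep]) -> DerivationContext:
    """Run steps in order, merging each step's output into the context."""
    for step in steps:
        ctx.params.update(step.run(ctx))
    return ctx
```

Because each step only reads from and writes to the context, steps stay independent of how the SIR was produced. This is the separation the plugin architecture aims for.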

| Date | Design | Summary |
|---|---|---|
| 2026-02-25 | Plugin Architecture | Plugin protocol, DerivationContext, formatter separation |
| 2026-02-25 | Steps 5, 9, 14 | Soils (step 5), soltab (step 9), calibration seeds (step 14) |
| 2026-02-25 | Forcing Generation | Step 7 temporal forcing: per-variable SIR, unit conversion, CBH format |
| 2026-02-25 | PET and Transpiration | Steps 10 and 11: Jensen-Haise PET and transpiration timing from climate normals |
| 2026-02-26 | Waterbody Overlay | Step 6: NHDPlus waterbody spatial overlay for hru_type and dprst_frac |
| 2026-02-26 | Routing Parameters | Step 12: Muskingum routing coefficients from segment geometry |
| 2026-02-28 | Decouple pywatershed Run | Decouple pywatershed run from Phase 1 pipeline (Phase 2 standalone) |
| 2026-02-28 | Temporal DerivationContext | Wire temporal SIR data into DerivationContext |
| 2026-02-28 | Forcing Regrouping | Per-variable forcing detection with reverse SIR lookup |
| 2026-03-01 | Config Redesign v4.0 | Consumer-oriented config with static_datasets, forcing, climate_normals |
| 2026-03-02 | Soil Texture Triangle | USDA soil texture triangle classifier for soil_type derivation |
| 2026-03-02 | pywatershed Compatibility | pywatershed v2.0 runtime compatibility layer |
| 2026-03-04 | soil_rechr_max_frac | soil_rechr_max_frac from gNATSGO AWC ratio (aws0_30 / aws0_100) |
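The soil_type derivation depends on a texture classifier. Below is a sketch that uses the widely published NRCS boundary rules for the 12 USDA texture classes, followed by a collapse to the three PRMS soil_type codes (1 = sand, 2 = loam, 3 = clay). The function name is hypothetical, and the class-to-code grouping in `PRMS_SOIL_TYPE` is an assumption for illustration; hydro-param's actual grouping may differ.

```python
def usda_texture_class(sand: float, silt: float, clay: float) -> str:
    """Classify sand/silt/clay percentages using NRCS texture-triangle rules."""
    total = sand + silt + clay
    if total <= 0:
        raise ValueError("sand + silt + clay must be positive")
    # Rescale so the three fractions sum to exactly 100.
    sand, silt, clay = (100.0 * v / total for v in (sand, silt, clay))

    # Rules are checked in order; each relies on the earlier ones having failed.
    if silt + 1.5 * clay < 15:
        return "sand"
    if silt + 2 * clay < 30:
        return "loamy sand"
    if (7 <= clay < 20 and sand > 52) or (clay < 7 and silt < 50):
        return "sandy loam"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "loam"
    if (silt >= 50 and 12 <= clay < 27) or (50 <= silt < 80 and clay < 12):
        return "silt loam"
    if silt >= 80 and clay < 12:
        return "silt"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "sandy clay loam"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "clay loam"
    if 27 <= clay < 40 and sand <= 20:
        return "silty clay loam"
    if clay >= 35 and sand > 45:
        return "sandy clay"
    if clay >= 40 and silt >= 40:
        return "silty clay"
    return "clay"  # remaining region: clay >= 40, sand <= 45, silt < 40


# Assumed grouping of USDA classes into PRMS soil_type (1=sand, 2=loam, 3=clay).
PRMS_SOIL_TYPE = {
    "sand": 1, "loamy sand": 1, "sandy loam": 1,
    "loam": 2, "silt loam": 2, "silt": 2, "sandy clay loam": 2,
    "clay loam": 3, "silty clay loam": 3, "sandy clay": 3,
    "silty clay": 3, "clay": 3,
}
```

Keeping the 12-class result separate from the 3-code mapping lets the same classifier serve other consumers, which fits the "shared classification module" entry in the Core Pipeline table.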

Data Access

hydro-param supports five data access strategies spanning STAC catalogs, local GeoTIFFs, and OPeNDAP endpoints. This theme covers the integration of the Geospatial Fabric v1.1 (GFv1.1) dataset, which required a dedicated download CLI for its ~15 GB of ScienceBase-hosted rasters, a local_tiff processing pathway, and a user-local dataset registry overlay so that site-specific data paths do not pollute the bundled registry.
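Fault tolerance for a multi-gigabyte download largely comes down to two habits: skip files that are already complete, and retry transient failures with backoff. The sketch below assumes an injected `fetch(url, dest)` callable rather than any real ScienceBase client. The `download_all` helper, the size-based completeness check, the `.part` temporary suffix, and the retry parameters are all hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Mapping


def download_all(files: Mapping[str, tuple[str, int]], out_dir: Path,
                 fetch: Callable[[str, Path], None],
                 retries: int = 3, base_delay: float = 2.0) -> list[str]:
    """Download {filename: (url, expected_bytes)} into out_dir.

    Returns the filenames that still failed after all retries. Files already
    present with the expected size are skipped, so a rerun resumes the batch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed: list[str] = []
    for name, (url, expected) in files.items():
        dest = out_dir / name
        if dest.exists() and dest.stat().st_size == expected:
            continue  # completed by an earlier run
        tmp = dest.with_name(dest.name + ".part")
        for attempt in range(retries):
            try:
                fetch(url, tmp)  # assumed to (over)write the whole file at tmp
                if tmp.stat().st_size != expected:
                    raise OSError(f"size mismatch for {name}")
                tmp.replace(dest)  # only verified files get their final name
                break
            except OSError:  # assumes fetch signals transient failures as OSError
                if attempt < retries - 1:
                    time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        else:
            failed.append(name)  # give up on this file but continue the batch
    return failed
```

Writing to a temporary name and renaming only after verification means an interrupted run never leaves a truncated file that a later run would mistake for a finished one.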

| Date | Design | Summary |
|---|---|---|
| 2026-03-06 | GFv1.1 Download CLI | GFv1.1 ScienceBase download CLI (~15 GB, fault-tolerant) |
| 2026-03-06 | GFv1.1 Raster Integration | GFv1.1 raster integration via local_tiff strategy |
| 2026-03-09 | GFv1.1 Registry Overlay | User-local dataset registry overlay for GFv1.1 |
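The registry overlay keeps site-specific paths out of the bundled registry by merging a user-local file on top of it at load time. The following is a minimal sketch of that merge over already-parsed dictionaries, with YAML parsing omitted. The `overlay_registry` name, the recursive-override semantics, and the example keys (`gfv11_slope`, `strategy`, `source`) are assumptions.

```python
import copy
from typing import Any


def overlay_registry(bundled: dict[str, Any], user: dict[str, Any]) -> dict[str, Any]:
    """Return bundled entries with user entries merged on top.

    Nested dicts merge key by key; any other user value replaces the bundled
    value. Neither input is modified.
    """
    merged = copy.deepcopy(bundled)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay_registry(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

With this approach, a user file containing only `{"gfv11_slope": {"source": "/data/gfv11/slope.tif"}}` fills in the local path while every other field of the bundled entry is left untouched.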

Validation and QA

Validation work checks that hydro-param's derived parameters match authoritative reference values. The soltab valid-range fix raised the upper bound of the soltab valid_range from 1000 to 2000 Langleys, correcting a bounds error that clipped solar radiation table values. The parameter audit cross-referenced all ~100 PRMS parameters against source code to verify their derivation categories. The NHM cross-check compared Delaware River Basin output against the National Hydrologic Model reference parameterization, leading to fixes for elevation statistics, centroid computation, canopy density units, and snow depletion curve assignment.
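A valid_range bound is only useful if it matches the physical range of the variable; a bound that is too tight flags or clips legitimate values. The sketch below shows a range check that reports violations instead of silently clipping, using the old and new soltab bounds from this section's table. The `check_valid_range` helper and the sample values are hypothetical.

```python
from typing import Sequence


def check_valid_range(name: str, values: Sequence[float],
                      lo: float, hi: float) -> list[str]:
    """Return one message per value outside [lo, hi]; an empty list means valid.

    NaN fails every comparison, so it is reported as out of range too.
    """
    return [
        f"{name}[{i}] = {v} outside valid_range [{lo}, {hi}]"
        for i, v in enumerate(values)
        if not lo <= v <= hi
    ]
```

With hypothetical soltab values `[850.0, 1040.0]`, the old [0, 1000] bound reports one violation and the corrected [0, 2000] bound reports none.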

| Date | Design | Summary |
|---|---|---|
| 2026-03-02 | Soltab Valid Range Fix | soltab valid_range from [0, 1000] to [0, 2000] Langleys |
| 2026-03-05 | Parameter Audit Design | Source-code cross-reference of all ~100 parameters |
| 2026-03-05 | Parameter Audit Findings | Audit findings: parameter inventory with derivation categories |
| 2026-03-10 | GFv1.1 Validation Plan | GFv1.1 static parameter validation against NHM reference |
| 2026-03-10 | NHM Cross-Check | NHM reference cross-check: elevation median, representative_point, CV_INT fix |
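At its core, a reference cross-check compares each derived parameter with its reference value, HRU by HRU. The following is a minimal sketch of such a comparison that reports the median relative difference and the share of HRUs within a tolerance. The `compare_to_reference` name, the 5 % default tolerance, and the absolute-difference fallback for zero reference values are assumptions, not the thresholds used in the validation plan.

```python
import statistics
from typing import Sequence


def compare_to_reference(derived: Sequence[float], reference: Sequence[float],
                         rel_tol: float = 0.05) -> dict[str, float]:
    """Summarize per-HRU agreement between derived and reference values."""
    if len(derived) != len(reference):
        raise ValueError("derived and reference must have one value per HRU")
    rel = []
    for d, r in zip(derived, reference):
        # Relative difference; fall back to absolute difference when r == 0.
        rel.append(abs(d - r) / abs(r) if r != 0 else abs(d - r))
    return {
        "median_rel_diff": statistics.median(rel),
        "frac_within_tol": sum(x <= rel_tol for x in rel) / len(rel),
    }
```

Using the median rather than the mean keeps a handful of badly matched HRUs from dominating the summary, while the within-tolerance fraction shows how widespread any disagreement is.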

Infrastructure

Infrastructure work improved the developer and user experience without changing parameterization logic. The UX audit addressed 15 gaps in CLI messages, error handling, and input validation. Registry YAMLs were bundled into the Python package so that installations are self-contained. A config schema audit relocated lookup tables and added waterbody_path support. Dead backward-compatibility code was removed, cutting 167 lines. The themed datasets design introduced category-grouped pipeline configs for clearer organization.
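Bundling registry YAMLs as package data means they are found relative to the installed package rather than the working directory. The sketch below uses `importlib.resources.files` (Python 3.9+) to read every matching file shipped in a package. The `bundled_files` helper is hypothetical, and so is the `hydro_param.data` location in the closing comment.

```python
from importlib.resources import files


def bundled_files(package: str, suffix: str = ".yaml") -> dict[str, str]:
    """Read every file ending in `suffix` shipped inside `package`.

    `files()` resolves resources through the package's loader, so this works
    regardless of where the package is installed.
    """
    root = files(package)
    return {
        entry.name: entry.read_text(encoding="utf-8")
        for entry in root.iterdir()
        if entry.is_file() and entry.name.endswith(suffix)
    }


# e.g. bundled_files("hydro_param.data")  # hypothetical package location
```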

| Date | Design | Summary |
|---|---|---|
| 2026-02-27 | Pre-Release UX Audit | 15-gap UX audit: CLI messages, error handling, validation |
| 2026-02-27 | Pipeline Template | Comprehensive pipeline template with all dataset categories |
| 2026-02-27 | Bundle Registry in Package | Bundle dataset registry YAMLs in package via importlib.resources |
| 2026-02-28 | Config Schema Audit | Config schema audit: waterbody_path, lookup table relocation |
| 2026-03-01 | Stale Code Cleanup | Remove dead backward compatibility code (-167 lines) |
| 2026-03-10 | Themed Datasets Config | Themed pipeline config: datasets grouped by category dict |
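In a themed config, datasets are grouped under category keys instead of a flat list. The following is a minimal sketch of how a loader might flatten that grouping back into processing order while tagging each entry with its category. The `flatten_themed` name, the category and dataset names, the entry fields, and the duplicate-name check are hypothetical.

```python
from typing import Any


def flatten_themed(datasets: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Turn {category: [dataset, ...]} into a flat list tagged with its category."""
    flat: list[dict[str, Any]] = []
    seen: set[str] = set()
    for category, entries in datasets.items():
        for entry in entries:
            name = entry["name"]
            if name in seen:
                raise ValueError(f"dataset {name!r} listed under more than one category")
            seen.add(name)
            flat.append({**entry, "category": category})  # original entry untouched
    return flat


themed = {  # hypothetical themed config, already parsed from YAML
    "topography": [{"name": "example_dem", "variables": ["elevation", "slope"]}],
    "soils": [{"name": "example_soils", "variables": ["aws0_100"]}],
}
```

Grouping is purely organizational in this sketch: processing order follows the config's category order, and the category tag remains available for logging or output layout.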

Open and Planned Work

The following items are designed but not yet implemented, or are tracked as open issues:

  • Grid processing pathway: polygon targets use gdptools; grid targets will use xesmf/rioxarray for raster-on-raster operations
  • Transparent data caching: a pooch-style, library-managed cache to replace manual download management
  • Derived-raster pathway (#200): pixel-level raster math before zonal stats (DerivedContinuousSpec)
  • PRMS legacy formatter (#92): pyPRMS-based output in the classic PRMS text input-file format
  • NextGen hydrofabric slopes (#100): flowpath slopes from the NextGen fabric for routing parameters
  • Subsurface flux rescaling (#154): requires GLHYMPS as a data source for bedrock permeability
  • Nearest-neighbor gap-fill (#73): fill temporal SIR values for features that lack grid coverage
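Issue #73 concerns features that fall outside a gridded source's coverage and so receive no temporal SIR values. One common remedy, sketched here under assumptions, copies each missing feature's series from the nearest feature that has data, using centroid distance. The function name, the planar distance on projected centroids, and the use of `None` to mark missing series are hypothetical, and the eventual fix may work differently.

```python
import math
from typing import Optional, Sequence

Point = tuple[float, float]  # feature centroid (x, y) in a projected CRS


def nearest_neighbor_fill(centroids: Sequence[Point],
                          series: Sequence[Optional[list[float]]]) -> list[list[float]]:
    """Replace missing series (None) with the series of the nearest valid feature."""
    donors = [i for i, s in enumerate(series) if s is not None]
    if not donors:
        raise ValueError("no feature has data to copy from")
    filled = []
    for i, s in enumerate(series):
        if s is not None:
            filled.append(list(s))
            continue
        # Brute-force O(n * m) search; a spatial index would suit large fabrics.
        j = min(donors, key=lambda k: math.dist(centroids[i], centroids[k]))
        filled.append(list(series[j]))
    return filled
```

Copying whole series from a single donor keeps each filled feature's time series internally consistent. The trade-off is that it ignores spatial gradients that an interpolated fill would capture.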

Last updated: 2026-03-11