pywatershed Workflow
hydro-param's primary target model is pywatershed --- the USGS National Hydrologic Model (NHM-PRMS) implemented in Python. This page walks through the end-to-end workflow for generating a complete set of pywatershed model parameters from geospatial source data.
Two-Phase Architecture
Generating pywatershed parameters is a two-phase process:
```text
 Phase 1 (Generic Pipeline)           Phase 2 (pywatershed Derivation)
┌───────────────────────────┐       ┌──────────────────────────────────┐
│ hydro-param run           │       │ hydro-param pywatershed run      │
│                           │       │                                  │
│ Fetch datasets            │       │ Read SIR from disk               │
│ Compute zonal statistics  │──SIR──│ Derive ~100 PRMS parameters      │
│ Write CSV/NetCDF per var  │       │ Write parameters.nc, forcing/,   │
│                           │       │   soltab.nc, control.yml         │
│ Model-agnostic            │       │ pywatershed-specific             │
└───────────────────────────┘       └──────────────────────────────────┘
```
Phase 1 runs the generic five-stage pipeline. It fetches geospatial datasets (3DEP, NLCD, POLARIS, gridMET, etc.), computes zonal statistics against your target fabric, and writes the results as a Standardized Internal Representation (SIR) --- one file per variable, organized by dataset category. Phase 1 knows nothing about pywatershed or any other model.
Phase 2 reads the SIR output and derives all parameters that pywatershed needs. It performs unit conversions (metric to PRMS internal units: feet, inches, degrees F), variable reclassification (NLCD codes to PRMS cover types), lookup-table joins, solar geometry computations, climate normal derivations, and gap-filling. The result is a set of model-ready files that pywatershed can load directly.
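To make the reclassification idea concrete, the sketch below collapses NLCD class fractions into PRMS cover types (0 = bare soil, 1 = grasses, 2 = shrubs, 3 = trees, 4 = coniferous) and picks the dominant one. The NLCD-to-cov_type table is a hypothetical example written for this page, not hydro-param's bundled lookup. Summing fractions per cover type before taking the maximum is likewise an assumed design choice, not confirmed behavior.

```python
# Hypothetical NLCD -> PRMS cov_type lookup, for illustration only.
# PRMS codes: 0=bare soil, 1=grasses, 2=shrubs, 3=trees, 4=coniferous.
NLCD_TO_COV_TYPE = {
    11: 0, 12: 0, 21: 1, 22: 0, 23: 0, 24: 0, 31: 0,
    41: 3, 42: 4, 43: 3, 51: 2, 52: 2,
    71: 1, 72: 1, 73: 1, 74: 1, 81: 1, 82: 1, 90: 3, 95: 1,
}


def dominant_cov_type(class_fractions: dict[int, float]) -> int:
    """Return the PRMS cov_type with the largest summed NLCD fraction."""
    totals: dict[int, float] = {}
    for nlcd_code, fraction in class_fractions.items():
        cov = NLCD_TO_COV_TYPE[nlcd_code]
        totals[cov] = totals.get(cov, 0.0) + fraction
    # max() keeps the first maximal item, so ties go to the cov_type seen first.
    return max(totals.items(), key=lambda item: item[1])[0]


# Evergreen forest (42) is the largest single NLCD class here, but deciduous
# plus mixed forest together make "trees" (3) the dominant cover type.
print(dominant_cov_type({41: 0.25, 43: 0.25, 42: 0.3, 82: 0.2}))  # 3
```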
Why two phases?
Separating data access from model derivation means you can re-run Phase 2 with different pywatershed settings without re-fetching datasets. It also means the same SIR output could drive other models in the future.
Prerequisites
Before starting, you need:
Required:

- HRU fabric --- a polygon GeoPackage or GeoParquet file defining your Hydrologic Response Units (HRUs). Obtain this from the USGS Geospatial Fabric or generate it with pynhd.
- Segment fabric --- a line GeoPackage defining stream segments (needed for routing parameters in Phase 2). Typically the NHDPlus flowline geometry for your domain.

Optional:

- Local GFv1.1 rasters --- pre-downloaded Geospatial Fabric v1.1 rasters for land cover parameters. Download with `hydro-param gfv11 download --output-dir /path/to/data/gfv11`.
- Waterbody polygons --- NHDPlus waterbody geometries for depression storage and HRU type classification.
hydro-param does not fetch fabrics
hydro-param expects pre-existing geospatial files as input. Use pynhd or pygeohydro to obtain and subset fabric files for your study area before running hydro-param.
Step 1: Run the Generic Pipeline (Phase 1)
Phase 1 is driven by a pipeline config file that declares your target fabric, datasets, variables, and output settings.
Pipeline config overview
```yaml
target_fabric:
  path: data/pywatershed_gis/drb_2yr/nhru.gpkg
  id_field: nhm_id

datasets:
  topography:
    - name: dem_3dep_10m
      variables: [elevation, slope, sin_aspect, cos_aspect]
      statistics: [mean]
  soils:
    - name: gnatsgo_rasters
      variables: [aws0_100, rootznemc, rootznaws]
      statistics: [mean]
    - name: polaris_30m
      variables: [sand, silt, clay, theta_s, ksat, soil_texture]
      statistics: [mean]
  land_cover:
    - name: nlcd_osn_lndcov
      variables: [LndCov]
      statistics: [categorical]
      year: [2020, 2021]
    - name: nlcd_osn_fctimp
      variables: [FctImp]
      statistics: [mean]
      year: [2020, 2021]
  climate:
    - name: gridmet
      variables: [pr, tmmx, tmmn, srad, pet, vs]
      statistics: [mean]
      time_period: ["2020-01-01", "2021-12-31"]

output:
  path: output
  format: netcdf
  sir_name: drb_2yr_sir

processing:
  batch_size: 240
  resume: true
```
Key points:
- `target_fabric` points to your HRU polygon file and its unique identifier field.
- `datasets` lists what to fetch, organized by category. Each entry names a registered dataset, the variables to extract, and the statistics to compute.
- `statistics: [categorical]` computes class fractions (for NLCD land cover codes); `[mean]` computes continuous zonal means.
- `time_period` triggers temporal processing for climate datasets.
- `batch_size` controls how many HRUs are processed at once (spatial batching keeps memory bounded).
- `resume: true` allows restarting interrupted runs.
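The difference between `[categorical]` and `[mean]`, and what `batch_size` does to the HRU list, can be pictured with a toy example over the pixels that fall inside one HRU. This sketch assumes equal-weight pixels and a single nodata value; hydro-param's real zonal-statistics engine (which may weight partial pixels by coverage) is not shown here, and the helper names are hypothetical.

```python
from collections import Counter
from statistics import fmean


def categorical_fractions(pixels, nodata=None):
    """Fraction of valid pixels in each class (the idea behind [categorical])."""
    valid = [p for p in pixels if p != nodata]
    counts = Counter(valid)
    return {cls: n / len(valid) for cls, n in sorted(counts.items())}


def continuous_mean(pixels, nodata=None):
    """Unweighted mean of valid pixels (a simplified [mean])."""
    return fmean(p for p in pixels if p != nodata)


def hru_batches(hru_ids, batch_size):
    """Yield consecutive slices of at most batch_size HRU ids."""
    for start in range(0, len(hru_ids), batch_size):
        yield hru_ids[start:start + batch_size]


print(categorical_fractions([41, 41, 42, 82]))                   # {41: 0.5, 42: 0.25, 82: 0.25}
print(continuous_mean([120.0, 180.0, -9999.0], nodata=-9999.0))  # 150.0
print([len(b) for b in hru_batches(list(range(500)), 240)])      # [240, 240, 20]
```

Averaging NLCD codes with `[mean]` would be meaningless (the mean of 41 and 82 is not a land cover class), which is why land cover uses class fractions instead.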
See Configuration for the full config reference.
Run it
What it produces
Phase 1 writes the SIR to the output directory:
```text
output/
  .manifest.yml                                 # SIR manifest (resume + Phase 2 lookup)
  topography/
    dem_3dep_10m_elevation_mean.csv             # Mean elevation per HRU
    dem_3dep_10m_slope_mean.csv                 # Mean slope per HRU
    dem_3dep_10m_sin_aspect_mean.csv            # Sin of aspect (for circular mean)
    dem_3dep_10m_cos_aspect_mean.csv            # Cos of aspect (for circular mean)
  soils/
    gnatsgo_rasters_aws0_100_mean.csv           # Available water storage 0-100cm
    polaris_30m_sand_mean.csv                   # Sand fraction per HRU
    ...
  land_cover/
    nlcd_osn_lndcov_LndCov_categorical_2021.csv # NLCD class fractions per HRU
    nlcd_osn_fctimp_FctImp_mean_2021.csv        # Fractional impervious per HRU
  climate/
    gridmet_pr_mm_mean_2020.nc                  # Daily precipitation (temporal)
    gridmet_tmmx_C_mean_2020.nc                 # Daily max temperature (temporal)
    ...
```
Each file contains one variable with the HRU identifier as the index or coordinate dimension. Temporal datasets produce one NetCDF per variable per year. Static datasets produce CSVs.
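The separate sin and cos aspect files exist because aspect is an angle: averaging raw degrees across the 0/360 boundary points the wrong way. Below is a minimal sketch of recovering a mean aspect from the two SIR means with `atan2`, assuming aspect is measured clockwise from north. The function name and the fallback of 0 for flat terrain are hypothetical choices, not hydro-param's code.

```python
import math


def circular_mean_aspect(sin_mean: float, cos_mean: float) -> float:
    """Mean aspect in degrees clockwise from north, from zonal means of sin/cos."""
    if math.hypot(sin_mean, cos_mean) < 1e-12:
        return 0.0  # flat or perfectly opposing slopes: no meaningful aspect
    return math.degrees(math.atan2(sin_mean, cos_mean)) % 360.0


aspects = [350.0, 10.0]  # two north-facing pixels straddling 0/360
s = sum(math.sin(math.radians(a)) for a in aspects) / len(aspects)
c = sum(math.cos(math.radians(a)) for a in aspects) / len(aspects)
print(sum(aspects) / len(aspects))  # 180.0: the naive mean points south
print(circular_mean_aspect(s, c))   # ~0 (or ~360 from rounding): north
```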
Check the manifest
The .manifest.yml file in the output directory lists every SIR file with its dataset, variable, statistic, and file path. Phase 2 uses this manifest to locate SIR data on disk.
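The manifest lookup amounts to matching a (dataset, variable, statistic) triple to a relative path under the SIR root. This is a sketch only: the entry layout below is a hypothetical stand-in for parsed `.manifest.yml` content, since the real schema is not shown on this page, and per-year temporal files would also need a year key, which is left out here.

```python
from pathlib import PurePosixPath

# Hypothetical parsed .manifest.yml entries; the real schema may differ.
MANIFEST = [
    {"dataset": "dem_3dep_10m", "variable": "elevation", "statistic": "mean",
     "path": "topography/dem_3dep_10m_elevation_mean.csv"},
    {"dataset": "gnatsgo_rasters", "variable": "aws0_100", "statistic": "mean",
     "path": "soils/gnatsgo_rasters_aws0_100_mean.csv"},
]


def find_sir_file(manifest, sir_root, dataset, variable, statistic):
    """Return the single SIR file matching (dataset, variable, statistic)."""
    key = (dataset, variable, statistic)
    matches = [e for e in manifest
               if (e["dataset"], e["variable"], e["statistic"]) == key]
    if len(matches) != 1:
        raise LookupError(f"expected 1 SIR entry for {key}, found {len(matches)}")
    return PurePosixPath(sir_root) / matches[0]["path"]


# A Phase 2 request for mean 3DEP elevation resolves to:
print(find_sir_file(MANIFEST, "output", "dem_3dep_10m", "elevation", "mean"))
# output/topography/dem_3dep_10m_elevation_mean.csv
```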
Step 2: Derive pywatershed Parameters (Phase 2)
Phase 2 reads the SIR and applies pywatershed-specific derivations to produce model-ready parameter files.
pywatershed run config overview
```yaml
target_model: pywatershed
version: "4.0"
sir_path: "../../output"

domain:
  fabric_path: data/pywatershed_gis/drb_2yr/nhru.gpkg
  segment_path: data/pywatershed_gis/drb_2yr/nsegment.gpkg
  id_field: nhm_id
  segment_id_field: nhm_seg

time:
  start: "2020-01-01"
  end: "2021-12-31"
  timestep: daily

static_datasets:
  topography:
    available: [dem_3dep_10m]
    hru_elev:
      source: dem_3dep_10m
      variable: elevation
      statistic: mean
    hru_slope:
      source: dem_3dep_10m
      variable: slope
      statistic: mean
  soils:
    available: [polaris_30m, gnatsgo_rasters]
    soil_type:
      source: polaris_30m
      variables: [sand, silt, clay]
      statistic: mean
    soil_moist_max:
      source: gnatsgo_rasters
      variable: aws0_100
      statistic: mean
    # ... more soil parameters
  landcover:
    available: [nlcd_osn_lndcov, nlcd_osn_fctimp]
    cov_type:
      source: nlcd_osn_lndcov
      variable: LndCov
      statistic: categorical
      year: [2021]
    # ... more landcover parameters

forcing:
  available: [gridmet]
  prcp:
    source: gridmet
    variable: pr
    statistic: mean
  tmax:
    source: gridmet
    variable: tmmx
    statistic: mean
  tmin:
    source: gridmet
    variable: tmmn
    statistic: mean

climate_normals:
  available: [gridmet]
  jh_coef:
    source: gridmet
    variables: [tmmx, tmmn]
  transp_beg:
    source: gridmet
    variable: tmmn
  transp_end:
    source: gridmet
    variable: tmmn

calibration:
  generate_seeds: true
  seed_method: physically_based

output:
  path: models/pywatershed
  format: netcdf
  parameter_file: parameters.nc
  forcing_dir: forcing
  control_file: control.yml
  soltab_file: soltab.nc
```
Key points:
- `sir_path` points to the Phase 1 output directory (relative paths are resolved from the config file's location).
- `domain` provides both HRU and segment fabrics; segments are needed for routing parameter derivation (Step 12).
- `static_datasets` maps PRMS parameter names to SIR variables. Each entry says which dataset, variable, and statistic to read from the SIR.
- `forcing` maps PRMS forcing variables (prcp, tmax, tmin) to their SIR temporal data sources.
- `climate_normals` identifies which climate variables to use for deriving PET coefficients and transpiration timing.
- `calibration.generate_seeds: true` computes physically-based initial values for calibration parameters.
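Resolving `sir_path` relative to the config file, rather than the shell's working directory, means the run behaves the same wherever you launch it from. A minimal sketch of that rule follows; the config location `configs/drb_2yr/pywatershed_run.yml` is a hypothetical layout, since this page does not name the run config file. The sketch normalizes `..` lexically with `posixpath.normpath`, which can differ from `pathlib.Path.resolve()` when symlinks are involved.

```python
import posixpath


def resolve_sir_path(config_file: str, sir_path: str) -> str:
    """Resolve sir_path against the directory that contains the run config."""
    if posixpath.isabs(sir_path):
        return posixpath.normpath(sir_path)
    config_dir = posixpath.dirname(config_file)
    return posixpath.normpath(posixpath.join(config_dir, sir_path))


# Hypothetical layout: the run config sits two levels below the project root.
print(resolve_sir_path("configs/drb_2yr/pywatershed_run.yml", "../../output"))  # output
print(resolve_sir_path("configs/drb_2yr/pywatershed_run.yml", "/data/sir"))     # /data/sir
```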
See Configuration for the full config reference.
Run it
What it produces
```text
models/pywatershed/
  parameters.nc     # Static PRMS parameters (CF-1.8 NetCDF)
  soltab.nc         # Potential solar radiation tables (nhru x 366)
  control.yml       # Simulation time period and file paths
  forcing/
    prcp.nc         # Daily precipitation (inches/day)
    tmax.nc         # Daily maximum temperature (degrees F)
    tmin.nc         # Daily minimum temperature (degrees F)
```
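soltab.nc stores one value per HRU per day of year. The sketch below builds only a horizontal-surface table of that shape, using the FAO-56 extraterrestrial radiation (Ra) equation converted to langleys, the radiation unit PRMS uses. This is a swapped-in technique: PRMS's soltab follows Swift (1976) and also computes slope- and aspect-adjusted potential radiation (soltab_potsw), so these numbers will not match hydro-param's output. The function names are hypothetical.

```python
import math

GSC = 0.0820               # solar constant, MJ m^-2 min^-1 (FAO-56)
MJ_PER_LANGLEY = 0.041868  # 1 langley = 1 cal/cm^2 = 41868 J/m^2


def horizontal_extraterrestrial_ly(lat_deg: float, doy: int) -> float:
    """Daily extraterrestrial radiation on a horizontal surface, in langleys."""
    phi = math.radians(lat_deg)
    b = 2.0 * math.pi * doy / 365.0
    dr = 1.0 + 0.033 * math.cos(b)      # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(b - 1.39)  # solar declination, radians
    # Clip so polar night (ws = 0) and midnight sun (ws = pi) are handled.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)                   # sunset hour angle, radians
    ra_mj = (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    return max(ra_mj, 0.0) / MJ_PER_LANGLEY  # guard against tiny negative rounding


def horizontal_soltab(hru_lats: list[float]) -> list[list[float]]:
    """nhru x 366 table: one row per HRU, one column per day of year 1..366."""
    return [[horizontal_extraterrestrial_ly(lat, doy) for doy in range(1, 367)]
            for lat in hru_lats]


table = horizontal_soltab([0.0, 80.0])
print(len(table), len(table[0]))  # 2 366
```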
What the Derivation Steps Produce
Phase 2 executes 14 ordered derivation steps. Each step may depend on parameters computed in earlier steps (the steps form a directed acyclic graph).
| Step | Category | Key Parameters | Source |
|---|---|---|---|
| 1 | Geometry | hru_area, hru_lat, hru_lon | Computed from fabric polygon geometry |
| 2 | Topology | hru_segment, tosegment_nhm, hru_up_id | Spatial join of fabric to NHDPlus segments |
| 3 | Topography | hru_elev, hru_slope, hru_aspect | 3DEP DEM zonal statistics |
| 4 | Land cover | cov_type, covden_sum, covden_win, srain_intcp, wrain_intcp, snow_intcp | NLCD reclassification + PRMS lookup tables |
| 5 | Soils | soil_type, soil_moist_max, soil_rechr_max, soil_rechr_max_frac | POLARIS/gNATSGO zonal stats + USDA texture triangle |
| 6 | Waterbodies | hru_type, dprst_frac | NHDPlus waterbody overlay |
| 7 | Forcing | prcp, tmax, tmin (daily) | gridMET temporal data, converted to PRMS units |
| 8 | Lookup tables | hru_deplcrv, snarea_curve, rad_trncf | PRMS tables keyed by cov_type / forest type |
| 9 | Solar tables | soltab_potsw, soltab_horad (nhru x 366) | Computed from latitude, slope, and aspect |
| 10 | PET coefficients | jh_coef (monthly) | Jensen-Haise formula from tmax/tmin normals |
| 11 | Transpiration | transp_beg, transp_end | Monthly mean tmin threshold analysis |
| 12 | Routing | K_coef, x_coef, seg_cum_area, seg_slope | Muskingum coefficients from segment geometry |
| 13 | Defaults | tmax_allsnow, dday_slope, cecn_coef, etc. | PRMS default values |
| 14 | Calibration seeds | snarea_thresh, fastcoef_lin, etc. | Physically-based initial values |
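The routing row lists K_coef and x_coef but not how segment geometry becomes those values. One common approach, shown here as a hedged sketch rather than hydro-param's method, estimates the travel time K from segment length and an assumed flow velocity; the classic Muskingum weights then follow from K, x, and the time step. The 0.5 m/s velocity and x = 0.2 are placeholder values, and the function names are hypothetical.

```python
def muskingum_k_hours(seg_length_m: float, velocity_m_s: float) -> float:
    """Travel time through a segment, K = length / velocity, in hours."""
    return seg_length_m / velocity_m_s / 3600.0


def muskingum_weights(k_h: float, x: float, dt_h: float) -> tuple[float, float, float]:
    """Muskingum weights for O2 = C0*I2 + C1*I1 + C2*O1 (they sum to 1)."""
    denom = 2.0 * k_h * (1.0 - x) + dt_h
    c0 = (dt_h - 2.0 * k_h * x) / denom
    c1 = (dt_h + 2.0 * k_h * x) / denom
    c2 = (2.0 * k_h * (1.0 - x) - dt_h) / denom
    return c0, c1, c2


# Hypothetical 30 km segment, assumed 0.5 m/s velocity, x = 0.2, daily step.
k = muskingum_k_hours(30_000.0, 0.5)
c0, c1, c2 = muskingum_weights(k, 0.2, 24.0)
print(round(k, 2), round(c0, 3), round(c1, 3), round(c2, 3))  # 16.67 0.342 0.605 0.053
```

All three weights are non-negative only when 2Kx ≤ Δt ≤ 2K(1 − x). Short, fast segments violate this at a daily step, so they need a smaller routing sub-step.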
Step ordering matters
Steps must run in order because later steps depend on earlier results. For example, Step 8 (lookup tables) needs cov_type from Step 4, and Step 9 (solar tables) needs hru_lat, hru_slope, and hru_aspect from Steps 1 and 3.
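Because the steps form a DAG, the fixed 1-14 sequence is one valid topological order of that graph. The sketch below encodes only the two dependencies named above and checks orderings with Python's standard `graphlib`. The rest of the graph is not documented on this page, so `DEPENDS_ON` is a partial, illustrative map, not hydro-param's scheduler.

```python
from graphlib import TopologicalSorter

# Only the dependencies stated above; the real step graph likely has more edges.
DEPENDS_ON = {
    8: {4},     # lookup tables need cov_type from land cover
    9: {1, 3},  # solar tables need hru_lat (geometry) and slope/aspect (topography)
}


def execution_order(steps, depends_on):
    """Return a step order in which every step follows its prerequisites."""
    graph = {step: depends_on.get(step, set()) for step in steps}
    return list(TopologicalSorter(graph).static_order())


def respects_dependencies(order, depends_on):
    """True if every prerequisite appears before the step that needs it."""
    position = {step: i for i, step in enumerate(order)}
    return all(position[dep] < position[step]
               for step, deps in depends_on.items() for dep in deps)


order = execution_order(range(1, 15), DEPENDS_ON)
print(respects_dependencies(order, DEPENDS_ON))               # True
print(respects_dependencies(list(range(1, 15)), DEPENDS_ON))  # True: 1..14 is one valid order
```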
Unit conversions
PRMS uses non-SI internal units. Phase 2 handles all conversions:
| Quantity | SIR Units | PRMS Units | Conversion |
|---|---|---|---|
| Elevation | meters | feet | multiply by 3.28084 |
| Area | square meters | acres | divide by 4046.86 |
| Precipitation | mm/day | inches/day | divide by 25.4 |
| Temperature | degrees C | degrees F | T_F = T_C x 9/5 + 32 |
| Soil moisture | mm | inches | divide by 25.4 |
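The conversions in the table map directly onto a few helpers. This minimal sketch uses exactly the factors listed above; the function names are illustrative, not hydro-param's API.

```python
FT_PER_M = 3.28084
M2_PER_ACRE = 4046.86
MM_PER_INCH = 25.4


def m_to_ft(meters: float) -> float:
    return meters * FT_PER_M


def m2_to_acres(square_meters: float) -> float:
    return square_meters / M2_PER_ACRE


def mm_to_inches(mm: float) -> float:
    """Used for both precipitation (mm/day) and soil moisture (mm)."""
    return mm / MM_PER_INCH


def c_to_f(deg_c: float) -> float:
    return deg_c * 9.0 / 5.0 + 32.0


# One hypothetical HRU-day of forcing, SIR units -> PRMS units.
day = {"prcp_mm": 12.7, "tmax_c": 25.0, "tmin_c": 10.0}
print(mm_to_inches(day["prcp_mm"]), c_to_f(day["tmax_c"]), c_to_f(day["tmin_c"]))  # 0.5 77.0 50.0
```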
Step 3: Validate Output
After Phase 2 completes, validate the parameter file to confirm all required parameters are present and within acceptable ranges:
The validator checks:
- All required PRMS parameters are present in the file
- Values fall within the valid ranges defined in the bundled parameter metadata
- Dimension sizes are consistent (nhru, nsegment, nmonths, etc.)
A clean run prints a success message. Any issues are reported with the parameter name, the expected range, and the actual min/max values found.
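The presence and range checks can be pictured as a small loop over parameter metadata. This sketch assumes parameters are already loaded as plain lists and uses made-up ranges; the real limits come from the bundled parameter metadata. It does not check dimension sizes and is not the validator's code.

```python
# Hypothetical ranges for illustration only.
RANGES = {
    "cov_type": (0, 4),
    "hru_slope": (0.0, 10.0),
    "soil_moist_max": (0.001, 20.0),
}


def check_ranges(params: dict[str, list[float]],
                 ranges: dict[str, tuple[float, float]]) -> list[str]:
    """Report missing parameters and values outside [lo, hi]."""
    issues = []
    for name, (lo, hi) in ranges.items():
        if name not in params:
            issues.append(f"{name}: missing")
            continue
        vmin, vmax = min(params[name]), max(params[name])
        if vmin < lo or vmax > hi:
            issues.append(f"{name}: expected [{lo}, {hi}], found min={vmin}, max={vmax}")
    return issues


params = {"cov_type": [0, 3, 5], "hru_slope": [0.01, 0.2]}
for issue in check_ranges(params, RANGES):
    print(issue)
# cov_type: expected [0, 4], found min=0, max=5
# soil_moist_max: missing
```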
Run validation after every change
If you modify the pywatershed run config (for example, changing parameter_overrides or switching data sources), re-run validation to catch out-of-range values early.
Output File Reference
| File | Format | Dimensions | Description |
|---|---|---|---|
| parameters.nc | CF-1.8 NetCDF | nhru, nsegment, nmonths, ndeplval | Static PRMS parameters. Load with pws.Parameters.from_netcdf(). |
| forcing/prcp.nc | NetCDF | nhru x time | Daily precipitation in inches/day. |
| forcing/tmax.nc | NetCDF | nhru x time | Daily maximum temperature in degrees F. |
| forcing/tmin.nc | NetCDF | nhru x time | Daily minimum temperature in degrees F. |
| soltab.nc | NetCDF | nhru x 366 | Potential shortwave radiation on each HRU's sloped surface (soltab_potsw) and on a horizontal surface (soltab_horad), by day of year. |
| control.yml | YAML | n/a | Simulation start/end dates, timestep, and paths to parameter/forcing files. |
Running pywatershed
Once you have the output files, you can run a pywatershed simulation:
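```python
import pywatershed as pws

# Assumes control.yml follows pywatershed's YAML control schema.
control = pws.Control.from_yaml("models/pywatershed/control.yml")
params = pws.Parameters.from_netcdf("models/pywatershed/parameters.nc")
# Model's first argument is the list of process classes to run (NHM set).
processes = [pws.PRMSSolarGeometry, pws.PRMSAtmosphere, pws.PRMSCanopy, pws.PRMSSnow,
             pws.PRMSRunoff, pws.PRMSSoilzone, pws.PRMSGroundwater, pws.PRMSChannel]
model = pws.Model(processes, control=control, parameters=params)
model.run()
```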
Refer to the pywatershed documentation for details on model configuration, process selection, and output analysis.
Next Steps
- Configuration --- Full reference for both pipeline and pywatershed run configs, including all supported options.
- Datasets --- Browse available datasets, check registration details, and download local data.
- CLI Reference --- Complete command reference for `hydro-param run`, `hydro-param pywatershed run`, and other commands.