pywatershed_config

Define the pywatershed run configuration schema and YAML loader.

Provide Pydantic models that validate the YAML configuration for the hydro-param pywatershed run command. This is a Phase 2 (model-specific) config that consumes pre-existing SIR output from the generic Phase 1 pipeline. It does NOT configure the Phase 1 pipeline itself.

The configuration covers nine sections: domain file paths, simulation time period, SIR output location, static dataset declarations, forcing time series, climate normals, manual parameter overrides, calibration seed generation, and output file layout.
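
As a rough orientation, the nine sections can be pictured as the top-level keys of the parsed YAML mapping. The sketch below writes that mapping as a plain Python dict. Key names follow the PywatershedRunConfig fields documented on this page; every value (paths, dataset and variable names, dates) is a hypothetical placeholder, and which sub-keys are required is not something this sketch asserts.

```python
# Hypothetical skeleton of a parsed config mapping. Key names follow the
# PywatershedRunConfig fields; all values are illustrative placeholders.
raw_config = {
    "target_model": "pywatershed",
    "version": "4.0",
    "domain": {"fabric_path": "data/fabric.gpkg"},
    "time": {"start": "1980-10-01", "end": "1982-09-30", "timestep": "daily"},
    "sir_path": "output/phase1",
    "static_datasets": {
        "topography": {
            "available": ["dem_3dep_10m"],
            "hru_elev": {
                "source": "dem_3dep_10m",
                "variable": "elevation",  # assumed variable name
                "statistic": "mean",
                "description": "Mean HRU elevation",
            },
        },
    },
    "forcing": {},
    "climate_normals": {},
    "parameter_overrides": {"values": {}},
    "calibration": {"generate_seeds": True, "seed_method": "physically_based"},
    "output": {"path": "./output", "format": "netcdf"},
}

# The nine documented sections, excluding target_model and version.
SECTIONS = [key for key in raw_config if key not in ("target_model", "version")]
```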

Notes

Version 4.0 adds three data sections (static_datasets, forcing, climate_normals) that declare which pipeline datasets provide each pywatershed parameter. This creates a consumer-oriented, self-documenting contract between the Phase 1 pipeline and the Phase 2 derivation plugin.

See Also

hydro_param.sir_accessor.SIRAccessor : Lazy SIR variable loader.

hydro_param.plugins.DerivationContext : Derivation step context.

hydro_param.cli.pws_run_cmd : Two-phase workflow consumer.

ParameterEntry

Bases: BaseModel

Declare the SIR data source for a single pywatershed parameter.

Each entry maps a pywatershed parameter to the pipeline dataset, source variable(s), and zonal statistic that produced the SIR data.

Exactly one of variable (single) or variables (list) must be provided for entries backed by SIR data. Both may be None only for entries whose source is not a pipeline dataset (e.g., waterbody parameters derived from fabric overlay).

PARAMETER DESCRIPTION
source

Pipeline dataset registry name (e.g., "dem_3dep_10m"), or a reference like "domain.waterbody_path" for non-SIR entries.

TYPE: str

variable

Source variable name when a single variable is used.

TYPE: str or None

variables

Source variable names when multiple variables contribute (e.g., ["sand", "silt", "clay"] for soil_type).

TYPE: list[str] or None

statistic

Zonal statistic applied ("mean", "categorical").

TYPE: str or None

year

NLCD year(s) for multi-epoch land cover.

TYPE: int or list[int] or None

time_period

Temporal range [start, end] in ISO format for temporal datasets.

TYPE: list[str] or None

description

Human-readable description of what this parameter represents.

TYPE: str

RAISES DESCRIPTION
ValueError

If both variable and variables are set simultaneously.
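
The mutual-exclusion rule between variable and variables can be illustrated with a small stand-in. This is a minimal sketch that uses a standard-library dataclass in place of the real Pydantic model; EntrySketch and source_variables are hypothetical names, and the real validator's error message may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntrySketch:
    """Stand-in for ParameterEntry (the real class is a Pydantic model)."""

    source: str
    variable: Optional[str] = None
    variables: Optional[list[str]] = None

    def __post_init__(self) -> None:
        # Mirror the documented rule: variable and variables are exclusive.
        if self.variable is not None and self.variables is not None:
            raise ValueError("Set either 'variable' or 'variables', not both.")

    def source_variables(self) -> list[str]:
        """Return the declared variable names as a list (possibly empty)."""
        if self.variable is not None:
            return [self.variable]
        return list(self.variables or [])
```

An entry with neither field set, such as one whose source is "domain.waterbody_path", yields an empty list here, which matches the documented allowance for non-SIR entries.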

TopographyDatasets

Bases: BaseModel

Topography parameters derived from DEM zonal statistics.

PARAMETER DESCRIPTION
available

Curated datasets available in the registry for this category.

TYPE: list[str]

hru_elev

Mean HRU elevation.

TYPE: ParameterEntry or None

hru_slope

Mean HRU land surface slope.

TYPE: ParameterEntry or None

hru_aspect

Mean HRU aspect.

TYPE: ParameterEntry or None

SoilsDatasets

Bases: BaseModel

Soil parameters derived from soil property datasets.

PARAMETER DESCRIPTION
available

Curated datasets available in the registry for this category.

TYPE: list[str]

soil_type

Soil type classification (1=sand, 2=loam, 3=clay).

TYPE: ParameterEntry or None

sat_threshold

Gravity reservoir storage capacity (from porosity).

TYPE: ParameterEntry or None

soil_moist_max

Maximum available water-holding capacity.

TYPE: ParameterEntry or None

soil_rechr_max_frac

Recharge zone storage as fraction of soil_moist_max.

TYPE: ParameterEntry or None

LandcoverDatasets

Bases: BaseModel

Land cover parameters for vegetation type, density, and interception.

PARAMETER DESCRIPTION
available

Curated datasets available in the registry for this category.

TYPE: list[str]

cov_type

Vegetation cover type.

TYPE: ParameterEntry or None

hru_percent_imperv

Impervious surface fraction.

TYPE: ParameterEntry or None

covden_sum

Summer vegetation cover density (0-1 fraction).

TYPE: ParameterEntry or None

covden_win

Winter vegetation cover density (0-1 fraction).

TYPE: ParameterEntry or None

srain_intcp

Summer rain interception storage capacity (inches).

TYPE: ParameterEntry or None

wrain_intcp

Winter rain interception storage capacity (inches).

TYPE: ParameterEntry or None

snow_intcp

Snow interception storage capacity (inches).

TYPE: ParameterEntry or None

SnowDatasets

Bases: BaseModel

Snow parameters from depletion curve classification and historical SWE data.

PARAMETER DESCRIPTION
available

Curated datasets available in the registry for this category.

TYPE: list[str]

hru_deplcrv

Snow depletion curve class per HRU. Source is typically the GFv1.1 CV_INT raster (categorical majority). Indexes into the SDC table to populate snarea_curve.

TYPE: ParameterEntry or None

snarea_thresh

Snow depletion threshold (calibration seed from historical max SWE).

TYPE: ParameterEntry or None

WaterbodyDatasets

Bases: BaseModel

Depression storage and HRU type from waterbody overlay.

PARAMETER DESCRIPTION
available

Curated datasets available in the registry for this category.

TYPE: list[str]

hru_type

HRU type (0=inactive, 1=land, 2=lake, 3=swale).

TYPE: ParameterEntry or None

dprst_frac

Fraction of HRU with surface depressions.

TYPE: ParameterEntry or None

StaticDatasetsConfig

Bases: BaseModel

Static dataset declarations grouped by domain category.

Each category contains explicit parameter fields that map to SIR data produced by the Phase 1 pipeline.

PARAMETER DESCRIPTION
topography

DEM-derived parameters (elevation, slope, aspect).

TYPE: TopographyDatasets

soils

Soil property parameters.

TYPE: SoilsDatasets

landcover

Land cover and impervious surface parameters.

TYPE: LandcoverDatasets

snow

Historical snow parameters.

TYPE: SnowDatasets

waterbodies

Depression storage and HRU type.

TYPE: WaterbodyDatasets

ForcingConfig

Bases: BaseModel

Temporal forcing time series declarations.

The Phase 2 derivation plugin converts forcing data from SIR units (metric: mm, degC) to PRMS units (inches, degF) during output formatting. pywatershed expects one variable per NetCDF file.

PARAMETER DESCRIPTION
available

Temporal-capable datasets available in the registry.

TYPE: list[str]

prcp

Daily precipitation.

TYPE: ParameterEntry or None

tmax

Daily maximum temperature.

TYPE: ParameterEntry or None

tmin

Daily minimum temperature.

TYPE: ParameterEntry or None
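
The metric-to-PRMS unit conversion described for ForcingConfig comes down to two standard formulas. The helpers below are a sketch of that arithmetic only; the function names are hypothetical, and the plugin's actual conversion code is not shown on this page.

```python
MM_PER_INCH = 25.4  # exact by definition of the international inch


def mm_to_inches(depth_mm: float) -> float:
    """Convert a precipitation depth from millimetres to inches."""
    return depth_mm / MM_PER_INCH


def degc_to_degf(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0
```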

ClimateNormalsConfig

Bases: BaseModel

Long-term climate statistics for derived parameters.

Can use the same source as forcing, or a different one (e.g., forcing from CONUS404-BA but normals from gridMET).

PARAMETER DESCRIPTION
available

Temporal-capable datasets available in the registry.

TYPE: list[str]

jh_coef

Jensen-Haise PET coefficient (monthly, from tmax/tmin normals).

TYPE: ParameterEntry or None

transp_beg

Month transpiration begins (from monthly mean tmin threshold).

TYPE: ParameterEntry or None

transp_end

Month transpiration ends (from monthly mean tmin threshold).

TYPE: ParameterEntry or None

PwsDomainConfig

Bases: BaseModel

Define the spatial domain for pywatershed model setup.

Point to pre-existing fabric and segment files on disk. hydro-param does NOT fetch or subset fabrics — use pynhd or pygeohydro upstream.

ATTRIBUTE DESCRIPTION
fabric_path

Path to the HRU fabric file (GeoPackage or GeoParquet).

TYPE: Path

segment_path

Path to the segment/flowline file for routing topology.

TYPE: Path or None

waterbody_path

Path to NHDPlus waterbody polygon file (GeoPackage or GeoParquet) for depression storage overlay (step 6). Must contain an ftype column with values like "LakePond" and "Reservoir". When None, step 6 uses zero defaults.

TYPE: Path or None

id_field

Feature ID column name in the fabric (default "nhm_id").

TYPE: str

segment_id_field

Segment ID column name in the segment fabric (default "nhm_seg").

TYPE: str

Notes

The fabric_path must point to a pre-existing file produced by pynhd, pygeohydro, or similar upstream tools.

PwsTimeConfig

Bases: BaseModel

Define the simulation time period for pywatershed.

ATTRIBUTE DESCRIPTION
start

Simulation start date in ISO format (e.g., "1980-10-01"). Typically a water-year boundary for PRMS.

TYPE: str

end

Simulation end date in ISO format (e.g., "1982-09-30").

TYPE: str

timestep

Temporal resolution. Only daily is currently supported, which matches PRMS's native timestep.

TYPE: {'daily'}
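
Because start and end are stored as ISO-format strings, a caller may want to parse and sanity-check them before use. The following is a sketch under that assumption; check_period and starts_on_water_year are hypothetical helpers, and this page does not say whether the schema itself enforces date ordering.

```python
from datetime import date


def check_period(start: str, end: str) -> tuple[date, date]:
    """Parse ISO dates and require that end falls after start."""
    start_d = date.fromisoformat(start)
    end_d = date.fromisoformat(end)
    if end_d <= start_d:
        raise ValueError(f"end ({end}) must be after start ({start})")
    return start_d, end_d


def starts_on_water_year(d: date) -> bool:
    """True when the date is 1 October, the usual US water-year start."""
    return (d.month, d.day) == (10, 1)
```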

PwsParameterOverrides

Bases: BaseModel

Specify manual overrides for derived parameter values.

Allow users to inject known-good values (e.g., from calibration) that bypass the standard derivation pipeline. Overrides are applied after all other derivation steps complete.

ATTRIBUTE DESCRIPTION
values

Parameter name to scalar or per-HRU value mapping. Scalars are broadcast to all HRUs. List values must match the number of HRUs in the fabric.

TYPE: dict[str, float | list[float]]

from_file

Path to a NetCDF or CSV file containing override values. Not yet implemented.

TYPE: Path or None
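
The broadcast rule for values (scalars apply to every HRU, lists must match the HRU count) can be sketched as follows. expand_overrides is a hypothetical helper written for illustration; it is not the plugin's implementation.

```python
from typing import Union

OverrideValue = Union[float, list[float]]


def expand_overrides(
    values: dict[str, OverrideValue], n_hru: int
) -> dict[str, list[float]]:
    """Expand scalar overrides to per-HRU lists and check list lengths."""
    expanded: dict[str, list[float]] = {}
    for name, value in values.items():
        if isinstance(value, list):
            # Per-HRU lists must match the number of HRUs in the fabric.
            if len(value) != n_hru:
                raise ValueError(
                    f"Override '{name}' has {len(value)} values, expected {n_hru}"
                )
            expanded[name] = [float(v) for v in value]
        else:
            # Scalars are broadcast to all HRUs.
            expanded[name] = [float(value)] * n_hru
    return expanded
```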

PwsCalibrationConfig

Bases: BaseModel

Configure calibration seed generation for PRMS parameters.

PRMS calibration parameters (e.g., carea_max, soil_moist_max, K_coef) need physically plausible initial values. This config controls whether and how those seeds are generated.

ATTRIBUTE DESCRIPTION
generate_seeds

Whether to generate calibration seed values. Default True.

TYPE: bool

seed_method

"physically_based" derives seeds from GIS data (e.g., carea_max from impervious fraction). "all_defaults" uses PRMS default values for all calibration parameters.

TYPE: {'physically_based', 'all_defaults'}

preserve_from_existing

Parameter names to preserve from an existing parameter file rather than re-deriving. Useful for retaining calibrated values during fabric updates.

TYPE: list[str]

PwsOutputConfig

Bases: BaseModel

Specify output file layout for pywatershed model setup.

Control the directory structure and filenames for the four output components: static parameters, climate forcing, solar tables, and simulation control.

ATTRIBUTE DESCRIPTION
path

Root output directory. Created if it does not exist. Default "./output".

TYPE: Path

format

Output format. "netcdf" produces CF-1.8 compliant files loadable by pywatershed. "prms_text" is not yet implemented.

TYPE: {'netcdf', 'prms_text'}

parameter_file

Filename for static parameters (default "parameters.nc").

TYPE: str

forcing_dir

Subdirectory for climate forcing files (default "forcing").

TYPE: str

control_file

Filename for simulation control (default "control.yml").

TYPE: str

soltab_file

Filename for solar radiation tables (default "soltab.nc").

TYPE: str
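
With the documented defaults, the four output components can be pictured as paths under the root directory. This sketch assumes each filename and the forcing subdirectory are joined directly onto path; output_layout is a hypothetical helper, not part of hydro-param.

```python
from pathlib import Path


def output_layout(
    root: str = "./output",
    parameter_file: str = "parameters.nc",
    forcing_dir: str = "forcing",
    control_file: str = "control.yml",
    soltab_file: str = "soltab.nc",
) -> dict[str, Path]:
    """Assemble output paths, assuming every component sits under root."""
    base = Path(root)
    return {
        "parameters": base / parameter_file,
        "forcing": base / forcing_dir,
        "control": base / control_file,
        "soltab": base / soltab_file,
    }
```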

PywatershedRunConfig

Bases: BaseModel

Define the top-level configuration for pywatershed model setup.

A consumer-oriented, self-documenting contract between the Phase 1 pipeline and the Phase 2 pywatershed derivation plugin. Three data sections (static_datasets, forcing, climate_normals) declare which pipeline datasets provide each pywatershed parameter.

PARAMETER DESCRIPTION
target_model

Target model identifier (fixed to "pywatershed").

TYPE: 'pywatershed'

DEFAULT: "pywatershed"

version

Config schema version ("4.0").

TYPE: str

domain

Domain fabric file paths and ID field names.

TYPE: PwsDomainConfig

time

Simulation time period.

TYPE: PwsTimeConfig

sir_path

Path to the Phase 1 pipeline output directory containing .manifest.yml and the sir/ subdirectory. Relative paths are resolved against the config file's parent directory by the CLI consumer (pws_run_cmd), not by the loader.

TYPE: Path

static_datasets

Static dataset declarations grouped by domain category.

TYPE: StaticDatasetsConfig

forcing

Temporal forcing time series declarations.

TYPE: ForcingConfig

climate_normals

Long-term climate statistics for derived parameters.

TYPE: ClimateNormalsConfig

parameter_overrides

Manual parameter value overrides.

TYPE: PwsParameterOverrides

calibration

Calibration seed generation options.

TYPE: PwsCalibrationConfig

output

Output directory structure and filenames.

TYPE: PwsOutputConfig

See Also

load_pywatershed_config : YAML loader for this schema.

hydro_param.cli.pws_run_cmd : Two-phase workflow consumer.

declared_entries

declared_entries() -> dict[str, ParameterEntry]

Collect all declared ParameterEntry objects from the config.

Walk static_datasets, forcing, and climate_normals sections and return a flat dictionary keyed by parameter name.

RETURNS DESCRIPTION
dict[str, ParameterEntry]

Parameter name to entry mapping for all non-None entries.

Source code in src/hydro_param/pywatershed_config.py
def declared_entries(self) -> dict[str, ParameterEntry]:
    """Collect all declared ParameterEntry objects from the config.

    Walk ``static_datasets``, ``forcing``, and ``climate_normals``
    sections and return a flat dictionary keyed by parameter name.

    Returns
    -------
    dict[str, ParameterEntry]
        Parameter name to entry mapping for all non-None entries.
    """
    entries: dict[str, ParameterEntry] = {}

    # Static datasets: walk each category
    for category in (
        self.static_datasets.topography,
        self.static_datasets.soils,
        self.static_datasets.landcover,
        self.static_datasets.snow,
        self.static_datasets.waterbodies,
    ):
        for field_name in type(category).model_fields:
            if field_name == "available":
                continue
            value = getattr(category, field_name)
            if value is not None:
                entries[field_name] = value

    # Forcing
    for field_name in ("prcp", "tmax", "tmin"):
        value = getattr(self.forcing, field_name)
        if value is not None:
            entries[field_name] = value

    # Climate normals
    for field_name in ("jh_coef", "transp_beg", "transp_end"):
        value = getattr(self.climate_normals, field_name)
        if value is not None:
            entries[field_name] = value

    return entries
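
The flattening pattern used for the static categories can be tried without hydro-param installed. The sketch below substitutes standard-library dataclasses for the Pydantic models and plain strings for ParameterEntry objects, so dataclasses.fields stands in for model_fields; TopoSketch, SnowSketch, and flatten are hypothetical names.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class TopoSketch:
    available: list[str] = field(default_factory=list)
    hru_elev: Optional[str] = None
    hru_slope: Optional[str] = None


@dataclass
class SnowSketch:
    available: list[str] = field(default_factory=list)
    hru_deplcrv: Optional[str] = None


def flatten(categories: list) -> dict[str, str]:
    """Collect non-None fields from each category, skipping 'available'."""
    entries: dict[str, str] = {}
    for category in categories:
        for f in fields(category):
            if f.name == "available":
                continue
            value = getattr(category, f.name)
            if value is not None:
                entries[f.name] = value
    return entries


flat = flatten(
    [TopoSketch(available=["dem_3dep_10m"], hru_elev="dem_3dep_10m"), SnowSketch()]
)
```

Because the result is keyed by field name alone, the pattern relies on parameter names being unique across categories, which holds for the fields documented on this page.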

validate_available_fields

validate_available_fields() -> None

Check that available dataset names exist in the registry.

Load the bundled dataset registry (including user-local overlays from ~/.hydro-param/datasets/) and verify that every name in each category's available list is a known dataset. Unknown entries emit a UserWarning rather than raising.

Warnings

UserWarning

For each dataset name in an available list that is not found in the current registry.

Source code in src/hydro_param/pywatershed_config.py
def validate_available_fields(self) -> None:
    """Check that ``available`` dataset names exist in the registry.

    Load the bundled dataset registry (including user-local overlays
    from ``~/.hydro-param/datasets/``) and verify that every name in
    each category's ``available`` list is a known dataset.  Unknown
    entries emit a ``UserWarning`` rather than raising.

    Warnings
    --------
    UserWarning
        For each dataset name in an ``available`` list that is not
        found in the current registry.
    """
    from hydro_param.dataset_registry import get_all_dataset_names, load_registry
    from hydro_param.pipeline import DEFAULT_REGISTRY, USER_REGISTRY_DIR

    registry = load_registry(DEFAULT_REGISTRY, overlay_dirs=[USER_REGISTRY_DIR])
    known = get_all_dataset_names(registry)

    categories: list[tuple[str, BaseModel]] = [
        ("topography", self.static_datasets.topography),
        ("soils", self.static_datasets.soils),
        ("landcover", self.static_datasets.landcover),
        ("snow", self.static_datasets.snow),
        ("waterbodies", self.static_datasets.waterbodies),
        ("forcing", self.forcing),
        ("climate_normals", self.climate_normals),
    ]
    for cat_name, category in categories:
        available: list[str] = getattr(category, "available", [])
        for ds_name in available:
            if ds_name not in known:
                warnings.warn(
                    f"Dataset '{ds_name}' in {cat_name}.available "
                    f"is not in the registry. Known: {sorted(known)}",
                    UserWarning,
                    stacklevel=2,
                )
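
Since unknown names produce warnings rather than exceptions, a caller that wants to inspect or act on them can record them with the standard warnings module. The sketch below uses a hypothetical stand-in, warn_unknown, that mimics the loop above with an in-memory set instead of the real registry.

```python
import warnings


def warn_unknown(available: list[str], known: set[str], cat_name: str) -> None:
    """Emit a UserWarning for each name that is not in the known set."""
    for ds_name in available:
        if ds_name not in known:
            warnings.warn(
                f"Dataset '{ds_name}' in {cat_name}.available is not in the registry.",
                UserWarning,
                stacklevel=2,
            )


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_unknown(["dem_3dep_10m", "not_a_dataset"], {"dem_3dep_10m"}, "topography")

messages = [str(w.message) for w in caught]
```

To fail fast instead, the same context manager can be combined with warnings.simplefilter("error", UserWarning), which turns each warning into a raised exception.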

load_pywatershed_config

load_pywatershed_config(
    path: str | Path,
) -> PywatershedRunConfig

Load and validate a pywatershed run configuration from YAML.

Parse the YAML file and construct a fully validated PywatershedRunConfig with Pydantic's strict type coercion.

PARAMETER DESCRIPTION
path

Path to the YAML config file.

TYPE: str | Path

RETURNS DESCRIPTION
PywatershedRunConfig

Validated configuration ready for pws_run_cmd().

RAISES DESCRIPTION
FileNotFoundError

If path does not exist.

YAMLError

If the file contains invalid YAML.

ValidationError

If the config fails schema validation (missing required fields, type mismatches, extra fields).

ValueError

If the parsed YAML is not a mapping (for example, an empty file).

Notes

Path fields (sir_path, domain.fabric_path, etc.) are returned as-is from the YAML. Relative paths are resolved against the config file's parent directory by the CLI consumer (pws_run_cmd), not by this loader.

Source code in src/hydro_param/pywatershed_config.py
def load_pywatershed_config(path: str | Path) -> PywatershedRunConfig:
    """Load and validate a pywatershed run configuration from YAML.

    Parse the YAML file and construct a fully validated
    ``PywatershedRunConfig`` with Pydantic's strict type coercion.

    Parameters
    ----------
    path
        Path to the YAML config file.

    Returns
    -------
    PywatershedRunConfig
        Validated configuration ready for ``pws_run_cmd()``.

    Raises
    ------
    FileNotFoundError
        If *path* does not exist.
    yaml.YAMLError
        If the file contains invalid YAML.
    pydantic.ValidationError
        If the config fails schema validation (missing required
        fields, type mismatches, extra fields).

    Notes
    -----
    Path fields (``sir_path``, ``domain.fabric_path``, etc.) are
    returned as-is from the YAML.  Relative paths are resolved against
    the config file's parent directory by the CLI consumer
    (``pws_run_cmd``), not by this loader.
    """
    with open(path) as f:
        raw = yaml.safe_load(f)
    if not isinstance(raw, dict):
        raise ValueError(
            f"Expected YAML mapping in {path}, got {type(raw).__name__}. "
            f"Check that the file is non-empty and contains valid config."
        )
    return PywatershedRunConfig(**raw)
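
Because the loader returns path fields unchanged, resolving them is left to the caller. The helper below sketches resolution against the config file's parent directory, as the Notes describe; resolve_against_config is a hypothetical name, and the actual logic in pws_run_cmd is not shown on this page.

```python
from pathlib import Path
from typing import Union


def resolve_against_config(
    config_path: Union[str, Path], value: Union[str, Path]
) -> Path:
    """Resolve a possibly relative path against the config file's directory."""
    candidate = Path(value)
    if candidate.is_absolute():
        # Absolute paths are kept as given.
        return candidate
    return Path(config_path).parent / candidate
```

For example, a sir_path of "sir_out" in runs/a/config.yml would resolve to runs/a/sir_out, independent of the current working directory.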