pywatershed_config ¶

Define the pywatershed run configuration schema and YAML loader.

Provide Pydantic models that validate the YAML configuration for the hydro-param pywatershed run command. This is a Phase 2 (model-specific) config that consumes pre-existing SIR output from the generic Phase 1 pipeline. It does NOT configure the Phase 1 pipeline itself.

The configuration covers nine sections: domain file paths, simulation time period, SIR output location, static dataset declarations, forcing time series, climate normals, manual parameter overrides, calibration seed generation, and output file layout.

Notes

Version 4.0 adds three data sections (static_datasets, forcing, climate_normals) that declare which pipeline datasets provide each pywatershed parameter. This creates a consumer-oriented, self-documenting contract between the Phase 1 pipeline and the Phase 2 derivation plugin.

ParameterEntry ¶

Bases: BaseModel

Declare the SIR data source for a single pywatershed parameter.

Each entry maps a pywatershed parameter to the pipeline dataset, source variable(s), and zonal statistic that produced the SIR data.

Exactly one of variable (single) or variables (list) must be provided for entries backed by SIR data. Both may be None only for entries whose source is not a pipeline dataset (e.g., waterbody parameters derived from fabric overlay).

PARAMETER	DESCRIPTION
`source`	Pipeline dataset registry name (e.g., `"dem_3dep_10m"`), or a reference like `"domain.waterbody_path"` for non-SIR entries. TYPE: `str`
`variable`	Source variable name when a single variable is used. TYPE: `str or None`
`variables`	Source variable names when multiple variables contribute (e.g., `["sand", "silt", "clay"]` for soil_type). TYPE: `list[str] or None`
`statistic`	Zonal statistic applied (`"mean"`, `"categorical"`). TYPE: `str or None`
`year`	NLCD year(s) for multi-epoch land cover. TYPE: `int or list[int] or None`
`time_period`	Temporal range `[start, end]` in ISO format for temporal datasets. TYPE: `list[str] or None`
`description`	Human-readable description of what this parameter represents. TYPE: `str`

RAISES	DESCRIPTION
`ValueError`	If both `variable` and `variables` are set simultaneously.
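The mutual-exclusivity rule above can be illustrated with a small stand-in. This is a minimal sketch using a standard-library dataclass, not the actual Pydantic model: `EntrySketch`, the variable name `"elevation"`, and the source name `"soil_dataset"` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntrySketch:
    """Hypothetical stand-in for ParameterEntry, for illustration only."""

    source: str
    variable: Optional[str] = None
    variables: Optional[List[str]] = None
    statistic: Optional[str] = None

    def __post_init__(self) -> None:
        # Documented rule: `variable` and `variables` are mutually exclusive.
        if self.variable is not None and self.variables is not None:
            raise ValueError("Set either 'variable' or 'variables', not both.")


# Single-variable entry (the variable name "elevation" is an assumption).
single = EntrySketch(source="dem_3dep_10m", variable="elevation", statistic="mean")

# Multi-variable entry (the source name "soil_dataset" is a placeholder).
multi = EntrySketch(source="soil_dataset", variables=["sand", "silt", "clay"])

# Non-SIR entry: both fields stay None.
non_sir = EntrySketch(source="domain.waterbody_path")

try:
    EntrySketch(source="dem_3dep_10m", variable="a", variables=["b"])
except ValueError as exc:
    print(f"rejected: {exc}")
```

In the real model the same check would presumably live in a Pydantic validator; the dataclass is used here only to keep the sketch dependency-free.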

TopographyDatasets ¶

Bases: BaseModel

Topography parameters derived from DEM zonal statistics.

PARAMETER	DESCRIPTION
`available`	Curated datasets available in the registry for this category. TYPE: `list[str]`
`hru_elev`	Mean HRU elevation. TYPE: `ParameterEntry or None`
`hru_slope`	Mean HRU land surface slope. TYPE: `ParameterEntry or None`
`hru_aspect`	Mean HRU aspect. TYPE: `ParameterEntry or None`

SoilsDatasets ¶

Bases: BaseModel

Soil parameters derived from soil property datasets.

PARAMETER	DESCRIPTION
`available`	Curated datasets available in the registry for this category. TYPE: `list[str]`
`soil_type`	Soil type classification (1=sand, 2=loam, 3=clay). TYPE: `ParameterEntry or None`
`sat_threshold`	Gravity reservoir storage capacity (from porosity). TYPE: `ParameterEntry or None`
`soil_moist_max`	Maximum available water-holding capacity. TYPE: `ParameterEntry or None`
`soil_rechr_max_frac`	Recharge zone storage as fraction of soil_moist_max. TYPE: `ParameterEntry or None`

LandcoverDatasets ¶

Bases: BaseModel

Land cover parameters for vegetation type, density, and interception.

PARAMETER	DESCRIPTION
`available`	Curated datasets available in the registry for this category. TYPE: `list[str]`
`cov_type`	Vegetation cover type. TYPE: `ParameterEntry or None`
`hru_percent_imperv`	Impervious surface fraction. TYPE: `ParameterEntry or None`
`covden_sum`	Summer vegetation cover density (fraction, 0 to 1). TYPE: `ParameterEntry or None`
`covden_win`	Winter vegetation cover density (fraction, 0 to 1). TYPE: `ParameterEntry or None`
`srain_intcp`	Summer rain interception storage capacity (inches). TYPE: `ParameterEntry or None`
`wrain_intcp`	Winter rain interception storage capacity (inches). TYPE: `ParameterEntry or None`
`snow_intcp`	Snow interception storage capacity (inches). TYPE: `ParameterEntry or None`

SnowDatasets ¶

Bases: BaseModel

Snow parameters from depletion curve classification and historical SWE data.

PARAMETER	DESCRIPTION
`available`	Curated datasets available in the registry for this category. TYPE: `list[str]`
`hru_deplcrv`	Snow depletion curve class per HRU. Source is typically the GFv1.1 CV_INT raster (categorical majority). Indexes into the SDC table to populate `snarea_curve`. TYPE: `ParameterEntry or None`
`snarea_thresh`	Snow depletion threshold (calibration seed from historical max SWE). TYPE: `ParameterEntry or None`

WaterbodyDatasets ¶

Bases: BaseModel

Depression storage and HRU type from waterbody overlay.

PARAMETER	DESCRIPTION
`available`	Curated datasets available in the registry for this category. TYPE: `list[str]`
`hru_type`	HRU type (0=inactive, 1=land, 2=lake, 3=swale). TYPE: `ParameterEntry or None`
`dprst_frac`	Fraction of HRU with surface depressions. TYPE: `ParameterEntry or None`

StaticDatasetsConfig ¶

Bases: BaseModel

Static dataset declarations grouped by domain category.

Each category contains explicit parameter fields that map to SIR data produced by the Phase 1 pipeline.

PARAMETER	DESCRIPTION
`topography`	DEM-derived parameters (elevation, slope, aspect). TYPE: `TopographyDatasets`
`soils`	Soil property parameters. TYPE: `SoilsDatasets`
`landcover`	Land cover and impervious surface parameters. TYPE: `LandcoverDatasets`
`snow`	Historical snow parameters. TYPE: `SnowDatasets`
`waterbodies`	Depression storage and HRU type. TYPE: `WaterbodyDatasets`

ForcingConfig ¶

Bases: BaseModel

Temporal forcing time series declarations.

The Phase 2 derivation plugin converts forcing data from SIR units (metric: mm, degC) to PRMS units (inches, degF) during output formatting. pywatershed expects each forcing variable in its own NetCDF file.

PARAMETER	DESCRIPTION
`available`	Temporal-capable datasets available in the registry. TYPE: `list[str]`
`prcp`	Daily precipitation. TYPE: `ParameterEntry or None`
`tmax`	Daily maximum temperature. TYPE: `ParameterEntry or None`
`tmin`	Daily minimum temperature. TYPE: `ParameterEntry or None`
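The metric-to-PRMS unit conversion described for ForcingConfig comes down to two standard formulas. The sketch below shows them in isolation; the function names are hypothetical and this is not the plugin's actual code.

```python
MM_PER_INCH = 25.4


def mm_to_inches(mm: float) -> float:
    """Convert a precipitation depth from millimetres to inches."""
    return mm / MM_PER_INCH


def degc_to_degf(deg_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9.0 / 5.0 + 32.0


print(mm_to_inches(25.4))   # 1.0
print(degc_to_degf(0.0))    # 32.0
print(degc_to_degf(100.0))  # 212.0
```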

ClimateNormalsConfig ¶

Bases: BaseModel

Long-term climate statistics for derived parameters.

Can use the same source as forcing, or a different one (e.g., forcing from CONUS404-BA but normals from gridMET).

PARAMETER	DESCRIPTION
`available`	Temporal-capable datasets available in the registry. TYPE: `list[str]`
`jh_coef`	Jensen-Haise PET coefficient (monthly, from tmax/tmin normals). TYPE: `ParameterEntry or None`
`transp_beg`	Month transpiration begins (from monthly mean tmin threshold). TYPE: `ParameterEntry or None`
`transp_end`	Month transpiration ends (from monthly mean tmin threshold). TYPE: `ParameterEntry or None`

PwsDomainConfig ¶

Bases: BaseModel

Define the spatial domain for pywatershed model setup.

Point to pre-existing fabric and segment files on disk. hydro-param does NOT fetch or subset fabrics — use pynhd or pygeohydro upstream.

ATTRIBUTE	DESCRIPTION
`fabric_path`	Path to the HRU fabric file (GeoPackage or GeoParquet). TYPE: `Path`
`segment_path`	Path to the segment/flowline file for routing topology. TYPE: `Path or None`
`waterbody_path`	Path to NHDPlus waterbody polygon file (GeoPackage or GeoParquet) for depression storage overlay (step 6). Must contain an `ftype` column with values like `"LakePond"` and `"Reservoir"`. When `None`, step 6 uses zero defaults. TYPE: `Path or None`
`id_field`	Feature ID column name in the fabric (default `"nhm_id"`). TYPE: `str`
`segment_id_field`	Segment ID column name in the segment fabric (default `"nhm_seg"`). TYPE: `str`

Notes

The fabric_path must point to a pre-existing file produced by pynhd, pygeohydro, or similar upstream tools.

PwsTimeConfig ¶

Bases: BaseModel

Define the simulation time period for pywatershed.

ATTRIBUTE	DESCRIPTION
`start`	Simulation start date in ISO format (e.g., `"1980-10-01"`). Typically a water-year boundary for PRMS. TYPE: `str`
`end`	Simulation end date in ISO format (e.g., `"1982-09-30"`). TYPE: `str`
`timestep`	Temporal resolution. Only daily is currently supported, which matches PRMS's native timestep. TYPE: `{'daily'}`
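Because `start` and `end` are stored as ISO strings, a caller may want to sanity-check them before a run. The following is a minimal sketch with hypothetical helper names; this reference does not state whether the schema itself performs these checks, and the water-year test is informational only.

```python
from datetime import date
from typing import Tuple


def parse_period(start: str, end: str) -> Tuple[date, date]:
    """Parse ISO date strings and make sure the period is not reversed."""
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if end_date < start_date:
        raise ValueError(f"end ({end}) is before start ({start})")
    return start_date, end_date


def is_water_year_start(day: date) -> bool:
    """Return True when the date is 1 October, the start of a US water year."""
    return (day.month, day.day) == (10, 1)


start_date, end_date = parse_period("1980-10-01", "1982-09-30")
n_days = (end_date - start_date).days + 1  # inclusive count of daily steps
print(is_water_year_start(start_date), n_days)
```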

PwsParameterOverrides ¶

Bases: BaseModel

Specify manual overrides for derived parameter values.

Allow users to inject known-good values (e.g., from calibration) that bypass the standard derivation pipeline. Overrides are applied after all other derivation steps complete.

ATTRIBUTE	DESCRIPTION
`values`	Parameter name to scalar or per-HRU value mapping. Scalars are broadcast to all HRUs. List values must match the number of HRUs in the fabric. TYPE: `dict[str, float | list[float]]`
`from_file`	Path to a NetCDF or CSV file containing override values. Not yet implemented. TYPE: `Path or None`
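The broadcast and length rules for `values` can be sketched as follows. `expand_override` is a hypothetical helper written for illustration, not the function hydro-param uses, and the numeric values are made up.

```python
from typing import List, Union


def expand_override(
    name: str, value: Union[float, List[float]], n_hru: int
) -> List[float]:
    """Return one value per HRU for a scalar or per-HRU override."""
    if isinstance(value, list):
        # Per-HRU lists must match the number of HRUs in the fabric.
        if len(value) != n_hru:
            raise ValueError(
                f"Override '{name}' has {len(value)} values, expected {n_hru}."
            )
        return [float(v) for v in value]
    # Scalars are broadcast to every HRU.
    return [float(value)] * n_hru


print(expand_override("carea_max", 0.6, 3))
print(expand_override("soil_moist_max", [2.0, 2.5, 3.0], 3))
```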

PwsCalibrationConfig ¶

Bases: BaseModel

Configure calibration seed generation for PRMS parameters.

PRMS calibration parameters (e.g., carea_max, soil_moist_max, K_coef) need physically plausible initial values. This config controls whether and how those seeds are generated.

ATTRIBUTE	DESCRIPTION
`generate_seeds`	Whether to generate calibration seed values. Default `True`. TYPE: `bool`
`seed_method`	`"physically_based"` derives seeds from GIS data (e.g., `carea_max` from impervious fraction). `"all_defaults"` uses PRMS default values for all calibration parameters. TYPE: `{'physically_based', 'all_defaults'}`
`preserve_from_existing`	Parameter names to preserve from an existing parameter file rather than re-deriving. Useful for retaining calibrated values during fabric updates. TYPE: `list[str]`

PwsOutputConfig ¶

Bases: BaseModel

Specify output file layout for pywatershed model setup.

Control the directory structure and filenames for the four output components: static parameters, climate forcing, solar tables, and simulation control.

ATTRIBUTE	DESCRIPTION
`path`	Root output directory. Created if it does not exist. Default `"./output"`. TYPE: `Path`
`format`	Output format. `"netcdf"` produces CF-1.8 compliant files loadable by pywatershed. `"prms_text"` is not yet implemented. TYPE: `{'netcdf', 'prms_text'}`
`parameter_file`	Filename for static parameters (default `"parameters.nc"`). TYPE: `str`
`forcing_dir`	Subdirectory for climate forcing files (default `"forcing"`). TYPE: `str`
`control_file`	Filename for simulation control (default `"control.yml"`). TYPE: `str`
`soltab_file`	Filename for solar radiation tables (default `"soltab.nc"`). TYPE: `str`
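With the documented defaults, the output layout can be pictured as below. This sketch assumes that the three files and the forcing subdirectory sit directly under `path`, which the attribute descriptions suggest but do not state outright; `output_layout` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def output_layout(
    path: str = "./output",
    parameter_file: str = "parameters.nc",
    forcing_dir: str = "forcing",
    control_file: str = "control.yml",
    soltab_file: str = "soltab.nc",
) -> Dict[str, Path]:
    """Join the documented default names onto the root output directory."""
    root = Path(path)
    return {
        "parameters": root / parameter_file,
        "forcing": root / forcing_dir,
        "control": root / control_file,
        "soltab": root / soltab_file,
    }


for component, location in output_layout().items():
    print(f"{component}: {location}")
```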

PywatershedRunConfig ¶

Bases: BaseModel

Define the top-level configuration for pywatershed model setup.

A consumer-oriented, self-documenting contract between the Phase 1 pipeline and the Phase 2 pywatershed derivation plugin. Three data sections (static_datasets, forcing, climate_normals) declare which pipeline datasets provide each pywatershed parameter.

PARAMETER	DESCRIPTION
`target_model`	Target model identifier (fixed to `"pywatershed"`). TYPE: `'pywatershed'` DEFAULT: `"pywatershed"`
`version`	Config schema version (`"4.0"`). TYPE: `str`
`domain`	Domain fabric file paths and ID field names. TYPE: `PwsDomainConfig`
`time`	Simulation time period. TYPE: `PwsTimeConfig`
`sir_path`	Path to the Phase 1 pipeline output directory containing `.manifest.yml` and `sir/` subdirectory. Relative paths are resolved against the config file's parent directory by the CLI consumer (`pws_run_cmd`), not at load time. TYPE: `Path`
`static_datasets`	Static dataset declarations grouped by domain category. TYPE: `StaticDatasetsConfig`
`forcing`	Temporal forcing time series declarations. TYPE: `ForcingConfig`
`climate_normals`	Long-term climate statistics for derived parameters. TYPE: `ClimateNormalsConfig`
`parameter_overrides`	Manual parameter value overrides. TYPE: `PwsParameterOverrides`
`calibration`	Calibration seed generation options. TYPE: `PwsCalibrationConfig`
`output`	Output directory structure and filenames. TYPE: `PwsOutputConfig`
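To make the nesting of these sections concrete, here is a sketch of the mapping that a parsed YAML config might produce, written as a Python dict. The file paths and the variable name `"elevation"` are placeholders, and which sections are required versus defaulted is not stated in this reference, so treat the selection of keys as an assumption.

```python
# Hypothetical parsed config; all paths and variable names are placeholders.
config = {
    "target_model": "pywatershed",
    "version": "4.0",
    "domain": {
        "fabric_path": "data/fabric.gpkg",
        "segment_path": "data/segments.gpkg",
        "id_field": "nhm_id",
        "segment_id_field": "nhm_seg",
    },
    "time": {"start": "1980-10-01", "end": "1982-09-30", "timestep": "daily"},
    "sir_path": "pipeline_output",
    "static_datasets": {
        "topography": {
            "available": ["dem_3dep_10m"],
            "hru_elev": {
                "source": "dem_3dep_10m",
                "variable": "elevation",
                "statistic": "mean",
                "description": "Mean HRU elevation",
            },
        },
    },
    "output": {"path": "./output", "format": "netcdf"},
}

# Each top-level key corresponds to one field of PywatershedRunConfig.
print(sorted(config))
```

The loader documented further down this page passes such a mapping to the model as keyword arguments (`PywatershedRunConfig(**raw)`).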

declared_entries ¶

declared_entries() -> dict[str, ParameterEntry]

Collect all declared ParameterEntry objects from the config.

Walk static_datasets, forcing, and climate_normals sections and return a flat dictionary keyed by parameter name.

RETURNS	DESCRIPTION
`dict[str, ParameterEntry]`	Parameter name to entry mapping for all non-None entries.

Source code in src/hydro_param/pywatershed_config.py

def declared_entries(self) -> dict[str, ParameterEntry]:
    """Collect all declared ParameterEntry objects from the config.

    Walk ``static_datasets``, ``forcing``, and ``climate_normals``
    sections and return a flat dictionary keyed by parameter name.

    Returns
    -------
    dict[str, ParameterEntry]
        Parameter name to entry mapping for all non-None entries.
    """
    entries: dict[str, ParameterEntry] = {}

    # Static datasets: walk each category
    for category in (
        self.static_datasets.topography,
        self.static_datasets.soils,
        self.static_datasets.landcover,
        self.static_datasets.snow,
        self.static_datasets.waterbodies,
    ):
        for field_name in type(category).model_fields:
            if field_name == "available":
                continue
            value = getattr(category, field_name)
            if value is not None:
                entries[field_name] = value

    # Forcing
    for field_name in ("prcp", "tmax", "tmin"):
        value = getattr(self.forcing, field_name)
        if value is not None:
            entries[field_name] = value

    # Climate normals
    for field_name in ("jh_coef", "transp_beg", "transp_end"):
        value = getattr(self.climate_normals, field_name)
        if value is not None:
            entries[field_name] = value

    return entries

validate_available_fields ¶

validate_available_fields() -> None

Check that available dataset names exist in the registry.

Load the bundled dataset registry (including user-local overlays from ~/.hydro-param/datasets/) and verify that every name in each category's available list is a known dataset. Unknown entries emit a UserWarning rather than raising.

WARNS	DESCRIPTION
`UserWarning`	For each dataset name in an `available` list that is not found in the current registry.
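Since unknown names produce warnings rather than exceptions, callers who want to inspect them can record the warnings. The sketch below reproduces the warn-instead-of-raise pattern with a hypothetical `warn_unknown` helper and a made-up registry set; it does not call hydro-param itself.

```python
import warnings
from typing import List, Set


def warn_unknown(available: List[str], known: Set[str], category: str) -> None:
    """Emit one UserWarning per dataset name missing from the known set."""
    for name in available:
        if name not in known:
            warnings.warn(
                f"Dataset '{name}' in {category}.available is not in the registry.",
                UserWarning,
                stacklevel=2,
            )


# Record warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_unknown(["dem_3dep_10m", "not_a_dataset"], {"dem_3dep_10m"}, "topography")

messages = [str(w.message) for w in caught]
print(messages)
```

To treat unknown datasets as hard errors instead, a caller could set `warnings.simplefilter("error", UserWarning)` before validation.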

Source code in src/hydro_param/pywatershed_config.py

def validate_available_fields(self) -> None:
    """Check that ``available`` dataset names exist in the registry.

    Load the bundled dataset registry (including user-local overlays
    from ``~/.hydro-param/datasets/``) and verify that every name in
    each category's ``available`` list is a known dataset.  Unknown
    entries emit a ``UserWarning`` rather than raising.

    Warnings
    --------
    UserWarning
        For each dataset name in an ``available`` list that is not
        found in the current registry.
    """
    from hydro_param.dataset_registry import get_all_dataset_names, load_registry
    from hydro_param.pipeline import DEFAULT_REGISTRY, USER_REGISTRY_DIR

    registry = load_registry(DEFAULT_REGISTRY, overlay_dirs=[USER_REGISTRY_DIR])
    known = get_all_dataset_names(registry)

    categories: list[tuple[str, BaseModel]] = [
        ("topography", self.static_datasets.topography),
        ("soils", self.static_datasets.soils),
        ("landcover", self.static_datasets.landcover),
        ("snow", self.static_datasets.snow),
        ("waterbodies", self.static_datasets.waterbodies),
        ("forcing", self.forcing),
        ("climate_normals", self.climate_normals),
    ]
    for cat_name, category in categories:
        available: list[str] = getattr(category, "available", [])
        for ds_name in available:
            if ds_name not in known:
                warnings.warn(
                    f"Dataset '{ds_name}' in {cat_name}.available "
                    f"is not in the registry. Known: {sorted(known)}",
                    UserWarning,
                    stacklevel=2,
                )

load_pywatershed_config ¶

load_pywatershed_config(
    path: str | Path,
) -> PywatershedRunConfig

Load and validate a pywatershed run configuration from YAML.

Parse the YAML file and construct a fully validated PywatershedRunConfig. Pydantic coerces field types and rejects unknown fields.

PARAMETER	DESCRIPTION
`path`	Path to the YAML config file. TYPE: `str | Path`

RETURNS	DESCRIPTION
`PywatershedRunConfig`	Validated configuration ready for `pws_run_cmd()`.

RAISES	DESCRIPTION
`FileNotFoundError`	If path does not exist.
`YAMLError`	If the file contains invalid YAML.
`ValueError`	If the parsed YAML is not a mapping at the top level (for example, an empty file).
`ValidationError`	If the config fails schema validation (missing required fields, type mismatches, extra fields).

Notes

Path fields (sir_path, domain.fabric_path, etc.) are returned as-is from the YAML. Relative paths are resolved against the config file's parent directory by the CLI consumer (pws_run_cmd), not by this loader.
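The resolution rule in the note above can be sketched with `pathlib`. `resolve_against_config` is a hypothetical helper illustrating the rule; it is not the code in `pws_run_cmd`, and the file names are placeholders.

```python
from pathlib import Path
from typing import Union


def resolve_against_config(
    config_path: Union[str, Path], value: Union[str, Path]
) -> Path:
    """Anchor a relative path at the directory that holds the config file."""
    candidate = Path(value)
    if candidate.is_absolute():
        # Absolute paths are used as given.
        return candidate
    return Path(config_path).parent / candidate


print(resolve_against_config("configs/pws_run.yml", "sir_output"))
```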

Source code in src/hydro_param/pywatershed_config.py

def load_pywatershed_config(path: str | Path) -> PywatershedRunConfig:
    """Load and validate a pywatershed run configuration from YAML.

    Parse the YAML file and construct a fully validated
    ``PywatershedRunConfig``.  Pydantic coerces field types and
    rejects unknown fields.

    Parameters
    ----------
    path
        Path to the YAML config file.

    Returns
    -------
    PywatershedRunConfig
        Validated configuration ready for ``pws_run_cmd()``.

    Raises
    ------
    FileNotFoundError
        If *path* does not exist.
    yaml.YAMLError
        If the file contains invalid YAML.
    ValueError
        If the parsed YAML is not a mapping at the top level
        (for example, an empty file).
    pydantic.ValidationError
        If the config fails schema validation (missing required
        fields, type mismatches, extra fields).

    Notes
    -----
    Path fields (``sir_path``, ``domain.fabric_path``, etc.) are
    returned as-is from the YAML.  Relative paths are resolved against
    the config file's parent directory by the CLI consumer
    (``pws_run_cmd``), not by this loader.
    """
    with open(path) as f:
        raw = yaml.safe_load(f)
    if not isinstance(raw, dict):
        raise ValueError(
            f"Expected YAML mapping in {path}, got {type(raw).__name__}. "
            f"Check that the file is non-empty and contains valid config."
        )
    return PywatershedRunConfig(**raw)