config
Pipeline configuration: Pydantic models and YAML loader.
Define the declarative configuration schema for the hydro-param pipeline, matching design.md section 11.6. Configs express what to compute (target fabric, datasets, statistics, output format) but never how -- all processing logic lives in Python code, not in YAML.
The schema is validated at load time by Pydantic v2 so that invalid configs fail fast with clear error messages before any data is fetched.
See Also
hydro_param.pipeline : Orchestrator that consumes these config objects.
hydro_param.dataset_registry : Registry that resolves dataset names referenced in DatasetRequest.
TargetFabricConfig
Bases: BaseModel
Specify the target polygon fabric to parameterize.
The target fabric is the spatial mesh (catchments, HRUs, grid cells) whose features receive zonal statistics from source datasets. The fabric must be a pre-existing geospatial file -- hydro-param does not fetch or subset fabrics (use pynhd/pygeohydro upstream).
| ATTRIBUTE | DESCRIPTION |
|---|---|
| path | Path to the fabric file (GeoPackage, GeoParquet, or Shapefile). |
| id_field | Column name containing unique feature identifiers. This becomes the index/dimension name in all output files and the SIR xarray Dataset. |
| crs | Coordinate reference system of the fabric file as an EPSG string. Defaults to … |
Notes
The id_field propagates through the entire pipeline: it controls the
xarray dimension name in the SIR, the CSV index column, and the feature
matching in the pywatershed derivation plugin. Typical values are
"nhm_id" (pywatershed/NHM), "featureid" (NHDPlus), or
"hru_id" (custom fabrics).
DomainConfig
Bases: BaseModel
Define the spatial domain that restricts which fabric features are processed.
When a domain is configured, stage 1 clips the target fabric to the specified extent before any data fetching or zonal statistics. When omitted, the full fabric extent is used.
Only type="bbox" is currently implemented; HUC and gage-based
subsetting are planned.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| type | Domain specification method. |
| bbox | Bounding box as … |
| id | Identifier for HUC or gage-based domains (e.g., HUC-2 code or USGS gage ID). Required when … |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the required field for the chosen … |
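A per-type requirement like this maps naturally onto a Pydantic v2 model validator that runs after field validation. The sketch below is illustrative only and is not hydro-param's actual class: the "huc2" and "gage" literal values, the four-element bbox ordering, the error messages, and the is_valid helper are all assumptions.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError, model_validator


class DomainConfig(BaseModel):
    """Illustrative domain model; "huc2" and "gage" are hypothetical type names."""

    type: Literal["bbox", "huc2", "gage"]
    # Assumed to hold four floats, e.g. [minx, miny, maxx, maxy].
    bbox: Optional[list[float]] = None
    # HUC code or USGS gage ID for the non-bbox types.
    id: Optional[str] = None

    @model_validator(mode="after")
    def _check_required_field(self) -> "DomainConfig":
        # Runs after field validation, so self.type is already a valid literal.
        if self.type == "bbox":
            if self.bbox is None or len(self.bbox) != 4:
                raise ValueError("type='bbox' requires a 4-element bbox")
        elif self.id is None:
            raise ValueError(f"type={self.type!r} requires id")
        return self


def is_valid(data: dict) -> bool:
    """Hypothetical helper: return True if data validates as a DomainConfig."""
    try:
        DomainConfig.model_validate(data)
    except ValidationError:
        return False
    return True
```

Pydantic wraps the ValueError raised inside the validator in a ValidationError, which is what callers of load_config would see.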
DatasetRequest
Bases: BaseModel
Request a dataset and its variables for pipeline processing.
Each entry in a category list under the datasets: mapping of a pipeline YAML config becomes one DatasetRequest. Its name is resolved against the dataset registry to obtain the fetch strategy, STAC collection, CRS, and variable metadata.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| name | Dataset name as it appears in the registry (e.g., …). |
| source | Local file path override for … |
| variables | Variable names to extract (e.g., …). |
| statistics | Zonal statistics to compute for each variable. Defaults to … |
| year | Year(s) for multi-year static datasets (e.g., NLCD on OSN). When a list is provided, the pipeline iterates over each year and produces year-suffixed output keys (e.g., …). |
| time_period | |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If … |
See Also
hydro_param.dataset_registry.DatasetEntry : Registry metadata resolved from name.
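The year expansion described for the year attribute can be sketched as a small key-building function. The exact suffix format is not given here, so the "<variable>_<year>" pattern, the choice to leave a single int year unsuffixed, the output_keys name, and the variable names in the usage are all hypothetical.

```python
from __future__ import annotations


def output_keys(variables: list[str], year: int | list[int] | None) -> list[str]:
    """Return hypothetical output keys for one DatasetRequest.

    Assumes keys are "<variable>_<year>" when year is a list and plain
    variable names otherwise; hydro-param's real suffix format may differ.
    """
    if not isinstance(year, list):
        # None or a single year: one key per variable, no suffix (assumption).
        return list(variables)
    # A list of years: iterate over each year, then over each variable.
    return [f"{var}_{yr}" for yr in year for var in variables]
```

For example, output_keys(["lc", "imp"], [2019, 2021]) yields one key per variable per year, grouped by year.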
OutputConfig
Bases: BaseModel
Configure pipeline output location and format.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| path | Directory for output files. Created automatically if it does not exist. Subdirectories are created per dataset category (e.g., …). |
| format | File format for temporal output. Static per-variable files are always written as CSV. Defaults to … |
| sir_name | Human-readable name for the output, used in CF-1.8 metadata attributes and log messages. Defaults to … |
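Creating the output directory and its per-category subdirectories is a straightforward pathlib operation. This is a minimal sketch, assuming each subdirectory is named after its category key; the prepare_output_dirs name and the category names in the test are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def prepare_output_dirs(output_path: str | Path, categories: Iterable[str]) -> dict[str, Path]:
    """Create the output root and one subdirectory per dataset category."""
    root = Path(output_path)
    # parents=True creates missing ancestors; exist_ok=True makes reruns safe.
    root.mkdir(parents=True, exist_ok=True)
    dirs: dict[str, Path] = {}
    for category in categories:
        subdir = root / category
        subdir.mkdir(exist_ok=True)
        dirs[category] = subdir
    return dirs
```

Using exist_ok=True keeps a second run (for example, a resumed pipeline) from failing on directories it created earlier.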
ProcessingConfig
Bases: BaseModel
Control batching, fault tolerance, and networking.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| batch_size | Maximum number of features per spatial batch. KD-tree recursive bisection groups nearby features to minimize data fetch extent. Must be > 0. Defaults to 500. |
| resume | When … |
| sir_validation | SIR validation mode for stage 5. |
| network_timeout | Timeout in seconds for GDAL HTTP operations (COG/vsicurl access). Applied to both … |
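The KD-tree style batching behind batch_size can be illustrated with a recursive median split on feature centroids. This is a sketch, not hydro-param's implementation: it assumes 2-D centroids are already computed, splits along the axis with the larger extent at each level, and the bisect_batches name is hypothetical.

```python
def bisect_batches(centroids: list[tuple[float, float]], batch_size: int) -> list[list[int]]:
    """Group feature indices into spatially compact batches of <= batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be > 0")
    if not centroids:
        return []

    def split(idx: list[int]) -> list[list[int]]:
        if len(idx) <= batch_size:
            return [idx]
        xs = [centroids[i][0] for i in idx]
        ys = [centroids[i][1] for i in idx]
        # Cut along the wider axis so each half covers a tighter extent.
        axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
        ordered = sorted(idx, key=lambda i: centroids[i][axis])
        mid = len(ordered) // 2  # median split; both halves are non-empty
        return split(ordered[:mid]) + split(ordered[mid:])

    return split(list(range(len(centroids))))
```

Because each batch covers a small contiguous region, the bounding box used to fetch source data for that batch stays small, which is the motivation stated for batch_size above.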
PipelineConfig
Bases: BaseModel
Top-level pipeline configuration loaded from a YAML file.
This is the root model that load_config deserializes. It composes all sub-configs and is consumed by every pipeline stage.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| target_fabric | Polygon mesh to parameterize. |
| domain | Optional spatial subsetting. When … |
| datasets | Datasets organized by category (e.g., …). |
| output | Output location and format. |
| processing | Engine, batching, and fault-tolerance settings. |
See Also
load_config : Load and validate a YAML file into this model.
hydro_param.pipeline.run_pipeline : Execute the pipeline from a config path.
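How a nested YAML mapping becomes nested models can be shown with a cut-down Pydantic v2 schema. Only a hypothetical subset of fields is modeled here, domain is left out, and treating processing as optional with a default ProcessingConfig is an assumption; the batch_size default of 500 and the > 0 constraint come from ProcessingConfig above.

```python
from pydantic import BaseModel, Field, ValidationError


class TargetFabricConfig(BaseModel):
    path: str
    id_field: str


class DatasetRequest(BaseModel):
    name: str
    variables: list[str] = Field(default_factory=list)


class OutputConfig(BaseModel):
    path: str


class ProcessingConfig(BaseModel):
    # gt=0 rejects zero or negative batch sizes at load time.
    batch_size: int = Field(default=500, gt=0)


class PipelineConfig(BaseModel):
    target_fabric: TargetFabricConfig
    # Category name -> list of requests, mirroring the themed YAML layout.
    datasets: dict[str, list[DatasetRequest]] = Field(default_factory=dict)
    output: OutputConfig
    # Assumption: processing may be omitted and falls back to defaults.
    processing: ProcessingConfig = Field(default_factory=ProcessingConfig)


def check(raw: dict) -> str:
    """Hypothetical helper: report 'ok' or the first validation error location."""
    try:
        PipelineConfig.model_validate(raw)
    except ValidationError as exc:
        return ".".join(str(part) for part in exc.errors()[0]["loc"])
    return "ok"
```

Validation of the whole tree happens in one model_validate call, which is what lets a bad nested value fail before any stage runs.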
flatten_datasets
Flatten themed dataset dict into a single list for pipeline stages.
Bridge the category-keyed config format to pipeline stages that expect a flat iterable of dataset requests. This allows pipeline internals to remain agnostic to the themed grouping while the config YAML stays organized by domain category.
| RETURNS | DESCRIPTION |
|---|---|
| list[DatasetRequest] | All dataset requests from all categories, preserving order within each category. |
Notes
Dict insertion order (guaranteed since Python 3.7) preserves intra-category order. Cross-category order follows YAML key order but is not semantically meaningful -- pipeline stages process each dataset independently.
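The flattening amounts to a nested comprehension over the category dict. This sketch is a standalone function for illustration; the documented flatten_datasets takes no parameters and presumably reads the config's own datasets attribute, so the signature here is an assumption, and plain strings stand in for DatasetRequest objects.

```python
from typing import Mapping, Sequence, TypeVar

T = TypeVar("T")


def flatten_datasets(datasets: Mapping[str, Sequence[T]]) -> list[T]:
    """Concatenate per-category request lists in dict insertion order."""
    # Outer loop walks categories in key order; inner loop keeps each
    # category's own ordering intact.
    return [request for requests in datasets.values() for request in requests]
```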
Source code in src/hydro_param/config.py
load_config
Load and validate a pipeline YAML config file.
Parse the YAML file at path and return a fully validated PipelineConfig. Pydantic model validators run during construction, so any schema violations raise immediately with descriptive error messages.
After validation, all relative paths (target_fabric.path,
output.path, per-dataset source) are resolved to absolute
paths using the current working directory. This ensures that
downstream operations (manifest save/load, file existence checks)
work consistently regardless of internal path manipulation.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to a YAML pipeline configuration file. |
| RETURNS | DESCRIPTION |
|---|---|
| PipelineConfig | Validated pipeline configuration with all paths resolved to absolute paths, ready for … |
| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If path does not exist. |
| YAMLError | If the file is not valid YAML. |
| ValidationError | If the YAML content does not match the config schema. |
Notes
Relative paths in the YAML are interpreted relative to the current
working directory (the standard convention when running
hydro-param run configs/pipeline.yml from the project root).
Absolute paths are left unchanged.
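The path-resolution rule can be sketched with pathlib on the raw mapping. hydro-param performs this step on the validated model after validation; this version works on a plain dict for brevity, the function names are hypothetical, and the base parameter (defaulting to the current working directory) exists only to make the behavior easy to test.

```python
from pathlib import Path
from typing import Any, Optional


def _absolute(value: str, base: Path) -> str:
    """Join a relative path onto base; leave absolute paths unchanged."""
    path = Path(value)
    return str(path if path.is_absolute() else base / path)


def resolve_config_paths(raw: dict[str, Any], base: Optional[Path] = None) -> dict[str, Any]:
    """Make fabric, output, and per-dataset source paths absolute (in place)."""
    base = base if base is not None else Path.cwd()
    for section in ("target_fabric", "output"):
        block = raw.get(section) or {}
        if block.get("path") is not None:
            block["path"] = _absolute(block["path"], base)
    for requests in (raw.get("datasets") or {}).values():
        for request in requests:
            # source is an optional override, so skip requests without one.
            if request.get("source") is not None:
                request["source"] = _absolute(request["source"], base)
    return raw
```

Resolving once at load time means later stages never depend on the working directory they happen to run in.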