Quick Start
A minimal end-to-end example using the Delaware River Basin.
This guide walks through a complete pipeline run: from project setup to inspecting output files. By the end, you will have zonal statistics for elevation, slope, and land cover computed over a polygon fabric.
Prerequisites
- pixi installed (installation guide)
- hydro-param cloned and environment synced:
Step 1: Initialize a project
This creates a standard project directory with template configs,
data directories organized by dataset category (e.g., data/topography/,
data/soils/), and a .gitignore.
Step 2: Obtain a target fabric
hydro-param does not fetch or subset fabrics. You need a pre-existing GeoPackage or GeoParquet file containing your HRU polygons. Options:
- pynhd / pygeohydro -- pygeohydro.get_camels() or NHDPlus catchments
- USGS Geospatial Fabric -- download from ScienceBase
- Custom -- any polygon mesh with a unique integer ID column
Copy your fabric into the project:
Fabric requirements
The fabric must be a GeoPackage (.gpkg) or GeoParquet (.parquet)
file with polygon geometries and a unique integer ID column (e.g.,
featureid, nhm_id). hydro-param reads this file as-is and does
not modify it.
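These requirements can be checked before a run. The sketch below is not part of hydro-param; it relies only on the fact that a GeoPackage is a SQLite database whose gpkg_geometry_columns table records each feature layer's declared geometry type. The function name check_gpkg_fabric is hypothetical, and GeoParquet files would need a different check.

```python
import sqlite3


def check_gpkg_fabric(path: str, id_field: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: list[str] = []
    con = sqlite3.connect(path)
    try:
        # gpkg_geometry_columns lists every feature table and its geometry type.
        layers = con.execute(
            "SELECT table_name, geometry_type_name FROM gpkg_geometry_columns"
        ).fetchall()
        if not layers:
            return ["no feature layers found"]
        for table, geom_type in layers:
            # A layer declared as generic GEOMETRY is also reported here,
            # because its contents cannot be confirmed from metadata alone.
            if str(geom_type).upper() not in {"POLYGON", "MULTIPOLYGON"}:
                problems.append(f"{table}: geometry type is {geom_type}, not polygon")
            tbl = f'"{table}"'
            columns = [row[1] for row in con.execute(f"PRAGMA table_info({tbl})")]
            if id_field not in columns:
                problems.append(f"{table}: missing ID column {id_field!r}")
                continue
            col = f'"{id_field}"'
            total, distinct, non_int = con.execute(
                f"SELECT COUNT(*), COUNT(DISTINCT {col}), "
                f"SUM(typeof({col}) != 'integer') FROM {tbl}"
            ).fetchone()
            if non_int:  # None for an empty table, otherwise a count
                problems.append(f"{table}: {non_int} non-integer or NULL IDs")
            elif distinct != total:
                problems.append(f"{table}: {id_field!r} values are not unique")
    finally:
        con.close()
    return problems
```

Calling check_gpkg_fabric("data/fabrics/catchments.gpkg", "featureid") on a conforming file should return an empty list.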
Step 3: Explore available datasets
List all registered datasets grouped by category:
Get details on a specific dataset, including its variables, access strategy, and spatial extent:
Step 4: Write a pipeline config
Create configs/pipeline.yml with the following content. This minimal
config requests two datasets: 3DEP (elevation and slope) and NLCD (land
cover fractions).
target_fabric:
  path: "data/fabrics/catchments.gpkg"
  id_field: "featureid"
  crs: "EPSG:5070"

domain:
  type: bbox
  bbox: [-76.5, 38.5, -74.0, 42.6]  # (1)!

datasets:
  topography:  # (2)!
    - name: dem_3dep_10m
      variables: [elevation, slope]
      statistics: [mean]
  land_cover:
    - name: nlcd_osn_lndcov
      variables: [LndCov]
      statistics: [categorical]  # (3)!
      year: [2021]

output:
  path: "./output"
  format: netcdf

processing:
  batch_size: 500
1. Bounding box as [west, south, east, north] in longitude/latitude. This covers the Delaware River Basin.
2. Datasets are grouped by category. Valid categories include topography, soils, land_cover, snow, and climate.
3. Use categorical for land cover to get per-class area fractions rather than a single summary statistic.
Config validation
hydro-param validates the config on load using Pydantic. Misspelled field names, unknown categories, or missing required fields produce clear error messages before any processing begins.
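To illustrate the kind of mistakes such validation catches, here is a minimal standard-library sketch. It is not hydro-param's Pydantic model: the function name precheck_config is hypothetical, and the category set is copied from the list above, which may not be exhaustive.

```python
# Assumption: this set mirrors the categories named in this guide only.
KNOWN_CATEGORIES = {"topography", "soils", "land_cover", "snow", "climate"}


def precheck_config(config: dict) -> list[str]:
    """Return human-readable errors for a parsed pipeline config dict."""
    errors: list[str] = []

    bbox = config.get("domain", {}).get("bbox")
    if bbox is None or len(bbox) != 4:
        errors.append("domain.bbox must have four values: [west, south, east, north]")
    else:
        west, south, east, north = bbox
        # Longitude/latitude ordering: west < east and south < north.
        if not (-180 <= west < east <= 180):
            errors.append("domain.bbox: west must be less than east, within [-180, 180]")
        if not (-90 <= south < north <= 90):
            errors.append("domain.bbox: south must be less than north, within [-90, 90]")

    for category in config.get("datasets", {}):
        if category not in KNOWN_CATEGORIES:
            errors.append(f"unknown dataset category: {category!r}")

    return errors
```

A swapped bounding box such as [-74.0, 38.5, -76.5, 42.6] or a misspelled category such as topograhy each produce one entry in the returned list.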
Step 5: Run the pipeline
The pipeline executes five stages:
- Resolve fabric -- loads and spatially subsets the target fabric
- Resolve datasets -- validates dataset names against the registry
- Compute weights -- gdptools builds intersection weights per batch
- Process datasets -- computes zonal statistics for each variable
- Format output -- writes results to the output directory
Progress is logged to the console. A typical run with 500-feature batches takes a few minutes depending on network speed (remote datasets) or disk I/O (local data).
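The effect of batch_size can be pictured with a short sketch. This assumes simple sequential batching by ID order, which may differ from how hydro-param actually groups features (it could batch spatially, for example); iter_batches is a hypothetical helper.

```python
from collections.abc import Iterator, Sequence


def iter_batches(
    feature_ids: Sequence[int], batch_size: int = 500
) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size feature IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(feature_ids), batch_size):
        yield feature_ids[start:start + batch_size]
```

Under this assumption, a fabric with 1,201 features and batch_size: 500 would be processed as three batches of 500, 500, and 201 features.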
Step 6: Inspect results
After the pipeline completes, the output directory contains one CSV per variable, organized by category:
output/
  topography/
    dem_3dep_10m_elevation_mean.csv  # (1)!
    dem_3dep_10m_slope_mean.csv
  land_cover/
    nlcd_osn_lndcov_LndCov_categorical_2021.csv  # (2)!
  .manifest.yml  # (3)!
1. Each CSV is indexed by the fabric ID field (featureid in this example) with one row per feature.
2. Categorical output contains one column per land cover class, with values representing the fractional area of each class within each feature.
3. The manifest tracks dataset fingerprints for resume support. Re-run the same config with resume: true under processing: to skip datasets whose inputs have not changed.
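A quick sanity check on categorical output is that the class fractions in each row add up to roughly 1, provided every class is reported and the feature lies fully within valid data. The sketch below uses only the standard library and assumes the first column holds the feature ID and all remaining columns hold class fractions; the helper names are hypothetical.

```python
import csv
from typing import TextIO


def class_fraction_sums(handle: TextIO) -> dict[str, float]:
    """Map each feature ID (first column) to the sum of its class fractions."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    # Empty cells are treated as zero coverage for that class.
    return {row[0]: sum(float(v) for v in row[1:] if v) for row in reader if row}


def features_off_unity(sums: dict[str, float], tol: float = 0.01) -> list[str]:
    """Return the IDs whose fractions differ from 1 by more than tol."""
    return [fid for fid, total in sums.items() if abs(total - 1.0) > tol]
```

To use it, open output/land_cover/nlcd_osn_lndcov_LndCov_categorical_2021.csv with newline="" and pass the file handle to class_fraction_sums. Features flagged by features_off_unity may straddle the raster edge or contain nodata cells.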
Load a result in Python:
import pandas as pd
elevation = pd.read_csv("output/topography/dem_3dep_10m_elevation_mean.csv", index_col=0)
print(elevation.head())
Or with xarray for downstream analysis:
Next steps
- Configuration -- full reference for all config fields, dataset options, and processing settings
- Datasets -- browse the dataset registry and learn about access strategies
- pywatershed Workflow -- use pipeline output to derive pywatershed model parameters