project ¶

Project scaffolding: init command, project root detection, and template generation.

Provide the hydro-param init functionality, which creates a standard project directory structure with one data subfolder per dataset category, template pipeline and pywatershed configurations, and a .gitignore. A hidden .hydro-param marker file at the project root enables upward directory discovery, so commands run from subdirectories can locate the project root automatically.

The scaffolding is intentionally lightweight -- hydro-param uses library-managed transparent caching (pooch-style), not a heavyweight project directory contract. The templates exist to give users a working starting point, not to enforce a rigid layout.

MARKER_FILE `module-attribute` ¶

MARKER_FILE = '.hydro-param'

str : Hidden file name placed at the project root by hydro-param init.

DEFAULT_CATEGORIES `module-attribute` ¶

DEFAULT_CATEGORIES: list[str] = [
    "climate",
    "geology",
    "hydrography",
    "land_cover",
    "snow",
    "soils",
    "topography",
    "water_bodies",
]

list[str] : Built-in dataset categories used when the registry is unavailable.

These names correspond to the per-category YAML files in hydro_param.data.datasets and become subdirectory names under data/ in an initialized project.
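As a rough illustration of how these names become directories, the sketch below joins each category onto a data/ folder. The list is copied from the attribute above; the `project_dir` value is a made-up path used only for this example.

```python
from pathlib import PurePosixPath

# Copied from the module attribute documented above.
DEFAULT_CATEGORIES = [
    "climate", "geology", "hydrography", "land_cover",
    "snow", "soils", "topography", "water_bodies",
]

# Hypothetical project location, for illustration only.
project_dir = PurePosixPath("/work/my-basin")

# One subdirectory per category under data/.
data_dirs = [project_dir / "data" / name for name in DEFAULT_CATEGORIES]
for d in data_dirs:
    print(d)
```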

find_project_root ¶

find_project_root(start: Path | None = None) -> Path | None

Walk up from start looking for a .hydro-param marker file.

Traverse parent directories from start toward the filesystem root, returning the first directory that contains the marker file. This allows CLI commands invoked from any subdirectory to locate the project root without requiring an explicit --project-dir flag.

PARAMETER	DESCRIPTION
`start`	Directory to begin searching from. Defaults to the current working directory (`Path.cwd()`). TYPE: `Path or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Path or None`	The resolved project root directory, or `None` if no marker is found before reaching the filesystem root.

Notes

The search terminates when parent == current, which is the filesystem root on both POSIX and Windows systems.
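The termination condition relies on a pathlib property: the parent of a root path is the root itself. A minimal check using pure paths (so it runs the same on any platform) might look like this; the example paths are arbitrary.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_root = PurePosixPath("/")
windows_root = PureWindowsPath("C:\\")

# At a root, .parent returns the same path, so the walk can stop.
posix_is_fixed_point = posix_root.parent == posix_root
windows_is_fixed_point = windows_root.parent == windows_root

# Below the root, .parent is always a different (shorter) path.
nested_moves_up = PurePosixPath("/a/b").parent != PurePosixPath("/a/b")

print(posix_is_fixed_point, windows_is_fixed_point, nested_moves_up)
```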

Source code in src/hydro_param/project.py

def find_project_root(start: Path | None = None) -> Path | None:
    """Walk up from *start* looking for a ``.hydro-param`` marker file.

    Traverse parent directories from *start* toward the filesystem root,
    returning the first directory that contains the marker file.  This
    allows CLI commands invoked from any subdirectory to locate the
    project root without requiring an explicit ``--project-dir`` flag.

    Parameters
    ----------
    start : Path or None
        Directory to begin searching from.  Defaults to the current
        working directory (``Path.cwd()``).

    Returns
    -------
    Path or None
        The resolved project root directory, or ``None`` if no marker
        is found before reaching the filesystem root.

    Notes
    -----
    The search terminates when ``parent == current``, which is the
    filesystem root on both POSIX and Windows systems.
    """
    current = (start or Path.cwd()).resolve()
    while True:
        if (current / MARKER_FILE).is_file():
            return current
        parent = current.parent
        if parent == current:
            return None
        current = parent
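The following sketch exercises the same upward walk in a temporary directory. To stay self-contained it uses a local stand-in, `find_root`, that mirrors the loop above rather than importing hydro_param; the directory names are illustrative.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

MARKER_FILE = ".hydro-param"


def find_root(start: Path) -> Path | None:
    """Local stand-in that mirrors the loop in find_project_root."""
    current = start.resolve()
    while True:
        if (current / MARKER_FILE).is_file():
            return current
        parent = current.parent
        if parent == current:
            return None
        current = parent


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / MARKER_FILE).write_text("name: demo\n")

    # A nested working directory, two levels below the marker.
    nested = root / "data" / "soils"
    nested.mkdir(parents=True)

    found_from_nested = find_root(nested)
    found_from_root = find_root(root)
    print(found_from_nested == root, found_from_root == root)
```

Both calls return the directory holding the marker, which is the behavior CLI commands depend on when invoked from a subdirectory.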

get_data_categories ¶

get_data_categories(
    registry_path: Path | None = None,
) -> list[str]

Discover dataset categories from the registry, with built-in fallback.

Attempt to load the dataset registry and extract the union of all category values. If the registry cannot be loaded (missing path, parse error, etc.), fall back to DEFAULT_CATEGORIES so that project scaffolding always succeeds.

PARAMETER	DESCRIPTION
`registry_path`	Path to a registry YAML file or directory of YAML files. When `None` or unloadable, `DEFAULT_CATEGORIES` is returned. TYPE: `Path or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[str]`	Sorted, deduplicated list of category names. Always includes at least the `DEFAULT_CATEGORIES`.

Source code in src/hydro_param/project.py

def get_data_categories(registry_path: Path | None = None) -> list[str]:
    """Discover dataset categories from the registry, with built-in fallback.

    Attempt to load the dataset registry and extract the union of all
    ``category`` values.  If the registry cannot be loaded (missing path,
    parse error, etc.), fall back to :data:`DEFAULT_CATEGORIES` so that
    project scaffolding always succeeds.

    Parameters
    ----------
    registry_path : Path or None
        Path to a registry YAML file or directory of YAML files.  When
        ``None`` or unloadable, :data:`DEFAULT_CATEGORIES` is returned.

    Returns
    -------
    list[str]
        Sorted, deduplicated list of category names.  Always includes at
        least the :data:`DEFAULT_CATEGORIES`.
    """
    if registry_path is None:
        return list(DEFAULT_CATEGORIES)
    try:
        from hydro_param.dataset_registry import load_registry

        reg = load_registry(registry_path)
        categories = {entry.category for entry in reg.datasets.values() if entry.category}
        if categories:
            return sorted(categories | set(DEFAULT_CATEGORIES))
    except (OSError, ValueError, KeyError) as exc:
        logger.warning(
            "Could not load registry at '%s' for category discovery; "
            "using built-in defaults. Error: %s",
            registry_path,
            exc,
        )
    return list(DEFAULT_CATEGORIES)
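The merge step in the success path is a sorted set union, so registry categories are added to the defaults rather than replacing them. A small sketch, assuming a hypothetical registry that reports one known and one new category:

```python
# Copied from the module attribute documented above.
DEFAULT_CATEGORIES = [
    "climate", "geology", "hydrography", "land_cover",
    "snow", "soils", "topography", "water_bodies",
]

# Hypothetical categories reported by a custom registry.
registry_categories = {"soils", "vegetation"}

# Same expression as in the source: union with the defaults, then sort.
merged = sorted(registry_categories | set(DEFAULT_CATEGORIES))
print(merged)
```

The duplicate `soils` collapses and `vegetation` is slotted into alphabetical order, giving nine names in total.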

generate_pipeline_template ¶

generate_pipeline_template(project_name: str) -> str

Generate a well-commented pipeline YAML config template.

Produce a starter pipeline.yml with inline comments explaining each section (target fabric, domain, datasets, output, processing). The template includes all 7 tested dataset configurations covering all 5 data access strategies (stac_cog, local_tiff, nhgf_stac static, nhgf_stac temporal, climr_cat).

PARAMETER	DESCRIPTION
`project_name`	Project name inserted into the `output.sir_name` field and the header comment. TYPE: `str`

RETURNS	DESCRIPTION
`str`	YAML content suitable for writing to `configs/pipeline.yml`.

generate_pywatershed_template ¶

generate_pywatershed_template(project_name: str) -> str

Generate a well-commented pywatershed run config template (v4.0).

Produce a starter pywatershed_run.yml for Phase 2 parameterization. This config consumes existing SIR output produced by hydro-param run and derives pywatershed-specific parameters. The v4.0 format includes three data sections (static_datasets, forcing, climate_normals) that declare the pipeline-to-parameter data contract.

PARAMETER	DESCRIPTION
`project_name`	Project name inserted into the output path and header comment. TYPE: `str`

RETURNS	DESCRIPTION
`str`	YAML content suitable for writing to `configs/pywatershed_run.yml`.

References

docs/reference/pywatershed_parameterization_guide.md
docs/reference/pywatershed_dataset_param_map.yml

See Also

hydro_param.pywatershed_config : Schema for pywatershed run configs.
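Because the template is returned from an f-string, literal braces in the YAML have to be doubled; the `values: {{}}` line in the source listing is emitted as `values: {}`. A cut-down sketch of that behavior, with a hypothetical project name:

```python
project_name = "demo"  # hypothetical project name

# Two-line excerpt in the style of the full template.
snippet = f"""\
# pywatershed run configuration for {project_name}
parameter_overrides:
  values: {{}}
"""
print(snippet)
```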

Source code in src/hydro_param/project.py

def generate_pywatershed_template(project_name: str) -> str:
    """Generate a well-commented pywatershed run config template (v4.0).

    Produce a starter ``pywatershed_run.yml`` for Phase 2 parameterization.
    This config consumes existing SIR output produced by ``hydro-param run``
    and derives pywatershed-specific parameters.  The v4.0 format includes
    three data sections (``static_datasets``, ``forcing``, ``climate_normals``)
    that declare the pipeline-to-parameter data contract.

    Parameters
    ----------
    project_name : str
        Project name inserted into the output path and header comment.

    Returns
    -------
    str
        YAML content suitable for writing to ``configs/pywatershed_run.yml``.

    References
    ----------
    - ``docs/reference/pywatershed_parameterization_guide.md``
    - ``docs/reference/pywatershed_dataset_param_map.yml``

    See Also
    --------
    hydro_param.pywatershed_config : Schema for pywatershed run configs.
    """
    return f"""\
# pywatershed run configuration for {project_name}
#
# Phase 2: derive pywatershed parameters from existing SIR output.
# Run Phase 1 first to produce SIR output:
#   hydro-param run configs/pipeline.yml
# Then run this config:
#   hydro-param pywatershed run configs/pywatershed_run.yml
#
# Reference:
#   docs/reference/pywatershed_parameterization_guide.md
#   docs/reference/pywatershed_dataset_param_map.yml

target_model: pywatershed
version: "4.0"

# --- SIR Output ---
# Path to the pipeline output directory containing SIR files and .manifest.yml.
# Relative paths are resolved from this config file's directory.
sir_path: "output"

# --- Domain ---
# Pre-existing geospatial fabric files (GeoPackage or GeoParquet).
# Obtain fabrics with pynhd, pygeohydro, or similar upstream tools —
# hydro-param does NOT fetch or subset fabrics.
domain:
  fabric_path: "data/fabrics/nhru.gpkg"        # REQUIRED: path to HRU fabric
  segment_path: "data/fabrics/nsegment.gpkg"   # path to segment/flowline fabric
  # waterbody_path: "data/fabrics/waterbodies.gpkg"  # NHDPlus waterbody polygons (optional)
  id_field: "nhm_id"
  segment_id_field: "nhm_seg"

# --- Simulation Period ---
time:
  start: "1980-10-01"
  end: "2020-09-30"
  timestep: daily

# --- Static Datasets ---
# Declare which pipeline datasets provide each pywatershed parameter.
# Each entry maps a pywatershed parameter to its SIR data source.
static_datasets:

  topography:
    available: [dem_3dep_10m]
    hru_elev:
      source: dem_3dep_10m
      variable: elevation
      statistic: mean
      description: "Mean HRU elevation"
    hru_slope:
      source: dem_3dep_10m
      variable: slope
      statistic: mean
      description: "Mean land surface slope"
    hru_aspect:
      source: dem_3dep_10m
      variable: aspect
      statistic: mean
      description: "Mean HRU aspect"

  soils:
    available: [polaris_30m, gnatsgo_rasters]
    soil_type:
      source: polaris_30m
      variables: [sand, silt, clay]
      statistic: mean
      description: "Soil type classification (1=sand, 2=loam, 3=clay)"
    sat_threshold:
      source: polaris_30m
      variable: theta_s
      statistic: mean
      description: "Gravity reservoir storage capacity (from porosity)"
    soil_moist_max:
      source: gnatsgo_rasters
      variable: aws0_100
      statistic: mean
      description: "Maximum available water-holding capacity"
    soil_rechr_max_frac:
      source: gnatsgo_rasters
      variables: [rootznemc, rootznaws]
      statistic: mean
      description: "Recharge zone storage as fraction of soil_moist_max"

  landcover:
    available: [nlcd_osn_lndcov, nlcd_osn_fctimp]
    cov_type:
      source: nlcd_osn_lndcov
      variable: LndCov
      statistic: categorical
      year: [2021]
      description: "Vegetation cover type (0=bare, 1=grasses, 2=shrubs, 3=trees, 4=coniferous)"
    hru_percent_imperv:
      source: nlcd_osn_fctimp
      variable: FctImp
      statistic: mean
      year: [2021]
      description: "Impervious surface fraction"

  snow:
    available: [snodas]
    snarea_thresh:
      source: snodas
      variable: SWE
      statistic: mean
      time_period: ["2020-01-01", "2021-12-31"]
      description: "Snow depletion threshold (calibration seed from historical max SWE)"

  waterbodies:
    available: []
    # Uncomment after adding waterbody_path above:
    # hru_type:
    #   source: domain.waterbody_path
    #   description: "HRU type (0=inactive, 1=land, 2=lake, 3=swale)"
    # dprst_frac:
    #   source: domain.waterbody_path
    #   description: "Fraction of HRU with surface depressions"

# --- Forcing ---
# Temporal climate time series for model input (prcp, tmax, tmin).
forcing:
  available: [gridmet]
  prcp:
    source: gridmet
    variable: pr
    statistic: mean
    description: "Daily precipitation"
  tmax:
    source: gridmet
    variable: tmmx
    statistic: mean
    description: "Daily maximum temperature"
  tmin:
    source: gridmet
    variable: tmmn
    statistic: mean
    description: "Daily minimum temperature"

# --- Climate Normals ---
# Long-term climate statistics for derived parameters (PET, transpiration).
# Can use a different source than forcing (e.g., gridMET normals with CONUS404-BA forcing).
climate_normals:
  available: [gridmet]
  jh_coef:
    source: gridmet
    variables: [tmmx, tmmn]
    description: "Jensen-Haise PET coefficient (monthly, from tmax/tmin normals)"
  transp_beg:
    source: gridmet
    variable: tmmn
    description: "Month transpiration begins (from monthly mean tmin threshold)"
  transp_end:
    source: gridmet
    variable: tmmn
    description: "Month transpiration ends (from monthly mean tmin threshold)"

# --- Parameter Overrides ---
# Manually override any derived parameter value.
parameter_overrides:
  values: {{}}
  # values:
  #   tmax_allsnow: 32.0
  #   den_max: 0.55

# --- Calibration ---
calibration:
  generate_seeds: true
  seed_method: physically_based         # physically_based or all_defaults
  preserve_from_existing: []

# --- Output ---
output:
  path: "models/pywatershed"
  format: netcdf                        # netcdf or prms_text
  parameter_file: "parameters.nc"
  forcing_dir: "forcing"                  # one-variable-per-file NetCDF (prcp.nc, tmax.nc, tmin.nc)
  control_file: "control.yml"
  soltab_file: "soltab.nc"
"""

generate_gitignore ¶

generate_gitignore() -> str

Generate .gitignore content for a hydro-param project.

The ignore rules track lightweight config files and the marker, while ignoring downloaded raster/vector data, pipeline output, and model exports. Large geospatial formats (*.nc, *.tif, *.zarr/) are caught by a safety-net section.

RETURNS	DESCRIPTION
`str`	Content suitable for writing to `.gitignore` in the project root.
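The per-directory rules in the generated content are hard-coded and line up with DEFAULT_CATEGORIES. The sketch below checks that correspondence and shows what happens with an extra category; `vegetation` is a hypothetical name standing in for something a custom registry might add.

```python
# Copied from the module attribute documented above.
DEFAULT_CATEGORIES = [
    "climate", "geology", "hydrography", "land_cover",
    "snow", "soils", "topography", "water_bodies",
]

# Per-directory rules copied from the generated .gitignore content.
gitignore_rules = """\
data/topography/
data/land_cover/
data/soils/
data/geology/
data/hydrography/
data/climate/
data/snow/
data/water_bodies/
"""

# "data/soils/" -> "soils"
ignored = {line.split("/")[1] for line in gitignore_rules.splitlines() if line}

# Hypothetical extra category discovered from a custom registry.
categories = set(DEFAULT_CATEGORIES) | {"vegetation"}
uncovered = sorted(categories - ignored)

print(ignored == set(DEFAULT_CATEGORIES))
print(uncovered)
```

Under that assumption, files in data/vegetation/ would be ignored only if they match the safety-net extension patterns.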

Source code in src/hydro_param/project.py

def generate_gitignore() -> str:
    """Generate ``.gitignore`` content for a hydro-param project.

    The ignore rules track lightweight config files and the marker, while
    ignoring downloaded raster/vector data, pipeline output, and model
    exports.  Large geospatial formats (``*.nc``, ``*.tif``, ``*.zarr/``)
    are caught by a safety-net section.

    Returns
    -------
    str
        Content suitable for writing to ``.gitignore`` in the project root.
    """
    return """\
# hydro-param project
#
# Track: .hydro-param marker, configs/
# Ignore: downloaded data, pipeline output, model exports

# Downloaded data (large raster/vector files)
data/topography/
data/land_cover/
data/soils/
data/geology/
data/hydrography/
data/climate/
data/snow/
data/water_bodies/
data/fabrics/*.tif
data/fabrics/*.tiff

# Pipeline output
output/

# Model exports
models/

# Common large geospatial formats (safety net)
*.nc
*.tif
*.tiff
*.zarr/
"""

init_project ¶

init_project(
    project_dir: Path,
    *,
    force: bool = False,
    registry_path: Path | None = None,
) -> None

Scaffold a hydro-param project directory.

Create the standard directory structure expected by the pipeline:

configs/ -- pipeline and pywatershed run config templates
data/fabrics/ -- target polygon files
data/<category>/ -- one subdirectory per dataset category
output/ -- pipeline results
models/pywatershed/ -- pywatershed model exports
.hydro-param -- marker file for project root discovery
.gitignore -- rules to keep large data out of version control

Existing config templates (pipeline.yml, pywatershed_run.yml) are never overwritten, even with force=True, so user edits are preserved. The .gitignore is always regenerated because it is declarative and safe to replace.

PARAMETER	DESCRIPTION
`project_dir`	Root directory for the new project. Created if it does not exist. TYPE: `Path`
`force`	If `True`, re-initialise an existing project (refreshes the marker and creates missing directories, but never overwrites existing config templates). TYPE: `bool` DEFAULT: `False`
`registry_path`	Optional path to a dataset registry YAML file or directory for category discovery. When `None`, `DEFAULT_CATEGORIES` is used. TYPE: `Path or None` DEFAULT: `None`

RAISES	DESCRIPTION
`SystemExit`	If the directory already contains a `.hydro-param` marker and force is `False`.

Notes

The marker file (.hydro-param) is a YAML file containing the project name and UTC creation timestamp. It is used by find_project_root for upward directory discovery.
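The timestamp is produced with `datetime.now(timezone.utc).isoformat()`. As a small sketch of what that yields, the value is timezone-aware and can be parsed back with the standard library alone:

```python
from datetime import datetime, timedelta, timezone

# Same expression the marker uses for its "created" field.
created = datetime.now(timezone.utc).isoformat()
print(created)

# The string round-trips and keeps its UTC offset.
parsed = datetime.fromisoformat(created)
is_utc = parsed.utcoffset() == timedelta(0)
print(is_utc)
```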

Source code in src/hydro_param/project.py

def init_project(
    project_dir: Path,
    *,
    force: bool = False,
    registry_path: Path | None = None,
) -> None:
    """Scaffold a hydro-param project directory.

    Create the standard directory structure expected by the pipeline:

    - ``configs/`` -- pipeline and pywatershed run config templates
    - ``data/fabrics/`` -- target polygon files
    - ``data/<category>/`` -- one subdirectory per dataset category
    - ``output/`` -- pipeline results
    - ``models/pywatershed/`` -- pywatershed model exports
    - ``.hydro-param`` -- marker file for project root discovery
    - ``.gitignore`` -- rules to keep large data out of version control

    Existing config templates (``pipeline.yml``, ``pywatershed_run.yml``)
    are never overwritten, even with ``force=True``, so user edits are
    preserved.  The ``.gitignore`` is always regenerated because it is
    declarative and safe to replace.

    Parameters
    ----------
    project_dir : Path
        Root directory for the new project.  Created if it does not exist.
    force : bool
        If ``True``, re-initialise an existing project (refreshes the
        marker and creates missing directories, but never overwrites
        existing config templates).
    registry_path : Path or None
        Optional path to a dataset registry YAML file or directory for
        category discovery.  When ``None``, :data:`DEFAULT_CATEGORIES`
        is used.

    Raises
    ------
    SystemExit
        If the directory already contains a ``.hydro-param`` marker and
        *force* is ``False``.

    Notes
    -----
    The marker file (``.hydro-param``) is a YAML file containing the
    project name and UTC creation timestamp.  It is used by
    :func:`find_project_root` for upward directory discovery.
    """
    project_dir = project_dir.resolve()
    marker_path = project_dir / MARKER_FILE
    config_path = project_dir / "configs" / "pipeline.yml"

    if marker_path.exists() and not force:
        print(
            f"Error: '{project_dir}' is already a hydro-param project.\n"
            "Use --force to re-initialize.",
            file=sys.stderr,
        )
        raise SystemExit(1)

    project_name = project_dir.name

    # Build list of directories
    categories = get_data_categories(registry_path)
    dirs_to_create = [
        project_dir / "configs",
        project_dir / "data" / "fabrics",
        project_dir / "output",
        project_dir / "models",
        project_dir / "models" / "pywatershed",
    ]
    for cat in categories:
        dirs_to_create.append(project_dir / "data" / cat)

    for d in dirs_to_create:
        d.mkdir(parents=True, exist_ok=True)

    # Marker file
    marker_content = {
        "name": project_name,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    marker_path.write_text(yaml.dump(marker_content, default_flow_style=False))

    # Template pipeline config (never overwrite existing)
    if not config_path.exists():
        config_path.write_text(generate_pipeline_template(project_name))

    # Template pywatershed config (never overwrite existing)
    pws_config_path = project_dir / "configs" / "pywatershed_run.yml"
    if not pws_config_path.exists():
        pws_config_path.write_text(generate_pywatershed_template(project_name))

    # .gitignore (overwrite is safe — declarative)
    (project_dir / ".gitignore").write_text(generate_gitignore())

    # Summary
    print(f"Initialized hydro-param project in {project_dir}/\n")
    print("Created:")
    print(f"  {MARKER_FILE:<28s} Project marker")
    print(f"  {'configs/pipeline.yml':<28s} Pipeline configuration template")
    print(f"  {'configs/pywatershed_run.yml':<28s} pywatershed run config template")
    print(f"  {'data/fabrics/':<28s} Target polygon files")
    for cat in categories:
        print(f"  {'data/' + cat + '/':<28s} Dataset downloads")
    print(f"  {'output/':<28s} Pipeline results")
    print(f"  {'models/':<28s} Model exports")
    print(f"  {'models/pywatershed/':<28s} pywatershed model files")
    print(f"  {'.gitignore':<28s} Git ignore rules")
    print()
    print("Next steps:")
    print("  1. Place your fabric files in data/fabrics/")
    print("  2. Edit configs/pipeline.yml (dataset sources, domain)")
    print("  3. Edit configs/pywatershed_run.yml (time period, output options)")
    print("  4. Run the pipeline:")
    print("       hydro-param run configs/pipeline.yml")
    print("  5. Run pywatershed parameterization:")
    print("       hydro-param pywatershed run configs/pywatershed_run.yml")
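The overwrite policy can be sketched in isolation: config templates are written only when missing, while .gitignore is rewritten on every run. The helper name `write_if_missing` and the file contents are hypothetical and not part of hydro_param.

```python
import tempfile
from pathlib import Path


def write_if_missing(path: Path, content: str) -> bool:
    """Hypothetical helper: write content only when path does not exist."""
    if path.exists():
        return False
    path.write_text(content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "pipeline.yml"
    gitignore = Path(tmp) / ".gitignore"

    # First init: both files are written.
    first_write = write_if_missing(config, "# template\n")
    gitignore.write_text("output/\n")

    # The user edits both files.
    config.write_text("# edited by user\n")
    gitignore.write_text("# local tweak\n")

    # Re-init (as with force=True): config kept, .gitignore replaced.
    second_write = write_if_missing(config, "# template\n")
    gitignore.write_text("output/\n")

    config_text = config.read_text()
    gitignore_text = gitignore.read_text()

print(first_write, second_write)
```

One consequence of this policy is that hand edits to .gitignore do not survive a re-initialization, whereas edits to the config templates do.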
