data_access¶
data_access
¶
Data access: STAC COG loading, local GeoTIFF loading, terrain derivation.
Provide unified functions for fetching raster source data from multiple backends (Planetary Computer STAC, local GeoTIFFs, remote VRTs via GDAL vsicurl) and for deriving terrain variables (slope, aspect) from elevation grids using pure numpy.
The module supports three of the five processing strategies defined in the pipeline architecture:
stac_cog-- :func:fetch_stac_cogvia Planetary Computer or USGS GDP STAClocal_tiff-- :func:fetch_local_tifffor local files or HTTP(S) COGsclimr_cat-- :func:build_climr_cat_dictfor ClimateR-Catalog OPeNDAP
The remaining two strategies (nhgf_stac static and temporal) are handled
directly in :mod:hydro_param.processing via gdptools NHGFStacTiffData
and NHGFStacData classes.
See Also
design.md : Sections 6.12 (local data preference) and 11.4 (access strategies). hydro_param.processing : Zonal statistics and temporal aggregation.
References
.. [1] design.md section 6.12 -- Local data over remote services. .. [2] design.md section 11.4 -- Data access strategy details.
derive_slope
¶
derive_slope(
elevation: DataArray,
method: str = "horn",
*,
x_coord: str = "x",
y_coord: str = "y",
) -> xr.DataArray
Compute terrain slope in degrees from an elevation DataArray.
Derive slope from a DEM raster using the Horn (1981) finite-difference
method implemented via numpy.gradient. The result is used by
pywatershed derivation step 3 (topography) for parameters such as
hru_slope.
Elevation units do not matter for slope computation as long as they
are consistent with the cell size (both converted to meters internally
via :func:_cell_sizes_meters).
| PARAMETER | DESCRIPTION |
|---|---|
elevation
|
2-D elevation raster with spatial coordinates. May be in any CRS; geographic CRS coordinates (degrees) are automatically converted to meters for gradient computation.
TYPE:
|
method
|
Derivation method. Only
TYPE:
|
x_coord
|
Name of the x coordinate dimension in the DataArray.
TYPE:
|
y_coord
|
Name of the y coordinate dimension in the DataArray.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray
|
Slope in degrees [0, 90], with the same shape, coordinates, and
CRS as the input elevation. Output attributes include
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
Notes
The Horn method computes slope as arctan(sqrt(dz/dx² + dz/dy²)),
where the partial derivatives are estimated by numpy.gradient
using second-order central differences.
References
.. [1] Horn, B.K.P. (1981). "Hill shading and the reflectance map." Proceedings of the IEEE, 69(1), 14-47.
See Also
derive_aspect : Compute terrain aspect from the same elevation input. _cell_sizes_meters : Convert coordinate spacing to meters.
Source code in src/hydro_param/data_access.py
111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 | |
derive_aspect
¶
derive_aspect(
elevation: DataArray,
method: str = "horn",
*,
x_coord: str = "x",
y_coord: str = "y",
) -> xr.DataArray
Compute terrain aspect in degrees clockwise from north.
Derive aspect (downslope direction) from a DEM raster using the Horn
(1981) finite-difference method. The result is used by pywatershed
derivation step 3 (topography) for parameters such as hru_aspect.
Aspect is measured as a compass bearing: 0 degrees = north, 90 = east, 180 = south, 270 = west. Flat areas where both partial derivatives are zero will have an aspect of 0 degrees (north by convention).
| PARAMETER | DESCRIPTION |
|---|---|
elevation
|
2-D elevation raster with spatial coordinates. May be in any CRS; geographic CRS coordinates (degrees) are automatically converted to meters for gradient computation.
TYPE:
|
method
|
Derivation method. Only
TYPE:
|
x_coord
|
Name of the x coordinate dimension in the DataArray.
TYPE:
|
y_coord
|
Name of the y coordinate dimension in the DataArray.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray
|
Aspect in degrees [0, 360), with the same shape, coordinates, and
CRS as the input elevation. Output attributes include
|
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
Notes
Aspect is computed as atan2(-dz/dx, -dz/dy) (negated gradients
give the descent direction), then converted from mathematical angle
to compass bearing by adding 360 and taking modulo 360.
References
.. [1] Horn, B.K.P. (1981). "Hill shading and the reflectance map." Proceedings of the IEEE, 69(1), 14-47.
See Also
derive_slope : Compute terrain slope from the same elevation input. _cell_sizes_meters : Convert coordinate spacing to meters.
Source code in src/hydro_param/data_access.py
191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 | |
derive_sin_aspect
¶
derive_sin_aspect(
elevation: DataArray,
method: str = "horn",
*,
x_coord: str = "x",
y_coord: str = "y",
) -> xr.DataArray
Compute sine of terrain aspect for circular mean aggregation.
Derive per-pixel sin(aspect) from a DEM raster. Arithmetic zonal
mean of sine values is one component of the circular mean: the HRU-level
aspect is recovered as atan2(mean_sin, mean_cos).
| PARAMETER | DESCRIPTION |
|---|---|
elevation
|
2-D elevation raster with spatial coordinates.
TYPE:
|
method
|
Derivation method. Only
TYPE:
|
x_coord
|
Name of the x coordinate dimension in the DataArray.
TYPE:
|
y_coord
|
Name of the y coordinate dimension in the DataArray.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray
|
Sine of aspect, unitless [-1, 1], with the same shape, coordinates, and CRS as the input elevation. |
See Also
derive_cos_aspect : Cosine component for circular mean. derive_aspect : Raw aspect in degrees.
Source code in src/hydro_param/data_access.py
derive_cos_aspect
¶
derive_cos_aspect(
elevation: DataArray,
method: str = "horn",
*,
x_coord: str = "x",
y_coord: str = "y",
) -> xr.DataArray
Compute cosine of terrain aspect for circular mean aggregation.
Derive per-pixel cos(aspect) from a DEM raster. Arithmetic zonal
mean of cosine values is one component of the circular mean: the HRU-level
aspect is recovered as atan2(mean_sin, mean_cos).
| PARAMETER | DESCRIPTION |
|---|---|
elevation
|
2-D elevation raster with spatial coordinates.
TYPE:
|
method
|
Derivation method. Only
TYPE:
|
x_coord
|
Name of the x coordinate dimension in the DataArray.
TYPE:
|
y_coord
|
Name of the y coordinate dimension in the DataArray.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray
|
Cosine of aspect, unitless [-1, 1], with the same shape, coordinates, and CRS as the input elevation. |
See Also
derive_sin_aspect : Sine component for circular mean. derive_aspect : Raw aspect in degrees.
Source code in src/hydro_param/data_access.py
classify_usda_texture_raster
¶
Classify sand/silt/clay percentage rasters into USDA texture classes.
Wrapper around classify_usda_texture() that normalizes inputs
and handles xarray DataArray I/O. Returns a float64 raster with
class codes (1--12) suitable for categorical zonal statistics.
Before classification, three data-cleaning steps are applied:
- Negative clamping — ML-derived products (e.g. POLARIS) can produce negative regression artifacts. Negative values are clamped to zero with a WARNING log.
- Near-zero guard — Pixels where sand+silt+clay < 0.1% are treated as no-data (set to NaN) since they cannot be meaningfully normalized.
- Sum normalization — Remaining pixels are rescaled so that sand+silt+clay = 100%. This is necessary because POLARIS (and similar products) estimate each fraction independently, so pixel-level sums can deviate significantly from 100%. Normalization preserves the relative proportions — which correctly indicate the texture triangle region — while ensuring each pixel represents a valid soil composition.
| PARAMETER | DESCRIPTION |
|---|---|
sand
|
Sand content as percentage (0--100), 2-D raster.
TYPE:
|
silt
|
Silt content as percentage (0--100), 2-D raster.
TYPE:
|
clay
|
Clay content as percentage (0--100), 2-D raster.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray
|
Float64 raster with USDA texture class codes (1--12). Elements where any input is NaN remain NaN. |
Notes
The normalization tolerance (0.01%) is tighter than the downstream
classify_usda_texture() warning threshold (5%), so all pixels
reaching the classifier will have sums within floating-point
precision of 100%.
See Also
hydro_param.classification.classify_usda_texture : Core classifier. hydro_param.classification.USDA_TEXTURE_CLASSES : Code-to-name mapping.
Source code in src/hydro_param/data_access.py
371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 | |
align_rasters
¶
align_rasters(
sources: list[DataArray],
template: DataArray,
method: str = "nearest",
) -> list[xr.DataArray]
Align source rasters to a template grid via reprojection.
Non-template sources are reprojected to match the template's CRS,
resolution, and extent using rioxarray.reproject_match(). The
template itself is passed through unchanged.
| PARAMETER | DESCRIPTION |
|---|---|
sources
|
Source rasters to align.
TYPE:
|
template
|
Reference raster whose grid defines the target CRS, resolution, and extent.
TYPE:
|
method
|
Rasterio resampling method name (default
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[DataArray]
|
Aligned rasters in the same order as sources, all sharing the template's grid. |
Source code in src/hydro_param/data_access.py
apply_raster_operation
¶
Apply an arithmetic operation across source rasters via left-to-right fold.
| PARAMETER | DESCRIPTION |
|---|---|
sources
|
Two or more rasters to combine. Must be on the same grid
(use
TYPE:
|
operation
|
Arithmetic operation to apply.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray
|
Result of |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If operation is not one of the supported operations, or if sources contains fewer than 2 rasters. |
Source code in src/hydro_param/data_access.py
query_stac_items
¶
Query a STAC catalog and return matching items for a bounding box.
Handle client creation, optional Planetary Computer signing, spatial
search, and GSD (ground sample distance) filtering. The returned items
can be passed to fetch_stac_cog(..., items=...) to avoid repeated
queries when multiple variables share the same STAC collection within
a single processing batch.
Enables STAC query reuse across variables, reducing redundant network calls during batch processing.
| PARAMETER | DESCRIPTION |
|---|---|
entry
|
Registry entry with
TYPE:
|
bbox
|
Bounding box as
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[Any]
|
Matching STAC items (signed if |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
RuntimeError
|
If no STAC items match the given collection and bounding box. |
Notes
When entry.gsd is set, items are filtered to those matching the
target ground sample distance (e.g., 10 m for 3DEP 1/3 arc-second).
If no items match the GSD filter, all unfiltered items are returned
with a warning.
See Also
fetch_stac_cog : Load and mosaic COG data from the returned items.
Source code in src/hydro_param/data_access.py
585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 | |
fetch_stac_cog
¶
fetch_stac_cog(
entry: DatasetEntry,
bbox: list[float],
*,
asset_key: str | None = None,
items: list[Any] | None = None,
) -> xr.DataArray
Load Cloud Optimized GeoTIFF(s) from a STAC catalog, clipped to bbox.
Fetch one or more COG tiles from a STAC collection (e.g., 3DEP on
Planetary Computer, gNATSGO rasters), clip each to the bounding box,
and mosaic them into a single DataArray. This is the primary data
loader for the stac_cog processing strategy.
Multi-tile mosaicing is handled automatically when a bounding box spans multiple STAC items (common at DEM tile boundaries).
| PARAMETER | DESCRIPTION |
|---|---|
entry
|
Registry entry with
TYPE:
|
bbox
|
Bounding box as
TYPE:
|
asset_key
|
Per-variable STAC asset key override. When not
TYPE:
|
items
|
Pre-fetched STAC items from :func:
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray
|
2-D raster data clipped to the bounding box, in the source CRS (typically EPSG:4269 for 3DEP, EPSG:5070 for gNATSGO). The band dimension is squeezed out. |
| RAISES | DESCRIPTION |
|---|---|
KeyError
|
If |
RuntimeError
|
If no data remains after clipping all tiles to the bounding box. |
Notes
Tiles that do not overlap the bounding box after precise clipping are silently skipped (logged at DEBUG level). This is expected when STAC search returns items whose coarse footprints overlap but whose actual data does not.
See Also
query_stac_items : Pre-query STAC items for reuse across variables. fetch_local_tiff : Load data from local GeoTIFF files.
Source code in src/hydro_param/data_access.py
679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 | |
fetch_local_tiff
¶
fetch_local_tiff(
entry: DatasetEntry,
bbox: list[float],
*,
dataset_name: str = "unknown",
variable_source: str | None = None,
) -> xr.DataArray
Load a local GeoTIFF or remote COG/VRT clipped to a bounding box.
Read a raster file referenced by variable_source (per-variable
override) or entry.source (dataset-level default), clip it to the
bounding box, and return a 2-D DataArray. This is the primary data
loader for the local_tiff processing strategy.
Despite the name, this function supports both local file paths and
remote HTTP(S) URLs (opened via GDAL vsicurl). The function name
matches the strategy="local_tiff" enum value in the dataset
registry schema.
Source resolution order:
variable_source(per-variable override, e.g., for POLARIS where each soil property is a separate file)entry.source(dataset-level path from pipeline config)- Raise
ValueErrorwith download instructions if available
| PARAMETER | DESCRIPTION |
|---|---|
entry
|
Registry entry with
TYPE:
|
bbox
|
Bounding box as
TYPE:
|
dataset_name
|
Dataset name for use in error messages and download instructions.
TYPE:
|
variable_source
|
Per-variable source path or URL. Takes precedence over
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataArray
|
2-D raster data clipped to the bounding box, in the source CRS. The band dimension is squeezed out. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If no source is available (neither |
FileNotFoundError
|
If the source is a local path that does not exist on disk. |
RuntimeError
|
If no data remains after clipping to the bounding box, or if a remote URL fails to open. |
See Also
fetch_stac_cog : Load data from STAC-cataloged COGs. save_to_geotiff : Write a DataArray back to GeoTIFF format.
Source code in src/hydro_param/data_access.py
812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 | |
save_to_geotiff
¶
Write an xarray DataArray to a GeoTIFF file.
Save raster data as a single-band GeoTIFF via rioxarray. This is used by the pipeline to cache intermediate raster results (e.g., clipped source data for a batch) before passing them to gdptools ZonalGen.
The function temporarily removes _FillValue from attributes and
encoding to avoid conflicts with rioxarray's internal encoding, then
restores them afterward. This avoids a full .copy() of the
DataArray, which was identified as a memory bottleneck in PR #72.
| PARAMETER | DESCRIPTION |
|---|---|
da
|
Raster data with CRS information (via rioxarray accessor).
TYPE:
|
path
|
Output file path. Parent directory must exist.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Path
|
The output path (same as input), for convenience in chaining. |
Notes
Uses a sentinel-based pop/restore pattern instead of da.copy()
to avoid doubling memory usage for large rasters (e.g., gNATSGO
tiles at ~1.25 GB each). See PR #72 for the memory optimization
rationale.
See Also
fetch_local_tiff : Load a GeoTIFF back into a DataArray. fetch_stac_cog : Load COG data from STAC.
Source code in src/hydro_param/data_access.py
load_climr_catalog
¶
Load the ClimateR-Catalog as a pandas DataFrame.
Fetch and parse the ClimateR-Catalog parquet file, which indexes
1,700+ climate and environmental datasets with OPeNDAP endpoints.
The catalog is used by the climr_cat processing strategy to
resolve variable-level metadata for gdptools ClimRCatData.
| PARAMETER | DESCRIPTION |
|---|---|
catalog_url
|
URL to the catalog parquet file. Defaults to the June 2024 release from the mikejohnson51/climateR-catalogs repository.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
The full ClimateR catalog with columns including |
References
.. [1] Johnson, J.M. climateR-catalogs. https://github.com/mikejohnson51/climateR-catalogs
See Also
build_climr_cat_dict : Build per-variable dicts from the loaded catalog.
Source code in src/hydro_param/data_access.py
build_climr_cat_dict
¶
build_climr_cat_dict(
catalog: DataFrame,
catalog_id: str,
variable_names: list[str],
) -> dict[str, dict[str, Any]]
Build a source_cat_dict for gdptools ClimRCatData.
Extract per-variable metadata rows from the ClimateR-Catalog and
format them as the dictionary structure expected by gdptools
ClimRCatData constructor. Each entry contains the OPeNDAP URL,
variable name, CRS, and spatial extent needed for data access.
| PARAMETER | DESCRIPTION |
|---|---|
catalog
|
Full ClimateR catalog loaded by :func:
TYPE:
|
catalog_id
|
Dataset identifier in the catalog (e.g.,
TYPE:
|
variable_names
|
Variables to extract (e.g.,
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict[str, dict[str, Any]]
|
Mapping of variable name to catalog row as a dictionary. Each
value contains all columns from the catalog for that variable,
ready for |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
See Also
load_climr_catalog : Load the catalog DataFrame. TemporalProcessor.process_climr_cat : Use the dict for temporal processing.