batching
Spatial batching: assign features to spatially contiguous groups.
Group polygon features into spatially contiguous batches using KD-tree recursive bisection. This ensures that each batch's bounding box is compact, which is critical for efficient spatial subsetting of high-resolution source rasters (e.g., 3DEP at 10 m, gNATSGO at 30 m).
Without spatial batching, batches of arbitrarily ordered fabric features would have large bounding boxes, so the pipeline would fetch far more raster data than needed, leading to excessive memory use and slow processing. The KD-tree approach (design.md section 5.5.1, Approach 5) provides O(n log n) partitioning with guaranteed spatial locality.
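The idea of recursive bisection can be illustrated with a minimal pure-Python sketch. It is not the module's implementation: the function name, signature, and stopping rule (stop once a group holds at most `batch_size` points) are assumptions made for illustration, and the median split is done with a plain sort rather than a dedicated KD-tree structure.

```python
def recursive_bisect(points, indices, batch_size, depth=0):
    """Split point indices at the median along alternating axes.

    Hypothetical sketch: the signature and stopping rule are assumptions,
    not the actual hydro_param implementation.
    """
    # Stop once the group is small enough (guard against batch_size < 1).
    if len(indices) <= max(1, batch_size):
        return [indices]
    axis = depth % 2  # alternate between x (0) and y (1)
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return (
        recursive_bisect(points, ordered[:mid], batch_size, depth + 1)
        + recursive_bisect(points, ordered[mid:], batch_size, depth + 1)
    )


def assign_batch_ids(points, batch_size):
    """Return one batch ID per point, in input order."""
    groups = recursive_bisect(points, list(range(len(points))), batch_size)
    batch_ids = [0] * len(points)
    for batch_id, group in enumerate(groups):
        for i in group:
            batch_ids[i] = batch_id
    return batch_ids


# 16 centroids on a 4x4 grid, with a target of 4 features per batch.
centroids = [(float(x), float(y)) for x in range(4) for y in range(4)]
ids = assign_batch_ids(centroids, batch_size=4)
```

On this grid the sketch yields four batches of four points, each covering one 2x2 block, so neighbouring features share a batch ID.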
See Also
design.md : Section 5.5.1 (spatial batching approaches and trade-offs).
hydro_param.pipeline : Orchestrator that processes batches sequentially.
References
.. [1] design.md section 5.5.1 -- KD-tree recursive bisection (Approach 5).
spatial_batch
Assign spatially contiguous batch IDs via KD-tree recursive bisection.
Group polygon features into spatially compact batches so that each batch's bounding box covers a small geographic area. This is the primary entry point for spatial batching in the pipeline, called during stage 1 (resolve fabric) before any data access occurs.
Compact bounding boxes are critical for memory efficiency: when the pipeline fetches source rasters clipped to a batch's bbox, a tight bbox means less data loaded into memory. For high-resolution datasets like gNATSGO (30 m, ~1.25 GB per variable), this prevents OOM errors.
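To make the effect of bounding-box compactness concrete, the following sketch compares the total bounding-box area of spatially contiguous batches with that of batches formed from arbitrary file order. It uses a synthetic 8x8 grid of centroids in arbitrary units. Bounding-box area stands in for the amount of raster data fetched, which is a simplifying assumption, and none of the names below come from hydro_param.

```python
def bbox_area(points):
    """Area of the axis-aligned bounding box around a set of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# 64 centroids on an 8x8 grid (arbitrary units).
grid = [(x, y) for x in range(8) for y in range(8)]

# Contiguous batches: the four 4x4 quadrants of the grid.
contiguous = {}
for x, y in grid:
    contiguous.setdefault((x // 4, y // 4), []).append((x, y))

# Arbitrary batches: every fourth feature in file order.
arbitrary = {}
for i, p in enumerate(grid):
    arbitrary.setdefault(i % 4, []).append(p)

contiguous_total = sum(bbox_area(pts) for pts in contiguous.values())
arbitrary_total = sum(bbox_area(pts) for pts in arbitrary.values())
```

Both groupings hold 16 features per batch, yet the contiguous batches cover a total bounding-box area of 36 against 112 for the interleaved ones, roughly a threefold difference in the area that would need to be read.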
| PARAMETER | DESCRIPTION |
|---|---|
| gdf | Target fabric with polygon geometries (Polygon or MultiPolygon). May be in any CRS -- centroids are computed for spatial grouping only (approximate centroids are sufficient). |
| batch_size | Target number of features per batch. The actual batch sizes will vary due to the recursive bisection algorithm (typically within ...). |
| RETURNS | DESCRIPTION |
|---|---|
| GeoDataFrame | Copy of input with a ... |
Notes
For fabrics with a geographic CRS (e.g., EPSG:4326), the centroid
computation emits a UserWarning about geographic CRS accuracy.
This warning is suppressed because only approximate centroids are
needed for spatial grouping -- the batch boundaries do not need to be
geometrically precise.
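One common way to silence a specific warning category around a single call is the standard library's `warnings.catch_warnings` context manager. The sketch below shows that pattern only. How `spatial_batch` actually suppresses the warning is not stated here, and `approximate_centroid` is a hypothetical stand-in that mimics a centroid computation emitting a UserWarning.

```python
import warnings


def approximate_centroid(coords):
    """Stand-in for a centroid computation that warns on a geographic CRS.

    Hypothetical helper: it averages the vertices and always warns, to
    mimic the UserWarning described above.
    """
    warnings.warn(
        "centroid computed in a geographic CRS may be inaccurate", UserWarning
    )
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# A one-degree square in lon/lat coordinates.
square = [(-105.0, 40.0), (-104.0, 40.0), (-104.0, 41.0), (-105.0, 41.0)]

with warnings.catch_warnings():
    # Ignore UserWarning only inside this block; filters are restored on exit.
    warnings.simplefilter("ignore", UserWarning)
    centroid = approximate_centroid(square)
```

Scoping the filter to the `with` block keeps the suppression local, so unrelated UserWarnings raised elsewhere still reach the caller.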
The recursion depth is computed as ceil(log2(n_features / batch_size))
to produce approximately the right number of batches. The
min_batch_size is set to batch_size / 2 to prevent excessive
fragmentation.
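The arithmetic above can be checked with a short worked example. The helper name `bisection_plan` and the clamp to a non-negative depth are assumptions for illustration. Only the two formulas, ceil(log2(n_features / batch_size)) and batch_size / 2, come from the notes.

```python
from math import ceil, log2


def bisection_plan(n_features, batch_size):
    """Recursion depth, batch-count ceiling, and minimum batch size.

    Hypothetical helper; clamping the depth at 0 is an assumption for
    fabrics smaller than one batch.
    """
    depth = max(0, ceil(log2(n_features / batch_size)))
    return {
        "depth": depth,
        "max_batches": 2 ** depth,  # each level can double the batch count
        "min_batch_size": batch_size / 2,
    }


# 765 features with a target of 200 per batch.
plan = bisection_plan(765, 200)
```

For 765 features and `batch_size=200`, log2(3.825) is about 1.94, so the depth is 2 and at most 4 batches result, each holding roughly 191 features if the splits are even.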
Examples:
>>> import geopandas as gpd
>>> from hydro_param.batching import spatial_batch
>>> fabric = gpd.read_file("nhru.gpkg")
>>> batched = spatial_batch(fabric, batch_size=200)
>>> batched["batch_id"].nunique()  # for ~765 features with batch_size=200
4
See Also
_recursive_bisect : The recursive partitioning algorithm.
hydro_param.pipeline : Uses batch IDs to iterate over spatial groups.
Source code in src/hydro_param/batching.py