Emission Factor Uncertainty Mapping

Emission Factor Uncertainty Mapping is the variance-propagation stage that converts deterministic emission factors into spatially explicit confidence envelopes, turning a single carbon-stock number into a defensible probability surface — and it is the statistical backbone of the Spatial Modeling & Carbon Stock Validation framework. Unlike static tabular factors drawn from IPCC default tables, a mapped uncertainty field captures regional heterogeneity, measurement error, and model-induced variance across complex landscapes, then carries that variance forward as a first-class data product rather than a footnote appended after the numbers are produced.

This component sits directly downstream of biomass estimation from LiDAR & SAR fusion, inheriting the calibrated aboveground biomass rasters and their per-pixel error bands, and runs alongside ground-truth alignment for carbon models, which supplies the field-to-sensor variance ratios that anchor the propagation. It depends on deterministic CRS alignment established in the foundational MRV layer to keep every area-weighted variance term honest, and it emits envelopes that must satisfy MRV data lineage requirements before any figure reaches a registry. The core engineering challenge is propagating uncertainty through spatial operations without introducing artificial correlation or masking legitimate ecological signal.
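The equal-area requirement is easy to motivate numerically. In a geographic CRS a fixed-degree cell shrinks toward the poles, so summing per-pixel stock or variance without area weights over-weights high-latitude pixels. Below is a minimal sketch on a spherical Earth. The sphere is an approximation, since the WGS84 ellipsoid differs slightly, and the cell size is illustrative:

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical-Earth approximation


def geographic_cell_area_m2(lat_south_deg: float, size_deg: float) -> float:
    """Area of a size_deg x size_deg lat/lon cell whose southern edge sits at lat_south_deg.

    On a sphere the cell area is R^2 * d_lambda * (sin(phi_north) - sin(phi_south)).
    """
    phi_s = math.radians(lat_south_deg)
    phi_n = math.radians(lat_south_deg + size_deg)
    return EARTH_RADIUS_M**2 * math.radians(size_deg) * (math.sin(phi_n) - math.sin(phi_s))


cell = 0.00025  # about 28 m on a side at the equator
ratio = geographic_cell_area_m2(60.0, cell) / geographic_cell_area_m2(0.0, cell)
print(f"at 60N the same {cell} deg cell covers {ratio:.3f} of its equatorial area")
```

At 60°N the same cell covers roughly half its equatorial area. That is why aggregation runs on an equal-area grid such as EPSG:6933, where every pixel covers the same ground area.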

Role in the MRV Workflow

Uncertainty mapping operates at the modeling-synchronization layer of the carbon accounting pipeline, where biomass estimates, soil-carbon proxies, and land-use-change indicators are fused with explicit variance envelopes prior to compliance export. It is a transformation with hard upstream contracts and unforgiving downstream consumers. Upstream, it requires every input raster to arrive with a machine-readable datum tag, a per-pixel error estimate, and a declared spatial support scale; a biomass layer that carries a mean but no sigma cannot be propagated, only guessed at. Downstream, the percentile bounds it emits feed conservative-accounting rules that decide how many tonnes a project may actually claim, so an envelope that is too tight inflates issuance and an envelope that is too wide strands legitimate credit.
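The upstream contract can be enforced as a gate that runs before any arithmetic. Below is a minimal sketch, assuming each input's metadata has already been read into a small record. `RasterContract` and `check_contract` are hypothetical names, not part of any library:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RasterContract:
    """Hypothetical metadata record; the field names are illustrative."""
    name: str
    crs: Optional[str]          # machine-readable datum tag, e.g. "EPSG:6933"
    has_sigma: bool             # does a per-pixel error estimate ship with the mean?
    support_m: Optional[float]  # declared spatial support scale, in metres


def check_contract(inputs: list[RasterContract]) -> list[str]:
    """Return every contract violation; an empty list means the inputs may propagate."""
    problems = []
    for r in inputs:
        if not r.crs:
            problems.append(f"{r.name}: missing datum tag")
        if not r.has_sigma:
            problems.append(f"{r.name}: mean without sigma cannot be propagated")
        if r.support_m is None or r.support_m <= 0:
            problems.append(f"{r.name}: undeclared support scale")
    return problems


violations = check_contract([
    RasterContract("agb", "EPSG:6933", has_sigma=True, support_m=30.0),
    RasterContract("soil_proxy", None, has_sigma=False, support_m=100.0),
])
if violations:
    print("refusing to propagate:", violations)
```

The gate collects every violation rather than stopping at the first, so a single run reports all of an input's gaps at once.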

The stage consumes the fused biomass and backscatter products handed over by the LiDAR/SAR fusion step and the plot-to-pixel calibration residuals produced during ground-truth alignment. It treats those residuals as the empirical seed for its covariance model rather than assuming a textbook variance. Where the fusion stage reports sensor disagreement or backscatter saturation, the uncertainty map must widen its bounds locally instead of averaging the signal away. This is the difference between a variance surface that reflects real epistemic limits and one that has been smoothed into a comforting but indefensible uniformity.
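Using the calibration residuals as the empirical seed for the covariance model typically starts with an empirical semivariogram. Below is a minimal sketch of the classical Matheron estimator. It assumes residuals arrive as plot coordinates in a projected CRS (metres) plus one residual per plot. The bin edges and synthetic data are illustrative, and fitting a parametric model to the binned values is left out:

```python
import numpy as np


def empirical_semivariogram(xy: np.ndarray, resid: np.ndarray,
                            bin_edges: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Matheron estimator: gamma(h) = mean of 0.5 * (r_i - r_j)^2 over pairs in each lag bin."""
    i, j = np.triu_indices(len(resid), k=1)          # each unordered pair once
    lag = np.linalg.norm(xy[i] - xy[j], axis=1)
    half_sq_diff = 0.5 * (resid[i] - resid[j]) ** 2
    which = np.digitize(lag, bin_edges) - 1          # lag-bin index per pair
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)                  # NaN marks bins with no pairs
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):                          # pairs beyond the last edge are dropped
        mask = which == b
        counts[b] = mask.sum()
        if counts[b]:
            gamma[b] = half_sq_diff[mask].mean()
    return gamma, counts


rng = np.random.default_rng(7)
xy = rng.uniform(0, 5000, size=(60, 2))              # 60 synthetic plot locations (m)
resid = rng.normal(0, 12.0, size=60)                 # synthetic plot-to-pixel residuals
gamma, counts = empirical_semivariogram(xy, resid, np.linspace(0, 3000, 7))
```

The pair counts are returned alongside the semivariance so that poorly populated lag bins, which are common in data-sparse strata, are visible before a model is fitted.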

Crucially, the synchronization stage produces more than a carbon raster — it produces a continuous, propagating field of confidence. Modern pipelines treat that field as a committed artifact: the mean estimate, the lower and upper bounds, and the propagation sigma are serialized together with the covariance assumptions and the propagation pathway, so a downstream verifier can reconstruct not just the number but the reasoning that bounded it. That contract — every carbon figure ships with an addressable, reproducible uncertainty envelope and a recorded propagation method — is what lets the threshold tuning for carbon stock baselines stage apply data-density-aware confidence intervals instead of a blanket global tolerance.
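One way to make the envelope a committed artifact is a sidecar manifest. The manifest binds the output to content hashes of its inputs and to the propagation settings. Below is a minimal sketch, assuming the envelope has already been written to disk. `write_envelope_manifest` and the manifest fields are hypothetical and do not follow any registry schema:

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash, so a verifier can confirm they hold the exact bytes that were certified."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def write_envelope_manifest(envelope: Path, inputs: dict[str, Path],
                            method: dict) -> Path:
    """Write a JSON sidecar binding the envelope to its inputs and propagation method."""
    manifest = {
        "envelope": {"file": envelope.name, "sha256": sha256_of(envelope)},
        "inputs": {role: {"file": p.name, "sha256": sha256_of(p)}
                   for role, p in sorted(inputs.items())},
        "propagation": method,  # e.g. covariance model, correlation length, seed
    }
    sidecar = envelope.with_name(envelope.name + ".manifest.json")
    # sort_keys makes the sidecar byte-identical for identical inputs.
    sidecar.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return sidecar
```

The manifest is deterministic, so re-running the export on unchanged inputs reproduces the sidecar byte for byte. A diff between two sidecars therefore points directly at what changed.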

Core Failure Modes

Three failure modes dominate production uncertainty mapping in spatial carbon MRV. Each has a distinct root cause and a measurable impact on the credibility of the reported envelope.

Artificial correlation inflation from independence assumptions. The textbook first-order Taylor propagation assumes inputs are independent, but geospatial layers are strongly autocorrelated: neighbouring pixels share sensor footprints, atmospheric conditions, and allometric calibrations. Treating spatially correlated pixels as independent understates the joint variance of any area-aggregated estimate, because independent errors cancel under summation while correlated errors accumulate. On a 50-hectare parcel aggregated from 30-meter pixels, ignoring positive spatial autocorrelation can collapse the reported standard error of the total stock by 60–80% relative to a covariance-corrected estimate, manufacturing confidence the data does not support and producing an envelope an auditor will reject on its face.
Spatial drift and grid misalignment inflating variance. When emission-factor grids, biomass rasters, and covariate layers are not snapped to a common grid, sub-pixel offsets between sensor footprints and the reference lattice register one phenomenon against another. The resulting mismatch is recorded as variance even though it is a geometry error, not a measurement error. A half-pixel drift across a heterogeneous land-cover boundary can double the apparent local sigma, widening the envelope precisely where the ecological signal is strongest and triggering unwarranted uncertainty deductions over otherwise well-characterized stands.
Overconfident bounds in data-sparse regions. A single global confidence interval applied uniformly masks the reality that field-plot density varies by orders of magnitude across a project. In a zone with five plots per hundred square kilometers, the empirical variance estimate is itself highly uncertain, yet a global threshold reports the same tight bound it reports over densely sampled terrain. The failure is silent: the envelope looks consistent across the project while concealing that whole strata were extrapolated from almost no ground truth, the exact condition third-party verifiers probe first when they sample for field validation.
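The arithmetic behind the first failure mode can be checked directly. The variance of a block total is 1ᵀΣ1. Under independence it reduces to N·σ², while every positive off-diagonal covariance adds to it. Below is a minimal sketch with the following assumptions:

- a 24 × 24 block of 30 m pixels (about 52 ha);
- unit per-pixel sigma;
- an exponential correlation exp(−d / range) with an illustrative 3-pixel range, not a fitted one.

```python
import numpy as np


def aggregate_se(n_side: int, sigma: float, corr_range_px: float) -> tuple[float, float]:
    """SE of an n_side x n_side block total: independent vs. exponentially correlated.

    Assumption: corr(i, j) = exp(-d_ij / corr_range_px), with d_ij in pixel units.
    """
    yy, xx = np.mgrid[0:n_side, 0:n_side]
    pts = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
    # Pairwise centre-to-centre distances (N x N).
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = sigma**2 * np.exp(-d / corr_range_px)
    se_independent = sigma * np.sqrt(len(pts))    # Var(sum) = N * sigma^2
    se_correlated = float(np.sqrt(cov.sum()))      # Var(sum) = 1^T Sigma 1
    return float(se_independent), se_correlated


se_ind, se_cor = aggregate_se(n_side=24, sigma=1.0, corr_range_px=3.0)
print(f"independent SE {se_ind:.1f} vs correlated SE {se_cor:.1f}: "
      f"independence understates it by {1 - se_ind / se_cor:.0%}")
```

The size of the gap depends on the correlation range and the block size. This is why the range should be fitted from residuals rather than assumed.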

Deterministic Implementation Architecture

The implementation below propagates variance at every task boundary with explicit validation gates. It uses prefect for orchestration, rioxarray/xarray with dask for chunked raster I/O, rasterio and pyproj for explicit spatial operations, and structlog for audit-ready JSON telemetry. The propagation does four things:

- it refuses untagged geometry;
- it rejects misaligned grids before any arithmetic;
- it injects spatial correlation into every Monte Carlo draw, approximated here by a Gaussian smoothing kernel rather than an explicit covariance matrix, so autocorrelation is preserved rather than assumed away;
- it widens bounds wherever plot density falls below a configured floor.

No indefensible envelope passes through silently.

import json
from datetime import datetime, timezone
from pathlib import Path

import numpy as np
import rasterio
import rioxarray  # registers the xarray ".rio" accessor + "rasterio" engine
import xarray as xr
import pyproj
import structlog
from prefect import flow, task

# Structured, audit-ready JSON telemetry — one event per propagation boundary.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),
    ]
)
log = structlog.get_logger()

# Validation gates — breaches raise rather than coercing a bad envelope downstream.
CANONICAL_CRS = "EPSG:6933"          # equal-area grid so variance is area-honest
MAX_GRID_OFFSET_PX = 0.10            # reject inputs misaligned by >0.1 pixel
MIN_PLOT_DENSITY = 5.0               # plots / 100 km² floor before bounds are widened
SPARSE_INFLATION = 1.75              # multiplier applied to sigma in under-sampled strata


@task
def align_and_load(ef_path: str, sigma_path: str,
                   target_crs: str = CANONICAL_CRS) -> xr.Dataset:
    """Load EF and its per-pixel sigma, gate grid offset, reproject to equal-area CRS."""
    with rasterio.open(ef_path) as ef_src, rasterio.open(sigma_path) as sig_src:
        if ef_src.crs is None or sig_src.crs is None:
            raise ValueError("untagged input; refusing to guess a datum.")
        # Origins are only comparable when both grids share CRS, shape and resolution.
        if (ef_src.crs != sig_src.crs or ef_src.shape != sig_src.shape
                or ef_src.res != sig_src.res):
            raise RuntimeError("EF and sigma grids differ in CRS, shape or resolution; "
                               "snap inputs to a common lattice before propagation.")
        # Sub-pixel offset between the two grids is a geometry error, not variance.
        ax, ay = ef_src.transform.c, ef_src.transform.f
        bx, by = sig_src.transform.c, sig_src.transform.f
        offset = max(abs(ax - bx) / abs(ef_src.transform.a),
                     abs(ay - by) / abs(ef_src.transform.e))
        if offset > MAX_GRID_OFFSET_PX:
            raise RuntimeError(
                f"grid offset {offset:.3f}px exceeds gate {MAX_GRID_OFFSET_PX}px; "
                "snap inputs to a common lattice before propagation.")

    target = pyproj.CRS.from_user_input(target_crs)
    # masked=True maps nodata to NaN; drop the singleton band axis rioxarray adds.
    ef = rioxarray.open_rasterio(ef_path, masked=True,
                                 chunks={"x": 1024, "y": 1024}).squeeze("band", drop=True)
    sigma = rioxarray.open_rasterio(sigma_path, masked=True,
                                    chunks={"x": 1024, "y": 1024}).squeeze("band", drop=True)
    # rio.write_crs would only relabel the grid; a real reprojection is needed for
    # pixel areas (and therefore area-weighted variance) to be equal.
    if pyproj.CRS.from_user_input(ef.rio.crs) != target:
        ef = ef.rio.reproject(target)
    # Put sigma on the exact EF lattice (identical coordinates) so the layers co-register.
    sigma = sigma.rio.reproject_match(ef).assign_coords(x=ef.x, y=ef.y)
    log.info("inputs_aligned", crs=target_crs, grid_offset_px=round(offset, 4))
    return xr.Dataset({"emission_factor": ef, "ef_sigma": sigma})


def _covariance_factor(shape: tuple, corr_range_px: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Approximate a spatially correlated draw via a separable Gaussian smoother.

    A full GMRF/variogram solve is preferred in production; this kernel preserves
    short-range autocorrelation so summed-area variance is not understated.
    """
    from scipy.ndimage import gaussian_filter
    white = rng.standard_normal(size=shape)
    correlated = gaussian_filter(white, sigma=corr_range_px, mode="reflect")
    # Renormalise so the per-pixel marginal variance stays (approximately) unit.
    correlated /= correlated.std() or 1.0
    return correlated


@task
def propagate(ds: xr.Dataset, plot_density: xr.DataArray,
              n_samples: int = 2500, corr_range_px: float = 3.0,
              seed: int = 0) -> xr.Dataset:
    """Covariance-corrected Monte Carlo propagation with sparse-region inflation."""
    ef = ds["emission_factor"].values
    sigma = ds["ef_sigma"].values
    if plot_density.shape != ef.shape:
        raise RuntimeError("plot-density grid does not match the EF grid.")

    # Widen sigma where ground truth is too thin to trust the empirical variance.
    # Written as ~(>=) so NaN density (no plot coverage) counts as sparse, not dense.
    sparse = ~(plot_density.values >= MIN_PLOT_DENSITY)
    sigma_eff = np.where(sparse, sigma * SPARSE_INFLATION, sigma)
    log.info("sparse_inflation_applied",
             sparse_fraction=round(float(sparse.mean()), 4),
             multiplier=SPARSE_INFLATION)

    # Seeded generator: identical inputs and seed reproduce the identical envelope.
    rng = np.random.default_rng(seed)
    # Spatially correlated draws; independence would collapse aggregate variance.
    # The full stack (n_samples x rows x cols) is held in memory: tile large rasters.
    draws = np.empty((n_samples, *ef.shape), dtype="float32")
    for i in range(n_samples):
        draws[i] = ef + sigma_eff * _covariance_factor(ef.shape, corr_range_px, rng)

    # One pass over the stack for all four percentiles; NaN pixels stay NaN.
    lower_95, lower_90, upper_90, upper_95 = np.percentile(
        draws, [2.5, 5, 95, 97.5], axis=0)
    ci_width = (upper_90 - lower_90) / np.where(ef == 0, np.nan, ef)

    dims, coords = ds["emission_factor"].dims, ds["emission_factor"].coords
    out = xr.Dataset({
        "carbon_stock_mean": ds["emission_factor"],
        "carbon_stock_lower_90": xr.DataArray(lower_90, dims=dims, coords=coords),
        "carbon_stock_upper_90": xr.DataArray(upper_90, dims=dims, coords=coords),
        "carbon_stock_lower_95": xr.DataArray(lower_95, dims=dims, coords=coords),
        "carbon_stock_upper_95": xr.DataArray(upper_95, dims=dims, coords=coords),
        "ci_width_relative": xr.DataArray(ci_width, dims=dims, coords=coords),
    })
    log.info("propagation_complete", samples=n_samples,
             corr_range_px=corr_range_px,
             median_ci_width=round(float(np.nanmedian(ci_width)), 4))
    return out


@flow(name="ef_uncertainty_mapping_flow")
def run_uncertainty_pipeline(ef_path: str, sigma_path: str,
                             plot_density_path: str, output_path: str,
                             n_samples: int = 2500, corr_range_px: float = 3.0,
                             seed: int = 0) -> Path:
    ds = align_and_load(ef_path, sigma_path)
    # The sparse mask is only meaningful on the EF lattice, so resample density onto it.
    density = (rioxarray.open_rasterio(plot_density_path, masked=True)
               .squeeze("band", drop=True)
               .rio.reproject_match(ds["emission_factor"]))
    result = propagate(ds, density, n_samples=n_samples,
                       corr_range_px=corr_range_px, seed=seed)

    # Lineage metadata travels with the envelope so the figure is reproducible.
    result.attrs.update({
        "compliance_standard": "Verra VM0042 / ISO 14064-3",
        "uncertainty_method": "spatial_monte_carlo_covariance_corrected",
        "confidence_levels": [0.90, 0.95],
        "canonical_crs": CANONICAL_CRS,
        "sparse_inflation": SPARSE_INFLATION,
        "monte_carlo_samples": n_samples,
        "corr_range_px": corr_range_px,
        "random_seed": seed,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    })
    result.to_netcdf(output_path, engine="netcdf4")
    log.info("envelope_exported", output=output_path,
             attrs=json.dumps(result.attrs, default=str))
    return Path(output_path)


if __name__ == "__main__":
    run_uncertainty_pipeline(
        "data/ef.tif", "data/ef_sigma.tif",
        "data/plot_density.tif", "output/carbon_stock_uncertainty.nc")

The propagation deliberately keeps the covariance step explicit. The separable Gaussian smoother shown is a tractable stand-in for a full variogram or Gaussian Markov Random Field (GMRF) solve. In production, the correlation length and anisotropy are fitted per ecological stratum from the alignment residuals rather than fixed globally. Note that `corr_range_px` is the standard deviation of the smoothing kernel, not a variogram range, so a fitted range must be converted before it is passed in. What must not change is the principle: the draw that feeds the percentile bounds carries spatial structure. The variance of any area-aggregated total then reflects the real, accumulating error instead of the artificially cancelled error an independent draw would report.
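For small grids, the exact draw that the smoother stands in for can be written directly. Factor the covariance as Σ = LLᵀ and set z = Lε with ε standard normal; z then has covariance Σ. Below is a minimal sketch, assuming an isotropic exponential covariance with a single range and no per-stratum fitting or anisotropy. The dense Cholesky factor costs O(n³) in the number of pixels, which is why smoother or GMRF approximations take over on real rasters:

```python
import numpy as np


def cholesky_field(n_rows: int, n_cols: int, range_px: float,
                   rng: np.random.Generator, jitter: float = 1e-10) -> np.ndarray:
    """One unit-variance Gaussian field with exp(-d / range_px) correlation, via Cholesky."""
    yy, xx = np.mgrid[0:n_rows, 0:n_cols]
    pts = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    cov = np.exp(-d / range_px)
    # A small diagonal jitter keeps the factorisation stable in floating point.
    chol = np.linalg.cholesky(cov + jitter * np.eye(len(pts)))
    return (chol @ rng.standard_normal(len(pts))).reshape(n_rows, n_cols)


field = cholesky_field(20, 20, range_px=3.0, rng=np.random.default_rng(0))
```

Such a field can replace `_covariance_factor` on a small test tile to check that the smoother's aggregate variance is in the right neighbourhood.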

Validation, Debugging & Compliance Mapping

Each design decision in the implementation maps to a specific regulatory control, which is what makes the envelope a submission artifact rather than a developer convenience. The table below ties the code’s outputs to the requirements they satisfy.

Technical output	Regulatory application	Verification step
90% CI width (`ci_width_relative`)	Uncertainty deduction factor under Verra VM0042 (e.g. a deduction triggered once relative CI exceeds the methodology tolerance)	Third-party auditor recomputes the deduction from the raster
Covariance-corrected variance surface	ISO 14064-3 reproducibility and conservativeness of the reported total stock	Independence assumption validated against logged correlation length
Sparse-region inflation flags	Stratified field-sampling design and CSRD ESRS E1 disclosure of estimation uncertainty	Ground-truth campaign targets the inflated strata
Equal-area CRS + grid-offset gate	Area-honest aggregation required for credit-volume consistency	Reprojection metadata checked against the canonical project grid

Map the outputs to controls as follows. The covariance-corrected variance surface answers ISO 14064-3, which expects a reported figure to be both reproducible and conservative; a propagation that preserves autocorrelation will not under-report the aggregate error, so the certified total stays on the conservative side of the true distribution. The relative CI width feeds Verra VM-series uncertainty deductions directly — VM0042 and related methodologies require a quantified deduction when monitored-parameter uncertainty exceeds a threshold, and exporting the width as a continuous raster lets the platform apply that deduction per stratum without manual intervention. The sparse-inflation flags and the disclosed confidence levels satisfy CSRD ESRS E1, which scrutinizes land-use and agriculture disclosures for transparent treatment of estimation uncertainty rather than a single unqualified number.
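Per-stratum application of the deduction can be sketched as follows. The deduction rule used here is a hypothetical placeholder: the mean relative half-width in each stratum, with any excess over `tolerance` deducted one-for-one. The `tolerance` value is equally hypothetical. VM0042 and related methodologies define their own thresholds and formulas, which this does not reproduce:

```python
import numpy as np


def stratum_deductions(ci_width_relative: np.ndarray, strata: np.ndarray,
                       tolerance: float) -> dict[int, float]:
    """Per-stratum deduction fraction from a relative 90% CI-width raster.

    Placeholder rule (hypothetical, not taken from any methodology): average the
    relative half-width over valid pixels and deduct any excess over `tolerance`.
    """
    out = {}
    for s in np.unique(strata):
        vals = ci_width_relative[(strata == s) & np.isfinite(ci_width_relative)]
        if vals.size == 0:
            out[int(s)] = float("nan")  # no valid pixels: flag it, don't guess
            continue
        half_width = 0.5 * float(vals.mean())
        out[int(s)] = max(0.0, half_width - tolerance)
    return out


ci = np.array([[0.10, 0.10], [0.50, np.nan]])
strata = np.array([[1, 1], [2, 2]])
deductions = stratum_deductions(ci, strata, tolerance=0.10)
```

Because the input is the exported `ci_width_relative` raster plus a strata map, an auditor can recompute each stratum's figure without re-running the propagation.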

For debugging, treat the median CI width, the sparse fraction, and the grid offset as monitored signals on every run, including the ones that pass, so a slowly drifting upstream export or a quietly resampled covariate surfaces as a trend long before any single run breaches tolerance. Three recurring silent failures deserve dedicated diagnostics: an independence assumption that survives into a covariance-aware codepath and collapses aggregate variance; a temporal mismatch between a multi-year emission-factor composite and a snapshot biomass layer, which should be propagated as an additive temporal-variance term rather than ignored; and a fallback to IPCC Tier 1 conservative defaults in under-sampled zones that fires so often it has quietly become the de-facto estimate. Validation should include variogram analysis of the propagated residuals, cross-validation against held-out field plots, and sensitivity testing of the correlation length, all calibrated against the field-to-sensor variance ratios produced during ground-truth alignment for carbon models.
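The trend monitoring and the temporal-variance term can be sketched as two small helpers. Both rest on stated assumptions:

- Each run's logged signals are collected into a dict, with names following the log events in the implementation (`median_ci_width`, `sparse_fraction`, `grid_offset_px`).
- The median ± k·MAD rule with k = 4 is an illustrative choice, not a standard.
- The temporal term assumes the temporal-mismatch error is independent of the measurement error, so the two variances add.

```python
import numpy as np


def drifting_signals(history: list[dict[str, float]], latest: dict[str, float],
                     k: float = 4.0) -> list[str]:
    """Flag signals whose latest value departs from their own run history.

    Robust rule (illustrative): |latest - median| > k * 1.4826 * MAD over prior runs.
    """
    flagged = []
    for name, value in latest.items():
        past = np.array([run[name] for run in history if name in run], dtype=float)
        if past.size < 5:
            continue  # too little history to judge drift
        med = np.median(past)
        mad = np.median(np.abs(past - med)) or 1e-12  # avoid a zero-width band
        if abs(value - med) > k * 1.4826 * mad:
            flagged.append(name)
    return flagged


def add_temporal_variance(sigma: np.ndarray, sigma_temporal: np.ndarray) -> np.ndarray:
    """Add the temporal-mismatch variance term: sqrt(sigma^2 + sigma_t^2)."""
    return np.sqrt(sigma**2 + sigma_temporal**2)


history = [{"median_ci_width": 0.21 + 0.002 * i, "sparse_fraction": 0.12}
           for i in range(10)]
flags = drifting_signals(history, {"median_ci_width": 0.35, "sparse_fraction": 0.12})
```

Running the check on passing runs as well as failing ones is what lets a slow upstream drift show up before any hard gate is breached.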

Conclusion

Emission Factor Uncertainty Mapping is what converts deterministic carbon accounting into a statistically defensible spatial science. By treating uncertainty as a continuous, propagating field — preserving spatial covariance in the Monte Carlo draw, gating grid offsets before they masquerade as variance, and widening bounds wherever ground truth is too thin to trust — engineering teams eliminate the manufactured confidence that historically triggers verifier rejection. The result is a synchronization stage whose every carbon figure ships with a reproducible envelope, a recorded propagation method, and a deduction an auditor can recompute from the raster. To apply those envelopes as data-density-aware confidence intervals on the baseline itself, continue with Threshold Tuning for Carbon Stock Baselines.

Related

Spatial Modeling & Carbon Stock Validation — the parent framework this variance layer feeds.
Biomass Estimation from LiDAR & SAR Fusion — the upstream stage supplying the AGB rasters and per-pixel error bands.
Ground Truth Alignment for Carbon Models — the plot-to-pixel calibration that anchors the covariance model.
Threshold Tuning for Carbon Stock Baselines — the downstream consumer that applies these envelopes to baseline selection.
MRV Data Lineage & Provenance Tracking — the evidence layer every exported envelope must satisfy.
