Temporal Aggregation for Land-Use Change

In automated Measurement, Reporting, and Verification (MRV) systems, detecting land-use change requires moving beyond single-pass scene analysis to robust time-series synthesis. Temporal Aggregation for Land-Use Change serves as the critical bridge between raw optical acquisitions and statistically defensible emission factors. Within the broader Satellite Imagery Processing for Emissions Tracking architecture, this stage transforms noisy, irregularly sampled satellite observations into clean, periodic composites that align with IPCC reporting cycles and corporate ESG disclosure requirements. The aggregation routine must reconcile varying revisit frequencies, atmospheric interference, and sensor-specific spectral responses while preserving the abrupt spectral signatures of anthropogenic disturbance.

Pre-Aggregation Quality Control & Validity Tracking

Raw multispectral tiles suffer from cloud contamination, shadow occlusion, and seasonal phenology shifts that introduce severe bias into unfiltered time series. Before any aggregation logic executes, rigorous pixel-level quality assessment must occur. Implementing robust Sentinel-2 & Landsat Cloud Masking Workflows ensures that only valid surface reflectance values contribute to the temporal stack. Without strict cloud, shadow, and aerosol filtering, median or harmonic composites will inherit spectral noise, directly compromising downstream carbon stock estimates and violating MRV auditability standards.

The masking layer is applied as a boolean validity mask that propagates through the aggregation window. This enables per-pixel observation counting, which becomes the primary gating mechanism for statistical confidence. Production systems must track valid_observation_count alongside spectral bands to satisfy third-party verification requirements.

Aggregation Architecture & Memory Management

The core aggregation routine must handle irregular acquisition dates, missing data gaps, and varying spatial resolutions while remaining memory-efficient at continental scale. Production deployments rely on chunked xarray operations backed by dask, with explicit CRS validation and lazy evaluation. When cloud-free observations fall below a configurable threshold within a monthly window, the pipeline must avoid generating artificial spectral values. Instead, it triggers a fallback sequence that either carries forward the last valid composite or applies constrained temporal interpolation, logging the fallback activation for compliance traceability.

This deterministic approach directly feeds Deforestation Alert Generation Pipelines with stable baselines, preventing false-positive disturbance flags caused by atmospheric artifacts or sensor drift.

Production Implementation

The following implementation demonstrates a production-grade temporal aggregation flow using prefect for orchestration, xarray/dask for chunked computation, and rasterio for explicit CRS handling. Structured logging captures fallback activations and observation density, which are required for GHG Protocol and Verra VM0042 audit trails.

import os
import logging
import json
import numpy as np
import xarray as xr
import rasterio
import rioxarray  # registers the xarray ".rio" accessor
from rasterio.crs import CRS
from dask import config as dask_config
from prefect import flow, task
from pathlib import Path
from typing import Optional, Tuple, Dict, Any

# Structured logging configuration for compliance traceability
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    handlers=[logging.StreamHandler()]
)
logger = logging.getLogger("mrv.temporal_aggregation")

# Dask memory optimization for large-scale tiling
dask_config.set({
    "scheduler": "threads",
    "array.slicing.split_large_chunks": False,
    "array.chunk-size": "100MB",
})

VALID_COUNT_THRESHOLD = 4
FALLBACK_MODE = "carry_forward"  # Options: "carry_forward", "interpolate"

@task(retries=2, retry_delay_seconds=10)
def load_and_validate_stack(tile_paths: list[str], target_crs: str = "EPSG:4326") -> xr.Dataset:
    """Load, align, and attach validity masks to a temporal stack."""
    logger.info("Loading temporal stack: %d tiles", len(tile_paths))
    stack_list = []
    
    for path in sorted(tile_paths):
        with rasterio.open(path) as src:
            if str(src.crs) != target_crs:
                raise ValueError(f"CRS mismatch: {src.crs} vs target {target_crs}")
            
            # Read bands 1-4 (B, G, R, NIR) + QA band
            data = src.read()
            meta = src.meta
            
        # Assume last band is QA/cloud mask (1=valid, 0=invalid)
        valid_mask = data[-1, :, :] == 1
        spectral_data = data[:-1, :, :]
        
        ds = xr.Dataset(
            data_vars={
                "reflectance": (["band", "y", "x"], spectral_data),
                "valid_mask": (["y", "x"], valid_mask.astype(np.uint8))
            },
            coords={
                "band": [1, 2, 3, 4],
                "y": np.arange(src.height),
                "x": np.arange(src.width),
                "time": [meta.get("acquisition_date", "unknown")]
            }
        )
        stack_list.append(ds)
        
    aligned = xr.concat(stack_list, dim="time")
    aligned.rio.write_crs(target_crs, inplace=True)
    return aligned.chunk({"y": 512, "x": 512, "time": -1})

@task
def compute_temporal_aggregate(
    stack: xr.Dataset,
    threshold: int = VALID_COUNT_THRESHOLD,
    fallback: str = FALLBACK_MODE
) -> Tuple[xr.Dataset, Dict[str, Any]]:
    """Aggregate stack using median, enforce thresholds, and apply fallback logic."""
    logger.info("Computing temporal aggregation with threshold=%d, fallback=%s", threshold, fallback)
    
    # Count valid observations per pixel
    valid_counts = stack["valid_mask"].sum(dim="time")
    
    # Mask invalid observations before aggregation
    masked_reflectance = stack["reflectance"].where(stack["valid_mask"] == 1)
    
    # Primary aggregation: temporal median
    composite = masked_reflectance.median(dim="time")
    
    # Fallback routing
    fallback_triggered = False
    low_confidence_mask = valid_counts < threshold
    
    if low_confidence_mask.any().compute():
        fallback_triggered = True
        logger.warning("Low observation count detected in %d pixels. Applying fallback.", low_confidence_mask.sum().compute())
        
        if fallback == "carry_forward":
            # Use last valid acquisition for low-confidence pixels
            last_valid = masked_reflectance.isel(time=-1)
            composite = composite.where(~low_confidence_mask, last_valid)
        elif fallback == "interpolate":
            # Constrained linear interpolation (requires sorted time dim)
            composite = composite.where(~low_confidence_mask, masked_reflectance.interpolate_na(dim="time"))
        else:
            raise ValueError(f"Unsupported fallback mode: {fallback}")
            
    # Attach compliance metadata
    compliance_meta = {
        "aggregation_method": "median",
        "valid_count_threshold": threshold,
        "fallback_mode": fallback,
        "fallback_triggered": bool(fallback_triggered),
        "min_valid_count": int(valid_counts.min().compute()),
        "max_valid_count": int(valid_counts.max().compute()),
        "ipcc_tier_alignment": "Tier 3 (spatially explicit)",
        "audit_timestamp": str(np.datetime64("now", "s"))
    }
    
    # Attach metadata to dataset attributes for downstream verification
    composite.attrs["compliance_metadata"] = json.dumps(compliance_meta)
    composite["valid_observation_count"] = valid_counts
    
    return composite, compliance_meta

@flow(name="mrv_temporal_aggregation")
def run_aggregation_pipeline(tile_directory: str, output_path: str):
    """Orchestrate end-to-end temporal aggregation for MRV compliance."""
    tiles = [str(p) for p in Path(tile_directory).glob("*.tif")]
    if not tiles:
        raise FileNotFoundError(f"No tiles found in {tile_directory}")
        
    stack = load_and_validate_stack(tiles)
    composite, meta = compute_temporal_aggregate(stack)
    
    logger.info("Writing aggregated composite to %s", output_path)
    # Export with explicit CRS and compliance attributes
    composite.rio.to_raster(output_path, driver="GTiff", compress="DEFLATE", dtype="float32")
    logger.info("Pipeline complete. Compliance metadata: %s", meta)
    return meta

if __name__ == "__main__":
    # Example execution
    # run_aggregation_pipeline("/data/input_tiles", "/data/output/monthly_composite.tif")
    pass

Debugging & Validation Protocols

Temporal aggregation introduces several failure modes that require explicit debugging strategies:

  1. Chunk Boundary Artifacts: Dask chunking can produce edge discontinuities if overlapping tiles aren’t padded correctly. Always validate composite.attrs["spatial_ref"] and run a rasterio.merge sanity check on adjacent outputs.
  2. Spectral Drift from Fallbacks: Carry-forward logic preserves spatial continuity but masks phenological change. Log fallback_triggered ratios per administrative boundary. If >15% of pixels trigger fallbacks in a reporting period, flag for manual auditor review.
  3. Temporal Misalignment: Ensure acquisition timestamps are normalized to UTC and sorted before xr.concat. Unsorted time dimensions cause interpolate_na to produce inverted gradients.
  4. Validation Against Ground Truth: Cross-reference aggregated NDVI/EVI trajectories with Monthly Temporal Aggregation of NDVI for Land Cover Change benchmarks. Use xarray’s computation engine to calculate RMSE against in-situ flux tower data where available.

Regulatory & Verification Alignment

Outputs from this aggregation stage must map directly to recognized carbon accounting frameworks:

  • IPCC Tier 3 Methodology: Spatially explicit, high-frequency composites satisfy the requirement for dynamic activity data. The valid_observation_count layer provides the uncertainty quantification needed for Tier 3 error propagation.
  • GHG Protocol Land Sector Guidance: Fallback activation logs serve as explicit documentation of data gaps, satisfying the transparency principle for corporate Scope 3 land-use emissions reporting.
  • Verra VM0042 & ART Jurisdictional REDD+: The boolean validity mask and threshold-enforced composites align with baseline establishment requirements. Auditors require immutable logs of fallback_triggered states to verify that emission reductions aren’t artificially inflated by interpolated spectral values.
  • EU CSRD/ESRS E1: Temporal composites must be reproducible and version-controlled. Embedding compliance_metadata directly into raster attributes ensures that downstream ESG reporting tools inherit the exact aggregation parameters used during calculation.

By enforcing strict validity thresholds, deterministic fallback routing, and explicit audit metadata, temporal aggregation transitions from a purely computational step to a defensible MRV control point. This ensures that land-use change signals are statistically robust, regulatorily compliant, and ready for automated carbon accounting workflows.