Machine Learning Models Inaccurately Predict Current and Future High-Latitude Carbon Balances

Process model simulations reveal shortcomings in machine learning techniques commonly used to upscale and forecast ecosystem processes.

Objective

The high-latitude carbon cycle is an important, complex, and highly uncertain component of the global climate system. A growing number of studies have relied on machine learning methods to create regional estimates of current and future ecosystem properties (e.g., carbon balance) based on a small number of site measurements. Because there are few observational data, machine learning model predictions are rarely tested against independent measurements. In this study, a novel approach is used to uncover large biases in machine learning predictions of current and future high-latitude carbon balance.

New Science

In this study, carbon fluxes and environmental data are simulated across Alaska using ecosys, a process-rich terrestrial ecosystem model. Boosted regression tree machine learning algorithms are then applied to different subsets of simulated data that mirror and expand upon existing AmeriFlux eddy-covariance data availability. Machine learning predictions across the entire domain are compared to simulated data to understand how variation in site coverage and climate forcing impacts typical data-driven machine learning upscaling and forecasting approaches.
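The experimental design above (train on a subset of sites, predict across the whole domain, compare against the simulated target) can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the study's workflow: the "domain" is random synthetic data standing in for ecosys output, the two predictors and their relationship to net ecosystem exchange are invented, training sites are drawn at random rather than matched to AmeriFlux locations, and the learner is a minimal gradient-boosted ensemble of one-split trees rather than the boosted regression tree implementation used by the authors. All function names are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def fit_stump(X, residuals, max_thresholds=16):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        values = values[::max(1, len(values) // max_thresholds)]  # thin candidates
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = _mean(left), _mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:] if best else None


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = _mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has two distinct values to split on
            break
        j, thr, lm, rm = stump
        stumps.append(stump)
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for row, p in zip(X, pred)]
    return base, stumps, learning_rate


def predict(model, X):
    base, stumps, lr = model
    return [base + lr * sum(lm if row[j] <= thr else rm for j, thr, lm, rm in stumps)
            for row in X]


def make_toy_domain(n_cells=400, seed=0):
    """Synthetic stand-in for simulated grid cells; the relationship is invented."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_cells):
        temp = rng.uniform(-15.0, 5.0)      # hypothetical air temperature, deg C
        precip = rng.uniform(100.0, 800.0)  # hypothetical precipitation, mm
        nee = -2.0 * temp - 0.05 * precip + rng.gauss(0.0, 5.0)
        X.append((temp, precip))
        y.append(nee)
    return X, y


def domain_bias(X, y, n_sites, seed=1):
    """Train on n_sites randomly chosen cells; return mean(prediction) - mean(target)."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), n_sites)
    model = fit_boosted_stumps([X[i] for i in idx], [y[i] for i in idx])
    return _mean(predict(model, X)) - _mean(y)


X, y = make_toy_domain()
for n_sites in (15, 240):
    print(n_sites, "training sites -> domain-mean bias:", round(domain_bias(X, y, n_sites), 2))
```

Because the full target field is known everywhere in a setup like this, the domain-wide bias can be computed directly, which is what makes a simulated "truth" useful for testing upscaling methods. The numbers this toy prints say nothing about Alaska; only the structure of the comparison carries over.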

When current Alaska AmeriFlux data coverage is used for training, machine learning methods incorrectly predict that Alaska is a net carbon source. Machine learning predictions are improved with increased spatial coverage of the training dataset (e.g., bias is halved when 240 modeled sites are used instead of 15). However, even the machine learning model trained with 240 sites does not match the substantial increase in Alaska carbon sink strength simulated by ecosys throughout the 21st century. Convergent cross mapping is used to show that degradation of machine learning model projections can be ascribed to changes in atmospheric CO2, litter inputs, and vegetation composition. This study reveals large shortcomings in machine learning techniques commonly used to upscale and forecast ecosystem processes.

Impact

Machine learning methods are shown to incorrectly predict that Alaska is currently a net source of carbon when existing site coverage is used for training. This result mirrors a current mismatch between ecosystem model and machine learning estimates of high-latitude carbon balances and points to insufficient site coverage as a likely cause. This study also demonstrates that machine learning methods are unable to predict how ecosystem carbon fluxes will respond to climate change, because present-day training data cannot capture how the relationships between environmental drivers and carbon fluxes change over time. These findings highlight the need for cautious interpretation of machine learning predictions of current and future ecosystem processes.

Present-day net ecosystem exchange across Alaska for target simulated data (left) and machine learning predictions (right). The comparison reveals large discrepancies in the predictions. Green dots denote locations of sites used to train the machine learning model.

Citation(s)

Shirley, I. A., et al. "Machine Learning Models Inaccurately Predict Current and Future High-Latitude C Balances." Environmental Research Letters 18 (1), 014026 (2023). https://doi.org/10.1088/1748-9326/acacb2.

Funding

This research was supported by the Director, Office of Science, Office of Biological and Environmental Research of the US Department of Energy under Contract No. DE-AC02-05CH11231 as part of the Next-Generation Ecosystem Experiments (NGEE Arctic) project.

For more information, please contact:

Ian Shirley
