Machine learning models inaccurately predict current and future high-latitude C balances

Introduction

Independent evaluation using process model simulations reveals shortcomings in machine learning techniques commonly used to upscale and forecast ecosystem processes.

Body

The high-latitude carbon cycle is an important, complex, and highly uncertain component of the global climate system. A growing number of studies rely on machine learning (ML) methods to create regional estimates of current and future ecosystem properties (e.g., carbon balance) from a small number of site measurements. Because so few data exist, however, the predictions of these ML models are never tested against independent measurements. This study used a novel evaluation approach to uncover large biases in ML predictions of current and future high-latitude carbon balance. First, carbon fluxes and environmental data across Alaska were simulated using ecosys, a process-rich terrestrial ecosystem model. Then, boosted regression tree ML algorithms were applied to training data configurations that mirror and expand upon existing AmeriFlux eddy covariance data availability. The team showed that an ML model trained on ecosys outputs from currently available Alaska AmeriFlux sites incorrectly predicts that Alaska, as modeled, is presently a net carbon source. Increasing the spatial coverage of the training dataset improved ML predictions, halving the bias when 240 modeled sites were used instead of 15. However, even this more accurate ML model incorrectly predicts Alaska carbon fluxes under 21st-century climate change, because changes in atmospheric CO2, litter inputs, and vegetation composition affect carbon fluxes in ways that cannot be inferred from the training data. This study reveals striking shortcomings in ML techniques commonly used to upscale and forecast ecosystem processes, highlighting the need for cautious interpretation of observation-based data products created using ML.
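The evaluation design described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `truth_flux` is a made-up surface standing in for process-model output (it is not ecosys), the unit square stands in for the region, and the boosted regression stumps are a minimal stand-in for the boosted regression trees used in the study. The sketch trains on a few clustered sites and on many well-spread sites, then compares each model's regional mean with the known simulated mean.

```python
import math
import random

def truth_flux(x, y):
    """Hypothetical 'process model' flux surface (an assumption, NOT ecosys)."""
    return math.sin(3.0 * x) * math.cos(2.0 * y) + 0.5 * x - 0.4

def fit_stump(X, r):
    """Least-squares regression stump (one split) fitted to residuals r."""
    n, total = len(X), sum(r)
    best = (-1.0, 0, math.inf, total / n, total / n)  # fallback: no split
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            # Maximizing this score is equivalent to minimizing squared error.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if score > best[0]:
                best = (score, j, lo, left_sum / k, (total - left_sum) / (n - k))
    return best[1:]

def fit_boosted(X, y, n_rounds=200, lr=0.1):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left, right = fit_stump(X, resid)
        stumps.append((j, thr, lr * left, lr * right))
        pred = [p + (lr * left if x[j] <= thr else lr * right)
                for p, x in zip(pred, X)]
    return base, stumps

def predict(model, x):
    base, stumps = model
    return base + sum(l if x[j] <= thr else r for j, thr, l, r in stumps)

def regional_bias(n_sites, rng, clustered):
    """ML regional mean minus simulated regional mean on a 30 x 30 grid."""
    grid = [(i / 29, j / 29) for i in range(30) for j in range(30)]
    true_mean = sum(truth_flux(x, y) for x, y in grid) / len(grid)
    if clustered:  # sites confined to one corner of the region (assumption)
        sites = [(0.6 + 0.4 * rng.random(), 0.4 * rng.random())
                 for _ in range(n_sites)]
    else:          # sites spread over the whole region
        sites = [(rng.random(), rng.random()) for _ in range(n_sites)]
    model = fit_boosted(sites, [truth_flux(x, y) for x, y in sites])
    ml_mean = sum(predict(model, p) for p in grid) / len(grid)
    return ml_mean - true_mean

rng = random.Random(0)
print("bias, 15 clustered sites:", regional_bias(15, rng, clustered=True))
print("bias, 240 spread sites:  ", regional_bias(240, rng, clustered=False))
```

Because the "truth" is simulated everywhere, the bias can be computed over the entire region rather than only at the training sites, which is the core of the evaluation idea. The printed values depend entirely on the toy surface and site layout and say nothing about Alaska.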

Citation: Shirley, I. A., Z. A. Mekonnen, R. F. Grant, B. Dafflon, and W. J. Riley. 2023. “Machine learning models inaccurately predict current and future high-latitude C balances.” Environmental Research Letters 18: 014026. https://doi.org/10.1088/1748-9326/acacb2.

Image with caption
Present-day net ecosystem exchange is shown across Alaska for the simulated data (left) and ML predictions (right), revealing very large discrepancies in the predictions. Black dots denote locations of sites used to train the machine learning model.


For more information, please contact:

Ian Shirley
