2023
Model calibration is the procedure of finding model settings such that simulated model outputs best match the observed data. Model calibration is necessary when the model parameters cannot be measured directly, as is the case for a wide range of environmental models whose parameters conceptually describe upscaled and effective physical processes. Model calibration is therefore an important step of environmental modeling, as a model that is never compared to a ground truth might otherwise provide essentially random outputs. Model calibration itself is often referred to as an art because of the many intertwined steps and decisions required before a calibration can be carried out or regarded as successful. This work provides a general guide specifying which steps a modeler needs to undertake, how to diagnose the success of each step, and how to identify the right action to revise steps that were not successful. The procedure is formalized into ten iterative steps that generally appear in calibration experiments. Each step of this “calibration life cycle” is either illustrated with an exemplary calibration experiment or accompanied by an explicit checklist the modeler can follow. These ten strategies are: (1) using sensitivity information to guide the calibration, (2) handling of parameters with constraints, (3) handling of data ranging orders of magnitude, (4) choosing the data to base the calibration on, (5) presenting various methods to sample model parameters, (6) finding appropriate parameter ranges, (7) choosing objective functions, (8) selecting a calibration algorithm, (9) determining the success and quality of a multi-objective calibration, and (10) providing a checklist to diagnose calibration performance using ideas introduced in the previous steps. Formally defining these strategies provides an overview of the calibration process and sheds light on the connections between the main ingredients of calibrating an environmental model, which should help novice modelers in particular to succeed.
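As a small illustration of the "choosing objective functions" step, the sketch below computes the Nash-Sutcliffe efficiency (NSE), one objective function commonly used in hydrology. The abstract does not name a specific objective function, so the choice of NSE and the function name `nash_sutcliffe` are assumptions made only for this example.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean.

    Illustrative helper only; not taken from the paper summarized above.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equally long")

    mean_obs = sum(observed) / len(observed)

    # Variance-like term of the observations around their mean
    denominator = sum((o - mean_obs) ** 2 for o in observed)
    if denominator == 0:
        raise ValueError("NSE is undefined when all observations are identical")

    # Sum of squared errors between observation and simulation
    numerator = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0]
    print(nash_sutcliffe(obs, obs))              # perfect simulation
    print(nash_sutcliffe(obs, [2.0, 2.0, 2.0]))  # simulation equal to the observed mean
```

A calibration algorithm would typically maximize such a score (or minimize its negative) over the model parameters.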
Learning from hydrological models’ challenges: A case study from the Nelson basin model intercomparison project
Mansoor Ahmed,
Tricia A. Stadnyk,
Alain Pietroniro,
Hervé Awoye,
A. R. Bajracharya,
Juliane Mai,
Bryan A. Tolson,
Helen C. Shen,
James R. Craig,
Melissa Gervais,
Kevin Sagan,
Shane G. Wruth,
Kristina Koenig,
Rajtantra Lilhare,
Stephen J. Déry,
Scott Pokorny,
Henry David Venema,
Ameer Muhammad,
Mahkameh Taheri
Journal of Hydrology, Volume 623
Intercomparison studies play an important, but limited role in understanding the usefulness and limitations of currently available hydrological models. Comparison studies are often limited to well-behaved hydrological regimes, where rainfall-runoff processes dominate the hydrological response. These efforts have not covered western Canada due to the difficulty in simulating that region’s complex cold region hydrology with varying spatiotemporal contributing areas. This intercomparison study is the first of a series of studies under the intercomparison project of the international and interprovincial transboundary Nelson-Churchill River Basin (NCRB) in North America (Nelson-MIP), which encompasses different ecozones with major areas of the non-contributing Prairie potholes, forests, glaciers, mountains, and permafrost. The performance of eight hydrological and land surface models is compared at different unregulated watersheds within the NCRB. This is done to assess the models’ streamflow performance and overall fidelity without and with calibration, to capture the underlying physics of the region and to better understand why models struggle to accurately simulate its hydrology. Results show that some of the participating models have difficulties in simulating streamflow and/or internal hydrological variables (e.g., evapotranspiration) over Prairie watersheds but most models performed well elsewhere. This stems from model structural deficiencies, despite the various models being well calibrated to observed streamflow. Some model structural changes are identified for the participating models for future improvement. The outcomes of this study offer guidance for practitioners for the accurate prediction of NCRB streamflow, and for increasing confidence in future projections of water resources supply and management.
2022
Model calibration and validation are critical in hydrological model robustness assessment. Unfortunately, the commonly used split‐sample test (SST) framework for data splitting requires modelers to make subjective decisions without clear guidelines. This large‐sample SST assessment study empirically assesses how different data splitting methods influence post‐validation model testing period performance, thereby identifying optimal data splitting methods under different conditions. This study investigates the performance of two lumped conceptual hydrological models calibrated and tested in 463 catchments across the United States using 50 different data splitting schemes. These schemes are established regarding the data availability, length and data recentness of continuous calibration sub‐periods (CSPs). A full‐period CSP is also included in the experiment, which skips model validation. The assessment approach is novel in multiple ways including how model building decisions are framed as a decision tree problem and viewing the model building process as a formal testing period classification problem, aiming to accurately predict model success/failure in the testing period. Results span different climate and catchment conditions across a 35‐year period with available data, making conclusions quite generalizable. Calibrating to older data and then validating models on newer data produces inferior model testing period performance in every single analysis conducted and should be avoided. Calibrating to the full available data and skipping model validation entirely is the most robust split‐sample decision. Experimental findings remain consistent no matter how model building factors (i.e., catchments, model types, data availability, and testing periods) are varied. Results strongly support revising the traditional split‐sample approach in hydrological modeling.
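To make the split-sample idea concrete, here is a minimal sketch of how a record of years could be divided into calibration, validation, and testing periods. The function `split_sample`, its parameters, and the example years are hypothetical and chosen for illustration; the 50 splitting schemes used in the study are not reproduced here.

```python
def split_sample(years, n_test, n_cal, recent=True):
    """Split a record into calibration, validation, and testing years.

    Hypothetical helper: the last n_test years are held out for testing.
    Of the remaining years, n_cal consecutive years are used for calibration,
    taken from the most recent end (recent=True) or the oldest end
    (recent=False); whatever is left is used for validation.
    """
    years = sorted(years)
    if n_test < 1 or n_cal < 1 or n_cal + n_test > len(years):
        raise ValueError("n_cal and n_test must be >= 1 and fit into the record")

    available, testing = years[:-n_test], years[-n_test:]
    if recent:
        calibration, validation = available[-n_cal:], available[:-n_cal]
    else:
        calibration, validation = available[:n_cal], available[n_cal:]
    return {"calibration": calibration, "validation": validation, "testing": testing}


if __name__ == "__main__":
    record = list(range(1980, 1990))
    print(split_sample(record, n_test=2, n_cal=5, recent=True))
    # Calibrating on all pre-testing years leaves the validation period empty
    print(split_sample(record, n_test=2, n_cal=8))
```

Setting `n_cal` to the full length of the pre-testing record corresponds to the full-period case described above, in which validation is skipped.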
The Great Lakes Runoff Intercomparison Project Phase 4: The Great Lakes (GRIP-GL)
Juliane Mai,
Helen C. Shen,
Bryan A. Tolson,
Étienne Gaborit,
Richard Arsenault,
James R. Craig,
Vincent Fortin,
Lauren M. Fry,
Martin Gauch,
Daniel Klotz,
Frederik Kratzert,
Nicole O'Brien,
Daniel Princz,
Sinan Rasiya Koya,
Tirthankar Roy,
Frank Seglenieks,
Narayan Kumar Shrestha,
André Guy Tranquille Temgoua,
Vincent Vionnet,
Jonathan W. Waddell
Hydrology and Earth System Sciences
Abstract. Model intercomparison studies are carried out to test and compare the simulated outputs of various model setups over the same study domain. The Great Lakes region is such a domain of high public interest as it not only represents a challenging region to model with its trans-boundary location, strong lake effects, and regions of strong human impact but is also one of the most densely populated areas in the United States and Canada. This study brought together a wide range of researchers setting up their models of choice in a highly standardized experimental setup using the same geophysical datasets, forcings, common routing product, and locations of performance evaluation across the 1 million square kilometer study domain. The study comprises 13 models covering a wide range of model types from Machine Learning based, basin-wise, subbasin-based, and gridded models that are either locally or globally calibrated or calibrated for each of six predefined regions of the watershed. Unlike most hydrologically focused model intercomparisons, this study not only compares models regarding their capability to simulate streamflow (Q) but also evaluates the quality of simulated actual evapotranspiration (AET), surface soil moisture (SSM), and snow water equivalent (SWE). The latter three outputs are compared against gridded reference datasets. The comparisons are performed in two ways: either by aggregating model outputs and the reference to basin-level or by regridding all model outputs to the reference grid and comparing the model simulations at each grid-cell. The main results of this study are: (1) The comparison of models regarding streamflow reveals the superior quality of the Machine Learning based model in all experiments; even for the most challenging spatio-temporal validation the ML model outperforms any other physically based model. (2) While the locally calibrated models lead to good performance in calibration and temporal validation (even outperforming several regionally calibrated models), they lose performance when they are transferred to locations the model has not been calibrated on. This is likely to be improved with more advanced strategies to transfer these models in space. (3) The regionally calibrated models – while losing less performance in spatial and spatio-temporal validation than locally calibrated models – exhibit low performances in highly regulated and urban areas as well as agricultural regions in the US. (4) Comparisons of additional model outputs (AET, SSM, SWE) against gridded reference datasets show that aggregating model outputs and the reference dataset to basin scale can lead to different conclusions than a comparison at the native grid scale. This is especially true for variables with large spatial variability such as SWE. (5) A multi-objective-based analysis of the model performances across all variables (Q, AET, SSM, SWE) reveals overall excellent performing locally calibrated models (i.e., HYMOD2-lumped) as well as regionally calibrated models (i.e., MESH-SVS-Raven and GEM-Hydro-Watroute) due to varying reasons. The Machine Learning based model was not included here as it is not set up to simulate AET, SSM, and SWE. (6) All basin-aggregated model outputs and observations for the model variables evaluated in this study are available on an interactive website that enables users to visualize results and download data and model outputs.
A simple algorithm is provided for randomly sampling a set of N + 1 weights such that their sum is constrained to be equal to one, analogous to randomly subdividing a pie into N + 1 slices whose volumes are identically distributed. The cumulative distribution and probability density functions of the random weights are provided. Implementations of the sampling algorithm are made available. This algorithm has potential applications in calibration, uncertainty analysis, and sensitivity analysis of environmental models. Three example applications are provided to demonstrate the efficiency and superiority of the proposed method compared to alternative sampling methods.
• Present unbiased method to sample weights that sum up to 1.
• Examples demonstrating the benefit of unbiased sampling.
• Code made available in multiple languages.
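The sketch below shows one standard way to draw weights that sum to one with identically distributed slices: cut the unit interval at sorted uniform random points and take the gaps between them. This is a well-known construction for sampling uniformly on the simplex and is not claimed to be the algorithm published in the paper; the function name `sample_simplex_weights` is an assumption for this example.

```python
import random


def sample_simplex_weights(n_weights, rng=None):
    """Draw n_weights non-negative weights that sum to one.

    Illustrative sketch: the unit interval is cut at n_weights - 1 sorted
    uniform random points, and the lengths of the resulting pieces are the
    weights. Each piece follows the same marginal distribution.
    """
    if n_weights < 1:
        raise ValueError("n_weights must be at least 1")
    rng = rng or random.Random()

    # Random cut points of the "pie", sorted so that the gaps are non-negative
    cuts = sorted(rng.random() for _ in range(n_weights - 1))
    edges = [0.0] + cuts + [1.0]

    # Consecutive differences are the slice sizes
    return [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]


if __name__ == "__main__":
    weights = sample_simplex_weights(4, rng=random.Random(42))
    print(weights, sum(weights))
```

By contrast, simply drawing independent uniform numbers and dividing each by their sum also yields weights that sum to one, but those weights are not uniformly distributed over the set of admissible weight vectors.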
The Great Lakes Runoff Intercomparison Project Phase 4: the Great Lakes (GRIP-GL)
Juliane Mai,
Helen C. Shen,
Bryan A. Tolson,
Étienne Gaborit,
Richard Arsenault,
James R. Craig,
Vincent Fortin,
Lauren M. Fry,
Martin Gauch,
Daniel Klotz,
Frederik Kratzert,
Nicole O'Brien,
Daniel Princz,
Sinan Rasiya Koya,
Tirthankar Roy,
Frank Seglenieks,
Narayan Kumar Shrestha,
André Guy Tranquille Temgoua,
Vincent Vionnet,
Jonathan W. Waddell
Hydrology and Earth System Sciences, Volume 26, Issue 13
Abstract. Model intercomparison studies are carried out to test and compare the simulated outputs of various model setups over the same study domain. The Great Lakes region is such a domain of high public interest as it not only represents a challenging region to model with its transboundary location, strong lake effects, and regions of strong human impact but is also one of the most densely populated areas in the USA and Canada. This study brought together a wide range of researchers setting up their models of choice in a highly standardized experimental setup using the same geophysical datasets, forcings, common routing product, and locations of performance evaluation across the 1 × 10⁶ km² study domain. The study comprises 13 models covering a wide range of model types from machine-learning-based, basin-wise, subbasin-based, and gridded models that are either locally or globally calibrated or calibrated for each of the six predefined regions of the watershed. Unlike most hydrologically focused model intercomparisons, this study not only compares models regarding their capability to simulate streamflow (Q) but also evaluates the quality of simulated actual evapotranspiration (AET), surface soil moisture (SSM), and snow water equivalent (SWE). The latter three outputs are compared against gridded reference datasets. The comparisons are performed in two ways – either by aggregating model outputs and the reference to basin level or by regridding all model outputs to the reference grid and comparing the model simulations at each grid-cell. The main results of this study are as follows: The comparison of models regarding streamflow reveals the superior quality of the machine-learning-based model in all experiments; even for the most challenging spatiotemporal validation, the machine learning (ML) model outperforms any other physically based model. 
While the locally calibrated models lead to good performance in calibration and temporal validation (even outperforming several regionally calibrated models), they lose performance when they are transferred to locations that the model has not been calibrated on. This is likely to be improved with more advanced strategies to transfer these models in space. The regionally calibrated models – while losing less performance in spatial and spatiotemporal validation than locally calibrated models – exhibit low performances in highly regulated and urban areas and agricultural regions in the USA. Comparisons of additional model outputs (AET, SSM, and SWE) against gridded reference datasets show that aggregating model outputs and the reference dataset to the basin scale can lead to different conclusions than a comparison at the native grid scale. The latter is deemed preferable, especially for variables with large spatial variability such as SWE. A multi-objective-based analysis of the model performances across all variables (Q, AET, SSM, and SWE) reveals overall well-performing locally calibrated models (i.e., HYMOD2-lumped) and regionally calibrated models (i.e., MESH-SVS-Raven and GEM-Hydro-Watroute) due to varying reasons. The machine-learning-based model was not included here as it is not set up to simulate AET, SSM, and SWE. All basin-aggregated model outputs and observations for the model variables evaluated in this study are available on an interactive website that enables users to visualize results and download the data and model outputs.
2021
Ten best practices to strengthen stewardship and sharing of water science data in Canada
Bhaleka Persaud,
K. A. Dukacz,
Gopal Chandra Saha,
A. Peterson,
L. Moradi,
Simon Hearn,
Erin Clary,
Juliane Mai,
Michael Steeleworthy,
Jason J. Venkiteswaran,
Homa Kheyrollah Pour,
Brent B. Wolfe,
Sean K. Carey,
John W. Pomeroy,
C. M. DeBeer,
J. M. Waddington,
Philippe Van Cappellen,
Jimmy Lin
Hydrological Processes, Volume 35, Issue 11
Water science data are a valuable asset that both underpins the original research project and bolsters new research questions, particularly in view of the increasingly complex water issues facing Canada and the world. Whilst there is general support for making data more broadly accessible, and a number of water science journals and funding agencies have adopted policies that require researchers to share data in accordance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles, there are still questions about effective management of data to protect their usefulness over time. Incorporating data management practices and standards at the outset of a water science research project will enable researchers to efficiently locate, analyze and use data throughout the project lifecycle, and will ensure the data maintain their value after the project has ended. Here, some common misconceptions about data management are highlighted, along with insights and practical advice to assist established and early career water science researchers as they integrate data management best practices and tools into their research. Freely available tools and training opportunities made available in Canada through Global Water Futures, the Portage Network, Gordon Foundation's DataStream, Compute Canada, and university libraries, among others are compiled. These include webinars, training videos, and individual support for the water science community that together enable researchers to protect their data assets and meet the expectations of journals and funders. The perspectives shared here have been developed as part of the Global Water Futures programme's efforts to improve data management and promote the use of common data practices and standards in the context of water science in Canada. Ten best practices are proposed that may be broadly applicable to other disciplines in the natural sciences and can be adopted and adapted globally.
Great Lakes Runoff Intercomparison Project Phase 3: Lake Erie (GRIP-E)
Juliane Mai,
Bryan A. Tolson,
Helen C. Shen,
Étienne Gaborit,
Vincent Fortin,
Nicolas Gasset,
Hervé Awoye,
Tricia A. Stadnyk,
Lauren M. Fry,
Emily A. Bradley,
Frank Seglenieks,
André Guy Tranquille Temgoua,
Daniel Princz,
Shervan Gharari,
Amin Haghnegahdar,
Mohamed Elshamy,
Saman Razavi,
Martin Gauch,
Jimmy Lin,
Xiaojing Ni,
Yongping Yuan,
Meghan McLeod,
N. B. Basu,
Rohini Kumar,
Oldřich Rakovec,
Luis Samaniego,
Sabine Attinger,
Narayan Kumar Shrestha,
Prasad Daggupati,
Tirthankar Roy,
Sungwook Wi,
Timothy Hunter,
James R. Craig,
Alain Pietroniro
Journal of Hydrologic Engineering, Volume 26, Issue 9
Abstract. Hydrologic model intercomparison studies help to evaluate the agility of models to simulate variables such as streamflow, evaporation, and soil moisture. This study is the third in a sequen...
Accurate streamflow prediction largely relies on historical meteorological records and streamflow measurements. For many regions, however, such data are only scarcely available. Facing this problem, many studies simply trained their machine learning models on the region's available data, leaving possible repercussions of this strategy unclear. In this study, we evaluate the sensitivity of tree- and LSTM-based models to limited training data, both in terms of geographic diversity and different time spans. We feed the models meteorological observations disseminated with the CAMELS dataset, and individually restrict the training period length, number of training basins, and input sequence length. We quantify how additional training data improve predictions and how many previous days of forcings we should feed the models to obtain best predictions for each training set size. Further, our findings show that tree- and LSTM-based models provide similarly accurate predictions on small datasets, while LSTMs are superior given more training data.
2020
Data-intensive research and decision-making continue to gain adoption across diverse organizations. As researchers and practitioners increasingly rely on analyzing large data products both to answer scientific questions and to meet operational needs, data acquisition and pre-processing become critical tasks. For environmental science, the Canadian Surface Prediction Archive (CaSPAr) facilitates easy access to custom subsets of numerical weather predictions. We demonstrate a new open-source interface for CaSPAr that provides easy-to-use map-based querying capabilities and automates data ingestion into the CaSPAr batch processing server.
Lakes and reservoirs have critical impacts on hydrological, biogeochemical, and ecological processes, and they should be an essential component of regional-scale hydrological and eco-hydrological m...