2022
Community Workflows to Advance Reproducibility in Hydrologic Modeling: Separating Model‐Agnostic and Model‐Specific Configuration Steps in Applications of Large‐Domain Hydrologic Models
Wouter Knoben,
Martyn P. Clark,
Jerad Bales,
Andrew Bennett,
Shervan Gharari,
Christopher B. Marsh,
Bart Nijssen,
Alain Pietroniro,
Raymond J. Spiteri,
Guoqiang Tang,
David G. Tarboton,
Andy Wood
Water Resources Research, Volume 58, Issue 11
Despite the proliferation of computer-based research on hydrology and water resources, such research is typically poorly reproducible. Published studies have low reproducibility due to incomplete availability of data and computer code, and a lack of documentation of workflow processes. This leads to a lack of transparency and efficiency because existing code can neither be quality controlled nor reused. Given the commonalities between existing process-based hydrologic models in terms of their required input data and preprocessing steps, open sharing of code can lead to large efficiency gains for the modeling community. Here, we present a model configuration workflow that provides full reproducibility of the resulting model instantiations in a way that separates the model-agnostic preprocessing of specific data sets from the model-specific requirements that models impose on their input files. We use this workflow to create large-domain (global and continental) and local configurations of the Structure for Unifying Multiple Modeling Alternatives (SUMMA) hydrologic model connected to the mizuRoute routing model. These examples show how a relatively complex model setup over a large domain can be organized in a reproducible and structured way that has the potential to accelerate advances in hydrologic modeling for the community as a whole. We provide a tentative blueprint of how community modeling initiatives can be built on top of workflows such as this. We term our workflow the “Community Workflows to Advance Reproducibility in Hydrologic Modeling” (CWARHM; pronounced “swarm”).
Abstract. The Modular Assessment of Rainfall–Runoff Models Toolbox (MARRMoT) is a flexible modelling framework reproducing the behaviour of 47 established hydrological models. This toolbox can be used to calibrate and run models in a user-friendly and consistent way and is designed to facilitate the sharing of model code for reproducibility and to support intercomparison between hydrological models. Additionally, it allows users to create or modify models using components of existing ones. We present a new MARRMoT release (v2.1) designed for improved speed and ease of use. While improved computational efficiency was the main driver for this redevelopment, MARRMoT v2.1 also succeeds in drastically reducing the verbosity and repetitiveness of the code, which improves readability and facilitates debugging. The process to create new models or modify existing ones within the toolbox is also simplified in this version, making MARRMoT v2.1 accessible for researchers and practitioners at all levels of expertise. These improvements were achieved by implementing an object-oriented structure and aggregating all common model operations into a single class definition from which all models inherit. The new modelling framework maintains and improves on several good practices built into the original MARRMoT and includes a number of new features such as the possibility of retrieving more output in different formats that simplifies troubleshooting, and a new functionality that simplifies the calibration process. We compare outputs of 36 of the models in the framework to an earlier published analysis and demonstrate that MARRMoT v2.1 is highly consistent with the previous version of MARRMoT (v1.4), while achieving a 3.6-fold improvement in runtime on average. 
The new version of the toolbox and user manual, including several workflow examples for common applications, are available from GitHub (https://github.com/wknoben/MARRMoT, last access: 12 May 2022; https://doi.org/10.5281/zenodo.6484372, Trotter and Knoben, 2022b).
2021
Abstract. The intent of this paper is to encourage improved numerical implementation of land models. Our contributions in this paper are two-fold. First, we present a unified framework to formulate and implement land model equations. We separate the representation of physical processes from their numerical solution, enabling the use of established robust numerical methods to solve the model equations. Second, we introduce a set of synthetic test cases (the laugh tests) to evaluate the numerical implementation of land models. The test cases include storage and transmission of water in soils, lateral sub-surface flow, coupled hydrological and thermodynamic processes in snow, and cryosuction processes in soil. We consider synthetic test cases as “laugh tests” for land models because they provide the most rudimentary test of model capabilities. The laugh tests presented in this paper are all solved with the Structure for Unifying Multiple Modeling Alternatives model (SUMMA) implemented using the SUite of Nonlinear and DIfferential/Algebraic equation Solvers (SUNDIALS). The numerical simulations from SUMMA/SUNDIALS are compared against (1) solutions to the synthetic test cases from other models documented in the peer-reviewed literature; (2) analytical solutions; and (3) observations made in laboratory experiments. In all cases, the numerical simulations are similar to the benchmarks, building confidence in the numerical model implementation. We posit that some land models may have difficulty in solving these benchmark problems. Dedicating more effort to solving synthetic test cases is critical in order to build confidence in the numerical implementation of land models.
Hydrometeorological flood generating processes (excess rain, short rain, long rain, snowmelt, and rain-on-snow) underpin our understanding of flood behavior. Knowledge about flood generating processes improves hydrological models, flood frequency analysis, estimation of climate change impact on floods, etc. Yet, not much is known about how climate and catchment attributes influence the spatial distribution of flood generating processes. This study aims to offer a comprehensive and structured approach to close this knowledge gap. We employ a large sample approach (671 catchments across the contiguous United States) and evaluate how catchment attributes and climate attributes influence the distribution of flood processes. We use two complementary approaches: a statistics-based approach which compares attribute frequency distributions of different flood processes; and a random forest model in combination with an interpretable machine learning approach (accumulated local effects [ALE]). The ALE method has not been used often in hydrology, and it overcomes a significant obstacle in many statistical methods, the confounding effect of correlated catchment attributes. As expected, we find climate attributes (fraction of snow, aridity, precipitation seasonality, and mean precipitation) to be most influential on flood process distribution. However, the influence of catchment attributes varies both with flood generating process and climate type. We also find that flood processes can be predicted for ungauged catchments with relatively high accuracy (R² between 0.45 and 0.9). The implication of these findings is that flood processes should be considered for future climate change impact studies, as the effect of changes in climate on flood characteristics varies between flood processes.
Flood spatial coherence, triggers, and performance in hydrological simulations: large-sample evaluation of four streamflow-calibrated models
Manuela Irene Brunner,
Lieke Melsen,
Andy Wood,
Oldřich Rakovec,
Naoki Mizukami,
Wouter Knoben,
Martyn P. Clark
Hydrology and Earth System Sciences, Volume 25, Issue 1
Abstract. Floods cause extensive damage, especially if they affect large regions. Assessments of current, local, and regional flood hazards and their future changes often involve the use of hydrologic models. A reliable hydrologic model ideally reproduces both local flood characteristics and spatial aspects of flooding under current and future climate conditions. However, uncertainties in simulated floods can be considerable and yield unreliable hazard and climate change impact assessments. This study evaluates the extent to which models calibrated according to standard model calibration metrics such as the widely used Kling–Gupta efficiency are able to capture flood spatial coherence and triggering mechanisms. To highlight challenges related to flood simulations, we investigate how flood timing, magnitude, and spatial variability are represented by an ensemble of hydrological models when calibrated on streamflow using the Kling–Gupta efficiency metric, an increasingly common metric of hydrologic model performance, including in flood-related studies. Specifically, we compare how four well-known models (the Sacramento Soil Moisture Accounting model, SAC; the Hydrologiska Byråns Vattenbalansavdelning model, HBV; the variable infiltration capacity model, VIC; and the mesoscale hydrologic model, mHM) represent (1) flood characteristics and their spatial patterns and (2) how they translate changes in meteorologic variables that trigger floods into changes in flood magnitudes. Our results show that modeling both local and spatial flood characteristics is challenging, as models underestimate flood magnitude and flood timing is not necessarily well captured. They further show that changes in precipitation and temperature are not always well translated to changes in flood flow, which makes local and regional flood hazard assessments even more difficult for future conditions.
From a large sample of catchments and with multiple models, we conclude that calibration on the integrated Kling–Gupta metric alone is likely to yield models that have limited reliability in flood hazard assessments, undermining their utility for regional and future change assessments. We underscore that such assessments can be improved by developing flood-focused, multi-objective, and spatial calibration metrics, by improving flood generating process representation through model structure comparisons and by considering uncertainty in precipitation input.
The Abuse of Popular Performance Metrics in Hydrologic Modeling
Martyn P. Clark,
Richard M. Vogel,
Jonathan Lamontagne,
Naoki Mizukami,
Wouter Knoben,
Guoqiang Tang,
Shervan Gharari,
Jim Freer,
Paul H. Whitfield,
Kevin Shook,
Simon Michael Papalexiou
Water Resources Research, Volume 57, Issue 9
The goal of this commentary is to critically evaluate the use of popular performance metrics in hydrologic modeling. We focus on the Nash-Sutcliffe Efficiency (NSE) and the Kling-Gupta Efficiency (KGE) metrics, which are both widely used in hydrologic research and practice around the world. Our specific objectives are: (a) to provide tools that quantify the sampling uncertainty in popular performance metrics; (b) to quantify sampling uncertainty in popular performance metrics across a large sample of catchments; and (c) to prescribe the further research that is needed to improve the estimation, interpretation, and use of popular performance metrics in hydrologic modeling. Our large-sample analysis demonstrates that there is substantial sampling uncertainty in the NSE and KGE estimators. This occurs because the probability distribution of squared errors between model simulations and observations has heavy tails, meaning that performance metrics can be heavily influenced by just a few data points. Our results highlight obvious (yet ignored) abuses of performance metrics that contaminate the conclusions of many hydrologic modeling studies: It is essential to quantify the sampling uncertainty in performance metrics when justifying the use of a model for a specific purpose and when comparing the performance of competing models.
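To make the idea of sampling uncertainty in NSE and KGE concrete, the following is a minimal sketch in Python. It uses the standard NSE definition and the original (2009) KGE formulation, and a plain non-parametric bootstrap over individual time steps. The bootstrap design, the function names, and the default settings are assumptions chosen for illustration; they are not the tools described in the commentary.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum of squared deviations of obs."""
    obs_mean = mean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / spread


def kge(sim, obs):
    """Kling-Gupta Efficiency built from correlation, variability and bias ratios."""
    sim_mean, obs_mean = mean(sim), mean(obs)
    sim_std = math.sqrt(mean([(s - sim_mean) ** 2 for s in sim]))
    obs_std = math.sqrt(mean([(o - obs_mean) ** 2 for o in obs]))
    cov = mean([(s - sim_mean) * (o - obs_mean) for s, o in zip(sim, obs)])
    r = cov / (sim_std * obs_std)   # linear correlation
    alpha = sim_std / obs_std       # variability ratio
    beta = sim_mean / obs_mean      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def bootstrap_interval(metric, sim, obs, n_boot=500, level=0.90, seed=0):
    """Percentile interval of a metric under resampling of (sim, obs) pairs.

    Assumption: time steps are resampled independently, which ignores
    autocorrelation in streamflow; this is an illustrative simplification.
    """
    rng = random.Random(seed)
    n = len(obs)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([sim[i] for i in idx], [obs[i] for i in idx]))
    scores.sort()
    lower = scores[int((1 - level) / 2 * (n_boot - 1))]
    upper = scores[int((1 + level) / 2 * (n_boot - 1))]
    return lower, upper
```

A wide interval returned by `bootstrap_interval` signals that the score depends strongly on a few time steps, which is the heavy-tail effect the commentary describes.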
• Most conceptual bucket models have an upper limit on simulated soil moisture deficit.
• Problems arise when the bucket “empties” because ET drops to unrealistic (low) levels.
• Alternatives include bottomless buckets or deficit-based soil moisture accounting.
• Here, we switch to a deficit-based scheme while keeping everything else constant.
• Tested over historic drought, model performance and realism are enhanced.
Rainfall-runoff models based on conceptual “buckets” are frequently used in climate change impact studies to provide runoff projections. When these buckets approach empty, the simulated evapotranspiration approaches zero, which places an implicit limit on the soil moisture deficit that can accrue within the model. Such models may cease to properly track the moisture deficit accumulating in reality as dry conditions continue, leading to overestimation of subsequent runoff and possible long-term bias under drying climate. Here, we suggest that model realism may be improved through alternatives which remove the upper limit on simulated soil moisture deficit, such as “bottomless” buckets or deficit-based soil moisture accounting. While some existing models incorporate such measures, no study until now has systematically assessed their impact on model realism under drying climate. Here, we alter a common bucket model by changing the soil moisture storage to a deficit accounting system in such a way as to remove the upper limit on simulated soil moisture deficit. Tested on 38 Australian catchments, the altered model is better able to track the decline in soil moisture at the end of seasonal dry periods, which leads to superior performance over varied historic climate, including the 13-year “Millennium” drought. However, groundwater and GRACE data reveal long-term trends that are not matched in simulations, indicating that further changes may be required.
Nonetheless, the results suggest that a broader adoption of bottomless buckets and/or deficit accounting within conceptual rainfall runoff models may improve the realism of runoff projections under drying climate.
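The contrast between a bounded bucket and deficit-based accounting can be illustrated with a toy dry-down experiment. The sketch below is hypothetical throughout: the two ET relations, the parameter names (`s_max`, `d_scale`), and the zero-rainfall forcing are assumptions made for illustration and are not the equations of the model altered in the study.

```python
def run_bucket(pet, n_days, s_max=100.0):
    """Bounded bucket under zero rainfall.

    ET scales with relative storage, so it tends to zero as the bucket
    empties and the deficit can never exceed s_max.
    """
    storage = s_max
    deficits = []
    for _ in range(n_days):
        et = pet * storage / s_max
        storage = max(storage - et, 0.0)
        deficits.append(s_max - storage)
    return deficits


def run_deficit(pet, n_days, d_scale=100.0):
    """Deficit store under zero rainfall.

    ET declines as the deficit grows but never reaches zero, so the
    deficit has no upper limit (an assumed, illustrative ET relation).
    """
    deficit = 0.0
    deficits = []
    for _ in range(n_days):
        et = pet / (1.0 + deficit / d_scale)
        deficit += et
        deficits.append(deficit)
    return deficits


if __name__ == "__main__":
    bucket = run_bucket(pet=5.0, n_days=3650)
    unbounded = run_deficit(pet=5.0, n_days=3650)
    print("bucket deficit after ten dry years:", round(bucket[-1], 1))
    print("deficit-store deficit after ten dry years:", round(unbounded[-1], 1))
```

In the bucket version the deficit saturates within a season, whereas the deficit store keeps accumulating, which is the behaviour needed to track multiyear drying.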
Models that mimic an original model might have a different model structure than the original model, which affects model output. This study assesses model structure differences and their impact on output by comparing 7 model implementations that carry the name HBV. We explain and quantify output differences with individual model structure components at both the numerical (e.g., explicit/implicit scheme) and mathematical level (e.g., linear/power outflow). It was found that none of the numerical and mathematical formulations of the mimicking models were (originally) the same as the benchmark, HBV-light. This led to small but distinct output differences in simulated streamflow for different numerical implementations (KGE difference up to 0.15), and major output differences due to mathematical differences (KGE median loss of 0.27). These differences decreased after calibrating the individual models to the simulated streamflow of the benchmark model. We argue that the lack of systematic model naming has led to a diverging concept of the HBV-model, diminishing the concept of model mimicry. Development of a systematic model naming framework, openly accessible model code and more elaborate model descriptions are suggested to enhance model mimicry and model development.
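As an illustration of how a purely numerical choice can change simulated outflow, the sketch below solves the same linear reservoir, dS/dt = -kS with outflow q = kS, using explicit and implicit Euler time stepping. The reservoir, the parameter values, and the function names are assumptions for illustration and are not taken from any of the HBV implementations compared in the study.

```python
def explicit_outflow(s0, k, dt, n_steps):
    """Explicit Euler: outflow is computed from storage at the start of the step."""
    storage = s0
    flows = []
    for _ in range(n_steps):
        q = k * storage
        storage -= q * dt
        flows.append(q)
    return flows


def implicit_outflow(s0, k, dt, n_steps):
    """Implicit Euler: outflow is computed from storage at the end of the step.

    For the linear reservoir the implicit update has the closed form
    S_new = S_old / (1 + k * dt), so no iterative solver is needed.
    """
    storage = s0
    flows = []
    for _ in range(n_steps):
        storage = storage / (1.0 + k * dt)
        flows.append(k * storage)
    return flows


if __name__ == "__main__":
    for name, scheme in (("explicit", explicit_outflow), ("implicit", implicit_outflow)):
        print(name, [round(q, 2) for q in scheme(100.0, 0.5, 1.0, 5)])
```

Both schemes describe the same equation, yet with a daily step and a fast reservoir they distribute the outflow differently over time, which is the kind of difference that shows up in a streamflow metric.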
2020
Abstract. Floods cause large damages, especially if they affect large regions. Assessments of current, local and regional flood hazards and their future changes often involve the use of hydrologic models. However, uncertainties in simulated floods can be considerable and yield unreliable hazard and climate change impact assessments. A reliable hydrologic model ideally reproduces both local flood characteristics and spatial aspects of flooding, which is, however, not guaranteed especially when using standard model calibration metrics. In this paper we investigate how flood timing, magnitude and spatial variability are represented by an ensemble of hydrological models when calibrated on streamflow using the Kling–Gupta efficiency metric, an increasingly common metric of hydrologic model performance. We compare how four well-known models (SAC, HBV, VIC, and mHM) represent (1) flood characteristics and their spatial patterns; and (2) how they translate changes in meteorologic variables that trigger floods into changes in flood magnitudes. Our results show that both the modeling of local and spatial flood characteristics is challenging. They further show that changes in precipitation and temperature are not necessarily well translated to changes in flood flow, which makes local and regional flood hazard assessments even more difficult for future conditions. We conclude that models calibrated on integrated metrics such as the Kling–Gupta efficiency alone have limited reliability in flood hazard assessments, in particular in regional and future assessments, and suggest the development of alternative process-based and spatial evaluation metrics.
Abstract. Land models are increasingly used in terrestrial hydrology due to their process-oriented representation of water and energy fluxes. Land models can be set up at a range of spatial configurations, often ranging from grid sizes of 0.02 to 2 degrees (approximately 2 to 200 km) and applied at sub-daily temporal resolutions for simulation of energy fluxes. A priori specification of the grid size of the land models typically is derived from forcing resolutions, modeling objectives, available geo-spatial data and computational resources. Typically, the choice of model configuration and grid size is based on modeling convenience and is rarely examined for adequate physical representation in the context of modeling. The variability of the inputs and parameters, forcings, soil types, and vegetation covers, are masked or aggregated based on the a priori chosen grid size. In this study, we propose an alternative to directly set up a land model based on the concept of Group Response Unit (GRU). Each GRU is a unique combination of land cover, soil type, and other desired geographical features that has hydrological significance, such as elevation zone, slope, and aspect. Computational units are defined as GRUs that are forced at a specific forcing resolution; therefore, each computational unit has a unique combination of specific geo-spatial data and forcings. We set up the Variable Infiltration Capacity (VIC) model, based on the GRU concept (VIC-GRU). 
Utilizing this model setup and its advantages we try to answer the following questions: (1) how well a model configuration simulates an output variable, such as streamflow, for a range of computational units, (2) how well a model configuration with fewer computational units, coarser forcing resolution and less geo-spatial information, reproduces a model setup with more computational units, finer forcing resolution and more geo-spatial information, and finally (3) how uncertain the model structure and parameters are for the land model. Our results, although case dependent, show that the models may similarly reproduce output with a lower number of computational units in the context of modeling (streamflow for example). Our results also show that a model configuration with a lower number of computational units may reproduce the simulations from a model configuration with more computational units. Similarly, this can assist faster parameter identification and model diagnostic suites, such as sensitivity and uncertainty, on a less computationally expensive model setup. Finally, we encourage the land model community to adopt flexible approaches that will provide a better understanding of accuracy-performance tradeoff in land models.
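The GRU idea described in this abstract can be sketched as a grouping operation: a GRU is a unique combination of geospatial attributes, and a computational unit is a GRU intersected with a forcing cell. The dictionary keys (`land_cover`, `soil_type`, `elev_zone`, `forcing_cell`, `area_km2`) and the function name are hypothetical and only serve the illustration; they do not reflect the VIC-GRU input format.

```python
from collections import defaultdict


def build_units(cells):
    """Aggregate small land cells into GRUs and computational units.

    Returns two dicts mapping each key to its total area:
    - GRUs keyed by (land_cover, soil_type, elev_zone)
    - computational units keyed by (gru_key, forcing_cell)
    """
    gru_area = defaultdict(float)
    unit_area = defaultdict(float)
    for cell in cells:
        gru_key = (cell["land_cover"], cell["soil_type"], cell["elev_zone"])
        gru_area[gru_key] += cell["area_km2"]
        unit_area[(gru_key, cell["forcing_cell"])] += cell["area_km2"]
    return dict(gru_area), dict(unit_area)


if __name__ == "__main__":
    cells = [
        {"land_cover": "forest", "soil_type": "loam", "elev_zone": 1,
         "forcing_cell": "A", "area_km2": 2.0},
        {"land_cover": "forest", "soil_type": "loam", "elev_zone": 1,
         "forcing_cell": "B", "area_km2": 3.0},
        {"land_cover": "grass", "soil_type": "loam", "elev_zone": 1,
         "forcing_cell": "A", "area_km2": 1.0},
    ]
    grus, units = build_units(cells)
    print(len(grus), "GRUs and", len(units), "computational units")
```

Dropping an attribute from the GRU key, or using a coarser forcing grid, reduces the number of computational units, which is the lever the study uses to explore the tradeoff between accuracy and computational cost.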
Many Commonly Used Rainfall‐Runoff Models Lack Long, Slow Dynamics: Implications for Runoff Projections
Keirnan Fowler,
Wouter Knoben,
Murray C. Peel,
Tim Peterson,
Dongryeol Ryu,
Margarita Saft,
Ki‐Weon Seo,
Andrew W. Western
Water Resources Research, Volume 56, Issue 5
Evidence suggests that catchment state variables such as groundwater can exhibit multiyear trends. This means that their state may reflect not only recent climatic conditions but also climatic conditions in past years or even decades. Here we demonstrate that five commonly used conceptual “bucket” rainfall‐runoff models are unable to replicate multiyear trends exhibited by natural systems during the “Millennium Drought” in south‐east Australia. This causes an inability to extrapolate to different climatic conditions, leading to poor performance in split sample tests. Simulations are examined from five models applied in 38 catchments, then compared with groundwater data from 19 bores and Gravity Recovery and Climate Experiment data for two geographic regions. Whereas the groundwater and Gravity Recovery and Climate Experiment data decrease from high to low values gradually over the duration of the 13‐year drought, the model storages go from high to low values in a typical seasonal cycle. This is particularly the case in the drier, flatter catchments. Once the drought begins, there is little room for decline in the simulated storage, because the model “buckets” are already “emptying” on a seasonal basis. Since the effects of sustained dry conditions cannot accumulate within these models, we argue that they should not be used for runoff projections in a drying climate. Further research is required to (a) improve conceptual rainfall‐runoff models, (b) better understand circumstances in which multiyear trends in state variables occur, and (c) investigate links between these multiyear trends and changes in rainfall‐runoff relationships in the context of a changing climate.
The choice of hydrological model structure, that is, a model's selection of states and fluxes and the equations used to describe them, strongly controls model performance and realism. This work investigates differences in performance of 36 lumped conceptual model structures calibrated to and evaluated on daily streamflow data in 559 catchments across the United States. Model performance is compared against a benchmark that accounts for the seasonality of flows in each catchment. We find that our model ensemble struggles to beat the benchmark in snow-dominated catchments. In most other catchments model structure equifinality (i.e., cases where different models achieve similar high efficiency scores) can be very high. We find no relation between the number of model parameters and performance during either calibration or evaluation periods nor evidence of increased risk of overfitting for models with more parameters. Instead, the choice of model parametrization (i.e., which equations are used and how parameters are used within them) dictates the model's strengths and weaknesses. Results suggest that certain model structures are inherently better suited for certain objective functions and thus for certain study purposes. We find no clear relationships between the catchments where any model performs well and descriptors of those catchments' geology, topography, soil, and vegetation characteristics. Instead, model suitability seems to relate strongest to the streamflow regime each catchment generates, and we have formulated several tentative hypotheses that relate commonalities in model structure to similarities in model performance. Modeling results are made publicly available for further investigation.
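A benchmark that accounts for flow seasonality can be sketched as a calendar-day climatology: for each time step, the benchmark prediction is the mean observed flow on that calendar day across all years. The sketch below is an assumed, simplified form of such a benchmark, with hypothetical function names and mean squared error as the comparison score; the study's exact benchmark definition and scoring may differ.

```python
from collections import defaultdict


def seasonal_benchmark(obs, day_of_year):
    """Return, for each time step, the mean observed flow on that calendar day."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for flow, day in zip(obs, day_of_year):
        totals[day] += flow
        counts[day] += 1
    return [totals[day] / counts[day] for day in day_of_year]


def mse(sim, obs):
    """Mean squared error between simulated and observed flows."""
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)


def beats_benchmark(sim, obs, day_of_year):
    """True if the model has a lower error than the seasonal climatology."""
    return mse(sim, obs) < mse(seasonal_benchmark(obs, day_of_year), obs)
```

In strongly seasonal regimes such as snow-dominated catchments, the climatology already explains much of the flow variability, so a model has to add real information beyond the seasonal cycle to pass this check.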
2019
Twenty-three unsolved problems in hydrology (UPH) – a community perspective
Günter Blöschl,
M. F. Bierkens,
António Chambel,
Christophe Cudennec,
Georgia Destouni,
Aldo Fiori,
J. W. Kirchner,
Jeffrey J. McDonnell,
H. H. G. Savenije,
Murugesu Sivapalan,
Christine Stumpp,
Elena Toth,
Elena Volpi,
Gemma Carr,
Claire Lupton,
José Luis Salinas,
Borbála Széles,
Alberto Viglione,
Hafzullah Aksoy,
Scott T. Allen,
Anam Amin,
Vazken Andréassian,
Berit Arheimer,
Santosh Aryal,
Victor R. Baker,
Earl Bardsley,
Marlies Barendrecht,
Alena Bartošová,
Okke Batelaan,
Wouter Berghuijs,
Keith Beven,
Theresa Blume,
Thom Bogaard,
Pablo Borges de Amorim,
Michael E. Böttcher,
Gilles Boulet,
Korbinian Breinl,
Mitja Brilly,
Luca Brocca,
Wouter Buytaert,
Attilio Castellarin,
Andrea Castelletti,
Xiaohong Chen,
Yangbo Chen,
Yuanfang Chen,
Peter Chifflard,
Pierluigi Claps,
Martyn P. Clark,
Adrian L. Collins,
Barry Croke,
Annette Dathe,
Paula Cunha David,
Felipe P. J. de Barros,
Gerrit de Rooij,
Giuliano Di Baldassarre,
Jessica M. Driscoll,
Doris Duethmann,
Ravindra Dwivedi,
Ebru Eriş,
William Farmer,
James Feiccabrino,
Grant Ferguson,
Ennio Ferrari,
Stefano Ferraris,
Benjamin Fersch,
David C. Finger,
Laura Foglia,
Keirnan Fowler,
Б. И. Гарцман,
Simon Gascoin,
Éric Gaumé,
Alexander Gelfan,
Josie Geris,
Shervan Gharari,
Tom Gleeson,
Miriam Glendell,
Alena Gonzalez Bevacqua,
M. P. González‐Dugo,
Salvatore Grimaldi,
A.B. Gupta,
Björn Guse,
Dawei Han,
David M. Hannah,
A. A. Harpold,
Stefan Haun,
Kate Heal,
Kay Helfricht,
Mathew Herrnegger,
Matthew R. Hipsey,
Hana Hlaváčiková,
Clara Hohmann,
Ladislav Holko,
C. Hopkinson,
Markus Hrachowitz,
Tissa H. Illangasekare,
Azhar Inam,
Camyla Innocente,
Erkan Istanbulluoglu,
Ben Jarihani,
Zahra Kalantari,
Andis Kalvāns,
Sonu Khanal,
Sina Khatami,
Jens Kiesel,
M. J. Kirkby,
Wouter Knoben,
Krzysztof Kochanek,
Silvia Kohnová,
Alla Kolechkina,
Stefan Krause,
David K. Kreamer,
Heidi Kreibich,
Harald Kunstmann,
Holger Lange,
Margarida L. R. Liberato,
Eric Lindquist,
Timothy E. Link,
Junguo Liu,
Daniel P. Loucks,
Charles H. Luce,
Gil Mahé,
Olga Makarieva,
Julien Malard,
Shamshagul Mashtayeva,
Shreedhar Maskey,
Josep Mas‐Pla,
Maria Mavrova-Guirguinova,
Maurizio Mazzoleni,
Sebastian H. Mernild,
Bruce Misstear,
Alberto Montanari,
Hannes Müller-Thomy,
Alireza Nabizadeh,
Fernando Nardi,
Christopher M. U. Neale,
Nataliia Nesterova,
Bakhram Nurtaev,
V.O. Odongo,
Subhabrata Panda,
Saket Pande,
Zhonghe Pang,
Georgia Papacharalampous,
Charles Perrin,
Laurent Pfister,
Rafael Pimentel,
María José Polo,
David Post,
Cristina Prieto,
Maria‐Helena Ramos,
Maik Renner,
José Eduardo Reynolds,
Elena Ridolfi,
Riccardo Rigon,
Mònica Riva,
David Robertson,
Renzo Rosso,
Tirthankar Roy,
João Henrique Macedo Sá,
Gianfausto Salvadori,
Melody Sandells,
Bettina Schaefli,
Andreas Schumann,
Anna Scolobig,
Jan Seibert,
Éric Servat,
Mojtaba Shafiei,
Ashish Sharma,
Moussa Sidibé,
Roy C. Sidle,
Thomas Skaugen,
Hugh G. Smith,
Sabine M. Spiessl,
Lina Stein,
Ingelin Steinsland,
Ulrich Strasser,
Bob Su,
Ján Szolgay,
David G. Tarboton,
Flavia Tauro,
Guillaume Thirel,
Fuqiang Tian,
Rui Tong,
Kamshat Tussupova,
Hristos Tyralis,
R. Uijlenhoet,
Rens van Beek,
Ruud van der Ent,
Martine van der Ploeg,
Anne F. Van Loon,
Ilja van Meerveld,
Ronald van Nooijen,
Pieter van Oel,
Jean‐Philippe Vidal,
Jana von Freyberg,
Sergiy Vorogushyn,
Przemysław Wachniew,
Andrew J. Wade,
Philip J. Ward,
Ida Westerberg,
Christopher White,
Eric F. Wood,
Ross Woods,
Zongxue Xu,
Koray K. Yılmaz,
Yongqiang Zhang
Hydrological Sciences Journal, Volume 64, Issue 10
This paper is the outcome of a community initiative to identify major unsolved scientific problems in hydrology motivated by a need for stronger harmonisation of research efforts. The procedure involved a public consultation through online media, followed by two workshops through which a large number of potential science questions were collated, prioritised, and synthesised. In spite of the diversity of the participants (230 scientists in total), the process revealed much about community priorities and the state of our science: a preference for continuity in research questions rather than radical departures or redirections from past and current work. Questions remain focused on the process-based understanding of hydrological variability and causality at all space and time scales. Increased attention to environmental change drives a new emphasis on understanding how change propagates across interfaces within the hydrological system and across disciplinary boundaries. In particular, the expansion of the human footprint raises a new set of questions related to human interactions with nature and water cycle feedbacks in the context of complex water management problems. We hope that this reflection and synthesis of the 23 unsolved problems in hydrology will help guide research efforts for some years to come.
The leaky pipeline phenomenon refers to the disproportionate decline of female scientists at higher academic career levels and is a major problem in the natural sciences. Identifying the underlying causes is challenging, and thus, solving the problem remains difficult. To better understand the reasons for the leaky pipeline, we assess the perceptions and impacts of gender bias and imbalance—two major drivers of the leakage—at different academic career levels with an anonymous survey in geoscience academia (n=1,220). The survey results show that both genders view male geoscientists as substantially more gender biased than female scientists. Moreover, female geoscientists are more than twice as likely to experience negative gender bias at their workplaces and scientific organizations compared to male geoscientists. There are also pronounced gender differences regarding (i) the relevance of role models, (ii) family-friendly working conditions, and (iii) the approval of gender quotas for academic positions. Given the male dominance in senior career levels, our results emphasize that those feeling less impacted by the negative consequences of gender bias and imbalance are the ones in position to tackle the problem. We thus call for actions to better address gender biases and to ensure a balanced gender representation at decision-making levels to ultimately retain more women in geoscience academia.