Simon Michael Papalexiou


Changes of Extreme Precipitation in CMIP6 Projections: Should We Use Stationary or Nonstationary Models?
Hebatallah Mohamed Abdelmoaty, Simon Michael Papalexiou
Journal of Climate, Volume 36, Issue 9

With global warming, the behavior of extreme precipitation shifts toward nonstationarity. Here, we analyze the annual maxima of daily precipitation (AMP) all over the globe using projections of the latest phase of the Coupled Model Intercomparison Project (CMIP6) under four shared socioeconomic pathways (SSPs). The projections were bias corrected using a semiparametric quantile mapping, a novel technique extended to extreme precipitation. This analysis 1) explores the variability of future AMP globally and 2) investigates the performance of stationary and nonstationary models in describing future AMP with trends. The results show that global warming potentially intensifies AMP. For the nonparametric analysis, the 33-yr precipitation levels are increasing up to 33.2 mm compared to the historical period. The parametric analysis shows that the return period of 100-yr historical events will decrease approximately to 50 and 70 years in the Northern and Southern Hemispheres, respectively. Under the highest emission scenario, the projected 100-yr levels are expected to increase by 7.5%–21% over the historical levels. Using stationary models to estimate the 100-yr return level for AMP projections with trends leads to an underestimation of 3.4% on average. Extensive Monte Carlo experiments are implemented to explain this underestimation.
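The link between return levels and return periods mentioned in this abstract can be illustrated with a small sketch. It assumes annual maxima follow a Generalized Extreme Value (GEV) distribution, which is a common choice but is not stated in the abstract; the function names and all parameter values are hypothetical and only show how a shift in the location parameter shortens the return period of a fixed level.

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T years under a stationary GEV."""
    y = -math.log(1.0 - 1.0 / T)  # non-exceedance probability is 1 - 1/T
    if abs(xi) < 1e-12:           # Gumbel limit (shape parameter near zero)
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def gev_return_period(x, mu, sigma, xi):
    """Return period (years) of level x under a GEV with the given parameters."""
    if abs(xi) < 1e-12:
        cdf = math.exp(-math.exp(-(x - mu) / sigma))
    else:
        t = 1.0 + xi * (x - mu) / sigma
        if t <= 0.0:              # x lies outside the support of the distribution
            cdf = 0.0 if xi > 0 else 1.0
        else:
            cdf = math.exp(-t ** (-1.0 / xi))
    return math.inf if cdf >= 1.0 else 1.0 / (1.0 - cdf)

# Hypothetical parameters (mm): a "historical" fit and a fit with a larger location.
hist = dict(mu=50.0, sigma=15.0, xi=0.1)
future = dict(mu=55.0, sigma=15.0, xi=0.1)

x100 = gev_return_level(100, **hist)
print(f"historical 100-yr level: {x100:.1f} mm")
print(f"return period of that level under shifted location: "
      f"{gev_return_period(x100, **future):.0f} yr")
```

A nonstationary model would let parameters such as `mu` depend on time or a covariate instead of comparing two fixed fits as done here.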

Non-asymptotic Weibull tails explain the statistics of extreme daily precipitation
Francesco Marra, William A. Amponsah, Simon Michael Papalexiou
Advances in Water Resources, Volume 173

The exceedance probability of extreme daily precipitation is usually quantified assuming asymptotic behaviours. Non-asymptotic statistics, however, would allow us to describe extremes with reduced uncertainty and to establish relations between physical processes and emerging extremes. These approaches are still mistrusted by part of the community as they rely on assumptions on the tail behaviour of the daily precipitation distribution. This paper addresses this gap. We use global quality-controlled long rain gauge records to show that daily precipitation annual maxima are samples likely emerging from Weibull tails in most of the stations worldwide. These non-asymptotic tails can explain the statistics of observed extremes better than asymptotic approximations from extreme value theory. We call for a renewed consideration of non-asymptotic statistics for the description of extremes.
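To make the non-asymptotic idea concrete, here is a minimal sketch that assumes daily (wet-day) precipitation has a Weibull distribution and that each year contains a fixed number of independent events, so the annual-maximum distribution is the daily distribution raised to that number. The fixed event count, the independence assumption, and all names and parameter values are illustrative assumptions, not details taken from the paper.

```python
import math

def weibull_cdf(x, scale, shape):
    """CDF of a two-parameter Weibull distribution for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))

def annual_max_cdf(x, scale, shape, n_events):
    """Non-asymptotic CDF of the annual maximum of n_events independent events."""
    return weibull_cdf(x, scale, shape) ** n_events

def return_level(T, scale, shape, n_events):
    """Level whose annual-maximum non-exceedance probability is 1 - 1/T."""
    p_event = (1.0 - 1.0 / T) ** (1.0 / n_events)  # per-event non-exceedance
    return scale * (-math.log(1.0 - p_event)) ** (1.0 / shape)

# Hypothetical values: scale in mm, 100 wet days per year.
print(return_level(100, scale=8.0, shape=0.8, n_events=100))
```

No limiting extreme value distribution is invoked: the return level follows directly from the tail of the daily distribution and the number of events per year.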

Large‐Domain Multisite Precipitation Generation: Operational Blueprint and Demonstration for 1,000 Sites
Simon Michael Papalexiou, Francesco Serinaldi, Martyn P. Clark
Water Resources Research, Volume 59, Issue 3

Stochastic simulations of spatiotemporal patterns of hydroclimatic processes, such as precipitation, are needed to build alternative but equally plausible inputs for water‐related design and management, and to estimate uncertainty and assess risks. However, while existing stochastic simulation methods are mature enough to deal with relatively small domains and coarse spatiotemporal scales, additional work is required to develop simulation tools for large‐domain analyses, which are more and more common in an increasingly interconnected world. This study proposes a methodological advancement in the CoSMoS framework, which is a flexible simulation framework preserving arbitrary marginal distributions and correlations, to dramatically decrease the computational burden and make the algorithm fast enough to perform large‐domain simulations in a short time. The proposed approach focuses on correlated processes with mixed (zero‐inflated) Uniform marginal distributions. These correlated processes act as intermediates between the target process to simulate (precipitation) and parent Gaussian processes that are the core of the simulation algorithm. Working in the mixed‐Uniform space enables a substantial simplification of the so‐called correlation transformation functions, which represent a computational bottleneck in the original CoSMoS formulation. As a proof of concept, we simulate 40 years of daily precipitation records from 1,000 gauging stations in the Mississippi River basin. Moreover, we extend CoSMoS incorporating parent non‐Gaussian processes with different degrees of tail dependence and suggest potential improvements including the separate simulation of occurrence and intensity processes, and the use of advection, anisotropy, and nonstationary spatiotemporal correlation functions.
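The chain described above (parent Gaussian process, then a zero-inflated Uniform intermediate, then the target precipitation marginal) can be sketched for a single site. This is only an illustration under strong assumptions: the parent is a simple AR(1) process, the nonzero marginal is a Weibull distribution, and the parent correlation is used as given, whereas CoSMoS derives it through correlation transformation functions so that the target process has the desired correlation. All names and values are hypothetical.

```python
import math
import random

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_intermittent(n, rho, p_dry, scale, shape, seed=0):
    """Single-site intermittent series from a Gaussian AR(1) parent process."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)                 # start from the stationary distribution
    series = []
    for _ in range(n):
        u = std_normal_cdf(z)               # Uniform(0, 1) intermediate
        if u <= p_dry:
            series.append(0.0)              # dry step: the zero-inflated part
        else:
            v = (u - p_dry) / (1.0 - p_dry)  # rescale the wet part to (0, 1)
            v = min(v, 1.0 - 1e-15)          # guard against log(0)
            series.append(scale * (-math.log(1.0 - v)) ** (1.0 / shape))
        # AR(1) update of the parent Gaussian process
        z = rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return series

rain = simulate_intermittent(n=3650, rho=0.7, p_dry=0.6, scale=5.0, shape=0.8)
print(sum(1 for r in rain if r == 0.0) / len(rain))  # close to p_dry
```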

Precipitation Bias Correction: A Novel Semi‐parametric Quantile Mapping Method
Chandra Rupa Rajulapati, Simon Michael Papalexiou
Earth and Space Science, Volume 10, Issue 4

Bias correction methods are used to adjust simulations from global and regional climate models to use them in informed decision-making. Here we introduce a semi-parametric quantile mapping (SPQM) method to bias-correct daily precipitation. This method uses a parametric probability distribution to describe observations and an empirical distribution for simulations. Bias-correction techniques typically adjust the bias between observation and historical simulations to correct projections. The SPQM however corrects simulations based only on observations assuming the detrended simulations have the same distribution as the observations. Thus, the bias-corrected simulations preserve the climate change signal, including changes in the magnitude and probability dry, and guarantee a smooth transition from observations to future simulations. The results are compared with popular quantile mapping techniques, that is, the quantile delta mapping (QDM) and the statistical transformation of the CDF using splines (SSPLINE). The SPQM performed well in reproducing the observed statistics, marginal distribution, and wet and dry spells. Comparatively, it performed at least equally well as the QDM and SSPLINE, specifically in reproducing observed wet spells and extreme quantiles. The method is further tested in a basin-scale region. The spatial variability and statistics of the observed precipitation are reproduced well in the bias-corrected simulations. Overall, the SPQM is easy to apply, yet robust in bias-correcting daily precipitation simulations.
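The core mapping step described in this abstract (an empirical distribution for simulations, a parametric distribution for observations) can be sketched as follows. This is not the SPQM implementation: it uses an exponential distribution fitted by its mean as a stand-in for the parametric model, Weibull plotting positions for the empirical probabilities, and it omits detrending and the treatment of dry days. All names are hypothetical.

```python
import math

def quantile_map(sim, obs):
    """Map each simulated value to the observed-distribution quantile
    that has the same empirical non-exceedance probability."""
    obs_mean = sum(obs) / len(obs)          # exponential fit (stand-in model)
    n = len(sim)
    order = sorted(range(n), key=lambda i: sim[i])
    corrected = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p = rank / (n + 1)                  # Weibull plotting position in (0, 1)
        corrected[i] = -obs_mean * math.log(1.0 - p)  # exponential quantile
    return corrected

sim = [3.0, 1.0, 2.0]   # hypothetical simulated wet-day amounts (mm)
obs = [2.0, 4.0, 6.0]   # hypothetical observed wet-day amounts (mm)
print(quantile_map(sim, obs))
```

Because the mapping depends only on ranks, the ordering of the simulated values is preserved while their magnitudes follow the distribution fitted to the observations.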

Mixture Probability Models with Covariates: Applications in Estimating Risk of Hydroclimatic Extremes
Nawres Yousfi, Salaheddine El Adlouni, Simon Michael Papalexiou, Philippe Gachon
Journal of Hydrologic Engineering, Volume 28, Issue 4

Modeling of extreme events is important in many scientific fields, including environmental and civil engineering, and impacts and risk assessments. Among available methods, statistical models that allow estimating extremes’ frequency and intensity are regularly used in procedures to anticipate potential changes in extreme events. Extreme value theory provides a theoretical basis for statistical estimation of extreme events using frequency analysis. The challenge in modeling is knowing when to use the block maxima method or the peaks-over-threshold (POT) method. Each has its drawbacks. POT describes the main characteristics of the observed extreme series; however, the threshold selection is always challenging and might affect the accuracy of the simulated results and the credibility of changes in extreme values. To overcome this challenge, mixture models offer more flexibility to represent samples with nonhomogeneous data. This study presents the gamma generalized Pareto (GGP) mixture model for estimating risk occurrence of hydroclimatic extremes. The model was developed in its general form, in which the observed hydrometeorological extreme events depend on multidimensional covariates. A maximum likelihood algorithm is proposed to estimate the parameters with a constraint on the shape parameter of the generalized Pareto (GP) distribution. A Monte Carlo (MC) simulation compared the proposed model with the classical POT approach, with a fixed threshold, and the annual maximum series of streamflow. The approach was applied using a daily hydrological data set from an observed station located in the Saint John River at Fort Kent (01AD002), New Brunswick, Canada. The results show that the model is flexible enough to represent extremes in dependent or nonstationary time series and that it adequately describes the central part of the observed frequencies, as well as the tails.


Assessing extremes in hydroclimatology: A review on probabilistic methods
Sofia D. Nerantzaki, Simon Michael Papalexiou
Journal of Hydrology, Volume 605

• Comprehensive and extended review on probabilistic methods for hydroclimatic extremes.
• Synthesis of methods used in analyses of extremes in precipitation, streamflow and temperature.
• Over 20 probability distribution estimation methods in 25 comparative studies reviewed.
• Identification of most promising contemporary probabilistic methods.

Here we review methods used for probabilistic analysis of extreme events in hydroclimatology. We focus on streamflow, precipitation, and temperature extremes at regional and global scales. The review has four thematic sections: (1) probability distributions used to describe hydroclimatic extremes, (2) comparative studies of parameter estimation methods, (3) non-stationarity approaches, and (4) model selection tools. Synthesis of the literature shows that: (1) recent studies, in general, agree that precipitation and streamflow extremes should be described by heavy-tailed distributions, (2) the Method of Moments (MOM) is typically the first choice in estimating distribution parameters but it is outperformed by methods such as L-Moments (LM), Maximum Likelihood (ML), Least Squares (LS), and Bayesian Markov Chain Monte Carlo (BMCMC), (3) there are less popular parameter estimation techniques such as the Maximum Product of Spacings (MPS), the Elemental Percentile (EP), and the Minimum Density Power Divergence Estimator (MDPDE) that have shown competitive performance in fitting extreme value distributions, and (4) non-stationary analyses of extreme events are gaining popularity; the ML is the typically used method, yet literature suggests that the Generalized Maximum Likelihood (GML) and the Weighted Least Squares (WLS) may be better alternatives. The review offers a synthesis of past and contemporary methods used in the analysis of hydroclimatic extremes, aiming to highlight their strengths and weaknesses. Finally, the comparative studies summary helps the reader identify the most suitable modeling framework for their analyses, based on the extreme hydroclimatic variables, sample sizes, locations, and evaluation metrics reviewed.

Exacerbated heat in large Canadian cities
Chandra Rupa Rajulapati, Rohan Kumar Gaddam, Sofia D. Nerantzaki, Simon Michael Papalexiou, Alex J. Cannon, Martyn P. Clark
Urban Climate, Volume 42

Extreme temperature is a major threat to urban populations; thus, it is crucial to understand future changes to plan adaptation and mitigation strategies. We assess historical and CMIP6 projected trends of minimum and maximum temperatures for the 18 most populated Canadian cities. Temperatures increase (on average 0.3°C/decade) in all cities during the historical period (1979–2014), with Prairie cities exhibiting lower rates (0.06°C/decade). Toronto (0.5°C/decade) and Montreal (0.7°C/decade) show high increasing trends in the observation period. Higher-elevation cities, among those with the same population, show slower increasing temperature rates compared to the coastal ones. Projections for cities in the Prairies show 12% more summer days compared to the other regions. The number of heat waves (HWs) increases for all cities, in both the historical and future periods; yet alarming increases are projected for Vancouver, Victoria, and Halifax from no HWs in the historical period to approximately 4 HWs/year on average, towards the end of 2100 for the SSP5–8.5. The cold waves reduce considerably for all cities in the historical period at a rate of 2 CWs/decade on average and are projected to further reduce by 50% compared to the observed period.

• CMIP6 simulations for extreme temperature estimation of the largest Canadian cities.
• Prairie cities exhibit a lower rate of temperature increase compared to the cities in the Great Lakes region in the observation period.
• Cities in the Prairies are projected to have 12% more summer days than the rest of the cities.
• The number of heat waves increases significantly, especially for Vancouver, Victoria, and Halifax.
• Cold waves are expected to decrease by 50% in the future.

EM-Earth: The Ensemble Meteorological Dataset for Planet Earth
Guoqiang Tang, Martyn P. Clark, Simon Michael Papalexiou
Bulletin of the American Meteorological Society, Volume 103, Issue 4

Gridded meteorological estimates are essential for many applications. Most existing meteorological datasets are deterministic and have limitations in representing the inherent uncertainties from both the data and methodology used to create gridded products. We develop the Ensemble Meteorological Dataset for Planet Earth (EM-Earth) for precipitation, mean daily temperature, daily temperature range, and dewpoint temperature at 0.1° spatial resolution over global land areas from 1950 to 2019. EM-Earth provides hourly/daily deterministic estimates, and daily probabilistic estimates (25 ensemble members), to meet the diverse requirements of hydrometeorological applications. To produce EM-Earth, we first developed a station-based Serially Complete Earth (SC-Earth) dataset, which removes the temporal discontinuities in raw station observations. Then, we optimally merged SC-Earth station data and ERA5 estimates to generate EM-Earth deterministic estimates and their uncertainties. The EM-Earth ensemble members are produced by sampling from parametric probability distributions using spatiotemporally correlated random fields. The EM-Earth dataset is evaluated by leave-one-out validation, using independent evaluation stations, and comparing it with many widely used datasets. The results show that EM-Earth is better in Europe, North America, and Oceania than in Africa, Asia, and South America, mainly due to differences in the available stations and differences in climate conditions. Probabilistic spatial meteorological datasets are particularly valuable in regions with large meteorological uncertainties, where almost all existing deterministic datasets face great challenges in obtaining accurate estimates.

Review of GPM IMERG performance: A global perspective
Rajani Kumar Pradhan, Yannis Markonis, Mijael Rodrigo Vargas Godoy, Anahí Villalba-Pradas, Konstantinos M. Andreadis, Efthymios I. Nikolopoulos, Simon Michael Papalexiou, Akif Rahim, Francisco J. Tapiador, Martin Hanel
Remote Sensing of Environment, Volume 268

• A comprehensive review and analysis of IMERG validation studies from 2016 to 2019.
• There is robust representation of spatio-temporal patterns of precipitation.
• Discrepancies can be found in extreme and light precipitation, and the winter season.
• The 30-min scale has not yet been sufficiently evaluated.
• Using IMERG in hydrological simulations results in high variance in their performance.

Accurate, reliable, and high spatio-temporal resolution precipitation data are vital for many applications, including the study of extreme events, hydrological modeling, water resource management, and hydroclimatic research in general. In this study, we performed a systematic review of the available literature to assess the performance of the Integrated Multi-Satellite Retrievals for GPM (IMERG) products across different geographical locations and climatic conditions around the globe. Asia, and in particular China, are the subject of the largest number of IMERG evaluation studies on the continental and country level. When compared to ground observational records, IMERG is found to vary with seasons, as well as precipitation type, structure, and intensity. It is shown to appropriately estimate and detect regional precipitation patterns, and their spatial mean, while its performance can be improved over mountainous regions characterized by orographic precipitation, complex terrains, and for winter precipitation. Furthermore, despite IMERG's better performance compared to other satellite products in reproducing spatio-temporal patterns and variability of extreme precipitation, some limitations were found regarding the precipitation intensity. At the temporal scales, IMERG performs better at monthly and annual time steps than the daily and sub-daily ones. Finally, in terms of hydrological application, the use of IMERG has resulted in significant discrepancies in streamflow simulation. However, and most importantly, we find that each new version that replaces the previous one shows substantial improvement in almost every spatiotemporal scale and climatic condition. Thus, despite its limitations, IMERG evolution reveals a promising path for current and future applications.

Continuous hydrologic modelling for small and ungauged basins: A comparison of eight rainfall models for sub-daily runoff simulations
Salvatore Grimaldi, Elena Volpi, Andreas Langousis, Simon Michael Papalexiou, Davide Luciano De Luca, Rodolfo Piscopia, Sofia D. Nerantzaki, Georgia Papacharalampous, Andrea Petroselli
Journal of Hydrology, Volume 610

• Eight rainfall models are compared as input for a simplified continuous hydrologic model.
• The comparison is performed by investigating the simulated runoff properties.
• Results suggest that all rainfall models lead to realistic runoff time series.
• Four models will be further optimized to be adapted for data-scarce applications.

Continuous hydrologic modelling is a natural evolution of the event-based design approach in modern hydrology. It improves the rainfall-runoff transformation and provides the practitioner with more effective hydrological output information for risk assessment. However, this approach is still not widely adopted, mainly because the choice of the most appropriate rainfall simulation model (which is the core of continuous frameworks) for the specific aim of risk analysis has not been sufficiently investigated. In this paper, we test eight rainfall models by evaluating the performances of the simulated rainfall time series when used as input for a simplified continuous rainfall-runoff model, the COSMO4SUB, which is particularly designed for small and ungauged basins. The comparison confirms the capability of all models to provide realistic flood events and allows identifying the models to be further improved and tailored for data-scarce hydrological risk applications. The suggested framework is transferable to any catchment while different hydrologic and rainfall models can be used.

Increasing trends in rainfall erosivity in the Yellow River basin from 1971 to 2020
W. Wang, Shuiqing Yin, Ge Gao, Simon Michael Papalexiou, Z. Wang
Journal of Hydrology, Volume 610

• Rainfall erosivity for the Yellow River basin increased significantly at both event and seasonal scales during 1971–2020.
• Storms shifted towards longer durations and higher precipitation amounts.
• Extreme precipitation within the basin occurred more frequently and intensely.
• The increasing trend became more pronounced in the last two decades.

Hourly precipitation data from 1971 to 2020, collected from 98 stations distributed across the Yellow River basin, were analyzed to detect changes in the characteristics of rainfall and rainfall erosivity for all storms and for storms with extreme erosivity (greater than the 90th percentile). Results showed that over the past 50 years, rainfall erosivity at both event and seasonal scales over the whole basin increased significantly (p < 0.05), with rates of 5.46% and 6.86% per decade, respectively, compared to the 1981–2010 average values. Approximately 80% of the 98 stations showed increasing trends and 20% of stations had statistically significant trends (p < 0.1). The increase of rainfall erosivity resulted from the significant increasing trends of average storm precipitation (p < 0.1), duration (p < 0.1), rainfall energy (p < 0.05) and maximum 1-h intensity (p < 0.05). In addition, the total extreme erosivity showed significant upward trends at a relative rate of 6.05% per decade (p < 0.05). Extreme erosivity storms occurred more frequently and with higher rainfall energy during the study period (p < 0.05). Trends for seasonal total and extreme erosivity were also estimated based on daily rainfall data, and the changing magnitudes were similar to those based on hourly rainfall data, which suggested daily rainfall can be applied to detect interannual and long-term variations of rainfall erosivity in the absence of rainfall data with higher resolution. It was suggested that soil and water conservation strategies and vegetation projects conducted within the Yellow River basin should be continued and enhanced in the future.

Extreme Precipitation in China: A Review on Statistical Methods and Applications
Xuezhi Gu, Lei Ye, Qian Xin, Chi Zhang, Fanzhang Zeng, Sofia D. Nerantzaki, Simon Michael Papalexiou
Advances in Water Resources, Volume 163

• A first comprehensive and systematic review on the research of extreme precipitation in China.
• Variation and regional characteristics of extreme precipitation under non-stationary conditions due to climate change and human activities.
• Supports and basis for engineering application and further research on extreme precipitation and flood in China.

Recent years have witnessed global massive property losses and casualties caused by extreme precipitation and its subsequent natural disasters, including floods and landslides. China is one of the countries deeply affected by these casualties. If the statistical characteristics and laws of extreme precipitation could be clearly grasped, then the negative impacts triggered by it may be minimized. China is a vast country and diverse in climate and terrain, hence different regions may be suitable for different analyses and research methods. Therefore, it is necessary to clarify the research progress, methods and current status of extreme precipitation across the country. This paper attempts to provide a comprehensive review of techniques and methods used in extreme precipitation research and engineering practice and their applications. The literature is reviewed focusing on seven aspects: (1) annual maxima method (AM), (2) peaks over threshold method (POT), (3) probable maximum precipitation (PMP), (4) non-stationary analysis of precipitation extremes, (5) intensity-duration-frequency curves (IDF), (6) uncertainty in extreme precipitation frequency analysis, and (7) spatial variability of extreme precipitation. Research on extreme precipitation in China is generally based or centered on the above seven aspects. The current study aims to provide ideas for further research on extreme precipitation frequency analysis and its response to climate change and human activities.

A new very simply explicitly invertible approximation for the standard normal cumulative distribution function
Jessica Lipoth, Yoseph Tereda, Simon Michael Papalexiou, Raymond J. Spiteri
AIMS Mathematics, Volume 7, Issue 7

This paper proposes a new very simply explicitly invertible function to approximate the standard normal cumulative distribution function (CDF). The new function was fit to the standard normal CDF using both MATLAB's Global Optimization Toolbox and the BARON software package. The results of three separate fits are presented in this paper. Each fit was performed across the range $0 \leq z \leq 7$ and achieved a maximum absolute error (MAE) superior to the best MAE reported for previously published very simply explicitly invertible approximations of the standard normal CDF. The best MAE reported from this study is 2.73e-05, which is nearly a factor of five better than the best MAE reported for other published very simply explicitly invertible approximations.
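As an illustration of what "explicitly invertible" and "maximum absolute error" mean here, the sketch below evaluates a classic and much coarser approximation, the logistic function with scaling constant 1.702. This is not the function proposed in the paper, whose form is not given in the abstract; the names and the evaluation grid are assumptions for the example.

```python
import math

K = 1.702  # classic logistic scaling constant (an assumption, not from the paper)

def phi(z):
    """Reference standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_approx(z):
    """Logistic approximation of the standard normal CDF."""
    return 1.0 / (1.0 + math.exp(-K * z))

def phi_approx_inverse(p):
    """Closed-form inverse of the approximation, valid for 0 < p < 1."""
    return math.log(p / (1.0 - p)) / K

def max_abs_error(lo=0.0, hi=7.0, steps=7000):
    """Maximum absolute error on an evenly spaced grid over [lo, hi]."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(phi(z) - phi_approx(z)) for z in grid)

print(max_abs_error())           # roughly 1e-2 for this simple form
print(phi_approx_inverse(0.975)) # approximate 97.5% quantile
```

The error of this simple form is orders of magnitude larger than the 2.73e-05 reported in the abstract, which shows the trade-off between simplicity and accuracy that such approximations have to balance.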

Changes in the risk of extreme temperatures in megacities worldwide
Chandra Rupa Rajulapati, Hebatallah Mohamed Abdelmoaty, Sofia D. Nerantzaki, Simon Michael Papalexiou
Climate Risk Management, Volume 36

Globally, extreme temperatures have severe impacts on the economy, human health, food and water security, and ecosystems. Mortality rates have increased due to heatwaves in several regions. Megacities in particular are strongly affected by increasing temperatures and ever-expanding urban areas; it is important to understand extreme temperature changes in terms of duration, magnitude, and frequency for future risk management and disaster mitigation. Here we framed a novel Semi-Parametric quantile mapping method to bias-correct the CMIP6 minimum and maximum temperature projections for 199 megacities worldwide. The changes in maximum and minimum temperature are quantified in terms of climate indices (ETCCDI and HDWI) for the four Shared Socioeconomic Pathways (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). Cities in northern Asia and northern North America (Kazan, Samara, Heihe, Montréal, Edmonton, and Moscow) are warming at a higher rate compared to the other regions. There is an increasing trend for the warm extremes and a decreasing trend for the cold extremes. Heatwaves increase exponentially in the future with the increase in warming, that is, from SSP1-2.6 to SSP5-8.5. Among the CMIP6 models, a huge variability is observed, and this further increases as the warming increases. All climate indices have steeper slopes for the far future (2066–2100) than for the near future (2031–2065). Yet for cold indices, the variability among CMIP6 models is higher in the near future than in the far future.

Detailed investigation of discrepancies in Köppen-Geiger climate classification using seven global gridded products
Salma Hobbi, Simon Michael Papalexiou, Chandra Rupa Rajulapati, Sofia D. Nerantzaki, Yannis Markonis, Guoqiang Tang, Martyn P. Clark
Journal of Hydrology, Volume 612

The Köppen-Geiger (KG) climate classification has been widely used to determine the climate at global and regional scales using precipitation and temperature data. KG maps are typically developed using a single product; however, uncertainties in KG climate types resulting from different precipitation and temperature datasets have not been explored in detail. Here, we assess seven global datasets to show uncertainties in KG classification from 1980 to 2017. Using a pairwise comparison at global and zonal scales, we quantify the similarity among the seven KG maps. Gauge- and reanalysis-based KG maps have a notable difference. Spatially, the highest and lowest similarity is observed for the North and South Temperate zones, respectively. Notably, 17% of grids among the seven maps show variations even in the major KG climate types, while 35% of grids are described by more than one KG climate subtype. Strong uncertainty is observed in south Asia, central and south Africa, western America, and northeastern Australia. We created two KG master maps (0.5° resolution) by merging the climate maps directly and by combining the precipitation and temperature data from the seven datasets. These master maps are more robust than the individual ones showing coherent spatial patterns. This study reveals the large uncertainty in climate classification and offers two robust KG maps that may help to better evaluate historical climate and quantify future climate shifts.

Rainfall Generation Revisited: Introducing CoSMoS‐2s and Advancing Copula‐Based Intermittent Time Series Modeling
Simon Michael Papalexiou
Water Resources Research, Volume 58, Issue 6

What elements should a parsimonious model reproduce at a single scale to precisely simulate rainfall at many scales? We posit these elements are: (a) the probability of dry and linear correlation structure of the wet/dry sequence as a proxy reproducing the distribution of wet/dry spells, and (b) the marginal distribution of nonzero rainfall and its correlation structure. We build a two‐state rainfall model, the CoSMoS‐2s, that explicitly reproduces these elements and is easily applicable at any timescale. Additionally, the paper: (a) introduces the Generalized Exponential distribution system comprising six flexible distributions with desired properties to describe nonzero rainfall and facilitate time series generation; (b) extends the CoSMoS framework to allow simulations with negative correlations; (c) simplifies the generation of binary sequences with any correlation structure by analytical approximations; (d) introduces the rank‐based CoSMoS‐2s that preserves Spearman's correlations, has an analytical formulation, and is also applicable for infinite variance time series; (e) introduces the copula‐based CoSMoS‐2s enabling intermittent time series generation with nonzero values having the dependence structure of any desired copula; and (f) offers conceptual generalizations for rainfall modeling and beyond, with specific ideas for future improvements and extensions. The CoSMoS‐2s is tested using four long hourly rainfall records; the simulations reproduce rainfall properties at multiple scales including the wet/dry spells, probability of dry, characteristics of nonzero rainfall, and the behavior of extremes.

Status and prospects for drought forecasting: opportunities in artificial intelligence and hybrid physical–statistical forecasting
Amir AghaKouchak, Baoxiang Pan, Omid Mazdiyasni, Mojtaba Sadegh, Shakil Jiwa, Wenkai Zhang, Charlotte Love, Shahrbanou Madadgar, Simon Michael Papalexiou, Steven J. Davis, Kuolin Hsu, Soroosh Sorooshian
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Volume 380, Issue 2238

Despite major improvements in weather and climate modelling and substantial increases in remotely sensed observations, drought prediction remains a major challenge. After a review of the existing methods, we discuss major research gaps and opportunities to improve drought prediction. We argue that current approaches are top-down, assuming that the process(es) and/or driver(s) are known—i.e. starting with a model and then imposing it on the observed events (reality). With the help of an experiment, we show that there are opportunities to develop bottom-up drought prediction models—i.e. starting from the reality (here, observed events) and searching for model(s) and driver(s) that work. Recent advances in artificial intelligence and machine learning provide significant opportunities for developing bottom-up drought forecasting models. Regardless of the type of drought forecasting model (e.g. machine learning, dynamical simulations, analogue based), we need to shift our attention to robustness of theories and outputs rather than event-based verification. A shift in our focus towards quantifying the stability of uncertainty in drought prediction models, rather than the goodness of fit or reproducing the past, could be the first step towards this goal. Finally, we highlight the advantages of hybrid dynamical and statistical models for improving current drought prediction models. This article is part of the Royal Society Science+ meeting issue ‘Drought risk in the Anthropocene’.


The use of serially complete station data to improve the temporal continuity of gridded precipitation and temperature estimates
Guoqiang Tang, Martyn P. Clark, Simon Michael Papalexiou
Journal of Hydrometeorology

Stations are an important source of meteorological data, but often suffer from missing values and short observation periods. Gap filling is widely used to generate serially complete datasets (SCDs), which are subsequently used to produce gridded meteorological estimates. However, the value of SCDs in spatial interpolation is scarcely studied. Based on our recent efforts to develop a SCD over North America (SCDNA), we explore the extent to which gap filling improves gridded precipitation and temperature estimates. We address two specific questions: (1) Can SCDNA improve the statistical accuracy of gridded estimates in North America? (2) Can SCDNA improve estimates of trends on gridded data? In addressing these questions, we also evaluate the extent to which results depend on the spatial density of the station network and the spatial interpolation methods used. Results show that the improvement in statistical interpolation due to gap filling is more obvious for precipitation, followed by minimum temperature and maximum temperature. The improvement is larger when the station network is sparse and when simpler interpolation methods are used. SCDs can also notably reduce the uncertainties in spatial interpolation. Our evaluation across North America from 1979 to 2018 demonstrates that SCDs improve the accuracy of interpolated estimates for most stations and days. SCDNA-based interpolation also obtains better trend estimation than observation-based interpolation. This occurs because stations used for interpolation could change during a specific period, causing changepoints in interpolated temperature estimates and affecting the long-term trends of observation-based interpolation, which can be avoided using SCDNA. Overall, SCDs improve the performance of gridded precipitation and temperature estimates.

DOI bib
Biases Beyond the Mean in CMIP6 Extreme Precipitation: A Global Investigation
Hebatallah Mohamed Abdelmoaty, Simon Michael Papalexiou, Chandra Rupa Rajulapati, Amir AghaKouchak
Earth's Future, Volume 9, Issue 10

Climate models are crucial for assessing climate variability and change. A reliable model for future climate should reasonably simulate the historical climate. Here, we assess the performance of CMIP6 models in reproducing statistical properties of observed annual maxima of daily precipitation. We go beyond the commonly used methods and assess CMIP6 simulations on three scales by performing: (a) univariate comparison based on L-moments and relative difference measures; (b) bivariate comparison using kernel densities of mean and L-variation, and of L-skewness and L-kurtosis; and (c) comparison of the entire distribution function using the Generalized Extreme Value (GEV) distribution coupled with a novel application of the Anderson-Darling goodness-of-fit test. The results reveal that the statistical shape properties (related to the frequency and magnitude of extremes) of CMIP6 simulations match well with the observational datasets. The simulated mean and variation differ among the models, with 70% of simulations having a difference within 10% from the observations. Biases are observed in the bivariate investigation of mean and variation. Several models perform well, with the HadGEM3-GC31-MM model performing well in all three scales when compared to the ground-based Global Precipitation Climatology Centre data. Finally, the study highlights biases of CMIP6 models in simulating extreme precipitation in the Arctic, Tropics, arid and semi-arid regions.
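The univariate comparison rests on sample L-moments, which can be estimated from ordered data through probability-weighted moments. The following is a minimal sketch of that standard estimator, not the authors' code; the function name `sample_l_moments` and the example values are hypothetical.

```python
def sample_l_moments(data):
    """Return the mean, L-variation, L-skewness and L-kurtosis of a sample.

    Uses the unbiased probability-weighted moment estimators b0..b3
    computed from the sorted sample.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are needed for L-kurtosis")

    # i is the zero-based rank, i.e. (rank - 1) in the usual notation.
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (
        n * (n - 1) * (n - 2) * (n - 3)
    )

    # First four L-moments as linear combinations of b0..b3.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0

    return {
        "mean": l1,
        "l_variation": l2 / l1,
        "l_skewness": l3 / l2,
        "l_kurtosis": l4 / l2,
    }


# Hypothetical annual maxima (mm); evenly spaced, so L-skewness is zero.
moments = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

A relative difference between a simulated and an observed L-moment can then be formed as (simulated - observed) / observed.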

DOI bib
Probabilistic Evaluation of Drought in CMIP6 Simulations
Simon Michael Papalexiou, Chandra Rupa Rajulapati, Konstantinos M. Andreadis, Efi Foufoula‐Georgiou, Martyn P. Clark, Kevin E. Trenberth
Earth's Future, Volume 9, Issue 10

As droughts have widespread social and ecological impacts, it is critical to develop long-term adaptation and mitigation strategies to reduce drought vulnerability. Climate models are important in quantifying drought changes. Here, we assess the ability of 285 CMIP6 historical simulations, from 17 models, to reproduce drought duration and severity in three observational data sets using the Standardized Precipitation Index (SPI). We used summary statistics beyond the mean and standard deviation, and devised a novel probabilistic framework, based on the Hellinger distance, to quantify the difference between observed and simulated drought characteristics. Results show that many simulations have less than ±10% error in reproducing the observed drought summary statistics. The hypothesis that simulations and observations are described by the same distribution cannot be rejected for more than 80% of the grids based on our H distance framework. No single model stood out as demonstrating consistently better performance over large regions of the globe. The variance in drought statistics among the simulations is higher in the tropics compared to other latitudinal zones. Though the models capture the characteristics of dry spells well, there is considerable bias in low precipitation values. Good model performance in terms of SPI does not imply good performance in simulating low precipitation. Our study emphasizes the need to probabilistically evaluate climate model simulations in order to both pinpoint model weaknesses and identify a subset of best-performing models that are useful for impact assessments.
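For reference, the Hellinger distance between two discrete distributions can be computed as in the sketch below. This illustrates the distance only, not the authors' probabilistic framework or its hypothesis test; the function name and the example histograms are hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are sequences of non-negative weights over the same bins;
    they are normalized to sum to one before the distance is computed.
    The result lies in [0, 1]: 0 for identical distributions and 1 for
    distributions with no overlap.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total weight")

    squared = sum(
        (math.sqrt(a / total_p) - math.sqrt(b / total_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2.0)


# Hypothetical histograms of drought duration (counts per duration bin).
observed = [40, 30, 20, 10]
simulated = [35, 30, 25, 10]
distance = hellinger_distance(observed, simulated)
```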

DOI bib
The Abuse of Popular Performance Metrics in Hydrologic Modeling
Martyn P. Clark, Richard M. Vogel, Jonathan Lamontagne, Naoki Mizukami, Wouter Knoben, Guoqiang Tang, Shervan Gharari, Jim Freer, Paul H. Whitfield, Kevin Shook, Simon Michael Papalexiou
Water Resources Research, Volume 57, Issue 9

The goal of this commentary is to critically evaluate the use of popular performance metrics in hydrologic modeling. We focus on the Nash-Sutcliffe Efficiency (NSE) and the Kling-Gupta Efficiency (KGE) metrics, which are both widely used in hydrologic research and practice around the world. Our specific objectives are: (a) to provide tools that quantify the sampling uncertainty in popular performance metrics; (b) to quantify sampling uncertainty in popular performance metrics across a large sample of catchments; and (c) to prescribe the further research that is needed to improve the estimation, interpretation, and use of popular performance metrics in hydrologic modeling. Our large-sample analysis demonstrates that there is substantial sampling uncertainty in the NSE and KGE estimators. This occurs because the probability distribution of squared errors between model simulations and observations has heavy tails, meaning that performance metrics can be heavily influenced by just a few data points. Our results highlight obvious (yet ignored) abuses of performance metrics that contaminate the conclusions of many hydrologic modeling studies: It is essential to quantify the sampling uncertainty in performance metrics when justifying the use of a model for a specific purpose and when comparing the performance of competing models.
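To make the idea of sampling uncertainty concrete, the sketch below computes the NSE and the KGE (in its original three-component form) and attaches a simple percentile bootstrap interval to either metric. It is an illustration under stated assumptions, not the tools provided by the authors: the function names are hypothetical, and resampling individual time steps ignores serial correlation in streamflow, which a careful analysis would need to address.

```python
import math
import random
import statistics


def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations."""
    mean_obs = statistics.fmean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread


def kge(sim, obs):
    """Kling-Gupta Efficiency from correlation, variability and bias ratios."""
    mean_sim, mean_obs = statistics.fmean(sim), statistics.fmean(obs)
    sd_sim, sd_obs = statistics.pstdev(sim), statistics.pstdev(obs)
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (sd_sim * sd_obs)      # linear correlation
    alpha = sd_sim / sd_obs          # variability ratio
    beta = mean_sim / mean_obs       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def bootstrap_interval(metric, sim, obs, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap interval for a metric, resampling (sim, obs) pairs."""
    rng = random.Random(seed)
    n = len(obs)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([sim[i] for i in idx], [obs[i] for i in idx]))
    values.sort()
    lower = values[int((1 - level) / 2 * (n_boot - 1))]
    upper = values[int((1 + level) / 2 * (n_boot - 1))]
    return lower, upper


# Synthetic example: skewed "observations" and a noisy "simulation".
rng = random.Random(1)
obs = [rng.expovariate(0.2) + 1.0 for _ in range(200)]
sim = [o * 0.9 + rng.gauss(0.0, 1.0) for o in obs]
nse_low, nse_high = bootstrap_interval(nse, sim, obs)
```

A wide interval signals that a difference in NSE or KGE between two models may not be meaningful for the sample at hand.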

DOI bib
VISCOUS: A Variance‐Based Sensitivity Analysis Using Copulas for Efficient Identification of Dominant Hydrological Processes
Razi Sheikholeslami, Shervan Gharari, Simon Michael Papalexiou, Martyn P. Clark
Water Resources Research, Volume 57, Issue 7

Global sensitivity analysis (GSA) has long been recognized as an indispensable tool for model analysis. GSA has been extensively used for model simplification, identifiability analysis, and diagnostic tests. Nevertheless, computationally efficient methodologies are needed for GSA, not only to reduce the computational overhead, but also to improve the quality and robustness of the results. This is especially the case for process-based hydrologic models, as their simulation time typically exceeds the computational resources available for a comprehensive GSA. To overcome this computational barrier, we propose a data-driven method called VISCOUS, variance-based sensitivity analysis using copulas. VISCOUS uses Gaussian mixture copulas to approximate the joint probability density function of a given set of input-output pairs for estimating the variance-based sensitivity indices. Our method identifies dominant hydrologic factors by recycling existing input-output data, and thus can deal with arbitrary sample sets drawn from the input-output space. We used two hydrologic models of increasing complexity (HBV and VIC) to assess the performance of VISCOUS. Our results confirm that VISCOUS and the conventional variance-based method can detect similar important and unimportant factors. Furthermore, the VISCOUS method can substantially reduce the computational cost required for sensitivity analysis. Our proposed method is particularly useful for process-based models with many uncertain parameters, large domain size, and high spatial and temporal resolution.

DOI bib
SC-Earth: A Station-Based Serially Complete Earth Dataset from 1950 to 2019
Guoqiang Tang, Martyn P. Clark, Simon Michael Papalexiou
Journal of Climate, Volume 34, Issue 16

Abstract Meteorological data from ground stations suffer from temporal discontinuities caused by missing values and short measurement periods. Gap-filling and reconstruction techniques have proven to be effective in producing serially complete station datasets (SCDs) that are used for a myriad of meteorological applications (e.g., developing gridded meteorological datasets and validating models). To our knowledge, all SCDs are developed at regional scales. In this study, we developed the serially complete Earth (SC-Earth) dataset, which provides daily precipitation, mean temperature, temperature range, dewpoint temperature, and wind speed data from 1950 to 2019. SC-Earth utilizes raw station data from the Global Historical Climatology Network–Daily (GHCN-D) and the Global Surface Summary of the Day (GSOD). A unified station repository is generated based on GHCN-D and GSOD after station merging and strict quality control. ERA5 is optimally matched with station data considering the time shift issue and then used to assist the global gap filling. SC-Earth is generated by merging estimates from 15 strategies based on quantile mapping, spatial interpolation, machine learning, and multistrategy merging. The final estimates are bias corrected using a combination of quantile mapping and quantile delta mapping. Comprehensive validation demonstrates that SC-Earth has high accuracy around the globe, with degraded quality in the tropics and oceanic islands due to sparse station networks, strong spatial precipitation gradients, and degraded ERA5 estimates. Meanwhile, SC-Earth inherits potential limitations such as inhomogeneity and precipitation undercatch from raw station data, which may affect its application in some cases. Overall, the high-quality and high-density SC-Earth dataset will benefit research in fields of hydrology, ecology, meteorology, and climate. The dataset is available at .

DOI bib
EMDNA: an Ensemble Meteorological Dataset for North America
Guoqiang Tang, Martyn P. Clark, Simon Michael Papalexiou, Andrew J. Newman, Andy Wood, Dominique Brunet, Paul H. Whitfield
Earth System Science Data, Volume 13, Issue 7

Abstract. Probabilistic methods are useful to estimate the uncertainty in spatial meteorological fields (e.g., the uncertainty in spatial patterns of precipitation and temperature across large domains). In ensemble probabilistic methods, “equally plausible” ensemble members are used to approximate the probability distribution, hence the uncertainty, of a spatially distributed meteorological variable conditioned to the available information. The ensemble members can be used to evaluate the impact of uncertainties in spatial meteorological fields for a myriad of applications. This study develops the Ensemble Meteorological Dataset for North America (EMDNA). EMDNA has 100 ensemble members with daily precipitation amount, mean daily temperature, and daily temperature range at 0.1° spatial resolution (approx. 10 km grids) from 1979 to 2018, derived from a fusion of station observations and reanalysis model outputs. The station data used in EMDNA are from a serially complete dataset for North America (SCDNA) that fills gaps in precipitation and temperature measurements using multiple strategies. Outputs from three reanalysis products are regridded, corrected, and merged using Bayesian model averaging. Optimal interpolation (OI) is used to merge station- and reanalysis-based estimates. EMDNA estimates are generated using spatiotemporally correlated random fields to sample from the OI estimates. Evaluation results show that (1) the merged reanalysis estimates outperform raw reanalysis estimates, particularly in high latitudes and mountainous regions; (2) the OI estimates are more accurate than the reanalysis and station-based regression estimates, with the most notable improvements for precipitation evident in sparsely gauged regions; and (3) EMDNA estimates exhibit good performance according to the diagrams and metrics used for probabilistic evaluation. We discuss the limitations of the current framework and highlight that further research is needed to improve ensemble meteorological datasets. Overall, EMDNA is expected to be useful for hydrological and meteorological applications in North America. The entire dataset and a teaser dataset (a small subset of EMDNA for easy download and preview) are available at (Tang et al., 2020a).

DOI bib
The Perils of Regridding: Examples using a Global Precipitation Dataset
Chandra Rupa Rajulapati, Simon Michael Papalexiou, Martyn P. Clark, John W. Pomeroy
Journal of Applied Meteorology and Climatology

Abstract Gridded precipitation datasets are used in many applications such as the analysis of climate variability/change and hydrological modelling. Regridding precipitation datasets is common for model coupling (e.g., coupling atmospheric and hydrological models) or comparing different models and datasets. However, regridding can considerably alter precipitation statistics. In this global analysis, the effects of regridding a precipitation dataset are emphasized using three regridding methods (first order conservative, bilinear, and distance weighted averaging). The differences between the original and regridded dataset are substantial and greatest at high quantiles. Differences of 46 mm and 0.13 mm are noted in high (0.95) and low (0.05) quantiles respectively. The impacts of regridding vary spatially for land and oceanic regions; there are substantial differences at high quantiles in tropical land regions, and at low quantiles in polar regions. These impacts are approximately the same for different regridding methods. The differences increase with the size of the grid at higher quantiles and vice versa for low quantiles. As the grid resolution increases, the difference between original and regridded data declines, yet the shift size dominates for high quantiles for which the differences are higher. Whilst regridding is often necessary to use gridded precipitation datasets, it should be used with great caution for fine resolutions (e.g., daily and sub-daily), as it can severely alter the statistical properties of precipitation, specifically at high and low quantiles.
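The direction of this effect can be reproduced with a toy experiment. The sketch below coarsens a synthetic skewed field by block averaging (a simple stand-in for conservative regridding to a coarser grid) and compares the 0.05 and 0.95 quantiles before and after. All names and numbers are hypothetical, and the synthetic field is spatially uncorrelated, so the size of the effect says nothing about real precipitation datasets.

```python
import random


def block_average(field, factor):
    """Coarsen a 2-D grid (list of rows) by averaging factor x factor blocks."""
    rows, cols = len(field), len(field[0])
    coarse = []
    for i in range(0, rows - rows % factor, factor):
        coarse_row = []
        for j in range(0, cols - cols % factor, factor):
            block = [
                field[i + di][j + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def quantile(values, p):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


# Synthetic 100 x 100 field of skewed values with a mean of about 10 mm.
rng = random.Random(42)
fine = [[rng.expovariate(0.1) for _ in range(100)] for _ in range(100)]
coarse = block_average(fine, 2)

flat_fine = [v for row in fine for v in row]
flat_coarse = [v for row in coarse for v in row]

q95_fine, q95_coarse = quantile(flat_fine, 0.95), quantile(flat_coarse, 0.95)
q05_fine, q05_coarse = quantile(flat_fine, 0.05), quantile(flat_coarse, 0.05)
```

In this toy setting averaging pulls the high quantile down and the low quantile up, which is the kind of distortion at the tails that the abstract warns about.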

DOI bib
Spatial variability of precipitation extremes over Italy using a fine-resolution gridded product
Benedetta Moccia, Simon Michael Papalexiou, Fabio Russo, Francesco Napolitano
Journal of Hydrology: Regional Studies, Volume 37

• Analysis shows the GEV distribution can underestimate precipitation extremes.
• GEV+ and BrXII describe extreme precipitation more consistently than the GEV.
• Maps of rainfall depths for different return periods are provided for Italy.
Knowing the magnitude and frequency of extreme precipitation is necessary to reduce its impact on vulnerable areas. Here we investigate the performance of the Generalized Extreme Value (GEV) distribution, using a fine-resolution satellite-based gridded product, to analyze 13,247 daily rainfall annual maxima samples. A non-extreme value distribution with a power-type behavior, that is, the Burr Type XII (BrXII), is also evaluated and used to test the reliability of the GEV in describing extreme rainfall. The results show that: (1) in 44.9% of the analyzed samples the GEV predicts an upper rainfall limit; we deem this is an artifact due to sample variations; (2) we suggest the GEV+ distribution, that is, the GEV with shape parameters restricted only to positive values, as a more consistent model complying with the nature of extreme precipitation; (3) GEV, GEV+, and BrXII performed equally well in describing the observed annual precipitation, yet all distributions underestimate the observed sample maximum; (4) the BrXII, for large return periods, predicts larger rainfall amounts compared to the GEV, indicating that GEV estimates could underestimate the risk of extremes; and (5) the correlation between the predicted rainfall and the elevation is investigated. Based on the results of this study, we suggest using the GEV+ and non-extreme value distributions such as the BrXII, instead of the classical GEV, to describe precipitation extremes.
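The upper rainfall limit mentioned above follows directly from the GEV quantile function when the shape parameter is negative. The sketch below uses the parameterization in which a positive shape means a heavy, unbounded right tail; the function names and parameter values are hypothetical and are not taken from the study.

```python
import math


def gev_return_level(return_period, loc, scale, shape):
    """GEV quantile for a given return period (in years, annual maxima).

    Convention assumed here: shape > 0 gives a heavy right tail,
    shape < 0 gives a bounded right tail, shape == 0 is the Gumbel case.
    """
    non_exceedance = 1.0 - 1.0 / return_period
    y = -math.log(non_exceedance)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)


def gev_upper_bound(loc, scale, shape):
    """Upper limit implied by the GEV; infinite unless shape is negative."""
    if shape < 0:
        return loc - scale / shape
    return math.inf


# Hypothetical parameters (mm): a negative shape caps the predicted rainfall.
bounded_limit = gev_upper_bound(loc=50.0, scale=10.0, shape=-0.2)
level_100_bounded = gev_return_level(100, loc=50.0, scale=10.0, shape=-0.2)
level_100_heavy = gev_return_level(100, loc=50.0, scale=10.0, shape=0.2)
```

With these values the bounded model can never exceed 100 mm, while the model with a positive shape keeps growing with the return period. Restricting the shape to positive values, as the GEV+ does, removes the bound.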

DOI bib
Advancing Space‐Time Simulation of Random Fields: From Storms to Cyclones and Beyond
Simon Michael Papalexiou, Francesco Serinaldi, Emilio Porcu
Water Resources Research, Volume 57, Issue 8

Realistic stochastic simulation of hydro-environmental fluxes in space and time, such as rainfall, is challenging yet of paramount importance to inform environmental risk analysis and decision making under uncertainty. Here, we advance random fields simulation by introducing the concepts of general velocity fields and general anisotropy transformations. This expands the capabilities of the so-called Complete Stochastic Modeling Solution (CoSMoS) framework enabling the simulation of random fields (RF's) preserving: (a) any non-Gaussian marginal distribution, (b) any spatiotemporal correlation structure (STCS), (c) general advection expressed by velocity fields with locally varying speed and direction, and (d) locally varying anisotropy. We also introduce new copula-based STCS's and provide conditions guaranteeing their positive definiteness. To illustrate the potential of CoSMoS, we simulate RF's with complex patterns and motion mimicking rainfall storms moving across an area, spiraling fields resembling weather cyclones, fields converging to (or diverging from) a point, and colliding air masses. The proposed methodology is implemented in the freely available CoSMoS R package.
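The basic idea of preserving a non-Gaussian marginal can be shown with a one-dimensional time series: simulate a correlated Gaussian parent process and map it through the target quantile function. The sketch below is a simplified illustration in Python, not the CoSMoS R package or its API. All names are hypothetical, and no adjustment of the parent correlation is made, so the correlation of the output differs somewhat from `rho`.

```python
import math
import random


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def weibull_quantile(u, scale, shape):
    """Quantile function of a two-parameter Weibull distribution."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def simulate_weibull_ar1(n, rho, scale, shape, seed=0):
    """Series with a Weibull marginal driven by a Gaussian AR(1) parent."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n):
        # Gaussian value -> uniform probability -> target marginal.
        u = min(max(std_normal_cdf(z), 1e-12), 1.0 - 1e-12)
        series.append(weibull_quantile(u, scale, shape))
        # Advance the stationary AR(1) parent with unit variance.
        z = rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return series


# Hypothetical settings: exponential marginal (shape = 1) with mean 1.
series = simulate_weibull_ar1(n=5000, rho=0.7, scale=1.0, shape=1.0)
```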

DOI bib
The Global Water Cycle Budget: A Chronological Review
Mijael Rodrigo Vargas Godoy, Yannis Markonis, Martin Hanel, Jan Kyselý, Simon Michael Papalexiou
Surveys in Geophysics, Volume 42, Issue 5

Like civilization and technology, our understanding of the global water cycle has been continuously evolving, and we have adapted our quantification methods to better exploit new technological resources. The accurate quantification of global water fluxes and storages is crucial in studying the global water cycle. These fluxes and storages physically interact with each other, are related through the water budget, and are constrained by it. First attempts to quantify them date back to the early 1900s, and during the past few decades, they have received increasing research interest, which is reflected in the vast amount of data sources available nowadays. However, these data have not been comprehensive enough due to the high spatiotemporal variability of the global water cycle. Herein, we provide a comprehensive review of the chronological evolution of global water cycle quantification, the distinct data sources and methods used, and a critical assessment of their contribution to improving the spatiotemporal monitoring of the global water cycle. The chronology of global water cycle components shows that the uncertainty of flux estimates over oceans remains higher than that over land. Comparing the standard deviation and the interquartile range of the estimates from the 2000s onward with those from all the estimates (1905–2019), we can affirm that statistical variability has diminished in recent years. Moreover, the variability of ocean precipitation and evaporation estimates from the 2000s onward was reduced by more than 70% compared with earlier studies. These findings suggest that the consistency of global water cycle quantification has improved.

DOI bib
Informing Stochastic Streamflow Generation by Large-Scale Climate Indices at Single and Multiple Sites
Masoud Zaerpour, Simon Michael Papalexiou, Ali Nazemi
Advances in Water Resources, Volume 156

• An algorithm for incorporating climate indices in streamflow generation is proposed
• The algorithm is based on vine copulas, merged with a formal input selector
• The algorithm enables representing dynamic impacts of climate indices on streamflow
• The algorithm shows a better prediction skill, particularly in high flow seasons
• The algorithm captures modes of streamflow variability better than existing schemes
• The algorithm is generic and can be applied in single and multisite modes
Despite the existence of several stochastic streamflow generators, not much attention has been given to representing the impacts of large-scale climate indices on seasonal to interannual streamflow variability. By merging a formal predictor selection scheme with vine copulas, we propose a generic approach to explicitly incorporate large-scale climate indices in ensemble streamflow generation at single and multiple sites and in both short-term prediction and long-term projection modes. The proposed framework is applied at three headwater streams in the Oldman River Basin in southern Alberta, Canada. The results demonstrate higher skills than existing models both in terms of representing intra- and inter-annual variability, as well as accuracy and predictability of streamflow, particularly during high flow seasons. The proposed algorithm presents a globally relevant scheme for the stochastic streamflow generation, where the impacts of large-scale climate indices on streamflow variability across time and space are significant.

DOI bib
Quantifying the effects of Prairie depressional storage complexes on drainage basin connectivity
Kevin Shook, Simon Michael Papalexiou, John W. Pomeroy
Journal of Hydrology, Volume 593

• Basins in the Canadian Prairies have varying contributing fractions of their areas.
• Caused by the variable storage of water in depressions.
• The effects of the spatial and frequency distributions of depressions are quantified.
• Will lead to the development of improved hydrological models for the region.
Runoff in many locations within the Canadian Prairies is dominated by intermittent fill-and-spill between depressions. As a result, many basins have varying fractions of their areas connected to their outlets, due to changing depressional storage. The objective of this research is to determine the causes of the relationships between water storage and the connected fraction of depression-dominated Prairie basins. It is hypothesized that the shapes of the relationship curves are influenced by both the spatial and frequency distributions of depressional storage. Three sets of numerical experiments are presented to test the hypothesis. The first set of experiments demonstrates that where the number of depressions is small, their size and spatial distributions are important in controlling the relationship between the volume of depressional storage and the connected fraction of a basin. As the number of depressions is increased, the areal fractions of the largest depressions decrease, which reduces the importance of the spatial distribution of depressions. The second set of experiments demonstrates that the curve enveloping the connected fraction of a basin can be derived from the frequency distribution of depression areas, and scaling relationships between the area, volume and catchment area of the depressions, when the area of the largest depression is no greater than approximately 5% of the total. The third set of experiments demonstrates that the presence of a single large depression can strongly influence the relationship between the depressional storage and the connected fraction of a basin, depending on the relative size of the large depression, and its location within the basin. A single depression containing 30% of the total depressional area located near the outlet was shown to cause a basin to be nearly endorheic. A similar depression near the top of a basin was demonstrated not to fill and was therefore unable to contribute flows. The implications of the findings for developing hydrological models of large Prairie drainage basins are discussed.

DOI bib
A cross-scale framework for integrating multi-source data in Earth system sciences
Yannis Markonis, Christoforos Pappas, Martin Hanel, Simon Michael Papalexiou
Environmental Modelling & Software, Volume 139

Abstract Integration of Earth system data from various sources is a challenging task. Apart from their qualitative heterogeneity, different data records exist for describing similar Earth system processes at different spatiotemporal scales. Data inter-comparison and validation are usually performed at a single spatial or temporal scale, which could hamper the identification of potential discrepancies at other scales. Here, we propose a simple, yet efficient, graphical method for synthesizing and comparing observed and modelled data across a range of spatiotemporal scales. Instead of focusing on specific scales, such as annual means or original grid resolution, we examine how their statistical properties change across the spatiotemporal continuum. The proposed cross-scale framework for integrating multi-source data in Earth system sciences is already developed as a stand-alone R package that is freely available to download.

DOI bib
Global-scale massive feature extraction from monthly hydroclimatic time series: Statistical characterizations, spatial patterns and hydrological similarity
Georgia Papacharalampous, Hristos Tyralis, Simon Michael Papalexiou, Andreas Langousis, Sina Khatami, Elena Volpi, Salvatore Grimaldi
Science of The Total Environment, Volume 767

Hydroclimatic time series analysis focuses on a few feature types (e.g., autocorrelations, trends, extremes), which describe a small portion of the entire information content of the observations. Aiming to exploit a larger part of the available information and, thus, to deliver more reliable results (e.g., in hydroclimatic time series clustering contexts), here we approach hydroclimatic time series analysis differently, i.e., by performing massive feature extraction. In this respect, we develop a big data framework for hydroclimatic variable behaviour characterization. This framework relies on approximately 60 diverse features and is completely automatic (in the sense that it does not depend on the hydroclimatic process at hand). We apply the new framework to characterize mean monthly temperature, total monthly precipitation and mean monthly river flow. The applications are conducted at the global scale by exploiting 40-year-long time series originating from over 13 000 stations. We extract interpretable knowledge on seasonality, trends, autocorrelation, long-range dependence and entropy, and on feature types that are met less frequently. We further compare the examined hydroclimatic variable types in terms of this knowledge and identify patterns related to the spatial variability of the features. For this latter purpose, we also propose and exploit a hydroclimatic time series clustering methodology. This new methodology is based on Breiman's random forests. The descriptive and exploratory insights gained by the global-scale applications prove the usefulness of the adopted feature compilation in hydroclimatic contexts. Moreover, the spatially coherent patterns characterizing the clusters delivered by the new methodology build confidence in its future exploitation...

DOI bib
Seasonality, Intensity, and Duration of Rainfall Extremes Change in a Warmer Climate
Yiannis Moustakis, Simon Michael Papalexiou, Christian Onof, Alec Paschalis
Earth's Future, Volume 9, Issue 3

Precipitation extremes are expected to intensify under climate change with consequent impacts on flooding and ecosystem functioning. Here we use station data and high-resolution simulations from the WRF convection-permitting climate model (∼4 km, 1 h) over the US to assess future changes in hourly precipitation extremes. It is demonstrated that hourly precipitation extremes and storm depths are expected to intensify under climate change, and what is now a 20-year rainfall will become a 7-year rainfall on average for ∼75% of gridpoints over the US. This intensification is mostly expressed as an increase in rainfall tail heaviness. Statistically significant changes in the seasonality and duration of rainfall extremes are also exhibited over ∼95% of the domain. Our results suggest more non-linear future precipitation extremes with shorter spell duration that are distributed more uniformly throughout the year.

DOI bib
Explanation and Probabilistic Prediction of Hydrological Signatures with Statistical Boosting Algorithms
Hristos Tyralis, Georgia Papacharalampous, Andreas Langousis, Simon Michael Papalexiou
Remote Sensing, Volume 13, Issue 3

Hydrological signatures, i.e., statistical features of streamflow time series, are used to characterize the hydrology of a region. A relevant problem is the prediction of hydrological signatures in ungauged regions using the attributes obtained from remote sensing measurements at ungauged and gauged regions together with estimated hydrological signatures from gauged regions. The relevant framework is formulated as a regression problem, where the attributes are the predictor variables and the hydrological signatures are the dependent variables. Here we aim to provide probabilistic predictions of hydrological signatures using statistical boosting in a regression setting. We predict 12 hydrological signatures using 28 attributes in 667 basins in the contiguous US. We provide formal assessment of probabilistic predictions using quantile scores. We also exploit the statistical boosting properties with respect to the interpretability of derived models. It is shown that probabilistic predictions at quantile levels 2.5% and 97.5% using linear models as base learners exhibit better performance compared to more flexible boosting models that use both linear models and stumps (i.e., one-level decision trees). On the contrary, boosting models that use both linear models and stumps perform better than boosting with linear models when used for point predictions. Moreover, it is shown that climatic indices and topographic characteristics are the most important attributes for predicting hydrological signatures.
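The quantile score used to assess such probabilistic predictions is commonly defined as the average pinball loss at a given quantile level. The sketch below shows that definition only; the function name and the example numbers are hypothetical and are not results from the study.

```python
def quantile_score(observed, predicted_quantiles, level):
    """Average pinball loss of quantile predictions at the given level.

    Lower is better. Under-prediction is weighted by `level` and
    over-prediction by `1 - level`, so a prediction at level 0.975 is
    penalized heavily when the observation falls above it.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    if len(observed) != len(predicted_quantiles):
        raise ValueError("inputs must have the same length")

    total = 0.0
    for y, q in zip(observed, predicted_quantiles):
        diff = y - q
        total += level * diff if diff >= 0 else (level - 1.0) * diff
    return total / len(observed)


# Hypothetical signature values and their predicted 97.5% quantiles.
observed = [0.42, 0.55, 0.61]
upper_predictions = [0.60, 0.58, 0.59]
score_upper = quantile_score(observed, upper_predictions, 0.975)
```

Scores at levels 2.5% and 97.5% can be computed separately to judge the two ends of a 95% prediction interval.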


DOI bib
EMDNA: Ensemble Meteorological Dataset for North America
Guoqiang Tang, Martyn P. Clark, Simon Michael Papalexiou, Andrew J. Newman, Andy Wood, V. Vionnet, Paul H. Whitfield

Abstract. Probabilistic methods are very useful to estimate the spatial variability in meteorological conditions (e.g., spatial patterns of precipitation and temperature across large domains). In ensemble probabilistic methods, equally plausible ensemble members are used to approximate the probability distribution, hence uncertainty, of a spatially distributed meteorological variable conditioned on the available information. The ensemble can be used to evaluate the impact of the uncertainties in a myriad of applications. This study develops the Ensemble Meteorological Dataset for North America (EMDNA). EMDNA has 100 members with daily precipitation amount, mean daily temperature, and daily temperature range at 0.1° spatial resolution from 1979 to 2018, derived from a fusion of station observations and reanalysis model outputs. The station data used in EMDNA are from a serially complete dataset for North America (SCDNA) that fills gaps in precipitation and temperature measurements using multiple strategies. Outputs from three reanalysis products are regridded, corrected, and merged using Bayesian model averaging. Optimal interpolation (OI) is used to merge station- and reanalysis-based estimates. EMDNA estimates are generated based on OI estimates and spatiotemporally correlated random fields. Evaluation results show that (1) the merged reanalysis estimates outperform raw reanalysis estimates, particularly in high latitudes and mountainous regions; (2) the OI estimates are more accurate than the reanalysis and station-based regression estimates, with the most notable improvement for precipitation occurring in sparsely gauged regions; and (3) EMDNA estimates exhibit good performance according to the diagrams and metrics used for probabilistic evaluation. We also discuss the limitations of the current framework and highlight that persistent efforts are needed to further develop probabilistic methods and ensemble datasets. Overall, EMDNA is expected to be useful for hydrological and meteorological applications in North America. The whole dataset and a teaser dataset (a small subset of EMDNA for easy download and preview) are available at (Tang et al., 2020a).

DOI bib
VISCOUS: A Variance-Based Sensitivity Analysis Using Copulas for Efficient Identification of Dominant Hydrological Processes
Razi Sheikholeslami, Shervan Gharari, Simon Michael Papalexiou, Martyn P. Clark

DOI bib
Revisiting flood peak distributions: A pan-Canadian investigation
Mohanad Zaghloul, Simon Michael Papalexiou, Amin Elshorbagy, Paulin Coulibaly
Advances in Water Resources, Volume 145

• Analysis shows the GEV distribution might not be the best choice for flood frequency analysis.
• Burr type III and XII are consistent and robust models to describe annual flood peaks.
• Pan-Canadian investigation of annual streamflow peaks.
Safe and cost-effective design of infrastructures, such as dams, bridges, highways, often requires knowing the magnitude and frequency of peak floods. The Generalized Extreme Value distribution (GEV) prevailed in flood frequency analysis along with distributions comprising location, scale, and shape parameters. Here we explore alternative models and propose power-type models, having one scale and two shape parameters. The Burr type III (BrIII) and XII (BrXII) distributions are compared against the GEV in 1088 streamflow records of annual peaks across Canada. A generic L-moment algorithm is devised to fit the distributions, also applicable to distributions without analytical L-moment expressions. The analysis shows: (1) the models perform equally well when describing the observed annual peaks; (2) the right tail appears heavier in the BrIII and BrXII models leading to larger streamflow predictions when compared to those of GEV; (3) the GEV predicts upper streamflow limits in 39.1% of the records (these limits have realistic exceedance probabilities based on the other two models); (4) the tail heaviness estimation seems not robust in the GEV case when compared to the BrIII and BrXII models and this could challenge GEV’s reliability in predicting streamflow at large return periods; and, (5) regional variation is observed in the behaviour of flood peaks across different climatic regions of Canada. The findings of this study reveal potential limitations in using the GEV for flood frequency analysis and suggest the BrIII and BrXII as consistent alternatives worth exploring.
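To make point (3) of the abstract above concrete, the following sketch contrasts quantiles of a GEV with negative shape, which has a finite upper limit, with those of a Burr type XII distribution, which is unbounded above. It assumes the common textbook parameterizations, GEV with location `mu`, scale `sigma`, shape `xi`, and Burr XII with CDF F(x) = 1 - (1 + (x/scale)^c)^(-k); the paper may parameterize the Burr XII differently. All function names and parameter values are hypothetical and chosen only for illustration.

```python
from math import log


def gev_quantile(p, mu, sigma, xi):
    """Quantile of the GEV distribution at non-exceedance probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if xi == 0.0:
        # Gumbel limit of the GEV.
        return mu - sigma * log(-log(p))
    return mu + sigma / xi * ((-log(p)) ** (-xi) - 1.0)


def gev_upper_bound(mu, sigma, xi):
    """Finite upper limit of the GEV; exists only for negative shape."""
    if xi >= 0.0:
        return float("inf")
    return mu - sigma / xi


def burr12_quantile(p, scale, c, k):
    """Quantile of the Burr XII with CDF 1 - (1 + (x/scale)**c)**(-k)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return scale * ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / c)


def return_level(quantile_fn, return_period, *params):
    """Annual-maximum return level: the quantile at p = 1 - 1/T."""
    return quantile_fn(1.0 - 1.0 / return_period, *params)


if __name__ == "__main__":
    # Illustrative parameters only, not fitted to any record.
    for T in (10, 100, 1000, 10000):
        print(T,
              return_level(gev_quantile, T, 0.0, 1.0, -0.5),
              return_level(burr12_quantile, T, 1.0, 1.0, 1.0))
    print("GEV upper limit:", gev_upper_bound(0.0, 1.0, -0.5))
```

In this toy setting the GEV return levels flatten out below the upper limit as the return period grows, while the Burr XII levels keep increasing.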

DOI bib
Climate Extremes and Compound Hazards in a Warming World
Amir AghaKouchak, Felicia Chiang, Laurie S. Huning, Charlotte Love, Iman Mallakpour, Omid Mazdiyasni, Hamed Moftakhari, Simon Michael Papalexiou, Elisa Ragno, Mojtaba Sadegh
Annual Review of Earth and Planetary Sciences, Volume 48, Issue 1

Climate extremes threaten human health, economic stability, and the well-being of natural and built environments (e.g., 2003 European heat wave). As the world continues to warm, climate hazards are expected to increase in frequency and intensity. The impacts of extreme events will also be more severe due to the increased exposure (growing population and development) and vulnerability (aging infrastructure) of human settlements. Climate models attribute part of the projected increases in the intensity and frequency of natural disasters to anthropogenic emissions and changes in land use and land cover. Here, we review the impacts, historical and projected changes, and theoretical research gaps of key extreme events (heat waves, droughts, wildfires, precipitation, and flooding). We also highlight the need to improve our understanding of the dependence between individual and interrelated climate extremes because anthropogenic-induced warming increases the risk of not only individual climate extremes but also compound (co-occurring) and cascading hazards.
▪ Climate hazards are expected to increase in frequency and intensity in a warming world.
▪ Anthropogenic-induced warming increases the risk of compound and cascading hazards.
▪ We need to improve our understanding of causes and drivers of compound and cascading hazards.

DOI bib
Random Fields Simplified: Preserving Marginal Distributions, Correlations, and Intermittency, With Applications From Rainfall to Humidity
Simon Michael Papalexiou, Francesco Serinaldi
Water Resources Research, Volume 56, Issue 2

Nature manifests itself in space and time. The spatiotemporal complexity of processes such as precipitation, temperature, and wind does not allow purely deterministic modeling. Spatiotemporal random fields have a long history in modeling such processes, and yet a single unified framework offering the flexibility to simulate processes that may differ profoundly does not exist. Here we introduce a blueprint to efficiently simulate spatiotemporal random fields that preserve any marginal distribution, any valid spatiotemporal correlation structure, and intermittency. We suggest a set of parsimonious yet flexible marginal distributions and provide a rule of thumb for their selection. We propose a new and unified approach to construct flexible spatiotemporal correlation structures by combining copulas and survival functions. The versatility of our framework is demonstrated by simulating conceptual cases of intermittent precipitation, double‐bounded relative humidity, and temperature maxima fields. As a real‐world case we simulate daily precipitation fields. In all cases, we reproduce the desired properties. In an era characterized by advances in remote sensing and increasing availability of spatiotemporal data, we deem that this unified approach offers a valuable and easy‐to‐apply tool for modeling complex spatiotemporal processes.

DOI bib
SCDNA: a serially complete precipitation and temperature dataset for North America from 1979 to 2018
Guoqiang Tang, Martyn P. Clark, Andrew J. Newman, Andy Wood, Simon Michael Papalexiou, Vincent Vionnet, Paul H. Whitfield
Earth System Science Data, Volume 12, Issue 4

Abstract. Station-based serially complete datasets (SCDs) of precipitation and temperature observations are important for hydrometeorological studies. Motivated by the lack of serially complete station observations for North America, this study seeks to develop an SCD from 1979 to 2018 from station data. The new SCD for North America (SCDNA) includes daily precipitation, minimum temperature (Tmin), and maximum temperature (Tmax) data for 27 276 stations. Raw meteorological station data were obtained from the Global Historical Climate Network Daily (GHCN-D), the Global Surface Summary of the Day (GSOD), Environment and Climate Change Canada (ECCC), and a compiled station database in Mexico. Stations with at least 8-year-long records were selected, which underwent location correction and were subjected to strict quality control. Outputs from three reanalysis products (ERA5, JRA-55, and MERRA-2) provided auxiliary information to estimate station records. Infilling during the observation period and reconstruction beyond the observation period were accomplished by combining estimates from 16 strategies (variants of quantile mapping, spatial interpolation, and machine learning). A sensitivity experiment was conducted by assuming that 30 % of observations from stations were missing – this enabled independent validation and provided a reference for reconstruction. Quantile mapping and mean value corrections were applied to the final estimates. The median Kling–Gupta efficiency (KGE′) values of the final SCDNA for all stations are 0.90, 0.98, and 0.99 for precipitation, Tmin, and Tmax, respectively. The SCDNA is closer to station observations than the four benchmark gridded products and can be used in applications that require either quality-controlled meteorological station observations or reconstructed long-term estimates for analysis and modeling. The dataset is available at (Tang et al., 2020).
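The abstract above reports skill as KGE′. As a minimal sketch, and assuming the prime denotes the widely used modified form in which the variability term is a ratio of coefficients of variation, the score can be computed as below. The function name `kge_prime` and the sample values are hypothetical; the paper's exact computation (for example, how missing values are handled) is not reproduced here.

```python
from math import sqrt
from statistics import fmean, pstdev


def kge_prime(simulated, observed):
    """Modified Kling-Gupta efficiency (assumed definition of KGE').

    KGE' = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (gamma - 1)^2), where
    r is the Pearson correlation, beta the ratio of means, and gamma
    the ratio of coefficients of variation (simulated over observed).
    """
    if len(simulated) != len(observed) or len(observed) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mu_s, mu_o = fmean(simulated), fmean(observed)
    sd_s, sd_o = pstdev(simulated), pstdev(observed)
    if mu_s == 0 or mu_o == 0 or sd_s == 0 or sd_o == 0:
        raise ValueError("means and standard deviations must be non-zero")
    covariance = fmean([(s - mu_s) * (o - mu_o)
                        for s, o in zip(simulated, observed)])
    r = covariance / (sd_s * sd_o)
    beta = mu_s / mu_o
    gamma = (sd_s / mu_s) / (sd_o / mu_o)
    return 1.0 - sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    print(kge_prime(obs, obs))                   # perfect agreement
    print(kge_prime([2.0, 4.0, 6.0, 8.0], obs))  # doubled mean, same CV
```

A value of 1 indicates perfect agreement. A series that is perfectly correlated but has twice the mean loses a full unit through the bias term alone, which shows how the score penalizes bias separately from correlation and variability.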

DOI bib
Have satellite precipitation products improved over last two decades? A comprehensive comparison of GPM IMERG with nine satellite and reanalysis datasets
Guoqiang Tang, Martyn P. Clark, Simon Michael Papalexiou, Zhanshan Ma, Yang Hong
Remote Sensing of Environment, Volume 240

Abstract The Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (IMERG) produces the latest generation of satellite precipitation estimates and has been widely used since its release in 2014. IMERG V06 provides global rainfall and snowfall data beginning from 2000. This study comprehensively analyzes the quality of the IMERG product at daily and hourly scales in China from 2000 to 2018 with special attention paid to snowfall estimates. The performance of IMERG is compared with nine satellite and reanalysis products (TRMM 3B42, CMORPH, PERSIANN-CDR, GSMaP, CHIRPS, SM2RAIN, ERA5, ERA-Interim, and MERRA2). Results show that the IMERG product outperforms other datasets, except the Global Satellite Mapping of Precipitation (GSMaP), which uses daily-scale station data to adjust satellite precipitation estimates. The monthly-scale station data adjustment used by IMERG naturally has a limited impact on estimates of precipitation occurrence and intensity at the daily and hourly time scales. The quality of IMERG has improved over time, attributed to the increasing number of passive microwave samples. SM2RAIN, ERA5, and MERRA2 also exhibit increasing accuracy with time that may cause variable performance in climatological studies. Even relying on monthly station data adjustments, IMERG shows good performance in both accuracy metrics at hourly time scales and the representation of diurnal cycles. In contrast, although ERA5 is acceptable at the daily scale, it degrades at the hourly scale due to the limitation in reproducing the peak time, magnitude and variation of diurnal cycles. IMERG underestimates snowfall compared with gauge and reanalysis data. The triple collocation analysis suggests that IMERG snowfall is worse than reanalysis and gauge data, which partly results in the degraded quality of IMERG in cold climates. This study demonstrates new findings on the uncertainties of various precipitation products and identifies potential directions for algorithm improvement. The results of this study will be useful for both developers and users of satellite rainfall products.

DOI bib
Robustness of CMIP6 Historical Global Mean Temperature Simulations: Trends, Long‐Term Persistence, Autocorrelation, and Distributional Shape
Simon Michael Papalexiou, Chandra Rupa Rajulapati, Martyn P. Clark, Flavio Lehner
Earth's Future, Volume 8, Issue 10

Multi-model climate experiments carried out as part of different phases of the Coupled Model Intercomparison Project (CMIP) are crucial to evaluate past and future climate change. The reliability of models' simulations is often gauged by their ability to reproduce the historical climate across many time scales. This study compares the global mean surface air temperature from 29 CMIP6 models with observations from three datasets. We examine (1) warming and cooling rates in five subperiods from 1880 to 2014, (2) autocorrelation and long-term persistence, (3) models' performance based on probabilistic and entropy metrics, and (4) the distributional shape of temperature. All models simulate the observed long-term warming trend from 1880 to 2014. The late twentieth century warming (1975–2014) and the hiatus (1942–1975) are replicated by most models. The post-1998 warming is overestimated in 90% of the simulations. Only six out of 29 models reproduce the observed long-term persistence. All models show differences in distributional shape when compared with observations. Varying performance across metrics reveals the challenge to determine the "best" model. Thus, we argue that models should be selected, based on case-specific metrics, depending on the intended use. Metrics proposed here facilitate a comprehensive assessment for various applications.

DOI bib
How Probable Is Widespread Flooding in the United States?
Manuela Irene Brunner, Simon Michael Papalexiou, Martyn P. Clark, Eric Gilleland
Water Resources Research, Volume 56, Issue 10

Widespread flooding can cause major damages and substantial recovery costs. Still, estimates of how susceptible a region is to widespread flooding are largely missing mainly because of the sparseness of widespread flood events in records. The aim of this study is to assess the seasonal susceptibility of regions in the United States to widespread flooding using a stochastic streamflow generator, which enables simulating a large number of spatially consistent flood events. Furthermore, we ask which factors influence the strength of regional flood susceptibilities. We show that susceptibilities to widespread flooding vary regionally and seasonally. They are highest in regions where catchments show regimes with a strong seasonality, that is, the Pacific Northwest, the Rocky Mountains, and the Northeast. In contrast, they are low in regions where catchments are characterized by a weak seasonality and intermittent regimes such as the Great Plains. Furthermore, susceptibility is found to be the highest in winter and spring when spatial flood dependencies are strongest because of snowmelt contributions and high soil moisture availability. We conclude that regional flood susceptibilities emerge in river basins with catchments sharing similar streamflow and climatic regimes.

DOI bib
PMP and Climate Variability and Change: A Review
José D. Salas, Michael L. Anderson, Simon Michael Papalexiou, Félix Francés
Journal of Hydrologic Engineering, Volume 25, Issue 12

A state-of-the-art review on the probable maximum precipitation (PMP) as it relates to climate variability and change is presented. The review consists of an examination of the current practice and the various developments published in the literature. The focus is on relevant research where the effect of climate dynamics on the PMP is discussed, as well as statistical methods developed for estimating very large extreme precipitation including the PMP. The review includes interpretation of extreme events arising from the climate system, their physical mechanisms, and statistical properties, together with the effect of the uncertainty of several factors determining them, such as atmospheric moisture, its transport into storms and wind, and their future changes. These issues are examined as well as the underlying historical and proxy data. In addition, the procedures and guidelines established by some countries, states, and organizations for estimating the PMP are summarized. In doing so, attention was paid to whether the current guidelines and the published research literature take into consideration the effects of the variability and change of climatic processes and the underlying uncertainties.

DOI bib
Assessment of Extremes in Global Precipitation Products: How Reliable Are They?
Chandra Rupa Rajulapati, Simon Michael Papalexiou, Martyn P. Clark, Saman Razavi, Guoqiang Tang, John W. Pomeroy
Journal of Hydrometeorology, Volume 21, Issue 12

Abstract Global gridded precipitation products have proven essential for many applications ranging from hydrological modeling and climate model validation to natural hazard risk assessment. They provide a global picture of how precipitation varies across time and space, specifically in regions where ground-based observations are scarce. While the application of global precipitation products has become widespread, there is limited knowledge on how well these products represent the magnitude and frequency of extreme precipitation—the key features in triggering flood hazards. Here, five global precipitation datasets (MSWEP, CFSR, CPC, PERSIANN-CDR, and WFDEI) are compared to each other and to surface observations. The spatial variability of relatively high precipitation events (tail heaviness) and the resulting discrepancy among datasets in the predicted precipitation return levels were evaluated for the time period 1979–2017. The analysis shows that 1) these products do not provide a consistent representation of the behavior of extremes as quantified by the tail heaviness, 2) there is strong spatial variability in the tail index, 3) the spatial patterns of the tail heaviness generally match the Köppen–Geiger climate classification, and 4) the predicted return levels for 100 and 1000 years differ significantly among the gridded products. More generally, our findings reveal shortcomings of global precipitation products in representing extremes and highlight that there is no single global product that performs best for all regions and climates.


DOI bib
Assessment of Water Cycle Intensification Over Land using a Multisource Global Gridded Precipitation DataSet
Yannis Markonis, Simon Michael Papalexiou, Marta Martínková, Martin Hanel
Journal of Geophysical Research: Atmospheres, Volume 124, Issue 21

The change in the empirical distribution of future global precipitation is one of the major implications regarding the intensification of the global water cycle. Heavier events are expected to occur more often, compensated by a decline of light precipitation and/or number of wet days. Here, we scrutinize a new global, high‐resolution precipitation data set, namely, the Multi‐Source Weighted‐Ensemble Precipitation v2.0, to determine changes in the precipitation distribution over land during 1979–2016. To this end, the fluctuations of wet‐day precipitation quantiles on an annual basis and their interplay with annual totals and number of wet days were investigated. The results show an increase in total precipitation, number of wet days, and heavy events over land, as suggested by the intensification hypothesis. However, the decline in light/medium precipitation or wet days was weaker than expected, calling the “compensation” mechanism into question.

DOI bib
Tails of extremes: Advancing a graphical method and harnessing big data to assess precipitation extremes
Sofia D. Nerantzaki, Simon Michael Papalexiou
Advances in Water Resources, Volume 134

Abstract Extremes are rare and unexpected. This limits observations and constrains our knowledge on their predictability and behavior. Graphical tools are among the many methods developed to study extremes. A major weakness is that they rely on visual-inspection inferences which are subjective and make applications to large datasets time-consuming and impractical. Here, we advance a graphical method, the so-called Mean Excess Function (MEF), into an algorithmic procedure. MEF investigates the mean value of a variable over a threshold, and thus, focuses on extremes. We formulate precise and easy-to-apply statistical tests, based on the MEF, to assess if observed data can be described by exponential or heavier tails. As a real-world example, we apply our method to 21,348 daily precipitation records from all over the globe. Results show that the exponential-tail hypothesis is rejected in 75.8% of the records indicating that heavy-tail distributions (alternative hypothesis) can better describe rainfall extremes. The spatial variation of the tail heaviness reveals that heavy tails prevail in regions of Australia and Eurasia, with a “hot spot” found in central Russia and Kazakhstan. We deem this study offers a new diagnostic tool in assessing the behavior of extremes, easy to apply in large databases, and for any variable of interest. Our results on precipitation extremes reinforce past findings and further highlight that exponential tails should be used with caution.
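The Mean Excess Function named in the abstract above has a simple empirical counterpart: for a threshold u, average the amounts by which observations exceed u. The sketch below computes that quantity and a curve over several thresholds. It covers only the standard empirical estimator, not the statistical tests the paper builds on it; the function names and sample values are hypothetical.

```python
from statistics import fmean


def mean_excess(sample, threshold):
    """Empirical mean excess e(u): average of (x - u) over all x > u."""
    excesses = [x - threshold for x in sample if x > threshold]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return fmean(excesses)


def mean_excess_curve(sample, thresholds):
    """Evaluate the empirical mean excess at each threshold in turn."""
    return [(u, mean_excess(sample, u)) for u in thresholds]


if __name__ == "__main__":
    # Made-up daily precipitation depths (mm), for illustration only.
    rain = [0.4, 1.2, 2.5, 3.1, 4.8, 6.0, 9.7, 15.3, 22.4, 41.0]
    for u, e in mean_excess_curve(rain, [1.0, 5.0, 10.0, 20.0]):
        print(f"u = {u:5.1f} mm   e(u) = {e:6.2f} mm")
```

The shape of the curve carries the diagnostic information: for an exponential tail the theoretical mean excess is constant in the threshold, whereas heavier tails give a mean excess that grows with the threshold.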