## An analysis of the sample size requirements for acceptable statistical power in water quality monitoring for improvement detection

Christopher Wellen, Philippe Van Cappellen, Larissa Gospodyn, Joseph P. Thomas, Mohamed N. Mohamed

##### Abstract

- Few guidelines on sample size requirements for water quality improvement in streams.
- Sample sizes for acceptable statistical power were estimated for common indicators.
- 20% reductions of pollutant indicators required decades to centuries of data.
- 40% reductions of pollutant indicators varied significantly by site.
- 80% reductions required 5 years or less of data.

Many water quality managers seek to demonstrate reductions in pollutants after a remedial program or policy change of some sort is implemented, but there is little information in the literature to help guide the extent of water quality sampling that is required to be confident that a change has occurred. Statistical power refers to the likelihood of avoiding a Type II error in hypothesis testing. It is critical to examine statistical power levels to ensure results are not unduly influenced by insufficient quantity of data. This study presents the first published record, to the best of our knowledge, on sample size requirements to achieve acceptable levels of statistical power in hypothesis testing of annual water quality (nutrients) in streams. We examined 13 temperate agricultural watersheds spanning a gradient of size from 11 to 16,000 km² using data synthesized from long-term flow and water quality records. We found that achieving commonly accepted levels of statistical power (0.8) after reductions of 20% in load or flow-weighted mean concentration (FWMC) required an inordinate quantity of data (50–250 years for load, 10–120 years for FWMC), while achieving statistical power of 0.8 after reductions of 80% of load or FWMC required very little data (2–4 years for FWMC, 2–7 years for load). Load reductions of 40% required a range of 8–50 years of data depending on analyte, while FWMC reductions of 40% required 3–10 years of total phosphorus (TP) data, 5–25 years for soluble reactive phosphorus (SRP), and 2–6 years for nitrate (NO₃).
We examined relationships among times to achieve statistical power and a number of common landscape descriptors (discharge, baseflow index, basin size, concentration-discharge slope) and found no discernible relationships for either TP or SRP, whereas catchments with higher baseflow indices were found to have lower data requirements for achieving statistical power of 0.8 for NO₃. We also show through subsampling experiments that higher frequency sampling tended to reduce data requirements to achieve acceptable statistical power, though these gains diminish as the sample frequency increases. The information presented will help those tasked with watershed monitoring to design appropriate sampling regimes to ensure adequate data are obtained to detect change.

- Cite:
- Christopher Wellen, Philippe Van Cappellen, Larissa Gospodyn, Joseph P. Thomas, and Mohamed N. Mohamed. 2020. An analysis of the sample size requirements for acceptable statistical power in water quality monitoring for improvement detection. *Ecological Indicators*, 118:106684.
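
To illustrate why the data requirements in the abstract rise so steeply as the target reduction shrinks, the following is a minimal sketch of a textbook power calculation. It is not the authors' method. It assumes that annual loads are log-normally distributed, that years are independent, and that a before/after comparison is made with a two-sided, two-sample test on log-transformed values using a normal approximation. The function name `years_per_period` and the log-scale standard deviation `sigma_log` are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def years_per_period(reduction, sigma_log, alpha=0.05, power=0.8):
    """Approximate years needed in EACH period (before and after).

    Assumptions (illustrative, not taken from the paper):
    - annual values are independent and log-normal,
    - sigma_log is the standard deviation of log(annual value),
    - a two-sided two-sample z-test is applied to the log values.
    """
    if not 0.0 < reduction < 1.0:
        raise ValueError("reduction must be a fraction strictly between 0 and 1")
    if sigma_log <= 0.0:
        raise ValueError("sigma_log must be positive")

    # A fractional reduction r shifts the mean of the logs by |ln(1 - r)|.
    effect = abs(math.log(1.0 - reduction))

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # guards against Type I error
    z_power = std_normal.inv_cdf(power)              # guards against Type II error

    # Standard normal-approximation sample size for comparing two means.
    n = 2.0 * ((z_alpha + z_power) * sigma_log / effect) ** 2

    # At least two years per period are needed to estimate any variance.
    return max(2, math.ceil(n))


if __name__ == "__main__":
    # sigma_log = 0.5 is an arbitrary illustrative value, not a measured one.
    for r in (0.2, 0.4, 0.8):
        print(f"{int(r * 100)}% reduction: {years_per_period(r, 0.5)} years per period")
```

Because the required sample size scales with the inverse square of the log-scale effect, halving the detectable effect roughly quadruples the years of data needed. The normal approximation tends to understate requirements when only a handful of years are available, so the output should be read as a rough lower bound under the stated assumptions.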