A Random Forest in the Great Lakes: Exploring Nutrient Water Quality in the Laurentian Great Lakes Watersheds

John Dony


Abstract
This study used a data-driven approach to investigate the drivers of nutrient water quality across the Laurentian Great Lakes drainage basin. Monitored time series of nutrient water quality and discharge were modelled using a dynamic regression-based model. Random forest machine learning was then used as a framework to assess drivers of nutrient water quality, combining mean annual flow-weighted concentrations (FWCs) and nutrient ratios calculated from the modelled water quality with spatial factors from the monitored watersheds. The analysis revealed that the landscape variables of developed land use, tile-drained land, and wetland area played important roles in controlling nitrate and nitrite (DIN) and soluble reactive phosphorus (SRP) FWCs, while soil type and wetland area were important for controlling particulate phosphorus (PP) FWCs. Fertilizer and manure practices were important controls on the nutrient ratios SRP:total phosphorus (TP) and DIN:TP: developed land use, manure application, and tile-drained land were important for the former, and developed land use and manure application (versus synthetic fertilizer application) were important for the latter. Feature contribution plots were generated to isolate the effect of each spatial variable in the machine learning models, and they revealed the underlying behaviour of the important controls driving nutrient water quality across the basin. Random forest models were further developed to predict FWCs and nutrient ratios across all watersheds within the Great Lakes drainage basin. Modelled results revealed hot spots of high DIN, SRP, and PP in the watersheds along the southeastern shores of Lake Huron, in the eastern watersheds of the Huron-Erie corridor, and in the southwestern watersheds of Lake Erie. Hot spots of high SRP:TP ratios were seen in watersheds along the southeastern shores of Lake Huron and along the eastern side of the Huron-Erie corridor.
Hot spots of low DIN:TP ratios with high nutrient export were seen in the southwestern watersheds of Lake Erie, which has implications for harmful algal growth. Nutrient ratios across the Great Lakes watersheds were comparable to those of other heavily human-impacted catchments of the Baltic Sea and western Europe. Annual basin loads of DIN, SRP, and TP were estimated from the random forest models for each year from 2000 to 2016. The calculated annual loadings of SRP and TP were consistent with other published estimates for the Great Lakes watersheds and were highest in 2011, the year of the largest algal bloom recorded in Lake Erie to date. Overall, this data-driven analysis of nutrient water quality reinforces and refines our process understanding of nutrient pollution dynamics across the Great Lakes drainage basin.
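
The mean annual flow-weighted concentration (FWC) used as the response variable can be illustrated with a minimal sketch. It assumes the commonly used definition, FWC = sum(C_i * Q_i) / sum(Q_i) over all time steps in a year, and equal-length time steps; the abstract does not state the exact formulation used in the thesis. The function name `annual_fwc` and the record layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def annual_fwc(records):
    """Return {year: flow-weighted concentration}.

    `records` is an iterable of (year, concentration, discharge) tuples,
    e.g. concentration in mg/L and discharge in m^3/s. This layout is an
    assumption for illustration, not the thesis's data format.
    """
    load_sum = defaultdict(float)  # running sum of C_i * Q_i per year
    flow_sum = defaultdict(float)  # running sum of Q_i per year
    for year, concentration, discharge in records:
        load_sum[year] += concentration * discharge
        flow_sum[year] += discharge
    # Skip years with zero total flow to avoid division by zero.
    return {
        year: load_sum[year] / flow_sum[year]
        for year in load_sum
        if flow_sum[year] > 0
    }


if __name__ == "__main__":
    # Made-up values, used only to show the call.
    sample = [(2011, 2.0, 10.0), (2011, 4.0, 30.0), (2012, 1.0, 5.0)]
    print(annual_fwc(sample))
```

Because each concentration is weighted by its discharge, high-flow periods dominate the annual value, so the FWC reflects what a watershed actually exports more closely than a simple arithmetic mean of concentrations would.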
Cite:
John Dony. 2020. A Random Forest in the Great Lakes: Exploring Nutrient Water Quality in the Laurentian Great Lakes Watersheds. Civil and Environmental Engineering, Master's Thesis.