Current State of Microplastic Pollution Research Data: Trends in Availability and Sources of Open Data
Dominique G. Roche,
Ebenezer S. Nyadjro,
Leah M. Thornton Hampton,
Sherri A. Mason,
John F. Honek,
Andy M. Booth,
Rodney D. L. Smith,
Philippe Van Cappellen
Frontiers in Environmental Science, Volume 10
The rapid growth in microplastic pollution research is influencing funding priorities, environmental policy, and public perceptions of risks to water quality and environmental and human health. Ensuring that environmental microplastics research data are findable, accessible, interoperable, and reusable (FAIR) is essential to inform policy and mitigation strategies. We present a bibliographic analysis of data sharing practices in the environmental microplastics research community, highlighting the state of openness of microplastics data. A stratified (by year) random subset of 785 of 6,608 microplastics articles indexed in Web of Science indicates that, since 2006, less than a third (28.5%) contained a data sharing statement. These statements further show that most often, the data were provided in the articles’ supplementary material (38.8%) and only 13.8% via a data repository. Of the 279 microplastics datasets found in online data repositories, 20.4% presented only metadata with access to the data requiring additional approval. Although increasing, the rate of microplastic data sharing still lags behind that of publication of peer-reviewed articles on environmental microplastics. About a quarter of the repository data originated from North America (12.8%) and Europe (13.4%). Marine and estuarine environments are the most frequently sampled systems (26.2%); sediments (18.8%) and water (15.3%) are the predominant media. Of the accessible datasets, 15.4% and 18.2% do not have adequate metadata to determine the sampling location and media type, respectively. We discuss five recommendations to strengthen data sharing practices in the environmental microplastic research community.
Lakes are key ecosystems within the global biogeosphere. However, the environmental controls on the biological productivity of lakes – including surface temperature, ice phenology, nutrient loads, and mixing regime – are increasingly altered by climate warming and land-use changes. To better characterize global trends in lake productivity, we assembled a dataset on chlorophyll-a concentrations as well as associated water quality parameters and surface solar radiation for temperate and cold-temperate lakes experiencing seasonal ice cover. We developed a method to identify periods of rapid net increase of in situ chlorophyll-a concentrations from time series data and applied it to data collected between 1964 and 2019 across 343 lakes located north of 40° N. The data show that the spring chlorophyll-a increase periods have been occurring earlier in the year, potentially extending the growing season and increasing the annual productivity of northern lakes. The dataset on chlorophyll-a increase rates and timing can be used to analyze trends and patterns in lake productivity across the northern hemisphere or at smaller, regional scales. We illustrate some trends extracted from the dataset and encourage other researchers to use the open dataset for their own research questions. The PCI dataset and additional data files can be openly accessed at the Federated Research Data Repository at https://doi.org/10.20383/102.0488 (Adams et al., 2021).
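The abstract above does not spell out how periods of rapid net chlorophyll-a increase are identified, so the following is only a minimal illustrative sketch, not the authors' method. It assumes that the net rate between consecutive samples is the log-ratio of concentrations divided by the elapsed days, and that a period is any unbroken run of intervals whose rate meets a threshold. The function name `increase_periods` and the `min_rate` default are hypothetical.

```python
import math


def increase_periods(days, chl, min_rate=0.05):
    """Find runs of consecutive sampling intervals with rapid net increase.

    days: sampling days in ascending order (e.g. day of year)
    chl: chlorophyll-a concentrations (> 0), same length as days
    min_rate: hypothetical threshold on the net specific rate (per day)

    Returns a list of (start_day, end_day, mean_rate) tuples.
    """
    if len(days) != len(chl):
        raise ValueError("days and chl must have the same length")
    if any(c <= 0 for c in chl):
        raise ValueError("concentrations must be positive")

    # Net specific rate of change over each interval (assumed definition).
    rates = [
        math.log(chl[i + 1] / chl[i]) / (days[i + 1] - days[i])
        for i in range(len(days) - 1)
    ]

    def summarize(first, last):
        # Mean rate over the whole run, from sample `first` to sample `last`.
        rate = math.log(chl[last] / chl[first]) / (days[last] - days[first])
        return (days[first], days[last], rate)

    periods = []
    start = None  # index of the sample where the current run began
    for i, rate in enumerate(rates):
        if rate >= min_rate:
            if start is None:
                start = i
        elif start is not None:
            # Interval i fell below the threshold, so the run ended at sample i.
            periods.append(summarize(start, i))
            start = None
    if start is not None:
        # The run was still open at the last sample.
        periods.append(summarize(start, len(days) - 1))
    return periods
```

For a spring time series, the first tuple returned would give a candidate start day for the increase period, which is the kind of timing value whose trend across years the abstract describes.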
Hydrological Perspectives on Integrated, Coordinated, Open, Networked (ICON) Science
Bharat Sharma Acharya,
Robert T. Hensley,
Pamela L. Sullivan,
Earth and Space Science, Volume 9, Issue 4
Hydrologic sciences depend on data monitoring, analyses, and simulations of hydrologic processes to ensure safe, sufficient, and equal water distribution. These hydrologic data come from, but are not limited to, primary sources (lab, plot, and field experiments) and secondary sources (remote sensing, UAVs, hydrologic models) that typically follow the FAIR Principles (Findable, Accessible, Interoperable, and Reusable; go-fair.org). Easy availability of FAIR data has become possible because hydrology-oriented organizations have pushed the community to increase coordination of the protocols for generating data and sharing model platforms. In addition, networking at all levels has emerged with an invigorated effort to activate community science efforts that complement conventional data collection methods. However, it has become difficult to decipher various complex hydrologic processes with increasing data. Machine learning, a branch of artificial intelligence, provides more accurate and faster alternatives to better understand different hydrological processes. The Integrated, Coordinated, Open, Networked (ICON) framework provides a pathway for water users to include and respect diversity, equity, and inclusivity. In addition, ICONs support the integration of peoples with historically marginalized identities into this professional discipline of water sciences. This article comprises three independent commentaries about the state of ICON principles in hydrology and discusses the opportunities and challenges of adopting them.
Ten best practices to strengthen stewardship and sharing of water science data in Canada
K. A. Dukacz,
Gopal Chandra Saha,
Jason J. Venkiteswaran,
Homa Kheyrollah Pour,
Brent B. Wolfe,
Sean K. Carey,
John W. Pomeroy,
C. M. DeBeer,
J. M. Waddington,
Philippe Van Cappellen,
Hydrological Processes, Volume 35, Issue 11
Water science data are a valuable asset that both underpins the original research project and bolsters new research questions, particularly in view of the increasingly complex water issues facing Canada and the world. Whilst there is general support for making data more broadly accessible, and a number of water science journals and funding agencies have adopted policies that require researchers to share data in accordance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles, there are still questions about effective management of data to protect their usefulness over time. Incorporating data management practices and standards at the outset of a water science research project will enable researchers to efficiently locate, analyze and use data throughout the project lifecycle, and will ensure the data maintain their value after the project has ended. Here, some common misconceptions about data management are highlighted, along with insights and practical advice to assist established and early career water science researchers as they integrate data management best practices and tools into their research. Freely available tools and training opportunities made available in Canada through Global Water Futures, the Portage Network, Gordon Foundation's DataStream, Compute Canada, and university libraries, among others are compiled. These include webinars, training videos, and individual support for the water science community that together enable researchers to protect their data assets and meet the expectations of journals and funders. The perspectives shared here have been developed as part of the Global Water Futures programme's efforts to improve data management and promote the use of common data practices and standards in the context of water science in Canada. Ten best practices are proposed that may be broadly applicable to other disciplines in the natural sciences and can be adopted and adapted globally.
The acceleration of climate change and its impact highlight the need for long-term reliable climate data at high spatiotemporal resolution to answer key science questions in cold regions hydrology. Prior to the digital age, climate records were archived on paper. For example, from the 1950s to the 1990s, solar radiation data from recording stations worldwide were published in booklets by the former Union of Soviet Socialist Republics (USSR) Hydrometeorological Service. As a result, the data are not easily accessible by most researchers. The overarching aim of this research is to develop techniques to convert paper-based climate records into a machine-readable format to support environmental research in cold regions. This study compares the performance of a proprietary optical character recognition (OCR) service with an open-source OCR tool for digitizing hydrometeorological data. We built a digitization pipeline combining different image preprocessing techniques, semantic segmentation, and an open-source OCR engine for extracting data and metadata recorded in the scanned documents. Each page contains blocks of text with station names and tables containing the climate data. The process begins with image preprocessing to reduce noise and to improve quality before the page content is segmented to detect tables and finally run through an OCR engine for text extraction. We outline the digitization process and report on initial results, including different segmentation approaches, image preprocessing algorithms, and OCR techniques to ensure accurate extraction and organization of relevant metadata from thousands of scanned climate records. We evaluated the performance of Tesseract OCR and ABBYY FineReader on text extraction. We find that although ABBYY FineReader has better accuracy on the sample data, our custom extraction pipeline using Tesseract is efficient and scalable because it is flexible and allows for more customization.
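The abstract above does not state which accuracy metric was used to compare the two OCR engines. As an illustration only, one common choice is the character error rate (CER): the edit distance between the OCR output and a hand-checked reference transcription, divided by the reference length. The sketch below assumes that metric. The function names are hypothetical, and the sample strings are made up for demonstration rather than taken from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length (lower is better)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up strings standing in for a reference transcription and the
    # text returned by two different OCR engines for the same table cell.
    reference = "Station 123"
    engine_outputs = {"engine_a": "Station 123", "engine_b": "Statlon 123"}
    for name, text in engine_outputs.items():
        print(name, character_error_rate(reference, text))
```

Computing the same score for each engine over an identical set of reference pages would give a like-for-like comparison. Numeric tables may deserve a stricter cell-level check, since a single wrong digit changes the recorded value.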