SWMPr: An R Package for Retrieving, Organizing, and Analyzing Environmental Data for Estuaries

Authors Marcus W. Beck
Journal/Conference Name The R Journal
Paper Abstract The System-Wide Monitoring Program (SWMP) was implemented in 1995 by the US National Estuarine Research Reserve System. This program has provided two decades of continuous monitoring data at over 140 fixed stations in 28 estuaries. However, the increasing quantity of data provided by the monitoring network has complicated broad-scale comparisons between systems and, in some cases, prevented simple trend analysis of water quality parameters at individual sites. This article describes the SWMPr package that provides several functions that facilitate data retrieval, organization, and analysis of time series data in the reserve estuaries. Previously unavailable functions for estuaries are also provided to estimate rates of ecosystem metabolism using the open-water method. The SWMPr package has facilitated a cross-reserve comparison of water quality trends and links quantitative information with analysis tools that have use for more generic applications to environmental time series.

Introduction

The development of low-cost, automated sensors that collect data in near real time has enabled a proliferation of standardized environmental monitoring programs (Glasgow et al., 2004; Fries et al., 2008). An invaluable source of monitoring data for coastal regions in the United States is provided by the National Estuarine Research Reserve System (NERRS, http://www.nerrs.noaa.gov/). This network of 28 estuary reserves was created to address long-term research, monitoring, education, and stewardship goals in support of coastal management. The System-Wide Monitoring Program (SWMP) was implemented in 1995 at over 140 stations across the reserves to provide a robust, long-term monitoring system for water quality, weather, and land-use/habitat change.
Environmental researchers have expressed a need for quantitative analysis tools to evaluate trends in water quality time series given the quantity and quality of data provided by SWMP (System-Wide Monitoring Program Data Analysis Training, 2014). This article describes the SWMPr package that was developed for estuary monitoring data from the SWMP. Functions provided by SWMPr address many common issues in working with large datasets created from automated sensor networks, such as data pre-processing to remove unwanted information, combining data from different sources, and exploratory analyses to identify parameters of interest. Additionally, web applications derived from SWMPr and shiny illustrate potential applications using the functions in this package. The software is provided specifically for use with NERRS data, although many of the applications are relevant for addressing common challenges in working with large environmental datasets.

Overview of the SWMP network

The SWMPr package was developed for the continuous abiotic monitoring network, which represents a majority of SWMP data and is, consequently, the most challenging to evaluate. Abiotic elements monitored at each reserve include water quality (water temperature, specific conductivity, salinity, dissolved oxygen concentration, dissolved oxygen saturation, depth, pH, turbidity, chlorophyll fluorescence), weather (air temperature, relative humidity, barometric pressure, wind speed, wind direction, photosynthetically active radiation, precipitation), and nutrient data (orthophosphate, ammonium, nitrite, nitrate, nitrite + nitrate, chlorophyll a). Each of the 28 estuary reserves has no fewer than four water quality stations and one weather station at fixed locations. Water quality and weather data are collected at 15-minute intervals, whereas nutrient data are collected monthly at each water quality station.
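The two sampling frequencies described above imply very different record counts per station. As a rough back-of-the-envelope sketch, assuming a complete, gap-free record over one non-leap year (an idealization; the variable names are illustrative and not part of SWMPr):

```r
# Expected observations per station per year, assuming no gaps in the record
# (an idealization; real time series contain missing and rejected values).
obs_per_hour <- 60 / 15                    # one observation every 15 minutes
obs_per_year <- obs_per_hour * 24 * 365    # water quality or weather station

# Nutrient data are collected monthly at each water quality station
nutrient_obs_per_year <- 12

obs_per_year
# 35040
```

A single water quality station therefore yields on the order of 35,000 time steps per year for each parameter, which is consistent with the scale of the monitoring archive described in the text.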
Data are made available through the Centralized Data Management Office (CDMO) web portal (http://cdmo.baruch.sc.edu/), where quality assurance/quality control (QAQC) measures are used to screen the information for accuracy and reliability. The final data include timestamped observations with relevant QAQC flags. At the time of writing, the CDMO web portal provides over 60 million water quality, weather, and nutrient records that have been authenticated through systematic QAQC procedures.

The R Journal Vol. 8/1, Aug. 2016 ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLES 220

Table 1: Retrieval functions available from the SWMPr package. Full documentation for each function is in the help file (e.g., execute ?all_params for individual functions or help.search('retrieve', package = 'SWMPr') for all).

Function          Description
all_params        Retrieve records starting with the most recent at a given station, all parameters. Wrapper to exportAllParamsXMLNew function on web services.
all_params_dtrng  Retrieve records of all parameters within a given date range for a station. Optional argument for a single parameter. Wrapper to exportAllParamsDateRangeXMLNew.
import_local      Import files from a local path. The files must be in a specific format, such as those returned from the CDMO using the zip downloads option.
single_param      Retrieve records for a single parameter starting with the most recent at a given station. Wrapper to exportSingleParamXMLNew function on web services.
site_codes        Get metadata for all stations. Wrapper to exportStationCodesXMLNew function on web services.
site_codes_ind    Get metadata for all stations at a single site. Wrapper to NERRFilterStationCodesXMLNew function on web services.

Records for each station are identified by a seven or eight character name that specifies the reserve, station, and parameter type. For example, ‘apaebwq’ is the water quality identifier (‘wq’) for the East Bay station (‘eb’) at the Apalachicola reserve (‘apa’).
Similarly, a suffix of ‘met’ or ‘nut’ specifies the weather (meteorological) or nutrient stations. All reserve names, stations, and date ranges for each parameter type can be viewed on the CDMO website. Alternatively, the site_codes (all sites) or site_codes_ind (single site) functions provided by SWMPr can be used. As noted below, an IP address must be registered with CDMO before using the data retrieval functions in SWMPr. Web services are provided by CDMO for direct access to SWMP data through HTTP requests, in addition to standard graphical user interface options for selecting data. The data retrieval functions in SWMPr are simple calls to the existing retrieval functions on CDMO web services, as explained below.

Structure of the SWMPr package

SWMPr functions are categorized by one of three steps in the data workflow: retrieving, organizing, and analyzing. Functions for retrieving are used to import the data into R as a "swmpr" object class. Functions for organizing and analyzing the data provide methods for working with a "swmpr" object. The following describes the package structure, beginning with the retrieval functions, a description of the "swmpr" object returned after retrieval, and, finally, the organizing and analyzing functions.

Data retrieval

SWMPr can import data into R through direct download from the CDMO or by importing local data that were previously downloaded (Table 1). The IP address for the computer making the request must be registered if the first approach is used (see CDMO website). The site_codes or site_codes_ind functions can be used to view site metadata.

    # retrieve metadata for all sites
    site_codes()

    # retrieve metadata for a single site
    site_codes_ind('apa')

Retrieval functions to import data directly into R from the CDMO include all_params, all_params_dtrng, and single_param. Due to rate limitations on the CDMO server, the retrieval functions return a limited number of records with each request.
However, the SWMPr functions use the native CDMO web services iteratively (i.e., within a loop) to obtain all requested records. Download time can be excessive for longer time series.

    # all parameters for a station, most recent
    all_params('hudscwq')

    # get all parameters within a date range
    all_params_dtrng('hudscwq', dtrng = c('09/01/2013', '10/01/2013'))

    # get single parameter within a date range
    all_params_dtrng('hudscwq', dtrng = c('09/01/2013', '10/01/2013'),
      param = 'do_mgl')

    # single parameter for a station, most recent
    single_param('hudscwq', param = 'do_mgl')

The second approach for data retrieval is to use the import_local function to import data into R after downloading from CDMO. This approach is most appropriate for large data requests. The import_local function is designed for data from the zip downloads feature in the advanced query section of the CDMO website. The zip downloads feature can be used to obtain a large number of records from multiple stations in one request. The downloaded data will be in a compressed folder that includes multiple .csv files by year for a given data type (e.g., apacpwq2002.csv, apacpwq2003.csv, apacpnut2002.csv, etc.). The import_local function can be used to import files directly from the zipped folder.

The "swmpr" object class

All data retrieval functions return a "swmpr" object that includes relevant data and several attributes describing the dataset. The data include a datetimestamp column in the timezone for a station and additional parameters for the data type (weather, nutrients, or water quality). Corresponding QAQC columns for each parameter are also returned if provided by the initial data request. The following shows an example of the raw data imported using all_params.

    # import all parameters for the station
    # three most recent records
    exdat <- all_params('apadbwq', Max = 3, trace = F)
    exdat

    ##         datetimestamp temp f_temp spcond f_spcond sal f_sal do_pct
    ## 1 2015-11-03 11:15:00   26      0     45        0  29     0     78
    ## 2 2015-11-03 11:30:00   26      0     46        0  30     0     76
    ## 3 2015-11-03 11:45:00   26      0     46        0  30     0     75
    ##   f_do_pct do_mgl f_do_mgl depth f_depth ph f_ph turb f_turb chlfluor
    ## 1        0      5        0     2       0  8    0    2      0       NA
    ## 2        0      5        0     2       0  8    0    5      0       NA
    ## 3        0      5        0     2       0  8    0    5      0       NA
    ##   f_chlfluor level f_level cdepth clevel f_cdepth f_clevel
    ## 1         -2    NA      -1      2     NA        3
    ## 2         -2    NA      -1      2     NA        3
    ## 3         -2    NA      -1      2
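
In the output above, each QAQC column shares its parameter's name with an f_ prefix (for example, temp and f_temp). The following is a minimal base R sketch of how that naming pattern could be used to pair parameters with their flag columns. It relies on a small hand-built data frame that copies a few values from the output rather than on a real "swmpr" object, so it needs no CDMO access. The object names (exdat_toy, flag_cols, param_cols, flag_counts) and the UTC timezone are assumptions made only for illustration.

```r
# Toy data frame copying a few columns from the printed output above.
# This is NOT a "swmpr" object and was not retrieved from CDMO.
exdat_toy <- data.frame(
  datetimestamp = as.POSIXct(
    c('2015-11-03 11:15:00', '2015-11-03 11:30:00', '2015-11-03 11:45:00'),
    tz = 'UTC'  # placeholder timezone, assumed for the example only
  ),
  temp = c(26, 26, 26),
  f_temp = c(0, 0, 0),
  chlfluor = c(NA_real_, NA_real_, NA_real_),
  f_chlfluor = c(-2, -2, -2)
)

# Assumption: QAQC columns are exactly those whose names start with 'f_'
flag_cols <- grep('^f_', names(exdat_toy), value = TRUE)

# Parameter names are recovered by dropping the prefix
param_cols <- sub('^f_', '', flag_cols)

# Tabulate the flag values observed for each parameter
flag_counts <- lapply(exdat_toy[flag_cols], table)
flag_counts
```

Here chlfluor is NA in every row and its flag column holds -2 throughout, whereas temp has values with a flag of 0, matching the pattern visible in the printed records.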
Date of publication 2016
Code Programming Language R
