GeoNetParallel analyzes stream networks to detect statistically significant changes between background and potentially impacted sites.
Agarwal, A., Wen, T., Chen, A., Zhang, A.Y., Niu, X., Zhan, X., Xue, L. and Brantley, S.L., 2020. Assessing contamination of stream networks near shale gas development using a new geospatial tool. Environmental Science & Technology, 54(14), pp.8632-8639.
This package was last tested in April 2022. The testing environment is listed below:
Required Package | Version |
---|---|
R | 4.0.5 |
tidyverse | 1.3.0 |
geosphere | 1.5-10 |
network | 1.16.1 |
igraph | 1.2.6 |
mapdata | 2.3.0 |
intergraph | 2.0-2 |
sna | 2.6 |
maps | 3.3.0 |
GGally | 2.1.1 |
MASS | 7.3-53.1 |
foreach | 1.5.1 |
doParallel | 1.0.16 |
data.table | 1.14.0 |
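As a quick way to compare a local installation against the table above, the following is a minimal sketch in R. The package names are copied from the table; the helper `missing_packages` is hypothetical and not part of GeoNet.

```r
# Packages listed in the testing-environment table above.
required <- c("tidyverse", "geosphere", "network", "igraph", "mapdata",
              "intergraph", "sna", "maps", "GGally", "MASS",
              "foreach", "doParallel", "data.table")

# Hypothetical helper (not part of GeoNet): returns the names in `pkgs`
# that are not installed in the current R library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

not_installed <- missing_packages(required)
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```

Note that `install.packages` fetches the current CRAN release, which may differ from the tested versions in the table. `packageVersion("igraph")`, for example, reports the version that is actually installed.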
- Make sure all of the above dependencies are installed before running the code.
- Put your data under the data folder. There are three main files required by GeoNet:
- Shape file for the stream network (shape.RData).
- Analyte location and concentration (analyte_raw.csv)
The analyte CSV file should have at least date, latitude, longitude, and concentration columns.
Rename the headers to date, lat, lon, and conc.
Sample analyte file:
date | latitude | longitude | Specific Conductance (conc) |
---|---|---|---|
6/8/2000 8:00 | 40.3747167 | -78.8516 | 1660 |
1/2/2001 9:45 | 40.3747167 | -78.8516 | 1700 |
- Polluter locations (polluter_raw.csv)
The polluter file should have at least latitude, longitude, and date columns.
Rename the headers to lat, lon, and date.
Sample polluter file:
ID | latitude | longitude | date |
---|---|---|---|
2 | 39.8284 | -80.323389 | 8/26/14 |
6 | 41.560381 | -76.263944 | 8/4/14 |
- Run the script code/Cl_spill_clust.R.
- Output files will be generated in the inference folder
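The header renaming described for the analyte file can be sketched in R as follows. This is an illustration only: it inlines the two sample rows from this README, assumes the raw columns already appear in the order date, latitude, longitude, concentration, and writes to a temporary directory rather than to the data folder.

```r
# Two rows from the sample analyte file, inlined so the sketch is runnable.
raw_text <- "date,latitude,longitude,Specific Conductance (conc)
6/8/2000 8:00,40.3747167,-78.8516,1660
1/2/2001 9:45,40.3747167,-78.8516,1700"

# check.names = FALSE keeps the original header text unchanged on read.
analyte <- read.csv(text = raw_text, check.names = FALSE,
                    stringsAsFactors = FALSE)

# Rename the headers to date, lat, lon, conc.
# Assumption: the raw columns are already in this order.
names(analyte) <- c("date", "lat", "lon", "conc")

# Basic sanity checks before saving.
stopifnot(is.numeric(analyte$lat), is.numeric(analyte$lon),
          is.numeric(analyte$conc))

# Written to a temporary directory here; in practice the target would be
# analyte_raw.csv inside the repository's data folder.
out_file <- file.path(tempdir(), "analyte_raw.csv")
write.csv(analyte, out_file, row.names = FALSE)
```

The polluter file can be handled the same way, renaming its columns to lat, lon, and date.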
All output files are generated in the inference folder. The most important file to check is polluter_test_matrix.RData, which holds the statistical inference test results for each polluter provided. In summary, it contains the upstream and downstream concentration values, along with t-test and Wilcoxon test results that indicate whether those values differ.
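One way to look inside the result file without overwriting anything in the current workspace is sketched below. The helper `load_results` is hypothetical and not part of GeoNet, the path is a placeholder, and the names of the objects stored in the file are not assumed; the sketch simply lists whatever it finds.

```r
# Hypothetical helper (not part of GeoNet): loads an .RData file into a
# fresh environment so nothing in the workspace is overwritten, then
# returns that environment.
load_results <- function(path) {
  stopifnot(file.exists(path))
  env <- new.env()
  load(path, envir = env)
  env
}

# Placeholder base directory; edit to match your machine.
file_path <- "/path/to/repository"
result_file <- file.path(file_path, "inference",
                         "polluter_test_matrix.RData")

if (file.exists(result_file)) {
  res <- load_results(result_file)
  # Print the structure of every object stored in the file.
  for (obj in ls(res)) {
    cat("Object:", obj, "\n")
    str(get(obj, envir = res))
  }
} else {
  message("Result file not found: ", result_file)
}
```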
- Make sure you provide the datasets in the exact format shown in the examples, including the names and order of the columns.
- Make sure you update the file_path variable to point to the absolute path of the base directory of this repository on your computer.
- If the dataset is large, try running each section of the code separately and check the intermediate output variables for NAs.
- Refer to the data flow diagram for the expected size of the output data frames after each step.
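For the NA and size checks mentioned above, a small helper along these lines may be convenient. It is a sketch only: `check_intermediate` is a hypothetical name, and the example data frame is made up for illustration rather than taken from GeoNet's intermediate variables.

```r
# Hypothetical helper (not part of GeoNet): reports the dimensions of a
# data frame and the number of NA values in each affected column.
check_intermediate <- function(df, label = "data") {
  na_counts <- colSums(is.na(df))
  cat(label, ":", nrow(df), "rows x", ncol(df), "columns\n")
  print(na_counts[na_counts > 0])  # only columns that contain NAs
  invisible(na_counts)
}

# Illustrative data only; one concentration value is deliberately missing.
example <- data.frame(
  lat  = c(40.3747167, 40.3747167),
  lon  = c(-78.8516, -78.8516),
  conc = c(1660, NA)
)
na_counts <- check_intermediate(example, "example")
```

Calling the helper after each section of the script gives row and column counts that can be compared against the data flow diagram.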
For more information about the code, see https://drive.google.com/file/d/1AFr1qGLGhAfZwWw8E_BCVhmYF6ohJmus/view?usp=sharing
For any questions about the source code or example datasets, please contact Dr. Tao Wen at Syracuse University (https://jaywen.com/)