Skip to content

RyanOlivierM/EAR609-Final-Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EAR609 Environmental Data Science Final Project Repository:

Using Network Analysis of Benthic Macroinvertebrates to Unravel the Effect of Oil and Gas Production on River Ecosystem Health

12/15/2024

Ryan Olivier-Meehan (ryanpatom@gmail.com)

Introduction-

Advancements in oil/gas drilling (OGD) technology, specifically the development of horizontal drilling and hydraulic fracturing, have made it profitable to exploit unconventional hydrocarbon reservoirs like shale beds. This has reinvigorated domestic OGD, as the USA is home to several expansive shale formations. The largest is the Marcellus Shale Formation (MSF), spanning 6 northeastern states and estimated to contain 410 trillion cubic feet of shale gas, enough energy to satisfy domestic consumption for hundreds of years (Considine, 2024). This speculation has contributed to a significant increase in shale gas plays into the MSF over the last decade. This boom has been especially impactful to the state of Pennsylvania, where unconventional oil/gas development (UOGD) output has increased 2,750% from 2006 to 2017 (U.S. Energy Information Administration, 2024). The boom has prompted investigation into the environmental impacts of UOGD, particularly the act of hydraulic fracturing, or fracking. Fracking is done to stimulate extraction when hydrocarbons are trapped within the matrix of the source rock. Fluid is pumped at high pressure into the well to create and enlarge fractures within the rock and encourage the flow of gas. This fluid is a mixture of water, sand, and other added chemicals (Mojahan, 2012). This introduction of potential contaminants into the subsurface has elicited concerns regarding water quality in communities with a high density of UOGD. Recent literature has suggested a weak negative relationship between UOGD density and water quality (Shaheen et al, 2024; Trexler et al, 2014; Wen et al, 2018).

This study evaluates a dataset of benthic macroinvertebrate (BMI) samples collected by the Pennsylvania Department of Environmental Protection (PADEP) between 1993-2021 as part of a statewide bioassessment of wadeable and semi-wadeable streams within the boundaries of Pennsylvania to determine if proximity to oil/gas drilling is associated with lower stream health. Benthic macroinvertebrates are small aquatic animals that spend most of their life at the bottom of a river. They are well established proxies for water quality because they are immobile, easy to collect and identify, and have unique tolerances to pollution (Looking Below the Surface, n.d.; Shull & McGarrell, 2018). Because their reaction to stress is predictable and they are constantly exposed to local conditions, their presence or absence can suggest the cumulative effect of pollution on that watershed (What Is the National Rivers and Streams Assessment? | US EPA, 2023). Generally, if a watershed can support a healthy community of taxa, it is likely that the chemical and physical conditions are within a healthy range as well. With their BMI samples, the PADEP aggregated a variety of ecological metrics into a single index of biotic integrity, normalized to a range of 0-100.

Building on this previous research that suggests a relationship between OGD presence and decreased river health, this study uses a framework for creating watershed-scale co-occurrence networks to further analyze the integrity of BMI communities by measuring the topology of the constructed co-occurrence networks (Morueta-Holme et al., 2016; Simons et al., 2019). Using this framework, the study addresses the research question: is there a difference in the topological measures of networks constructed from samples taken proximal / distal to drilling? This study aims to quantify if significant changes to BMI assemblage structure occur once in the proximity of oil/gas development. Our hypotheses regarding the changes we expect to see to assemblage structure, as well as the ecological relevance of these topological measures, is described in the table below.

Capture

Data and methods-

The data download and preprocessing, prior to network creation, is done in Python 3.0. Using the following packages: pandas, geopandas, matplotlib, numpy, and seaborn. The data used for this study was collected and published by the PADEP. Datasets of macroinvertebrate and oil/gas well data were downloaded from the PADEP public data library. Well data contains information such as location, date started, status (active or not) and type (unconventional vs conventional). The macroinvertebrate dataset has two parts. The first sheet is a database of all 15,108 BMI sampling stations distributed across PA. Each has a coordinate location and an index of biotic integrity score, calculated based on the BMI sampled at that location. The second sheet contains the count of each unique taxa sampled at each station. This is in long format, so unique stations appear more than once. Less than 5% of all sampling stations are classified as limestone or multi habitat to reflect samples with unique geologic or environmental conditions. They were remvoed from the study so that we focus only on freestone streams). In this study ther are two sizes of stream. Small wadeable streams and large semi-wadeable streams. The difference is that the average drainage area of a large stream is 192 SqKm, 50x that of small streams (4 SqKm) and so the current was too strong to wade into the middle of these larger streams to take samples across the entire width. Because the sampling methods were different, the PADEP computed separate IBI scores for small and large freestone streams. Because of the differences in catchment size, sampling method, and IBI calculation, large and small stream samples were separated to see if the impact of OGD, whether UOGD or COGD, could be seen at different spatial resolutions. Small streams were evaluated at the HUC10 level while large streams were evaluated at the HUC8 level. Furthermore, we identified our Area of Interest (AOI) as the spatial extent of the MSF. Therefore we only looked at samples that fell within the two main physiographic regions that overlay the Marcelus, the Central Lowlands and Appalachian Plateau. To summarize, this study investigates the effect of proximal OGD on freestone streams within The PADEP classifies some streams as Multihabitat or Limestone to reflect areas with unique geology or environmental conditions. In total these streams comprise less than 5% of the samples, so they were removed and the study focuses exclusively on freestone streams. Similarly, the Area of Interest (AOI) is defined as the spatial extent of the Marcellus Shale. In total this study looks at 1910 samples from large streams and 6398 samples from small streams from freestone streams overlaying the Marcellus Shale Formation.

Small streams were labeled as proximal to conventional and/or unconventional oil/gas development by using the sampling coordinates as the centroid of a 3km buffer, a method used in previous research on the topic (Shaheen et al., 2024). This polygon ca then be used with an sjoin with the points of shapefile of well locations to determine whether there were any active wells within 3 km of the sampling location that were started prior to PADEP sampling. Stations were assigned a boolean value of yes/no for drilling presence, regardless the total number of wells within 3 km for conventional and unconventional wells. For large streams, because the catchments were so much greater and more variable compared to small streams, data was processed via the USGS’s StreamStats batch processor to delineate the drainage area for each station. This involved breaking the ~2500 coordinates into 250 (the max allowed to process at once) point chunks which could be fed one at a time to the StreamStats online Batch Processor. Unfortunately, if a single point in a chunk failed to delineate, then the entire chunk would fail without revealing which point caused the failure. Because of this, approximately 500 large stream data points were lost because their batches failed to process correctly (and I had to get this done to leave for AGU). The processed chunks were shapefiles, in each row the point had been replaced with the polygon representing the area that drains to that point. The chunks then ha to be put back togther. Apart from this, the process was identical to that of the small streams in that streams were labeled based on whether OGD was present in the drainage area. In the end, large and small stream samples datasets added two columns: a boolean for conventional presence and another for unconventional presence. These station variables (U_bool, C_bool, HUC(8/10), and large/small IBI) are merged to the long format taxa count dataset so that the dataset is ready to be used to create networks. All BMI samples can be arranged into one of four groups- neither present, COGD present, UOGD present, and both present.

Network construction and statistical analysis are conducted using R version 4.4.2 using the netassoc, vegan, SjPlot, lme4, and igraph package, For each watershed—HUC8 for large streams and HUC10 for small streams—networks are built using a random subset of 10 samples, provided the group contains at least 15 samples. This is because networks are generated for the same watershed groups 50 times over, drawing from the same population of samples within each watershed. The process of selecting 10 samples out of 15 allows for numerous combinations, resulting in a variety of potential network structures. From those 10 samples, the count data is converted to presence/absence data and a species x site matrix is constructed. This is transformed into a species x species matrix, which shows how often each taxa co occurs with each other. Preserving the row and sum columns of the original species x site matrix, 1000 null models are constructed by randomly scrambling the numbers while preserving the row and column sums. So that each taxa still shows up the same amount of times and each station still contains the same amount of taxa but the values are totally random. We can then transform these into null species x species matrices. This gives us a distribution of how often we could expect each species to co-occur by random, to which we can compare how many times the observed species truly occurred to determine if it's statistically significant. The weights of the significant positive relationships are preserved and normalized with the mean and standard deviation of the null co-occurrence distribution. From this matrix of significant positive associations we can construct our network, in which the nodes are unique taxa and the edges (the connections between nodes) are the significant associations. The length and width of the edge line denotes the strength of the relationship. The topology of these networks can be measured in several ways, as described above in the hypotheses table. These measures were calculated and preserved, along with the mean IBI score between the 10 sampled stations, the mean pollution tolerance value of the taxa in the final network, the HUC code, and the boolean values denoting the presence/absence of OGD. This network construction runs for each valid group in each watershed in the AOI 50 times over, so that there are 1900 large stream networks and 6800 small stream networks to be used for analysis.

image image

Figures

Four figures were created to supplement the statistical results.

  1. The first is a series of boxplots showing the distribution of IBI scores for subsets of the large and small stream samples in each of the 4 OGD groups. This is the prelimary data exploration that suggests an assocation between OGD proximity and lower IBI health that justifies looking deeper at the networks

image

  1. A plot of the HUC10 and HUC8 watersheds with the state boundaries of PA. HUC10 polygons are shaded darker green to denote higher average IBI scores. locations of small (darkred) and large (red) stream sampling locations are also plotted on this map and the entire thing is overlain by the spatial extent of the Marcellus Shale to show the AOI

image

  1. two plots of PA's HUC10 watersheds, shaded in various shades of blue (for conventional) or orange (for unconventional) to denote higher counts of wellpads within that watershed. The various shades were decided by the 50, 75, 90 and 95 percentiles of drilling numbers

image

image

Results

An evaluation of the statistical results. We did two things in this section. First I evaluated whether the topological measures were able to predict the mean IBI score for the stations used to make that network. and second I evaluated topological measure distributions for all OGD combination groups.

  1. Using the network size, connectance, and co-occurence weight as predictors for small stream IBI, I created a linear model that could account for 63% of the variance in small stream IBI. For large streams, using network size, connectance, and modularity we can predict 74% of the variance in large stream IBI. None of these predictors were colinear with eachother in either small or large streams. Given that the topological measures of metrics are reflective of the streams overall biotic integrity, we go ahead to determine how topological measure distributions change across the different OGD combinations.

  2. To explore the impact of UOGD and COGD presence, we use linear mixed effect regression so that each watershed can begin with a random intercept. with lmer we assume that different watersheds have different baslines for topo measures and IBI scores, we can then look at the differences between OGD-proximal vs OGD-distal networks and get the average change across all watersheds. In our mixed effect modeling we use U and C booleans as predictors for mean stream IBI, mean PTV, Network Size, connectance, co-occurence strength, modularity, average node degree, and mean shortest path.

For small streams, UOGD presence had a significantly negative impact on mean IBI, and mean shortest path and co-occurence strength and a positive impact on node degree, connectance, PTV and network size. COGD presence had a significantly positive effect on stream IBI, PTV, network size, and mean shortest path. image

In large streams UOGD had a significantly negative effect on mean IBI, network size, mean node degree. and co-occurence weight. It has a positive effect on mean shortest path, and PTV. COGD has a negative effect on IBI, network and network size, but a positive effect on PTV, connectance, modularity and co-occurence strength.

image

Discussion

There is some disagreement in the models and some are also unstable, returning non-normally distributed residuals. However overall different trends appear in networks at different spatial resolutions. In small streams when in the prescence of UOGD, networks get larger and more connected, while the average pollution tolerance of taxa also increases. COGD precense was similarly linked to larger and more pollution tolerant networks. In Large streams networks are expected to get smaller when in the presence of any type of OGD and COGD proximal networks are more connected and more modular. This provides general support for our hypotheses, which some exceptions. These network topology changes can be interpeted to suggest a replacement of specialist BMI with more generalist taxa that, being more hardy to pollution, are distibuted more homogenously throughout the samples, indifferenct to specific environmental conditions. These results agree with those found in other literature that used this same co-occurence framework to see how BMI respond to stress. There are several improvements that can be made to this project. It may be more appropriate, given the size of large stream drainage areas, to retain a continous count of proximal OGD wells rather then convert to boolean like with small streams. In this case quantiles of the stress variable (OGD) may guide how sampling is done. It would also be more hydrologically accurate to delineate the darinagae area of these small streams using the StreamStats API (which is a thing!) to cut down on the timescale problem. This way we are not relying on an arbitrary 3 km buffer. Considering the % of urban land use in the drainage area of these sampling locations may also yield valuable insights, as devleped land is known to be a significant driver of BMI health. This option is possible via the USGS streamstats. It may also be interesting to consider alternate conduits for contamination. Rather then hypthesizing that simply proximity is the cause of poor health, speculate as to why. Based on recent discussion in the literature. I believe that vibrations caused by fracking may facilitate the migration of contaminants up through abandoned or orphaned wells and into the supra-surface watercycle. So, contamination risk is a function of the number of abandoned wells near active unconventional gas drilling sites.

Conclusions

While it has been suggested in previous literature that unconventional drilling may be associated with slighly worse water quality, the effect of UOGD on proximal stream ecosystems have not been thoroughly explored. This study uses a framwork to create association networks of BMI from groups of samples proximal and distal to COGD and UOGD to determine whether topological network measures change significantly across the groups, We found results to conclude that OGD proximity is associated with several changes in network topology. Furthermore, patterns across a spectrum of spatial resolutions suggests that macroinvertebrate assemblage structure changes to be more connected and pollution tolerant in OGD proximal networks. More work to investigate these relationships shoulc be conducted.

Citations

Considine, T. J. (2024, July 1). API | Marcellus Shale. American Petroleum Institute. Retrieved August 16, 2024, from https://www.api.org/oil-and-natural-gas/energy-primers/hydraulic-fracturing/marcellus-shale

Looking Below the Surface. (n.d.). PA DEP GIS. Retrieved August 16, 2024, from https://gis.dep.pa.gov/macroinvertebrate/index.html

Mohajan, H.K. (2012). Unconventional shale gas extraction: Present and future effects. International Journal of Human Development and Sustainability, 5(2), 9–23.

Shaheen, S., Wen, T., Zheng, Z., Xue, L., Baka, J., & Brantley, S. L. (2024). Wastewaters co-produced with shale gas drive slight regional salinization of groundwater. EarthArxiv. Retrieved December 17, 2024, from https://eartharxiv.org/repository/view/6929/

Shull, D., & McGarrell, C. (2018). Eutrophication cause determination protocol: Technical report. A benthic macroinvertebrate multimetric index for large semi-wadeable rivers: Technical report. Retrieved August 16, 2024, from https://files.dep.state.pa.us/water/Drinking%20Water%20and%20Facility%20RegulatioWaterQualityPortalFiles/Technical%20Documentation/LargeRiver_Semiwadeable_Technical_Report.pdf

U.S. Energy Information Administration. (2024). Table definitions, sources, and explanatory notes. Retrieved December 8, 2024, from https://www.eia.gov/dnav/ng/TblDefs/ng_prod_shalegas_tbddef2.asp

Wen, T., Niu, X., Gonzales, M., Zheng, G., Li, Z., & Brantley, S. L. (2018). Big groundwater data sets reveal possible rare contamination amid otherwise improved water quality for some analytes in a region of Marcellus shale development. Environmental Science & Technology, 52(12), 7149–7159. https://doi.org/10.1021/acs.est.8b01123

What is the National Rivers and Streams Assessment? | US EPA. (2023, December 19). Environmental Protection Agency (EPA). Retrieved August 16, 2024, from https://www.epa.gov/national-aquatic-resource-surveys/what-national-rivers-and-streams-assessment

About

EAR609 Environmental Data Science Final Project: Using Network Analysis of Benthic Macroinvertebrates to Unravel the Effect of Oil and Gas Production on River Ecosystem Health 

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors