Reading layer `WySEaSON_SnakeGreys_StreamTemperature_locations' from data source `C:\Users\jbaldock\OneDrive - DOI\Documents\USGS\PASTA-groundwater\Stream temperature\WySEaSON_SnakeGreys_StreamTemperature_locations.shp'
using driver `ESRI Shapefile'
Simple feature collection with 118 features and 4 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -110.8945 ymin: 42.5621 xmax: -109.9837 ymax: 43.9417
Geodetic CRS: WGS 84
Code
mapview(sites_sp)
View histogram of site-specific groundwater influence. Note: these are only available for the sites Kevin included in the daily stream temperature model. This excludes sites in the Grey’s river drainage, loggers deployed in groundwater spring seeps, and redundant sites (streams within 2+ loggers in close proximity).
Does this hold for summer months (June - September)? No: negative association between Ar and Pl is driven by ~extreme values during winter and early spring.
Get streamflow data from Pacific Creek, a ~central reference gage with long-term streamflow records. We don’t use any of the Snake River gages (e.g., Snake above Palisades Reservoir) because these are all affected by flow regulation at Jackson Lake Dam, and therefore do not provide a good measure of within and among-year variation in climate…particularly during summer months when flows from JLD are a function of downstream irrigation demands. (a metric of within and among year variation in water availability/wetness)
Code
# get mean daily flow in cfs from NWISsite.info <-readNWISsite("13011500")[,c(1:2,7,8,30:31)] # get site info (Pacific Creek)flow <-readNWISdata(sites ="13011500", parameterCd =c("00060"), service ="dv", startDate ="1960-01-01", endDate ="2025-09-01")flow <- flow %>%select(3,4)names(flow) <-c("date", "flow_cfs")# fill missing dates using FASSTRflow <-fill_missing_dates(data = flow, dates = date)# calculate yieldflow <- flow %>%mutate(flow_cms = flow_cfs*0.02831683199881)flow <-add_daily_yield(flow, values = flow_cms, basin_area = site.info$drain_area_va*2.58999)# summarize by year and monthflow_monthly <- flow %>%mutate(month =month(date), year =year(date)) %>%group_by(year, month) %>%summarize(Yield_mm =sum(Yield_mm)) %>%ungroup()
Summarize PASTA by month. Note that we filter to June-September and >= 2020, i.e., months/years with substantial data. Also, again, only considering the sites in Kevin’s temp model (removes high density temperature observations on some streams, loggers in springs, etc.)
Code
dat_pasta_monthly <- dat_pasta %>%filter(R2_water >=0.8, R2_air >=0.8, !is.na(gw_influence)) %>%filter(month %in%c(6:9), year >=2020) %>%# filter to just the years/months with substantial datagroup_by(UpdatedName, year, month) %>%summarize(ndays =n(),amplitude_ratio =mean(amplitude_ratio),phase_lag =mean(phase_lag),mean_offset =mean(mean_offset),mean_ratio =mean(mean_ratio),gw_influence =mean(gw_influence),h2oAreaKm2_log =mean(h2oAreaKm2_log)) %>%ungroup() %>%filter(ndays >=25) %>%left_join(flow_monthly) %>%mutate(gw_influence_log =log(gw_influence),Yield_mm_log =log(Yield_mm)) %>%left_join(dat_climate) %>%mutate(swe_apr1_log =log(swe_apr1+0.01),precip_sum_log =log(precip_sum+0.01)) %>%filter(!UpdatedName %in%c("Lake Creek")) # drop sites that we don't expect to conform to typical paired air-water temp dynamics
View distributions of summer mean PASTA parameters (some sites with multiple years). Note the problematic phase_lags (negative and >5)
Weak correlation between watershed area and groundwater influence: groundwater-dominated streams tend to be smaller.
Weak correlation between groundwater influence and April 1 SWE: sites with strong groundwater influence tend to be at lower elevations, which receive less snowfall.
Strong correlation between month and total Yield: volumetric streamflow declines from June to September. Month ~is a proxy for water availability.
Relationship between groundwater and amplitude ratio is generally negative, with the strongest negative effects typically seen in August and during the driest year (2021)
Plot relationship between key parameters: no relationship…is this weird? Not necessarily given that the negative correlations between amplitude ratio and phase lag are driven by extreme values during ~winter
Code
egg::ggarrange(dat_pasta_monthly_trimmed %>%ggplot(aes(x = amplitude_ratio, y = phase_lag)) +geom_point(), dat_pasta_monthly_trimmed %>%ggplot(aes(x = amplitude_ratio, y = mean_offset)) +geom_point(), dat_pasta_monthly_trimmed %>%ggplot(aes(x = phase_lag, y = mean_offset)) +geom_point(),nrow =2, ncol =2)
Relationship between groundwater and amplitude ratio is generally negative, with the strongest negative effects typically seen in August and during the driest year (2021)
4.2.3.2 Model by covariates
Visualize relationships between amplitude ratio and (log) groundwater influence, (log) total monthly yield, and (log) watershed area.
---title: "PASTA by Groundwater"---```{r include=FALSE}library(tidyverse)library(lme4)library(FedData)library(dataRetrieval)library(fasstr)library(GGally)library(dygraphs)library(ggforce)library(sf)library(daymetr)library(mapview)```## Data### SitesGet site-level data from Kevin's temperature model```{r}# name changes for WyADAPTdat_names <-read_csv("Stream temperature/StreamSiteNames_WYADAPT.csv")# site-level covariates and temp model rmsedat_sites <-read_csv("Stream temperature/model_obs_pred_data.csv") %>%mutate(model_resid = model_pred - meanT,year =as.factor(year(date))) %>%group_by(site) %>%summarize(gw_influence =mean(gw_influence),h2oAreaKm2_log =mean(h2oAreaKm2_log),forest_buff_perc =mean(forest_buff_perc),h2o_slope_mean =mean(h2o_slope_mean),h2o_elev_mean =mean(h2o_elev_mean),sampsize =n(),rmse =sqrt(mean((model_resid)^2)),mae =mean(abs(model_resid))) %>%ungroup() %>%rename("CurrentName"="site") %>%left_join(dat_names)dat_sites```Get location data```{r}sites_sp <-st_read("Stream Temperature/WySEaSON_SnakeGreys_StreamTemperature_locations.shp") %>%filter(name %in%unique(dat_sites$UpdatedName))mapview(sites_sp)```View histogram of site-specific groundwater influence. Note: these are only available for the sites Kevin included in the daily stream temperature model. This excludes sites in the Grey's river drainage, loggers deployed in groundwater spring seeps, and redundant sites (streams within 2+ loggers in close proximity).```{r}dat_sites %>%ggplot() +geom_histogram(aes(gw_influence))```Histogram of site-specific catchment size```{r}dat_sites %>%ggplot() +geom_histogram(aes(h2oAreaKm2_log))```### PASTALoad diel PASTA metrics from Tim Lambert```{r}dat_pasta <-read_csv("WY-GW_daily-metrics.csv") %>%rename("UpdatedName"="site") %>%left_join(dat_sites) %>%mutate(year =year(date),month =month(date),doy =yday(date))dat_pasta# # view(dat_pasta %>%# filter(is.na(gw_influence)) %>% group_by(UpdatedName) %>% summarize(n = n()))# # view(dat_pasta %>%# filter(!is.na(gw_influence)) %>% group_by(UpdatedName) %>% summarize(n = n()))```View histogram of R-squared for water and air temperature sine wave fits```{r}egg::ggarrange(dat_pasta %>%ggplot() +geom_histogram(aes(R2_water)), dat_pasta %>%ggplot() +geom_histogram(aes(R2_air)))```Load flagged phase_lag from Tim L.```{r}phaselag_flags <-read_csv("WY-GW_flag_phase-lag.csv") %>%rename("UpdatedName"="site")phaselag_flags```Plot time series of monthly flags```{r fig.width=7, fig.height=14}phaselag_flags %>%ggplot(aes(x = month_date, y = UpdatedName)) +geom_point(aes(color =as.factor(phase_lag_flag))) +scale_color_manual(values =c("0"="grey", "1"="orange", "2"="red")) +theme_bw() +theme(legend.position ="top")```Get tibble of "good" sites/months...still >1000 site-months with "good" phase lag estimates. Yay!```{r}noflag <- phaselag_flags %>%filter(phase_lag_flag ==0) %>%select(UpdatedName, year, month, phase_lag_flag)noflag```Trim raw PASTA output by phase_lag flags```{r}dat_pasta_trimmed <- dat_pasta %>%left_join(noflag) %>%filter(phase_lag_flag ==0)```#### Inspect PASTASet focal stream: Leidy Creek```{r}myname <-"Leidy Creek"```Inspect PASTA time series```{r}dat_pasta_trimmed %>%filter(UpdatedName == myname) %>%ggplot(aes(x = doy, y = amplitude_ratio)) +geom_point() +facet_wrap(~year) +ylim(0,1) +geom_smooth(se =FALSE) +theme_bw()dat_pasta_trimmed %>%filter(UpdatedName == myname) %>%ggplot(aes(x = doy, y = phase_lag)) +geom_point(color ="red") +facet_wrap(~year) +geom_smooth(se =FALSE, color ="black") +theme_bw()```What is the relationship between amplitude ratio and phase lag?```{r}dat_pasta_trimmed %>%filter(UpdatedName == myname) %>%ggplot(aes(x = amplitude_ratio, y = phase_lag)) +geom_point() +facet_wrap(~year) +xlim(0,1) +ylim(-5,10) +geom_smooth(se =FALSE) +theme_bw()```Does this hold for summer months (June - September)? No: negative association between Ar and Pl is driven by ~extreme values during winter and early spring. ```{r}dat_pasta_trimmed %>%filter(UpdatedName == myname, month %in%c(6:9)) %>%ggplot(aes(x = amplitude_ratio, y = phase_lag)) +geom_point() +facet_wrap(~year) +xlim(0,1) +ylim(-5,10) +geom_smooth(se =FALSE) +theme_bw()```### StreamflowGet streamflow data from Pacific Creek, a ~central reference gage with long-term streamflow records. We don't use any of the Snake River gages (e.g., Snake above Palisades Reservoir) because these are all affected by flow regulation at Jackson Lake Dam, and therefore do not provide a good measure of within and among-year variation in climate...particularly during summer months when flows from JLD are a function of downstream irrigation demands. (a metric of within and among year variation in water availability/wetness)```{r}# get mean daily flow in cfs from NWISsite.info <-readNWISsite("13011500")[,c(1:2,7,8,30:31)] # get site info (Pacific Creek)flow <-readNWISdata(sites ="13011500", parameterCd =c("00060"), service ="dv", startDate ="1960-01-01", endDate ="2025-09-01")flow <- flow %>%select(3,4)names(flow) <-c("date", "flow_cfs")# fill missing dates using FASSTRflow <-fill_missing_dates(data = flow, dates = date)# calculate yieldflow <- flow %>%mutate(flow_cms = flow_cfs*0.02831683199881)flow <-add_daily_yield(flow, values = flow_cms, basin_area = site.info$drain_area_va*2.58999)# summarize by year and monthflow_monthly <- flow %>%mutate(month =month(date), year =year(date)) %>%group_by(year, month) %>%summarize(Yield_mm =sum(Yield_mm)) %>%ungroup()```View full time series of flow```{r}dygraph(flow %>%select(date, Yield_mm)) %>%dyRangeSelector()```View time series of monthly yield ```{r}flow_monthly %>%filter(year %in%c(2019:2024)) %>%ggplot(aes(x = month, y = Yield_mm)) +geom_point() +geom_line() +facet_wrap(~year) +theme_bw() +theme(panel.grid =element_blank())```### ClimatePull site-specific climate data from Daymet```{r eval=FALSE}climlist <-vector("list", length =dim(sites_sp)[1])for (i in1:dim(sites_sp)[1]) { clim <-download_daymet(site = sites_sp$name[i], lat = sites_sp$latitude[i], lon = sites_sp$longitude[i], start =min(year(dat_pasta_trimmed$date))-1, end =max(year(dat_pasta_trimmed$date)), internal = T) climlist[[i]] <-tibble(clim$data) %>%mutate(air_temp_mean = (tmax..deg.c. + tmin..deg.c.)/2, date =as.Date(paste(year, yday, sep ="-"), "%Y-%j"),UpdatedName = sites_sp$name[i]) %>%select(12,2,11,10,4,6,3,5) %>%rename(precip_mmday =5, swe_kgm2 =6, daylength_sec =7, shortrad_wm2 =8)print(i)}climdf <-do.call(rbind, climlist)climdf <-add_date_variables(climdf, dates = date, water_year_start =10) %>%rename(month = Month)write_csv(climdf, "Air temperature/Daymet_daily.csv")``````{r}dat_climate <-read_csv("Air temperature/Daymet_daily.csv") %>%group_by(UpdatedName, WaterYear) %>%summarize(swe_mean =mean(swe_kgm2),swe_sum =sum(swe_kgm2),swe_apr1 = swe_kgm2[yday ==91]) %>%ungroup() %>%left_join(read_csv("Air temperature/Daymet_daily.csv") %>%filter(yday >=152& yday <=273) %>%group_by(UpdatedName, WaterYear, month) %>%summarize(precip_mean =mean(precip_mmday),precip_sum =sum(precip_mmday)) %>%ungroup() ) %>%select(UpdatedName, WaterYear, month, swe_mean, swe_sum, swe_apr1, precip_mean, precip_sum) %>%rename(year = WaterYear)ggpairs(dat_climate %>%select(4:8) )```## Model PASTA### Summarize dataSummarize PASTA by month. Note that we filter to June-September and >= 2020, i.e., months/years with substantial data. Also, again, only considering the sites in Kevin's temp model (removes high density temperature observations on some streams, loggers in springs, etc.)```{r}dat_pasta_monthly <- dat_pasta %>%filter(R2_water >=0.8, R2_air >=0.8, !is.na(gw_influence)) %>%filter(month %in%c(6:9), year >=2020) %>%# filter to just the years/months with substantial datagroup_by(UpdatedName, year, month) %>%summarize(ndays =n(),amplitude_ratio =mean(amplitude_ratio),phase_lag =mean(phase_lag),mean_offset =mean(mean_offset),mean_ratio =mean(mean_ratio),gw_influence =mean(gw_influence),h2oAreaKm2_log =mean(h2oAreaKm2_log)) %>%ungroup() %>%filter(ndays >=25) %>%left_join(flow_monthly) %>%mutate(gw_influence_log =log(gw_influence),Yield_mm_log =log(Yield_mm)) %>%left_join(dat_climate) %>%mutate(swe_apr1_log =log(swe_apr1+0.01),precip_sum_log =log(precip_sum+0.01)) %>%filter(!UpdatedName %in%c("Lake Creek")) # drop sites that we don't expect to conform to typical paired air-water temp dynamics```View distributions of summer mean PASTA parameters (some sites with multiple years). Note the problematic phase_lags (negative and >5)```{r}egg::ggarrange(dat_pasta_monthly %>%ggplot() +geom_histogram(aes(amplitude_ratio)), dat_pasta_monthly %>%ggplot() +geom_histogram(aes(phase_lag)), dat_pasta_monthly %>%ggplot() +geom_histogram(aes(mean_offset)), dat_pasta_monthly %>%ggplot() +geom_histogram(aes(mean_ratio)),nrow =2, ncol =2)```View pairs plot between groundwater influence, total monthly yield, watershed area, and month```{r}ggpairs(dat_pasta_monthly %>%select(gw_influence_log, Yield_mm_log, h2oAreaKm2_log, swe_apr1_log, precip_sum_log, month))```* Weak correlation between watershed area and groundwater influence: groundwater-dominated streams tend to be smaller.* Weak correlation between groundwater influence and April 1 SWE: sites with strong groundwater influence tend to be at lower elevations, which receive less snowfall.* Strong correlation between month and total Yield: volumetric streamflow declines from June to September. Month ~is a proxy for water availability.### Amplitude ratio#### Model by month/yearFacet by year and month```{r fig.width=7, fig.height=9}dat_pasta_monthly %>%ggplot(aes(x = gw_influence_log, y = amplitude_ratio)) +geom_point() +geom_smooth(method ="lm") +facet_grid(rows =vars(year), cols =vars(month)) +theme_bw() +theme(panel.grid =element_blank())```Fit random slope and intercept model, by month and year```{r}mymod <- lmerTest::lmer(amplitude_ratio ~ gw_influence_log + (gw_influence_log | month + year + UpdatedName), data = dat_pasta_monthly)summary(mymod)```View coefficients```{r}coef(mymod)$yearcoef(mymod)$month```*Relationship between groundwater and amplitude ratio is generally negative, with the strongest negative effects typically seen in August and during the driest year (2021)*#### Model by covariatesStandardize covariates```{r}dat_pasta_monthly <- dat_pasta_monthly %>%mutate(z_gw_influence_log =c(scale(gw_influence_log)),z_yield_mm_log =c(scale(Yield_mm_log)),z_h2oareakm2_log =c(scale(h2oAreaKm2_log)),z_swe_apr1_log =c(scale(swe_apr1_log)),z_precip_sum =c(scale(precip_sum)))```Visualize relationships between amplitude ratio and (log) groundwater influence, (log) total monthly yield, and (log) watershed area.```{r}egg::ggarrange( dat_pasta_monthly %>%ggplot(aes(x = z_gw_influence_log, y = amplitude_ratio)) +geom_point() +geom_smooth(method ="lm") +theme_bw() +theme(panel.grid =element_blank()), dat_pasta_monthly %>%ggplot(aes(x = z_yield_mm_log, y = amplitude_ratio)) +geom_point() +geom_smooth(method ="lm") +theme_bw() +theme(panel.grid =element_blank()), dat_pasta_monthly %>%ggplot(aes(x = z_h2oareakm2_log, y = amplitude_ratio)) +geom_point() +geom_smooth(method ="lm") +theme_bw() +theme(panel.grid =element_blank()), dat_pasta_monthly %>%ggplot(aes(x = z_swe_apr1_log, y = amplitude_ratio)) +geom_point() +geom_smooth(method ="lm") +theme_bw() +theme(panel.grid =element_blank()), dat_pasta_monthly %>%ggplot(aes(x = z_precip_sum, y = amplitude_ratio)) +geom_point() +geom_smooth(method ="lm") +theme_bw() +theme(panel.grid =element_blank()),nrow =1)```##### Simple modelSimple linear model with additive effects```{r eval=FALSE}summary(lm(amplitude_ratio ~ z_gw_influence_log + z_yield_mm_log + z_h2oareakm2_log + z_swe_apr1_log + z_precip_sum, dat_pasta_monthly))```Additive effects with random intercept for year```{r}mymod <- lmerTest::lmer(amplitude_ratio ~ z_gw_influence_log + z_yield_mm_log + z_h2oareakm2_log + z_swe_apr1_log + z_precip_sum + (1| UpdatedName), data = dat_pasta_monthly)summary(mymod)```##### By GW and flow*No random effects*Simple linear model, groundwater + flow volume ```{r}mymod <-lm(amplitude_ratio ~ z_gw_influence_log*z_yield_mm_log, dat_pasta_monthly)summary(mymod)sqrt(mean(mymod$residuals^2))```Plot interactive effects```{r}x1 <-seq(min(dat_pasta_monthly$z_gw_influence_log), max(dat_pasta_monthly$z_gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$z_yield_mm_log), max(dat_pasta_monthly$z_yield_mm_log), length.out =100)prediction_grid <-expand.grid(z_gw_influence_log = x1, z_yield_mm_log = x2)prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot() +geom_tile(prediction_grid, mapping =aes(x = z_gw_influence_log, y = z_yield_mm_log, fill = y_predicted)) +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="(log) total yield",fill ="monthly mean\namplitude ratio" ) +geom_point(dat_pasta_monthly, mapping =aes(x = z_gw_influence_log, y = z_yield_mm_log), size =0.2) +theme_bw()```##### By GW and winter snow*No random effects*Simple linear model, groundwater + winter snow```{r}mymod <-lm(amplitude_ratio ~ z_gw_influence_log*z_swe_apr1_log, dat_pasta_monthly)summary(mymod)sqrt(mean(mymod$residuals^2))```Plot interactive effects```{r}x1 <-seq(min(dat_pasta_monthly$z_gw_influence_log), max(dat_pasta_monthly$z_gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$z_swe_apr1_log), max(dat_pasta_monthly$z_swe_apr1_log), length.out =100)prediction_grid <-expand.grid(z_gw_influence_log = x1, z_swe_apr1_log = x2)prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot() +geom_tile(prediction_grid, mapping =aes(x = z_gw_influence_log, y = z_swe_apr1_log, fill = y_predicted)) +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="(log) April 1 SWE",fill ="monthly mean\namplitude ratio" ) +geom_point(dat_pasta_monthly, mapping =aes(x = z_gw_influence_log, y = z_swe_apr1_log), size =0.2) +theme_bw()```##### By GW and summer precip*No random effects*Simple linear model, groundwater + summer precip```{r}mymod <-lm(amplitude_ratio ~ z_gw_influence_log*z_precip_sum, dat_pasta_monthly)summary(mymod)sqrt(mean(mymod$residuals^2))```Plot interactive effects```{r}x1 <-seq(min(dat_pasta_monthly$z_gw_influence_log), max(dat_pasta_monthly$z_gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$z_precip_sum), max(dat_pasta_monthly$z_precip_sum), length.out =100)prediction_grid <-expand.grid(z_gw_influence_log = x1, z_precip_sum = x2)prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot() +geom_tile(prediction_grid, mapping =aes(x = z_gw_influence_log, y = z_precip_sum, fill = y_predicted)) +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="Cumulative summer precip. (mm)",fill ="monthly mean\namplitude ratio" ) +geom_point(dat_pasta_monthly, mapping =aes(x = z_gw_influence_log, y = z_precip_sum), size =0.2) +theme_bw()```##### By GW and area*No random effects*Simple linear model, groundwater + flow volume ```{r}mymod <-lm(amplitude_ratio ~ z_gw_influence_log*z_h2oareakm2_log, dat_pasta_monthly)summary(mymod)sqrt(mean(mymod$residuals^2))```Plot interactive effects```{r}x1 <-seq(min(dat_pasta_monthly$z_gw_influence_log), max(dat_pasta_monthly$z_gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$z_h2oareakm2_log), max(dat_pasta_monthly$z_h2oareakm2_log), length.out =100)prediction_grid <-expand.grid(z_gw_influence_log = x1, z_h2oareakm2_log = x2)prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot() +geom_tile(prediction_grid, mapping =aes(x = z_gw_influence_log, y = z_h2oareakm2_log, fill = y_predicted)) +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="(log) watershed area",fill ="monthly mean\namplitude ratio" ) +geom_point(dat_pasta_monthly, mapping =aes(x = z_gw_influence_log, y = z_h2oareakm2_log), size =0.2) +theme_bw()```##### Full interaction*No random effects*. Fit the model: interactions between groundwater x watershed area and groundwater x flow volume. (with a 3-way interaction, no significant effects)```{r}mymod <-lm(amplitude_ratio ~ z_gw_influence_log*z_h2oareakm2_log + z_gw_influence_log*z_yield_mm_log + z_gw_influence_log*z_swe_apr1_log + z_gw_influence_log*z_precip_sum, dat_pasta_monthly)summary(mymod)```::: panel-tabset###### watershed area```{r}x1 <-seq(min(dat_pasta_monthly$z_gw_influence_log), max(dat_pasta_monthly$z_gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$z_h2oareakm2_log), max(dat_pasta_monthly$z_h2oareakm2_log), length.out =100)prediction_grid <-expand.grid(z_gw_influence_log = x1, z_h2oareakm2_log = x2)prediction_grid$z_yield_mm_log <-0prediction_grid$z_swe_apr1_log <-0prediction_grid$z_precip_sum <-0prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot(prediction_grid, aes(x = z_gw_influence_log, y = z_h2oareakm2_log, fill = y_predicted)) +geom_tile() +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="(log) watershed area",fill ="monthly mean\namplitude ratio" ) +theme_bw()```###### total yield```{r}x1 <-seq(min(dat_pasta_monthly$z_gw_influence_log), max(dat_pasta_monthly$z_gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$z_yield_mm_log), max(dat_pasta_monthly$z_yield_mm_log), length.out =100)prediction_grid <-expand.grid(z_gw_influence_log = x1, z_yield_mm_log = x2)prediction_grid$z_h2oareakm2_log <-0prediction_grid$z_swe_apr1_log <-0prediction_grid$z_precip_sum <-0prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot(prediction_grid, aes(x = z_gw_influence_log, y = z_yield_mm_log, fill = y_predicted)) +geom_tile() +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="(log) total yield",fill ="monthly mean\namplitude ratio" ) +theme_bw()```###### april 1 swe```{r}x1 <-seq(min(dat_pasta_monthly$z_gw_influence_log), max(dat_pasta_monthly$z_gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$z_swe_apr1_log), max(dat_pasta_monthly$z_swe_apr1_log), length.out =100)prediction_grid <-expand.grid(z_gw_influence_log = x1, z_swe_apr1_log = x2)prediction_grid$z_h2oareakm2_log <-0prediction_grid$z_yield_mm_log <-0prediction_grid$z_precip_sum <-0prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot(prediction_grid, aes(x = z_gw_influence_log, y = z_swe_apr1_log, fill = y_predicted)) +geom_tile() +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="(log) April 1 SWE",fill ="monthly mean\namplitude ratio" ) +theme_bw()```###### summer precip```{r}x1 <-seq(min(dat_pasta_monthly$z_gw_influence_log), max(dat_pasta_monthly$z_gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$z_precip_sum), max(dat_pasta_monthly$z_precip_sum), length.out =100)prediction_grid <-expand.grid(z_gw_influence_log = x1, z_precip_sum = x2)prediction_grid$z_h2oareakm2_log <-0prediction_grid$z_swe_apr1_log <-0prediction_grid$z_yield_mm_log <-0prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot(prediction_grid, aes(x = z_gw_influence_log, y = z_precip_sum, fill = y_predicted)) +geom_tile() +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="Cumulative summer precip. (mm)",fill ="monthly mean\namplitude ratio" ) +theme_bw()```:::### Phase lagTrim to only "good" phase lag estimates. Roughly half the number of observations.```{r}dat_pasta_monthly_trimmed <- dat_pasta_monthly %>%left_join(noflag) %>%filter(phase_lag_flag ==0)ggpairs(dat_pasta_monthly_trimmed %>%select(gw_influence_log, Yield_mm_log, h2oAreaKm2_log, month))```Plot relationship between key parameters: no relationship...is this weird? Not necessarily given that the negative correlations between amplitude ratio and phase lag are driven by extreme values during ~winter```{r fig.width=7, fig.height=7}egg::ggarrange(dat_pasta_monthly_trimmed %>%ggplot(aes(x = amplitude_ratio, y = phase_lag)) +geom_point(), dat_pasta_monthly_trimmed %>%ggplot(aes(x = amplitude_ratio, y = mean_offset)) +geom_point(), dat_pasta_monthly_trimmed %>%ggplot(aes(x = phase_lag, y = mean_offset)) +geom_point(),nrow =2, ncol =2)```#### Model by month/yearFacet by year and month```{r fig.width=7, fig.height=9}dat_pasta_monthly_trimmed %>%ggplot(aes(x = gw_influence_log, y = phase_lag)) +geom_point() +geom_smooth(method ="lm") +facet_grid(rows =vars(year), cols =vars(month)) +theme_bw() +theme(panel.grid =element_blank())```Fit random slope and intercept model, by month and year```{r}mymod <- lmerTest::lmer(phase_lag ~ gw_influence_log + (gw_influence_log | month + year + UpdatedName), data = dat_pasta_monthly_trimmed)summary(mymod)```View coefficients```{r}coef(mymod)$yearcoef(mymod)$month```*Relationship between groundwater and amplitude ratio is generally negative, with the strongest negative effects typically seen in August and during the driest year (2021)*#### Model by covariatesVisualize relationships between amplitude ratio and (log) groundwater influence, (log) total monthly yield, and (log) watershed area.```{r}egg::ggarrange( dat_pasta_monthly_trimmed %>%ggplot(aes(x = gw_influence_log, y = phase_lag)) +geom_point() +geom_smooth(method ="lm") +theme_bw() +theme(panel.grid =element_blank()), dat_pasta_monthly_trimmed %>%ggplot(aes(x = Yield_mm_log, y = phase_lag)) +geom_point() +geom_smooth(method ="lm") +theme_bw() +theme(panel.grid =element_blank()), dat_pasta_monthly_trimmed %>%ggplot(aes(x = h2oAreaKm2_log, y = phase_lag)) +geom_point() +geom_smooth(method ="lm") +theme_bw() +theme(panel.grid =element_blank()),nrow =1)```##### Simple modelSimple linear model with additive effects```{r eval=FALSE}summary(lm(phase_lag ~ gw_influence_log + Yield_mm_log + h2oAreaKm2_log, dat_pasta_monthly_trimmed))```Additive effects with random intercept for year```{r}mymod <- lmerTest::lmer(phase_lag ~ gw_influence_log + Yield_mm_log + h2oAreaKm2_log + (1| year), data = dat_pasta_monthly_trimmed)summary(mymod)```##### By GW and flow*No random effects*Simple linear model, groundwater * flow volume ```{r}mymod <-lm(phase_lag ~ gw_influence_log*Yield_mm_log, dat_pasta_monthly_trimmed)summary(mymod)```Plot interactive effects```{r}x1 <-seq(min(dat_pasta_monthly$gw_influence_log), max(dat_pasta_monthly$gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$Yield_mm_log), max(dat_pasta_monthly$Yield_mm_log), length.out =100)prediction_grid <-expand.grid(gw_influence_log = x1, Yield_mm_log = x2)prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot(prediction_grid, aes(x = gw_influence_log, y = Yield_mm_log, fill = y_predicted)) +geom_tile() +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="(log) total yield",fill ="monthly mean\nphase lag" ) +theme_bw()```##### By GW and area*No random effects*Simple linear model, groundwater * watershed area```{r}mymod <-lm(phase_lag ~ gw_influence_log*h2oAreaKm2_log, dat_pasta_monthly_trimmed)summary(mymod)```Plot interactive effects```{r}x1 <-seq(min(dat_pasta_monthly$gw_influence_log), max(dat_pasta_monthly$gw_influence_log), length.out =100)x2 <-seq(min(dat_pasta_monthly$h2oAreaKm2_log), max(dat_pasta_monthly$h2oAreaKm2_log), length.out =100)prediction_grid <-expand.grid(gw_influence_log = x1, h2oAreaKm2_log = x2)prediction_grid$y_predicted <-predict(mymod, newdata = prediction_grid)ggplot(prediction_grid, aes(x = gw_influence_log, y = h2oAreaKm2_log, fill = y_predicted)) +geom_tile() +scale_fill_viridis_c(option ="plasma") +# Use a color scale, e.g., viridislabs(x ="(log) groundwater influence",y ="(log) watershed area",fill ="monthly mean\nphase lag" ) +theme_bw()```##### Full interaction*No random effects*. Fit the model: interactions between groundwater x watershed area and groundwater x flow volume. (with a 3-way interaction, no significant effects)```{r}mymod <-lm(phase_lag ~ gw_influence_log*h2oAreaKm2_log + gw_influence_log*Yield_mm_log, dat_pasta_monthly_trimmed)summary(mymod)```