siteinfo2 <- siteinfo %>%filter(!site_name %in%c("WoundedBuckCreek", "Brackett Creek", "South River Conway NWIS", "Shields River nr Livingston NWIS", "North Fork Flathead River NWIS", "Pacific Creek at Moran NWIS")) %>%mutate(designation =ifelse(site_name %in%c("Donner Blitzen River nr Frenchglen NWIS", "BigCreekLower", "CoalCreekLower", "McGeeCreekLower", "West Brook NWIS", "West Brook 0", "Paine Run 10", "Staunton River 10", "Piney River 10", "Shields River Valley Ranch", "Shields River ab Smith NWIS", "EF Duck Creek be HF","Spread Creek Dam"), "big", "little"))datatable(siteinfo2 %>%arrange(region, basin, subbasin, designation) %>%mutate(lat =round(lat, digits =2), long =round(long, digits =2), area_sqmi =round(area_sqmi, digits =2), elev_ft =round(elev_ft, digits =0)),caption ="EcoDrought monitoring locations and metadata.")
# A tibble: 3,554,087 × 10
region basin subbasin site_name datetime tempc TempReliability
<chr> <chr> <chr> <chr> <dttm> <dbl> <dbl>
1 Flat Flathead Big Creek Big Cree… 2018-09-10 20:00:00 NA 0
2 Flat Flathead Big Creek Big Cree… 2018-09-10 21:00:00 NA 0
3 Flat Flathead Big Creek Big Cree… 2018-09-10 22:00:00 NA 0
4 Flat Flathead Big Creek Big Cree… 2018-09-10 23:00:00 NA 0
5 Flat Flathead Big Creek Big Cree… 2018-09-11 00:00:00 NA 0
6 Flat Flathead Big Creek Big Cree… 2018-09-11 01:00:00 NA 0
7 Flat Flathead Big Creek Big Cree… 2018-09-11 02:00:00 NA 0
8 Flat Flathead Big Creek Big Cree… 2018-09-11 03:00:00 NA 0
9 Flat Flathead Big Creek Big Cree… 2018-09-11 04:00:00 NA 0
10 Flat Flathead Big Creek Big Cree… 2018-09-11 05:00:00 NA 0
# ℹ 3,554,077 more rows
# ℹ 3 more variables: flow <dbl>, DischargeReliability <dbl>, z_flow <dbl[,1]>
7.1.3 Air temp
Apply air temperature data from a single site within/near each sub-basin to all water temperature/flow sites within the same sub-basin. This implicitly assumes that air temperature is homogeneous within each sub-basin, which is an oversimplification, but necessary as we only have air temperature observations at a single site within most sub-basins.
Importantly, SNOTEL data appears to be a reasonable surrogate for in-situ air temperature, at least as shown for the Shields River, Snake River/Spread Creek, and Duck Creek. To maintain consistency across basins (at least in western US) and years where in-situ data is missing, use SNOTEL air temperature data for all western US basins. Note that Duck Creek gets Shields River SNOTEL data, as this was the nearest SNOTEL site, and showed strong concordance with in-situ data. Additionally, using SNOTEL data as a surrogate for in-situ air temperature data may not be as robust in the Donner-Blitzen, where the SNOTEL site is at a considerably higher elevation than the stream monitoring locations.
# A tibble: 6 × 5
basin subbasin source datetime tempc_air
<chr> <chr> <chr> <dttm> <dbl>
1 West Brook West Brook SOConte 2019-12-31 18:00:00 0.920
2 West Brook West Brook SOConte 2019-12-31 19:00:00 0.508
3 West Brook West Brook SOConte 2019-12-31 20:00:00 0.371
4 West Brook West Brook SOConte 2019-12-31 21:00:00 0.149
5 West Brook West Brook SOConte 2019-12-31 22:00:00 0.0378
6 West Brook West Brook SOConte 2019-12-31 23:00:00 0.01
Code
head(airtemp_shen)
# A tibble: 6 × 5
basin subbasin datetime tempc_air source
<chr> <chr> <dttm> <dbl> <chr>
1 Paine Run Paine Run 2014-10-06 17:00:00 18.6 UVA
2 Paine Run Paine Run 2014-10-06 18:00:00 19.4 UVA
3 Paine Run Paine Run 2014-10-06 19:00:00 19.3 UVA
4 Paine Run Paine Run 2014-10-06 20:00:00 19.4 UVA
5 Paine Run Paine Run 2014-10-06 21:00:00 18.9 UVA
6 Paine Run Paine Run 2014-10-06 22:00:00 18.2 UVA
Code
head(airtemp_flat)
# A tibble: 6 × 5
basin source datetime tempc_air subbasin
<chr> <chr> <dttm> <dbl> <chr>
1 Flathead SNOTEL 2017-01-01 08:00:00 -5.61 Big Creek
2 Flathead SNOTEL 2017-01-01 09:00:00 -5.61 Big Creek
3 Flathead SNOTEL 2017-01-01 10:00:00 -5.39 Big Creek
4 Flathead SNOTEL 2017-01-01 11:00:00 -5.28 Big Creek
5 Flathead SNOTEL 2017-01-01 12:00:00 -4.72 Big Creek
6 Flathead SNOTEL 2017-01-01 13:00:00 -5.39 Big Creek
Code
head(airtemp_shields)
# A tibble: 6 × 5
basin subbasin source datetime tempc_air
<chr> <chr> <chr> <dttm> <dbl>
1 Shields River Shields River SNOTEL 2013-01-01 08:00:00 -7.5
2 Shields River Shields River SNOTEL 2013-01-01 09:00:00 -7.22
3 Shields River Shields River SNOTEL 2013-01-01 10:00:00 -7
4 Shields River Shields River SNOTEL 2013-01-01 11:00:00 -6.72
5 Shields River Shields River SNOTEL 2013-01-01 12:00:00 -6.22
6 Shields River Shields River SNOTEL 2013-01-01 13:00:00 -5.39
Code
head(airtemp_snake)
# A tibble: 6 × 5
basin subbasin source datetime tempc_air
<chr> <chr> <chr> <dttm> <dbl>
1 Snake River Snake River SNOTEL 2013-01-01 08:00:00 -13.7
2 Snake River Snake River SNOTEL 2013-01-01 09:00:00 -13.7
3 Snake River Snake River SNOTEL 2013-01-01 10:00:00 -13.6
4 Snake River Snake River SNOTEL 2013-01-01 11:00:00 -13.6
5 Snake River Snake River SNOTEL 2013-01-01 12:00:00 -13.6
6 Snake River Snake River SNOTEL 2013-01-01 13:00:00 -13.6
Al-Chokhachy has in-situ air temperature for Duck Creek, the Shields River, and Spread Creek. For Shields and Spread, compare in-situ air temperature measurements with SNOTEL data. Can we justify using SNOTEL data for these and other western basins?
cat_shields %>%mutate(doy =yday(datetime), year =year(datetime)) %>%filter(!year %in%c(2014,2015,2024)) %>%ggplot(aes(x = tempc_air, y = tempc_air_alc, color = doy)) +geom_point() +geom_abline(intercept =0, slope =1, linetype ="dashed") +scale_color_gradient2(midpoint =182, low ="orange", mid ="purple3", high ="orange") +facet_wrap(~year) +xlab("SNOTEL air temp (C)") +ylab("In-situ air temp (C)")
Notes:
There are a few periods where the in-situ data are clearly wrong, e.g., winter 2013 and November 2016. For the latter, logger appears to be sun affected (in-situ far too warm for August).
Generally, diel cycles match well. Daily max temps are similar between SNOTEL and in-situ data. However, in-situ daily minimums are far lower than SNOTEL. Could this be located in a cold pit/sink?
There is a lot of scatter around 1:1 in the scatter plots. Some of this is probably due to slight differences in the time of sensor readings, but most of it is probably due to the issues mentioned above.
Spread Creek
How well does in-situ air temp align with SNOTEL air temp?
cat_spread %>%mutate(doy =yday(datetime), year =year(datetime)) %>%filter(!year %in%c(2019,2024)) %>%ggplot(aes(x = tempc_air, y = tempc_air_alc, color = doy)) +geom_point() +geom_abline(intercept =0, slope =1, linetype ="dashed") +scale_color_gradient2(midpoint =182, low ="orange", mid ="purple3", high ="orange") +facet_wrap(~year) +xlab("SNOTEL air temp (C)") +ylab("In-situ air temp (C)")
Notes:
There is only one apparent period where the in-situ data are clearly wrong: November 2016. In-situ logger appears to be sun affected (in-situ far too warm for August).
Generally, diel cycles match well (much better than in Shields). Daily max and min temps are similar between SNOTEL and in-situ data, particularly during summer.
There is some scatter around 1:1 in the scatter plots. Much of this appears to be due to ~45 min lag between sensor readings, but less of the disagreement being driven by actual disagreement between sensor readings.
Duck Creek
The closest SNOTEL station is Porcupine in the Shields River basin. How well do these align?
cat_duck %>%mutate(doy =yday(datetime), year =year(datetime)) %>%filter(!year %in%c(2013,2024)) %>%ggplot(aes(x = tempc_air, y = tempc_air_alc, color = doy)) +geom_point() +geom_abline(intercept =0, slope =1, linetype ="dashed") +scale_color_gradient2(midpoint =182, low ="orange", mid ="purple3", high ="orange") +facet_wrap(~year) +xlab("SNOTEL air temp (C)") +ylab("In-situ air temp (C)")
Notes:
There is only one apparent period where the in-situ data are clearly wrong: 2020. Date/times are messed up, lack AM/PMs, drives “shotgun” scatter in scatterplot
Generally, diel cycles match surprisingly well. Daily max and min temps are similar between SNOTEL and in-situ data, particularly during summer. Although in some years daily variability is greater for in-situ data.
There is some scatter around 1:1 in the scatter plots…on par with what we see in the Snake.
7.1.3.2 Final check
Check time zones (all should be UTC)
Code
tz(airtemp_wb$datetime)
[1] "UTC"
Code
tz(airtemp_shen$datetime)
[1] "UTC"
Code
tz(airtemp_flat$datetime)
[1] "UTC"
Code
tz(airtemp_shields$datetime)
[1] "UTC"
Code
tz(airtemp_snake$datetime)
[1] "UTC"
Code
tz(airtemp_db$datetime)
[1] "UTC"
Bind basin-specific air temp and show unique subbasins
# A tibble: 3,554,087 × 12
region basin subbasin site_name datetime tempc TempReliability
<chr> <chr> <chr> <chr> <dttm> <dbl> <dbl>
1 Flat Flathead Big Creek Big Cree… 2018-09-10 20:00:00 NA 0
2 Flat Flathead Big Creek Big Cree… 2018-09-10 21:00:00 NA 0
3 Flat Flathead Big Creek Big Cree… 2018-09-10 22:00:00 NA 0
4 Flat Flathead Big Creek Big Cree… 2018-09-10 23:00:00 NA 0
5 Flat Flathead Big Creek Big Cree… 2018-09-11 00:00:00 NA 0
6 Flat Flathead Big Creek Big Cree… 2018-09-11 01:00:00 NA 0
7 Flat Flathead Big Creek Big Cree… 2018-09-11 02:00:00 NA 0
8 Flat Flathead Big Creek Big Cree… 2018-09-11 03:00:00 NA 0
9 Flat Flathead Big Creek Big Cree… 2018-09-11 04:00:00 NA 0
10 Flat Flathead Big Creek Big Cree… 2018-09-11 05:00:00 NA 0
# ℹ 3,554,077 more rows
# ℹ 5 more variables: flow <dbl>, DischargeReliability <dbl>, z_flow <dbl[,1]>,
# source <chr>, tempc_air <dbl>
7.1.5 View data
Notes:
West Brook
Mitchell, Obear, and WB Lower have erroneous water temp readings after ~Aug. 31, 2021
All WB sites have error in F to C conversion (from Aquarius)
Mitchell, Obear, WB0, and WB Lower stream temp signals appear to be shifted forward in time considerably!
Duck Creek
No air temperature data (yet)
Shields River, Spread Creek, and Donner Blitzen: Should compare SNOTEL air temp data against in-situ/EcoD air temp data, where available
Function to fit sine wave regression models to data and grab amplitude, phase, and mean temperature parameters from fitted model objects:
Code
getParams <-function(dtHOUR, minDataLength =20) {# Models for air temperature modelsAir <- dtHOUR %>%group_by(site_name, year, yday) %>%nest() %>%mutate(dataLength =map_dbl(data, ~length(.x$airTemperature))) |>filter(dataLength > minDataLength) |># filter out daily datasets that are too shortmutate(model0 =map(data, curve_fit_air),model =map(model0, 'model'),rSquared =map(model0, 'rSquared'),params =map(model, extract_params) ) %>%unnest(c(params, rSquared)) |>select(-model0, -model, -data) |>mutate(tempVar ="air")# Models for water temperature modelsWater <- dtHOUR %>%group_by(site_name, year, yday) %>%nest() %>%mutate(dataLength =map_dbl(data, ~length(.x$waterTemperature))) |>filter(dataLength > minDataLength) |># filter out daily datasets that are too shortmutate(model0 =map(data, curve_fit_water),model =map(model0, 'model'),rSquared =map(model0, 'rSquared'),params =map(model, extract_params) ) %>%unnest(c(params, rSquared)) |>select(-model0, -model, -data) |>mutate(tempVar ="water")# Collect air and water models models <-bind_rows(modelsAir, modelsWater) |>filter(!is.na(term)) |>#group_by(site_name, year, yday, tempVar) |>select(-std.error, -statistic, -p.value) |># need to lose these columns so there are no unique values remaining in non-widened colspivot_wider(names_from = term, values_from = estimate)# Calculate amplitude and phase for air and water data params <- models |>group_by(site_name, year, yday, tempVar) |>mutate(amplitude =sqrt(A^2+ B^2),phase =ifelse(A <0,12+ (24/ (2* pi)) *atan(B / A), #switched order of A and B per Tim's email 5/17/24 (24/ (2* pi)) *atan(B / A)),mean = C ) |>left_join( dtHOUR |>select(site_name, year, yday) |>distinct(), by =c("site_name", "year", "yday") ) |>ungroup()# Return amplitude and phase parametersreturn(params)}
7.2.4 Get derived parameters
Function to calculate amplitude ratio, phase lag, and mean ratio parameters from sine wave regression models fits to daily paired air-stream temperature data:
Function to predict air or water temperature from fitted models:
Code
getParamsPred <-function(paramsIn) { uniqueValues <- paramsIn |>distinct(site_name, year, yday, tempVar) preds <-crossing(uniqueValues, hour =0:23) %>%left_join(paramsIn, by =c("site_name", "year", "yday", "tempVar")) %>%mutate(hour_rad = hour * (2* pi /24),predTemp = A *sin(hour_rad) + B *cos(hour_rad) + C )return(preds) }
7.3 Perform PASTA
7.3.1 Get model parameters
Perform PASTA and write output to file
Code
pasta <-getParams(dtHOUR = dat2)write_csv(pasta, "C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/EcoDrought-Analysis/Covariates/pasta_daily_parameters.csv")
Read in
Code
pasta <-read_csv("C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/EcoDrought-Analysis/Covariates/pasta_daily_parameters.csv")head(pasta)
# A tibble: 6 × 12
site_name year yday dataLength rSquared tempVar A B C amplitude
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 BigCreek… 2017 262 24 0.758 air -0.599 1.26 4.97 1.40
2 BigCreek… 2017 263 24 0.711 air -0.822 1.62 3.56 1.81
3 BigCreek… 2017 264 24 0.719 air -1.78 2.16 4.53 2.80
4 BigCreek… 2017 265 24 0.561 air -1.08 2.26 4.58 2.51
5 BigCreek… 2017 266 24 0.949 air -1.74 2.49 4.52 3.04
6 BigCreek… 2017 267 24 0.865 air -4.38 3.23 4.02 5.44
# ℹ 2 more variables: phase <dbl>, mean <dbl>
[1] "BigCreekLower"
[2] "BigCreekMiddle"
[3] "BigCreekUpper"
[4] "HallowattCreekLower"
[5] "LangfordCreekLower"
[6] "LangfordCreekUpper"
[7] "NicolaCreek"
[8] "SkookoleelCreek"
[9] "WernerCreek"
[10] "CoalCreekHeadwaters"
[11] "CoalCreekLower"
[12] "CoalCreekMiddle"
[13] "CoalCreekNorth"
[14] "CycloneCreekLower"
[15] "CycloneCreekMiddle"
[16] "CycloneCreekUpper"
[17] "McGeeCreekLower"
[18] "McGeeCreekTrib"
[19] "McGeeCreekUpper"
[20] "Avery Brook"
[21] "Jimmy Brook"
[22] "Mitchell Brook"
[23] "Obear Brook Lower"
[24] "Sanderson Brook"
[25] "West Brook 0"
[26] "West Brook Lower"
[27] "West Brook Reservoir"
[28] "West Brook Upper"
[29] "West Whately Brook"
[30] "Donner Blitzen River nr Frenchglen NWIS"
[31] "Donner Blitzen ab Fish NWIS"
[32] "Donner Blitzen ab Indian NWIS"
[33] "Donner Blitzen nr Burnt Car NWIS"
[34] "Fish Creek NWIS"
[35] "Indian Creek NWIS"
[36] "Little Blizten River NWIS"
[37] "Paine Run 01"
[38] "Paine Run 02"
[39] "Paine Run 06"
[40] "Paine Run 07"
[41] "Paine Run 08"
[42] "Paine Run 10"
[43] "Piney River 10"
[44] "Staunton River 02"
[45] "Staunton River 03"
[46] "Staunton River 06"
[47] "Staunton River 07"
[48] "Staunton River 09"
[49] "Staunton River 10"
[50] "EF Duck Creek ab HF"
[51] "EF Duck Creek be HF"
[52] "Henrys Fork"
[53] "Buck Creek"
[54] "Crandall Creek"
[55] "Deep Creek"
[56] "Dugout Creek"
[57] "Dugout Creek NWIS"
[58] "Lodgepole Creek"
[59] "Shields River Valley Ranch"
[60] "Shields River ab Dugout"
[61] "Shields River ab Smith NWIS"
[62] "Grizzly Creek"
[63] "Grouse Creek"
[64] "Leidy Creek Mouth"
[65] "Leidy Creek Mouth NWIS"
[66] "Leidy Creek Upper"
[67] "NF Spread Creek Lower"
[68] "NF Spread Creek Upper"
[69] "Rock Creek"
[70] "SF Spread Creek Lower"
[71] "SF Spread Creek Lower NWIS"
[72] "SF Spread Creek Upper"
[73] "Spread Creek Dam"
7.4.1 Single site
Generate interactive time series plots of PASTA output for a single focal site
Set focal site
Code
focal_site <-"Leidy Creek Mouth NWIS"focal_site
[1] "Leidy Creek Mouth NWIS"
7.4.1.1 Amplitude ratio
Amplitude ratio: amplitude_water - amplitude_air. Generally, lower amplitude ratios (greater decoupling between air and water temperature sinusoids) are indicative of greater fractional groundwater inputs. (In Rey et al. 2024, Ar values tend to be most similar among tributary/mainstem sites during periods of high surface water input. I.e., high flows homogenize spatial variation in groundwater-surface water dynamics).
pasta_derived %>%filter(site_name == focal_site, rSquared_air >=0.7& rSquared_water >=0.7) %>%ggplot(aes(x = yday, y = amplitudeRatio, color = z_flow)) +geom_point() +ggtitle(focal_site) +geom_smooth(method ="lm", formula = y ~poly(x, 2)) +facet_wrap(~year)
Code
pasta_derived %>%filter(site_name == focal_site, rSquared_air >=0.7& rSquared_water >=0.7) %>%ggplot(aes(x = z_flow, y = amplitudeRatio, color =as.factor(year))) +geom_point(alpha=0.3) +ggtitle(focal_site) +geom_smooth(se =FALSE)
7.4.1.2 Phase lag
Phase lag: phase_water - phase_air. Phase lags are likely a result of diel groundwater temperature variations that are subsequently transferred to water temperatures signals. Generally, phase lag tends to be inversely related to amplitude ratio and may provide a secondary metric of fractional groundwater contributions.
pasta_derived %>%filter(site_name == focal_site, rSquared_air >=0.7& rSquared_water >=0.7) %>%ggplot(aes(x = z_flow, y = phaseLag, color =as.factor(year))) +geom_point(alpha=0.3) +ggtitle(focal_site) +geom_smooth(se =FALSE)
7.4.1.3 Mean ratio
Mean ratio: mean_water / mean_air. On daily timescales, mean ratio values indicate runoff/quickflow vs. local groundwater influence, with higher, more stable values at sites where hillslope drainage is reduced compared to total channel water volume (e.g., lower in network). I.e., across sites at any given time, low values indicate high fractional groundwater input and high values indicate low groundwater input. Within a site across time, gradual rise/change in Mr may be driven by changes in groundwater temperature over time as terrestrial hillslopes accumulate (spring and early summer) and/or lose (late summer and autumn) heat. Thereby indicating that groundwater flowpaths are shallow enough to be sensitive to seasonal warming/cooling.
(In Rey et al. 2024, Mr values tend to be most similar among tributary/mainstem sites during periods of high surface water input. I.e., high flows homogenize spatial variation in groundwater-surface water dynamics).
Note this plot is trimmed to values [0,2] as very unreasonable values can result when daily mean air temperature is < 1.
This simple OLS model shows that phase lag is negatively related to both amplitude ratio and flow. I.e., the intercept of the Pl ~ Ar relationship declines with increasing streamflow.
Call:
lm(formula = phaseLag ~ amplitudeRatio * z_flow, data = pasta_derived %>%
filter(site_name == focal_site, rSquared_air >= 0.7 & rSquared_water >=
0.7))
Residuals:
Min 1Q Median 3Q Max
-2.39573 -0.21486 0.02893 0.24727 2.79277
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.91249 0.10844 17.636 < 2e-16 ***
amplitudeRatio -1.92841 0.26245 -7.348 6.72e-13 ***
z_flow -0.60032 0.09622 -6.239 8.39e-10 ***
amplitudeRatio:z_flow 0.34113 0.25958 1.314 0.189
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5195 on 592 degrees of freedom
Multiple R-squared: 0.4858, Adjusted R-squared: 0.4832
F-statistic: 186.4 on 3 and 592 DF, p-value: < 2.2e-16
7.4.1.5 Model predictions
Plot predicted air and stream temperature (orange and blue lines, respectively) over observed air and stream temperature (orange and blue points, respectively).
Code
dat2 %>%select(site_name, datetime, airTemperature, waterTemperature) %>%filter(site_name == focal_site) %>%left_join(preds) %>%filter(site_name == focal_site, rSquared_air >=0.7& rSquared_water >=0.7) %>%select(-c(site_name, rSquared_air, rSquared_water)) %>%mutate(datetime = datetime) %>%# need to fudge datetime b/c dyGraphs converts to local time zonedygraph(main = focal_site) %>%dyRangeSelector() %>%#dyAxis("y", label = "Water temp.") %>% #dyAxis("y2", label = "Air temp.") %>% dySeries("waterTemperature", axis ="y", label ="Obs. water temp.", drawPoints =TRUE, strokeWidth =0, pointSize =3, color ="dodgerblue") %>%dySeries("airTemperature", axis ="y", label ="Obs. air temp.", drawPoints =TRUE, strokeWidth =0, pointSize =3, color ="darkorange") %>%dySeries("predTemp_water", axis ="y", label ="Pred. water temp.", color ="dodgerblue") %>%dySeries("predTemp_air", axis ="y", label ="Pred. air temp.", color ="darkorange")