library(tidyverse)
package 㤼㸱tibble㤼㸲 was built under R version 3.5.2package 㤼㸱tidyr㤼㸲 was built under R version 3.5.2package 㤼㸱dplyr㤼㸲 was built under R version 3.5.2
library(sf)
package 㤼㸱sf㤼㸲 was built under R version 3.5.2
library(dataRetrieval) # WQP web service library
library(XML)
library(leaflet)
library(mapview)
replacing previous import 㤼㸱gdalUtils::gdal_rasterize㤼㸲 by 㤼㸱sf::gdal_rasterize㤼㸲 when loading 㤼㸱mapview㤼㸲
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
NHD_URL <- 'https://hydro.nationalmap.gov/arcgis/services/nhd/MapServer/WMSServer'
find_lake_word <- function(data, search_term, state_abbr = NULL) {
if (!missing(state_abbr)) {
if (!state_abbr %in% state.abb) stop("Use a valid 2-letter state abbreviation")
}
if (missing(state_abbr)) {
filtered_data <- data[apply(data, 1, function(x) any(grepl(search_term, x, ignore.case = TRUE))),]
}
else {
filtered_data <- data[apply(data, 1, function(x) any(grepl(search_term, x, ignore.case = TRUE))) & data$State == state.abb,]
}
return_data <- filtered_data %>%
distinct(lagoslakeid) %>%
inner_join(data, by = "lagoslakeid")
return(return_data)
}
This notebook will show some examples of the complex relationships that can be found in the LAGOS Lake Link (lake identifier crosswalk) as well as examples of problems that could possibly be solved with some more work. For a more detailed introduction to LAGOS Lake Link and a description of the creation process, see the document “LAGOS Lake Link: Creation”
“Lake”: Permanent lake or reservoir.
LAGOS Lake Link is a tabular dataset (a crosswalk table) that can be used to connect one lake-related dataset to another for many common lake datasets. Several lake datasets are in common use on their own or as a base for scientific data products and LAGOS Lake Link is intended to make it easier to combine lake-related data between multiple sources. The table can be searched to find identifiers and location for a particular lake, or it can be used in data join operations to convert identifiers en masse.
data_dictionary <- read_csv("LAGOS_Lake_Link_data_dictionary.csv")
knitr::kable(data_dictionary, "html")
<table>
<thead>
<tr>
<th style="text-align:left;"> Column Name </th>
<th style="text-align:left;"> Column Type </th>
<th style="text-align:left;"> Is Nullable </th>
<th style="text-align:left;"> Definition </th>
<th style="text-align:left;"> Source Dataset </th>
<th style="text-align:left;"> Source Column Name </th>
<th style="text-align:left;"> Example </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> lagoslakeid </td>
<td style="text-align:left;"> int </td>
<td style="text-align:left;"> N </td>
<td style="text-align:left;"> Unique lake identifier developed for LAGOS (US). </td>
<td style="text-align:left;"> LAGOS-US </td>
<td style="text-align:left;"> lagoslakeid </td>
<td style="text-align:left;"> 6340 </td>
</tr>
<tr>
<td style="text-align:left;"> nhdhr_permanentidentifier </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> N </td>
<td style="text-align:left;"> 40-char GUID value that uniquely identifies the occurrence of each feature in The National Map. National Database primary key </td>
<td style="text-align:left;"> NHD-HR </td>
<td style="text-align:left;"> Permanent_Identifier </td>
<td style="text-align:left;"> {9D15C911-618A-4921-9352-18250F36ED65} </td>
</tr>
<tr>
<td style="text-align:left;"> nhdhr_reachcode </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Unique identifier. The first eight digits are the WBD_HUC8. 愼㸰The next six digits are randomly assigned, sequential numbers that are unique within a HUC8. </td>
<td style="text-align:left;"> NHD-HR </td>
<td style="text-align:left;"> ReachCode </td>
<td style="text-align:left;"> 4090000000000 </td>
</tr>
<tr>
<td style="text-align:left;"> nhdhr_areasqkm </td>
<td style="text-align:left;"> decimal </td>
<td style="text-align:left;"> N </td>
<td style="text-align:left;"> Area of areal feature based on Albers Equal Area. </td>
<td style="text-align:left;"> NHD-HR </td>
<td style="text-align:left;"> AreaSqKm </td>
<td style="text-align:left;"> 1.742 </td>
</tr>
<tr>
<td style="text-align:left;"> nhdhr_fdate </td>
<td style="text-align:left;"> datetime </td>
<td style="text-align:left;"> N </td>
<td style="text-align:left;"> Date of last feature modification in National Hydrography Dataset. </td>
<td style="text-align:left;"> NHD-HR </td>
<td style="text-align:left;"> Fdate </td>
<td style="text-align:left;"> 3/16/2004 1:37:00 PM </td>
</tr>
<tr>
<td style="text-align:left;"> gnis_id </td>
<td style="text-align:left;"> int </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Unique identifier assigned by GNIS. </td>
<td style="text-align:left;"> NHD-HR </td>
<td style="text-align:left;"> GNIS_ID </td>
<td style="text-align:left;"> 1616968 </td>
</tr>
<tr>
<td style="text-align:left;"> gnis_name </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Proper name, specific term, or expression by which a particular geographic entity is known. </td>
<td style="text-align:left;"> NHD-HR </td>
<td style="text-align:left;"> GNIS_Name </td>
<td style="text-align:left;"> Zukey Lake </td>
</tr>
<tr>
<td style="text-align:left;"> lagos_lakename </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Alternate (GNIS) names for the lake, separated by semi-colons. </td>
<td style="text-align:left;"> GNIS </td>
<td style="text-align:left;"> GNIS_Name </td>
<td style="text-align:left;"> Zukey Lake; Strawberry Lake </td>
</tr>
<tr>
<td style="text-align:left;"> lagos_countyname </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Current name and the translated legal/statistical area description code for county. </td>
<td style="text-align:left;"> TIGER County </td>
<td style="text-align:left;"> NAMELSAD </td>
<td style="text-align:left;"> Ingham County </td>
</tr>
<tr>
<td style="text-align:left;"> lagos_countyfips </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> 5-digit concatenation of current state FIPS code and county FIPS code. </td>
<td style="text-align:left;"> TIGER County </td>
<td style="text-align:left;"> GEOID </td>
<td style="text-align:left;"> 26065 </td>
</tr>
<tr>
<td style="text-align:left;"> lagos_lakelatitude </td>
<td style="text-align:left;"> decimal </td>
<td style="text-align:left;"> N </td>
<td style="text-align:left;"> The latitude of the lake centroid (NAD83). </td>
<td style="text-align:left;"> LAGOS-US </td>
<td style="text-align:left;"> LatitudeNAD83 </td>
<td style="text-align:left;"> 42.45195437 </td>
</tr>
<tr>
<td style="text-align:left;"> lagos_lakelongitude </td>
<td style="text-align:left;"> decimal </td>
<td style="text-align:left;"> N </td>
<td style="text-align:left;"> The longitude of the lake centroid (NAD83). </td>
<td style="text-align:left;"> LAGOS-US </td>
<td style="text-align:left;"> LongitudeNAD83 </td>
<td style="text-align:left;"> -83.84438734 </td>
</tr>
<tr>
<td style="text-align:left;"> lagos_state </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> N </td>
<td style="text-align:left;"> 2-character state name abbreviation. Border lakes are assigned to a a single state. </td>
<td style="text-align:left;"> TIGER State </td>
<td style="text-align:left;"> STATE </td>
<td style="text-align:left;"> MI </td>
</tr>
<tr>
<td style="text-align:left;"> wqp_monitoringlocationidentifier </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> A designator used to describe the unique name, number, or code assigned to identify the monitoring location. </td>
<td style="text-align:left;"> WQP </td>
<td style="text-align:left;"> MonitoringLocationIdentifier </td>
<td style="text-align:left;"> 21MICH-470570 </td>
</tr>
<tr>
<td style="text-align:left;"> wqp_monitoringlocationname </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> The designator specified by the sampling organization for the site at which sampling or other activities are conducted. </td>
<td style="text-align:left;"> WQP </td>
<td style="text-align:left;"> MonitoringLocationName </td>
<td style="text-align:left;"> STRAWBERRY LAKE NORTHWEST BASIN, HAMBURG TOWNSHIP, SECTION 27 </td>
</tr>
<tr>
<td style="text-align:left;"> wqp_providername </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> The source system that provided data to the Water Quality Portal (NWIS, STORET, STEWARDS, etc). </td>
<td style="text-align:left;"> WQP </td>
<td style="text-align:left;"> ProviderName </td>
<td style="text-align:left;"> STORET </td>
</tr>
<tr>
<td style="text-align:left;"> nhdplusv2_comid </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Common identifier of the NHD Waterbody feature. </td>
<td style="text-align:left;"> NHDPlusv2 </td>
<td style="text-align:left;"> COMID </td>
<td style="text-align:left;"> 13174565 </td>
</tr>
<tr>
<td style="text-align:left;"> nhdplusv2_reachcode </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Reach Code assigned to feature. </td>
<td style="text-align:left;"> NHDPlusv2 </td>
<td style="text-align:left;"> REACHCODE </td>
<td style="text-align:left;"> 4090000000000 </td>
</tr>
<tr>
<td style="text-align:left;"> nhdplusv2_areasqkm </td>
<td style="text-align:left;"> decimal </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Feature area in square kilometers. </td>
<td style="text-align:left;"> NHDPlusV2 </td>
<td style="text-align:left;"> AreaSqKm </td>
<td style="text-align:left;"> 1.593 </td>
</tr>
<tr>
<td style="text-align:left;"> lagosne_lagoslakeid </td>
<td style="text-align:left;"> int </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Unique lake identifier developed for LAGOS (NE). </td>
<td style="text-align:left;"> LAGOS-NE </td>
<td style="text-align:left;"> lagoslakeid </td>
<td style="text-align:left;"> 6340 </td>
</tr>
<tr>
<td style="text-align:left;"> lagosne_legacyid </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Unique lake identifier from the original LAGOS-NE water quality source dataset. This is not standardized and each source dataset has its own system. </td>
<td style="text-align:left;"> LAGOS-NE </td>
<td style="text-align:left;"> legacyid </td>
<td style="text-align:left;"> 210426X </td>
</tr>
<tr>
<td style="text-align:left;"> nla2007_siteid </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Lake site identifier from National Lakes Assessment 2007 survey </td>
<td style="text-align:left;"> NLA 2012 </td>
<td style="text-align:left;"> SITEID_07 </td>
<td style="text-align:left;"> NLA06608-2187 </td>
</tr>
<tr>
<td style="text-align:left;"> nla2012_siteid </td>
<td style="text-align:left;"> char </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Lake site identifier from National Lakes Assessment 2012 survey </td>
<td style="text-align:left;"> NLA 2012 </td>
<td style="text-align:left;"> SITE_ID </td>
<td style="text-align:left;"> NLA12_AL-101 </td>
</tr>
<tr>
<td style="text-align:left;"> count_wqp_per_lagos_id </td>
<td style="text-align:left;"> int </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Count of WQP sites within the LAGOS-US lake. </td>
<td style="text-align:left;"> LAGOS-US </td>
<td style="text-align:left;"> count_wqp_per_lagos_id </td>
<td style="text-align:left;"> 11 </td>
</tr>
<tr>
<td style="text-align:left;"> count_nhdplusv2_per_lagos_id </td>
<td style="text-align:left;"> int </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Count of NHDPlusV2 polygons matching the LAGOS-US lagoslakeid. </td>
<td style="text-align:left;"> LAGOS-US </td>
<td style="text-align:left;"> count_nhdplusv2_per_lagos_id </td>
<td style="text-align:left;"> 1 </td>
</tr>
<tr>
<td style="text-align:left;"> count_lagos_per_nhdplusv2_id </td>
<td style="text-align:left;"> int </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Count of LAGOS-US polygons matching the NHDPlusV2 COMID. </td>
<td style="text-align:left;"> LAGOS-US </td>
<td style="text-align:left;"> count_lagos_per_nhdplusv2_id </td>
<td style="text-align:left;"> 2 </td>
</tr>
<tr>
<td style="text-align:left;"> count_lagosne_per_lagos_id </td>
<td style="text-align:left;"> int </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Count of LAGOS-NE lagoslakeids matching the LAGOS-US lagoslakeid. </td>
<td style="text-align:left;"> LAGOS-US </td>
<td style="text-align:left;"> count_lagosNE_per_lagos_id </td>
<td style="text-align:left;"> 1 </td>
</tr>
<tr>
<td style="text-align:left;"> count_lagos_per_lagosne_id </td>
<td style="text-align:left;"> int </td>
<td style="text-align:left;"> Y </td>
<td style="text-align:left;"> Count of LAGOS-US lagoslakeids matching the LAGOS-NE lagoslakeid. </td>
<td style="text-align:left;"> LAGOS-US </td>
<td style="text-align:left;"> count_lagos_per_lagosNE_id </td>
<td style="text-align:left;"> 2 </td>
</tr>
</tbody>
</table>
Preview of the crosswalk table. As a reminder, there are 479950 LAGOS-US lakes.
glimpse(xwalk, width = 100)
Observations: 594,622
Variables: 28
$ lagoslakeid [3m[38;5;246m<int>[39m[23m 31198, 31198, 31198, 31198, 31198, 31198, 122, 122, 12...
$ nhdhr_permanentidentifier [3m[38;5;246m<chr>[39m[23m "120020498", "120020498", "120020498", "120020498", "1...
$ nhdhr_reachcode [3m[38;5;246m<chr>[39m[23m "04010201003330", "04010201003330", "04010201003330", ...
$ nhdhr_areasqkm [3m[38;5;246m<dbl>[39m[23m 2.745000, 2.745000, 2.745000, 2.745000, 2.745000, 2.74...
$ nhdhr_fdate [3m[38;5;246m<dttm>[39m[23m 2004-03-16 18:37:00, 2004-03-16 18:37:00, 2004-03-16 ...
$ gnis_id [3m[38;5;246m<int>[39m[23m 649289, 649289, 649289, 649289, 649289, 649289, 653169...
$ gnis_name [3m[38;5;246m<chr>[39m[23m "Perch Lake", "Perch Lake", "Perch Lake", "Perch Lake"...
$ lagos_lakename [3m[38;5;246m<chr>[39m[23m "Perch Lake", "Perch Lake", "Perch Lake", "Perch Lake"...
$ lagos_countyname [3m[38;5;246m<chr>[39m[23m "Carlton County", "Carlton County", "Carlton County", ...
$ lagos_countyfips [3m[38;5;246m<int>[39m[23m 27017, 27017, 27017, 27017, 27017, 27017, 27017, 27017...
$ lagos_lakelatitude [3m[38;5;246m<dbl>[39m[23m 46.68943, 46.68943, 46.68943, 46.68943, 46.68943, 46.6...
$ lagos_lakelongitude [3m[38;5;246m<dbl>[39m[23m -92.67066, -92.67066, -92.67066, -92.67066, -92.67066,...
$ lagos_state [3m[38;5;246m<chr>[39m[23m "MN", "MN", "MN", "MN", "MN", "MN", "MN", "MN", "MN", ...
$ wqp_monitoringlocationidentifier [3m[38;5;246m<chr>[39m[23m "FONDULAC-114A", "FONDULAC-114B", "FONDULAC_WQX-114A",...
$ wqp_monitoringlocationname [3m[38;5;246m<chr>[39m[23m "Perch Lake (North Basin)", "Perch Lake (South Basin)"...
$ wqp_providername [3m[38;5;246m<chr>[39m[23m "STORET", "STORET", "STORET", "STORET", "STORET", "STO...
$ nhdplusv2_comid [3m[38;5;246m<chr>[39m[23m "1776072", "1776072", "1776072", "1776072", "1776072",...
$ nhdplusv2_reachcode [3m[38;5;246m<chr>[39m[23m "04010201003330", "04010201003330", "04010201003330", ...
$ nhdplusv2_areasqkm [3m[38;5;246m<dbl>[39m[23m 2.720, 2.720, 2.720, 2.720, 2.720, 2.720, 1.348, 1.348...
$ lagosne_lagoslakeid [3m[38;5;246m<int>[39m[23m 31198, 31198, 31198, 31198, 31198, 31198, 122, 122, 12...
$ lagosne_legacyid [3m[38;5;246m<chr>[39m[23m "09-0036-00-201", "09-0036-00-201", "09-0036-00-201", ...
$ nla2007_siteid [3m[38;5;246m<chr>[39m[23m NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ nla2012_siteid [3m[38;5;246m<chr>[39m[23m NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ count_wqp_per_lagos_id [3m[38;5;246m<int>[39m[23m 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 0, 0, 0, 0, 4, ...
$ count_nhdplusv2_per_lagos_id [3m[38;5;246m<int>[39m[23m 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 0, 1, 2, ...
$ count_lagos_per_nhdplusv2_id [3m[38;5;246m<int>[39m[23m 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, 1, 1,...
$ count_lagosne_per_lagos_id [3m[38;5;246m<int>[39m[23m 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ count_lagos_per_lagosne_id [3m[38;5;246m<int>[39m[23m 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
A large proportion of lakes connect easily and don’t split or merge polygons over time. The crosswalk has all lakes greater than or equal to 1 hectare, but some of those are too small to be in the NHDPlusv2, so let’s only check lakes greater than or equal to 4 hectares.
lagos_lake_count_4ha <- xwalk %>%
filter(nhdhr_areasqkm >= 0.04) %>%
distinct(lagoslakeid) %>%
nrow()
count_easy <- xwalk %>%
filter(nhdhr_areasqkm >= 0.04) %>%
filter(count_lagos_per_lagosne_id <= 1 & # LAGOS-NE only has partial coverage so 0 is just fine
count_lagos_per_nhdplusv2_id == 1 &
count_lagosne_per_lagos_id == 1 &
count_nhdplusv2_per_lagos_id == 1) %>%
distinct(lagoslakeid) %>%
nrow()
pct1 <- round(100*count_easy/lagos_lake_count_4ha, 0)
30% of lakes are very simple! They may have multiple WQP sites, which is a bit more intuitive than the other 1: many relationships.
Here is an example lake. Bask in its comforting consistency, because things are about to get more complex.
xwalk %>% filter(lagoslakeid == 1)
Equally simple, a large proportion of lakes have no matches for anything. Here is an example:
xwalk %>% filter(lagoslakeid == 256832)
(n_lagos_wqp <- xwalk %>%
mutate(Number_of_wqp_sites_per_lagos_lake = case_when(count_wqp_per_lagos_id == 0 ~ '0', count_wqp_per_lagos_id == 1 ~ '1', count_wqp_per_lagos_id > 1 ~ '>1')) %>%
distinct(Number_of_wqp_sites_per_lagos_lake, lagoslakeid) %>%
count(Number_of_wqp_sites_per_lagos_lake)
)
Out of the lakes that have a WQP sampling site, about half of them have more than one site within. In total, 29475 LAGOS lakes have WQP sites linked. The initial release of LAGOS-US will increase the number of links available, due to the inclusion of a manual linking process designed to review sampling location information.
This is what a user will see in the crosswalk that should indicate to them that multiple WQP sites are found within a lake. This lake has 10 sites. The lake information is duplicated for every site (i.e., the crosswalk table is not normalized).
(ex1 <- xwalk %>% filter(lagoslakeid == 3476))
If the lake also joins to multiple other lakes in either the NHDPlusv2 or LAGOS-NE, the user will see something like this. This site has 7 WQP sites and 2 NHDPlusv2 polygons for a total of 14 rows in the crosswalk. It’s not possible (currently) to tell which WQP sites go with which NHDPlusv2 polygon. Now both lake and WQP information are duplicated.
(ex2 <- xwalk %>% filter(lagoslakeid == 4069))
load('./rdata/wqp_sf.RData')
wqp_no_link <- wqp_sf %>%
anti_join(distinct(xwalk, wqp_monitoringlocationidentifier), by = c("MonitoringLocationIdentifier" = "wqp_monitoringlocationidentifier"))
28786 out of 131005 sites did not link (22%).
As you can see, Great Lakes sites don’t always have the “Great Lake” designation in the WQP and we are fine with these sites not linking.
wqp_no_link %>%
filter(startsWith(HUCEightDigitCode,'0406')) %>%
select(MonitoringLocationIdentifier, MonitoringLocationName, MonitoringLocationDescriptionText, MonitoringLocationTypeName) %>%
mapview()
I would like to ask the limnologists: What do you make of these samples clustered around large lakes? The point is on the intended site as you can tell by looking at where the indicated township section would be. What are they sampling here? Should it be linked along with the other lake water quality samples?
We could improve some of the WQP matches by doing a “closest” join and enforcing a requirement for shared words.
NHDPlusv2 within LAGOS Lake:
(count_lagos_nhd <- xwalk %>%
mutate(number_of_nhdplusv2_per_lagos_lake = case_when(count_nhdplusv2_per_lagos_id == 0 ~ '0', count_nhdplusv2_per_lagos_id == 1 ~ '1', count_nhdplusv2_per_lagos_id > 1 ~ '>1')) %>%
distinct(number_of_nhdplusv2_per_lagos_lake, lagoslakeid) %>%
count(number_of_nhdplusv2_per_lagos_lake)
)
And vice versa:
(count_nhd_lagos <- xwalk %>%
mutate(Number_of_LAGOS_Lakes_per_NHDPlusv2 = case_when(count_lagos_per_nhdplusv2_id == 0 ~ '0', count_lagos_per_nhdplusv2_id == 1 ~ '1', count_lagos_per_nhdplusv2_id > 1 ~ '>1')) %>%
distinct(Number_of_LAGOS_Lakes_per_NHDPlusv2, lagoslakeid) %>%
count(Number_of_LAGOS_Lakes_per_NHDPlusv2) %>%
filter(!is.na(Number_of_LAGOS_Lakes_per_NHDPlusv2))
)
Note that “0” doesn’t appear in the table because any NHDPlusv2 lake that appears in the table must be linked to at least one LAGOS-US lake, by definition of the table. It seems it’s more common for NHDPlusv2 lakes to split into multiple LAGOS-US lakes than the other way around, but both occur. Occurs rarely: less than 1% of lakes.
A large proportion of LAGOS lakes have no NHDPlusv2 lake connection, but I don’t actually expect that the smallest lakes would be found in the NHD medium-resolution. Check it out using a size cutoff of 10 hectares, which is more than large enough to appear in both.
(count_lagos_nhd_4ha <- xwalk %>%
filter(nhdhr_areasqkm >= 0.1) %>%
mutate(Number_of_NHDPlusv2_per_LAGOS_Lake_over10ha = case_when(count_nhdplusv2_per_lagos_id == 0 ~ '0', count_nhdplusv2_per_lagos_id == 1 ~ '1', count_nhdplusv2_per_lagos_id > 1 ~ '>1')) %>%
distinct(Number_of_NHDPlusv2_per_LAGOS_Lake_over10ha, lagoslakeid) %>%
count(Number_of_NHDPlusv2_per_LAGOS_Lake_over10ha)
)
Still, over 1/5 of 10-hectare lakes have no connection. We decreased the size of this gap by about 5 percentage points (nearly 20,000 lakes) by including a spatial join in our methodology. However, these two NHD products include features mapped at very different times, and visual inspection of the remaining unlinked features reveals several common explanations.
The figures are slightly better when we look only at all lakes with a WQP site.
xwalk %>%
filter(count_wqp_per_lagos_id > 0) %>%
mutate(Number_of_NHDPlusv2_per_LAGOS_Lake_withWQP = case_when(count_nhdplusv2_per_lagos_id == 0 ~ '0', count_nhdplusv2_per_lagos_id == 1 ~ '1', count_nhdplusv2_per_lagos_id > 1 ~ '>1')) %>%
distinct(Number_of_NHDPlusv2_per_LAGOS_Lake_withWQP, lagoslakeid) %>%
count(Number_of_NHDPlusv2_per_LAGOS_Lake_withWQP) %>%
mutate(pct = round(100*n/sum(n), 2))
This is what a user will see in the crosswalk that should indicate to them that multiple NHDPlusv2 lakes have been condensed to a single LAGOS-US lake. This lake has 4 NHDPlusv2 connections and 2 WQP sites, so 8 rows appear.
xwalk %>% filter(count_nhdplusv2_per_lagos_id > 3 & count_nhdplusv2_per_lagos_id <5) %>% sample_n(10)
xwalk %>% filter(lagoslakeid == 2107)
This is what a user will see in the crosswalk that should indicate to them that 1 NHDPlusv2 lake has been split into multiple LAGOS-US lakes. These 4 lagos lakes used to be 1 NHDPlusv2 lake. 1 of them has 4 WQP sites, while the others have none, resulting in 7 rows appearing. Searching by lagoslakeid alone will not indicate the related lakes, of course.
xwalk %>% filter(nhdplusv2_comid == '18469416')
The relationships seen here bear a strong resemblence to those seen with the NHDPlusv2, above. This table compares only lakes for the 17-state LAGOS-NE region.
lagos_ne_states <- c('ME', 'NH', 'VT', 'CT', 'RI', 'NY', 'PA', 'NY', 'OH', 'NJ', 'IN', 'IL', 'MI', 'WI', 'IA', 'MN', 'MO')
(count_ne_per_us <- xwalk %>%
filter(lagos_state %in% lagos_ne_states) %>%
mutate(Number_of_NE_per_US_Lake = case_when(count_lagosne_per_lagos_id == 0 ~ '0', count_lagosne_per_lagos_id == 1 ~ '1', count_lagosne_per_lagos_id > 1 ~ '>1')) %>%
distinct(Number_of_NE_per_US_Lake, lagoslakeid) %>%
count(Number_of_NE_per_US_Lake)
)
And vice versa:
(count_us_per_ne <- xwalk %>%
filter(lagos_state %in% lagos_ne_states) %>%
mutate(Number_of_US_per_NE_Lake = case_when(count_lagos_per_lagosne_id == 0 ~ '0', count_lagos_per_lagosne_id == 1 ~ '1', count_lagos_per_lagosne_id > 1 ~ '>1')) %>%
distinct(Number_of_US_per_NE_Lake, lagoslakeid) %>%
count(Number_of_US_per_NE_Lake) %>%
filter(!is.na(Number_of_US_per_NE_Lake))
)
Again, “0” does not appear on the left of this table, by definition.
Here’s what happens when not only is a LAGOS-US lake associated with multiple LAGOS-NE lake polygons, but also when it has multiples of everything. This lake has 2 WQP sites, was once 4 LAGOS-NE lakes, and is associated with 5 NHDPlusv2 lakes (3 through ReachCode, and 2 via spatial join). It’s not possibly to untangle which sample sites are where or how the LAGOS-NE lakes might relate to the NHDPlusv2 lakes, currently. 40 rows appear for this lake.
xwalk %>% filter(lagoslakeid == 268427)
It is possible to use names to search through all available information in the crosswalk if you build the right sort of function. I have done so in the setup code at the top of this document. Here are the results for the search term ‘Cochituate’. This is a chain of lakes known as Lake Cochituate. Even though the NHD doesn’t identify all of them by name, the linked WQP sites indicate the association.
find_lake_word(xwalk, 'Cochituate')
This chunk identifies lakes with a missing NHDPlusv2 connection.
missing_nhdplusV2 <- xwalk %>%
filter(nhdhr_areasqkm >= 0.1) %>%
filter(count_nhdplusv2_per_lagos_id < 1) %>%
filter(lagoslakeid < 5000) %>%
select(-wqp_monitoringlocationidentifier, -wqp_monitoringlocationname, -wqp_providername) %>%
distinct()
And with the help of ArcMap to find equivalent lakes visually, this bit of code helps search for the cause of the missing link.
load('./rdata/nhd_xref.RData')
# slow but effective
search_nhd_xref <- function(codes, depth = 1) {
list_results <- lapply(codes, function(x) filter(nhd_xref, OldReachCode == x | NewReachCode == x))
df1 <- do.call("rbind", list_results)
if (depth == 2) {
codes = c(pull(df1, OldReachCode), pull(df1, NewReachCode))
list_results2 <- lapply(codes, function(x) filter(nhd_xref, OldReachCode == x | NewReachCode == x))
df2 <- do.call("rbind", list_results)
} else {
df2 <- df1
}
return(df2)
}
(result <- search_nhd_xref(c('04050001011691','04050001016492'), depth = 2) %>%
mutate(OldReachCode = as.character(OldReachCode), NewReachCode = as.character(NewReachCode)))
OldReachCode OldReachDate NewReachCode NewReachDate
1 <NA> <NA> 04050001011691 2007-03-23
# need to have nhd_plus_orig object loaded
#nhd_plus_orig %>% filter(REACHCODE %in% pull(result, OldReachCode))