Channel Measurements and more #865
Conversation
… we still want to convert those types.
…take these functions out.
jzemmels left a comment
These changes look great! I'm excited about the new metadata endpoints and continuous vignette.
I have a few thoughts but didn't run into any major issues, so I'll approve the PR.
```r
#'
#' groundwater <- read_waterdata_field_meta(monitoring_location_id = "USGS-375907091432201")
#'
#' gwl_data <- read_waterdata_field_meta(monitoring_location_id = "USGS-02238500",
```
Does gwl stand for groundwater level? If yes, I don't think the data pulled here are groundwater.
```r
httr2::req_url_path_append("samples-data") |>
  httr2::req_url_query(mimeType = "text/csv")

token <- Sys.getenv("API_USGS_PAT")
```
Does samples-data use API tokens too?
What we're just learning is YES! I was working with a developer, and it seems like the API token can be used throughout the api.gov universe. It has different limits than the OGC APIs, but it's still useful.
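For reference, a minimal sketch of how a token could be attached to a samples-data request with `httr2`. The `X-Api-Key` header name and the base URL here are assumptions for illustration (based on general api.data.gov conventions), not confirmed from this PR:

```r
library(httr2)

# Build (but don't perform) a samples-data request; attach the api.gov
# token only when API_USGS_PAT is set. "X-Api-Key" is an assumed header
# name, not confirmed from this PR.
req <- request("https://api.waterdata.usgs.gov") |>
  req_url_path_append("samples-data") |>
  req_url_query(mimeType = "text/csv")

token <- Sys.getenv("API_USGS_PAT")
if (nzchar(token)) {
  req <- req_headers(req, `X-Api-Key` = token)
}
```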
```r
httr2::req_headers(`Accept-Encoding` = "gzip") |>
  httr2::req_timeout(seconds = 180) |>
  httr2::req_url_path_append(paste0("v", version)) |>
  httr2::req_url_path_append(getOption("dataRetrieval.api_version_stat")) |>
  httr2::req_url_path_append(paste0("observation", service))

token <- Sys.getenv("API_USGS_PAT")
```
Is this to future-proof these functions if they eventually use API request limits? Would this header field be ignored currently?
The idea is: there will probably be a time when both v0 and v1 exist (or more). Right now the user can't pass a version into the top-level functions, so they couldn't compare the different versions. BUT, by putting it in the package options, there is a mechanism to change the version (without needing a "version" argument passed around all over the place). So if v1 came out tomorrow, users could theoretically run:

```r
options("dataRetrieval.api_version_stat" = "v1")
x1 <- read_waterdata_stats_por(
  monitoring_location_id = c("USGS-02319394", "USGS-02171500")
)
```

Requesting:

```
https://api.waterdata.usgs.gov/statistics/v1/observationNormals?monitoring_location_id=USGS-02319394&monitoring_location_id=USGS-02171500
```

Since there is no v1, that results in:

```
Error in `req_perform()`:
! HTTP 404 Not Found.
```

So set it back and it works again:

```r
options("dataRetrieval.api_version_stat" = "v0")
x1 <- read_waterdata_stats_por(
  monitoring_location_id = c("USGS-02319394", "USGS-02171500")
)
```

Since updates to versions are usually pretty infrequent, and most users won't know/care 99% of the time, this seems like a great place for options.
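The fallback behavior described above can be sketched with a small hypothetical helper. `stat_api_version()` is not a real dataRetrieval function, just an illustration of the `getOption()` pattern, and "v0" matches the current default named in this thread:

```r
# Read the statistics API version from options, falling back to "v0".
stat_api_version <- function() {
  getOption("dataRetrieval.api_version_stat", default = "v0")
}

stat_api_version()   # "v0" when the option is unset
options("dataRetrieval.api_version_stat" = "v1")
stat_api_version()   # "v1" after overriding
options("dataRetrieval.api_version_stat" = NULL)  # restore the default
```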
"If they eventually use API request limits": what I was recently told is that all api.gov endpoints do use a form of per-token request limiting, so this should be good to go. It's just that in the OGC APIs the limits are low enough that users need a token much sooner than with the other services (where the request limits are set by the api.gov rules).
There is an increasing amount of continuous data available from the USGS. Continuous data are collected via automated sensors installed at a monitoring location. They are collected at a high frequency and often at a fixed 15-minute interval. Depending on the specific monitoring location, the data may be transmitted automatically via telemetry and be available on WDFN within minutes of collection, while other times the delivery of data may be delayed if the monitoring location does not have the capacity to automatically transmit data. Continuous data are described by parameter name and parameter code (pcode). These data might also be referred to as "instantaneous values" or "IV".
Is uv ever used anymore for continuous data?
Not to my knowledge. This text comes directly from the continuous schema:

```r
dataRetrieval:::get_description("continuous")
```
That's all fine and good if everything works perfectly. What if something goes wrong in the middle of the pull? You could put some `tryCatch` statements in the above code, post-process out what was missed, and re-request the missing data...OR...consider using a `targets` pipeline to take care of all of that!
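To make the `targets` suggestion concrete, here is a rough `_targets.R` sketch, not the vignette's actual pipeline. `get_one_site()` is a hypothetical wrapper around whatever dataRetrieval call is being made; the `error = "null"` plus dynamic-branching combination is what lets `tar_make()` re-run only the sites that failed:

```r
# _targets.R (sketch under assumptions; get_one_site() is made up)
library(targets)

# Hypothetical per-site download wrapper; swap in the real call, e.g.
# dataRetrieval::read_waterdata_daily(monitoring_location_id = site).
get_one_site <- function(site) {
  site
}

list(
  tar_target(sites, c("USGS-02319394", "USGS-02171500")),
  # One branch per site; a failed branch stores NULL instead of
  # stopping the pipeline, and only failed/changed branches re-run.
  tar_target(raw, get_one_site(sites), pattern = map(sites), error = "null"),
  tar_target(all_data, raw)
)
```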
```r
library(future.apply)
plan(multisession)
```
I think Macs can use multicore also due to their Unix-based architecture, which can be more efficient than multisession. Might be worth mentioning here, or linking to the future/related documentation.
So, like this?

```r
library(future.apply)
plan(multicore)
```

Or something else? I know when I was trying to write parallel docs for EGRET there were some hiccups in getting the Macs to work right (mostly because I don't have access).
Yep, that is what a Mac user could run. I don't know if it's absolutely necessary to address here, but you might link to the future documentation, similar to how you've linked to the targets documentation.
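If it helps, one hedged way to handle this in the vignette is to let the `parallelly` helper (a dependency of `future`) decide whether forked workers are safe, rather than hard-coding an OS check:

```r
library(future)

# supportsMulticore() is TRUE on Unix-alikes (macOS, Linux) outside
# environments like RStudio, where forked multicore workers are safe
# and cheaper than multisession's background R sessions.
if (parallelly::supportsMulticore()) {
  plan(multicore)
} else {
  plan(multisession)
}
```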
```r
tar_load(all_data)
```
## Run in parallel
There was a problem hiding this comment.
Great addition! This will come in handy for lots of technical users.
As mentioned above, if you are running on a fairly standard laptop, feel free to make requests in parallel. However, please don't run queries in parallel on a supercomputer or HPC-type environment; your requests will be stopped/killed. There may be techniques to avoid overwhelming the system; contact comptools@usgs.gov if you need help figuring that out.
This maybe doesn't require clarification here, but can the API automatically detect whether a request originates from an HPC environment? Or is it just assumed that there would be too many simultaneous requests originating from the associated IP address?
I'm honestly not sure what the fix would be here. Maybe adding a randomized Sys.sleep() after each call?
I think that's probably the idea (a randomized Sys.sleep). I'm guessing they've got some automated way to detect if an IP or API token is making >X simultaneous requests and kill it (the text in the vignette comes directly from the API developers).
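A rough sketch of the randomized-delay idea. `polite_pull()` is a made-up helper name, and the 0.5-2 second bounds are arbitrary, not a documented api.gov requirement:

```r
# Pull sites one at a time with a jittered pause between requests,
# so a loop doesn't look like a burst of simultaneous calls.
polite_pull <- function(sites, pull_fun) {
  results <- vector("list", length(sites))
  names(results) <- sites
  for (site in sites) {
    results[[site]] <- pull_fun(site)
    Sys.sleep(runif(1, min = 0.5, max = 2))  # randomized delay
  }
  results
}
```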
vignettes/Status.Rmd (Outdated)
```r
"readNWISpCode",
"readNWISgwl",
"readNWISmeas",
"readNWISgwl (deprecated)",
```
Should these still be exported in the NAMESPACE if the associated services are completely offline?
They aren't in the NAMESPACE anymore. I'm switching it to "defunct" to match the Python docs.
Okay, I must have messed up loading the package locally, because I still saw it in the tab-complete. I can confirm they're gone.
Forgot to add that I ran each of the new functions a couple of times to verify the output. I also re-ran the unit tests locally and everything passed.
This PR adds:
and read_waterdata_channel