Channel Measurements and more #865

Merged

ldecicco-USGS merged 23 commits into DOI-USGS:develop from ldecicco-USGS:develop on Mar 2, 2026
Conversation

@ldecicco-USGS (Collaborator) commented Feb 24, 2026

This PR adds:

  • Improved error handling in WQP functions
  • New functions: read_waterdata_field_meta, read_waterdata_combine_meta,
    and read_waterdata_channel
  • Removal of readNWISgwl and readNWISmeas, as the underlying services have been turned off

@jzemmels (Collaborator) left a comment:

These changes look great! I'm excited about the new metadata endpoints and continuous vignette.

I have a few thoughts but didn't run into any major issues, so I'll approve the PR.

```r
#'
#' groundwater <- read_waterdata_field_meta(monitoring_location_id = "USGS-375907091432201")
#'
#' gwl_data <- read_waterdata_field_meta(monitoring_location_id = "USGS-02238500",
```
Collaborator:

Does gwl stand for groundwater level? If yes, I don't think the data pulled here are groundwater.

```r
  httr2::req_url_path_append("samples-data") |>
  httr2::req_url_query(mimeType = "text/csv")

token <- Sys.getenv("API_USGS_PAT")
```

Collaborator:

Does samples-data use API tokens too?

Collaborator Author:

What we're just learning is: YES! I was working with a developer, and it seems the API token can be used throughout the api.gov universe. The limits are different from the OGC APIs, but it's still useful.
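
For context, a hedged sketch of how a token from the environment might be attached to a samples-data request. The `X-Api-Key` header name follows the common api.gov convention and is an assumption here; confirm against the package internals before relying on it:

```r
library(httr2)

token <- Sys.getenv("API_USGS_PAT")

req <- request("https://api.waterdata.usgs.gov") |>
  req_url_path_append("samples-data") |>
  req_url_query(mimeType = "text/csv")

# Only attach the header when a token is actually set;
# unauthenticated requests still work, just with lower limits
if (nzchar(token)) {
  req <- req |> req_headers(`X-Api-Key` = token)
}
```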

```r
  httr2::req_headers(`Accept-Encoding` = "gzip") |>
  httr2::req_timeout(seconds = 180) |>
  httr2::req_url_path_append(paste0("v", version)) |>
  httr2::req_url_path_append(getOption("dataRetrieval.api_version_stat")) |>
```

Collaborator:

Good idea!

```r
  httr2::req_url_path_append(getOption("dataRetrieval.api_version_stat")) |>
  httr2::req_url_path_append(paste0("observation", service))

token <- Sys.getenv("API_USGS_PAT")
```

Collaborator:

Is this to future-proof these functions if they eventually use API request limits? Would this header field be ignored currently?

Collaborator Author:

The idea is: there will probably be a time when both v0 and v1 exist (or more). Right now the user can't pass in the default value in the top level functions, so they couldn't compare the different versions. BUT, by putting it in the package options, there is a mechanism to change the version (without needing to have a "version" argument get passed around all over the place). So if v1 came out tomorrow, users could theoretically run:

```r
options("dataRetrieval.api_version_stat" = "v1")
x1 <- read_waterdata_stats_por(
  monitoring_location_id = c("USGS-02319394", "USGS-02171500")
)
```

Requesting:

```
https://api.waterdata.usgs.gov/statistics/v1/observationNormals?monitoring_location_id=USGS-02319394&monitoring_location_id=USGS-02171500
```

Since there is no v1, that results in:

```
Error in `req_perform()`:
! HTTP 404 Not Found.
```

so set it back and it works again:

```r
options("dataRetrieval.api_version_stat" = "v0")
x1 <- read_waterdata_stats_por(
  monitoring_location_id = c("USGS-02319394", "USGS-02171500")
)
```

Since updates to versions are usually pretty infrequent, and most users won't know or care 99% of the time, this seems like a great place for an option.
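
To sketch the mechanism: the request builder reads the option at call time, so flipping it takes effect on the next request. The `"v0"` default shown here is an assumption; the real default lives in the package's option setup:

```r
# Whatever the option is currently set to is picked up when the URL is built;
# "v0" is assumed as the fallback when the option is unset
version <- getOption("dataRetrieval.api_version_stat", default = "v0")
base_url <- paste0("https://api.waterdata.usgs.gov/statistics/", version)
```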

Collaborator Author:

"if they eventually use API request limits": what I was recently told is that all api.gov endpoints do use a form of limiting requests per token, so this should be good to go. It's just that in the OGC APIs the limits are low enough that users need a token much sooner than with the other services (where the request limits are set by the api.gov rules).



There is an increasing amount of continuous data available from the USGS. Continuous data are collected via automated sensors installed at a monitoring location. They are collected at a high frequency and often at a fixed 15-minute interval. Depending on the specific monitoring location, the data may be transmitted automatically via telemetry and be available on WDFN within minutes of collection, while other times the delivery of data may be delayed if the monitoring location does not have the capacity to automatically transmit data. Continuous data are described by parameter name and parameter code (pcode). These data might also be referred to as "instantaneous values" or "IV".
Collaborator:

Is uv ever used anymore for continuous data?

Collaborator Author:

not to my knowledge. This text comes directly from the continuous schema:

```r
dataRetrieval:::get_description("continuous")
```

That's all fine and good if everything works perfectly. What if something goes wrong in the middle of the pull? You could put some `tryCatch` statements in the above code, post process out what was missed, and re-request the missing data...OR...consider using a `targets` pipeline to take care of all of that!
Collaborator:

Or the retry package!
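
For illustration, the `tryCatch` idea from the excerpt might look like this minimal sketch. The retry count, backoff scheme, and the `read_waterdata_daily` call are illustrative assumptions, not the vignette's actual code:

```r
library(dataRetrieval)

# Retry a single request up to `attempts` times, backing off between tries
pull_with_retry <- function(site, attempts = 3) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(
      read_waterdata_daily(monitoring_location_id = site),
      error = function(e) NULL
    )
    if (!is.null(result)) {
      return(result)
    }
    Sys.sleep(2^i)  # wait longer after each failure
  }
  warning("All attempts failed for ", site)
  NULL
}
```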


```{r}
library(future.apply)
plan(multisession)
```
Collaborator:

I think Macs can use multicore also due to their Unix-based architecture, which can be more efficient than multisession. Might be worth mentioning here, or linking to the future/related documentation.

Collaborator Author:

so like this?

```r
library(future.apply)
plan(multicore)
```

or something else? I know when I was trying to write parallel docs for EGRET there were some hiccups in getting the Macs to work right (mostly because I don't have access)

Collaborator:

Yep, that is what a Mac user could run. I don't know if it's absolutely necessary to address here, but you might link to the future documentation, similar to how you've linked to the targets documentation.
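
If it helps, a hedged sketch of picking the plan per platform, using `future::supportsMulticore()` (which reports whether forked processing is available, e.g. on Unix-alikes outside RStudio):

```r
library(future.apply)

# Use forked workers where supported (often more efficient on Linux/macOS);
# fall back to separate background R sessions everywhere else
if (future::supportsMulticore()) {
  plan(multicore)
} else {
  plan(multisession)
}
```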

```{r}
tar_load(all_data)
```

## Run in parallel
Collaborator:

Great addition! This will come in handy for lots of technical users.


## Run in parallel

As mentioned above, if you are running on a fairly standard laptop, feel free to make requests in parallel. However, please don't run queries in parallel in a supercomputer or HPC-type environment; your requests will be stopped or killed. There may be techniques to avoid overwhelming the system; contact comptools@usgs.gov if you need help figuring that out.
Collaborator:

This maybe doesn't require clarification here, but can the API automatically detect whether a request originates from an HPC environment? Or is it just assumed that there would be too many simultaneous requests originating from the associated IP address?

I'm honestly not sure what the fix would be here. Maybe adding a randomized Sys.sleep() after each call?

Collaborator Author:

I think that's probably the idea (a randomized Sys.sleep). I'm guessing they've got some automated way to detect if an IP or API token is making >X simultaneous requests and kill it (the text in the vignette comes directly from the API developers).
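
A randomized pause between sequential requests might look like this sketch. The `sites` vector, the 1-3 second range, and the `read_waterdata_daily` call are illustrative assumptions:

```r
library(dataRetrieval)

sites <- c("USGS-02319394", "USGS-02171500")  # example site IDs from this thread
data_list <- list()
for (site in sites) {
  data_list[[site]] <- read_waterdata_daily(monitoring_location_id = site)
  Sys.sleep(runif(1, min = 1, max = 3))  # random 1-3 second pause between calls
}
```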

```r
"readNWISpCode",
"readNWISgwl",
"readNWISmeas",
"readNWISgwl (deprecated)",
```

Collaborator:

Should these still be exported in the NAMESPACE if the associated services are completely offline?

Collaborator Author:

they aren't in the NAMESPACE anymore. I'm switching it to "defunct" to match the python docs.

Collaborator:

Okay, I must have messed up loading the package locally, because I still saw them in tab-complete. I can confirm they're gone.

@jzemmels (Collaborator):

Forgot to add that I ran each of the new functions a couple times to verify the output. I also re-ran the unit tests locally and everything passed.

@ldecicco-USGS ldecicco-USGS merged commit df7edad into DOI-USGS:develop Mar 2, 2026
1 check passed
