Package 'appc'

Title: Air Pollution Predictor Commons
Description: Functions for geomarker assessment for s2 locations and dates. These are used to train models for, and generate predictions of, daily ambient air pollution concentrations across the contiguous US from January 2017 through October 2025.
Authors: Cole Brokamp [aut, cre] (ORCID: <https://orcid.org/0000-0002-0289-3151>), Erika Manning [aut], Qing Duan [aut]
Maintainer: Cole Brokamp <[email protected]>
License: MIT + file LICENSE
Version: 1.0.2
Built: 2026-05-26 06:18:40 UTC
Source: https://github.com/geomarker-io/appc

Help Index


Delete all installed data files in the user's data directory for the appc package

Description

Delete all installed data files in the user's data directory for the appc package.

Usage

appc_clean_data_directory()

Assemble a tibble of required predictors for the exposure assessment model

Description

Assemble a tibble of required predictors for the exposure assessment model

Usage

assemble_predictors(x, dates, pollutant = c("pm25"))

Arguments

x

a vector of s2 cell identifiers (s2_cell object); currently required to be within the contiguous United States

dates

a list of date vectors for the predictions, must be the same length as x

pollutant

ignored now, but reserved for future sets of predictors specific to different pollutants

Value

a tibble with one row for each unique s2 location - date combination where columns are predictors required for the exposure assessment model

Examples

d <- list(
  "8841b39a7c46e25f" = as.Date(c("2023-05-18", "2023-11-06")),
  "8841a45555555555" = as.Date(c("2023-06-22", "2023-08-15", "2023-09-30"))
)
assemble_predictors(x = s2::as_s2_cell(names(d)), dates = d) |>
  tibble::glimpse()
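A minimal base-R sketch of building the dates argument, assuming your data start in a long data frame with one row per location and date (the data frame visits and its columns s2_id and date are hypothetical). split() returns a named list of Date vectors; because its names come out sorted, x should be built from names(d) so that x and dates stay aligned, as in the example above.

```r
# hypothetical long-format input: one row per s2 cell and date
visits <- data.frame(
  s2_id = c("8841b39a7c46e25f", "8841a45555555555", "8841b39a7c46e25f"),
  date = as.Date(c("2023-05-18", "2023-06-22", "2023-11-06"))
)

# group the dates by s2 cell; the result is a named list of Date vectors
d <- split(visits$date, visits$s2_id)
str(d)

# names(d) are sorted, so derive x from them to keep x and dates aligned:
# assemble_predictors(x = s2::as_s2_cell(names(d)), dates = d)
```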

Get the closest years to a vector of dates

Description

The distance between a date and each candidate year is measured from July 1st of that year; the year with the smallest distance is returned.

Usage

get_closest_year(x, years = as.character(1800:2400))

Arguments

x

a date vector

years

vector of characters (or numerics) representing years to choose from

Value

a character vector of the closest year in years for each date in x

Examples

get_closest_year(x = as.Date(c("2021-05-15", "2022-09-01")), years = c("2020", "2022"))
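The July 1st rule can be illustrated with a short base-R sketch. closest_year_sketch() is a hypothetical stand-in written for illustration, not appc's implementation; it measures the absolute number of days from each date to July 1st of every candidate year and keeps the nearest one.

```r
# hypothetical re-implementation of the "closest year" rule, for illustration only
closest_year_sketch <- function(x, years = as.character(1800:2400)) {
  years <- as.character(years)
  # July 1st of each candidate year
  mid_year <- as.Date(paste0(years, "-07-01"))
  vapply(
    seq_along(x),
    function(i) years[which.min(abs(as.numeric(x[i] - mid_year)))],
    character(1)
  )
}

closest_year_sketch(as.Date(c("2021-05-15", "2022-09-01")), years = c("2020", "2022"))
# returns c("2020", "2022")

closest_year_sketch(as.Date("2021-12-31"), years = c("2021", "2022"))
# returns "2022"
```

In the second call, 2021-12-31 is 183 days after 2021-07-01 but only 182 days before 2022-07-01, so it maps to the later year.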

Get daily AQS concentrations

Description

Pre-generated daily summary files are downloaded from the EPA AQS website and filtered/harmonized as described in the Details.

Usage

get_daily_aqs(
  pollutant = c("pm25", "ozone", "no2"),
  year = as.character(2017:2025)
)

Arguments

pollutant

one of "pm25", "ozone", or "no2"

year

calendar year

Details

For PM2.5 (FRM, non-FRM, and speciation), data are filtered to only observations with a sample duration of "24 HOURS". Measurements for all pollutants are removed if the observation percent for the sampling period is less than 75. When a pollutant is measured by more than one device on the same day at the same s2 location, the average measurement is returned, ensuring a unique measurement for each pollutant-location-day.
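The filtering and averaging steps above can be sketched in base R on a toy table. The column names (s2, date, sample_duration, observation_percent, conc) are hypothetical and do not match the AQS file layout; the sketch only shows the order of operations: drop non-24-hour samples and samples with less than 75% observation, then average co-located devices per location-day.

```r
# toy daily summary rows; column names are illustrative, not the AQS file schema
aqs <- data.frame(
  s2 = c("a", "a", "a", "b", "b"),
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02", "2024-01-01", "2024-01-01")),
  sample_duration = c("24 HOURS", "24 HOURS", "24 HOURS", "1 HOUR", "24 HOURS"),
  observation_percent = c(100, 90, 50, 100, 80),
  conc = c(10, 12, 8, 20, 6)
)

# keep 24-hour samples with at least 75% of the sampling period observed
keep <- aqs$sample_duration == "24 HOURS" & aqs$observation_percent >= 75

# average co-located devices so each location-day has a single measurement
harmonized <- aggregate(conc ~ s2 + date, data = aqs[keep, ], FUN = mean)
harmonized
```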

Note: Historical measurements are subject to change, and the EPA AQS website only stores the latest versions. Because this function always downloads the latest data from EPA AQS, it could return different results depending on the date it is run. Similarly, the most recent year might not contain measurements for the entire calendar year.

To list all the files on the page and the dates they were last updated, use: readr::read_csv("https://aqs.epa.gov/aqsweb/airdata/file_list.csv")

Value

data.frame/tibble of pollutant concentrations with site id, lat/lon, and date

Examples

## Not run: 
get_daily_aqs("pm25", "2024")

## End(Not run)

Get elevation summary data

Description

Elevations (captured at a spatial resolution of 800 by 800 m) within the buffer distance of each s2 geohash are summarized with fun (e.g., median() or sd()).

Usage

get_elevation_summary(x, fun = stats::median, buffer = 800)

install_elevation_data()

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

fun

function to summarize extracted data

buffer

distance from s2 cell (in meters) to summarize data

Value

for get_elevation_summary(), a numeric vector of elevation summaries, the same length as x

for install_elevation_data(), a character string path to elevation raster

References

https://prism.oregonstate.edu/normals/

Examples

get_elevation_summary(s2::as_s2_cell(c("8841b399ced97c47", "8841b38578834123")))

Get gridMET surface meteorological data

Description

Daily, high spatial resolution (~4-km) data come from the Climatology Lab and are available for the contiguous US from 1979 through yesterday.

Usage

get_gridmet_data(
  x,
  dates,
  gridmet_var = c("tmmx", "tmmn", "pr", "srad", "vs", "th", "rmax", "rmin", "sph")
)

install_gridmet_data(
  gridmet_var = c("tmmx", "tmmn", "pr", "srad", "vs", "th", "rmax", "rmin", "sph"),
  gridmet_year = as.character(1979:format(Sys.Date(), "%Y")),
  force_reinstall = FALSE
)

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

dates

a list of date vectors for the gridMET data, must be the same length as x

gridmet_var

a character string that is the name of a gridMET variable

gridmet_year

a character string that is the year for the gridMET data; see details

force_reinstall

logical; download data from original source instead of reusing older downloads

Details

gridMET data come as 1/24th degree gridded data, which is about 4 km resolution. s2 geohashes are intersected with this grid for matching with daily weather values.

gridMET variables are named:

gridmet_variable <- c(
  temperature_max = "tmmx",
  temperature_min = "tmmn",
  precipitation = "pr",
  solar_radiation = "srad",
  wind_speed = "vs",
  wind_direction = "th",
  relative_humidity_max = "rmax",
  relative_humidity_min = "rmin",
  specific_humidity = "sph"
)

Value

for get_gridmet_data(), a list of numeric vectors of gridMET values (the same length as x and dates)

for install_gridmet_data(), a character string path to gridMET raster data

References

https://www.climatologylab.org/gridmet.html

https://www.northwestknowledge.net/metdata/data/

Examples

d <- list(
  "8841b39a7c46e25f" = as.Date(c("2023-05-18", "2023-11-06")),
  "8841a45555555555" = as.Date(c("2023-06-22", "2023-08-15"))
)
get_gridmet_data(x = s2::as_s2_cell(names(d)), dates = d, gridmet_var = "tmmx")
get_gridmet_data(x = s2::as_s2_cell(names(d)), dates = d, gridmet_var = "pr")

Get smoke plume data from the NOAA's Hazard Mapping System

Description

The HMS operates daily in near real-time by outlining the smoke polygon of each distinct smoke plume and classifying it as "light", "medium", or "heavy". Since multiple plumes of varying or the same classification can cover one another, the total smoke plume exposure is estimated as the weighted sum of all plumes, where "light" = 1, "medium" = 2, and "heavy" = 3.
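A minimal sketch of the weighted plume score for a single s2 cell and day, assuming the classifications of all plumes that intersect the cell are already known (smoke_score() is hypothetical, not part of appc). With no intersecting plumes the score is zero.

```r
# hypothetical helper: plume_class holds the classification of every plume
# polygon intersecting one s2 cell on one day (zero-length if none intersect)
smoke_score <- function(plume_class) {
  weights <- c(light = 1, medium = 2, heavy = 3)
  # look up each plume's weight by name and add them up
  sum(weights[plume_class])
}

smoke_score(c("light", "heavy", "heavy"))  # 1 + 3 + 3
smoke_score(character(0))                  # no plumes
```

Note that an unrecognized classification would produce NA in this sketch.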

Usage

get_hms_smoke_data(x, dates)

install_hms_smoke_data(
  hms_smoke_start_date = as.Date("2017-01-01"),
  hms_smoke_end_date = as.Date("2025-10-31"),
  force_reinstall = FALSE
)

Arguments

x

a vector of s2 cell identifiers (s2_cell object); currently required to be within the contiguous United States

dates

a list of date vectors for the predictions, must be the same length as x

hms_smoke_start_date

a date object that is the first day of hms smoke data installed

hms_smoke_end_date

a date object that is the last day of hms smoke data installed

force_reinstall

logical; download data from original source instead of reusing older downloads

Details

Daily HMS shapefiles are missing for 7 days within 2017-2023 ("2017-04-27", "2017-05-31", "2017-06-01", "2017-06-01", "2017-06-22", "2017-11-12", "2018-12-31"), and zero values are returned for these dates. If files are available but no smoke plumes intersect, then a zero value is also returned.

Value

for get_hms_smoke_data(), a list of numeric vectors of smoke plume scores (the same length as x and dates)

for install_hms_smoke_data(), a character string path to the installed RDS file

References

https://www.ospo.noaa.gov/Products/land/hms.html#about

Examples

d <- list(
  "8841b39a7c46e25f" = as.Date(c("2023-05-18", "2017-11-06")),
  "8841a45555555555" = as.Date(c("2017-06-22", "2023-08-15", "2024-12-30"))
)
get_hms_smoke_data(x = s2::as_s2_cell(names(d)), dates = d)

Get MERRA-2 aerosol diagnostics data

Description

Total and component (Dust, OC, BC, SS, SO4) surface PM2.5 concentrations from the MERRA-2 M2T1NXAER v5.12.4 product. Because installing MERRA-2 data takes a long time, "pre-compiled" data binaries for each year are available as pre-releases specific to MERRA data on GitHub.
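The Details for this function state that installed data are averaged to daily values and converted to µg/m³. A sketch of those two steps for a single component, assuming the hourly inputs are in kg/m³ (the units of MERRA-2 surface mass concentrations) and that days are UTC calendar days; the toy data frame hourly and its columns are hypothetical.

```r
# hypothetical hourly surface concentrations for one grid cell over two days,
# assumed to be in kg/m^3
hourly <- data.frame(
  time = seq(as.POSIXct("2023-05-18 00:30", tz = "UTC"), by = "hour", length.out = 48),
  bc_kg_m3 = c(rep(1e-9, 24), rep(3e-9, 24))
)

# 1 kg = 1e9 micrograms, so kg/m^3 * 1e9 = ug/m^3
hourly$bc_ug_m3 <- hourly$bc_kg_m3 * 1e9

# average the hourly values within each UTC calendar day
hourly$date <- as.Date(hourly$time, tz = "UTC")
daily_bc <- aggregate(bc_ug_m3 ~ date, data = hourly, FUN = mean)
daily_bc
```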

Usage

get_merra_data(x, dates, merra_release = "merra-2025-12-29")

install_merra_data(
  merra_year = as.character(2017:2025),
  merra_release = "merra-2025-12-29"
)

create_daily_merra_data(merra_date)

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

dates

a list of date vectors for the MERRA data, must be the same length as x

merra_release

a character string of a release tag from which "pre-compiled" MERRA data binary is used instead of installing latest data from source; see details

merra_year

a character string that is the year for the merra data

merra_date

a date object that is the date for the merra data

Details

  • Installed data are filtered to a bounding box around the contiguous US, averaged to daily values, and converted to micrograms per cubic meter (µg/m³).

  • Total surface PM2.5 mass is calculated according to the formula in https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/FAQ/#Q4

  • Set the appc_install_data_from_source option (with options()) or the environment variable APPC_INSTALL_DATA_FROM_SOURCE to any non-empty value to install MERRA-2 data directly from the source instead of using the released GitHub data binary.

    • An Earthdata account linked with permissions for GES DISC is required, and the EARTHDATA_USER and EARTHDATA_PASSWORD environment variables must be set. If a .env file is present, environment variables will be loaded using the dotenv package.

    • Under the hood, appc creates a secure netrc file for earthdata.nasa.gov using the provided credentials and uses the httr2 package to download Earthdata files with curl.
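A sketch of how the install-from-source switch might be set and checked. Setting the option or the environment variable uses only base R; install_from_source_requested() is a hypothetical helper that mirrors the rule stated above (either setting, with a non-empty value), not appc's internal code.

```r
# Option A: set the R option for the current session
options(appc_install_data_from_source = TRUE)

# Option B: set the environment variable (it could also go in .Renviron or .env)
Sys.setenv(APPC_INSTALL_DATA_FROM_SOURCE = "yes")

# hypothetical helper mirroring the rule above; not appc's internal code
install_from_source_requested <- function() {
  opt <- getOption("appc_install_data_from_source")
  opt_set <- length(opt) > 0 && !identical(opt, "")
  env_set <- nzchar(Sys.getenv("APPC_INSTALL_DATA_FROM_SOURCE"))
  opt_set || env_set
}

install_from_source_requested()
```

Either setting alone is enough; the Earthdata credentials are still needed when installing from source.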

Value

for get_merra_data(), a list of tibbles the same length as x, each containing merra data columns (merra_dust, merra_oc, merra_bc, merra_ss, merra_so4, merra_pm25) with one row per date in dates

for install_merra_data(), a character string path to the merra data

for create_daily_merra_data(), a tibble with columns for s2, date, and concentrations of PM2.5 total, dust, oc, bc, ss, so4

Examples

d <- list(
  "8841b39a7c46e25f" = as.Date(c("2023-05-18", "2023-11-06")),
  "8841a45555555555" = as.Date(c("2023-06-22", "2025-10-31"))
)
get_merra_data(x = s2::as_s2_cell(names(d)), dates = d)

Get daily North American Regional Reanalysis (NARR) weather data

Description

Get daily North American Regional Reanalysis (NARR) weather data

Installs NARR raster data into user's data directory for the appc package

Usage

get_narr_data(
  x,
  dates,
  narr_var = c("air.2m", "hpbl", "acpcp", "rhum.2m", "vis", "pres.sfc", "uwnd.10m",
    "vwnd.10m")
)

install_narr_data(
  narr_var = c("air.2m", "hpbl", "acpcp", "rhum.2m", "vis", "pres.sfc", "uwnd.10m",
    "vwnd.10m"),
  narr_year = as.character(2016:2025),
  force_reinstall = FALSE
)

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

dates

a list of date vectors for the NARR data, must be the same length as x

narr_var

a character string that is the name of a NARR variable

narr_year

a character string that is the year for the NARR data

force_reinstall

logical; download data from original source instead of reusing older downloads

Details

NARR data come as 0.3 degree gridded data, which is about 32 km resolution. s2 geohashes are intersected with this 0.3 degree grid for matching with daily weather values.

Value

for get_narr_data(), a list of numeric vectors of NARR values (the same length as x and dates)

for install_narr_data(), a character string path to NARR raster data

References

https://psl.noaa.gov/data/gridded/data.narr.html

https://www.ncei.noaa.gov/products/weather-climate-models/north-american-regional

Examples

d <- list(
  "8841b39a7c46e25f" = as.Date(c("2023-05-18", "2023-11-06")),
  "8841a45555555555" = as.Date(c("2023-06-22", "2023-08-15"))
)
get_narr_data(x = s2::as_s2_cell(names(d)), dates = d, narr_var = "air.2m")

Get NEI point summary data

Description

National Emissions Inventory (NEI) data is summarized as the sum of all point emissions within the buffer distance of each s2 geohash weighted by the inverse of the distance squared to each emission point.
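The inverse-distance-squared summary can be sketched as follows, assuming the emissions and distances (in meters) of the point sources around one s2 cell have already been computed; idw2_emission_sum() is hypothetical. Sources farther than buffer are excluded, and each remaining emission is divided by its squared distance before summing.

```r
# hypothetical helper: emissions and distance_m describe the NEI point sources
# near one s2 cell; only sources within `buffer` meters contribute
idw2_emission_sum <- function(emissions, distance_m, buffer = 1000) {
  within <- distance_m <= buffer
  sum(emissions[within] / distance_m[within]^2)
}

# the third source (2000 m away) falls outside the default 1000 m buffer
idw2_emission_sum(emissions = c(100, 50, 400), distance_m = c(100, 500, 2000))
```

A source at exactly zero distance would make this sum infinite; how appc handles that case is not described here.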

Usage

get_nei_point_summary(
  x,
  year = c("2020", "2017"),
  pollutant_code = c("PM25-PRI", "EC", "OC", "SO4", "NO3", "PMFINE"),
  buffer = 1000
)

install_nei_point_data(year = c("2020", "2017"))

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

year

a character string that is the year of the NEI data

pollutant_code

the NEI pollutant to summarize

buffer

distance from s2 cell (in meters) to summarize data

Details

The full NEI is conducted every three years, with the latest release being 2020.

The NEI file is downloaded, unzipped, and filtered to observations with a pollutant code of EC, OC, SO4, NO3, PMFINE, or PM25-PRI. Latitude and longitude are encoded as an s2 vector, column names are cleaned, and rows with missing values (including total emissions or emissions units) are excluded.

Value

for get_nei_point_summary(), a numeric vector (the same length as x)

for install_nei_point_data(), a character string path to NEI point data RDS file

References

https://www.epa.gov/air-emissions-inventories/national-emissions-inventory-nei

https://www.epa.gov/air-emissions-inventories/2020-national-emissions-inventory-nei-technical-support-document-tsd

Examples

get_nei_point_summary(s2::as_s2_cell(c("8841b399ced97c47", "8841b38578834123")))

Get NLCD Fractional Impervious Surface

Description

NLCD data is from v1 of the Annual NLCD

Usage

get_nlcd_frac_imperv(x, dates, fun = stats::median, buffer = 400)

install_nlcd_frac_imperv_data(nlcd_year = as.character(2025:2017))

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

dates

a list of date vectors for the NLCD data, must be the same length as x

fun

function to summarize extracted data

buffer

distance from s2 cell (in meters) to summarize data

nlcd_year

a character string that is the year for the NLCD data

Value

for get_nlcd_frac_imperv(), a list of numeric vectors of fractional impervious surface pixel summaries, the same length as x; each vector has values for each date in dates, named according to the NLCD product year

for install_nlcd_frac_imperv_data(), a character string path to NLCD raster data

References

https://www.usgs.gov/centers/eros/science/annual-nlcd-fractional-impervious-surface

Examples

d <- list(
  "8841b39a7c46e25f" = as.Date(c("2023-05-18", "2023-11-06")),
  "8841a45555555555" = as.Date(c("2023-06-22", "2023-08-15"))
)
get_nlcd_frac_imperv(x = s2::as_s2_cell(names(d)), dates = d)
get_nlcd_frac_imperv(x = s2::as_s2_cell(names(d)), dates = d, fun = mean, buffer = 1000)

Get traffic summary

Description

Highway Performance Monitoring System (HPMS) data from 2020 is summarized as the average daily total number of meters driven by passenger vehicles, trucks/buses, and tractor-trailers on interstates, freeways, and expressways within buffer meters of each s2 cell.

Usage

get_traffic_summary(x, buffer = 400)

install_traffic(traffic_release = "hpms_2020_f12_aadt-2025-07-16")

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

buffer

distance from s2 cell (in meters) to summarize data

traffic_release

name of github release to download traffic data file from

Details

Only roads with F_SYSTEM classification of 1 ("interstate") or 2 ("principal arterial - other freeways and expressways") are used. Passenger vehicles (FHWA classes 1-3) are calculated as the total minus FHWA classes 4-7 (single unit) and 8-13 (combo).
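A sketch of the vehicle class arithmetic for the road segments inside one buffer. The toy segments table is hypothetical, and the sketch assumes that trucks/buses correspond to the single-unit classes (4-7), that tractor-trailers correspond to the combo classes (8-13), and that "meters driven" means AADT multiplied by the segment length inside the buffer.

```r
# hypothetical road segments inside one s2 cell's buffer
segments <- data.frame(
  length_m = c(250, 400),            # segment length inside the buffer
  aadt_total = c(50000, 30000),
  aadt_single_unit = c(3000, 1000),  # FHWA classes 4-7
  aadt_combination = c(7000, 4000)   # FHWA classes 8-13
)

# passenger vehicles (FHWA classes 1-3) are what remains of the total
segments$aadt_passenger <- with(segments, aadt_total - aadt_single_unit - aadt_combination)

# daily vehicle-meters within the buffer, summed over segments
aadtm <- c(
  aadtm_passenger = sum(segments$aadt_passenger * segments$length_m),
  aadtm_trucks_buses = sum(segments$aadt_single_unit * segments$length_m),
  aadtm_tractor_trailer = sum(segments$aadt_combination * segments$length_m)
)
aadtm
```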

Value

a list the same length as x, with each element being a numeric vector of aadtm_passenger, aadtm_trucks_buses, and aadtm_tractor_trailer

References

https://www.fhwa.dot.gov/policyinformation/hpms.cfm

https://data-usdot.opendata.arcgis.com/datasets/usdot::highway-performance-monitoring-system-hpms-2020/about

Examples

get_traffic_summary(
  s2::as_s2_cell(c("8841b6abd8207619", "8841b4f6affffffb", "8841b39f07f7d899")))
## Not run: 
# randomly sample 100 level 18 cells from s2 level-9: 8841b4
# https://igorgatis.github.io/ws2/?cells=8841b4
# use their centroids as the level 30 s2 cells
set.seed(1)
my_s2_cells <-
  s2::s2_covering_cell_ids(s2::s2_cell_polygon(s2::as_s2_cell("8841b4")),
                           min_level = 18, max_level = 18) |>
  unlist() |>
  sample(size = 100) |>
  s2::s2_cell_center()|>
  s2::as_s2_cell()
get_traffic_summary(my_s2_cells) |>
  dplyr::bind_rows()

## End(Not run)

Install daily PM2.5 averages

Description

Daily data by state is downloaded from the AQS API and filtered/harmonized as described in the details. Data from the EPA AQS API are updated more frequently than the pre-generated daily average files used by get_daily_aqs().

Usage

install_aqs(year = as.character(2025:2017), force_reinstall = FALSE)

Arguments

year

character; calendar year of data to install

force_reinstall

logical; download data from original source instead of reusing older downloads

Details

Installing AQS data via the API requires a key associated with an email address. Sign up by visiting the signup URL with your email address (e.g., https://aqs.epa.gov/data/api/[email protected]) and look for an email containing the key. Save these credentials as environment variables (or in a .env file): AQS_DATA_MART_API_EMAIL and AQS_DATA_MART_API_KEY.
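Before calling install_aqs(), a quick way to confirm that both credentials are visible to R is to read them with Sys.getenv(). aqs_credentials_set() is a hypothetical helper and the values below are placeholders; the credentials could instead be kept in a .env file and loaded with dotenv::load_dot_env().

```r
# hypothetical helper: are both AQS Data Mart credentials set and non-empty?
aqs_credentials_set <- function() {
  creds <- Sys.getenv(c("AQS_DATA_MART_API_EMAIL", "AQS_DATA_MART_API_KEY"))
  all(nzchar(creds))
}

# placeholder values; use the email address and key from the signup email
Sys.setenv(
  AQS_DATA_MART_API_EMAIL = "me@example.com",
  AQS_DATA_MART_API_KEY = "my-key"
)
aqs_credentials_set()
```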

For PM2.5 (FRM, non-FRM, and speciation), data are filtered to only observations with a sample duration of "24 HOURS". Measurements for all pollutants are removed if the observation percent for the sampling period is less than 75 or if they are flagged as invalid. When a pollutant is measured by more than one device on the same day at the same s2 location, the average measurement is returned, ensuring a unique measurement for each pollutant-location-day.

Value

a character string path to an AQS data RDS file

Examples

# on 2025-07-22, 2025 data goes until the end of March 2025
## Not run: 
install_aqs("2025") |>
  readRDS()

## End(Not run)

Convert latitude/longitude vectors into S2 cells

Description

Convert latitude/longitude vectors into S2 cells

Usage

latlon_to_s2_cell(lat, lon)

Arguments

lat

Numeric vector of latitudes (decimal degrees)

lon

Numeric vector of longitudes (decimal degrees)

Value

An object of class s2_cell

Examples

latlon_to_s2_cell(
  lat = c(45.0, 46.1),
  lon = c(-64.2, -65.3)
)

Get daily PM2.5 model predictions

Description

Get daily PM2.5 model predictions

Usage

predict_pm25(x, dates, keep_predictors = FALSE)

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

dates

a list of date vectors for the predictions, must be the same length as x

keep_predictors

logical; return values for model predictors alongside PM2.5 estimates?

Details

Internally, loading the model file is cached, so repeated calls in the same R session will not require the overhead of loading the model file for a new prediction.
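One common base-R pattern for this kind of in-session caching is an environment that holds the loaded object after the first call. The sketch below is illustrative only (load_model_cached() and slow_loader() are hypothetical, and appc's actual mechanism may differ): the loader runs once, and later calls reuse the cached object.

```r
# environment used as an in-memory cache for the current R session
.model_cache <- new.env(parent = emptyenv())

# hypothetical: call `loader` only if nothing has been cached yet
load_model_cached <- function(loader) {
  if (!exists("model", envir = .model_cache, inherits = FALSE)) {
    assign("model", loader(), envir = .model_cache)
  }
  get("model", envir = .model_cache, inherits = FALSE)
}

# stand-in for reading a large model file from disk; counts how often it runs
n_loads <- 0
slow_loader <- function() {
  n_loads <<- n_loads + 1
  list(name = "pm25 model")
}

load_model_cached(slow_loader)
load_model_cached(slow_loader)
n_loads  # the loader ran only once
```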

Value

a list of tibbles the same length as x, each containing columns for the predicted concentration (pm25) and its standard error (pm25_se), with one row per date in dates. These numerics are concentrations of fine particulate matter, measured in micrograms per cubic meter. See vignette("cv-model-performance") for more details on the cross-validated accuracy of the daily PM2.5 model predictions.

Examples

d <- list(
  "8841b39a7c46e25f" = as.Date(c("2023-05-18", "2023-11-06")),
  "8841a45555555555" = as.Date(c("2023-06-22", "2023-08-15"))
)

predict_pm25(x = s2::as_s2_cell(names(d)), dates = d)

# takes less time after called once because model file is cached in memory

d <- list(
  "8841b39a7c46e25f" = as.Date(c("2023-05-13", "2023-11-16")),
  "8841a45555555555" = as.Date(c("2023-06-21", "2023-08-25"))
)
predict_pm25(x = s2::as_s2_cell(names(d)), dates = d)

Get daily PM2.5 model predictions using date ranges

Description

Get daily PM2.5 model predictions using date ranges

Usage

predict_pm25_date_range(x, start_date, end_date, average = FALSE)

Arguments

x

a vector of s2 cell identifiers (s2_cell object)

start_date

a date vector of start dates for each s2 cell, must be the same length as x

end_date

a date vector of end dates for each s2 cell, must be the same length as x

average

logical; average the daily exposure estimates and their standard errors over each date range?

Details

The standard error of an average of daily pm25 exposures with known standard errors is calculated, assuming the daily exposures are independent, as the square root of the sum of the squared individual standard errors divided by the total number of individual daily pm25 exposures.
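The averaging rule can be written out for one s2 cell. The daily values below are made up for illustration; the point estimate is the mean of the daily predictions and, assuming independence, the standard error is sqrt(sum(se^2)) / n.

```r
# made-up daily predictions for one s2 cell over a four-day range
daily <- data.frame(
  date = seq(as.Date("2023-05-18"), as.Date("2023-05-21"), by = "day"),
  pm25 = c(8, 10, 12, 10),
  pm25_se = c(1, 2, 2, 1)
)

# average exposure over the date range
pm25_avg <- mean(daily$pm25)

# standard error of a mean of independent estimates: sqrt(sum(se^2)) / n
pm25_se_avg <- sqrt(sum(daily$pm25_se^2)) / nrow(daily)

c(pm25 = pm25_avg, pm25_se = pm25_se_avg)
```

Here the squared standard errors sum to 10, so the averaged standard error is sqrt(10) / 4, which is smaller than any single daily standard error.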

Examples

predict_pm25_date_range(
  x = s2::as_s2_cell(c("8841b39a7c46e25f", "8841a45555555555")),
  start_date = as.Date(c("2023-05-18", "2023-01-06")),
  end_date = as.Date(c("2023-06-22", "2023-08-15")),
  average = TRUE
)