Package 'addr'

Title: Clean, Parse, Harmonize, Match, and Geocode Messy Real-World US Addresses
Description: Clean, parse, standardize, match, and geocode messy, real-world US addresses. Use the included `usaddress` library to tag address components and build addr vector objects composed of addr_part vectors for number, street, and place. These vectors can be standardized, matched, joined, and used as data-frame columns, allowing standard R tools to work with nested address structures.
Authors: Cole Brokamp [aut, cre], Erika Manning [aut]
Maintainer: Cole Brokamp <[email protected]>
License: MIT + file LICENSE
Version: 1.1.0
Built: 2026-05-19 17:13:48 UTC
Source: https://github.com/geomarker-io/addr

Help Index


Left join two data frames using fuzzy addr matching

Description

This wraps the addr fuzzy matching helpers and returns a left-join style result. The addr columns are matched by index and rows are expanded for one-to-many or many-to-many matches.

See addr_match() and addr_left_join() for faster alternatives that return one selected match instead of all fuzzy matches.

Usage

addr_fuzzy_left_join(
  x,
  y,
  by = "addr",
  addr_fields = NULL,
  suffix = c(".x", ".y"),
  progress = interactive()
)

Arguments

x, y

data frames or tibbles with an addr column

by

addr column name in x (and y if the same); or a length-2 character vector of c(x_col, y_col)

addr_fields

a named vector of OSA maximum distances. Defaults are used for fields that are not supplied; see Details.

suffix

character vector of length 2 used to suffix duplicate columns

progress

logical; show progress bar while processing matched ZIP groups?

Details

addr_fuzzy_left_join() matches addresses within ZIP code groups, so maximum distances for place fields are ignored. Defaults for addr_fields:

  • number_prefix: 0

  • number_digits: 0

  • number_suffix: 0

  • street_predirectional: 0

  • street_premodifier: 0

  • street_pretype: 0

  • street_name: 1

  • street_posttype: 0

  • street_postdirectional: 0

Value

a data frame with left-join semantics; note that row order will be changed compared to x
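
As a rough illustration of these left-join semantics (not the package's implementation), the base R sketch below uses merge() on a hypothetical integer key in place of fuzzy addr matching. The data frames and the column names id, key, and parcel are made up for the example. It shows rows of x expanding for a one-to-many match, an unmatched row kept with a missing value, and the result ordered differently from x.

```r
# hypothetical stand-ins: `key` plays the role of a matched addr
x <- data.frame(id = c("b", "a", "c"), key = c(2L, 1L, 3L))
y <- data.frame(key = c(1L, 1L, 2L), parcel = c("p1", "p2", "p3"))

# all.x = TRUE keeps every row of x, matched or not
joined <- merge(x, y, by = "key", all.x = TRUE)

# id "a" appears twice (one-to-many), id "c" has a missing parcel,
# and rows are sorted by key rather than kept in the order of x
joined
```

Keeping an identifier column such as id in x makes it straightforward to restore the original order or to collapse expanded rows afterwards.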

Examples

my_addr <-
  tibble::tibble(address = voter_addresses()[1:10],
                 addr = as_addr(address),
                 id = sprintf("id_%04d", seq_len(10)))
the_addr <- nad_example_data()
addr_fuzzy_left_join(my_addr, the_addr, c("addr", "nad_addr"))

Fuzzy match addr vectors using field-specific string distances

Description

addr_fuzzy_match() matches two addr vectors using more than one address field.

fuzzy_match_addr_field() matches two addr vectors using a single address field.

Distances between address tags are defined using optimized string alignment; see fuzzy_match() and stringdist::stringdist() for more details.

Usage

addr_fuzzy_match(x, y, addr_fields = NULL)

fuzzy_match_addr_field(
  x,
  y,
  addr_field = c("number_prefix", "number_digits", "number_suffix",
    "street_predirectional", "street_premodifier", "street_pretype", "street_name",
    "street_posttype", "street_postdirectional", "place_name", "place_state",
    "place_zipcode"),
  osa_max_dist = 0
)

Arguments

x

addr vector to match

y

addr vector to match to

addr_fields

a named vector of OSA maximum distances. Defaults are used for fields that are not supplied; see Details.

addr_field

character name of single addr field to match on

osa_max_dist

maximum optimized string alignment distance used as threshold for matching on single addr field

Details

Defaults for addr_fields:

  • number_prefix: 0

  • number_digits: 0

  • number_suffix: 0

  • street_predirectional: 0

  • street_premodifier: 0

  • street_pretype: 0

  • street_name: 1

  • street_posttype: 0

  • street_postdirectional: 0

  • place_name: 0

  • place_state: 0

  • place_zipcode: 0
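
One way to picture how a partially supplied addr_fields combines with the defaults listed above is the base R sketch below. The helper resolve_fields() is hypothetical and only mimics the documented behavior (supplied fields override defaults, others keep them); it is not the package's internal code.

```r
# defaults copied from the list above
default_fields <- c(
  number_prefix = 0, number_digits = 0, number_suffix = 0,
  street_predirectional = 0, street_premodifier = 0, street_pretype = 0,
  street_name = 1, street_posttype = 0, street_postdirectional = 0,
  place_name = 0, place_state = 0, place_zipcode = 0
)

# hypothetical helper: overlay user-supplied distances on the defaults
resolve_fields <- function(addr_fields = NULL, defaults = default_fields) {
  if (is.null(addr_fields)) {
    return(defaults)
  }
  unknown <- setdiff(names(addr_fields), names(defaults))
  if (length(unknown) > 0) {
    stop("unknown addr field(s): ", paste(unknown, collapse = ", "))
  }
  defaults[names(addr_fields)] <- addr_fields
  defaults
}

# loosen only the address number; everything else keeps its default
loosened <- resolve_fields(c(number_digits = 2))
# an infinite distance effectively ignores a field
ignored <- resolve_fields(c(number_digits = Inf))
```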

When fuzzy matching street_name, the "psk" (phonetic_street_key()) prefilter is automatically used (see ?fuzzy_match).

Value

a list of integer vectors representing the position of the best matching address(es) in y for each address in x

Examples

x_addr <- as_addr(c("123 Main St.", "333 Burnet Ave", "3333 Foofy Ave"))
y_addr <- as_addr(c("0000 Main Street", "3333 Burnet Avenue", "222 Burnet Ave"))

# no matches with defaults
addr_fuzzy_match(x_addr, y_addr)

# match on osa_max_dist of 2 for the address number
addr_fuzzy_match(x_addr, y_addr, addr_fields = c("number_digits" = 2))

# ignore address number when matching
addr_fuzzy_match(x_addr, y_addr, addr_fields = c("number_digits" = Inf))


fuzzy_match_addr_field(
  as_addr(c("123 Main St.", "3333 Burnet Ave", "3333 Foofy Ave")),
  as_addr(c("0000 Main Street", "0000 Burnet Avenue", "222 Burnet Ave")),
  addr_field = "street_name", osa_max_dist = 1
)

# empty address fields have an OSA distance of zero and always match
fuzzy_match_addr_field(
  as_addr(c("123 Main St.", "3333 Burnet Ave", "3333 Foofy Ave")),
  as_addr(c("0000 Main Street", "0000 Burnet Avenue", "222 Burnet Ave")),
  addr_field = "number_prefix"
)

Left join two data frames using addr matching

Description

addr_left_join() is a convenience wrapper around addr_match() that returns a left-join style result. It expands rows of x for duplicate rows in the original y that share the exact matched addr, but it does not return multiple distinct candidate addresses from y. addr_match() still selects a single best address before this wrapper expands exact duplicates.

Usage

addr_left_join(
  x,
  y,
  by = "addr",
  suffix = c(".x", ".y"),
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
  name_phonetic_dist = 2L,
  name_fuzzy_dist = 1L,
  number_fuzzy_dist = 1L,
  match_street_type = c("exact", "compatible", "ignore"),
  match_street_directional = c("exact", "swap", "ignore"),
  progress = interactive(),
  match_prepared = NULL
)

Arguments

x, y

data frames or tibbles with an addr column

by

addr column name in x (and y if the same); or a length-2 character vector of c(x_col, y_col)

suffix

character vector of length 2 used to suffix duplicate columns

zip_variants

logical; also try common ZIP code variants of x (see zip_variant) when matching to y?

zip_variant

character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant

name_phonetic_dist

integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match

name_fuzzy_dist

integer; maximum optimized string alignment distance between @name of x and y to consider a possible match

number_fuzzy_dist

integer; maximum optimized string alignment distance between addr_number strings in x and y to consider a possible match.

match_street_type

character; how to compare street pretype and posttype when selecting street candidates. "exact" requires pretype to match pretype and posttype to match posttype; "compatible" treats blank type fields as unknown but rejects candidates when known type information conflicts; "ignore" does not use street type fields when selecting candidates.

match_street_directional

character; how to compare street predirectional and postdirectional when selecting street candidates. "exact" requires predirectional to match predirectional and postdirectional to match postdirectional; "swap" also permits predirectional to match postdirectional and postdirectional to match predirectional; "ignore" does not use street directional fields when selecting candidates.

progress

logical; show addr_match() progress?

match_prepared

optional prepared addr_match_index for the y addr column, usually from addr_match_prepare(). When supplied, addr_left_join() validates that it is equivalent to the y addr column before reusing it for matching.

Value

A data frame with left-join semantics. Duplicate rows in y with the exact same matched addr are all returned. Partial ZIP-only or street-only matches do not expand to multiple candidate rows in y.

Examples

the_addr <- nad("Hamilton", "OH",
                refresh_binary = "no", refresh_source = "no")
my_addr <- tibble::tibble(
  addr = as_addr(voter_addresses()[1:100]),
  id = 1:100
)

d <- addr_left_join(
  my_addr,
  the_addr,
  by = c("addr", "nad_addr"),
  match_prepared = nad_example_data(match_prepared = TRUE)
)

d

# some addresses may match with more than one address in NAD
# since matching does not consider subaddress (e.g. "line two")
# take the first row in these cases

table(addr_match_stage(d$nad_addr.y[!duplicated(d$id)]))

Match addr vectors

Description

A single addr in y is chosen for each addr in x. Matching is staged to reduce the search space: ZIP codes are matched first, street names are then matched within each matched ZIP code, and street numbers are finally matched within each matched street and ZIP code combination. If more than one candidate addr remains in y after these stages, the first candidate in y is returned.

Missing or empty address components that cannot be matched at any stage are left missing in the returned addr() values. Rows with a matched ZIP code but no street match return an addr with only ⁠@place@zipcode⁠ filled; rows with matched ZIP code and street but no number match also return the matched ⁠@street⁠.
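
The staging and the partial results described above can be sketched in base R. This is a simplified illustration under stated assumptions: it works on plain data frames with hypothetical columns number, street, and zip, and it uses exact comparisons only, with none of the fuzzy, phonetic, or ZIP-variant logic of addr_match().

```r
ref <- data.frame(
  number = c("10", "11", "10"),
  street = c("MAIN ST", "MAIN ST", "MAIN ST"),
  zip = c("45220", "45220", "45229"),
  stringsAsFactors = FALSE
)
qry <- data.frame(
  number = c("99", "10", "10", "11"),
  street = c("MAIN ST", "OAK ST", "MAIN ST", "MAIN ST"),
  zip = c("45220", "45220", "45103", "45220"),
  stringsAsFactors = FALSE
)

# hypothetical helper: narrow candidates ZIP -> street -> number,
# filling in only the components matched so far
staged_match <- function(q, ref) {
  out <- data.frame(
    number = NA_character_, street = NA_character_, zip = NA_character_,
    stringsAsFactors = FALSE
  )
  in_zip <- ref[which(ref$zip == q$zip), , drop = FALSE]
  if (nrow(in_zip) == 0) return(out)
  out$zip <- q$zip
  on_street <- in_zip[which(in_zip$street == q$street), , drop = FALSE]
  if (nrow(on_street) == 0) return(out)
  out$street <- q$street
  at_number <- on_street[which(on_street$number == q$number), , drop = FALSE]
  if (nrow(at_number) == 0) return(out)
  out$number <- at_number$number[1] # first remaining candidate wins
  out
}

res <- do.call(
  rbind,
  lapply(seq_len(nrow(qry)), function(i) staged_match(qry[i, ], ref))
)
res
```

Each later stage only searches the rows that survived the earlier one, which is what keeps the search space small.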

addr_match() accepts raw reference data and prepares it internally, which is the right default for one-off matching jobs. addr_match_prepare() becomes useful when the same reference y will be reused across multiple calls to addr_match(), because it caches the deduplicated reference addresses and ZIP/street/number candidate lookups once instead of rebuilding them on every call.

Preparing y once avoids recomputing unique(y), ZIP-code groups, and exact street/number candidate lookups each time you call addr_match() with the same reference addresses. For a single end-to-end match, preparing y explicitly does not remove that work; it only moves it outside addr_match().

Usage

addr_match(
  x,
  y,
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
  name_phonetic_dist = 2L,
  name_fuzzy_dist = 1L,
  number_fuzzy_dist = 1L,
  match_street_type = c("exact", "compatible", "ignore"),
  match_street_directional = c("exact", "swap", "ignore"),
  progress = interactive()
)

addr_match_prepare(y, progress = interactive())

Arguments

x

addr vector to match

y

addr vector to match against, or a prepared addr_match_index created by addr_match_prepare()

zip_variants

logical; also try common ZIP code variants of x (see zip_variant) when matching to y?

zip_variant

character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant

name_phonetic_dist

integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match

name_fuzzy_dist

integer; maximum optimized string alignment distance between @name of x and y to consider a possible match

number_fuzzy_dist

integer; maximum optimized string alignment distance between addr_number strings in x and y to consider a possible match.

match_street_type

character; how to compare street pretype and posttype when selecting street candidates. "exact" requires pretype to match pretype and posttype to match posttype; "compatible" treats blank type fields as unknown but rejects candidates when known type information conflicts; "ignore" does not use street type fields when selecting candidates.

match_street_directional

character; how to compare street predirectional and postdirectional when selecting street candidates. "exact" requires predirectional to match predirectional and postdirectional to match postdirectional; "swap" also permits predirectional to match postdirectional and postdirectional to match predirectional; "ignore" does not use street directional fields when selecting candidates.

progress

logical; show reference-preparation timing and a progress bar while preparing raw y or processing matched ZIP groups?

Value

An addr vector, the same length as x, containing the selected match in y for each addr in x. Partial matches are returned with matched ZIP code and/or street fields filled when later stages do not match.

Examples

the_addr <- nad_example_data(match_prepared = TRUE)
my_addr <- as_addr(
  c(
    "2700 Alice St 45222",
    "10623 Srpingfield Pike 45215",
    "173 Wuhlper Ave 45220",
    "12176 8th Ave 45249",
    "12176 7ht Ave 45249",
    "10 W 14th St 45202",
    "10 Oak Rd 45241"
  )
)

addr_match(my_addr, the_addr)

addr_match(
  my_addr,
  the_addr,
  zip_variants = FALSE,
  name_phonetic_dist = 0L,
  name_fuzzy_dist = 0L,
  number_fuzzy_dist = 0L,
  match_street_type = "ignore",
  match_street_directional = "ignore"
)

my_addr <- as_addr(voter_addresses()[1:100])

d <- addr_match(my_addr, the_addr)
d

addr_match_stage(d)

Classify addr match stage

Description

Classify an addr vector into the staged outcomes returned by addr_match(): no match, ZIP-only match, ZIP-plus-street match, or ZIP-plus-street-plus-number match.

Usage

addr_match_stage(x, strict = TRUE)

Arguments

x

addr vector to classify

strict

logical; require x to follow the partial-result structure produced by addr_match()? If FALSE, classification is based only on the deepest non-missing core component (⁠@place@zipcode⁠, ⁠@street@name⁠, ⁠@number@digits⁠).

Value

an ordered factor with levels none, zip, street, number
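
To make the return value concrete, here is a minimal base R sketch of the non-strict rule (classification by the deepest non-missing core component). The function classify_stage() is hypothetical and takes plain character vectors rather than an addr vector.

```r
# hypothetical helper mirroring the strict = FALSE description
classify_stage <- function(zipcode, street_name, number_digits) {
  depth <- ifelse(
    !is.na(number_digits), 3L,
    ifelse(!is.na(street_name), 2L, ifelse(!is.na(zipcode), 1L, 0L))
  )
  lv <- c("none", "zip", "street", "number")
  factor(lv[depth + 1L], levels = lv, ordered = TRUE)
}

stage <- classify_stage(
  zipcode = c("45220", "45220", NA, "45220"),
  street_name = c("MAIN", NA, NA, "MAIN"),
  number_digits = c(NA, NA, NA, "11")
)

table(stage)

# because the factor is ordered, stages can be compared directly
stage >= "street"
```

An ordered factor is convenient here because filters such as "at least a street match" become a single comparison.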

Examples

y <- as_addr(c(
  "10 MAIN ST CINCINNATI OH 45220",
  "11 MAIN ST CINCINNATI OH 45220",
  "10 MAIN ST CINCINNATI OH 45229"
))
x <- as_addr(c(
  "99 MAIN ST CINCINNATI OH 45220",
  "10 OAK ST CINCINNATI OH 45220",
  "10 MAIN ST CINCINNATI OH 45103"
))

out <- addr_match(x, y)
addr_match_stage(out)

addr classes

Description

The structures for addr() and the addr_ classes are derived as a subset of the United States Thoroughfare, Landmark, and Postal Address Data Standard that is relevant for residential, numbered thoroughfare addresses:

  Address
   ├─ AddressNumber
   │  ├─ AddressNumberPrefix
   │  ├─ AddressNumber
   │  └─ AddressNumberSuffix
   ├─ StreetName
   │  ├─ StreetNamePreModifier
   │  ├─ StreetNamePreDirectional
   │  ├─ StreetNamePreType
   │  ├─ StreetName
   │  ├─ StreetNamePostType
   │  └─ StreetNamePostDirectional
   └─ Place
      ├─ PlaceName
      ├─ StateName
      └─ ZipCode

addr() combines addr_number(), addr_street(), and addr_place() into a single addr vector:

<addr>
 @ number: <addr_number>
 .. @ prefix
 .. @ digits
 .. @ suffix
 @ street: <addr_street>
 .. @ predirectional
 .. @ premodifier
 .. @ pretype
 .. @ name
 .. @ posttype
 .. @ postdirectional
 @ place : <addr_place>
 .. @ name
 .. @ state
 .. @ zipcode

Usage

addr_number(
  prefix = NA_character_,
  digits = NA_character_,
  suffix = NA_character_
)

addr_street(
  predirectional = NA_character_,
  premodifier = NA_character_,
  pretype = NA_character_,
  name = NA_character_,
  posttype = NA_character_,
  postdirectional = NA_character_,
  map_posttype = TRUE,
  map_directional = TRUE,
  map_pretype = TRUE,
  map_ordinal = TRUE
)

addr_place(
  name = NA_character_,
  state = NA_character_,
  zipcode = NA_character_,
  map_state = TRUE
)

addr(number = addr_number(), street = addr_street(), place = addr_place())

Arguments

prefix

address number prefix, often a fractional or grid component

digits

primary street number for the address; must be between 0 and 999999

suffix

address number suffix, often a letter or unit-like component

predirectional

direction before the street name

premodifier

descriptive modifier before the street name

pretype

street type or classification before the street name

name

street name, or city/town/municipality name for addr_place()

posttype

street type or classification after the street name

postdirectional

direction after the street name

map_posttype

logical; map posttype to abbreviations?

map_directional

logical; map pre- and post-directional to abbreviations?

map_pretype

logical; map pretype to abbreviations?

map_ordinal

logical; map ordinal street names to abbreviations?

state

state or territory abbreviation

zipcode

ZIP code (must be five digits not starting with "000")

map_state

logical; map state to abbreviations?

number

an addr_number vector

street

an addr_street vector

place

an addr_place vector

Details

All field values must be character vectors of at least length one (including missing values). Length-one fields are recycled to match the length of other fields.

Value

An addr, addr_number, addr_street, or addr_place vector

Examples

# define a new addr_number vector
addr_number(digits = "290")
addr_number(prefix = "N", digits = "290", suffix = "A")

# define a new addr_street vector
addr_street(name = "Burnet", posttype = "Ave")

# directionals, ordinal street names, and street types are mapped to abbreviations by default
addr_street(predirectional = "North", name = "Fifth", posttype = "Street")

# define a new addr_place vector
addr_place(name = "Cincinnati", state = "OH", zipcode = "45220")

# define a new addr vector
addr(
  addr_number(digits = "290"),
  addr_street(name = "Burnet", posttype = "Ave"),
  addr_place(name = "Cincinnati", state = "OH", zipcode = "45229")
)

# define a more complicated addr vector
# and explicitly specify empty components to avoid NA
addr(
  addr_number(prefix = "", digits = "200", suffix = ""),
  addr_street(
    predirectional = "west",
    premodifier = "Old",
    pretype = "US",
    name = "50",
    posttype = "avenue",
    postdirectional = "east",
    map_directional = TRUE,
    map_pretype = TRUE,
    map_posttype = TRUE
  ),
  addr_place(name = "Cincinnati", state = "ohio", zipcode = "45220")
)

# addr_* vectors are recycled and omitted fields are missing
addr(
  addr_number(digits = c("290", "200", "3333", "111")),
  addr_street(
    name = c("Burnet", "Main", "Ludlow", "State Route 32"),
    posttype = c("Ave", "St", "Ave", NA_character_)
  ),
  addr_place(name = "Cincinnati", state = "OH")
)

Coerce to addr

Description

as_addr() converts other objects into addr() vectors. See ?addr for more details on its structure.

Usage

as_addr(x, ...)

Arguments

x

object to coerce to an addr vector

...

additional arguments passed to methods

Methods implemented for

  • character: will be cleaned (if clean = TRUE) with clean_address_text() and then tagged using usaddress_tag(); tags are normalized to abbreviations by passing all ⁠map_*⁠ arguments to addr_street() or addr_place(); ZIP codes parsed with more than five characters are truncated with a warning, and malformed parsed ZIP codes are set to missing with a warning; non-numeric characters in parsed address number digits will be removed with a warning; parsed address number digits greater than 999999 are truncated to the first six digits with a warning

  • data.frame: must have columns named according to fields in addr_number(), addr_street(), or addr_place(); also passes the ⁠map_*⁠ arguments to addr_street() and addr_place()

  • addr: returned as-is

Examples

as_addr(voter_addresses()[1:1000])

data.frame(
  number_digits = c("290", "200"),
  street_name = c("Burnet", "Main"),
  street_posttype = c("Ave", "St"),
  place_name = c("Cincinnati", "Cincinnati"),
  place_state = c("OH", "OH"),
  place_zipcode = c("45229", "45220"),
  stringsAsFactors = FALSE
) |>
  as_addr()

Clean address text

Description

Remove excess whitespace and keep only letters, numbers, ⁠#⁠, and -.
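
A minimal base R sketch of this kind of cleaning is shown below. It is an approximation under an explicit assumption: disallowed characters are dropped rather than replaced with spaces, which may differ from what clean_address_text() actually does. The name clean_sketch() is hypothetical.

```r
# hypothetical approximation of the cleaning rule described above
clean_sketch <- function(x) {
  # keep letters, digits, "#", "-", and whitespace; drop everything else
  x <- gsub("[^A-Za-z0-9#[:space:]-]", "", x)
  # collapse runs of whitespace to a single space
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

cleaned <- clean_sketch(c(
  "  3333   Burnet Ave. ",
  "33_33 Burnet Ave; #2-B"
))
cleaned
```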

Usage

clean_address_text(x)

Arguments

x

a vector of address character strings

Value

a vector of cleaned addresses

Examples

clean_address_text(c(
  "3333 Burnet Ave Cincinnati OH 45219",
  "33_33 Burnet Ave. Cincinnati OH 45219",
  "33\\33 B\"urnet Ave; Ci!ncinn&*ati OH 45219",
  "3333 Burnet Ave Cincinnati OH 45219",
  "33_33 Burnet Ave. Cincinnati OH 45219"
))

Translate county names and county FIPS identifiers

Description

county_fips_lookup() uses a package-internal reference derived from the 2025 U.S. Census county adjacency file to translate between county names, state abbreviations, and 5-digit county FIPS identifiers.

Name lookups accept either the full county-equivalent label (for example, "Orleans Parish") or a shortened form with common suffixes removed (for example, "Orleans"). If a shortened form is ambiguous within a state, the function errors and asks for the full county-equivalent name or the 5-digit FIPS identifier.

Usage

county_fips_lookup(county, state = NULL)

Arguments

county

character, length one; either a county name or a 5-digit county FIPS identifier

state

character, length one; state abbreviation or full state name; required when county is a name, ignored when county is already a 5-digit FIPS identifier

Value

A tibble with one row and columns county, county_full, state, and county_fips.

Examples

county_fips_lookup("Hamilton", "OH")
county_fips_lookup("Hamilton", "Ohio")
county_fips_lookup("39061")

Example line-one addresses

Description

The Cincinnati Eviction Hotspots data was downloaded from the Eviction Lab and contains characteristics of the top 100 buildings that are responsible for about 25% of all eviction filings in Cincinnati (from their "current through 8-31-2024" release).

Usage

elh_data()

Details

https://evictionlab.org/eviction-tracking/cincinnati-oh/

Value

a tibble with 100 rows and 9 columns

Examples

elh_data()

Fuzzy match

Description

Fuzzy match strings in x to strings in y using optimized string alignment (OSA) distance and ignoring capitalization.

Usage

fuzzy_match(x, y, osa_max_dist = 1, prefilter = c("none", "psk"))

Arguments

x

character vector to match

y

character vector to match to

osa_max_dist

maximum OSA distance to consider a match; Inf is a special case that avoids computing string distance by returning all of y instead of just the best match or matches in y.

prefilter

method used to prefilter y before computing OSA distances; "none" does nothing, and "psk" removes values in y that do not share a phonetic_street_key() with any value in x.

Details

If multiple strings in y are tied for the minimum OSA distance from a string in x, all of their indices are included in the return value.
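
The tie-keeping behavior can be illustrated with a small, self-contained base R sketch. Both osa_dist() and fuzzy_match_sketch() are hypothetical names written for this example; the package itself refers to stringdist::stringdist() for distances, and this sketch omits the prefilter and the Inf special case.

```r
# optimized string alignment distance: edits are insertions, deletions,
# substitutions, and transpositions of adjacent characters
osa_dist <- function(a, b) {
  a <- strsplit(tolower(a), "")[[1]]
  b <- strsplit(tolower(b), "")[[1]]
  n <- length(a)
  m <- length(b)
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(a[i] != b[j])
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L, # deletion
        d[i + 1, j] + 1L, # insertion
        d[i, j] + cost    # substitution or match
      )
      if (i > 1 && j > 1 && a[i] == b[j - 1] && a[i - 1] == b[j]) {
        # adjacent transposition
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

# for each x, return positions in y tied for the minimum distance,
# or integer(0) when that minimum exceeds osa_max_dist
fuzzy_match_sketch <- function(x, y, osa_max_dist = 1) {
  lapply(x, function(xi) {
    d <- vapply(y, function(yi) osa_dist(xi, yi), numeric(1), USE.NAMES = FALSE)
    best <- min(d)
    if (best > osa_max_dist) integer(0) else which(d == best)
  })
}

my_names <- c("Pinye", "Pine", "Oalck", "Sunset", "Riverbend", "Greenfild")
the_names <- c("Piney", "Pine", "Oak", "Cheshire", "Greenfield", "Maple", "Elm")
matches <- fuzzy_match_sketch(my_names, the_names)
matches
```

In this sketch "Pinye" is one edit away from both "Piney" (a transposition) and "Pine" (a deletion), so both positions are kept.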

Value

a list of integer vectors representing the position of the best matching string(s) in y for each string in x

Examples

my_names <-
  c("Pinye", "Pine", "Oalck", "Sunset", "Riverbend", "Greenfild")
the_names <-
  c("Piney", "Pine", "Oak", "Cheshire", "Greenfield", "Maple", "Elm")
matches <- fuzzy_match(my_names, the_names, osa_max_dist = 1)
matches

lapply(matches, \(i) the_names[i])

x <- as_addr(voter_addresses()[1:100])@street@name
y <- unique(nad_example_data()$nad_addr@street@name)
system.time(fuzzy_match(x, y))
# larger vectors see a speedup when using
# phonetic_street_key as a prefilter
# but may miss potential matches that are within
# osa_max_dist of each other, but did not have
# identical phonetic codes (e.g., "woolper" and "woopler")
system.time(fuzzy_match(x, y, prefilter = "psk"))

Geocode addr vectors with Census TIGER address features

Description

geocode() geocodes addr vectors using Census TIGER address features (see ?taf) by:

  1. searching for a matching street (see ?match_addr_street) within the same ZIP code and, if necessary, within similar ZIP codes

  2. using the address number to select the best address feature range and side of the street (even/odd), breaking ties on smallest width and spread

  3. linearly interpolating a geographic point along the best range line based on the actual and potential range of address numbers

  4. offsetting the interpolated point from the range line perpendicularly
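
Steps 3 and 4 can be sketched with simple planar geometry. The example below is an illustration only: it assumes a single straight segment with coordinates in meters and a hypothetical helper interpolate_point(), whereas geocode() works with s2 geographies and real range lines.

```r
# hypothetical helper: place `number` along the segment p0 -> p1 in
# proportion to its position in the address range, then shift the point
# `offset` meters perpendicular to the segment
interpolate_point <- function(number, from, to, p0, p1,
                              offset = 10, side = c("left", "right")) {
  side <- match.arg(side)
  # fraction of the way along the range; midpoint for a one-number range
  frac <- if (to == from) 0.5 else (number - from) / (to - from)
  frac <- min(max(frac, 0), 1)
  direction <- p1 - p0
  on_line <- p0 + frac * direction
  # unit normal pointing to the left of the direction of travel
  normal <- c(-direction[2], direction[1]) / sqrt(sum(direction^2))
  if (side == "right") normal <- -normal
  on_line + offset * normal
}

# number 150 in the range 100-198 on a 98 m segment along the x axis
left_pt <- interpolate_point(150, from = 100, to = 198,
                             p0 = c(0, 0), p1 = c(98, 0))
right_pt <- interpolate_point(150, from = 100, to = 198,
                              p0 = c(0, 0), p1 = c(98, 0), side = "right")
```

The offset keeps the point off the street centerline, on the side of the street implied by the matched range.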

Only matched input addresses return non-missing matched ZIP code and street values. Missing or unmatched ZIP codes return missing matched ZIP code, street, geography, and s2 cell values. If all ranges on the matched ZIP code and street exclude the address number, only the geography and s2 cell values return NA.

Usage

geocode(
  x,
  name_phonetic_dist = 1L,
  name_fuzzy_dist = 2L,
  match_street_type = c("exact", "compatible", "ignore"),
  match_street_directional = c("exact", "swap", "ignore"),
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
  year = as.character(2025:2011),
  version = "v1",
  taf_install = TRUE,
  taf_redownload = FALSE,
  offset = 10L,
  progress = interactive()
)

geocode_zip(
  x,
  offset = 10L,
  name_phonetic_dist = 1L,
  name_fuzzy_dist = 2L,
  match_street_type = c("exact", "compatible", "ignore"),
  match_street_directional = c("exact", "swap", "ignore"),
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
  year = as.character(2025:2011),
  version = "v1",
  taf_install = TRUE,
  taf_redownload = FALSE,
  progress_callback = NULL,
  taf_check = TRUE
)

Arguments

x

an addr vector (?as_addr)

name_phonetic_dist

integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match

name_fuzzy_dist

integer; maximum optimized string alignment distance between @name of x and y to consider a possible match

match_street_type

character; how to compare street pretype and posttype when selecting street candidates. "exact" requires pretype to match pretype and posttype to match posttype; "compatible" treats blank type fields as unknown but rejects candidates when known type information conflicts; "ignore" does not use street type fields when selecting candidates.

match_street_directional

character; how to compare street predirectional and postdirectional when selecting street candidates. "exact" requires predirectional to match predirectional and postdirectional to match postdirectional; "swap" also permits predirectional to match postdirectional and postdirectional to match predirectional; "ignore" does not use street directional fields when selecting candidates.

zip_variants

logical; also try common ZIP code variants of x (see zip_variant) when searching for matching streets?

zip_variant

character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant

year

character, length one; vintage of TIGER addrfeat (address feature) files, given as a year string such as "2025"

version

character, length one; major version of the package and taf dataset schema

taf_install

logical; install missing county TAF files needed for input ZIP codes and selected ZIP code variants before geocoding? If FALSE, geocoding proceeds with installed files only and warns when needed county files are missing.

taf_redownload

logical; re-download cached TIGER ZIP files when installing missing TAF counties?

offset

number of meters to offset geocode from street line

progress

logical; show a ZIP-code progress bar while geocoding?

progress_callback

optional callback used internally by geocode() to update progress after ZIP-code reference data is loaded

taf_check

logical; check for missing TAF counties? Used internally by geocode() after checking once for the full input vector.

Details

geocode_zip() is the workhorse function and operates on addr vectors with the same ZIP code; use geocode() to geocode an addr vector with multiple ZIP codes by grouping them by ZIP code and processing serially by default. At a lower level, grouping addr vectors by ZIP code and applying geocode_zip() facilitates more control (e.g., parallel processing).
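
The group-by-ZIP pattern described above can be sketched in base R. Here geocode_zip_stub() is a hypothetical stand-in for the per-ZIP work (it only echoes its inputs), and plain character ZIP codes are used instead of an addr vector; the point is the split, apply, and reassemble structure.

```r
zipcode <- c("45220", "45229", "45220", NA, "45229")

# hypothetical stand-in for the per-ZIP workhorse
geocode_zip_stub <- function(rows, zip) {
  data.frame(row = rows, matched_zipcode = zip, stringsAsFactors = FALSE)
}

# split row positions by ZIP code; split() drops rows with a missing ZIP
groups <- split(seq_along(zipcode), zipcode)

# this step is where a parallel map could be substituted
per_zip <- Map(geocode_zip_stub, groups, names(groups))
out <- do.call(rbind, per_zip)

# add back rows that had no ZIP code as unmatched
missing_rows <- setdiff(seq_along(zipcode), out$row)
out <- rbind(
  out,
  data.frame(
    row = missing_rows,
    matched_zipcode = rep(NA_character_, length(missing_rows)),
    stringsAsFactors = FALSE
  )
)

# restore the input order
out <- out[order(out$row), ]
rownames(out) <- NULL
out
```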

If the mirai package is installed and mirai daemons have already been configured by the caller, geocode() uses them for ZIP-code-level parallel processing. Otherwise it falls back to sequential processing.

geocode() and geocode_zip() both download and install TIGER address features by county (?taf_install) as needed based on the input addr ZIP codes (and possibly ZIP code variants). TAF install checks run before reading TAF ZIP files so parallel geocoding workers do not try to download county files at the same time.

Value

A tibble with columns addr (the input addr vector), matched_zipcode (character vector), matched_street (addr_street vector), matched_geography (s2_geography point vector), and s2_cell (s2_cell vector).

Examples

x <- as_addr(voter_addresses()[1:100])

# for example purposes, only install one county
Sys.setenv("R_USER_DATA_DIR" = tempfile())
taf_install("39061", "2025")
# and geocode without installing other counties
gcd <- geocode(x, taf_install = FALSE)

# restricting the installed counties is only for example purposes and usually not required; e.g.

## Not run: 
  gcd <- geocode(x)

## End(Not run)

gcd

table(geocode_stage(gcd))

geocode_table(gcd)

leaflet::leaflet(wk::wk_coords(gcd$matched_geography)) |>
  leaflet::addTiles() |>
  leaflet::addCircleMarkers(lng = ~x, lat = ~y, label = ~feature_id)

# use mirai for parallel processing
## Not run: 
  mirai::daemons(2)
  geocode(x)
  mirai::daemons(0)

## End(Not run)

Classify geocode stage

Description

Classify geocode results into staged outcomes returned by geocode(): no match, street match, or interpolated range match, distinguishing exact ZIP-code matches from ZIP-code variant matches.

Usage

geocode_stage(x)

Arguments

x

a data frame returned by geocode()

Value

an ordered factor with levels none, street_variant, street, range_variant, range


Convert geocode objects to JSON-safe tables

Description

geocode_table() converts the rich output from geocode() to a flat table with only JSON-safe column types.

Usage

geocode_table(x)

Arguments

x

a data frame returned by geocode()

Value

A tibble with atomic columns suitable for JSON serialization. geocode_table() includes the input address, geocode stage, matched ZIP code, matched street, and S2 cell as character columns.


Match addr_number vectors

Description

A single addr_number in y is chosen for each addr_number in x. If exact matches (using as.character) are not found, possible matches within number_fuzzy_dist are searched for in y. If multiple matches are present in y, the selected match has the lowest absolute numeric difference from ⁠@digits⁠ in x; ties are broken by optimized string alignment (OSA) distance and then by lexicographic order with digits preceding alphabetic characters.

addr_number objects with missing ⁠@digits⁠, or with empty strings for all of ⁠@prefix⁠, ⁠@digits⁠, and ⁠@suffix⁠, are not matched and returned as missing instead.
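
The selection rule above can be approximated in base R. This sketch makes two labeled substitutions: it works on plain digit strings rather than addr_number vectors, and it uses utils::adist() (generalized Levenshtein distance) in place of OSA distance, so adjacent transpositions such as "12" and "21" count as two edits instead of one. The helper pick_number() is hypothetical.

```r
# hypothetical helper: choose one candidate in y for a single number x
pick_number <- function(x, y, max_dist = 1L) {
  if (is.na(x)) return(NA_character_)
  if (x %in% y) return(x) # exact match wins
  d <- drop(utils::adist(x, y)) # Levenshtein stand-in for OSA
  keep <- which(d <= max_dist)
  if (length(keep) == 0) return(NA_character_)
  num_diff <- abs(as.numeric(y[keep]) - as.numeric(x))
  # smallest numeric difference, then string distance, then sort order
  y[keep][order(num_diff, d[keep], y[keep])[1]]
}

y <- c("12", "11", "10", "22")
picked <- vapply(
  c("1", "10", "228", "99897", NA),
  pick_number, character(1),
  y = y, USE.NAMES = FALSE
)
picked
```

Ranking by numeric difference first means that, among equally plausible typos, the candidate closest on the street is preferred.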

Usage

match_addr_number(x, y, number_fuzzy_dist = 1L)

Arguments

x, y

addr_number vectors to match

number_fuzzy_dist

integer; maximum optimized string alignment distance between addr_number strings in x and y to consider a possible match.

Value

An addr_number vector, the same length as x, containing the selected match in y for each element of x. Unmatched elements are returned as missing addr_number() values.

Examples

x <- addr_number(
  prefix = "",
  digits = as.character(c(1, 10, 228, 11, 22, 22, 22, 10, 99897, NA)),
  suffix = ""
)

y <- addr_number(
  prefix = "",
  digits = as.character(c(12, 11, 10, 22)),
  suffix = ""
)

match_addr_number(x, y)

match_addr_number(x, y, number_fuzzy_dist = 0L)

Match addr_street vectors

Description

A single addr_street in y is chosen for each addr_street in x. If exact matches (using as.character) are not found, candidate matches are chosen by fuzzy matching on street name (using phonetic street key and street name) and matching the street type and directional components according to match_street_type and match_street_directional. Ordinal street names use restricted phonetic candidates: an ordinal phonetic key like ⁠#0007⁠ may fuzzy match only to plausible ordinal neighbors such as digit shifts (⁠#0070⁠, ⁠#0700⁠, ⁠#7000⁠) or same-width substitutions (⁠#0008⁠, ⁠#0009⁠), not arbitrary OSA-distance-one ordinal keys such as ⁠#0017⁠ or ⁠#0077⁠. If multiple candidates remain after fuzzy matching, the first candidate in y is returned.

addr_street objects with missing or empty @name are not matched and returned as missing instead.

Usage

match_addr_street(
  x,
  y,
  name_phonetic_dist = 1L,
  name_fuzzy_dist = 2L,
  match_street_type = c("exact", "compatible", "ignore"),
  match_street_directional = c("exact", "swap", "ignore")
)

Arguments

x, y

addr_street vectors to match

name_phonetic_dist

integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match

name_fuzzy_dist

integer; maximum optimized string alignment distance between @name of x and y to consider a possible match

match_street_type

character; how to compare street pretype and posttype when selecting street candidates. "exact" requires pretype to match pretype and posttype to match posttype; "compatible" treats blank type fields as unknown but rejects candidates when known type information conflicts; "ignore" does not use street type fields when selecting candidates.

match_street_directional

character; how to compare street predirectional and postdirectional when selecting street candidates. "exact" requires predirectional to match predirectional and postdirectional to match postdirectional; "swap" also permits predirectional to match postdirectional and postdirectional to match predirectional; "ignore" does not use street directional fields when selecting candidates.
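
The three match_street_directional modes can be pictured with a small base R sketch. directional_ok() is a hypothetical helper, not a function in addr, and it compares only the two directional fields of one x street against one y street; the real matcher combines this with the name and type comparisons.

```r
# Hypothetical helper, not part of addr: TRUE when the directional fields of
# one x street are acceptable for one y street under the given mode.
directional_ok <- function(x_pre, x_post, y_pre, y_post,
                           mode = c("exact", "swap", "ignore")) {
  mode <- match.arg(mode)
  same <- x_pre == y_pre && x_post == y_post
  swapped <- x_pre == y_post && x_post == y_pre
  switch(mode,
    exact = same,            # pre must equal pre, post must equal post
    swap = same || swapped,  # pre may also equal post, and post equal pre
    ignore = TRUE            # directional fields are not consulted
  )
}

# "E Main" in x compared with "Main E" in y
directional_ok("E", "", "", "E", mode = "exact")
directional_ok("E", "", "", "E", mode = "swap")
directional_ok("E", "", "", "E", mode = "ignore")
```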

Value

An addr_street vector, the same length as x, containing the selected match in y for each element of x. Unmatched elements are returned as missing addr_street() values.

Examples

my_streets <- addr_street(
  predirectional = "",
  premodifier = "",
  pretype = "",
  name = c("Beechview", "Vivian", "Springfield", "Round Bottom", "Pfeiffer", "Beachview",
           "Vevan", "Srpingfield", "Square Top", "Pfeffer", "Wuhlper", ""),
  posttype = c("Cir", "Pl", "Pike", "Rd", "Rd", "Cir", "Pl", "Pike", "Rd", "Rd", "Ave", ""),
  postdirectional = ""
)
the_streets <- nad_example_data()$nad_addr@street
match_addr_street(my_streets, the_streets)

toggle_y <- addr_street(
  predirectional = c("E", "", "", "E"),
  premodifier = "",
  pretype = c("", "", "US Hwy", "US Hwy"),
  name = c("14th", "Oak", "Main", "Main"),
  posttype = c("St", "Rd", "Rd", "Rd"),
  postdirectional = c("", "", "", "E"),
  map_pretype = FALSE,
  map_posttype = FALSE,
  map_directional = FALSE,
  map_ordinal = FALSE
)

# directionals are required by default, so blank "14th St" stays unmatched
format(match_addr_street(
  addr_street(
    predirectional = "",
    premodifier = "",
    pretype = "",
    name = "14th",
    posttype = "St",
    postdirectional = "",
    map_pretype = FALSE,
    map_posttype = FALSE,
    map_directional = FALSE,
    map_ordinal = FALSE
  ),
  toggle_y
))
format(match_addr_street(
  addr_street(
    predirectional = "",
    premodifier = "",
    pretype = "",
    name = "14th",
    posttype = "St",
    postdirectional = "",
    map_pretype = FALSE,
    map_posttype = FALSE,
    map_directional = FALSE,
    map_ordinal = FALSE
  ),
  toggle_y,
  match_street_directional = "ignore"
))

# type can also be ignored during fuzzy street-name matching
format(match_addr_street(
  addr_street(
    predirectional = "",
    premodifier = "",
    pretype = "",
    name = "Oka",
    posttype = "Ave",
    postdirectional = "",
    map_pretype = FALSE,
    map_posttype = FALSE,
    map_directional = FALSE,
    map_ordinal = FALSE
  ),
  toggle_y,
  name_fuzzy_dist = 1L
))
format(match_addr_street(
  addr_street(
    predirectional = "",
    premodifier = "",
    pretype = "",
    name = "Oka",
    posttype = "Ave",
    postdirectional = "",
    map_pretype = FALSE,
    map_posttype = FALSE,
    map_directional = FALSE,
    map_ordinal = FALSE
  ),
  toggle_y,
  name_fuzzy_dist = 1L,
  match_street_type = "ignore"
))

# compatible type matching allows blanks to stand in for unknown type fields
type_y <- addr_street(
  predirectional = "",
  premodifier = "",
  pretype = c("Ave", "Rd"),
  name = "Main",
  posttype = "",
  postdirectional = "",
  map_pretype = FALSE,
  map_posttype = FALSE,
  map_directional = FALSE,
  map_ordinal = FALSE
)
format(match_addr_street(
  addr_street(
    predirectional = "",
    premodifier = "",
    pretype = "",
    name = "Main",
    posttype = "Ave",
    postdirectional = "",
    map_pretype = FALSE,
    map_posttype = FALSE,
    map_directional = FALSE,
    map_ordinal = FALSE
  ),
  type_y,
  match_street_type = "compatible"
))

# type and directional matching can be relaxed independently
format(match_addr_street(
  addr_street(
    predirectional = "E",
    premodifier = "",
    pretype = "US Hwy",
    name = "Mian",
    posttype = "Rd",
    postdirectional = "E",
    map_pretype = FALSE,
    map_posttype = FALSE,
    map_directional = FALSE,
    map_ordinal = FALSE
  ),
  toggle_y,
  match_street_type = "ignore",
  name_fuzzy_dist = 1L
))
format(match_addr_street(
  addr_street(
    predirectional = "E",
    premodifier = "",
    pretype = "US Hwy",
    name = "Mian",
    posttype = "Rd",
    postdirectional = "E",
    map_pretype = FALSE,
    map_posttype = FALSE,
    map_directional = FALSE,
    map_ordinal = FALSE
  ),
  toggle_y,
  name_fuzzy_dist = 1L
))
format(match_addr_street(
  addr_street(
    predirectional = "E",
    premodifier = "",
    pretype = "US Hwy",
    name = "Mian",
    posttype = "Rd",
    postdirectional = "E",
    map_pretype = FALSE,
    map_posttype = FALSE,
    map_directional = FALSE,
    map_ordinal = FALSE
  ),
  toggle_y,
  name_fuzzy_dist = 1L,
  match_street_type = "exact",
  match_street_directional = "exact"
))

Match ZIP codes

Description

A single ZIP code in y is chosen for each ZIP code in x. By default, if exact matches are not found, common variants of ZIP codes in x are searched for in y (see ?zipcode_variant). If multiple variants are present in y, the selected match has the lowest absolute numeric difference from the ZIP code in x; ties are broken by OSA string distance and then by the minimum number.
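
The selection rule among several variants can be sketched in base R. pick_zip() is a hypothetical helper, not the package's code, and it assumes the candidates passed in are already the variants of x found in y. utils::adist() computes a generalized Levenshtein distance and is used here only as a stand-in for OSA distance, so results could differ when a transposition is involved.

```r
# Hypothetical helper: order candidates by absolute numeric difference, then
# by edit distance (Levenshtein stand-in for OSA), then by numeric value.
pick_zip <- function(x, candidates) {
  num_diff <- abs(as.integer(candidates) - as.integer(x))
  edit_dist <- as.vector(utils::adist(x, candidates))
  candidates[order(num_diff, edit_dist, as.integer(candidates))][1]
}

# 45219 and 45221 are both one away numerically; 45221 differs from 45220 in
# a single character, while 45219 differs in two.
pick_zip("45220", c("45200", "45219", "45221", "45223"))
```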

Usage

match_zipcodes(
  x,
  y,
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap")
)

Arguments

x, y

character vectors of ZIP codes to match

zip_variants

logical; fuzzy match to common variants of x in y?

zip_variant

character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant

Value

A character vector, the same length as x, containing the selected match in y for each ZIP code in x.

Examples

match_zipcodes(
  c("45222", "45219", "45219", "45220", "45220", "", NA),
  c("42522", "45200", "45219", "45221", "45223", "45321", "")
)

match_zipcodes(
  c("45222", "45219", "45219", "45220", "45220", "", NA),
  c("42522", "45200", "45219", "45221", "45223", "45321", ""),
  zip_variants = FALSE
)

Read National Address Database (NAD) tables into R

Description

The U.S. Department of Transportation partners with address programs from state, local, and tribal governments to compile their authoritative data into a database. Find more information here: https://www.transportation.gov/gis/national-address-database

nad_read() reads data from the NAD geodatabase by county, using source data already downloaded with nad_download() or downloading it when refresh_source = "yes", and readies it for R. Counties can be identified either by county name plus state, or by a 5-digit county FIPS identifier. County names and state abbreviations are resolved internally and still determine the cache path and source query. The NAD geodatabase has a very large size on disk (~10 GB).

Data binaries are the cached outputs of nad_read() for each county and state; they are created the first time nad() is run for that county. Download data binaries to the tools::R_user_dir() data directory, or point R to these files on disk, to read NAD tables without downloading the nationwide NAD geodatabase. (Files are organized by major package version, NAD version, and state, and are named by county; e.g., see list.files(tools::R_user_dir("addr", "data"), recursive = TRUE))

Usage

nad(
  county,
  state = NULL,
  version = 22L,
  refresh_binary = c("yes", "no", "force"),
  refresh_source = c("no", "yes", "force")
)

nad_read(
  county,
  state = NULL,
  version = 22L,
  refresh_source = c("no", "yes", "force")
)

nad_download(version = 22L, refresh_source = c("yes", "no", "force"))

Arguments

county

character, length one; county name or 5-digit county FIPS identifier

state

character, length one; name or abbreviation of state. Required when county is a county name; ignored when county is a 5-digit county FIPS identifier

version

integer, length one; NAD revision to use. Defaults to 22L, revision 22 of the National Address Database.

refresh_binary

character, length one; how to refresh the NAD data binary cached on disk: "yes" creates the data binary if it is not already present, "no" errors if the data binary is not already present, and "force" creates the data binary and overwrites any existing data binary

refresh_source

character, length one; how to refresh the NAD source geodatabase on disk: "yes" downloads the geodatabase if it is not already present, "no" errors if the file does not already exist, and "force" downloads the geodatabase and overwrites any existing one
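
The "yes" / "no" / "force" semantics shared by refresh_binary and refresh_source can be illustrated with a generic base R sketch. refresh_cached() and write_stamp() are hypothetical names introduced for illustration, and the cached file here is an ordinary temporary text file rather than a NAD binary or geodatabase.

```r
# Hypothetical helper: `build` is any function that writes a file to `path`.
refresh_cached <- function(path, build, refresh = c("yes", "no", "force")) {
  refresh <- match.arg(refresh)
  present <- file.exists(path)
  if (refresh == "no" && !present) {
    stop("cached file not found: ", path)
  }
  if (refresh == "force" || (refresh == "yes" && !present)) {
    build(path)
  }
  invisible(path)
}

cache_file <- tempfile(fileext = ".txt")
write_stamp <- function(path) writeLines(format(Sys.time()), path)

refresh_cached(cache_file, write_stamp, "yes")   # file absent, so it is created
refresh_cached(cache_file, write_stamp, "yes")   # file present, nothing rebuilt
refresh_cached(cache_file, write_stamp, "force") # rebuilt and overwritten
refresh_cached(cache_file, write_stamp, "no")    # file present, used as-is
```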

Details

NAD source geodatabases are downloaded from the transportation.gov data portal: https://data.transportation.gov/d/yw36-suxr

Downloads use the R curl package and resume from any interrupted partial download left in the addr user data directory. If the download cannot complete, nad_download() will also work with a NAD ZIP file that was downloaded another way and placed where tools::R_user_dir("addr", "data") can find it.

For the original schema, see https://www.transportation.gov/sites/dot.gov/files/2023-07/NAD_Schema_202304.pdf

Before downloading, please read the disclaimer here: https://www.transportation.gov/mission/open/gis/national-address-database/national-address-database-nad-disclaimer

Investigate individual address points in the online viewer: https://usdot.maps.arcgis.com/apps/instant/portfolio/index.html?appid=59f7e4fb71994d13b61f424e21a6cffe

The NAD does not distinguish between empty and missing address components. When reading into R, all missing address components are replaced with an empty string ("") except for address number (digits), street name, and ZIP code. Addresses with malformed ZIP codes are removed.
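
As a rough picture of the empty-string rule, the base R sketch below fills missing values in every column except the three that keep NA. The column names and values in toy are made up for illustration and are not the NAD schema or the columns returned by nad(); the removal of malformed ZIP codes is not shown.

```r
# Toy table with hypothetical column names; values are made up.
toy <- data.frame(
  number_digits = c("3333", NA),
  street_predirectional = c(NA, "E"),
  street_name = c("BURNET", NA),
  street_posttype = c("AVE", NA),
  zipcode = c("45219", NA),
  stringsAsFactors = FALSE
)

# Number digits, street name, and ZIP code keep NA; all others become "".
keep_missing <- c("number_digits", "street_name", "zipcode")
fill_cols <- setdiff(names(toy), keep_missing)
toy[fill_cols] <- lapply(toy[fill_cols], function(col) {
  col[is.na(col)] <- ""
  col
})
toy
```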

Examples

# explicitly download source data, then cache county output on first read
## Not run: 
  nad_download(version = 22L)
  nad("Butler", "OH")
  nad("39017")

## End(Not run)

# example data preloaded for Hamilton County, OH
# works without downloading NAD gdb first
Sys.setenv(R_USER_DATA_DIR = tempfile())
nad("Hamilton", "OH", refresh_source = "no", refresh_binary = "no")
nad("39061", refresh_source = "no", refresh_binary = "no")

Example National Address Database addresses

Description

An example of the data returned using nad() for Hamilton County, Ohio (NAD version 22L). See ?nad for more information about the National Address Database.

nad("Hamilton", "OH", refresh_source = "no", refresh_binary = "no") and nad("39061", refresh_source = "no", refresh_binary = "no") are equivalent to nad_example_data().

Usage

nad_example_data(match_prepared = FALSE)

Arguments

match_prepared

logical; return the example data preprocessed with addr_match_prepare()?

Value

If match_prepared = FALSE, a tibble with 349,407 rows and 7 columns. If match_prepared = TRUE, an addr_match_index.

Examples

nad_example_data()
nad_example_data(match_prepared = TRUE)

Convert street names into phonetic matching keys

Description

Ordinal street names (e.g., "11TH", "5TH") are encoded as zero-padded numeric identifiers with a special prefix, while non-ordinal street names are encoded using a Soundex phonetic code (see ?stringdist::phonetic). Ordinal words (e.g., "Eleventh", "Fifth") are detected and converted automatically. Each phonetic key is exactly four characters long.

Usage

phonetic_street_key(x)

Arguments

x

character vector

Value

character vector

Examples

phonetic_street_key(
  c("MEADOWLARK", "TOWNSEND", "IMMACULATE", "7TH", "WERK",
    "PAXTON", "5th", "BURNET", "FIFTH", "CLIFTON")
)

Launch the address parsing and matching explorer

Description

Opens a Shiny app that shows how an input address is tagged with tag_usaddress(), normalized by as_addr(), and then matched in stages against nad_example_data().

Usage

run_addr_explorer(launch.browser = interactive())

Arguments

launch.browser

logical; passed to shiny::runApp()

Value

Invisibly returns the result of shiny::runApp()

Examples

## Not run: 
  run_addr_explorer()

## End(Not run)

Launch the address geocoding explorer

Description

Opens a minimal Shiny app that geocodes one typed address with geocode() and maps the result with leaflet.

Usage

run_geocode_explorer(launch.browser = interactive())

Arguments

launch.browser

logical; passed to shiny::runApp()

Value

Invisibly returns the result of shiny::runApp()

Examples

## Not run: 
  run_geocode_explorer()

## End(Not run)

TIGER Address Features dataset

Description

taf() uses the arrow package to open the hive-partitioned parquet dataset of TIGER address features in the addr user data directory. Arrow FileSystemDataset objects are database-like backends for larger-than-memory datasets and support dplyr syntax for data manipulation; see https://arrow.apache.org/docs/r/articles/data_wrangling.html. Other TAF helpers such as taf_catalog(), taf_install(), and taf_zip() use nanoparquet directly for flat parquet file reads and writes. Arrow is only required for the advanced dataset interface returned by taf().

Usage

taf(year = as.character(2025:2011), version = "v1")

taf_install(
  county,
  year = as.character(2025:2011),
  version = "v1",
  overwrite = FALSE,
  redownload = FALSE
)

Arguments

year

character, length one; vintage of TIGER addrfeat (address feature) files

version

character, length one; major version of the package and taf dataset schema

county

character, length 1; county FIPS code

overwrite

logical, length 1; overwrite an existing county install?

redownload

logical, length 1; re-download cached TIGER ZIP files?

Details

taf_install() downloads and links TIGER address features and feature names for a specific year and county, installing the resulting file in the addr user data directory. About 6% of ADDRFEAT rows do not have a county-local primary FEATNAMES match by LINEARID. In these cases, street tags are parsed from the ADDRFEAT full name, and the street_tag_parsed column is set to TRUE.

Value

a Dataset R6 object (see ?arrow::open_dataset); use dplyr verbs to query the data and get results, see examples

Examples

Sys.setenv("R_USER_DATA_DIR" = tempfile())
taf_install("39061", "2025")

taf()

# use dplyr verbs to query
library(dplyr, warn.conflicts = FALSE)

# find top ten most frequent street name-posttype combinations
taf() |>
  group_by(street_name, street_posttype) |>
  summarize(
    n_zips = n_distinct(ZIP),
    n_ranges = n(),
    .groups = "drop"
  ) |>
  arrange(desc(n_zips), desc(n_ranges)) |>
  collect() |>
  slice(1:10)

Read TIGER address feature ZIP/county catalog

Description

taf_catalog() reads a TIGER-derived catalog of ZIP codes present in each county's TIGER address feature file for a specific year and addr TAF schema version. The catalog is installed with the package and is used to plan which county TAF files may be needed for a set of ZIP codes. It is separate from the local install manifest, which records only files installed on the current machine.

Usage

taf_catalog(year = as.character(2025:2011), version = "v1")

Arguments

year

character, length one; vintage of TIGER addrfeat (address feature) files

version

character, length one; major version of the package and taf dataset schema

Value

a tibble with county_fips, ZIP, zip3, zip2, and n_ranges columns

Examples

taf_catalog("2025")

Find and install TAF counties needed for ZIP codes

Description

taf_needed_counties() uses taf_catalog() to identify county TAF files that may contain address ranges for ZIP codes in x, including selected ZIP code variants when requested. taf_ensure() installs any of those counties that are not already present in the local TAF manifest.
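
The planning step can be pictured as a lookup against the catalog followed by a set difference against what is already installed. The base R sketch below uses a toy catalog with placeholder FIPS and ZIP values, not real catalog rows; the object names are hypothetical and ZIP code variants are left out.

```r
# Toy catalog and manifest; the FIPS/ZIP pairs are placeholders.
catalog <- data.frame(
  county_fips = c("00001", "00001", "00002", "00003"),
  ZIP = c("11111", "11112", "11112", "22222"),
  stringsAsFactors = FALSE
)
installed <- c("00001") # counties already in the local manifest
zips <- c("11112", "33333")

# Counties whose TAF file may contain ranges for the requested ZIP codes
needed <- unique(catalog$county_fips[catalog$ZIP %in% zips])
needed

# Counties that would still have to be installed
missing_counties <- setdiff(needed, installed)
missing_counties
```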

Usage

taf_needed_counties(
  x,
  year = as.character(2025:2011),
  version = "v1",
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap")
)

taf_ensure(
  x,
  year = as.character(2025:2011),
  version = "v1",
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
  redownload = FALSE
)

Arguments

x

an addr vector (?as_addr) or character vector of ZIP codes

year

character, length one; vintage of TIGER addrfeat (address feature) files

version

character, length one; major version of the package and taf dataset schema

zip_variants

logical; also include common variants of the ZIP codes in x?

zip_variant

character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant

redownload

logical, length 1; re-download cached TIGER ZIP files?

Value

taf_needed_counties() returns a tibble with catalog columns plus source_zip and source_zip_variant. taf_ensure() invisibly returns the subset of needed counties that were missing before installation.

Examples

taf_needed_counties(as_addr("10 MAIN ST CINCINNATI OH 45220"))

Read taf() data for ZIP codes across all installed counties

Description

taf_zip() reads and transforms taf() data for a subset of ZIP codes. It reconstructs the county_fips, s2_geography, and addr_street vectors in the returned data frame.

Usage

taf_zip(x, map = TRUE, year = as.character(2025:2011), version = "v1")

Arguments

x

character vector of five-digit ZIP codes

map

logical, length 1; map street tags read from taf() data (type, directional, ordinal) when converting to addr_street() vector?

year

character, length 1; vintage of TIGER addrfeat (address feature) files

version

character, length 1; major version of the package and taf dataset schema

Value

a tibble with LINEARID, FULLNAME, side, ZIP, FROMHN, TOHN, PARITY, OFFSET, s2_geography, addr_street, county_fips, and street_tag_parsed columns

Examples

Sys.setenv("R_USER_DATA_DIR" = tempfile())
taf_install("39061", "2025")
taf_zip(c("45249", "45230", "45220"))

Tag US addresses

Description

Addresses are tagged using the usaddress conditional random field model in a Rust port of usaddress. Possible address labels include:

  • AddressNumberPrefix

  • AddressNumberSuffix

  • AddressNumber

  • BuildingName

  • CornerOf

  • IntersectionSeparator

  • LandmarkName

  • NotAddress

  • OccupancyIdentifier

  • OccupancyType

  • PlaceName

  • Recipient

  • StateName

  • StreetNamePostDirectional

  • StreetNamePostType

  • StreetNamePreDirectional

  • StreetNamePreModifier

  • StreetNamePreType

  • StreetName

  • SubaddressIdentifier

  • SubaddressType

  • USPSBoxGroupID

  • USPSBoxGroupType

  • USPSBoxID

  • USPSBoxType

  • ZipCode

Find more information about the definitions at https://www.fgdc.gov/standards/projects/address-data

Usage

tag_usaddress(x = NA_character_, clean = TRUE)

Arguments

x

character vector of addresses

clean

logical; clean address text with clean_address_text() before tagging?

Value

a list of vectors of named address tags

Examples

tag_usaddress(
  c("290 Ludlow Avenue Apt 2 Cincinnati OH 45220",
  "3333 Burnet Ave Cincinnati Ohio 45219",
  "120 North Main Street, Greenville, SC 29601",
  "200 Southwest North Street, Topeka, KS 66603",
  "215 Highway 88 Road, Jackson, CA 95642"
  )
)

# edge cases!
tag_usaddress(
  c(
    "1600 Pennsylvania Avenue NW, Washington, DC 20500", # post-directional quadrant
    "1 Infinite Loop, Cupertino, CA 95014", # corporate campus street name
    "210 East 400 South, Salt Lake City, UT 84111", # grid addressing (Utah)
    "N6W23001 Bluemound Road, Wauwatosa, WI 53226", # address number prefix grid (Wisconsin)
    "350 Fifth Avenue, New York, NY 10118", # ordinal street name
    "4059 Mt Lee Drive, Hollywood, CA 90068", # abbreviated street element
    "233 South Wacker Drive, Chicago, IL 60606", # pre-directional
    "700 Exposition Park Drive, Los Angeles, CA 90037", # multi-word street name
    "2 South Biscayne Boulevard, Miami, FL 33131" # directional + boulevard
  )
)

Get s2_geography for TIGER street ranges

Description

TIGER address features (street address ranges) are read from compressed addrfeat (address feature) shapefiles for each county and Census vintage. If not already present, compressed addrfeat shapefiles are downloaded from the Census FTP site to the addr user data directory.

When reading into R, the data is converted to one row per street side (L/R) for use by taf_install().
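
The per-side reshaping can be sketched in base R. The wide column names below (LFROMHN, LTOHN, RFROMHN, RTOHN, ZIPL, ZIPR) are assumed to follow TIGER/Line ADDRFEAT naming, the values are made up, and one_side() is a hypothetical helper. PARITY, OFFSET, and the geometry column are omitted for brevity.

```r
# Toy wide table: one row per street segment, with left and right ranges.
wide <- data.frame(
  LINEARID = c("A1", "B2"),
  FULLNAME = c("Main St", "Oak Rd"),
  LFROMHN = c("1", "100"), LTOHN = c("99", "198"),
  RFROMHN = c("2", "101"), RTOHN = c("98", "199"),
  ZIPL = c("11111", "11112"), ZIPR = c("11111", "11112"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull the columns for one side ("L" or "R").
one_side <- function(d, side) {
  data.frame(
    LINEARID = d$LINEARID,
    FULLNAME = d$FULLNAME,
    side = side,
    ZIP = d[[paste0("ZIP", side)]],
    FROMHN = d[[paste0(side, "FROMHN")]],
    TOHN = d[[paste0(side, "TOHN")]],
    stringsAsFactors = FALSE
  )
}

# One row per street side
long <- rbind(one_side(wide, "L"), one_side(wide, "R"))
long
```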

Usage

tiger_addr_feat(county, year, redownload = FALSE)

Arguments

county

character string of county FIPS identifier

year

character year of the Census TIGER/Line product

redownload

logical, length 1; re-download the cached TIGER ZIP file?

Value

a tibble with LINEARID, FULLNAME, side, ZIP, FROMHN, TOHN, PARITY, OFFSET, and s2_geography columns

Examples

tiger_addr_feat("39061", "2025")

Get names for TIGER street ranges

Description

TIGER primary feature names are read from compressed feature-name databases for each county and Census vintage. If not already present, the compressed feature-name files are downloaded from the Census FTP site to the addr user data directory.

When reading into R, the data is filtered to addressable MTFCCs (S1100, S1200, S1400, S1640) that have a name.

Usage

tiger_feat_names(county, year, redownload = FALSE)

Arguments

county

character string of county FIPS identifier

year

character year of the Census TIGER/Line product

redownload

logical, length 1; re-download the cached TIGER ZIP file?

Value

a tibble with unique LINEARID and addr columns

Examples

tiger_feat_names("39061", "2025")

Example addresses

Description

voter_addresses() returns an example character vector of real-world addresses downloaded from the Hamilton County, Ohio voter registration database on 2024-09-12. AddressPreDirectional, AddressNumber, AddressStreet, AddressSuffix, CityName, "OH", and AddressZip were pasted together to create 242,133 unique registered-voter addresses.

Usage

voter_addresses()

Value

a character vector

Examples

voter_addresses() |>
  head()

Create ZIP code variants

Description

An input ZIP code is used to generate variants (for, e.g., 45220):

  • minus1: subtracting one from the ZIP code (45219)

  • plus1: adding one to the ZIP code (45221)

  • sub5: substituting the fifth digit of the ZIP code (45221, 45222, 45223, 45224, 45225, 45226, 45227, 45228, 45229)

  • sub4: substituting the fourth digit of the ZIP code (45200, 45210, 45230, 45240, 45250, 45260, 45270, 45280, 45290)

  • swap: swapping the second and third digits of the ZIP code (42520)

More than one variant type can be created at once and variants will be returned in the same order as they were requested (see examples).
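
The variant rules listed above are simple enough to sketch in base R. zip_variants_sketch() is a hypothetical stand-in written from the description, not the package's zipcode_variant(); in particular, it does not remove a value that appears under more than one variant type (for 45220, 45221 is both plus1 and sub5).

```r
# Hypothetical re-implementation of the variant rules, for illustration only.
zip_variants_sketch <- function(zip,
                                variant = c("minus1", "plus1", "sub5", "sub4", "swap")) {
  variant <- match.arg(variant, several.ok = TRUE)
  digits <- strsplit(zip, "")[[1]]

  # every ZIP code that differs from `zip` only at position `pos`
  substitute_digit <- function(pos) {
    replacements <- setdiff(as.character(0:9), digits[pos])
    vapply(replacements, function(d) {
      out <- digits
      out[pos] <- d
      paste(out, collapse = "")
    }, character(1), USE.NAMES = FALSE)
  }

  makers <- list(
    minus1 = function() sprintf("%05d", as.integer(zip) - 1L),
    plus1 = function() sprintf("%05d", as.integer(zip) + 1L),
    sub5 = function() substitute_digit(5),
    sub4 = function() substitute_digit(4),
    swap = function() paste(digits[c(1, 3, 2, 4, 5)], collapse = "")
  )

  # variants come back in the order the types were requested
  unlist(lapply(variant, function(v) makers[[v]]()), use.names = FALSE)
}

zip_variants_sketch("45220", c("plus1", "minus1"))
zip_variants_sketch("45220", "sub4")
zip_variants_sketch("45220", "swap")
```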

Usage

zipcode_variant(x, variant = c("minus1", "plus1", "sub5", "sub4", "swap"))

Arguments

x

character, length one; five-digit ZIP code

variant

character; one or more variant types to create; see Description

Value

character vector of five digit ZIP code variants

Examples

zipcode_variant("45220")

# order matters!
zipcode_variant("45220", c("minus1", "plus1"))
zipcode_variant("45220", c("plus1", "minus1"))

zipcode_variant("45220", "sub5")

zipcode_variant("45220", "swap")