Title: | Convert real-world street addresses to county parcel identifiers |
---|---|
Description: | Functions in parcel include cleaning, parsing, and creating shortened 'address stubs' to match real-world addresses to county-provided addresses with known parcel identifiers. |
Authors: | Cole Brokamp [aut, cre] |
Maintainer: | Cole Brokamp <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.11.1 |
Built: | 2024-11-09 04:38:57 UTC |
Source: | https://github.com/geomarker-io/parcel |
Convert addresses to lowercase and remove non-alphanumeric characters and excess whitespace (adapted from degauss-org/dht::clean_address).
clean_address(.x)
.x | a vector of address character strings |
a vector of cleaned addresses
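The cleaning steps described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `clean_address_sketch` is a hypothetical name, and deleting punctuation outright (rather than replacing it with a space) is an assumption.

```r
# Hypothetical base-R sketch of the described cleaning steps;
# not the code used by clean_address() itself.
clean_address_sketch <- function(x) {
  x <- tolower(x)                           # convert to lowercase
  x <- gsub("[^a-z0-9[:space:]]", "", x)    # drop non-alphanumeric characters
  x <- gsub("[[:space:]]+", " ", x)         # collapse runs of whitespace
  trimws(x)                                 # strip leading/trailing whitespace
}

clean_address_sketch("  3333  Burnet Ave.,\tCincinnati, OH 45229 ")
# "3333 burnet ave cincinnati oh 45229"
```

Missing values pass through unchanged, because `tolower()` and `gsub()` both return `NA` for `NA` input.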
Input addresses are tagged into components, and the street_number and street_name components are pasted together to create the address stub. If either the street_number or the street_name is missing, then the address stub will be returned as missing. If filter_zip is TRUE, then addresses without a parsed 5-digit ZIP code in Hamilton County will have a missing address stub.
create_address_stub(.x, filter_zip = TRUE, ...)
.x | a vector of address character strings |
filter_zip | force addresses with non-Hamilton ZIP codes to have a missing address_stub? |
... | further arguments passed onto |
a vector of cleaned address stubs (street_number + street_name)
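The stub rules can be sketched on already-parsed components. This is a hedged illustration only: `make_stub_sketch` and its `county_zips` argument are hypothetical, and the two ZIP codes are a small stand-in, not the list the package uses.

```r
# Hypothetical sketch of the stub rules; operates on parsed components
# rather than raw address strings. `county_zips` is an illustrative subset.
make_stub_sketch <- function(street_number, street_name, zip_code,
                             filter_zip = TRUE,
                             county_zips = c("45229", "45202")) {
  # a stub needs both a street number and a street name
  stub <- ifelse(is.na(street_number) | is.na(street_name),
                 NA_character_,
                 paste(street_number, street_name))
  if (filter_zip) {
    # blank out stubs whose ZIP is missing or outside the county list
    stub[is.na(zip_code) | !zip_code %in% county_zips] <- NA_character_
  }
  stub
}

make_stub_sketch(
  street_number = c("3333", NA, "100"),
  street_name   = c("burnet ave", "main st", "elm st"),
  zip_code      = c("45229", "45202", "90210")
)
# "3333 burnet ave" NA NA
```

With `filter_zip = FALSE`, the third address in this example keeps its stub (`"100 elm st"`), while the second is still missing because it has no street number.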
This helper function produces a tibble of parcel data for an input vector of addresses. The link_parcel() function returns all possible matches above the threshold for each input address, and this function chooses the single best match based on the maximum score.
Note that one address can be linked to more than one parcel with the same match score (e.g., "323 Fifth" on https://wedge3.hcauditor.org/search_results). In this case, a special identifier, TIED_MATCHES, is returned instead of a missing parcel_id.
Each address is then checked against known apartment complexes using link_apt(). (Matched apartment complex pseudo-identifiers take precedence over matched parcel identifiers.)
The hamilton_online_parcels tabular data resource is also linked based on parcel_id.
For finer control of selecting matched parcels based on scores, use link_parcel() and link_apt().
get_parcel_data(x)
x | a vector of address character strings |
a tibble with the input_address values defined in x in the first column, and columns corresponding to matched parcel characteristics from CAGIS and the Auditor Online Summary website
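The "single best match" rule, including the TIED_MATCHES case, can be sketched in base R. This is an illustration under assumptions: `pick_best_sketch` is a hypothetical helper, the candidate table is toy data, and the column names `input_address`, `parcel_id`, and `score` are assumed rather than taken from the package source.

```r
# Hypothetical sketch: keep the highest-scoring parcel per address and
# flag ties between different parcels with "TIED_MATCHES".
pick_best_sketch <- function(matches) {
  per_address <- split(matches, matches$input_address)
  best <- lapply(per_address, function(d) {
    top <- d[d$score == max(d$score), , drop = FALSE]
    ids <- unique(top$parcel_id)
    data.frame(
      input_address = top$input_address[1],
      parcel_id = if (length(ids) > 1) "TIED_MATCHES" else ids,
      score = top$score[1],
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, best)
  rownames(out) <- NULL
  out
}

# toy candidate matches (made-up identifiers and scores)
matches <- data.frame(
  input_address = c("a", "a", "b", "b"),
  parcel_id     = c("p1", "p2", "p3", "p4"),
  score         = c(0.9, 0.5, 0.7, 0.7),
  stringsAsFactors = FALSE
)
pick_best_sketch(matches)
```

In this toy input, address `"a"` resolves to `"p1"`, while address `"b"` has two parcels sharing the top score and is therefore labeled `"TIED_MATCHES"`.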
To be matched to an apartment complex pseudo-identifier, an address must contain:
a Hamilton County ZIP code
a street name matching the street names in parcel:::apt_defs
a street number within the ranges for each pseudo-identifier in parcel:::apt_defs
link_apt(x)
x | a single address character string |
apt pseudo-identifier character string; NA if not matched
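The three matching conditions can be sketched as a range lookup. Everything here is hypothetical: the structure of parcel:::apt_defs is not documented in this page, so `apt_defs_sketch`, its column names, and its made-up entries are assumptions used only to show the logic.

```r
# Hypothetical definitions table; the real parcel:::apt_defs may be
# structured differently. Names, streets, and ranges are invented.
apt_defs_sketch <- data.frame(
  apt_id      = c("EXAMPLE_APT_A", "EXAMPLE_APT_B"),
  street_name = c("example st", "sample ave"),
  min_number  = c(100, 2000),
  max_number  = c(198, 2050),
  stringsAsFactors = FALSE
)

# Returns a pseudo-identifier when the ZIP check passes, the street name
# matches, and the street number falls within a defined range; else NA.
link_apt_sketch <- function(street_number, street_name, zip_ok) {
  if (!isTRUE(zip_ok)) return(NA_character_)
  n <- suppressWarnings(as.numeric(street_number))
  if (is.na(n)) return(NA_character_)
  hit <- apt_defs_sketch$street_name == street_name &
    n >= apt_defs_sketch$min_number &
    n <= apt_defs_sketch$max_number
  if (any(hit)) apt_defs_sketch$apt_id[which(hit)[1]] else NA_character_
}

link_apt_sketch("150", "example st", zip_ok = TRUE)   # "EXAMPLE_APT_A"
link_apt_sketch("500", "example st", zip_ok = TRUE)   # NA
```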
This function uses the trained dedupe model included with the package to link one or more parcel identifiers to a vector of input addresses.
link_parcel(x, threshold = 0.2)
x | a vector of address character strings |
threshold | potential matches will only be returned if their |
Note that one address can be linked to more than one parcel (e.g., "323 Fifth" on https://wedge3.hcauditor.org/search_results). In this case, the input address will have multiple rows, one for each of the multiple matches.
a tibble with a column of unique, matched addresses input as x, along with columns for their parcel_id(s) and matching score(s) (use this as a lookup table for assigning parcel_id in other workflows, making decisions about what to do with multiple matches and matching thresholds, etc.)
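One way to use such a lookup table in another workflow is sketched below with base R. The data are toy values, and the column names (`input_address`, `parcel_id`, `score`) are assumed; dropping addresses with several matches is just one possible policy, not a recommendation from the package.

```r
# Toy lookup table shaped like the described return value (assumed columns).
lookup <- data.frame(
  input_address = c("1 example st", "1 example st", "2 sample ave"),
  parcel_id     = c("P-001", "P-002", "P-003"),
  score         = c(0.95, 0.95, 0.80),
  stringsAsFactors = FALSE
)

# Toy workflow data to which parcel identifiers should be attached.
my_data <- data.frame(
  id = 1:3,
  input_address = c("2 sample ave", "1 example st", "9 nowhere rd"),
  stringsAsFactors = FALSE
)

# One possible policy: keep only addresses with exactly one match.
n_matches <- table(lookup$input_address)
single <- names(n_matches)[as.vector(n_matches) == 1]
unambiguous <- lookup[lookup$input_address %in% single, ]

# Left join: unmatched or ambiguous addresses get a missing parcel_id.
joined <- merge(my_data, unambiguous, by = "input_address", all.x = TRUE)
joined
```

Here only `"2 sample ave"` receives a parcel_id; `"1 example st"` is ambiguous and `"9 nowhere rd"` has no match, so both end up missing.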
This function relies on the usaddress Python library (https://usaddress.readthedocs.io/en/latest/).
It can be installed to a Python virtual environment specific to R with:
reticulate::py_install("usaddress", pip = TRUE)
(See the README for more details on installing and managing non-system installations of Python with reticulate.)
tag_address(address, clean = TRUE)
address | a character string that is a United States mailing address |
clean | clean addresses with |
This function uses a custom tag mapping to combine address components into the columns in the returned tibble (see https://usaddress.readthedocs.io/en/latest/#details for full definition of components):
street_number: AddressNumber, AddressNumberPrefix, AddressNumberSuffix
street_name: StreetName, StreetNamePreDirectional, StreetNamePostDirectional, StreetNamePostModifier, StreetNamePostType
city: PlaceName
state: StateName
zip: the first five characters of ZipCode
If an address is not classified as a Street Address (i.e., Intersection, PO Box, or Ambiguous), then the columns in the returned component tibble will all be missing.
a tibble with street_number, street_name, city, state, and zip_code columns
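The tag mapping can be sketched as a small base-R function that collapses tagged pieces into the output columns. This is an assumption-laden illustration: `tag_map` and `combine_tags_sketch` are hypothetical names, the input is a hand-written named vector standing in for usaddress output, and the ZIP column is named zip_code to follow the return-value description.

```r
# Hypothetical mapping from usaddress tags to output columns,
# mirroring the list in the description.
tag_map <- list(
  street_number = c("AddressNumber", "AddressNumberPrefix",
                    "AddressNumberSuffix"),
  street_name   = c("StreetName", "StreetNamePreDirectional",
                    "StreetNamePostDirectional", "StreetNamePostModifier",
                    "StreetNamePostType"),
  city  = "PlaceName",
  state = "StateName"
)

# `tagged` is a named character vector: names are tags, values are tokens.
combine_tags_sketch <- function(tagged) {
  out <- vapply(tag_map, function(tags) {
    parts <- tagged[names(tagged) %in% tags]   # keep original token order
    if (length(parts) == 0) NA_character_ else paste(parts, collapse = " ")
  }, character(1))
  zip <- unname(tagged["ZipCode"])
  # keep only the first five characters of the ZIP code
  out[["zip_code"]] <- if (is.na(zip)) NA_character_ else substr(zip, 1, 5)
  out
}

tagged <- c(AddressNumber = "3333", StreetName = "burnet",
            StreetNamePostType = "ave", PlaceName = "cincinnati",
            StateName = "oh", ZipCode = "45229-3039")
combine_tags_sketch(tagged)
```

For this input the street name becomes `"burnet ave"` and the ZIP+4 value is truncated to `"45229"`.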