MapChi package test

This is an exploration of th MapChi tools to geocode data. The goal is to get zip code and census tract values from the data.

The result is we can get ZIP for about half of the addresses, but it does not return the census tract value at all.

# install.packages("devtools")
# library(devtools)
# install_github("dmwelgus/MapChi")

library(tidyverse)
library(MapChi)

Testing MapChi

Example address file

I downloaded test data from the census bureau for how their geocode works. Note the original does not have a header column, so I added col names so I could see what they are. You do NOT want a header column in the data you send through census_geo.

census_example <- read_csv(
    "data-raw/test-addresses.csv",
    col_names = c("id", "address", "city", "state", "zip"),
    col_types = cols(zip = col_character())
  )
                 
census_example

IIRC, At some point I tested the geocoder without a zip column (cause we don’t have them in gunviolence) and it would not process properly. You have to at least have a blank column there.

Geocode test

This runs the MapChi census_geo function on the census bureau test file to see the expected return. Note that it does NOT return a census tract, so this won’t be able to give us that.

test_geocoded <- census_geo("data-raw/test-addresses.csv")
test_geocoded %>% names

##  [1] "id"        "o_address" "o_city"    "o_state"   "o_zip"    
##  [6] "status"    "quality"   "m_address" "m_city"    "m_state"  
## [11] "m_zip"     "long"      "lat"       "not_sure"  "L_R"

test_geocoded

Note the first column there with all the values. I can’t remove that through select(). I did end up later removing by writing the data frame to csv and then reimporting, which I do below for the tx gun violence data.

Import tx data

Imports the cleaned data.

tx <- read_rds("data-out/01_tx.rds")

Set up columns and export to csv

The census_geo() function expects a csv file with spedific columns, so we create that here and write it out.

tx %>% 
  mutate(
    zip = "",
    tx = "TX"
  ) %>% 
  select(
    id,
    address,
    city,
    tx,
    zip
  ) %>% 
  write_csv("data-out/02_tx_addresses.csv", col_names = F)

Test MapChi on our tx addresses

Runs the geocoder.

tx_addresses <- census_geo("data-out/02_tx_addresses.csv")

The resulting dataframe has a weird first column that is a concatenation of all the fields that I can’t remove through select(), so I’m writing out to csv and then reimporting.

# export
tx_addresses %>% 
  write_csv("data-out/02_tx_addresses_geo.csv")
# import
tx_geo <- read_csv("data-out/02_tx_addresses_geo.csv")

## Parsed with column specification:
## cols(
##   id = col_double(),
##   o_address = col_character(),
##   o_city = col_character(),
##   o_state = col_character(),
##   o_zip = col_logical(),
##   status = col_character(),
##   quality = col_character(),
##   m_address = col_character(),
##   m_city = col_character(),
##   m_state = col_character(),
##   m_zip = col_double(),
##   long = col_double(),
##   lat = col_double(),
##   not_sure = col_double(),
##   L_R = col_character()
## )

Explore the resulting data

How did the geocoder fare?

tx_geo %>% 
  count(status, quality)

We got 92 great records and 40 good ones. 138 were not geocoded.

How are the zips

At least the zips we did get start with 7 as they should. 138 records do not have zip codes.

tx_geo %>% 
  count(m_zip)

Compare matched addresses

Allows us to look at the address in the data vs the address used for the geocoding.

tx_geo %>%
  filter(!is.na(quality)) %>% 
  arrange(quality %>% desc()) %>% 
  select(quality, o_address, m_address)

Join and export

We join our geocoded fields back to the original data in case we want to use it later.

Prepare the geocoded data frame to just have the cols we need joined.

tx_geo_2join <- tx_geo %>% 
  arrange(id) %>% 
  select(
    id, m_address, m_city, m_zip, lat, long
  ) %>% 
  mutate(m_zip = m_zip %>% as.character())

Join them

tx_joined <- left_join(tx, tx_geo_2join)

## Joining, by = "id"

Export results

tx_joined %>% write_rds("data-out/02_tx_joined.rds")
tx_joined %>% write_csv("data-out/02_tx_joined.csv")