There is a file data-raw/01_tx_gun_accidents_geocodio.csv that is the result of sending data-out/01-tx.csv through Geocodio. I added the census tract, demographics and economics columns to the request. It brought back a ton of info, more than necessary, so I'll try to cut that down here.
They also have an R package, rgeocodio, that is worth exploring. It might be possible to specify just the columns needed.
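A hedged sketch of what that might look like. The gio_geocode() call and its fields argument follow the rgeocodio README, but whether the package passes through Geocodio's census and ACS field values is an assumption worth verifying in the package docs:

library(rgeocodio)

# rgeocodio looks for the API key in the GEOCODIO_API_KEY environment
# variable; see gio_auth() in the package docs.

# Hypothetical single-address test. The "census" field value comes from
# the Geocodio API; support for it in rgeocodio is assumed, not confirmed.
test_result <- gio_geocode(
  "1100 Congress Ave, Austin, TX 78701",
  fields = c("census")
)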
library(tidyverse)
library(janitor)
tx_geocodio <- read_csv("data-raw/01_tx_gun_accidents_geocodio.csv") %>% clean_names()
## Parsed with column specification:
## cols(
## .default = col_double(),
## incident_date = col_date(format = ""),
## address = col_character(),
## city = col_character(),
## county = col_character(),
## state = col_character(),
## operations = col_character(),
## city_or_county = col_character(),
## `Accuracy Type` = col_character(),
## Street = col_character(),
## City = col_character(),
## State = col_character(),
## County = col_character(),
## Zip = col_character(),
## Country = col_character(),
## Source = col_character(),
## `Census Tract Code` = col_character(),
## `Metro/Micro Statistical Area Name` = col_character(),
## `Metro/Micro Statistical Area Type` = col_character(),
## `Combined Statistical Area Name` = col_character(),
## `Metropolitan Division Area Name` = col_character()
## )
## See spec(...) for full column specifications.
## Warning: 4 parsing failures.
## row col expected actual file
## 14 -- 302 columns 299 columns 'data-raw/01_tx_gun_accidents_geocodio.csv'
## 124 -- 302 columns 299 columns 'data-raw/01_tx_gun_accidents_geocodio.csv'
## 139 -- 302 columns 299 columns 'data-raw/01_tx_gun_accidents_geocodio.csv'
## 175 -- 302 columns 299 columns 'data-raw/01_tx_gun_accidents_geocodio.csv'
It looks like this failed to parse 4 rows: 14, 124, 139, 175. Each of those rows came back short, with 299 columns instead of the expected 302.
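A quick way to dig into those rows, using readr's problems() helper and the row numbers from the warning (a sketch):

# Full detail on the parsing failures flagged above
problems(tx_geocodio)

# Pull the short rows themselves for a closer look
tx_geocodio %>% slice(c(14, 124, 139, 175))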
Man, there are a lot of columns.
tx_geocodio %>% head()
I’m going to write the names out to a csv to take a closer look at them.
tx_geocodio %>% names() %>% as_tibble() %>% write_csv("data-out/02_tx_geocodio_names.csv")
## Warning: Calling `as_tibble()` on a vector is discouraged, because the behavior is likely to change in the future. Use `tibble::enframe(name = NULL)` instead.
## This warning is displayed once per session.
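Following the warning's own suggestion, here is the same export without the deprecation notice:

tx_geocodio %>%
  names() %>%
  tibble::enframe(name = NULL) %>%
  write_csv("data-out/02_tx_geocodio_names.csv")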
tx_geocodio_select <- tx_geocodio %>%
select(
id,
incident_date,
address,
city,
county,
state,
number_killed,
number_injured,
operations,
city_or_county,
number,
street,
city_2,
zip,
county_2,
acs_economics_number_of_households_total_value,
acs_economics_number_of_households_total_margin_of_error,
acs_economics_median_household_income_total_value,
acs_economics_median_household_income_total_margin_of_error,
accuracy_score,
accuracy_type,
state_2,
latitude,
longitude,
state_fips,
county_fips,
place_fips,
census_tract_code,
census_block_code,
census_block_group,
full_fips
) %>%
rename(
households_total = acs_economics_number_of_households_total_value,
households_moe = acs_economics_number_of_households_total_margin_of_error,
median_income_total = acs_economics_median_household_income_total_value,
median_income_moe = acs_economics_median_household_income_total_margin_of_error
)
tx_geocodio_select %>%
write_csv("data-out/02_tx_geocodio_select.csv")
tx_geocodio_select %>%
write_rds("data-out/02_tx_geocodio_select.rds")
Cribbed from the Geocodio docs.
Consider accuracy_type first, as it has a hierarchy, with the best being rooftop and the worst being state.
| Value | Description |
|---|---|
| rooftop | The exact point was found with rooftop level accuracy |
| point | The exact point was found from address range interpolation where the range contained a single point |
| range_interpolation | The point was found by performing address range interpolation |
| nearest_rooftop_match | The exact house number was not found, so a close, neighboring house number was used instead |
| street_center | The result is a geocoded street centroid |
| place | The point is a city/town/place |
| state | The point is a state |
Each geocoded result is returned with an accuracy score, which is a decimal number ranging from 0.00 to 1.00. The higher the score, the better the result.
Generally, accuracy scores of 0.8 or higher are the most accurate, whereas results with lower scores might be very rough matches.
That said, a 1.0 perfect match for the “state” type is not a good result, because that is just the center of the state.
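Before filtering on any of this, it helps to see how our results break down by type. A quick tally sketch:

# How the geocoded results distribute across accuracy types
tx_geocodio_select %>%
  count(accuracy_type, sort = TRUE)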
If you want to consider which of the results have accurate data to the zip code level, I would exclude some records in a couple of ways:

- accuracy_type of “place” or “state”. Excluding those leaves 137 results.
- accuracy_score. Requiring 0.5 or greater gives you 133 records. Go to 0.8 and it drops to 96. Could always hand-check those lower-rated results to see if they pass muster.

First, a “not in” operator to make the type exclusion easier to read:

'%ni%' <- Negate('%in%')
tx_geocodio_select %>%
filter(accuracy_type %ni% c("place", "state"),
accuracy_score >= .8)
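To double-check the record counts cited above, the thresholds can be tallied in one pass (a sketch; the exact numbers come from this data):

# Count how many records survive each accuracy_score cutoff
tx_geocodio_select %>%
  filter(accuracy_type %ni% c("place", "state")) %>%
  summarize(
    total = n(),
    score_05_or_more = sum(accuracy_score >= .5, na.rm = TRUE),
    score_08_or_more = sum(accuracy_score >= .8, na.rm = TRUE)
  )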
So, in the end, the geocoding results are not any better than with MapChi …
tx_geocodio_select %>%
select(accuracy_type, median_income_total) %>%
filter(!is.na(median_income_total))
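One more way to look at coverage, tallying how many records have income data within each accuracy type (a sketch):

# Median income coverage by accuracy type
tx_geocodio_select %>%
  group_by(accuracy_type) %>%
  summarize(
    n = n(),
    with_income = sum(!is.na(median_income_total))
  )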