There is a file data-raw/01_tx_gun_accidents_geocodio.csv that is the result of sending data-out/01-tx.csv through Geocodio. I added the census tract, demographics and economics columns to the request. It brought back a ton of info, more than necessary, so I'll try to cut that down here.
They also have an R package, rgeocodio, that is worth exploring. It might be possible to specify just the columns needed.
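A hedged sketch of what that might look like. The gio_geocode() call and its fields argument follow the rgeocodio README, but whether the package passes through Geocodio's census and ACS field values is an assumption worth verifying in the package docs:

library(rgeocodio)

# rgeocodio looks for the API key in the GEOCODIO_API_KEY environment
# variable; see gio_auth() in the package docs.

# Hypothetical single-address test. The "census" field value comes from
# the Geocodio API; support for it in rgeocodio is assumed, not confirmed.
test_result <- gio_geocode(
  "1100 Congress Ave, Austin, TX 78701",
  fields = c("census")
)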
library(tidyverse)
library(janitor)
tx_geocodio <- read_csv("data-raw/01_tx_gun_accidents_geocodio.csv") %>% clean_names()
## Parsed with column specification:
## cols(
## .default = col_double(),
## incident_date = col_date(format = ""),
## address = col_character(),
## city = col_character(),
## county = col_character(),
## state = col_character(),
## operations = col_character(),
## city_or_county = col_character(),
## `Accuracy Type` = col_character(),
## Street = col_character(),
## City = col_character(),
## State = col_character(),
## County = col_character(),
## Zip = col_character(),
## Country = col_character(),
## Source = col_character(),
## `Census Tract Code` = col_character(),
## `Metro/Micro Statistical Area Name` = col_character(),
## `Metro/Micro Statistical Area Type` = col_character(),
## `Combined Statistical Area Name` = col_character(),
## `Metropolitan Division Area Name` = col_character()
## )
## See spec(...) for full column specifications.
## Warning: 4 parsing failures.
## row col expected actual file
## 14 -- 302 columns 299 columns 'data-raw/01_tx_gun_accidents_geocodio.csv'
## 124 -- 302 columns 299 columns 'data-raw/01_tx_gun_accidents_geocodio.csv'
## 139 -- 302 columns 299 columns 'data-raw/01_tx_gun_accidents_geocodio.csv'
## 175 -- 302 columns 299 columns 'data-raw/01_tx_gun_accidents_geocodio.csv'
It looks like this failed to parse 4 rows: 14, 124, 139, 175. Each of those rows came back short, with 299 columns instead of the expected 302.
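A quick way to dig into those rows, using readr's problems() helper and the row numbers from the warning (a sketch):

# Full detail on the parsing failures flagged above
problems(tx_geocodio)

# Pull the short rows themselves for a closer look
tx_geocodio %>% slice(c(14, 124, 139, 175))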
Man, there are a lot of columns.
tx_geocodio %>% head()
I’m going to write the names out to a csv to take a closer look at them.
tx_geocodio %>% names() %>% as_tibble() %>% write_csv("data-out/02_tx_geocodio_names.csv")
## Warning: Calling `as_tibble()` on a vector is discouraged, because the behavior is likely to change in the future. Use `tibble::enframe(name = NULL)` instead.
## This warning is displayed once per session.
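Following the warning's own suggestion, here is the same export without the deprecation notice:

tx_geocodio %>%
  names() %>%
  tibble::enframe(name = NULL) %>%
  write_csv("data-out/02_tx_geocodio_names.csv")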
tx_geocodio_select <- tx_geocodio %>%
select(
id,
incident_date,
address,
city,
county,
state,
number_killed,
number_injured,
operations,
city_or_county,
number,
street,
city_2,
zip,
county_2,
acs_economics_number_of_households_total_value,
acs_economics_number_of_households_total_margin_of_error,
acs_economics_median_household_income_total_value,
acs_economics_median_household_income_total_margin_of_error,
accuracy_score,
accuracy_type,
state_2,
latitude,
longitude,
state_fips,
county_fips,
place_fips,
census_tract_code,
census_block_code,
census_block_group,
full_fips
) %>%
rename(
households_total = acs_economics_number_of_households_total_value,
households_moe = acs_economics_number_of_households_total_margin_of_error,
median_income_total = acs_economics_median_household_income_total_value,
median_income_moe = acs_economics_median_household_income_total_margin_of_error
)
tx_geocodio_select %>%
write_csv("data-out/02_tx_geocodio_select.csv")
tx_geocodio_select %>%
write_rds("data-out/02_tx_geocodio_select.rds")
Cribbed from the Geocodio docs.
Consider accuracy_type first, as it has a hierarchy, with the best being rooftop and the worst being state.
| Value | Description |
|---|---|
| rooftop | The exact point was found with rooftop level accuracy |
| point | The exact point was found from address range interpolation where the range contained a single point |
| range_interpolation | The point was found by performing address range interpolation |
| nearest_rooftop_match | The exact house number was not found, so a close, neighboring house number was used instead |
| street_center | The result is a geocoded street centroid |
| place | The point is a city/town/place |
| state | The point is a state |
Each geocoded result is returned with an accuracy score, which is a decimal number ranging from 0.00 to 1.00. The higher the score, the better the result.
Generally, accuracy scores of 0.8 or higher are the most accurate, whereas results with lower scores might be very rough matches.
That said, a 1.0 perfect match for the “state” type is not a good result, because that is just the center of the state.
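Before filtering on any of this, it helps to see how our results break down by type. A quick tally sketch:

# How the geocoded results distribute across accuracy types
tx_geocodio_select %>%
  count(accuracy_type, sort = TRUE)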
If you want to consider which of the results have accurate data to the zip code level, I would exclude some records in a couple of ways:

- accuracy_type of “place” or “state”. Excluding those leaves 137 results.
- accuracy_score. Requiring 0.5 or greater gives you 133 records. Go to 0.8 and it drops to 96. Could always hand-check those lower-rated results to see if they pass muster.

First, a “not in” operator to make the type exclusion easier to read:

'%ni%' <- Negate('%in%')
tx_geocodio_select %>%
filter(accuracy_type %ni% c("place", "state"),
accuracy_score >= .8)
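To double-check the record counts cited above, the thresholds can be tallied in one pass (a sketch; the exact numbers come from this data):

# Count how many records survive each accuracy_score cutoff
tx_geocodio_select %>%
  filter(accuracy_type %ni% c("place", "state")) %>%
  summarize(
    total = n(),
    score_05_or_more = sum(accuracy_score >= .5, na.rm = TRUE),
    score_08_or_more = sum(accuracy_score >= .8, na.rm = TRUE)
  )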
So, in the end, the geocoding results are not any better than with MapChi …
tx_geocodio_select %>%
select(accuracy_type, median_income_total) %>%
filter(!is.na(median_income_total))
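One more way to look at coverage, tallying how many records have income data within each accuracy type (a sketch):

# Median income coverage by accuracy type
tx_geocodio_select %>%
  group_by(accuracy_type) %>%
  summarize(
    n = n(),
    with_income = sum(!is.na(median_income_total))
  )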