Chapter 9 Geocoding
9.1 Lats and longs
Geocoding converts addresses into coordinates (latitude and longitude) and vice versa. This helps turn basic address data into spatial points, making it useful for mapping and analyzing location-based patterns.
For example, geocoding can help define catchment areas around an asset or visualize spatial data on a map. To do this, we need access to a large database of addresses and coordinates. Google Maps is a popular choice, offering an API with a freemium model for access.
This guide by Oleksandr Titorchuk also provides a great overview of the geocoding process in R.
9.2 Spatial data
Let’s get started by loading up some packages.
#Loads the required required packages
pacman::p_load(ggmap,
tmaptools,
RCurl,
jsonlite,
tidyverse,
leaflet,
writexl,
readr,
readxl,
sf,
mapview,
rgdal)
##
## There are binary versions available but the source versions are later:
## binary source needs_compilation
## sf 1.0-15 1.0-20 TRUE
## stars 0.5-5 0.6-8 FALSE
## tmaptools 3.1-1 3.2 FALSE
Next, we’ll introduce some address data.
Approved speed camera locations in Victoria, Australia are publicly available through the government website. We can download the dataset as a spreadsheet and import it into R as a csv.
#Read in spreadsheet
url <- 'https://raw.githubusercontent.com/charlescoverdale/bookdata/main/mobile-camera-locations-june-2022.csv'
#Note: We need to import as a csv rather than xlsx for url functionality
camera_address <- read.csv(url, header=TRUE)
#Fix the column labels
camera_address <- janitor::row_to_names(camera_address,1)
#Convert the suburb column to title case
camera_address$SUBURB <- str_to_title(camera_address$SUBURB)
#Concatenate the two fields into a single address field
camera_address$ADDRESS <- paste(camera_address$LOCATION,
camera_address$SUBURB,
sep=", ")
#Add in Australia to the address field just to idiot proof the df
camera_address$ADDRESS <- paste(camera_address$ADDRESS, ", Australia", sep="")
#Preview the data
head(camera_address)
## LOCATION SUBURB Reason Code Audit Date
## 2 Abey Road Cobblebank BCD May-22
## 3 Adam Street Golden Square ACD Apr-22
## 4 Agar Road Coronet Bay BC T
## 5 Airport Drive Melbourne Airport ABCD Apr-22
## 6 Aitken Boulevard Roxburgh Park ABC Apr-22
## 7 Aitken Boulevard Craigieburn ABCD Apr-22
## ADDRESS
## 2 Abey Road, Cobblebank, Australia
## 3 Adam Street, Golden Square, Australia
## 4 Agar Road, Coronet Bay, Australia
## 5 Airport Drive, Melbourne Airport, Australia
## 6 Aitken Boulevard, Roxburgh Park, Australia
## 7 Aitken Boulevard, Craigieburn, Australia
This dataset does not have street numbers - only street names. This makes intuitive sense (as most speed cameras aren’t placed outside residential houses, but rather along main roads).
We can see that the street and suburb names are in different columns. To run a geocode, we want a single field of address data. This makes it easier for the database we’re fuzzy matching against to select the best possible point. Let’s append the street name and the suburb name into a single field.
We know all these addresses are in Australia, so we can add the word ‘Australia’ to the end of each entry in the newly created address field to help the geocoder find the location.
It makes sense that all these addressed should have a postcode in the format 3XXX (as the data set is for the State of Victoria). However, let’s hold off on that assumption for now.
9.3 Google API
Now that we have a useful and highly descriptive address field, we’re ready to run the geocode.
We can geocode the lats and lons using Google’s API through ther ggmap
package.
We must register a API key (by creating an account in the google developer suite).
You can ping the API 2,500 times a day. Lucky for us this dataset is only 1,800 rows long!
Uncomment the chunk below (using Ctrl+Shift+C) and enter your unique key from google to run the code.
# #Input the google API key
# register_google(key = "PASTE YOUR UNIQUE KEY HERE")
#
# #Run the geocode function from ggmap package
# camera_ggmap <- ggmap::geocode(location = camera_address$ADDRESS,
# output = "more",
# source = "google")
#
# #We'll bind the newly created address columns to the original df
# camera_ggmap <- cbind(camera_address, camera_ggmap)
#
# #Print the results
# head(camera_ggmap)
#
# #Write the data to a df
# readr::write_csv(camera_ggmap,"C:/Data/camera_geocoded.csv")
We’ll load up the output this chunk generates and continue.
9.4 Geocode analysis
The raw dataset our geocode produced looks good! Although… it could probably use some cleaning.
Let’s rename this dataset so we don’t loose the original df (and therefore have to run the google query again). This is simply a best practice step to build in some redundancy.
#Rename df
camera_geocoded_clean <- camera_geocoded
#Rename the API generated address field
camera_geocoded_clean <- camera_geocoded_clean %>% rename(address_long=address)
#Select only the columns we need
camera_geocoded_clean <- camera_geocoded_clean %>%
dplyr::select("LOCATION",
"SUBURB",
"ADDRESS",
"lon",
"lat",
"address_long"
)
#Make all the column names the same format
camera_geocoded_clean <- janitor::clean_names(camera_geocoded_clean)
#We still need to convert address_long to title case
camera_geocoded_clean$address_long <- str_to_title(camera_geocoded_clean$address_long)
If we want to go one step further, we can create spatial points from this list of coordinates. This is a good step for eyeballing the data. We see most of it is in Victoria (as expected!)… but it has picked up a couple of points in Sydney and WA.
These are worth investigating separately for correction or exclusion.
Uncomment the mapview function below to see these points on a leaflet map.
#Convert data frame to sf object
camera_points <- sf::st_as_sf(x = camera_geocoded_clean,
coords = c("lon", "lat"),
crs = "+proj=longlat
+datum=WGS84
+ellps=WGS84
+towgs84=0,0,0")
#Plot an interactive map
# mapview(camera_points)
We can export these points as a shapefile using the rgdal
package.Let’s now have a look at our edge case datapoints.
There’s a couple of ways to do this… but one of the simplest is to extract the postcode as a separate field. We can then simply sort by postcodes that do not start with the number 3.
camera_points$postcode <- str_sub(camera_points$address_long,
start= -16,
end= -12)
camera_points$postcode <- as.numeric(camera_points$postcode)
outliers <- camera_points %>% filter (postcode <3000 | postcode >= 4000)
print(outliers)
## Simple feature collection with 7 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 115.9555 ymin: -33.90941 xmax: 152.7034 ymax: -25.53849
## Geodetic CRS: +proj=longlat
## +datum=WGS84
## +ellps=WGS84
## +towgs84=0,0,0
## location suburb
## 1 Epping Road Epping
## 2 High Street Epping
## 3 Kensington Road Kensington
## 4 Maryborough Road Ascot
## 5 Maryborough-Ballarat Road Talbot
## 6 Mooroopna-Murchison Road Murchison
## 7 Plumpton Road Plumpton
## address
## 1 Epping Road, Epping, Australia
## 2 High Street, Epping, Australia
## 3 Kensington Road, Kensington, Australia
## 4 Maryborough Road, Ascot, Australia
## 5 Maryborough-Ballarat Road, Talbot, Australia
## 6 Mooroopna-Murchison Road, Murchison, Australia
## 7 Plumpton Road, Plumpton, Australia
## address_long geometry
## 1 Epping Rd, Epping Nsw 2121, Australia POINT (151.0906 -33.77076)
## 2 High St, Epping Nsw 2121, Australia POINT (151.0836 -33.77611)
## 3 Kensington Rd, Kensington Nsw 2033, Australia POINT (151.2211 -33.90941)
## 4 Maryborough Qld 4650, Australia POINT (152.7034 -25.53849)
## 5 Maryborough Qld 4650, Australia POINT (152.7034 -25.53849)
## 6 Murchison Wa 6630, Australia POINT (115.9555 -26.89259)
## 7 Plumpton Rd, Plumpton Nsw 2761, Australia POINT (150.8439 -33.74823)
## postcode
## 1 2121
## 2 2121
## 3 2033
## 4 4650
## 5 4650
## 6 6630
## 7 2761
Easy enough. We see there’s 7 rows that don’t have a postcode starting in a 3000 postcode.
Four of these are in NSW, two in QLD, and one in WA. It looks like they are real (e.g. the streets and suburbs do exist in that state)… so for now let’s just exclude them from our dataset. We’ll do this by filtering by postcode.
Uncomment the mapview function below to see these points on a leaflet map.
camera_points_vic <- camera_points %>% filter (postcode>"3000" & postcode<"4000")
head(camera_points_vic)
## Simple feature collection with 6 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 144.2738 ymin: -38.43479 xmax: 145.4467 ymax: -36.77596
## Geodetic CRS: +proj=longlat
## +datum=WGS84
## +ellps=WGS84
## +towgs84=0,0,0
## location suburb
## 1 Abey Road Cobblebank
## 2 Adam Street Golden Square
## 3 Agar Road Coronet Bay
## 4 Airport Drive Melbourne Airport
## 5 Aitken Boulevard Roxburgh Park
## 6 Aitken Boulevard Craigieburn
## address
## 1 Abey Road, Cobblebank, Australia
## 2 Adam Street, Golden Square, Australia
## 3 Agar Road, Coronet Bay, Australia
## 4 Airport Drive, Melbourne Airport, Australia
## 5 Aitken Boulevard, Roxburgh Park, Australia
## 6 Aitken Boulevard, Craigieburn, Australia
## address_long geometry
## 1 Abey Rd, Cobblebank Vic 3338, Australia POINT (144.5879 -37.70471)
## 2 Adam St, Golden Square Vic 3555, Australia POINT (144.2738 -36.77596)
## 3 Agar Rd, Coronet Bay Vic 3984, Australia POINT (145.4467 -38.43479)
## 4 Airport Dr, Melbourne Airport Vic 3045, Australia POINT (144.8661 -37.69291)
## 5 Aitken Blvd, Roxburgh Park Vic 3064, Australia POINT (144.915 -37.62426)
## 6 Aitken Blvd, Craigieburn Vic 3064, Australia POINT (144.91 -37.59446)
## postcode
## 1 3338
## 2 3555
## 3 3984
## 4 3045
## 5 3064
## 6 3064
There we go! A clean, geocoded dataset of speed camera locations in Victoria.