Chapter 3 Making maps with python
Maps are a great way to communicate data. They’re easily understandable, flexible, and more intuitive than a chart. There’s been numerous studies showing that the average professional often struggles to interpret the units on a y-axis, let alone understand trends in scatter or line graphs.
Making maps in R takes some initial investment (note: they can be fiddly). However once you have some code you know and understand, spinning up new pieces of analysis can happen in minutes, rather than hours or days.
The aim of this quick-reference guide is to get you from ‘I can produce a map in R’ to something more like ‘I can conduct spatial analysis and produce a visual which is ready to send without any further work.’
::opts_chunk$set(echo = TRUE)
knitr
library(reticulate)
# py_install is a special wrapper from the reticulate package that does "conda install" automatically
use_condaenv("tf")
::py_install("geopy")
reticulate
::py_install("geocoder") reticulate
3.1 Importing python packages
Let’s load in some libraries that we will use again and again when making charts.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import geopandas as gpd
import numpy as np
import statistics
from scipy.stats import norm
from matplotlib.ticker import EngFormatter, StrMethodFormatter
3.2 Making simple maps with geopandas
Just like a pandas dataframe, the geopandas
package allows us to us shapefiles.
We’ll go ahead and download some shapefiles from the ABS.
# Read the SHP file
= gpd.read_file('ASGS/SA4_2021_AUST_SHP_GDA2020/SA4_2021_AUST_GDA2020.shp')
SA4_shp
# Load the data using Geopandas
SA4_shp.head()
# Check the coordinate reference system attached to the shapefile
SA4_shp.crs
# Filter the data for only Greater Melbourne
= SA4_shp[SA4_shp['GCC_NAME21']=='Greater Melbourne']
SA4_shp_MEL
SA4_shp_MEL.head()
# Quick plot of the shapefile
=(20, 20), linewidth=0.1, edgecolor='0.9', legend = True)
SA4_shp_MEL.plot(figsize
'Melbourne\nCBD',
plt.annotate(=(144.96246,-37.81214),
xy=(144.46246,-38.21),
xytext= dict(arrowstyle='-'))
arrowprops
"SA2's of Greater Melbourne", fontsize=18)
plt.title(
'off') plt.gca().axis(
plt.show()
Here’s another example using a shapefile for WA
# Load Geometry File
= gpd.read_file('data/NOV21_WA_LOC_POLYGON_shp_GDA2020/wa_localities.shp')
WA_shp
=(20, 20), linewidth=0.1, color='green', edgecolor='0.9', legend = True)
WA_shp.plot(figsize
"Western Australia", fontsize=18)
plt.title(
'off') plt.gca().axis(
plt.show()
3.3 Geocoding address data
Using Nominatim to find the coordinates of a street address
from geopy.geocoders import Nominatim
= Nominatim(user_agent="coverdale")
geolocator = geolocator.geocode("150 Collins Street, Melbourne Australia")
test_location
print(test_location.address)
print(test_location.latitude, test_location.longitude)
print(test_location.raw)
Using Nominatim to find the street address from a set of coordinates
from geopy.geocoders import Nominatim
= Nominatim(user_agent="coverdale")
geolocator
= geolocator.reverse("-37.81214, 144.96246")
test_location
print(test_location.address)
print(test_location.latitude, test_location.longitude)
print(test_location.raw)
We can also use geopy
to find the distance between two points
Geopy can calculate geodesic distance between two points using the geodesic distance or the great-circle distance, with a default of the geodesic distance available as the function geopy.distance.distance.
#Here's an example usage of the geodesic distance:
from geopy.distance import geodesic
= (-37.81214, 144.96246)
sydney = (-33.8688, 151.2093)
melbourne print(geodesic(sydney, melbourne).kilometers)
# Using great-circle distance:
from geopy.distance import great_circle
= (-37.81214, 144.96246)
sydney = (-33.8688, 151.2093)
melbourne print(great_circle(sydney, melbourne).kilometers)
Note we see a slight difference in the km measurement (around 500m) - this is due to the earth not being exactly spherical.
Geocoding a list of addresses
hospital_data_clean = hospital_data.dropna()
# Split out the points into latitude and longitude
hospital_data_clean[[‘lat,’ ‘lon,’ ‘altitude’]] = pd.DataFrame(hospital_data[‘point’].to_list(), index=hospital_data.index)
# View dataframe
hospital_data_clean.head(5)
# Import necessary modules
import geopy
import geocoder
import geopandas as gpd
from shapely.geometry import Point
from geopandas.tools import geocode
from geopy.geocoders import Nominatim
= Nominatim(user_agent="coverdale")
geolocator
from geopy.extra.rate_limiter import RateLimiter
= RateLimiter(geolocator.geocode, min_delay_seconds=1)
geocode
# Read the data
= pd.read_csv("data/QLD_public_hospitals.csv",
hospital_data ='skip',
on_bad_lines='unicode_escape')
encoding
5)
hospital_data.head(
# Add the state and country to the data
'Address'] = hospital_data['Address'].astype(str) + ", Queensland, Australia"
hospital_data[
# Find the location
'location'] = hospital_data['Address'].apply(geocode)
hospital_data[
# Turn the location into a point
'point'] = hospital_data['location'].apply(lambda loc: tuple(loc.point) if loc else None)
hospital_data[
= hospital_data.dropna()
hospital_data_clean
# Split out the points into latitude and longitude
'lat', 'lon', 'altitude']] = pd.DataFrame(hospital_data_clean['point'].to_list(), index=hospital_data_clean.index) hospital_data_clean[[
= [Point(xy) for xy in zip (hospital_data_clean['lon'], hospital_data_clean['lat'])]
geometry
= gpd.GeoDataFrame(hospital_data_clean,
hospital_geodataframe ="EPSG:4326",
crs=geometry)
geometry
#hospital_geodataframe.set_crs(epsg=4326, inplace=True)
# View dataframe
5) hospital_geodataframe.head(
Let’s now plot these points on a map of Queensland. We’ll also need to load in the shape of Queensland as the ‘base map.’
# Read the SHP file
= gpd.read_file('ASGS/STE_2021_AUST_SHP_GDA2020/STE_2021_AUST_GDA2020.shp')
STE_shp
# Load the data using Geopandas
STE_shp.head()
# Check the coordinate reference system attached to the shapefile
STE_shp.crs
# Filter the data for only Greater Melbourne
= STE_shp[STE_shp['STE_NAME21']=='Queensland']
STE_shp_QLD STE_shp_QLD.head()
Now we plot the two layers together
= plt.subplots(1, 1, figsize=(12, 12))
fig, ax
# Base layer with all the areas for the background
=ax, linewidth=0.1, color='lightgrey', edgecolor='0.9')
STE_shp_QLD.plot(ax
# Hospital points
=ax, alpha=1, facecolor='blue', markersize=5)
hospital_geodataframe.plot(ax
"Hospitals in Queensland", fontsize=12)
plt.title(
ax.set_axis_off() plt.show()