Chapter 5

## Shape files

library(ggplot2)
library(sf)

Let’s load in a shapefile we have on the computer locally.

It’s in the folder Boston_Neighborhoods and the shapefile is also called Boston_Neighborhoods. Load it in.

Call the object boston.

library(sf)
library(ggplot2)
library(dplyr)

map_filepath <- "Boston_Neighborhoods/Boston_Neighborhoods.shp"

boston <- st_read(map_filepath)

### Glimpse

Let’s take a look at the data behind the shapefile.

_____(boston)
glimpse(boston)

### Plot the shapefile

_____(_______) +
_________()
ggplot(boston) +
geom_sf()

The ggplot2 and sf function you need to map out a shapefile is geom_sf().

Good job. Not much to display, yet.

So let’s bring in some data to join it with.

The data is historical population and race data by neighborhood from the City of Boston.

Take a glimpse.

_____(boston_population)
glimpse(boston_population)

### Join the shapefile to the data

Combine the shape file boston to the data frame boston_population.

Use the function that joins from the left from dplyr.

boston_pop_map <- _____(______, _______, by=c(_______=_______))

boston_pop_map
boston_pop_map <- left_join(boston_population, boston, by=c("Neighborhood"="Name"))

boston_pop_map

The dplyr verb you need to use to join to the left is left_join() and don’t forget that variable names in by need to be in quotes.

### Plot the data on the shapefile

Make a choropleth with colors based on Population.

ggplot(boston_pop_map) +
geom_sf(aes(fill=________))
ggplot(boston_pop_map) +
geom_sf(aes(fill=Population))

### Style it

First, let’s fix the legend so the gradient goes in the other direction.

And also add a title “Boston neighborhood population” and source in the caption.

ggplot(boston_pop_map) +
geom_sf(aes(fill=Population)) +
scale_fill__________(_________=1, _______="Population") +
_____(_________=________________, _______="Source: City of Boston")
ggplot(boston_pop_map) +
geom_sf(aes(fill=Population))  +
scale_fill_distiller(direction=1, name="Population") +
labs(title="Boston neighborhood population", caption="Source: US Census")

scale_fill_distiller()

Also, let’s get rid of the axis marks and the background gray grid.

ggplot(boston_pop_map) +
geom_sf(aes(fill=Population)) +
scale_fill__________(_________=1, _______="Population") +
_____(_________=________________, _______="Source: City of Boston") +
theme________() +
theme(_________________ = element_line(colour = ____________)) +
ggplot(boston_pop_map) +
geom_sf(aes(fill=Population))  +
scale_fill_distiller(direction=1, name="Population") +
labs(title="Boston neighborhood population", caption="Source: US Census") +
theme_void() +
theme(panel.grid.major = element_line(colour = 'transparent'))

These style tweaks are in Chapter 5.

## Small multiples

### More data

Let’s map historical population and race data by neighborhood from the City of Boston.

Take a glimpse.

_____(race_ethnicity2010)
glimpse(race_ethnicity2010)

### Join the shapefile to the data

Combine the shape file boston to the data frame race_ethnicity2010.

Use the function that joins from the left from dplyr.

boston_race_map <- _____(race_ethnicity2010, _______, by=c(_______=_______))

boston_race_map
boston_race_map <- left_join(race_ethnicity2010, boston, by=c("Neighborhood"="Name"))

boston_race_map

### Plot the data on the shapefile

Map the data, setting the color based on Percent.of.Population.

Also, facet wrap out by Race.and.or.Ethnicity this time.

This will take a few moments to render.

ggplot(boston_race_map) +
geom_sf(aes(fill=________)) +
__________(~___________, ncol=2)
ggplot(boston_race_map) +
geom_sf(aes(fill=Percent.of.Population)) +
facet_wrap(~Race.and.or.Ethnicity, ncol=2)

### Style with title and color fix

First, let’s fix the legend so the gradient goes in the other direction.

And also add a title “Boston neighborhood population by race” and source in the caption.

This will take a few moments to render.

ggplot(boston_race_map) +
geom_sf(aes(fill=Percent.of.Population)) +
__________(~___________, ncol=2) +
scale_fill__________(_________=1, _______="Percent of Population") +
_____(_________=________________, _______="Source: City of Boston")
ggplot(boston_race_map) +
geom_sf(aes(fill=Percent.of.Population)) +
facet_wrap(~Race.and.or.Ethnicity, ncol=2) +
scale_fill_distiller(direction=1, name="Percent of Population") +
labs(title="Boston neighborhood population by race", caption="Source: City of Boston")

### Style with grid and axis fixes

Also, let’s get rid of the axis marks and the background gray grid.

This will take a few moments to render.

ggplot(boston_race_map) +
geom_sf(aes(fill=Percent.of.Population)) +
scale_fill__________(_________=1, _______="Percent of Population") +
_____(_________=________________, _______="Source: City of Boston") +
theme________() +
theme(_________________ = element_line(colour = ____________)) +
ggplot(boston_race_map) +
geom_sf(aes(fill=Percent.of.Population))  +
scale_fill_distiller(direction=1, name="Percent of Population") +
facet_wrap(~Race.and.or.Ethnicity, ncol=2) +
labs(title="Boston neighborhood population by race", caption="Source: City of Boston") +
theme_void() +
theme(panel.grid.major = element_line(colour = 'transparent'))

## Mapping murders

This data comes from The Washington Post’s Murder with Impunity series.

We’ll start out with a slice of Boston’s data.

Take a glimpse at the structure of the data, which is in the environment as homicides.

_____(homicides)
glimpse(homicides)

### Plot the points on the map

Let’s use the leaflet package to map out all the locations of the murders in Boston.

We’ll use the boston shape file as the base. Then add the locations based on the latitude and longitude columns in the homicides data frame.

library(leaflet)

Fill in the blanks to 1) add the shapefile of Boston neighborhoods. Remember, the shapefile is in the global environment as boston. Then, 2) Add circles for every longitude and latitude in the homicides data frame and have the popup be the disposition variable.

leaflet(________) %>% addProviderTiles("CartoDB.Positron") %>%
setView(-71.087010, 42.316240, zoom = 11) %>%
__________(
fillColor = "transparent",
fillOpacity = 0.9,
weight = 1,
smoothFactor = 0.2,
color="black") %>%
add_______(~________$____, ~_________$__, popup=______$______, weight = 3, radius=40, color="#ff0000", stroke = TRUE, fillOpacity = 0.8)  leaflet(boston) %>% addProviderTiles("CartoDB.Positron") %>% setView(-71.087010, 42.316240, zoom = 11) %>% addPolygons( fillColor = "transparent", fillOpacity = 0.9, weight = 1, smoothFactor = 0.2, color="black") %>% addCircles(~homicides$lon, ~homicides$lat, popup=homicides$disposition, weight = 3, radius=40, color="#ff0000", stroke = TRUE, fillOpacity = 0.8) 

### Points in a polygon prep

We want to associate each dot with a corresponding neighborhood.

Prepare points for join by making points fit the projection of the shapefile it will be joined with.

murders_spatial <- homicides %>%
________(coords=c("lon", "lat"), crs = "+proj=longlat") %>%
__________(crs=_____(boston))

str(murders_spatial)
murders_spatial <- homicides %>%
st_as_sf(coords=c("lon", "lat"), crs = "+proj=longlat") %>%
st_transform(crs=st_crs(boston))

str(murders_spatial)

You’ll be using three different st_xxxxx functions. If you need reference, we’ve used it before.

### Points in a polygon join

Use the special function that sees where the geometries we’ve set in murders_spatial fall into which polygon in boston.

points_in <- __________(boston, _________, left=T)

points_in
points_in <- st_join(boston, murders_spatial, left=T)

points_in

We’ll bring in the viridis library for colors.

library(viridis)

And now we can make a choropleth map with this new transformed data frame.

Start with the points_in data frame and figure out the number of murders per neighborhood (Name is the variable name). You’ll need to create a new variable called murders.

points_in %>%
group_by(_____) %>%
__________(_________=___) %>%
________() +
________(aes(fill = ________), color=NA) +
coord_sf(datum=NA) +
labs(title = "Total murders in Boston by neighborhood",
subtitle = "Between 2008 and 2018",
caption = "Source: Washington Post",
fill = "Murders") +
scale_fill_viridis(option="magma", direction=-1)
points_in %>%
group_by(Name) %>%
summarize(murders=n()) %>%
ggplot() +
geom_sf(aes(fill = murders), color=NA) +
coord_sf(datum=NA) +
labs(title = "Total murders in Boston by neighborhood",
subtitle = "Between 2008 and 2018",
caption = "Source: Washington Post",
fill = "Murders") +
scale_fill_viridis(option="magma", direction=-1)

Nice!

Now, let’s figure out percent of unsolved murders by neighborhood and map that out.

You’ll have to add another variable (disposition) to group by and after summarizing by creating a new variable (murders), you’ll then create another variable percent that figures out the percent.

And then filter disposition for "Open/No arrest".

Then plot it with the function from ggplot2().

points_in %>%
______(____, ________) %>%
________(________=___) %>%
________(______=______/___(_________)*100) %>%
filter(____________=="Open/No arrest") %>%
ggplot() +
_______(aes(fill = _______), color=NA) +
coord_sf(datum=NA) +
labs(title = "Rate of unsolved murders in Boston by neighborhood",
subtitle = "Between 2008 and 2018",
caption = "Source: Washington Post",
fill = "Percent unsolved") +
scale_fill_viridis(option="magma", direction=-1)
points_in %>%
group_by(Name, disposition) %>%
summarize(murders=n()) %>%
mutate(percent=murders/sum(murders)*100) %>%
filter(disposition=="Open/No arrest") %>%
ggplot() +
geom_sf(aes(fill = percent), color=NA) +
coord_sf(datum=NA) +
labs(title = "Rate of unsolved murders in Boston by neighborhood",
subtitle = "Between 2008 and 2018",
caption = "Source: Washington Post",
fill = "Percent unsolved") +
scale_fill_viridis(option="magma", direction=-1)

Remember, to find the percent of something, it’s something/sum(something)*100.

Let’s see if we can do the analysis above but for each ethnicity listed in Boston.

Quickly take a look at the points_in data frame to look up what variable you want to add.

## Observations: 616
## Variables: 18
## $OBJECTID <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... ##$ Name          <fct> Roslindale, Roslindale, Roslindale, Roslindale, ...
## $Acres <dbl> 1605.568, 1605.568, 1605.568, 1605.568, 1605.568... ##$ Neighborho    <fct> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, ...
## $SqMiles <dbl> 2.51, 2.51, 2.51, 2.51, 2.51, 2.51, 2.51, 2.51, ... ##$ ShapeSTAre    <dbl> 69938273, 69938273, 69938273, 69938273, 69938273...
## $ShapeSTLen <dbl> 53563.91, 53563.91, 53563.91, 53563.91, 53563.91... ##$ uid           <chr> "Bos-000067", "Bos-000091", "Bos-000145", "Bos-0...
## $reported_date <int> 20150720, 20140109, 20130129, 20130914, 20110605... ##$ victim_last   <chr> "CASTILLO", "JOHN", "JONES", "CASTILLO", "MARTIN...
## $victim_first <chr> "JASON", "BRANDON", "CARLY", "NELSON B. JR.", "W... ##$ victim_race   <chr> "Hispanic", "Black", "Black", "Hispanic", "Hispa...
## $victim_age <chr> "28", "18", "32", "25", "23", "43", "17", "22", ... ##$ victim_sex    <chr> "Male", "Male", "Female", "Male", "Male", "Femal...
## $city <chr> "Boston", "Boston", "Boston", "Boston", "Boston"... ##$ state         <chr> "MA", "MA", "MA", "MA", "MA", "MA", "MA", "MA", ...
## $disposition <chr> "Open/No arrest", "Open/No arrest", "Open/No arr... ##$ geometry      <MULTIPOLYGON [°]> MULTIPOLYGON (((-71.12593 4..., MUL...

Now add that variable into the group_by and make a small multiples viz of the map based on that new variable.

points_in %>%
group_by(Name, __________, disposition) %>%
summarize(murders=n()) %>%
mutate(percent=murders/sum(murders)*100) %>%
filter(disposition=="Open/No arrest") %>%
ggplot() +
geom_sf(aes(fill = percent), color=NA) +
coord_sf(datum=NA) +
labs(title = "Rate of solved murders in Boston by neighborhood",
subtitle = "Between 2008 and 2018",
caption = "Source: Washington Post",
fill = "Percent unsolved") +
scale_fill_viridis(option="magma", direction=-1) +
__________(~_________)
points_in %>%
group_by(Name, victim_race, disposition) %>%
summarize(murders=n()) %>%
mutate(percent=murders/sum(murders)*100) %>%
filter(disposition=="Open/No arrest") %>%
ggplot() +
geom_sf(aes(fill = percent), color=NA) +
coord_sf(datum=NA) +
labs(title = "Rate of unsolved murders in Boston by neighborhood",
subtitle = "Between 2008 and 2018",
caption = "Source: Washington Post",
fill = "Percent unsolved") +
scale_fill_viridis(option="magma", direction=-1) +
facet_wrap(~victim_race)

That’s a slick-looking graphic.

These are the results as a data frame if you didn’t map it:

### Add it to an interactive map

One last map.

Let’s figure out where unsolved murders occurr in context of Black-majority population neighborhoods.

I’ve already loaded the Census tracts with the percent of Black population as the data frame boston_black_perc.

Here’s how the polygons look: And here’s how the data frame looks:

## Observations: 204
## Variables: 6
## $GEOID <chr> "25025000100", "25025000201", "25025000202", "... ##$ NAME            <chr> "Census Tract 1, Suffolk County, Massachusetts...
## $Black <dbl> 252, 170, 320, 101, 142, 209, 77, 102, 32, 190... ##$ Total           <dbl> 4185, 3553, 3697, 2590, 3110, 5127, 3277, 5904...
## $black_residents <dbl> 6.02, 4.78, 8.66, 3.90, 4.57, 4.08, 2.35, 1.73... ##$ geometry        <MULTIPOLYGON [°]> MULTIPOLYGON (((-71.1609 42..., M...

First, let’s create a new data frame that breaks out census tracts where the Black population is the majority.

black_boston <- boston_black_perc %>%
filter(__________ > 50)

black_boston
black_boston <- boston_black_perc %>%
filter(black_percent > 50)

black_boston

### Final map

Alright, I’ve set up some colors based on “Closed by arrest” and “Open/No arrest”.

disp <- colorFactor(c("#ff0000", "#13ED3F"), domain=c("Closed by arrest", "Open/No arrest"))

We’re going to create a layered interactive map.

First layer is all the Boston census tracts with a minimal border (boston_black_perc).

Second layer is the Black only census tracts with a thicker border (black_boston).

The third and final layer are circles representing solved and unsolved murders (the colors are based on the code above).

leaflet() %>% addProviderTiles("CartoDB.Positron") %>%
setView(-71.087010, 42.316240, zoom = 11) %>%
fillColor = "transparent",
fillOpacity = 0.9,
weight = 1,
smoothFactor = 0.2,
color="black") %>%
add_______(data=homicides, ~homicides$lon, ~homicides$lat, popup=homicides$_________, weight = 3, radius=40, color=~______(disposition), stroke = TRUE, fillOpacity = 0.8) %>% add_______("bottomright", colors= c("#ff0000", "#13ED3F"), labels=c("Arrest made/Closed", "Open"), title="Disposition")  leaflet() %>% addProviderTiles("CartoDB.Positron") %>% setView(-71.087010, 42.316240, zoom = 11) %>% addPolygons(data=boston_black_perc, fillColor = "transparent", fillOpacity = 0.9, weight = 1, smoothFactor = 0.2, color="black") %>% addPolygons(data=black_boston, fillColor = "transparent", fillOpacity = 1, weight = 3, smoothFactor = 0.2, color="black") %>% addCircles(data=homicides, ~homicides$lon, ~homicides$lat, popup=homicides$disposition, weight = 3, radius=40, color=~disp(disposition), stroke = TRUE, fillOpacity = 0.8) %>%
addLegend("bottomright", colors= c("#ff0000", "#13ED3F"), labels=c("Arrest made/Closed", "Open"), title="Disposition")