Now we will work with and visualize COVID-19 data on top of our current spatial plots. To catch up, check out the associated shapefile and GeoJSON posts. The output used in this post comes from the GeoJSON post since it is in the desired coordinate system.
The R script was simplified from the original tutorial to give me just the essentials. The script can be found on GitHub here.
Continuing with the plot example provided by Cowboy State Daily, below I detail how to recreate their plot programmatically.
The purpose of this post is not to criticize the original source. We seek to explore how to build up a spatial visualization from the ground up. Once the foundation of the visualization is composed, we can easily add more content and data. These additional layers can help convey and contextualize spatial information or other desirable information.
The article used was published on 17-July-2020. Here is the associated URL, https://cowboystatedaily.com/2020/07/16/wyoming-sees-39-new-confirmed-coronavirus-cases-recoveries-grow-by-34/
Here was their map, representing the current active cases as of 16-July-2020.
Now, here is what I was able to recreate programmatically:
Script Preparation
As mentioned, I used the outputs from the previous GeoJSON tutorial (in a more simplified script). The output structures were not modified.
The data was gathered from their map manually and recorded in a spreadsheet to provide a more convenient tabular structure. Additional county-level data was also collected to facilitate further work.
library(tidyverse)
library(magrittr)
# Read in the spatial data to plot the Counties within
source('~/problemxsolutions/wyoming-r/wy_county_spatial_geojson.R')
file <- "./wy_covid_stats_20200724.csv"
wy_covid_df <- read_csv(file = file)
wy_covid_df_labels <-
left_join(x = wy_covid_df,
y = cog_df,
by = c('County'= 'COUNTYNAME'))
wy_covid_df_labels$lat[wy_covid_df_labels$County == "Lincoln"] <- 42.15
Even though I have calculated the centroids of each county, there was one county that did not quite meet my expectations. The name was plotted too close to a county border, so I adjusted the coordinate value to be more aesthetically pleasing.
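Because the join depends on county names matching exactly between the two tables, it can be worth checking for unmatched names before plotting labels. The following is a minimal sketch using base R; `covid_toy` and `cog_toy` are hypothetical stand-ins for `wy_covid_df` and `cog_df`, and the deliberate misspelling is only there to show what the check catches.

```r
# Hypothetical stand-ins for wy_covid_df and cog_df (names only).
covid_toy <- data.frame(County = c("Albany", "Big Horn", "Hot Springs"),
                        stringsAsFactors = FALSE)
cog_toy <- data.frame(COUNTYNAME = c("Albany", "Big Horn", "HotSprings"),
                      stringsAsFactors = FALSE)

# County names in the COVID table that have no match in the centroid table.
# After a left_join(), these rows would end up with NA label coordinates.
unmatched <- setdiff(covid_toy$County, cog_toy$COUNTYNAME)
print(unmatched)
```

An empty result would suggest every county in the spreadsheet found its centroid.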
Plot Details
To get the same plot colors and line colors from the original plot, I imported the image into an image editor and recorded the hexadecimal values for each color.
color_scheme_original_image <- c('#abddfe', # 'blue'
'#abddaa', # 'green'
'#ffff87', # 'yellow'
'#ffcc66') # 'orange'
Since I could not figure out programmatically how to assign each color to the appropriate county, I manually created the assignments.
county_pallete <-
c('Teton' = color_scheme_original_image[1],
'Sweetwater' = color_scheme_original_image[1],
'Hot Springs' = color_scheme_original_image[1],
'Johnson' = color_scheme_original_image[1],
'Albany' = color_scheme_original_image[1],
'Goshen' = color_scheme_original_image[1],
'Weston' = color_scheme_original_image[1],
'Natrona'= color_scheme_original_image[2],
'Platte'= color_scheme_original_image[2],
'Crook'= color_scheme_original_image[2],
'Park'= color_scheme_original_image[2],
'Lincoln'= color_scheme_original_image[2],
'Uinta' = color_scheme_original_image[3],
'Fremont' = color_scheme_original_image[3],
'Big Horn' = color_scheme_original_image[3],
'Campbell' = color_scheme_original_image[3],
'Niobrara' = color_scheme_original_image[3],
'Laramie' = color_scheme_original_image[3],
'Sublette' = color_scheme_original_image[4],
'Washakie' = color_scheme_original_image[4],
'Sheridan' = color_scheme_original_image[4],
'Carbon' = color_scheme_original_image[4],
'Converse' = color_scheme_original_image[4]
)
# geom_path(color = '#5f5f57') # This is the gray path borders between counties
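One possible way to avoid typing out each assignment by hand is sketched below. It assumes the spreadsheet had an extra column recording which of the four colors each county uses; that `color_group` column is hypothetical and not part of the original data, and only four counties are shown to keep the example short.

```r
color_scheme_original_image <- c('#abddfe', '#abddaa', '#ffff87', '#ffcc66')

# Hypothetical lookup table: color_group (1-4) is an assumed column that
# indexes into the color vector above.
county_groups <- data.frame(
  County = c('Teton', 'Natrona', 'Uinta', 'Sublette'),
  color_group = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Build a named vector of hex colors, the form that
# scale_fill_manual(values = ...) accepts.
palette_sketch <- setNames(
  color_scheme_original_image[county_groups$color_group],
  county_groups$County
)
print(palette_sketch)
```

The color groups would still need to be recorded manually from the image, but they would then live alongside the rest of the data rather than in the script.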
Base Plot
This is a plain and simple plot that will serve as the template that we continue to build off of.
base_plot <-
ggplot(data = wyoming_county_polygons,
aes(x = long,
y = lat,
group = COUNTYNAME,
fill = COUNTYNAME)) +
geom_polygon() +
geom_path(color = '#5f5f57') +
scale_fill_manual(values = county_pallete) +
coord_map() +
theme(panel.background = element_rect(fill = 'gray2'),
panel.border = element_rect(linetype = 'solid', fill = NA),
panel.spacing = unit(0.2, 'lines'),
strip.text = element_text(),
strip.background = element_rect(linetype = 'solid', color = 'NA'),
axis.text = element_text(color = 'black'),
axis.ticks.length = unit(0, "cm"),
plot.margin = unit(c(0,0,0,0), "pt"))
Applying Labels to the map
Now, let's add the label annotations to the map and remove the legend. To clean up the map, we scale the X and Y axes to exclusively fit the plot. We turn off the fill legend by inserting the guides() function. Next we remove the theme attributes for axis title and text. The annotate() function is how we add our labels and adjust the labeling properties.
wy_covid_spatial_plot <-
base_plot +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)) +
labs(x = 'Longitude', y = 'Latitude',
title = 'Active COVID Cases by County: 2020-07-16') +
guides(fill = "none") +
theme(axis.title = element_blank(),
axis.text = element_blank()) +
annotate("text",
x = wy_covid_df_labels$lon,
y = wy_covid_df_labels$lat,
label = paste(wy_covid_df_labels$County, "\n",
wy_covid_df_labels$`20200716`),
size = 3,
fontface = 2)
The font size of the county names is slightly larger than in the original, but that can easily be adjusted. We could increase the size of the number as well. One improvement would be to separate the name and number annotations, which provides more flexibility and control over each.
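As a rough illustration of separating the two pieces of the label, the sketch below uses two geom_text() layers with different sizes and faces. The `labels_toy` data frame, its coordinates, and its counts are placeholders for illustration only, and the `nudge_y` offset is an arbitrary value that would need tuning on the real map.

```r
library(ggplot2)

# Placeholder label data; coordinates and counts are illustrative only.
labels_toy <- data.frame(County = c("A", "B"),
                         lon = c(-108, -106),
                         lat = c(43, 42),
                         cases = c(5, 12))

label_plot <-
  ggplot(labels_toy, aes(x = lon, y = lat)) +
  # County names: smaller, plain face
  geom_text(aes(label = County), size = 2, fontface = 1) +
  # Case counts: larger, bold, shifted slightly below the name
  geom_text(aes(label = cases), size = 3, fontface = 2, nudge_y = -0.15)
```

Each layer can then be sized, colored, or positioned independently of the other.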
Additional Plots
Next, we explore some different ways of adding information to the plot. We can assign the color of the county by some metric or include additional numbers. Additionally, we could add details to highlight population density or cities affected. The fidelity of our available data is really our limitation. Keep in mind that we do not want to overcomplicate any graphic. All plots should be tailored to your key audience. Some audiences are accustomed to highly dimensional graphics, while others can view dense graphics as busy and confusing.
First we create another data object to help plot our data.
polygon_plot_data <-
wyoming_county_polygons %>%
left_join(x = .,
y = wy_covid_df,
by = c('COUNTYNAME'='County'))
The following code keeps the daily active case count, while scaling the polygon fill property to the same metric. This adds a little more relative context beyond reading the number of cases for each county: we can see the relative volume of cases. This allows us to introduce something else into the label, as long as we call out what we want.
# Current Active Cases Plot
ggplot(data = polygon_plot_data,
aes(x = long, y = lat,
group = COUNTYNAME,
fill = `20200716`)) +
geom_polygon() +
geom_path(color = 'white') +
coord_map() +
scale_fill_gradient(low = 'green',
high = 'red',
guide = "colorbar",
name = "Active Cases") +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)) +
labs(x = 'Longitude', y = 'Latitude',
title = 'Active COVID Cases by County: 2020-07-16') +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
annotate("text",
x = wy_covid_df_labels$lon,
y = wy_covid_df_labels$lat,
label = paste(wy_covid_df_labels$County, "\n",
wy_covid_df_labels$`20200716`),
size = 3,
fontface = 2)
In this next plot we depict the volume of overall lab-confirmed cases by the color spectrum of the county. We also annotate the number of deaths for the county below the county name.
# Deaths Plot
ggplot(data = polygon_plot_data,
aes(x = long, y = lat,
group = COUNTYNAME,
fill = `Confirmed Cases (Overall)`)) +
geom_polygon() +
geom_path(color = 'white') +
coord_map() +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_fill_gradient(low = 'green',
high = 'red',
guide = "colorbar",
name = 'Overall Lab-Confirmed Cases') +
ggtitle(label = "COVID Related Deaths (as of 2020-07-22)") +
annotate("text",
x = wy_covid_df_labels$lon,
y = wy_covid_df_labels$lat,
label = paste(wy_covid_df_labels$County, "\n",
wy_covid_df_labels$Deaths),
size = 2,
fontface = 2)
Mortality and Recovery Rate
Now let's depict the mortality rate as a share of Overall Lab-Confirmed Cases. The data I am using comes from the Wyoming Department of Health website as of 2020-07-24 (Friday). While the county dashboards had a lot more data we could use, the process of recording it isn’t desirable for this specific post. The concept can still be conveyed, but if we want more data, we could either request it or take the time to record what we want from what they provide.
The data was recorded in the same spreadsheet used above. Below, we run some quick rate calculations on the data objects. This gets the data into a very usable format to display on our plots.
# Mortality and Recovery Rate Plots
polygon_plot_data %<>%
mutate(mortality_rate = round(Deaths / `Confirmed Cases (Overall)`,
digits = 3) * 100,
recovery_rate = round(`Confirmed Recovered (Overall)` / `Confirmed Cases (Overall)`,
digits = 3) * 100)
wy_covid_df_labels %<>%
mutate(mortality_rate = round(Deaths / `Confirmed Cases (Overall)`,
digits = 3) * 100,
recovery_rate = round(`Confirmed Recovered (Overall)` / `Confirmed Cases (Overall)`,
digits = 3) * 100)
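Since the same calculation is applied to both data objects, it could be wrapped in a small helper. The sketch below is one way to do that; `rate_pct` is a hypothetical name, and the numbers passed to it are illustrative rather than taken from the Wyoming data.

```r
# Hypothetical helper: a proportion rounded to three digits,
# then expressed as a percentage (same arithmetic as the mutate() calls).
rate_pct <- function(numerator, denominator) {
  round(numerator / denominator, digits = 3) * 100
}

# Illustrative numbers only
example_rate <- rate_pct(numerator = 1, denominator = 4)
print(example_rate)

# Inside mutate(), this could be used as, for example:
#   mutate(mortality_rate = rate_pct(Deaths, `Confirmed Cases (Overall)`))
```

Note that a county with zero confirmed cases would produce NaN here, just as in the original calculation.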
You will notice in the following plots that I modified the annotation pieces to separate the county names from the featured numbers. This separation gives us much more control over each.
# Mortality Plot
ggplot(data = polygon_plot_data,
aes(x = long, y = lat,
group = COUNTYNAME,
fill = mortality_rate)) +
geom_polygon() +
geom_path(color = 'white') +
coord_map() +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_fill_gradient(low = 'green',
high = 'red',
guide = "colorbar",
name = 'Mortality Rate') +
ggtitle(label = "COVID Mortality Rates By County (as of 2020-07-24)") +
annotate("text",
x = wy_covid_df_labels$lon,
y = wy_covid_df_labels$lat,
label = wy_covid_df_labels$County,
size = 2,
fontface = 1) +
annotate("text",
x = wy_covid_df_labels$lon,
y = wy_covid_df_labels$lat,
label = paste("\n\n",
wy_covid_df_labels$mortality_rate,
"%"),
size = 3,
fontface = 2)
Looking at the plot in comparison to the total number of deaths, we can see that the mortality rates convey a somewhat different story. If we had the data by age group, we could create a faceted plot to enhance the context of deaths. We could also incorporate population density.
Next we look at the Recovery Rates by county. In the code below, we changed the color representation of what high and low means. Green should indicate a positive outcome in the context of recovery, so we should assign it to the high argument.
# Recovery Rates Plot
ggplot(data = polygon_plot_data,
aes(x = long, y = lat,
group = COUNTYNAME,
fill = recovery_rate)) +
geom_polygon() +
geom_path(color = 'white') +
coord_map() +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)) +
theme(axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_fill_gradient(low = 'red',
high = 'green',
guide = "colorbar",
name = 'Recovery Rate') +
ggtitle(label = "COVID Recovery Rates By County (as of 2020-07-24)") +
annotate("text",
x = wy_covid_df_labels$lon,
y = wy_covid_df_labels$lat,
label = wy_covid_df_labels$County,
size = 2,
fontface = 1) +
annotate("text",
x = wy_covid_df_labels$lon,
y = wy_covid_df_labels$lat,
label = paste("\n\n",
wy_covid_df_labels$recovery_rate,
"%"),
size = 3,
fontface = 2)
This plot tells a different story from the mortality rate, since not every case results in a death and the number of deaths is far below the number of recoveries. We can see that most counties have a sizable proportion of confirmed cases that are still active (at time of plot creation). As more active cases recover from the virus, we can anticipate the values increasing in a more desirable direction. The rate of recovery can also help direct resources to counties or help influence local policy decisions.
Conclusion
Adding spatial context to data can improve an audience's understanding of the presented information. Something visually appealing can convey a lot more information without having to inundate plots with verbiage. The more granularity and fidelity in our data, the more we can support and visualize to enhance our contextual understanding.
Data visualizations have the capacity to give a quick understanding of what's going on. The more context we can visualize, the more we can improve what we see. Visualizations can help influence or support our (or others') decision-making processes without necessarily having to read pages of written data, which can be very time consuming. Remember your audience.
GitHub Links
- https://github.com/problemxsolutions/wyoming-r/blob/master/wy_county_spatial_geojson.R
- https://github.com/problemxsolutions/wyoming-r/blob/master/wy_covid_20200724.R
What’s Next?
After gathering additional COVID data by county over time, I will work through visualizing the data over time and space. We can then explore creating low-tech “video” or motion graphics. I will also detail more about how to output graphics in different formats, such as PDFs and KMLs.