This post is a supplemental tutorial to my initial post on Spatial Visualizations in R (Part 1) to work through processing a GeoJSON file in R from the https://www.geospatialhub.org/ for this particular example. You can work with the same GeoJSON file at the following link:
https://www.geospatialhub.org/datasets/b0e0a99ec14748eeae750949c7bbb2ec_0?geometry=-123.671%2C40.158%2C-91.437%2C45.779
You can explore and tailor the API under the “API Explorer” tab (left portion of the screen grab, midway down). It was on this screen that I was able to see the proper output coordinate reference I wanted, which resolving my initial issue with the shapefile’s coordinate output. You can also adjust what fields you want to export as well.
Data Storage
In this tutorial, there was no need to download and store the data since we will be pulling the data directly from the API URL.
The GeoJSON file is at the URL: https://opendata.arcgis.com/datasets/b0e0a99ec14748eeae750949c7bbb2ec_0.geojson
API Explorer
If you built a tailored query in the “API Explorer”, you will see the modified query URL, which you can use as well.
https://services.wygisc.org/HostGIS/rest/services/GeoHub/DORCounties/MapServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json
The output is not necessarily GeoJSON but is still json so the processing below will be generally the same.
Data Preparation
Now that we have our data source identified, we are going to explore how to interact with the GeoJSON file in R.
First we load the various packages I think we will need for the remainder of this tutorial. If you don’t have these libraries, you can install them using the install.packages("packagename")
command.
library(tidyverse)
library(magrittr)
library(rgdal)
library(rgeos)
library(maptools)
library(geosphere)
library(GISTools)
library(geojsonR)
Notice the final library. There are a couple of GeoJSON libraries that you can download, but this was what I chose. The process should be relatively the same, as well as the outputs. The main differences will be in the name of the functions and arguments used for library specific functions.
We will now pull the GeoJSON data from the API URL.
wyoming <- FROM_GeoJson(url_file_string = "https://opendata.arcgis.com/datasets/b0e0a99ec14748eeae750949c7bbb2ec_0.geojson")
By using the API we skip the process of downloading a file, extracting the zip, etc. This also potentially allows us to pull the most current data from the source repository, if the provider or the data necessitates.
Data Exploration
Once the data has been assigned to the variable wyoming
, we will explore the structure. Using the slotNames(wyoming)
function, we can see that this object is not structured the same way as the shapefile example.
The names(wyoming)
provides us the high-level list names.
Next, we inspect the structure using the str(wyoming)
function. From the output of this we can see that the object is composed of two lists, but within each list there are nested lists for each polygon. At the high-level, the first list holds the feature data, while the second list holds the type. The only value in the type list is a simple “FeatureCollection”, which describes the overall object of a collection of spatial features.
The short readout I have commented out gives us the structure of the first polygon in the feature list. The structure of the spatial feature contains a “geometry” list, a “properties” list and a “type” list.
slotNames(wyoming)
# NULL
names(wyoming)
# [1] "features" "type"
str(wyoming)
# List of 2
# $ features:List of 23
# ..$ :List of 3
# .. ..$ geometry :List of 2
# .. .. ..$ type : chr "Polygon"
# .. .. ..$ coordinates: num [1:1569, 1:2] -108 -108 -108 -108 -108 ...
# .. ..$ properties:List of 8
# .. .. ..$ COUNTYNAME : chr "Natrona"
# .. .. ..$ OBJECTID : num 1
# .. .. ..$ RuleID : NULL
# .. .. ..$ Shape_Area : num 2.6e+10
# .. .. ..$ Shape_Length: num 661865
# .. .. ..$ TAXYEAR : num 2014
# .. .. ..$ TYPENAME : chr "County Boundary"
# .. .. ..$ YEARCOLLEC : num 2013
# .. ..$ type : chr "Feature"
# ... MORE DATA ...
We look dive in to the structures some more to get some summary information by using the following:
# Lets look at the first element in the list
names(wyoming$features[[1]])
# [1] "geometry" "properties" "type"
summary(wyoming$features[[1]])
# Length Class Mode
# geometry 2 -none- list
# properties 8 -none- list
# type 1 -none- character
str(wyoming$features[[1]]$geometry)
# List of 2
# $ type : chr "Polygon"
# $ coordinates: num [1:1569, 1:2] -108 -108 -108 -108 -108 ...
After further inspection of the coordinates field we can see they are in the desired decimal degrees representation.
wyoming$features[[1]]$geometry$coordinates
# [,1] [,2]
# [1,] -107.5428 42.75742
# [2,] -107.5428 42.75381
# [3,] -107.5427 42.75012
# [4,] -107.5426 42.74643
# [5,] -107.5425 42.74274
The structure of the wyoming$features[[1]]$properties
gives us the information that we saw in the shapefiles spatial dataframe. To create the same table, we could bind_rows
using an lapply
function to loop through each of the list elements in the wyoming
object.
str(wyoming$features[[1]]$properties)
# List of 8
# $ COUNTYNAME : chr "Natrona"
# $ OBJECTID : num 1
# $ RuleID : NULL
# $ Shape_Area : num 2.6e+10
# $ Shape_Length: num 661865
# $ TAXYEAR : num 2014
# $ TYPENAME : chr "County Boundary"
# $ YEARCOLLEC : num 2013
Dealing with Holes
During my investigation of the data, I am across an improper label on one of the county polygon objects. For object 22 (Teton County) there were two sets of coordinates. The first set of coordinates are the main polygon, while any set of coordinates thereafter are holes in the polygon. For my primary purpose, I don’t need to worry about the hole, so I will just ignore in later processing.
wyoming$features[[22]]$geometry$type
# [1] "Polygon"
teton_1 <- wyoming$features[[22]]$geometry$coordinates[[1]][[1]] %>%
data.frame()
teton_2 <- wyoming$features[[22]]$geometry$coordinates[[1]][[2]] %>%
data.frame()
# X1 X2
# 1 -111.0486 44.27949
# 2 -111.0486 44.27986
# 3 -111.0483 44.28786
# 4 -111.0486 44.27949
ggplot() +
geom_polygon(data = teton_1, aes(x = X1, y = X2), color = 'black', alpha = .8) +
geom_polygon(data = teton_2, aes(x = X1, y = X2), fill = 'red', color = 'red')
Its a little hard to see but the hole is on the far left boundary slightly below the 44.4 grid line.
Data Transformation
In this section we will begin transforming the data to get it ready for visualizing. We will use the lapply
function to loop through the lists to get the coordinates and selected properties details. In assessing what data is in the “properties” and we actually need to plot the data, we only need to select the “COUNTYNAME” and “OBJECTID”.
wyoming_county_polygons <-
lapply(1:length(wyoming$features),
function(i){
if(!is.list(wyoming$features[[i]]$geometry$coordinates)){
tmpdata <- wyoming$features[[i]]$geometry$coordinates
}else{
tmpdata <- wyoming$features[[i]]$geometry$coordinates[[1]][1]}
tmpdata %>%
data.frame() %>%
tibble %>%
mutate(COUNTYNAME = wyoming$features[[i]]$properties$COUNTYNAME,
OBJECTID = wyoming$features[[i]]$properties$OBJECTID) %>%
rename("long" = 'X1', 'lat'='X2')
}) %>%
bind_rows()
Lets look at the wyoming_county_polygon
.
head(wyoming_county_polygons)
# A tibble: 6 x 4
# long lat COUNTYNAME OBJECTID
# <dbl> <dbl> <chr> <dbl>
# 1 -108. 42.8 Natrona 1
# 2 -108. 42.8 Natrona 1
# 3 -108. 42.8 Natrona 1
# 4 -108. 42.7 Natrona 1
# 5 -108. 42.7 Natrona 1
# 6 -108. 42.7 Natrona 1
Everything looks primed and ready to plot now.
ggplot(data = wyoming_county_polygons,
aes(x = long, y = lat,
group = COUNTYNAME, fill =COUNTYNAME)) +
geom_polygon()
We can see from the plot that everything was plotted as we would hope. Now we want bring it closer to our initial goal, by putting the names of the counties in each of the Counties.
Label the Counties
In the previous shapefile tutorial we examined different ways to plot the labels in the counties so we will skip that process here. If you would like to review that portion check that post out here and navigate down to the same section.
Since our coordinates are in decimal degrees, we can use the centroid function in the geosphere library. Our wyoming object does not contain the centroid coordinate as our shapefile example demonstrated, so this is a quick and easy way to calculate the same value.
We use a modified version of the process above to get the centroid of each polygon.
cog_df <-
lapply(1:length(wyoming$features),
function(i){
if(!is.list(wyoming$features[[i]]$geometry$coordinates)){
tmpdata <- wyoming$features[[i]]$geometry$coordinates
}else{
tmpdata <- wyoming$features[[i]]$geometry$coordinates[[1]][1]
}
tmpdata %>%
data.frame %>%
geosphere::centroid(x = .) %>%
data.frame %>%
tibble %>%
mutate(COUNTYNAME = wyoming$features[[i]]$properties$COUNTYNAME,
OBJECTID = wyoming$features[[i]]$properties$OBJECTID)
}) %>%
bind_rows
Conclusion
Here is the code to generate the visualization.
ggplot(data = wyoming_county_polygons, aes(x = long, y = lat, group = COUNTYNAME, fill =COUNTYNAME)) +
geom_polygon() +
geom_path(color = 'white') +
coord_equal() +
annotate("text",
x = cog_df$lon,
y = cog_df$lat,
label = cog_df$COUNTYNAME,
size = 4) +
labs(x = 'Longitude', y = 'Latitude',
title = 'County Map of Wyoming') +
guides(fill=FALSE) +
theme(panel.background = element_rect(fill = 'gray20'),
panel.border = element_rect(linetype = 'solid', fill = NA),
panel.spacing = unit(0.2, 'lines'),
strip.text = element_text(),
strip.background = element_rect(linetype = 'solid', color = 'black'),
axis.text = element_text(color = 'black'),
axis.ticks.length = unit(0, "cm"))
After inspection, if you see a result that doesn’t quite meet your expectation you could edit the appropriate values then rerun the plot code. If you will want to write off the results to a csv for future use and factor that into your script.
# If you wanted to manually adjust locations after this, you could
# edit the value and save off to a csv for future use.
cog_df %>% edit()
cog_df %>%
write_csv(., path = "./ProblemXSolutions.com/Wyoming/wyo_cog.csv")
What’s Next?
Now that we have created our spatial visualization using GeoJSON data, in the next post, we will continue the current example by applying the COVID data provided by the source mentioned above.