In this post I will go over setting up a geospatial database using PostGIS, an extension to PostgreSQL. For information about setting up a PostgreSQL database please refer back to my previous Data Storage.
Below is the project plan to ensure I load up all my downloaded map data.
Installing and Setting up
First you need to already have PostgreSQL installed. Please reference my previous post for those instructions.
On a Linux machine, getting PostGIS installed and running is pretty easy and straight forward, just like PostgreSQL. You can find installation information for your specific OS or Linux Distribution at https://postgis.net/install/.
I have copy/pasted the CLI instructions below for simplicity based on my Linux/Ubuntu 20.04 configuration.
# Update the package lists:
sudo apt-get update
# Install the latest version of PostGIS through APT.
sudo apt install postgis
Side note with install for an R Package
For one of the R packages below I did have to install libgdal-dev
. This came up when I was trying to install the package. Once that libgdal-dev
was installed I had no other issues on the R side of things.
sudo apt install libgdal-dev
Starting the Service and Accessing
The first command we will give our terminal is to start the PostgreSQL service. This is because PostGIS is an extension to the PostgreSQL database. We are running version 13 as well.
$ sudo systemctl start postgresql@13-main
Next we will switch to our postgres
account (default) on our just started server.
$ sudo -i -u postgres
The following images should be what you see after each terminal command. Notice in the second image, that once you type in psql
, we are now in the psql terminal/shell. From here we can issue all of our commands to interact, manage and query the database.
Enabling PostGIS
Now that our PostgreSQL database server is running, we need to enable PostGIS. From the PostGIS site:
PostGIS is an optional extension that must be enabled in each database you want to use it in before you can use it. Installing the software is just the first step. DO NOT INSTALL it in the database called
postgres
.Connect to your database with
https://postgis.net/install/psql
or PgAdmin. Run the following SQL. You need only install the features you want:
-- Enable PostGIS (as of 3.0 contains just geometry/geography)
CREATE EXTENSION postgis;
-- enable raster support (for 3+)
CREATE EXTENSION postgis_raster;
-- Enable Topology
CREATE EXTENSION postgis_topology;
-- Enable PostGIS Advanced 3D
-- and other geoprocessing algorithms
-- sfcgal not available with all distributions
CREATE EXTENSION postgis_sfcgal;
-- fuzzy matching needed for Tiger
CREATE EXTENSION fuzzystrmatch;
-- rule based standardizer
CREATE EXTENSION address_standardizer;
-- example rule data set
CREATE EXTENSION address_standardizer_data_us;
-- Enable US Tiger Geocoder
CREATE EXTENSION postgis_tiger_geocoder;
The following image shows the commands to create the various extensions on the servers response giving confirmation.
If we run the following SQL statement, we should get a response similar to the image below. It looks like all the extensions were created and we can now use the PostGIS extension features.
SELECT PostGIS_full_version();
Creating the Database
If you didn’t create the project_crime
database from the previous Data Storage post, you can follow this section. If you are ready to begin loading spatial data into you database, skip to the Loading Data
section below.
In the images below you can see the results of running the following in Psql:
postgres=# create database project_crime;
postgres=# l
The first command gets a response indicating the database was created. The second command lists the databases in our Postgres instance.
Creating a User
Now we are going to create a general analyst
user. We give this user a password during the command. The shell will respond with CREATE ROLE
as confirmation. Next we can grant privileges to this user for our project_crime
database. The shell’s response is GRANT
.
Now you are ready to import data and create table in your database. You can easily create tables using the Psql shell, but in this post, I will be demonstrating the rest in R. Please refer to the R Package section below for more details about the packages used. If you don’t need the info, skip to the next major section on “Creating Tables and Importing Data” below.
R Packages
A quick query for R and PostGIS will yield plenty of resources and R packages to get you started. You can also just search for PostGIS packages in R as well. The results yield rpostgis
initially but after digging around I came up with another two, postGIStools
and sf
. Each will be briefly described below. For more information, refer to each packages documentation resources or links provided below.
install.packages(c("postGIStools", "rpostgis", "sf"))
rpostgis
The rpostgis
package provides an interface between R and a PostGIS enabled PostgreSQL database. The package supports reading and writing vector (points, lines, polygons) and raster data.
The resources below also provides some code for setting up and interacting with a database via R:
# Install the latest rpostgis release from CRAN:
install.packages("rpostgis")
# Or the the development version from GitHub:
# install.packages("devtools")
devtools::install_github("mablab/rpostgis")
postGIStools
This package provides tools to interface with PostgreSQL/PostGIS databases. According to the package documentation it is now deprecated. It then points us to use the sf
package.
Resources:
sf
The sf
package provides simple features access in R. Simple features are a standardized way to encode spatial vector data. Binds to ‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical operations, and to ‘PROJ’ for projection conversions and datum transformations. Optionally uses the ‘s2’ package for spherical geometry operations on geographic coordinates.
To install this particular package on Linux you will need to make sure you have also install GDAL, GEOS and Proj.4 The following shell command is for Ubuntu. Check out https://github.com/r-spatial/sf for your specific system.
sudo apt install libudunits2-dev libgdal-dev libgeos-dev libproj-dev
Use the following R code to install the sf
package from CRAN or GitHub.
# Install the latest RPostgres release from CRAN:
install.packages("sf")
# Or the the development version from GitHub:
# install.packages("devtools")
devtools::install_github("r-spatial/sf")
Resources
- https://github.com/r-spatial/sf
- https://www.r-bloggers.com/2019/04/interact-with-postgis-from-r/
- https://cfss.uchicago.edu/notes/simple-features/
- package cheatsheet: https://github.com/rstudio/cheatsheets/blob/master/sf.pdf
Creating Tables and Importing Data
The following R code demonstrates connecting to the database I just created then creating a table from my the DC Ward map data API (GeoJSON).
library(tidyverse)
library(magrittr)
library(DBI)
library(rpostgis)
library(sf)
# Connect to desired PostgreSQL database
db <- 'project_crime' # databast name to connect to
host_db <- 'localhost' # Where is db being hosted? default server/host is localhost
db_port <- '5432' # Which port is the server listening to? default port number for PostgreSQL is 5432
db_user <- 'analyst'
db_password <- 'mypassword'
con <- dbConnect(RPostgres::Postgres(),
dbname = db,
host=host_db,
port=db_port,
user=db_user,
password=db_password)
# Check if database extension is installed.
# I did have to alter my analyst role in psql to superuser to do this on
# my initial use, but afterward, it wasn't needed to have superuser privileges.
# psql:
# project_crime=# alter role analyst superuser; le analyst superuser;
pgPostGIS(conn = con)
The TRUE
response lets me know the database is ready and functional. You can check the geometeries and rasters present in the database using:
pgListGeom(conn = con, geog = TRUE)
pgListRast(conn = con)
Since I have not loaded any spatial structured data into the database, I shouldn’t get anything meaningful back.
Load Data
The following shows us loading one of the GeoJSON files collected during my data search. Are experimenting around with a couple different methods, the sf
package provided the best function to read in the GeoJSON file into R.
data <- st_read(dsn = full_file_path)
The following screenshot displays the output and give a quick summary of the data it just read in.
Just to make sure, I’m going to plot the data to verify the polygons. The sf
package provides a geom_sf
function which easily plots the spatial data without having to specify much, unlike ggplot2::geom_polygon
.
ggplot(data = data)+
geom_sf(aes(fill= as.numeric(POP_2011_2015))) +
geom_sf_label(aes(label = WARD))
Now I will load the data into the database and then check the list of tables to validate it was loaded properly
sf::dbWriteTable(conn = con,
name = "ward_polygons",
value = data,
overwrite = TRUE,
driver = RPostgres::Postgres())
# Check the current list of Tables to verify table was created
dbListTables(conn = con)
# Check the current list of Geometries in the database
pgListGeom(conn = con, geog = TRUE)
Here’s the output:
These two function let us check and verify the tables are correct as well.
rpostgis::dbTableInfo(conn = con, name = 'ward_polygons')
rpostgis::pgGetGeom(conn = con, name = 'ward_polygons', geom = "geometry")
After writing up a process to load all my downloaded map data into the database, I can see all the associated tables. Check out the GitHub link below for more details about the full script used (ref: dc_geodb.R).
Remove Tables
If for some reason you decide to remove a table from your database, you can simply run a command to take care of that from R.
dbRemoveTable(conn = con, name = "NAME_OF_TABLE")
Database Resources
The official documentation on PostgreSQL check out their website and documentation at https://www.postgresql.org/docs/13/index.html.
The official documentation on PostGIS check out their website and documentation at https://postgis.net/.
For more information on Psql check out this guide: http://www.postgresguide.com/utilities/psql.html.
Planning Progress
I will reference back to my plan and update what we have so far. Using XMind’s icons, I put task completion status next to the data I have. I have expanded a couple areas to add some granularity. If during the process I expand or modify my plan, I should make sure my plan reflects those changes. Project documentation is a good skill to have. If you document properly as you go, it will save you time and patience later.
Posts in Project Series
- Criminal Analysis: Planning
- Criminal Analysis: Data Search (part 0)
- Criminal Analysis: Data Search (part 1)
- Criminal Analysis: Data Search (part 2)
- Criminal Analysis: Data Search (part 3)
- Criminal Analysis: Data Storage
- Criminal Analysis: Data Storage (part 2)
- Criminal Analysis: Data Search (part 4)
- Derive a Star Schema By Example
- Criminal Analysis: Data Exploration
- Criminal Analysis: Data Exploration (part 1)
- Criminal Analysis: Data Exploration (part 2a)
- Criminal Analysis: Data Exploration (part 2b)
- Criminal Analysis: Data Storage (Part 3)