This post will describe how to install both the PostgreSQL database and the PostGIS geodatabase extension.
If you have been following my Criminal Analysis: Data Storage posts then this will be a repeat of information.
Installing and Setting up
PostgreSQL
On a Linux machine, getting PostgreSQL installed and running is pretty easy and straight forward. Just follow the simple instructions from the PostgreSQL website (here). I have copy/pasted their code instructions below for simplicity. This does work on my Linux/Ubuntu 20.04 Computer.
# Create the file repository configuration:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# Import the repository signing key:
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# Update the package lists:
sudo apt-get update
# Install the latest version of PostgreSQL.
# If you want a specific version, use 'postgresql-12' or similar instead of 'postgresql':
sudo apt-get -y install postgresql
My default installation through my Linux/Ubuntu 20.04 was version 13.1
PostGIS
On a Linux machine, getting PostGIS installed and running is just as easy and straight forward as getting PostgreSQL on your machine. You can find installation information for your specific OS or Linux Distribution at https://postgis.net/install/.
I have copy/pasted the CLI instructions below for simplicity based on my Linux/Ubuntu 20.04 configuration.
# Update the package lists:
sudo apt-get update
# Install the latest version of PostGIS through APT.
sudo apt install postgis
Starting the Service and Accessing
The first command we will give our terminal is to start the PostgreSQL service. We are running version 13 as well.
sudo systemctl start postgresql@13-main
To confirm that PostgreSQL service is active, running and is enabled, run the following from the terminal.
sudo systemctl is-active postgresql
sudo systemctl is-enabled postgresql
sudo systemctl status postgresql
# Check if your service is ready for connections and accepting
sudo pg_isready
Next we will switch to our postgres
account (default) on our just started server.
sudo -i -u postgres
The following images should be what you see after each terminal command. Notice in the second image, that once you type in psql
, we are now in the psql terminal/shell. From here we can issue all of our commands to interact, manage and query the database.
Creating a User
Now we are going to create a general analyst
user/role. The concept of a user and a role are somewhat interchangeable depending on how it is used, though there are implications. A user is the same as role, except that it implies login. Change analyst
to be whatever you want based on your needs.
CREATE USER analyst WITH PASSWORD 'mypassword';
-- or
CREATE USER analyst WITH ENCRYPTED PASSWORD 'mypassword';
We will give this user an encrypted password during the command as well. For simplicity I just used ‘mypassword”. You can just as easily use WITH ENCRYPTED PASSWORD
as well. We can specify that the user change it. The shell will respond with CREATE ROLE
as confirmation.
We can also alter our user if we decide later.
ALTER USER analyst RENAME TO johndoe;
For more information, please check out the official documentation on creating a user here:
- https://www.postgresql.org/docs/13/sql-createuser.html
- https://www.postgresql.org/docs/13/sql-alteruser.html
There is a lot more you can specify when creating or altering for a user/role.
Roles and Groups
A group is an alias for role. A role or group is a great way to assign or modify privileges to all members of the entity because membership conveys privileges. This alleviates having to manage individual privileges at the user level to a higher aggregate.
Here, we create a role called “analyst_role” and a group “analyst manager”.
CREATE ROLE analyst_role;
CREATE GROUP analyst_manager SUPERUSER CREATEDB CREATEROLE LOGIN;
As we define what a group or role can or cant do, we can alter our user analyst
to be apart of a role/group. These roles/groups can be granted privileges which are then available to its members.
ALTER USER johndoe IN ROLE analyst_role;
ALTER ROLE analyst_role LOGIN;
For more information, please check out the official documentation on creating a group here:
- https://www.postgresql.org/docs/13/sql-createrole.html
- https://www.postgresql.org/docs/13/sql-creategroup.html
- https://www.postgresql.org/docs/13/sql-alterrole.html
There is a lot more you can specify when creating or altering for a user/role.
Grant Privileges
Next we can grant or alter privileges to a user individually or by specifying at the role/group level. Membership in a role/group conveys privileges.
GRANT ALL PRIVILEGES ON DATABASE project_db to analyst_role;
for our project_db
database. The shell’s response is GRANT
.
Creating a Database
In the images below you can see the results of running the following in Psql to create my project_crime database:
postgres=# create database project_db;
postgres=# l
The first command gets a response indicating the database was created. The second command lists the databases in our Postgres instance. The images below are associated with a different database but demonstrates the concepts.
Now, lets switch to our project_db
postgres=# \c project_db
-- This will change the postgres=# to project_db=#
Enabling PostGIS
Now that our PostgreSQL database server is running and we have a databse to work with, we need to enable PostGIS for project_db
.
PostGIS is an optional extension that must be enabled in each database you want to use it in before you can use it. Installing the software is just the first step. DO NOT INSTALL it in the database called
postgres
.Connect to your database with
https://postgis.net/install/psql
or PgAdmin. Run the following SQL. You need only install the features you want:
That last part means, we should change to a database that is NOT the default postgres
database.
-- Enable PostGIS (as of 3.0 contains just geometry/geography)
CREATE EXTENSION postgis;
-- enable raster support (for 3+)
CREATE EXTENSION postgis_raster;
-- Enable Topology
CREATE EXTENSION postgis_topology;
-- Enable PostGIS Advanced 3D
-- and other geoprocessing algorithms
-- sfcgal not available with all distributions
CREATE EXTENSION postgis_sfcgal;
-- fuzzy matching needed for Tiger
CREATE EXTENSION fuzzystrmatch;
-- rule based standardizer
CREATE EXTENSION address_standardizer;
-- example rule data set
CREATE EXTENSION address_standardizer_data_us;
-- Enable US Tiger Geocoder
CREATE EXTENSION postgis_tiger_geocoder;
After each execution of the commands to create the various extensions you should get confirmation with “CREATE EXTENSION”.
If we run the following SQL statement, we should get a response similar to the image below. It looks like all the extensions were created and we can now use the PostGIS extension features.
SELECT PostGIS_full_version();
What’s next?
Now that we have installed our PostgreSQL database and PostGIS database extension, we can look at populating our database with data. We briefly touched on Creating and Altering Users and Roles/Groups, so we could explore that side of administration and management of the database.
I will also introduce pgAdmin, the leading Open Source management tool for Postgres. pgAdmin provides a good graphical interface to create and maintaining database objects. You can also manage Users, Roles/Groups and so much more..
SQL Resources
If you have not learned much SQL, I would highly recommend setting up an account at HackerRank.com. It is free and provides good problems to solve, which gives you experience. Work through their SQL problems. SQL is a very valuable skill to have and often expected for Data Scientists to know. If you are interested in Database problems, they have those too. The image below shows you all the skills available to learn from their platform.
For referencing or looking up commands, syntax, and examples use W3Schools.com and tutorialspoint.com.
Database Resources
For more information on PostgreSQL check out their website and documentation at https://www.postgresql.org/docs/13/index.html.
For more information on Psql check out this guide: http://www.postgresguide.com/utilities/psql.html.
Another resource for installing PostgreSQL on Linux/Ubuntu 20.04 with more information about everything covered above from the terminal: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-20-04
Other Data Storage Posts
- Criminal Analysis: Data Storage
- Criminal Analysis: Data Storage (part 2)
- Data Storage: Installing PostgreSQL and PostGIS
- Data Storage: pgAdmin
- Derive a Star Schema By Example
- Criminal Analysis: Data Storage (Part 3)