Thus far, I have gathered the vast majority of the data for this project and stored it in my PostgreSQL database. Now it’s time to explore the data.
Over the next series of posts for this project, I plan to share different methods of exploring the data. Initially I’ll demonstrate using R to tap into my PostgreSQL database. Next, I’ll mimic the R version of the posts using SQL within my pgAdmin4 setup. Then, I’ll introduce and replicate the processes using Elasticsearch and Kibana.
Exploration Planning
Criminal Activity Data
My primary data set to explore is the crime table in my database. I need to assess the values and their consistency across each of the fields provided by the source. The data may be somewhat inconsistent from year to year over the past decade, especially if the source provider has not tidied up their data. I will also need to assess what needs to be processed or transformed. This process will likely lead to spatial and temporal reference tables in my database to standardize values across all my data sets.
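As a rough illustration of the year-over-year consistency check described above, here is a minimal Python sketch. It is not code from this project: the sample rows, the `year` and `offense` field names, and both helper functions are hypothetical, since the actual columns of the crime table are not listed in this post. It simply collects the distinct values of a field per year and flags any value that does not show up in every year.

```python
from collections import defaultdict

# Hypothetical sample rows; the real crime table's columns are not shown in this post.
rows = [
    {"year": 2015, "offense": "THEFT"},
    {"year": 2015, "offense": "BURGLARY"},
    {"year": 2016, "offense": "Theft"},
    {"year": 2016, "offense": None},
]


def values_by_year(rows, field, year_field="year"):
    """Collect the set of distinct non-null values of `field` seen in each year."""
    seen = defaultdict(set)
    for row in rows:
        value = row.get(field)
        if value is not None:
            seen[row[year_field]].add(value)
    return dict(seen)


def inconsistent_values(by_year):
    """Return values that appear in at least one year but not in every year."""
    if not by_year:
        return set()
    all_values = set().union(*by_year.values())
    common = set.intersection(*by_year.values())
    return all_values - common


by_year = values_by_year(rows, "offense")
flagged = inconsistent_values(by_year)
```

A flagged value is only a candidate for review. It may be a spelling or casing variant such as `THEFT` versus `Theft`, or it may be a legitimate category that simply did not occur in a given year.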
Map Data
The map data I gathered contains some additional information about each associated spatial grouping. The actual mapping polygons won’t be much of a focus here, but I will show each one to provide a reference for its granularity.
Real Estate Data
Permit Data
The permit data will need to be explored to determine the disposition and distribution of values in the various fields. I may need to process and transform values as well. This is another spatial data set, but the specific coordinates are not necessarily important at this point. I may plot out a smaller polygon region, though, just to see what the data provides.
Housing Data
I will need to spend time exploring the Redfin and Realtor.com data to understand what I have: distributions, missing data, etc. I will need to identify any processing and transformation needs as well.
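To make "distributions and missing data" a little more concrete, here is a small Python sketch that summarizes a single numeric field. It is illustrative only and not taken from this project: the `prices` values are made up, and the `summarize` helper is hypothetical. It assumes the list is non-empty and contains at least one non-missing value.

```python
import statistics

# Hypothetical listing prices; None marks a missing value.
prices = [250000, 310000, None, 275000, 420000, None, 298000]


def summarize(values):
    """Return the missing-value share and basic distribution statistics.

    Assumes `values` is non-empty and has at least one non-None entry.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    return {
        "count": len(values),
        "missing": missing,
        "missing_share": missing / len(values),
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


summary = summarize(prices)
```

Running the same summary over each field gives a quick picture of which fields are sparsely populated and which have suspicious extremes worth a closer look.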
Economic Data
This data will need to be looked at to assess processing and transformation needs. Each of the data sets is a pretty straightforward aggregate.
Other Data
Solar/Lunar Data
The solar and lunar phase data is not a primary focus; it is meant to supplement, and be analyzed against, our crime data. It goes through cycles, so it’s less interesting to explore. I will need to identify what needs to be processed or transformed, though.
Weather and Temperature Data
For this data, I can do some exploration, but just like the solar/lunar data, it will be more focused on identifying what needs to be processed or transformed. I did a little of this for the temperature portion during my initial data search.
Employment and Labor Data
This data will need to be looked at to assess processing and transformation needs. The individual tables are pretty straightforward aggregated data sets.
Project Planning
Now we will start focusing on the “Questions” portion of the project plan. I recorded several questions at the start of the project. Throughout this data exploration phase, I will be reflecting on those original questions, while also adding more as I get a better idea of the various distributions. The exploration process may also lead me to modify my original questions.
GitHub Link
Although there is no specific code for this post, you can check out all the code produced for this project on my GitHub.
Posts in Project Series
- Criminal Analysis: Planning
- Criminal Analysis: Data Search (part 0)
- Criminal Analysis: Data Search (part 1)
- Criminal Analysis: Data Search (part 2)
- Criminal Analysis: Data Search (part 3)
- Criminal Analysis: Data Storage
- Criminal Analysis: Data Storage (part 2)
- Criminal Analysis: Data Search (part 4)
- Derive a Star Schema By Example
- Criminal Analysis: Data Exploration