Now that we have our crime and mapping data, let's work on gathering the other data I wrote down in the previous Planning post. Part 2 of Data Search will focus on the Other Data Sources branch of the project plan.
Moon Phases
On previous work projects, I have come across a couple of R packages that provide this information for a given date so it shouldn’t be difficult to get all the data for our project time frame.
The first package, phenology, provides a function to get the information. Simply provide a date and whether you want the phase category returned. The default output is a value between 0 and 100, where 100 is a full moon, 50 is a new moon, 25 is the last quarter, and 75 is the first quarter.
The second package, suncalc, provides several functions related to sun and moon information. You can also get sun/moon rise and set times for a date and location. This information could be used to determine when an incident occurred relative to sunrise, sunset, moonrise, or moonset.
The information provided by either package would be good enough, though suncalc provides more information, so I will likely use it. I can then get the distribution of incidents by moon phase and the corresponding sun/moon rise and set times.
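A minimal sketch of how this could look with suncalc, assuming its convention that the phase is a fraction where 0 is a new moon and 0.5 is a full moon. The helper `moon_phase_label()`, its eight bins, and the approximate Washington, DC coordinates are illustrative assumptions, not part of either package or of the project script.

```r
# Map a suncalc-style phase fraction (0 = new moon, 0.25 = first quarter,
# 0.5 = full moon, 0.75 = last quarter) to one of eight named phases.
# The bin edges are an illustrative choice, not something suncalc defines.
moon_phase_label <- function(phase) {
  labels <- c("New Moon", "Waxing Crescent", "First Quarter", "Waxing Gibbous",
              "Full Moon", "Waning Gibbous", "Last Quarter", "Waning Crescent")
  # Shift by half a bin so each named phase is centred on its nominal value,
  # then wrap values close to 1 back around to "New Moon".
  idx <- floor(((phase %% 1) * 8) + 0.5) %% 8 + 1
  labels[idx]
}

# Only runs when suncalc is installed.
if (requireNamespace("suncalc", quietly = TRUE)) {
  dates <- seq(as.Date("2019-01-01"), as.Date("2019-01-07"), by = "day")

  # Illuminated fraction and phase for each date.
  moon <- suncalc::getMoonIllumination(date = dates, keep = c("fraction", "phase"))
  moon$phase_label <- moon_phase_label(moon$phase)

  # Rise/set times need a location; these coordinates are an approximate
  # placeholder for Washington, DC.
  sun_times <- suncalc::getSunlightTimes(date = dates, lat = 38.9, lon = -77.04,
                                         keep = c("sunrise", "sunset"),
                                         tz = "America/New_York")
  moon_times <- suncalc::getMoonTimes(date = dates, lat = 38.9, lon = -77.04,
                                      keep = c("rise", "set"),
                                      tz = "America/New_York")
  print(head(moon))
}
```

Keeping the labelling step separate from the suncalc calls means the same helper can be reused on any column of phase fractions once the incident dates are loaded.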
Weather and Temperature Data
During my search for weather and temperature data, I came across several sources but after further investigation, I didn’t find exactly what I was looking for. Some required accounts, others didn’t provide the historical data I wanted.
NOAA Climate Data Online
Using the NOAA site (https://www.ncdc.noaa.gov/cdo-web/) I was able to get the data, but had to partition the date range to meet their limitations. In the final image below, the range was 2 years. The way it works is that they email you the results of your request. Instead of committing to this just yet, I will check for R packages first.
The R packages below do utilize the web services offered by the NOAA CDO. See below for more details.
R Packages
I searched for R packages that could provide weather data for a given location. The results yielded weatherData and rnoaa.
weatherData
You can find the package here on GitHub: http://ram-n.github.io/weatherData/
On initial investigation, the package's functions and documentation suggested it would be a good tool for my project. Unfortunately, while testing the package, I found that it had not been updated for a while and that the internals of its function no longer worked with the website it sourced its data from. While I could go in and update the function, I opted to move on to the next package, which did work.
rnoaa
You can find the package here on GitHub: https://github.com/ropensci/rnoaa
Better documentation is located on the rOpenSci website here: https://docs.ropensci.org/rnoaa/articles/rnoaa.html.
For a quick reference to the data types and their units of measure, see: https://rdrr.io/github/ropensci/rnoaa/src/R/units-ghcnd.R.
In order to get data from NOAA (via NOAA Climate Data Online), you will need to register for a token. Instructions for doing this are on the NOAA CDO website and in the R package documentation.
After performing the necessary queries to get the correct structure and location information, I wrote a script to pull all the weather data associated with Ronald Reagan National Airport (DCA) from 2009-2019. The API limits the number of records returned per request, so I created a sequence to loop through to get all the data. Once the process completed, I aggregated the results into a single structure and wrote the data out to a CSV for later ingestion into my project storage solution (future post).
The script for getting the data will be posted to the GitHub link at the bottom of the post.
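Until then, the looping approach described above can be sketched roughly as follows. This is not the project script: the helper names `year_chunks()` and `fetch_chunk()`, the one-year chunk size, the output file name, the station ID `GHCND:USW00013743` (assumed here to be DCA), and reading the token from a `NOAA_KEY` environment variable are all assumptions made for illustration. Treating `offset` as 1-based is also an assumption worth checking against the NOAA CDO API documentation.

```r
# Split a date range into consecutive chunks of at most one calendar year,
# so each request stays within a per-request date-range limit.
year_chunks <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  starts <- seq(start, end, by = "year")
  ends <- c(starts[-1] - 1, end)  # each chunk ends the day before the next starts
  data.frame(start = starts, end = ends)
}

# Page through one chunk with rnoaa::ncdc(), page_size records at a time.
fetch_chunk <- function(station, start, end, token, page_size = 1000) {
  pages <- list()
  offset <- 1  # assumption: the API treats offset as 1-based
  repeat {
    res <- rnoaa::ncdc(datasetid = "GHCND", stationid = station,
                       startdate = format(start), enddate = format(end),
                       limit = page_size, offset = offset, token = token)
    if (is.null(res$data) || nrow(res$data) == 0) break
    pages[[length(pages) + 1]] <- res$data
    if (nrow(res$data) < page_size) break  # last (partial) page reached
    offset <- offset + page_size
  }
  do.call(rbind, pages)
}

# Only runs when rnoaa is installed and a token is available.
token <- Sys.getenv("NOAA_KEY")
if (requireNamespace("rnoaa", quietly = TRUE) && nzchar(token)) {
  chunks <- year_chunks("2009-01-01", "2019-12-31")
  parts <- lapply(seq_len(nrow(chunks)), function(i) {
    # Station ID is an assumption for Reagan National Airport (DCA).
    fetch_chunk("GHCND:USW00013743", chunks$start[i], chunks$end[i], token)
  })
  weather <- do.call(rbind, parts)
  write.csv(weather, "dca_weather_2009_2019.csv", row.names = FALSE)
}
```

Separating the date chunking from the paging keeps each request small and makes it easy to rerun a single failed year without repeating the whole download.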
Planning Progress
I will refer back to my plan and update it with what we have so far. Using XMind's icons, I put a task completion status next to each data item to indicate its level of progress. I have also expanded a couple of areas to add some granularity. If I expand or modify my plan during the process, I should make sure the plan reflects those changes. Project documentation is a good skill to have. If you document properly as you go, it will save you time and patience later.
Posts in Project Series
- Criminal Analysis: Planning
- Criminal Analysis: Data Search (part 0)
- Criminal Analysis: Data Search (part 1)
- Criminal Analysis: Data Search (part 2)
- Criminal Analysis: Data Search (part 3)
- Criminal Analysis: Data Storage
- Criminal Analysis: Data Storage (part 2)
- Criminal Analysis: Data Search (part 4)
- Derive a Star Schema By Example
- Criminal Analysis: Data Exploration