In this example, I’m going to demonstrate using the base distributed computing package to download GeoJSON files, perform some processing on those files and then write the resultant tables to a CSV file. The data being used comes from open source criminal activity data provided by Washington DC. You can view the original post written…
Category: data engineering
PostgreSQL Table Creation and Bulk Insertion
As part of converting my Criminal Analysis Data Project code from R to Julia, I thought I would create a series of short posts, each detailing one component of translating the data operations. This particular post will show a solution for how to take tabular data from a CSV and load it…
Criminal Analysis: Data Storage (Part 3)
In this post, I will demonstrate loading my criminal activity data into ElasticSearch so it can be explored, analyzed and visualized in Kibana. For instructions on installing and configuring the Elastic (formerly ELK) Stack, see my previous post. Although this post will specifically reference the crime data from my PostgreSQL database, I will include additional…
Converting R scripts to Julia (Part 2)
As part of my Getting COVID-19 Data posts in R, Python and Julia, I will now advance to part two of the conversion process. As we saw in Part 1 of this post series, we duplicated the R scripts into the language-specific script folder and changed the file extensions to the appropriate language. In…
Converting R scripts to Python (Part 2)
As part of my Getting COVID-19 Data posts in R, Python and Julia, I will now advance to part two of the conversion process. As we saw in Part 1 of this post series, we duplicated the R scripts into the language-specific script folder and changed the file extensions to the appropriate language. In…
Getting COVID-19 Data (Julia)
In this post, I will cover getting open source COVID-19 data for the United States using Julia. The data pipeline demonstrated here is a very simple example and could easily be adapted into a Prefect, Apache NiFi or Apache Airflow ETL process. Data Search Performing a quick search on DuckDuckGo, I got The COVID Tracking Project,…
Getting COVID-19 Data (Python)
In this post, I will cover getting open source COVID-19 data for the United States using Python. The data pipeline demonstrated here is a very simple example and could easily be adapted into a Prefect, Apache NiFi or Apache Airflow ETL process. Data Search Performing a quick search on DuckDuckGo, I got The COVID Tracking Project,…
Getting COVID-19 Data (R)
In this post, I will cover getting open source COVID-19 data for the United States using R. The data pipeline demonstrated here is a very simple example and could easily be adapted into a Prefect, Apache NiFi or Apache Airflow ETL process. Data Search Performing a quick search on DuckDuckGo, I got The COVID Tracking Project,…
Criminal Analysis: Data Exploration (part 2b)
Exploring Mapping Data In a continuation from part 2a, this post will explore the spatial points datasets from my database. I need to assess what each dataset contains. To aid in the exploration of spatial data, I will demonstrate plotting spatial points. This can be a lot easier than looking at this particular structure in…
Criminal Analysis: Data Exploration (part 2a)
Exploring Mapping Data My next exploration task is the mapping/geospatial tables in my database. I need to assess what each dataset contains. To aid in the exploration of spatial data, I will demonstrate plotting of spatial points and polygons. This can be a lot easier than looking at this particular structure in tabular or…