In this post I will go over installing Apache Spark and my initial interactions with it from within R. I am currently using Linux/Ubuntu 20.04, so the instructions are tailored to my environment. The process should be similar on other Linux distributions, as well as on Mac and Microsoft environments. Getting Apache Spark: There are a couple of routes to…
Networking my Existing Equipment
This past week I have been migrating and consolidating my desktop and laptop onto my network-attached storage (NAS) device, essentially my private cloud storage. My NAS has been sitting in the closet since I moved a few years ago. I have a bunch of code and data spread across my laptop and desktop…
Distributed Computing with Julia By Example
In this example, I’m going to demonstrate using Julia's base distributed computing package to download GeoJSON files, perform some processing on those files, and then write the resulting tables to a CSV file. The data comes from open-source criminal activity data provided by Washington, DC. You can view the original post written…
Benchmarking CSV vs CSVFiles packages: Write
This post covers benchmarking in Julia, using a specific case to evaluate the functions in the CSV and CSVFiles packages for writing a CSV file. Packages and Versioning: In this use case, I am using Julia v1.5.3 with the following packages: CSV, CSVFiles, DataFrames, and BenchmarkTools. Please refer to each package's documentation for more details. Setting up…
Benchmarking CSV vs CSVFiles packages: Read
This post covers benchmarking in Julia, using a specific case to evaluate the functions in the CSV and CSVFiles packages for reading a CSV file. Packages and Versioning: In this use case, I am using Julia v1.5.3 with the following packages: CSV, CSVFiles, DataFrames, and BenchmarkTools. Please refer to each package's documentation for more details. Setting up…
PostgreSQL Table Creation and Bulk Insertion
As part of converting my Criminal Analysis Data Project code from R to Julia, I thought I would write a series of short posts, each detailing one component of translating the data operations. This post shows one way to take tabular data from a CSV and load it…
Julia’s Gadfly for R ggplot2 Users
Over the past week I have been reading the documentation for Julia’s Gadfly package and experimenting with it. I thought it would help fellow R users coming from the world of ggplot2 to have a quick reference guide showing how to translate from one to the other. The coding and style for creating data…
Linux/Ubuntu 20.04: Upgrading Julia (v1.4.1 to v1.5.3)
Currently, the Julia version available through the APT package manager is 1.4.1, which was released on 2020-04-14. Recently I decided to get the latest stable version, v1.5.3, which was released on 2020-11-09. While going through the process, I thought it would be helpful to document it for…
Criminal Analysis: Data Storage (Part 3)
In this post, I will demonstrate loading my criminal activity data into Elasticsearch so it can be explored, analyzed, and visualized in Kibana. For instructions on installing and configuring the Elastic (formerly ELK) Stack, see my previous post. Although this post will specifically reference the crime data from my PostgreSQL database, I will include additional…
Converting R scripts to Julia (Part 2)
As part of my Getting COVID-19 Data posts in R, Python, and Julia, I will now move on to part two of the conversion process. As we saw in Part 1 of this series, we duplicated the R scripts into each language-specific script folder and changed the file extensions to the appropriate language. In…