Over the years of cleaning, transforming and analyzing data, I have created some useful tools to help expedite tidying and processing. This post will go over a tool designed to gather horizontally tabulated data to transform into a more key:value pair structure. This tool is very similar to R’s tidyr::gather
function. For those that don’t use R, this will certainly be a nice tool for your data processing toolkit.
If you work heavily in Microsoft Excel, your spreadsheets may look like the following depictions. Metadata corresponding to each row of data on the left, and column descriptors on top row. Sometimes you will have 1 column dedicated to describing the data for all the other data in that particular row. You may have several if there are a lot of attributes or metadata.
If you analyze data in pivot tables or Tableau after processing, you may grow to appreciate what this transformation can do for you. Now this is not as important in all cases. Its a tool to help.
Access VBA from MS Excel
If you are currently in MS Excel, you can press Alt+F11
to open up VBA or navigate to the Developer Menu and select “Visual Basic” button on the far left side of the ribbon. Once you have VBA open you can import the .bas
file. The file is located on GitHub here.
When you have the associated macro imported, and configured the way you want, you can run it against your data. To the run the macro, click the “Macros” button in the “Developer” ribbon. In the window, select “DataGathering”.
This tutorial will just show you the operations and results with explanations.
Data Toolkit
If you followed the customize the ribbon post successfully, then you can just press the Data Gathering button, which will run the macro.
Running the Macro
Initially, the example worksheet just has “Test_Data” worksheet.
The data on that worksheet looks like the following. This is a very generalized spreadsheet to demonstrate the process only.
Define Your Data
First the macro prompts you to select the Row Header Range. This encompasses the Column Headers and all metadata you would like to company each transformed record. In the Single column of metadata, we simply select the range from A1 to the end of our range, say A21. Similarly we can select multiple columns as depicted in the second picture.
Next, we are prompted to select the Column Header Range. This is just the column variables. In the example below its the range of cells from D1 through I1. Think of these values as the “key” in a key:value pair.
Next, we are prompted to select the Range containing the Data. This is just the data not including the column headers or metadata. In the example below its the range of cells from D2 through I121. Think of these values as the “value” in a key:value pair.
Finally, we are prompted to type in whatever name we want to give to the “key” field of our key:value pair. In this example, I just entered “DataVariables”. The important piece is that you understand the value and meaning to your dataset.
Process Complete
Depending on the size of your data, the process should complete really quickly. If you try o perform the operation without the macro, it will likely take 10min to several hours depending on size. When the process is complete, a message box will pop up to let you know its complete.
You should also notice that there is a new worksheet with the name “Gathered_Data_” plus the date-time it was created. This process does not modify or change your original data. You could just as easily run this operation several times, selecting different configurations each instance. The structure of your data and understanding of what you need to do will really define the direction you take this. It will save you time though.
Inspection and Validation
Now, we can inspect the output and validate that our Data Gathering performed as we expected. Notice that the cells with notes, are also put in their own column. I have worked with a lot of different spreadsheets and the amount of notes that are made can be staggering. Sometimes a client might want to retain those comments, while others are not concerned. This at least gives the option to include the addition attribute to the cell value with little additional effort required. You can select each image to view more closely.
Pivot Tables
Now that our data has been gathered, let’s look at the differences for using the data in Pivot Tables.
Pivot Table on Original Data
Before running a pivot table on the gathered data, lets look at what the pivot table on the original structure looks like. Here you have lot of columns in your fields pane. You have to select the individual columns you want to see on whichever axis. This has its merits depending on what you are doing.
Pivot Table on Gathered Data
Below is what your newly gathered data looks like in the pivot table structure. Instead of having lots and lots of columns to sort through you can work with one variable. With that one variable you can filter on the “columns” that you wanted. You can then place that variable on either “Columns” or “Rows” of the PivotTable Fields menu.
This is just another way of looking at and analyzing the data. It is not a cure all for all your analysis needs.
Conclusion
If you are frustrated by the structure of your data, maybe reorganizing and restructuring your data can help. Gathering your data helps to quickly filter on the relevant fields. The new structure can improve your ability to visualize your data as well. This is another tool to aid your data analysis and visualization efforts and is not going to solve all your issues.
If you work in Tableau, you will likely understand and appreciate how this structure will improve what visualizations you can build.
Consulting Assistance
If you like what you read here and think there is an opportunity for me to help your business with creating a custom solution, please reach out on my Freelance & Consulting page.
Excel VBA Recommendations