As a continuation of the previous article on Creating Distinct Lists, this post will demonstrate how both macros compliment each other. Both of these processes fit into my data conditioning workflow in Microsoft Excel.
Below is the original example data and the resultant merged data.
Access Macro from MS Excel
If you are currently in MS Excel, you can press Alt+F11
to open up VBA or navigate to the Developer Menu and select “Visual Basic” button on the far left side of the ribbon. Once you have VBA open you can import the .bas
file. The file is located on GitHub here. You can just use the DataConditioningToolkit.bas, which contains a whole basic data toolkit of macros.
When you have the associated macro imported, and configured the way you want, you can run it against your data. To the run the macro, click the “Macros” button in the “Developer” ribbon. In the window, select “Clean_Lookup”.
This tutorial will just show you the operations and results with explanations.
Data Toolkit
If you followed the customize the ribbon post successfully, then you can just press the Look-Up (Raw->Clean) button instead, which will run the macro.
Preparing for the Macro
The starting point of this macro is completing the series of look up tables created during the Create Distinct List macro. The initial state of results would like like this:
In these next images you will see that we have applied our desired transformations. The first image shows that no changes are necessary for the metadata. If changes were desired, we could easily make those here.
The second image shows the errors in the original data and what they should transform into. Whatever your cleaning operations are, this is the place to make those transformations.
Notice the number of variations for values. By conducting our transformations we can reduce variations and standardize the values for each particular field.
Side Note: Date Formats
One thing to note concerns date formats. Notice in the first image how the original date is composed a certain way and the “_clean” column shows a different style. If this is not made as a text, the style will most likely revert back. This can be frustrating, but just requires changing the column number format. There are more complicated methods to fix this but that is out of scope for this post.
Run the Macro
Now that we have completed our look up tables, we can proceed to run the appropriate macro. For this particular macro, you need to complete each lookup table. If there are no changes, just “fill right” or copy/paste the orginal values into the clean column.
The “Look-Up (Raw->Clean)” macro will look up the original values in the original data spreadsheet and runs the lookup to transform the values and create a new worksheet with the updated data table.
When you click the macro button, the prompt will ask you to select the “A1” cell on the original data worksheet. This is just to identify which worksheet is the original. Your data should start in the “A1” cell, with row 1 containing the variable names for each field.
Now, the next prompt asks you to do the same on the cleaned up lookup table worksheet. Once you click “OK” the macro will process the data.
Process Complete
The macro will create a new worksheet to store the new merged data. This preserves the original data and lookup tables, should you need to have traceability.
Inspection and Validation
At this time you can view and inspect the output. Dates can sometimes change formats automatically in MS Excel. You will lose your style formatting of cells, though you can easily write additional macros to maintain styles throughout the who process.
Conclusion
Once you have your new merged data worksheet, you can rerun the compliment of macros to continue to refine and further transform your data.
What you do next is up to you and your tasks. There are a number of tasks you could do with this. In past experiences of cleaning and transforming data, I could standard lots of worksheets and combine into a database to help formalize data collection. Coherency and standardization are certainly issues when you have several different users entering data.
Now that you have some basic tools to help you condition your data, you are better equipped to take your data to the next level. If you are part of a data migration and standards team, these tools will save you loads of time. Hopefully your analysis and visualizations will improve as well.
Consulting Assistance
If you like what you read here and think there is an opportunity for me to help your business with creating a custom solution, please reach out on my Freelance & Consulting page.