In this article I will cover the general workflow I have used in my work to condition data in Microsoft Excel. This post builds on my previous article, Creating Your Own Ribbon Bar of Tools (MS Excel).
General Workflow
The following image depicts a generalized workflow for data conditioning in MS Excel using my Data Toolkit add-in.
Step 1
In general, you already have data on a worksheet that needs to be cleaned up, and perhaps transformed, for other purposes. As the user of the data, you are in the best position to know what needs to be done, and that knowledge will drive which operations you perform.
Step 2
Identify what needs to be done to the worksheet. Does the worksheet have rows or columns that are currently merged?
Should you consolidate your data columns into a key:value pair structure? Does the data have comments tied to cells that should be extracted and preserved?
The images below show the original data and its key:value pair equivalent.
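To illustrate that restructuring, a macro along these lines could unpivot a simple table into ID / Field / Value rows. This is only a sketch; the sheet names and layout are assumptions, not the toolkit's actual code.

' Sketch: unpivot a table on "Original" (ID in column A, attributes in B..n)
' into ID / Field / Value rows on a new sheet. Names and layout are assumed.
Sub UnpivotToKeyValue()
    Dim src As Worksheet, dst As Worksheet
    Dim lastRow As Long, lastCol As Long
    Dim r As Long, c As Long, outRow As Long

    Set src = ThisWorkbook.Worksheets("Original")
    Set dst = ThisWorkbook.Worksheets.Add(After:=src)
    dst.Name = "KeyValuePairs"

    lastRow = src.Cells(src.Rows.Count, 1).End(xlUp).Row
    lastCol = src.Cells(1, src.Columns.Count).End(xlToLeft).Column

    ' Header row for the unpivoted output
    dst.Range("A1:C1").Value = Array("ID", "Field", "Value")
    outRow = 2

    ' One output row per (record, attribute column) pair
    For r = 2 To lastRow
        For c = 2 To lastCol
            dst.Cells(outRow, 1).Value = src.Cells(r, 1).Value   ' record ID
            dst.Cells(outRow, 2).Value = src.Cells(1, c).Value   ' field name (header)
            dst.Cells(outRow, 3).Value = src.Cells(r, c).Value   ' field value
            outRow = outRow + 1
        Next c
    Next r
End Sub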
Could your columns or fields be rearranged to keep like things together? Do you need additional columns to simplify or enhance other fields? Do your data cells rely on special formatting to convey extra attributes? These are just a handful of possible operations, not an exhaustive list. Knowing your data and what you want to do with it is an important part of the planning process.
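As a concrete example of one of the operations above, pulling cell comments out into a neighboring column might look roughly like this; the column layout here is an assumption for the sketch, not the toolkit's code.

' Sketch: copy the comment text attached to cells in column A
' into column B so it survives later processing. Layout is assumed.
Sub ExtractCommentsToColumn()
    Dim ws As Worksheet, cell As Range
    Dim lastRow As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row

    For Each cell In ws.Range(ws.Cells(1, 1), ws.Cells(lastRow, 1))
        If Not cell.Comment Is Nothing Then
            ' Keep the note text alongside the data it described
            cell.Offset(0, 1).Value = cell.Comment.Text
        End If
    Next cell
End Sub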
Maybe you are just preparing data to be ingested into a database. Data going into a database should be structured and free of font and cell formatting. Cells should also conform to data type rules and restrictions.
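For the database-bound case, a small routine along these lines could flatten formulas to values and strip cell formatting before export. Again, a rough sketch, not part of the toolkit.

' Sketch: prepare the active sheet for database ingestion by
' converting formulas to static values and removing cell formatting.
Sub PrepareForDatabase()
    Dim ws As Worksheet
    Set ws = ActiveSheet

    With ws.UsedRange
        .Value = .Value        ' replace formulas with their current values
        .ClearFormats          ' drop fonts, fills, borders, number formats
    End With
End Sub

Keep in mind that clearing formats also removes number formats, so date cells will revert to their underlying serial numbers; if that matters, convert date columns to text first.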
The macros used in this step are discussed in the posts Vertical and Horizontal Demerge (MS Excel) and Data Gathering (MS Excel).
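Those posts cover the actual macros; purely as a sketch of the vertical demerge idea, unmerging cells and repeating the value across the former merge might look like this.

' Sketch: unmerge merged cells on the active sheet and repeat the
' merged value in every cell the merge used to span.
Sub VerticalDemerge()
    Dim cell As Range, mergedArea As Range
    Dim v As Variant

    For Each cell In ActiveSheet.UsedRange
        If cell.MergeCells Then
            Set mergedArea = cell.MergeArea
            v = mergedArea.Cells(1, 1).Value   ' the value lives in the top-left cell
            mergedArea.UnMerge
            mergedArea.Value = v               ' repeat it across the former merge
        End If
    Next cell
End Sub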
Step 3
Unless your data has been cleanly entered and is free of errors, misspellings, and the like, you will likely need to clean one or more fields. In Step 3, we create a distinct list of values for each column in your worksheet; the macro I have created for this step does exactly that. It may be overkill for some users, but some spreadsheets need a lot of cleaning. By creating a distinct list for each field, we can survey the data and understand its values and variations. If we need to transform values into something more standardized, this is where the process really pays off. The images below depict the results of this step.
This particular macro will be discussed in more detail in a separate post on creating distinct value lists.
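Ahead of that post, the core idea is straightforward: for every column, collect its distinct values onto a lookup sheet, with a blank column beside each list for the cleaned replacement value. A bare-bones sketch follows; the sheet names, the two-columns-per-field layout, and the use of a Scripting.Dictionary are assumptions, not the toolkit's implementation.

' Sketch: for each column on the "Data" sheet, write its distinct values
' to a "Lookups" sheet, one pair of columns per source field, with an
' empty "Replace With" column beside each list for the cleaned value.
Sub BuildDistinctLists()
    Dim src As Worksheet, dst As Worksheet
    Dim seen As Object
    Dim lastRow As Long, lastCol As Long
    Dim r As Long, c As Long, outRow As Long
    Dim v As Variant

    Set src = ThisWorkbook.Worksheets("Data")
    Set dst = ThisWorkbook.Worksheets.Add
    dst.Name = "Lookups"

    lastRow = src.Cells(src.Rows.Count, 1).End(xlUp).Row
    lastCol = src.Cells(1, src.Columns.Count).End(xlToLeft).Column

    For c = 1 To lastCol
        Set seen = CreateObject("Scripting.Dictionary")
        ' Two output columns per field: original value and its replacement
        dst.Cells(1, c * 2 - 1).Value = src.Cells(1, c).Value
        dst.Cells(1, c * 2).Value = "Replace With"
        outRow = 2

        For r = 2 To lastRow
            v = src.Cells(r, c).Value
            If Not seen.Exists(CStr(v)) Then
                seen.Add CStr(v), True
                dst.Cells(outRow, c * 2 - 1).Value = v
                outRow = outRow + 1
            End If
        Next r
    Next c
End Sub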
Step 4
In this step, we simply merge the cleaned-up values from the lookup tables created in Step 3 back into the data. The way my macros work, the original data is preserved, so you can always go back and rerun the process. The transformed values appear in a replicated worksheet containing all the new, cleaned data. From there you can easily rerun the transformation to keep refining the data and reducing any variations that remain.
This particular macro will be discussed in more detail in a separate post on merging your old data with your newly created lookup tables.
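The gist of that step, again only as a sketch: copy the original sheet, then for each field swap in any non-blank "Replace With" entries from its lookup list. The sheet names and layout carry over from the previous sketch and are assumptions, not the toolkit's implementation.

' Sketch: apply the "Replace With" values from the "Lookups" sheet to a
' copy of the "Data" sheet, leaving the original data untouched.
Sub ApplyLookupReplacements()
    Dim src As Worksheet, lkp As Worksheet, out As Worksheet
    Dim lastRow As Long, lastCol As Long, lkpLastRow As Long
    Dim r As Long, c As Long, i As Long
    Dim map As Object

    Set src = ThisWorkbook.Worksheets("Data")
    Set lkp = ThisWorkbook.Worksheets("Lookups")

    ' Work on a copy so the original data is preserved
    src.Copy After:=src
    Set out = ThisWorkbook.Worksheets(src.Index + 1)
    out.Name = "Data (Cleaned)"

    lastRow = out.Cells(out.Rows.Count, 1).End(xlUp).Row
    lastCol = out.Cells(1, out.Columns.Count).End(xlToLeft).Column

    For c = 1 To lastCol
        ' Build an old-value -> new-value map from the field's lookup columns
        Set map = CreateObject("Scripting.Dictionary")
        lkpLastRow = lkp.Cells(lkp.Rows.Count, c * 2 - 1).End(xlUp).Row
        For i = 2 To lkpLastRow
            If Len(lkp.Cells(i, c * 2).Value) > 0 Then
                map(CStr(lkp.Cells(i, c * 2 - 1).Value)) = lkp.Cells(i, c * 2).Value
            End If
        Next i

        ' Swap in the cleaned values on the copied sheet
        For r = 2 To lastRow
            If map.Exists(CStr(out.Cells(r, c).Value)) Then
                out.Cells(r, c).Value = map(CStr(out.Cells(r, c).Value))
            End If
        Next r
    Next c
End Sub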
Step 5
The final step involves evaluating the resultant worksheet with your cleaned, transformed, and processed data. You may decide to rerun the process on this data to further improve its quality and cleanliness; your standards, or your customers', will determine what is acceptable. From here you can use the data in an application, analyze it, or ingest it into a database. Clean data also makes data visualization easier.
Conclusion
There are many ways to process data. This article presents a general workflow that has been helpful to me and my teammates working in a spreadsheet-based environment.
Consulting Assistance
If you like what you read here and think there is an opportunity for me to help your business with creating a custom solution, please reach out on my Freelance & Consulting page.