Practical And Clear Techniques To Clean Data In Excel

With today’s advances in technology, businesses and other organizations have amassed large amounts of data. Whether it is used to track inventory, monitor employees and payroll, or profile suppliers and customers, terabytes of data are gathered every day. Most data files can be exported to Excel, but an export is not yet information. Data exported to Excel is raw and unorganized; think of it as “raw material” that still needs to undergo a process: data cleaning with Excel. Current information is vital to any business, so the process of cleaning data in Excel should be neither slow nor complicated.

This article defines what data cleaning is and why data cleaning in Excel is needed. We will detail a practical approach (and tips) on how to clean data in Excel.

Clear Meaning Of Data Cleaning In Excel

Raw data collected from several sources carries mistakes, most often from manual user input: typos, missing values, and invalid entries. Data cleaning is simply the process of ensuring data is usable. Data cleaning with Excel improves data quality by doing any of the following:

  • Removing duplicates
  • Identifying irrelevant details
  • Updating incomplete and inconsistent data
  • Handling missing or blank data
  • Correcting improper formats
  • Compiling different data sources

The goal of data cleaning in Excel is to create information: standardized data sets that let businesses and organizations easily access and retrieve the data they need. Data cleaning with Excel ensures that details always match the correct fields. Here are a few ways that cleaning data in Excel can help businesses and organizations:

  • Improves decision making and execution
  • Increases productivity and efficiency
  • Streamlines business practices and policy
  • Increases revenue and reduces costs

Steps On How To Clean Data In Excel

Most of the time you cannot control the source of the data: where it comes from, how it was formed, or how it was used. So, before jumping into the data cleaning process in Excel, it’s important to understand the raw data file first. Just as banks and financial institutions have a KYC (Know Your Client) process, data cleaning with Excel has KYD, or “Know Your Data”.

Does the data file contain single or multiple worksheets? Do columns have headers? Are distinct details crammed into one single column?  Are there any footnotes or special instructions? These are only a few questions to help you know the type of data.

Tip: Before performing data cleaning with Excel, make sure you keep a backup copy of the raw data file.

To illustrate the data cleaning process in Excel, we will use a retail company’s customer purchase order data imported into Excel. The issues that data cleaning with Excel must resolve are highlighted in the sample data.

To convert the raw data into standardized and usable information, we will follow the basic steps described in the sections below.

Tip: You don’t need to go through every data cleaning step every single time. For example, if the data contains no non-printable characters or line breaks, you can simply skip that step.

Clean Data In Excel By Deleting Formats

It is not uncommon in Excel to apply different formatting options (cell color, font, borders, etc.) to make important data fields stand out and be easily recognizable. However, excessive formatting may have the opposite effect; it becomes difficult to filter, organize, and essentially work with the data.

If you want to clear all the formats, including highlights and borders, and start with clean data in Excel, simply use the Clear Formats option.

  1. On the Home tab, go to the Editing group.
  2. Click the down arrow next to the Clear button.
  3. Select Clear Formats.

Doing so will erase all cell formatting such as conditional formatting, number formats, fonts, colors, borders, etc.

Clean Data In Excel by Parsing Data Using Text to Columns

Notice that the last column, column H, has the Address, City, State, and Postcode crammed into a single cell, separated only by commas. This is a common issue that makes it hard to filter data and use Excel functions and formulas.

To clean data in Excel, parse and split the text into separate cells by using Excel’s Text to Columns tool.

  1. Highlight the cells to parse and split
  2. In the Data tab, go to Data Tools Group and select Text to Columns
  3. From the Convert Text to Columns Wizard dialogue box, select Delimited then click Next
  4. From the Delimiters menu select Comma for the data. See a preview in the Data preview window. Then click Next.
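  5. For the Column data format, since the data consists of both text and numbers, select General. If a column such as Postcode may start with a zero, consider selecting Text for that column so the leading zero is kept.
  6. Select the Destination where the split data will appear. By default Excel writes the results starting at the original column and fills the columns to its right, so make sure those columns are empty or choose a different destination range. In this example, the destination starts from Column H.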
  7. Click Finish
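
For readers who also script their clean-up outside Excel, the comma split performed in the steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name split_address, the field names, and the sample address are hypothetical, and every cell is assumed to hold exactly four comma-separated parts.

```python
def split_address(cell):
    """Split 'Address, City, State, Postcode' on commas, like the Comma delimiter."""
    # strip() also removes the space after each comma; in Excel that
    # clean-up would be a separate step (for example with TRIM).
    parts = [part.strip() for part in cell.split(",")]
    if len(parts) != 4:
        # Assumption: four fields per cell; anything else needs manual review.
        raise ValueError(f"expected 4 comma-separated fields, got {len(parts)}")
    address, city, state, postcode = parts
    return {"Address": address, "City": city, "State": state, "Postcode": postcode}


# Hypothetical sample value standing in for one cell of column H.
row = split_address("12 High Street, Springfield, IL, 62701")
print(row)
```

The postcode stays a string here, which mirrors choosing Text rather than General for that column.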

Clean Data In Excel by Removing Extra Spaces And Non-Printing Characters

Some of the imported text may contain leading and trailing spaces. These unwanted spaces make the data harder to navigate and collect, and they look confusing and unprofessional when presented. Non-printing characters can also cause issues when sorting and using lookup formulas, and they make the text hard to read.

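Clean data in Excel by using the TRIM function to remove leading and trailing spaces. TRIM also reduces repeated spaces between words to a single space.

TRIM function in an Excel formula: =TRIM(A2), where A2 is an example cell holding the text to clean.

Clean data in Excel by using the CLEAN function to remove non-printable characters, including line breaks.

CLEAN function in an Excel formula: =CLEAN(A2)

Tip: Combine TRIM and CLEAN to do both in one step, for example =TRIM(CLEAN(A2)).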
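
To make the behaviour of the two functions concrete, here is a rough Python approximation. It is a sketch, not Excel’s implementation: the function names excel_trim and excel_clean are hypothetical, the trim handles only the ordinary space character, and the clean removes only the non-printing ASCII characters with codes 0 to 31.

```python
import re


def excel_trim(text):
    """Approximate TRIM: drop leading/trailing spaces, collapse inner runs to one."""
    return re.sub(r" {2,}", " ", text).strip(" ")


def excel_clean(text):
    """Approximate CLEAN: remove non-printing ASCII characters (codes 0-31)."""
    return "".join(ch for ch in text if ord(ch) > 31)


# Hypothetical messy value: padded with spaces and containing a line break.
raw = "  Jane \n  Doe  "
cleaned = excel_trim(excel_clean(raw))
print(repr(cleaned))
```

Cleaning first and trimming second matters: removing the line break can leave several spaces side by side, which the trim step then collapses.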

Clean Data In Excel By Handling Blank Cells

If data was mistakenly left out or incorrectly entered into a system, certain data elements will be missing or blank. In the example data from the retail clothing company, some customers did not enter their phone number. These blank cells create issues when using functions and formulas. To perform data cleaning in Excel, use the Editing group’s Go To Special function.

  1. Select the data set
  2. Press the F5 key (or CTRL + G) to open the Go To dialogue box. This is the quickest route to the Editing group’s Go To Special function.
  3. On the Go To dialogue box, click Special
  4. Select the Blanks option and click OK
  5. All blank cells in the selected data set will be highlighted
  6. To fix and replace blank cells, type “No Contact Details” in the active cell.
  7. Press CTRL + Enter to apply to the remaining blank cells
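
The same fill can be sketched in Python for data that has been exported from the sheet. The function name fill_blanks and the sample rows are hypothetical, and the sketch treats whitespace-only values as blank, which is a choice made here rather than something Go To Special does.

```python
def fill_blanks(rows, column, placeholder):
    """Return copies of the rows with empty or whitespace-only values replaced."""
    filled = []
    for row in rows:
        value = row.get(column)
        is_blank = value is None or str(value).strip() == ""
        filled.append({**row, column: placeholder if is_blank else value})
    return filled


# Hypothetical customer rows; two of them have no phone number.
customers = [
    {"Full Name": "Jane Doe", "Phone": "555-0100"},
    {"Full Name": "John Smith", "Phone": ""},
    {"Full Name": "Ann Lee", "Phone": None},
]
result = fill_blanks(customers, "Phone", "No Contact Details")
for row in result:
    print(row["Full Name"], "->", row["Phone"])
```

The original list is left untouched, in the same spirit as keeping a backup copy of the raw file.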

Clean Data In Excel By Converting Numbers Entered As Text To Numbers

When data is imported from outside sources, it is not uncommon for numbers to be stored as text values. Data cleaning with Excel must be done to avoid calculation issues. For example, the SUM function in Excel ignores text values in the ranges it adds up, so total balances will be incorrect because not all of the numbers are summed. How do you clean data in Excel in this case?

Data cleaning in Excel can start with changing the cell format. Note that values already stored as text may remain text until they are re-entered or converted, for example with the VALUE function.

  1. Select the cells with text-formatted numbers
  2. On the Home tab, go to Number group
  3. From the Number Format drop-down list, choose Number
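
The effect of text-formatted numbers on a total can be illustrated with a small Python sketch. The function name to_number and the sample balances are hypothetical, and the sketch assumes that a comma is a thousands separator and a period is the decimal mark.

```python
def to_number(value):
    """Convert a text-formatted number such as ' 1,250.50 ' to float; None if not numeric."""
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(str(value).strip().replace(",", ""))
    except ValueError:
        return None


# Hypothetical column: three numbers stored as text or number, plus one non-number.
balances = ["100", " 1,250.50 ", 75, "n/a"]

# Mimics how a sum skips text: only values that are already numbers are added.
sum_ignoring_text = sum(v for v in balances if isinstance(v, (int, float)))

converted = [to_number(v) for v in balances]
sum_after_cleaning = sum(v for v in converted if v is not None)

print(sum_ignoring_text)
print(sum_after_cleaning)
```

Values that cannot be converted come back as None, so they can be reviewed rather than silently dropped.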

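Excel’s error checking features offer another approach to clean data in Excel: a cell holding a number stored as text is typically flagged with a small green triangle, and its warning menu includes a Convert to Number option.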

Clean Data In Excel By Changing Text to Upper, Lower, and Proper Case

Users do not always follow capitalization rules when entering text, so it is not uncommon for names, addresses, titles, etc. to have inconsistent casing. Depending on your preference, you can convert text to upper, lower, or proper case. Three functions make data cleaning with Excel easy in this scenario: UPPER converts all letters to uppercase, LOWER converts all letters to lowercase, and PROPER capitalizes the first letter of each word. For example, =PROPER(A2) applied to a cell containing “jANE doe” returns “Jane Doe”.
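
As a point of comparison for anyone scripting the same clean-up, Python’s built-in string methods behave much like the three Excel functions. The helper name change_case and the sample text are hypothetical, and str.title is used here only as a close stand-in for PROPER.

```python
def change_case(text, mode):
    """Convert text using 'upper', 'lower', or 'proper' (title) case."""
    converters = {"upper": str.upper, "lower": str.lower, "proper": str.title}
    return converters[mode](text)


# Hypothetical name entered with inconsistent capitalization.
name = "jANE doe"
for mode in ("upper", "lower", "proper"):
    print(mode, "->", change_case(name, mode))
```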

Clean Data In Excel By Finding And Removing Duplicates

Duplicates in data can arise for various reasons: import and export issues, customer input errors, inadequate data quality controls, etc. Data collected and stored in different sources and then exported to Excel is often not 100% accurate, as moving data between two systems can cause duplicates.

The most common cause of duplicates is human error: customers mistakenly use the wrong input field, data access is not secured so any team member can make changes, or management fails to implement consistent quality controls or checks.

Depending on your data, duplicates can be found and removed based on a single column or on multiple columns. For the retail company’s data example, duplicates are identified based on the customer’s full name and order number.

  1. Select the data set (columns A & B)
  2. Go to the Home tab, then in the Styles group select Conditional Formatting
  3. From the drop-down list, choose Highlight Cells Rules then select Duplicate Values
  4. In the Duplicate Values dialog box, select Duplicate from the drop-down list on the left, and specify the format used to highlight the duplicate values. This will highlight cells with duplicates in the selected data set.
  5. Click Ok

The next step in data cleaning with Excel is to remove the duplicates. This is such a common task that Excel has a dedicated tool for it.

  1. Select the range of cells where duplicated values are to be removed
  2. Go to the Data tab and click Remove Duplicates. A dialogue box appears.
  3. If the data has headers, check “My data has headers”. Excel then excludes the first row when finding and removing duplicates. If the data has no headers, leave it unchecked.
  4. Excel lets you select which columns to check for duplicates. In this example, we want to remove all duplicates based on Full Name and Order Number.
  5. Click Ok.

Excel gives a summary count of the duplicates found and removed as well as the unique values found.
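
The idea behind removing duplicates on a combination of columns can be sketched in Python as follows. The function name remove_duplicates and the sample orders are hypothetical; the sketch keeps the first row for each name and order number pair and compares values exactly, which is an assumption of this example.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each combination of key values; drop later repeats."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Hypothetical purchase orders.
orders = [
    {"Full Name": "Jane Doe", "Order Number": "1001"},
    {"Full Name": "John Smith", "Order Number": "1002"},
    {"Full Name": "Jane Doe", "Order Number": "1001"},  # repeat of the first row
    {"Full Name": "Jane Doe", "Order Number": "1003"},  # same name, new order
]
unique_orders = remove_duplicates(orders, ["Full Name", "Order Number"])
removed = len(orders) - len(unique_orders)
print(f"{removed} duplicate removed; {len(unique_orders)} unique rows remain.")
```

Using both columns as the key keeps a returning customer’s separate orders, while a key on Full Name alone would discard them.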

Clean Data In Excel By Highlighting and Fixing Errors

Excel has many useful and versatile tools but, if they are not used properly, various errors will appear. In the retail clothing data example, notice the #N/A error that appears in Column E, Order Discount.

Use the Go to Special tool to clean data in Excel.

  1. Select the Data
  2. Press F5
  3. Click “Special” at the bottom left
  4. The Go To Special dialogue box appears. Select Formulas
  5. Uncheck all options except “Errors”
  6. Click Ok

All errors are now selected. You can delete each error manually or replace it with your preferred text. To replace them all at once, type the chosen text in the active cell and press CTRL + Enter. In this example, replace the #N/A error in Excel with “Not Applicable”.
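
If the data has been exported as text (for example to CSV), error values arrive as literal strings and can be replaced in a short Python sketch. The function name replace_errors and the sample discounts are hypothetical; the set below lists common Excel error values.

```python
# Common Excel error values as they appear when a sheet is exported as text.
EXCEL_ERRORS = {"#N/A", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#NULL!"}


def replace_errors(values, replacement):
    """Return a new list with every Excel error string swapped for the replacement."""
    return [replacement if value in EXCEL_ERRORS else value for value in values]


# Hypothetical Order Discount column.
discounts = ["10%", "#N/A", "5%", "#N/A"]
fixed = replace_errors(discounts, "Not Applicable")
print(fixed)
```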

Data Cleaning With Excel’s Spell Check

Yes, Excel has spell checking abilities! Nothing diminishes the integrity of a financial model, report, or presentation more than misspelled words and typos. To access the spell check tool, go to the Review tab and select Spelling from the Proofing group. Alternatively, the quickest way is to press the F7 key.

Data Cleaning With Excel’s Find And Replace Tool

Find and Replace is an essential tool for data cleaning in Excel. To standardize and clean data in Excel, you can find and replace blank cells with text, zeros with numerical values, and input errors with correct values. Find and Replace can also be used for updating reference cells in formulas and functions.

To illustrate data cleaning with Excel’s Find and Replace, notice that the City names contain inconsistent customer input. The city names North West and South East were entered as “nw” and “se” respectively. Also, a few are entered in lower case characters.

To clean data in Excel with the Find and Replace tool, follow the process below.

  1. Select the cells to find and replace values
  2. On the Home tab, go to the Editing group and select the Find & Select button
  3. Choose Replace from the Find & Select drop-down (alternatively, press CTRL + H)
  4. In the Find and Replace dialogue box, input the following:
  5. For Find What, input “nw”
  6. For Replace With, input “North West”
  7. Click Replace All

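The same process applies to find and replace “se” with the correct City name “South East”. Because “nw” and “se” can also occur inside longer words, consider clicking Options in the dialogue box and checking Match entire cell contents before using Replace All. You can also apply the same process to correct the lower case characters.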
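
The abbreviation and casing fixes can be combined in one Python sketch that only replaces a value when the whole cell matches. The names CITY_FIXES and standardize_city and the sample cities are hypothetical; the whole-cell, case-insensitive match is a deliberate choice so that names which merely contain “se” are left alone.

```python
# Hypothetical mapping of abbreviations to full city names.
CITY_FIXES = {"nw": "North West", "se": "South East"}


def standardize_city(value):
    """Replace whole-cell abbreviations, then apply consistent capitalization."""
    key = value.strip().lower()
    if key in CITY_FIXES:
        return CITY_FIXES[key]
    return value.strip().title()


# "Chelsea" contains the letters "se" but must not be changed.
cities = ["nw", "North West", "SE", "south east", "Chelsea"]
standardized = [standardize_city(city) for city in cities]
print(standardized)
```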

Maintain A Tidy And Clean Data In Excel

Financial models, reports, and analyses are only as good as the data behind them. If your data is incorrect, invalid, or inconsistent, then valuations and decisions are in jeopardy. Excel is a powerful tool for data cleaning because of its flexibility and its extensive functions and formulas. The steps and tips presented in this article are easy-to-follow, quick, and common ways to clean data in Excel. However, data cleaning with Excel is a never-ending process. For as long as data is continuously used, edited, updated, and accessed by different people, keeping it reliable and clean remains important. In other words, the integrity of tidy, clean data in Excel must be maintained as much as possible. Data cleaning with Excel helps entrepreneurs, analysts, executives, and other professionals not only to optimize data but also to translate it into useful information for business improvements and strategies.
