When working with data in Excel, it is very likely you will encounter duplication. How do you deal with duplicates in Excel? Dealing with duplicates in Excel is 2-stage process –first identify if you have duplicates, and secondly remove the duplicate records once you ascertain they don’t need to be in your list. This article discusses 8 cool ways to identify and remove duplicates in Excel.We shall work with a dummy set of sales data for a given month extracted from a company’s ERP. The list has a number of duplicate invoices which can overstate the revenue generated in the month.
Download the Excel file that goes with this post. Each of the methodologies has a specific worksheet in this file.
Leverage on the following approaches to identify and remove any occurrences of repeat records.
#1: Using Conditional Formatting
This is the easiest method to check presence of repeat records. It works best when you only want to check duplicates in a single column.
- Highlight the invoice numbers, range A2:A20 in this example
- From the Home tab >>Conditional Formatting >>Highlight Cells Rules >>Duplicate values…
- You can choose to retain the default settings from the dialog box that pops up or select Custom Format… from the right side of this window to select a different colour. Press ok when you are comfortable with the selected colour.
As you can see from the figure above, all the repeated invoice numbers are highlighted.
In some instances, you may want to ascertain if the entire row/record is a duplicate. In that case, you would require to create a helper column that uses the CONCATENATE function. This column will combine all the other columns.
Once combined, you can then apply the steps above on this column to check for duplicates.
#2: Using the COUNTIFS Function
The COUNTIFS function counts how many times an item occurs in a given list. In this example, we shall count how many times each of the invoice numbers appear in the Invoice# column. The formula is entered as follows:
The number returned by this function indicates how many times that invoice number (in a given row) appears in the range A2:A20. Be keen to lock this range absolutely to ensure the counting only happens in this confined list.
Once you get the values, you can either filter out all numbers greater than 1 OR use conditional formatting to highlight the repeated records as follows:
- Highlight the entire data set (do not select the column headings)
- From the e Home tab >>Conditional Formatting >>New Rule >> select “Use a formula to determine which cells to format” and type this formula =COUNTIFS ($A$2:$A$20, $A2)>1. Make sure you lock the cells as indicated in this formula.
- Click the Format button and from the dialog box that pops up, go to Fill tab and choose any colour.
#3: Using the COUNTIFS Function to Highlight Second and Subsequent Occurrences
The formula discussed above can be tweaked to only highlight subsequent repetition, i.e. don’t highlight the first instance for the invoice. We shall enter this formula in column G and compare the results with column F.
=COUNTIFS ($A$2:A2, A2)
The trick is to not select the entire range A2:A20 but to just count up to current row. By locking / fixing the starting point of our range and allowing the end to be flexible, we ensure that the formula counts up to the current row where we have auto-filled our formula. In effect therefore, it will return a number greater than 1 only in the second and subsequent occurrences.
The original formula in column F counted the total number an invoice number occurs in the [entire] range.
#4: Using PivotTables
This is another easy method.
We insert a PivotTable from the Insert menu, and then drag the Invoice# column in the Rows and the Customer column to the Values boxes.
Using the methods discussed earlier in this post you can then highlight the records with a value greater than 1.
#5: Using Advanced Filters to Filter Unique Values
Advanced filters is one of the least utilized feature in Excel. We shall use Advanced Filters to generate a list of unique records in a separate location in our worksheet.
- Ensure the active cell is inside your data set. Then from the Data tab >>Sort & Filter group >>click Advanced. This will highlight the data and the following dialog box will appear.
- Check the “Copy to another location” option. This will open up the ‘Copy to:’ box where you select the location in your worksheet where the new list will be placed. We have selected H1.
- Finally, tick the “Unique records only” check box and press OK. You will get a unique list of non-repeat records.
#6: Remove duplicates based on a condition in another column
This is a special case where you can apply logic and the formulas that you know in Excel.
For instance, in the records below, we would want to keep an item only where the quantities are less than or equal to 1000.
To achieve this, we need a formula that checks (1) if the item is repeated and (2) the value in the quantity column is >1000. These are the two conditions to delete a record. We shall enter the following formula in D3:
=IF ((COUNTIFS ($B$3:$B$14, B3)>1)*(C3>1000) =0,”Keep”,”Delete”)
As you can see, the record is retained either because it is not repeated or the value in quantity column is <=1000. The logic of the formula is that the COUNTIF part will return a TRUE/FALSE based on whether the item is repeated, whereas the C3>1000 will also give a TRUE/FALSE depending on whether the quantity exceeds a thousand or not. We then multiply the two results to either get a 1 or a 0. To delete a record, the multiplication must be =1 meaning both conditions are satisfied.
#7: The Remove Duplicates Option
The methods discussed above helps you to identify duplicate records. Once identified and investigations reveal they ought to be removed, you can do the following to remove the records.
- Select your data
- Go to Data tab > Data Tools group > Remove Duplicates
When you “Select All” in the figure above, this implies that Excel will scan through the entire record/row and remove the repeated records. To base the removal on the Invoice# column alone, unselect all and them tick that particular column.
#8: Using VBA code to deal with duplicates in Excel
If you are a lover of code, or if in one of your projects that require code you are confronted with the need to remove duplicates, the following can be of great help.
Remove duplicates from a single column using VBA
The code below will come in handy:
'first make a copy of column A and paste in column D
'remove dupes from column D
ActiveSheet.Range("D:D").RemoveDuplicates Columns:=Array(1), Header:=xlYes
Remove duplicates from a range using VBA
The following code works for a specified range. Use the downloaded file to scan through the lines to understand.
Dim sht As Worksheet
Dim lastRow As Long
Set sht = ThisWorkbook.Worksheets("UsingVBARange")
'identify last row with data
lastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
'make a copy of the dataset
sht.Range("A1:E" & lastRow).Copy _
Destination:=sht.Range("H1:L" & lastRow)
'remove duplicates from the copy
sht.Range("H1:L" & lastRow).RemoveDuplicates _
Columns: =Array (1, 2), Header: =xlYes
I won’t go into details of how the code works. Please note you may need to tweak the code to fit your scenario. The code is simplified to avoid confusion.
I hope this post has expanded your list of arsenals when it comes to duplicates. If this article was helpful to you, it will definitely be useful for others in your circles. Kindly share the same via your social networks or email a friend. Thank you for your support.