3 Ways to Remove Duplicate Column Names in R [Examples]

In this article, we discuss 3 ways to identify and remove duplicate column names from a data frame in R. We focus only on removing columns with identical names, not columns with identical content.

Although R (by default) does not allow duplicate column names in a data frame, you might still encounter them, for example after importing data or replicating columns. In most circumstances, you want to get rid of them. So, how do you remove duplicate column names in R?

The easiest way to remove repeated column names from a data frame is by using the duplicated() function. This function (together with the colnames() function) indicates for each column name if it appears more than once. Using this information and square brackets one can easily remove the duplicate column names.

Although you can use the duplicated() function to identify and eliminate duplicate column names, other methods exist. In the remainder of this article, we discuss 3 methods in total that can help you in different circumstances.

Also, we discuss briefly how to rename and make duplicate column names unique (instead of removing them).

3 Ways to Remove Duplicate Column Names from a Data Frame

Before we show the 3 methods, we first create an R data frame that we will use in all the examples.

This data frame has 4 columns and 5 rows. The column names are not unique because the name x1 appears twice. Hence, the goal is to remove one of the columns called x1.

# Create a Data Frame
my_df <- data.frame(x1 = c(1:5),
                    x2 = letters[1:5],
                    x3 = seq(2,10,2),
                    x1 = LETTERS[22:26],
                    check.names = FALSE)
my_df
R data frame with duplicate column names

By default, the data.frame() function does not allow duplicate column names: it renames repeated names to make them unique. However, by using the check.names = FALSE option, we can still create such a data frame.
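For comparison, here is a minimal sketch of what happens when the same columns are passed without check.names = FALSE. The name default_df is only illustrative and is not used elsewhere in this article.

```r
# Same columns as my_df, but with the default check.names = TRUE
default_df <- data.frame(x1 = 1:5,
                         x2 = letters[1:5],
                         x3 = seq(2, 10, 2),
                         x1 = LETTERS[22:26])

# The second x1 is renamed so that all names are unique
colnames(default_df)
# [1] "x1"   "x2"   "x3"   "x1.1"
```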


1. Remove Duplicate Column Names with duplicated()

The first method to remove duplicate column names in R is by using the duplicated() function.

The duplicated() function determines which elements of a list, vector, or data frame are duplicates. So, by providing the column names of a data frame as its argument, the duplicated() function returns a logical vector indicating which column names are duplicates (using TRUEs and FALSEs).

For example:
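A minimal sketch of this check on the example data frame (my_df is rebuilt here so that the snippet runs on its own):

```r
# Rebuild the example data frame with a repeated column name
my_df <- data.frame(x1 = 1:5,
                    x2 = letters[1:5],
                    x3 = seq(2, 10, 2),
                    x1 = LETTERS[22:26],
                    check.names = FALSE)

# The column names, including the repeated x1
colnames(my_df)
# [1] "x1" "x2" "x3" "x1"

# TRUE marks a name that already appeared earlier in the vector
duplicated(colnames(my_df))
# [1] FALSE FALSE FALSE  TRUE
```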

After identifying the duplicated column names, one can easily remove them with the square brackets [] and the !-symbol.

In short, these are the steps to remove duplicate column names from an R data frame:

  1. Obtain the column names of the data frame using the colnames() function.
  2. Create a TRUE/FALSE-vector that indicates whether the column names from step 1 are duplicates (using the duplicated() function).
  3. Remove the duplicate column names with the square brackets and the !-symbol.
# Find Duplicate Column Names
duplicated_names <- duplicated(colnames(my_df))

# Remove Duplicate Column Names
my_df[!duplicated_names]
Remove duplicate column names from a data frame using duplicated() function.

The duplicated() function reads the names from left to right and marks a name as a duplicate only from its second occurrence onward. As a result, the first column with a given name is kept and any later columns with the same name are removed.

Instead of reading from left to right, the duplicated() function can also read the data backward if you add the fromLast = TRUE option. In that case, the duplicated() function starts with the last column name and marks earlier occurrences as duplicates, so the last column with a given name is kept.

For example:

# Find Duplicate Column Names (backwards)
duplicated_names <- duplicated(colnames(my_df), fromLast = TRUE)

# Remove Duplicate Column Names
my_df[!duplicated_names]
Remove duplicate column names from an R data frame using duplicated() function, starting from the last column.
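To see which x1 column survives in each direction, the following sketch compares the two results. The names keep_first and keep_last are illustrative, and my_df is rebuilt so that the snippet runs on its own.

```r
# Rebuild the example data frame with a repeated column name
my_df <- data.frame(x1 = 1:5,
                    x2 = letters[1:5],
                    x3 = seq(2, 10, 2),
                    x1 = LETTERS[22:26],
                    check.names = FALSE)

# Default direction: the first x1 (the numbers 1 to 5) is kept
keep_first <- my_df[!duplicated(colnames(my_df))]
colnames(keep_first)
# [1] "x1" "x2" "x3"
is.numeric(keep_first$x1)
# [1] TRUE

# fromLast = TRUE: the last x1 (the letters V to Z) is kept
keep_last <- my_df[!duplicated(colnames(my_df), fromLast = TRUE)]
colnames(keep_last)
# [1] "x2" "x3" "x1"
is.numeric(keep_last$x1)
# [1] FALSE
```

Note that the column order also differs: with fromLast = TRUE, the remaining x1 column sits in the last position.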

2. Remove Duplicate Column Names with unique()

The second method to remove duplicate column names is by using the unique() function.

In contrast to the duplicated() function, the unique() function returns a vector with distinct column names of a data frame. As a result, one can use the unique() function to select each column name once, and hence remove duplicate column names.

In short, these are the steps to remove duplicate column names with the unique() function:

  1. Create a vector of all column names from a data frame by using the colnames() function.
  2. Remove any duplicate column names from the vector with the unique() function.
  3. Use the output of the previous step to select each column name exactly once, thereby removing the duplicate names.

For example:

# Find Unique Column Names
unique_names <- unique(colnames(my_df))

# Keep Only Unique Column Names
my_df[unique_names]
Remove duplicate column names from an R data frame using unique() function.
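One point worth checking with this method is which of the repeated columns is kept. The sketch below suggests that selecting by name always returns the first matching column, even if unique() itself is called with fromLast = TRUE. The name by_name is illustrative, and my_df is rebuilt so that the snippet runs on its own.

```r
# Rebuild the example data frame with a repeated column name
my_df <- data.frame(x1 = 1:5,
                    x2 = letters[1:5],
                    x3 = seq(2, 10, 2),
                    x1 = LETTERS[22:26],
                    check.names = FALSE)

# fromLast = TRUE changes the order of the unique names ...
unique(colnames(my_df), fromLast = TRUE)
# [1] "x2" "x3" "x1"

# ... but indexing by name still picks the first column called x1
by_name <- my_df[unique(colnames(my_df), fromLast = TRUE)]
is.numeric(by_name$x1)
# [1] TRUE
```

If you want to keep the last of the repeated columns instead, the duplicated() approach with fromLast = TRUE is the more direct choice.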

3. Remove Duplicate Column Names with dplyr

The third method to remove duplicate column names in R uses the dplyr package.

The dplyr package (part of the tidyverse) is a powerful package for data manipulation. It allows you to chain different operations sequentially (e.g., filter rows, create a new column, etc.), which can make R code easier to read.

You need 3 functions to remove duplicate column names with the dplyr package, namely:

  • The colnames() function to create a vector of all column names in a data frame.
  • The unique() function to create a vector of the unique column names in a data frame.
  • The select() function to select the unique column names from the data frame.

The following R code shows an example.

# Load the dplyr package
library(dplyr)

# Remove Duplicate Column Names
my_df %>% 
  select(unique(colnames(.)))
Eliminate duplicate column names from a data frame using the dplyr package.

How to Rename and Make Duplicate Column Names Unique?

Instead of removing duplicate column names, it is also possible to make the column names of a data frame unique.

The easiest way to make all the column names unique is by using the make.names() function and the unique = TRUE option. If duplicate column names exist in the original data frame, this function adds a numeric suffix to the repeated names (for example, x1.1, x1.2, etc.) until all column names are unique.

For example:

# Create Unique Column Names
names(my_df) <- make.names(names(my_df), unique = TRUE)
my_df
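As a quick check, the sketch below shows the names this produces for the example data frame, next to base R's make.unique() function, which only de-duplicates and does not also convert the names into syntactically valid ones. The data frame is rebuilt so that the snippet runs on its own.

```r
# Rebuild the example data frame with a repeated column name
my_df <- data.frame(x1 = 1:5,
                    x2 = letters[1:5],
                    x3 = seq(2, 10, 2),
                    x1 = LETTERS[22:26],
                    check.names = FALSE)

# The first x1 keeps its name; the repeated one gets a suffix
make.names(names(my_df), unique = TRUE)
# [1] "x1"   "x2"   "x3"   "x1.1"

# make.unique() gives the same result here, because the names are already valid
make.unique(names(my_df))
# [1] "x1"   "x2"   "x3"   "x1.1"
```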