How to Remove All Columns with a Common Prefix/Suffix in R [Examples]

If you want to remove columns in R, you can explicitly specify their names. However, this becomes tedious if you have many columns. Ideally, your R code deletes multiple columns without you having to write out their names.

This article discusses how to efficiently remove all columns from an R data frame that share a prefix/suffix.

We’ll provide 2 methods that you can use. The first uses only basic R code, while the second utilizes the tidyverse package.

If you want to know how to drop columns in general, check out this article.

How to Remove All Columns with a Prefix

The data frame below has 4 columns, of which two start with “var_”. The goal is to remove these columns efficiently. In other words, we want to write R code that can be reused even if the number of columns that start with “var_” increases.

R data frame containing columns with a prefix.

Removing all columns with a common prefix is a two-step process.

1. Identify the columns with the prefix

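In the first step, you use the grep() function to identify all columns with the prefix. The function takes two main arguments, namely a pattern (i.e., a regular expression that describes the prefix you are looking for), and a character vector that contains all column names. Because grep() matches the pattern anywhere in a name, start the pattern with the ^ anchor (for example, “^var_”) so that only names that begin with the prefix match.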

You need the names() function to create a vector with all the column names of a data frame. For example, names(my_df).

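As a result, the grep() function returns an integer vector with the positions of the columns whose names match the pattern (i.e., the prefix) you are looking for. If you prefer the column names themselves, add the argument value = TRUE.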
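To make the return value of grep() concrete, here is a minimal sketch. The vector col_names is a hypothetical set of column names chosen for illustration; it includes one name that merely contains “var_” to show why the ^ anchor matters.

```r
# Hypothetical column names; "my_var_4" contains "var_" but does not start with it
col_names <- c("var_1", "var_2", "col_3", "my_var_4")

# Without an anchor, the pattern matches anywhere in the name
anywhere <- grep("var_", col_names)
anywhere          # 1 2 4

# With ^, only names that start with "var_" match
prefix_pos <- grep("^var_", col_names)
prefix_pos        # 1 2

# value = TRUE returns the matching names instead of their positions
prefix_names <- grep("^var_", col_names, value = TRUE)
prefix_names      # "var_1" "var_2"
```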

2. Remove the columns with the prefix

The second step is to remove the identified columns. You can do this either with the bracket notation or with the select() function from the dplyr package, which is part of the tidyverse.

The example below shows how to remove the columns with the brackets notation.

my_df <- data.frame(var_1 = c(1:10),
                    var_2 = letters[seq(1,10)],
                    col_3 = c(rep("A", 5), rep("B",5)),
                    col_4 = seq(0.1, 1, 0.1))
my_df

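# The ^ anchor restricts the match to names that START with "var_"
columns_to_remove <- grep("^var_", names(my_df))
# drop = FALSE keeps the result a data frame even if only one column remains
my_df[, -columns_to_remove, drop = FALSE]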
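One caveat with negative indices is worth knowing: if no column matches, grep() returns an empty vector, and negating an empty vector selects nothing, so every column disappears. The sketch below illustrates this with a small hypothetical data frame and a prefix (“xyz_”) that matches no column, and shows a logical-index variant with grepl() that avoids the problem. It is one possible workaround, not the only one.

```r
# Small hypothetical data frame for illustration
my_df <- data.frame(var_1 = 1:3, col_3 = c("A", "B", "C"))

# No column starts with "xyz_", so grep() returns integer(0)
no_match <- grep("^xyz_", names(my_df))

# -integer(0) is still integer(0): all columns are dropped
emptied <- my_df[, -no_match, drop = FALSE]
ncol(emptied)     # 0

# grepl() returns TRUE/FALSE per column, so negating it is safe
keep <- !grepl("^xyz_", names(my_df))
safe_df <- my_df[, keep, drop = FALSE]
ncol(safe_df)     # 2
```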

The following R code uses the tidyverse package to remove all columns that start with the same characters.

my_df <- data.frame(var_1 = c(1:10),
                    var_2 = letters[seq(1,10)],
                    col_3 = c(rep("A", 5), rep("B",5)),
                    col_4 = seq(0.1, 1, 0.1))
my_df

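# Install the tidyverse if it is missing, then load it
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
columns_to_remove <- grep("^var_", names(my_df))
# all_of() tells select() that columns_to_remove is an external vector
my_df %>% select(-all_of(columns_to_remove))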

As the following image demonstrates, we have successfully removed the columns var_1 and var_2 without explicitly specifying their names.

Remove columns with a prefix in R
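As an alternative to grep(), dplyr also ships selection helpers that match a prefix directly. The sketch below assumes the dplyr package is installed and uses starts_with(), which matches a literal prefix rather than a regular expression and ignores case by default.

```r
library(dplyr)

my_df <- data.frame(var_1 = 1:10,
                    var_2 = letters[1:10],
                    col_3 = rep(c("A", "B"), each = 5),
                    col_4 = seq(0.1, 1, 0.1))

# Drop every column whose name starts with "var_"
result <- my_df %>% select(-starts_with("var_"))
names(result)     # "col_3" "col_4"
```

Because the helper does the matching inside select(), no separate identification step is needed.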

How to Remove All Columns with a Suffix

Besides removing all columns with a common prefix, you can also delete all columns that share a suffix.

The R data frame below has 4 columns, of which 2 have the same suffix, namely “_num”. We want to remove these columns without writing out their names.

R data frame with columns with a suffix.

To remove all columns with a common suffix from an R data frame, you again use the grep() function. End the pattern with the $ anchor (for example, “_num$”) so that only names that end with the suffix match. The function returns a vector with the positions of all columns that share the suffix.

Once you have identified the columns with the suffix, you can remove them in two ways: with the bracket notation (basic R code) or with the select() function (tidyverse).

Below we use only basic R code to remove the columns that share the suffix “_num”.

my_df <- data.frame(int_num = c(1:10),
                    letters = letters[seq(1,10)],
                    A_B = c(rep("A", 5), rep("B",5)),
                    double_num = seq(0.1, 1, 0.1))
my_df

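# The $ anchor restricts the match to names that END with "_num"
columns_to_remove <- grep("_num$", names(my_df))
# drop = FALSE keeps the result a data frame even if only one column remains
my_df[, -columns_to_remove, drop = FALSE]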
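If you would rather avoid regular expressions, base R (version 3.3.0 or later) offers endsWith() and its counterpart startsWith(). The following sketch is an alternative to the grep() approach: it builds a logical vector and keeps the columns that do not end with “_num”.

```r
my_df <- data.frame(int_num = 1:10,
                    letters = letters[1:10],
                    A_B = rep(c("A", "B"), each = 5),
                    double_num = seq(0.1, 1, 0.1))

# TRUE for every column name that ends with the literal string "_num"
has_suffix <- endsWith(names(my_df), "_num")
has_suffix        # TRUE FALSE FALSE TRUE

# Keep only the columns without the suffix
result <- my_df[, !has_suffix, drop = FALSE]
names(result)     # "letters" "A_B"
```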

This code snippet demonstrates how to remove the columns that end with “_num” using the tidyverse package. We use the select() function and the minus operator to delete the unwanted columns.

my_df <- data.frame(int_num = c(1:10),
                    letters = letters[seq(1,10)],
                    A_B = c(rep("A", 5), rep("B",5)),
                    double_num = seq(0.1, 1, 0.1))
my_df

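# Install the tidyverse if it is missing, then load it
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
columns_to_remove <- grep("_num$", names(my_df))
# all_of() tells select() that columns_to_remove is an external vector
my_df %>% select(-all_of(columns_to_remove))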

Note that we’ve used two steps to remove the desired columns. First, we identified the columns. Then, we removed them. However, you could combine the 2 lines of code into just one.
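A possible way to combine both steps into a single line is sketched below, once in base R and once with dplyr (assumed to be installed). The dplyr variant swaps grep() for the ends_with() selection helper, which matches a literal suffix.

```r
library(dplyr)

my_df <- data.frame(int_num = 1:10,
                    letters = letters[1:10],
                    A_B = rep(c("A", "B"), each = 5),
                    double_num = seq(0.1, 1, 0.1))

# Base R: identify and remove in one expression
base_result <- my_df[, -grep("_num$", names(my_df)), drop = FALSE]

# dplyr: the ends_with() helper replaces the separate grep() call
tidy_result <- my_df %>% select(-ends_with("_num"))

names(base_result)   # "letters" "A_B"
names(tidy_result)   # "letters" "A_B"
```

Keep in mind that the base R one-liner drops every column if no name matches the pattern, so it is best used when you know at least one column carries the suffix.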

As you can see in the image below, we have successfully removed the columns int_num and double_num without explicitly mentioning them. This makes the code robust and reusable.

Remove columns with a suffix from an R data frame.
