5 Easy Ways to Replace Blanks in Column Names in R [Examples]

A valid column name in R consists of letters, numbers, dots, and underscores, and starts with a letter or with a dot that is not followed by a number. However, after importing a dataset, your column names might contain blanks (i.e., whitespace). So, how do you replace blanks in the column names of your R data frame?

The easiest option to replace spaces in column names is the clean_names() function from the janitor package. This R function creates syntactically correct column names by replacing blanks with an underscore. Moreover, you can combine this function with the %>% pipe operator that the tidyverse packages provide.

Besides the clean_names() function, we discuss 4 other options to replace blanks in a column name. The options we cover replace blanks with a dot, an underscore, or another character specified by the user.

A Sample Data Frame

Throughout this article, we will use the data frame below to demonstrate how to fix spaces in the header. The goal is to replace the blanks without explicitly specifying the column names. We want to create R code that is efficient and reusable.

An R data frame

You can recreate this data frame with the next R code.

my_df <- data.frame("x 1" = c(1:5),
                    "col 2" = letters[1:5],
                    "var 3" = sample(-5:5, 5),
                    check.names = FALSE)
my_df
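The check.names = FALSE argument is what keeps the blanks in the header. As a small sketch (the data frames df_default and df_raw are hypothetical and only serve this illustration), the code below compares the default behavior of data.frame() with the one used above and shows how to detect names that contain a blank.

```r
# By default (check.names = TRUE), data.frame() repairs the names itself.
df_default <- data.frame("x 1" = 1:3)

# With check.names = FALSE, the blanks are kept, as in my_df.
df_raw <- data.frame("x 1" = 1:3, check.names = FALSE)

names(df_default)  # "x.1"
names(df_raw)      # "x 1"

# Find the column names that contain at least one blank.
has_blank <- grepl(" ", names(df_raw), fixed = TRUE)
names(df_raw)[has_blank]

# A name with a blank must be wrapped in backticks when used with $.
df_raw$`x 1`
```

Imported files usually behave like df_raw, which is why the methods in this article start from names that still contain blanks.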

1. Replace Blanks in Column Names with make.names()

The first method to remove spaces from a column name is with the make.names() function. This base R function replaces blanks, and any other invalid character, with a dot. If you set its unique argument to TRUE, it also makes sure that no duplicate names exist.

The make.names() function has one required argument, namely a vector with the column names. You can use the names() function to obtain the column names of a data frame.

The second, optional argument is unique. If you set it to TRUE, R makes the resulting column names unique. Because the default value is FALSE, we recommend setting this argument to TRUE explicitly.

The R code below shows how to use the make.names() function and replaces the blanks in the column names with a dot.

names(my_df) <- make.names(names(my_df), unique=TRUE)
my_df
Use the make.names() function to replace blanks with a dot in column names of an R data frame.
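To make the effect of the unique argument more concrete, here is a minimal sketch on a plain character vector. The vector raw_names is a hypothetical example that contains a duplicate and a name starting with a digit.

```r
# Hypothetical column names: two duplicates and one name that starts with a digit.
raw_names <- c("x 1", "x 1", "2nd col")

# Without unique = TRUE, duplicates survive the clean-up.
not_unique <- make.names(raw_names)
not_unique    # "x.1" "x.1" "X2nd.col"

# With unique = TRUE, a numeric suffix is appended to repeated names.
fixed_names <- make.names(raw_names, unique = TRUE)
fixed_names   # "x.1" "x.1.1" "X2nd.col"
```

Note that make.names() also prepends an X to names that would otherwise start with a digit.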

2. Replace Blanks in Column Names with gsub()

The second method to replace blanks in a column name also uses a native R function, namely the gsub() function.

The gsub() function searches for a pattern (e.g. a space) and performs a replacement of all matches. Whereas the make.names() function replaces all blanks with a dot, the gsub() function lets the user specify the replacement value. For example, you can use the gsub() function to replace blanks in column names with an underscore.

The gsub() function has 3 required arguments:

  1. The pattern you are looking for, e.g., a blank.
  2. The replacement value, e.g., an underscore.
  3. A character vector where matches are sought, e.g., column names.

Note that you must write the pattern and replacement between (double) quotes. You can use the names() function to create a character vector of the column names.

The R code below uses the gsub() function to replace blanks with an underscore in the column names of a data frame.

names(my_df) <- gsub(" ", "_", names(my_df))
my_df
Use the gsub() function to replace blanks with an underscore in column names of an R data frame.
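Because the pattern of gsub() is a regular expression, the same approach can also handle tabs, repeated blanks, and leading or trailing whitespace. The sketch below assumes a hypothetical vector messy_names and combines gsub() with the base R function trimws().

```r
# Hypothetical names with a leading blank, a double blank, a tab, and a trailing blank.
messy_names <- c(" x  1", "col\t2", "var 3 ")

# Remove leading and trailing whitespace first.
trimmed <- trimws(messy_names)

# Replace every run of one or more whitespace characters with one underscore.
clean <- gsub("\\s+", "_", trimmed)
clean   # "x_1" "col_2" "var_3"

# Any other replacement value works the same way, e.g., a dot.
dotted <- gsub("\\s+", ".", trimmed)
dotted  # "x.1" "col.2" "var.3"
```

Trimming first avoids names that start or end with the replacement character.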

3. Replace Blanks in Column Names with str_replace_all()

The third method to remove spaces from the column names in an R data frame uses the str_replace_all() function from the stringr package.

The stringr package provides powerful functions for string manipulation, for example, the str_to_upper() function to convert a string to upper case.

The stringr package also contains the str_replace_all() function. This function replaces all matches of a pattern in a string, for example, blanks (the pattern) with an underscore (the replacement value).

The str_replace_all() function has 3 required arguments:

  1. A character vector where matches are sought, e.g., column names.
  2. The pattern you are looking for, e.g., a blank.
  3. The replacement value, e.g., an underscore.

To create a character vector with column names, you can use the names() function.

This is how to use str_replace_all() to replace spaces in column names with an underscore. The first two lines of code install (if necessary) and load the stringr package.

if (!require("stringr")) install.packages("stringr")
library("stringr")
names(my_df) <- str_replace_all(names(my_df), " ", "_")
my_df
Use the str_replace_all() function to replace blanks with an underscore in column names of an R data frame.
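The pattern of str_replace_all() is also a regular expression, and the function accepts a named vector to carry out several replacements in one call. The following sketch assumes that stringr is already installed; the vector messy_names is a hypothetical example.

```r
library(stringr)  # assumes the stringr package is already installed

# Hypothetical names with a blank, a hyphen, and a double blank.
messy_names <- c("x 1", "col-2", "var  3")

# One or more whitespace characters become a single underscore.
only_blanks <- str_replace_all(messy_names, "\\s+", "_")
only_blanks   # "x_1" "col-2" "var_3"

# A named vector (pattern = replacement) applies several replacements at once.
both <- str_replace_all(messy_names, c("\\s+" = "_", "-" = "_"))
both          # "x_1" "col_2" "var_3"
```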

4. Replace Blanks in Column Names with clean_names()

The fourth method to substitute blanks in the column names of a data frame uses the clean_names() function from the janitor package.

The janitor package provides simple tools for examining and cleaning dirty data, such as the clean_names() function.

The clean_names() function cleans the names of a data frame and returns names that are unique and consist only of the _ character, numbers, and letters. In contrast to the previous methods, the clean_names() function takes and returns a data frame, for ease of piping with %>%.

In contrast to other methods, this method doesn’t let you specify the replacement value. In other words, all blanks are replaced by an underscore.

This is how you fix spaces in the column names of a data frame with the clean_names() function.

if (!require("janitor")) install.packages("janitor")
library(janitor)
my_df <- clean_names(my_df)
my_df
Use the clean_names() function to replace blanks with an underscore in column names of an R data frame.
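The clean_names() function does more than replace blanks. As a rough sketch (messy_df is a hypothetical data frame, and janitor is assumed to be installed), the code below feeds it names that differ only in capitalization and then checks the two properties described above: no blanks and no duplicates.

```r
library(janitor)  # assumes the janitor package is already installed

# Hypothetical data frame: two names that differ only in capitalization.
messy_df <- data.frame("First Name" = c("Ann", "Bob"),
                       "first name" = c("A", "B"),
                       "Var 3" = c(10, 20),
                       check.names = FALSE)

tidy_df <- clean_names(messy_df)
names(tidy_df)

# Check the result: no name contains a blank ...
any(grepl(" ", names(tidy_df), fixed = TRUE))  # FALSE
# ... and no name occurs more than once.
anyDuplicated(names(tidy_df))                  # 0
```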

5. Replace Blanks in Column Names with clean_names() and tidyverse

So far, we’ve shown how to replace blanks in column names with a separate block of R code. However, the fifth method lets you substitute blanks with an underscore as part of a bigger block of code.

Since the clean_names() function returns a data frame, you can use this function in a chain of calculations using the %>% pipe operator, which becomes available when you load the tidyverse. In other words, you can fix the column names while you also add columns, carry out calculations, or filter observations.

In the example below we show how to combine the power of the clean_names() function and the tidyverse package.

if (!require("tidyverse")) install.packages("tidyverse")
library("tidyverse")
my_df <- my_df %>% 
  clean_names()
my_df
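To illustrate such a longer chain, the sketch below fixes the names, filters observations, and adds a column in one pipeline. It assumes that janitor and dplyr (one of the tidyverse packages) are installed; raw_df and the column var_3_doubled are hypothetical names used only for this example.

```r
library(dplyr)    # provides %>%, filter(), and mutate(); assumed to be installed
library(janitor)  # provides clean_names(); assumed to be installed

# Hypothetical data frame with blanks in the column names.
raw_df <- data.frame("x 1" = 1:5,
                     "col 2" = letters[1:5],
                     "var 3" = c(-2, 0, 3, 5, -4),
                     check.names = FALSE)

result <- raw_df %>%
  clean_names() %>%                   # "x 1" becomes x_1, and so on
  filter(x_1 > 2) %>%                 # the cleaned names work without backticks
  mutate(var_3_doubled = var_3 * 2)   # add a column based on a cleaned name
result
```

Placing clean_names() first means that every later step can refer to the columns without backticks.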