This article discusses how to replace blanks with NA’s in R.
Normally, if a value is missing, R shows an “NA” (not available). As a result, functions such as sum() or mean() recognize missing values and can ignore them when you set the argument na.rm = TRUE. However, if missing values are not represented by NA’s, but by blanks instead, problems might occur, because R treats a blank as an ordinary value.
Therefore, it is important to convert blanks to NA’s first.
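To see why blanks cause trouble, the short sketch below shows that R does not count an empty string as missing, and that mean() only skips NA values when na.rm = TRUE is set. The vectors x and scores are made up for illustration and are not part of the article's data.

```r
# A blank string is an ordinary value, so R does not flag it as missing
x <- c("A", "", "C", NA)
is.na(x)                                # only the real NA is flagged

# Blanks have to be counted explicitly
n_blanks <- sum(x == "", na.rm = TRUE)
n_blanks

# Summary functions skip NA values only when na.rm = TRUE
scores <- c(10, NA, 30)
mean(scores)                            # NA
mean(scores, na.rm = TRUE)              # 20
```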
In R, the easiest way to replace blanks with NA’s is by using the na_if() function from the dplyr package. This function checks whether a value equals a specified value (e.g., a blank) and, if so, converts it into an NA. Alternatively, you can use basic R code or the ifelse() function.
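As a minimal illustration, assuming the dplyr package is installed, na_if() can also be applied to a plain vector. The vector name letters_vec is hypothetical.

```r
library(dplyr)

# Hypothetical character vector with one blank
letters_vec <- c("A", "", "C")

# Every element equal to "" becomes NA; other elements are kept
letters_clean <- na_if(letters_vec, "")
letters_clean
```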
For the examples in this article, we create a data frame with 3 character columns (x1, x2, and x3) and 5 rows. Each column has a blank that we want to replace with an NA.
my_df <- data.frame(x1 = c("A", "", "C", "D", "E"),
                    x2 = c("V", "W", "X", "", "Z"),
                    x3 = c("a", "a", "", "a", "a"))
my_df
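Before replacing anything, it can help to check how many blanks each column contains. The following is one possible sketch in base R; the name blank_counts is only an illustration.

```r
my_df <- data.frame(x1 = c("A", "", "C", "D", "E"),
                    x2 = c("V", "W", "X", "", "Z"),
                    x3 = c("a", "a", "", "a", "a"))

# my_df == "" gives a logical matrix; colSums() counts the TRUE values per column
blank_counts <- colSums(my_df == "")
blank_counts
```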
How to Replace Blanks in One Column with NA’s
There are two ways to replace the blanks in one specific column with NA’s. Both methods use the ifelse() function. However, the first method uses this function in combination with basic R code, whereas the second method uses the dplyr package.
The ifelse() function tests a condition and, depending on the outcome (TRUE or FALSE), returns a value. For example, you can assess whether a value is blank and return an NA (if TRUE) or return the original value otherwise (i.e., if FALSE).
ifelse(test, yes, no)
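A minimal sketch on a stand-alone vector shows the three arguments at work; in R they are named test, yes, and no. The vector name values is hypothetical.

```r
# Hypothetical character vector with one blank
values <- c("A", "", "C")

# test: values == ""   yes: NA   no: the original value
cleaned <- ifelse(values == "", NA, values)
cleaned
```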
The code below shows how to use the ifelse() function to convert blanks in column x1 into NA.
my_df$x1 <- ifelse(my_df$x1=="", NA, my_df$x1)
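One caveat applies if the column is a factor, which is the case when the data frame was created with stringsAsFactors = TRUE (the default before R 4.0.0). ifelse() then returns the underlying integer codes instead of the labels. Wrapping the column in as.character() avoids this, as the sketch below with a hypothetical factor x1_factor suggests.

```r
# Hypothetical factor column with one blank level
x1_factor <- factor(c("A", "", "C"))

# Without conversion, ifelse() falls back to the integer codes of the factor
codes <- ifelse(x1_factor == "", NA, x1_factor)
codes

# Converting to character first keeps the original labels
labels <- ifelse(x1_factor == "", NA, as.character(x1_factor))
labels
```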
Alternatively, you can combine the ifelse() function with the dplyr package, a useful package for data manipulation, to replace a blank with an NA. You need the mutate() function to change the values of, in this case, column x1.
library(dplyr)
my_df %>% mutate(x1 = ifelse(x1 == "", NA, x1))
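Note that mutate() returns a new data frame and leaves my_df untouched, so the result has to be assigned if you want to keep it. The sketch below assumes dplyr is installed; the name my_df_clean is just an illustration.

```r
library(dplyr)

my_df <- data.frame(x1 = c("A", "", "C", "D", "E"),
                    x2 = c("V", "W", "X", "", "Z"),
                    x3 = c("a", "a", "", "a", "a"),
                    stringsAsFactors = FALSE)

# Store the modified copy under a new (hypothetical) name
my_df_clean <- my_df %>%
  mutate(x1 = ifelse(x1 == "", NA, x1))

sum(is.na(my_df$x1))        # the original column still holds the blank
sum(is.na(my_df_clean$x1))  # the copy holds an NA instead
```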
How to Set Blanks to NA in Multiple Columns
Now, suppose you have a data frame with many columns and want to replace the blanks in several columns with NA.
You could use basic R code to do this (using the first example above), but repeating that line for every column takes many lines of code. Therefore, to convert blanks into NA’s in various columns we recommend using the dplyr package and its across() function.
You can use the across() function to easily specify one or more columns you want to modify, either by mentioning the column names or column positions. For example, with the R code below, we replace the blanks in columns x1 and x2 with NA’s.
library(dplyr)
my_df %>% mutate(across(c("x1", "x2"), ~ ifelse(. == "", NA, as.character(.))))
As you can see, we use the ifelse() function again. The dot (.) in the first and third arguments represents each of the columns you have selected with the across() function.
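For comparison, a base R version can stay fairly short when combined with lapply(). This is only a sketch and not the article's recommended method; the helper vector cols is hypothetical.

```r
my_df <- data.frame(x1 = c("A", "", "C", "D", "E"),
                    x2 = c("V", "W", "X", "", "Z"),
                    x3 = c("a", "a", "", "a", "a"))

# Columns to clean (hypothetical helper vector)
cols <- c("x1", "x2")

# Apply the same ifelse() replacement to each selected column
my_df[cols] <- lapply(my_df[cols], function(col) {
  ifelse(col == "", NA, as.character(col))
})
my_df
```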
3 Ways to Replace Blanks with NA’s in All Columns in R
Above, we have demonstrated how to convert blanks into NA’s in one or more specific columns. However, we recommend replacing the blanks with NA’s in all columns before you start your analysis.
There are 3 ways to do this.
1. Replace Blanks with NA with Basic R Code
The first way to replace blanks with NA’s uses basic R code and needs only one line of code.
With the square brackets ([ ]) we select every cell of the data frame that contains a blank and assign an NA to it. All other cells keep their original value.
my_df[my_df==""] <- NA
An advantage of this method is its simplicity: it requires very little code and is easy to understand. However, the assignment changes my_df in place and does not return the modified data frame. So, if you want to directly use the result of this operation (i.e., a data frame with NA’s instead of blanks) as input for other actions, you have a problem.
Fortunately, there are methods that do satisfy this need, for example, the functions from the dplyr package.
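Within base R, the replace() function is another option worth considering: it performs the same indexed assignment on a copy and returns that copy. A minimal sketch, in which the name n_missing is only an illustration:

```r
my_df <- data.frame(x1 = c("A", "", "C", "D", "E"),
                    x2 = c("V", "W", "X", "", "Z"),
                    x3 = c("a", "a", "", "a", "a"))

# replace() returns the modified data frame, so it can be passed straight on
n_missing <- sum(is.na(replace(my_df, my_df == "", NA)))
n_missing

# The original data frame still contains its blanks
sum(my_df == "")
```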
2. Replace Blanks with NA using dplyr and ifelse()
The second method to replace blanks with NA’s in all columns uses a combination of the functions: mutate(), across(), everything(), and ifelse().
- The mutate() function modifies the values in existing columns and creates new columns.
- The across() function helps to define which columns to modify.
- The everything() function specifies that all columns must be modified.
- The ifelse() function checks if a value is a blank. If so, it replaces this value with an NA. Otherwise, the original value is not changed.
The R code below shows an example of how all these functions work together to replace the blanks from all columns with NA’s.
library(dplyr)
my_df %>% mutate(across(everything(), ~ ifelse(. == "", NA, as.character(.))))
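One thing to keep in mind: because of as.character(), the code above turns every selected column into a character column, including numeric ones. If your data frame also contains non-character columns, a possible variation restricts the operation to character columns. It assumes dplyr 1.0.0 or later, where where() is available. The data frame mixed_df is made up for this sketch.

```r
library(dplyr)

# Hypothetical data frame with one integer and one character column
mixed_df <- data.frame(id = 1:3,
                       name = c("A", "", "C"),
                       stringsAsFactors = FALSE)

# Only character columns are touched; id keeps its integer type
result <- mixed_df %>%
  mutate(across(where(is.character), ~ ifelse(.x == "", NA, .x)))
result
```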
3. Replace Blanks with NA using dplyr and na_if()
Arguably the easiest method to replace blanks with NA’s in R uses the functions mutate_all() and na_if() from the dplyr package.
The mutate_all() function facilitates the modification of all values in a data frame by applying the same operation. Next, the na_if() function spots the blanks and replaces them with NA’s.
See the example below.
library(dplyr)
my_df %>% mutate_all(na_if, "")
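In recent dplyr versions (1.0.0 and later), mutate_all() is superseded by across(), although it still works. A sketch of the same operation in the newer style follows, assuming all columns are character as in my_df; the name my_df_clean is only an illustration.

```r
library(dplyr)

my_df <- data.frame(x1 = c("A", "", "C", "D", "E"),
                    x2 = c("V", "W", "X", "", "Z"),
                    x3 = c("a", "a", "", "a", "a"),
                    stringsAsFactors = FALSE)

# across(everything(), ...) applies na_if() to every column
my_df_clean <- my_df %>%
  mutate(across(everything(), ~ na_if(.x, "")))

# Count the NA's per column to check the result
colSums(is.na(my_df_clean))
```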