In this article, we demonstrate 3 easy ways to deal with NaN’s in R.
A NaN value in R stands for “Not a Number” and represents the outcome of a calculation with an undefined result. For example, when you divide 0 by 0. Note that dividing a non-zero number by 0 returns Inf (or -Inf) rather than NaN.
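As a quick illustration, the base R sketch below shows a few expressions that produce a NaN and how is.nan() and is.na() react to them. The variable names are illustrative only.

```r
# Expressions whose result is mathematically undefined return NaN
zero_by_zero <- 0 / 0         # NaN
inf_minus_inf <- Inf - Inf    # NaN

# Dividing a non-zero number by zero is defined in R and gives Inf
one_by_zero <- 1 / 0          # Inf

# is.nan() tests for NaN only; is.na() is TRUE for both NaN and NA
is.nan(zero_by_zero)   # TRUE
is.nan(one_by_zero)    # FALSE
is.na(zero_by_zero)    # TRUE
is.nan(NA)             # FALSE
```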
Working with NaN’s can be problematic, mainly because most summary functions, such as sum() or mean(), propagate them. If the input contains a NaN, these functions return a NaN as well, unless you explicitly set na.rm = TRUE.
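The small base R sketch below shows this behavior on an illustrative vector (not the dataset used later in this article), together with the na.rm = TRUE argument that skips NaN’s during the calculation.

```r
# Illustrative vector with a single NaN
values <- c(1, 2, NaN, 4)

sum(values)                  # NaN
mean(values)                 # NaN

# na.rm = TRUE drops NaN (and NA) before computing
sum(values, na.rm = TRUE)    # 7
mean(values, na.rm = TRUE)   # 2.333333
```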
Therefore, it is essential to assess whether your data contains NaN’s and, if so, to treat them accordingly.
The 3 most common ways to deal with NaN’s in R are:
- Replace them with a zero.
- Remove the observation.
- Replace them with a missing value.
In the remainder of this article, we will discuss these 3 options in more detail and provide examples. In order to support the examples, we use a dataset that has 3 numeric variables, 5 observations, and some NaN’s.
my_data <- data.frame(x1 = c(1, 2, NaN, 4, 5), x2 = c(10, NaN, 30, NaN, 50), x3 = c(-2, -1, NaN, 1, 2))
my_data
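Before treating NaN’s, it helps to know where they are. One possible way to count them per column is sketched below in base R. Because is.nan() has no method for data frames, the sketch applies it column by column. The name nan_counts is an illustrative choice.

```r
my_data <- data.frame(
  x1 = c(1, 2, NaN, 4, 5),
  x2 = c(10, NaN, 30, NaN, 50),
  x3 = c(-2, -1, NaN, 1, 2)
)

# Count the NaN's in each column
nan_counts <- sapply(my_data, function(col) sum(is.nan(col)))
nan_counts
# x1 x2 x3
#  1  2  1

# TRUE if at least one NaN is present anywhere in the data
any(nan_counts > 0)
```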
IMPORTANT: Because of their simplicity, we will use the dplyr and tidyr packages to replace/remove NaN’s. Therefore, it’s necessary to install and load these packages first.
install.packages("dplyr")
library(dplyr)
install.packages("tidyr")
library(tidyr)
1. Replace NaN’s with Zeros in R
A common way to deal with NaN’s is replacing them with zeros.
The easiest way to do this in R is by using the replace_na() function from the tidyr package. This function requires the column(s) in which you want to replace the NaN’s and the new value, namely 0. If you combine this function with other functions from the dplyr package, you can easily substitute the NaN’s in all columns.
The replace_na() function:
replace_na(<column name(s)>, <new value>)
Note: Although the name suggests that the function replaces missing values (i.e., NA’s), the replace_na() function also substitutes NaN’s. Consequently, any NA’s in the same column are replaced as well.
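To see this behavior in isolation, the sketch below applies replace_na() to an illustrative vector that contains both an NA and a NaN. It assumes the tidyr package is installed.

```r
library(tidyr)

# Illustrative vector with one NA and one NaN
x <- c(1, NA, NaN, 4)

# Both the NA and the NaN are replaced by 0
replaced <- replace_na(x, 0)
replaced
# 1 0 0 4
```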
The R code below contains an example of how to use the replace_na() function and replace the NaN’s in the columns x1 and x3 with zeros.
my_data %>%
  mutate(across(c('x1', 'x3'), ~replace_na(.x, 0)))
First, we use the mutate() function to let R know that we want to change the value(s) in one or more columns. Then, we use the across() function to specify the columns (i.e., x1 and x3). Finally, we use the replace_na() function to replace the NaN’s with zeros.
In this example, we’ve specified the column names explicitly. However, if you have a data frame with many columns and you want to replace the NaN’s in all of them, writing out all names can be time-consuming. Therefore, you can use the everything() function instead.
The everything() function lets you modify all columns at once.
my_data %>%
  mutate(across(everything(), ~replace_na(.x, 0)))
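If your data also contains genuine NA’s or non-numeric columns, you may want to replace only the NaN’s in the numeric columns. The sketch below is one possible approach, not the method used in the rest of this article. It swaps replace_na() for base R’s replace() combined with is.nan(), and uses where(is.numeric) instead of everything(). The data frame mixed_data is hypothetical.

```r
library(dplyr)

# Hypothetical data frame with a character column, a real NA, and a NaN
mixed_data <- data.frame(
  id = c("a", "b", "c"),
  x1 = c(1, NA, NaN)
)

# Replace only NaN's, and only in numeric columns; the NA stays untouched
nan_only <- mixed_data %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.nan(.x), 0)))

nan_only$x1
# 1 NA 0
```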
2. Remove Observations with NaN’s in R
Instead of replacing NaN’s with zeros, you can also choose to remove observations that have NaN’s in one or more columns.
In R, the drop_na() function from the tidyr package detects and removes observations with NaN’s. This function searches for NaN’s either in all columns or in one or more specific columns of the data frame. Alternatively, you can use the na.omit() function or the complete.cases() function.
In this article, we will solely focus on the drop_na() function. For more information about the other two functions, you can check this interesting article.
The drop_na() function has no mandatory arguments besides the data frame itself. If you do not provide column names, then the drop_na() function removes all observations that have at least one NaN, regardless of the column(s). This is equivalent to keeping only the complete observations.
my_data %>% drop_na()
Instead, if you want to remove observations that have NaN’s in specific columns, then you can provide these column names as an argument of the drop_na() function. For example, to check for NaN’s in the columns x1 and x3, you can use the argument c("x1", "x3").
my_data %>% drop_na(c("x1", "x3"))
In the example above, we check only for NaN’s in the columns x1 and x3. Therefore, the NaN’s in column x2 remain in the output dataset.
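To make the difference between the two calls concrete, the sketch below counts the remaining rows for each variant on the example dataset. It assumes the tidyr package is installed; the names all_complete and x1_x3_complete are illustrative.

```r
library(tidyr)

my_data <- data.frame(
  x1 = c(1, 2, NaN, 4, 5),
  x2 = c(10, NaN, 30, NaN, 50),
  x3 = c(-2, -1, NaN, 1, 2)
)

# Without column names: only rows 1 and 5 are complete
all_complete <- drop_na(my_data)
nrow(all_complete)                # 2

# With x1 and x3 only: just row 3 is removed
x1_x3_complete <- drop_na(my_data, c("x1", "x3"))
nrow(x1_x3_complete)              # 4

# The two NaN's in x2 are still present
sum(is.nan(x1_x3_complete$x2))    # 2
```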
3. Replace NaN with NA in R
Lastly, you can also replace NaN’s with NA’s.
In R, replacing NaN’s with an NA (i.e., not available) can be easily done with the replace_na() function. This function has two mandatory arguments, namely the column(s) in which you want to replace the NaN’s and the new value of the NaN’s. Hence, this is very similar to replacing NaN’s with zeros.
Below we provide an example how to use the replace_na() function in combination with the functions mutate(), across(), and everything().
my_data %>%
  mutate(across(everything(), ~replace_na(.x, NA)))
As the image demonstrates, we’ve replaced all NaN’s with an NA.
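If you want to verify the result programmatically, a check along the following lines is one option. It assumes the dplyr and tidyr packages are installed; the name na_data is illustrative.

```r
library(dplyr)
library(tidyr)

my_data <- data.frame(
  x1 = c(1, 2, NaN, 4, 5),
  x2 = c(10, NaN, 30, NaN, 50),
  x3 = c(-2, -1, NaN, 1, 2)
)

na_data <- my_data %>%
  mutate(across(everything(), ~ replace_na(.x, NA)))

# No NaN's are left ...
sum(sapply(na_data, function(col) sum(is.nan(col))))   # 0

# ... and the same four positions are now NA
sum(is.na(na_data))                                    # 4
```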