In this article, we explain how to replace missing values in R with the group’s median.

**The easiest way to replace missing values with the group’s median in R is with the tidyverse package. Firstly, you define the groups with the group_by function. Secondly, you use the mutate function to modify missing values. Finally, apply the ifelse function and the median function to replace the NA’s.**

If you want to know how to replace NA’s with the median of a column (i.e., not per group), please read this article.

## Replace Missing Values in R with the Median by Group

There are different ways to replace missing values in R. Here we discuss three ways to replace NA’s with the group’s median.

In this section, we’ll use the data frame below to demonstrate how each method works.

The data frame has two columns and ten rows. You can separate the data into two groups, namely “A” and “B”. Each group has one missing value.

This is the R code to create this data frame.

```
my_groups <- c(rep("A", 5), rep("B",5))
my_values <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_df <- data.frame(my_groups, my_values)
```
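Before replacing anything, it can help to see which median each group will receive. The snippet below is a minimal base R sketch that rebuilds the example data frame and computes the median per group with `tapply()`. The name `group_medians` is only illustrative and is not used elsewhere in this article.

```r
# Rebuild the example data frame so this snippet runs on its own
my_groups <- c(rep("A", 5), rep("B", 5))
my_values <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_df <- data.frame(my_groups, my_values)

# Median of my_values within each group, ignoring missing values
group_medians <- tapply(my_df$my_values, my_df$my_groups, median, na.rm = TRUE)
print(group_medians)
# Group A: median of 4, 5, 9, 10 is 7
# Group B: median of 7, 8, 11, 12 is 9.5
```

These are the values that should replace the NA in group “A” and the NA in group “B”, whichever method you choose.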

### How to Replace Missing Values with the Group’s Median Using *data.table*

The first way to replace missing values in R with the median by group uses the **data.table package**.

The *data.table* package provides an enhanced version of base R’s *data.frame*, with syntax and feature improvements for ease of use, convenience, and speed.

**This is how to replace missing values with the group’s median with data.table**

1. Use the *setDT()* function to transform a data frame into a data.table.

2. Specify the column that contains the missing values.

3. Use the := operator to calculate the new column value per group.

4. Use the *ifelse()* function to identify missing values and replace them with the median.

5. Specify the column that defines the groups with the *by* option.

The R code below shows an example of the steps above.

```
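library(data.table)

# Convert my_df to a data.table by reference (no copy is made)
setDT(my_df)

# Within each group, replace NA's with that group's median
my_df[, my_values := ifelse(is.na(my_values),
                            median(my_values, na.rm = TRUE),
                            my_values),
      by = my_groups]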
```
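Note that both `setDT()` and `:=` work by reference, so the code above changes `my_df` itself. If you would rather keep the original data frame untouched, one option is to work on a copy created with `as.data.table()`. The following is a sketch under that assumption; the name `my_dt` is illustrative.

```r
library(data.table)

# Rebuild the example data frame so this snippet runs on its own
my_df <- data.frame(
  my_groups = c(rep("A", 5), rep("B", 5)),
  my_values = c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
)

# as.data.table() returns a copy, so my_df keeps its NA's
my_dt <- as.data.table(my_df)

# Replace NA's with the group's median in the copy only
my_dt[, my_values := ifelse(is.na(my_values),
                            median(my_values, na.rm = TRUE),
                            my_values),
      by = my_groups]

print(my_dt)            # NA's replaced by 7 (group A) and 9.5 (group B)
anyNA(my_df$my_values)  # the original still contains missing values
```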

### How to Replace Missing Values with the Group’s Median Using *plyr*

The second way replaces NA’s with the group’s median using the *plyr* package.

The *plyr* package provides a set of tools to split data into homogeneous pieces, apply a function to each piece, and combine the results back together.

In our example, we split the data frame into two sets based on the *my_groups* column, replace NA’s with the median of each set, and combine the results.

**This is how you replace NA’s with the median per group with plyr**

1. Call the *ddply()* function.

2. Specify the data frame that contains the missing values.

3. Specify the column that defines the groups.

4. Pass *transform* as the function to apply to each group.

5. Specify the column that contains the missing values.

6. Use the *ifelse()* function to identify and replace missing values with the median.

7. Close the *ddply()* call.

The R code below provides an example of the steps mentioned above.

```
library(plyr)

# Split my_df by my_groups, replace NA's in each piece, and recombine
ddply(my_df, ~ my_groups, transform,
      my_values = ifelse(is.na(my_values),
                         median(my_values, na.rm = TRUE),
                         my_values))
```
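Unlike the *data.table* approach, `ddply()` returns a new data frame and leaves `my_df` unchanged, so you need to assign the result if you want to keep it. The sketch below also calls the function as `plyr::ddply()` without attaching the package, which is one way to avoid *plyr* masking functions of the same name (such as `mutate()`) when *dplyr* is loaded in the same session. The name `my_df_filled` is illustrative.

```r
# Rebuild the example data frame so this snippet runs on its own
my_df <- data.frame(
  my_groups = c(rep("A", 5), rep("B", 5)),
  my_values = c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
)

# Call ddply() through the plyr namespace and store the result
my_df_filled <- plyr::ddply(my_df, ~ my_groups, transform,
                            my_values = ifelse(is.na(my_values),
                                               median(my_values, na.rm = TRUE),
                                               my_values))

print(my_df_filled)
```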

### How to Replace Missing Values with the Group’s Median Using *tidyverse*

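The third way to replace missing values in R with the median per group uses the *tidyverse*, a collection of packages that includes *dplyr*, which provides the *group_by()* and *mutate()* functions. This is probably the most convenient way because the code is easy to read.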

**These are the steps to replace missing values in R with the group’s median**

1. Use the *group_by()* function to specify the column that defines the group.

2. Use the *mutate()* function to modify the values in the column with the missing values.

3. Apply the *ifelse()* function to identify and replace NA’s with the median.

The code below provides an example.

```
library(tidyverse)

# Group by my_groups, then replace NA's with each group's median
my_df %>%
  group_by(my_groups) %>%
  mutate(my_values = ifelse(is.na(my_values),
                            median(my_values, na.rm = TRUE),
                            my_values))
```
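As a variation on the code above (not a different method), the `ifelse()` call can be swapped for *dplyr*’s `coalesce()`, which keeps the existing value where one is present and falls back to the median otherwise. The sketch below assumes only that *dplyr* is installed. It also stores the result and calls `ungroup()` so that later operations are not applied per group by accident. The name `my_df_filled` is illustrative.

```r
library(dplyr)

# Rebuild the example data frame so this snippet runs on its own
my_df <- data.frame(
  my_groups = c(rep("A", 5), rep("B", 5)),
  my_values = c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
)

my_df_filled <- my_df %>%
  group_by(my_groups) %>%
  # coalesce() returns my_values where present, else the group's median
  mutate(my_values = coalesce(my_values, median(my_values, na.rm = TRUE))) %>%
  ungroup()

print(my_df_filled)
```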

**Note that you can modify the R code above easily when not one, but multiple columns define the groups in your data frame**. For example, *group_by(column1, column2, etc.)*.
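To illustrate grouping by more than one column, here is a small sketch with made-up data. The data frame `sales` and its columns `region`, `product`, and `amount` are hypothetical and serve only as an example.

```r
library(dplyr)

# Hypothetical data: groups are defined by region AND product
sales <- data.frame(
  region  = c("N", "N", "N", "N", "S", "S", "S", "S"),
  product = c("x", "x", "y", "y", "x", "x", "y", "y"),
  amount  = c(10, NA, 3, 5, NA, 8, 2, NA)
)

sales_filled <- sales %>%
  group_by(region, product) %>%
  # Each NA gets the median of its own region/product combination
  mutate(amount = ifelse(is.na(amount),
                         median(amount, na.rm = TRUE),
                         amount)) %>%
  ungroup()

print(sales_filled)
```

Here each missing value is filled from the other observations that share both the same region and the same product.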

## Replace Missing Values in R with the Median by Group in All Numeric Columns

So far, we’ve demonstrated how to handle missing values in one column. However, you might have a data frame with many columns. Applying the techniques mentioned before to each column separately can be a tedious task.

So, **how do you replace values in all columns with the group’s median?**

**These are the steps**

- Load the *tidyverse* library.
- Use the *group_by()* function to specify the column (or columns) that defines the groups.
- Apply the *mutate_if()* function to replace missing values only in numeric columns.
- Use the *ifelse()* function to identify NA’s and replace them with the median.

The R code below provides an example of how to replace missing values in all numeric columns with the group’s median.

```
library(tidyverse)

# Example data with two numeric columns that both contain NA's
my_groups <- c(rep("A", 5), rep("B", 5))
my_values_1 <- c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8)
my_values_2 <- c(3, NA, 4, 8, 2, 11, 15, NA, 9, 10)
my_df <- data.frame(my_groups, my_values_1, my_values_2)

# Apply the replacement to every numeric column, per group
my_df %>%
  group_by(my_groups) %>%
  mutate_if(is.numeric,
            function(x) ifelse(is.na(x),
                               median(x, na.rm = TRUE),
                               x))
```
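In recent versions of *dplyr* (1.0.0 and later), `mutate_if()` is superseded by `across()`. The sketch below shows what the same replacement could look like with `across()` and `where()`, assuming such a version is installed. It is intended to produce the same values as the `mutate_if()` code above. The name `my_df_filled` is illustrative.

```r
library(dplyr)

# Rebuild the example data frame so this snippet runs on its own
my_df <- data.frame(
  my_groups   = c(rep("A", 5), rep("B", 5)),
  my_values_1 = c(4, 9, 10, NA, 5, 12, NA, 7, 11, 8),
  my_values_2 = c(3, NA, 4, 8, 2, 11, 15, NA, 9, 10)
)

my_df_filled <- my_df %>%
  group_by(my_groups) %>%
  # Apply the replacement to every numeric column, per group
  mutate(across(where(is.numeric),
                ~ ifelse(is.na(.x), median(.x, na.rm = TRUE), .x))) %>%
  ungroup()

print(my_df_filled)
```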

If you run the code above, this will be the result.