In this article, we discuss 3 ways to extract the last N characters from a string in R. The string can be a single variable, an element of a vector, or an observation in a data frame.
Although you can carry out this operation with basic R code, the best way to read the last N characters from a string is by using the str_sub() function. This function is part of the stringr package and therefore compatible with the dplyr package.
Besides extracting a substring from a string, you can also use the str_sub() function to remove characters from a string.
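As a brief illustration of the removal use case mentioned above, the sketch below drops the last 3 characters instead of keeping them. It assumes the stringr package is already installed; the variable name my_string is only an example.

```r
library(stringr)

my_string <- "My String"

# A negative end position counts from the right:
# end = -4 stops at the 4th character from the end,
# which drops the last 3 characters.
without_last_3 <- str_sub(my_string, end = -4)
without_last_3
# "My Str"
```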
Next, we will show the 3 methods to extract the last N characters from a string in R.
METHOD 1: Extract the Last N Characters with R Base Code
Although it’s not the easiest method, you can read characters from a string starting from the right by only using basic R code. This method is only the preferred option in a situation where you can’t or won’t install packages.
You need two R base functions to extract the last N characters from a string, namely:
- substr() (or substring())
- nchar()
The substr() function extracts a substring from a string and has the following syntax:
substr(text, start, stop)
- text is a character vector.
- start is the position of the first character to be read.
- stop is the position of the last character to be read.
For example, for a string of 9 characters (e.g., “My String”) you can extract the last 3 characters by reading from position 7 to 9.
# Extract the Last 3 Characters from "My String"
substr("My String", start = 7, stop = 9)
However, if you work with strings of different lengths, the start and stop positions change from string to string. Fortunately, you can use the nchar() function so that you don’t have to change these two arguments manually.
The nchar() function is an R base function that takes a string as its argument and returns the number of characters in this string.
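A quick sketch of nchar() on its own may help here. It works on a single string as well as on a vector; the variable names below are illustrative only.

```r
# nchar() counts the characters of a single string
nchar("My String")
# 9

# It is vectorized, so it returns one count per element
example_strings <- c("My String", "Two Words", "12345")
string_lengths <- nchar(example_strings)
string_lengths
# 9 9 5
```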
So, by combining the substr() function and the nchar() function you can extract the last N characters from a string regardless of its length.
# Extract the Last 3 Characters from "My String" (dynamically)
my_string <- "My String"
last_n <- 3
substr(my_string, nchar(my_string) - last_n + 1, nchar(my_string))
Notice that we didn’t hard-code the values of the second and third arguments. By computing them with the nchar() function, your R code works for strings of any length.
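If you need this operation in several places, one option is to wrap the substr() and nchar() combination in a small helper. The function name last_n_chars() below is hypothetical (it is not part of base R), and the sketch assumes its input is a character vector.

```r
# Hypothetical helper: return the last n characters of each string in x
last_n_chars <- function(x, n) {
  len <- nchar(x)
  # substr() accepts one start/stop position per element of x
  substr(x, len - n + 1, len)
}

last_n_chars(c("My String", "Two Words", "12345"), 3)
# "ing" "rds" "345"

# When a string is shorter than n, the start position drops below 1.
# substr() treats it as 1, so the whole string is returned.
last_n_chars("ab", 3)
# "ab"
```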
METHOD 2: Extract the Last N Characters with the str_sub() Function
The best method to read the last N characters from an R string is by using the str_sub() function.
The str_sub() function is part of the stringr package, and has the following syntax:
str_sub(string, start, end)
- string is a character string.
- start defines the position of the first character to read.
- end establishes the position of the last character to read.
If you don’t define the end argument, str_sub() reads until the end of the string. Hence, to read the last N characters, you only need to define the arguments string and start.
Also, the str_sub() function counts positions from the right if you provide a negative value as the start argument. This is a big advantage compared to the substr() function, which doesn’t interpret negative positions this way.
For example, to read the last 3 characters from a string (irrespective of its length), you can use the argument start = -3.
install.packages("stringr")
library("stringr")
my_string <- "My String"
last_n <- 3
str_sub(my_string, start = -last_n)
Needless to say, the str_sub() function also works with a vector of strings.
vector_of_strings <- c("My String", "Two Words", "12345")
last_n <- 3
str_sub(vector_of_strings, start = -last_n)
METHOD 3: Extract the Last N Characters with dplyr
Lastly, we demonstrate how to extract the last N characters from a string with dplyr.
Because the dplyr package and the stringr package are both part of the tidyverse you can use their functions together. In other words, you can use the str_sub() function as part of a dplyr operation.
For example, you can create a new column with the last 3 characters of a string using the mutate() function and the str_sub() function.
install.packages("tidyverse")
library("tidyverse")
my_data <- data.frame(name = c("Peter", "Anna", "Maria"))
my_data %>%
  mutate(last_3_chars = str_sub(name, start = -3))
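For comparison, if you can’t install packages, the same column can be created with the base R functions from Method 1. This is a minimal sketch; the variable name_length is illustrative, and stringsAsFactors = FALSE is set explicitly so that the name column is a character vector in older R versions as well.

```r
my_data <- data.frame(name = c("Peter", "Anna", "Maria"),
                      stringsAsFactors = FALSE)
last_n <- 3

# Number of characters in each name
name_length <- nchar(my_data$name)

# Add a column with the last 3 characters of each name
my_data$last_3_chars <- substr(my_data$name, name_length - last_n + 1, name_length)
my_data$last_3_chars
# "ter" "nna" "ria"
```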