3 Simple Ways to Count the Number of Words in a Character String in R [Examples]

In this article, we show 3 simple ways to count the number of words in a string in R.

Counting the number of words might be necessary to check if a string meets a specific requirement, or if the string contains actual words instead of a set of random characters.

For example, the character string below has 5 words.

Today is a beautiful day

However, the next string has 4 words and a combination of 3 characters, namely “!$?”. When we count words, we want R to ignore “!$?”.

Today is a !$? day

How to Count the Number of Words in a String in R

We will show 3 simple ways to count the number of words in R. The first two methods use only basic R code. For the third method, you need to install the stringr package.

You can install the stringr package (if it isn’t installed yet) and load it with the following R code.

if (!require("stringr")) install.packages("stringr")
library(stringr)

Count the Number of Words with STRSPLIT

The first method uses the strsplit() function to count the number of words in a string.

These are the steps:

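  1. Use the strsplit() function to split your string into a vector of words. The strsplit() function returns a list with one character vector per input string. Each word of your string becomes a single element of that vector, and [[1]] extracts the vector of the first (here, the only) string.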
  2. Count the number of elements (i.e., words) of the vector with the length() function.

The strsplit() function provides an elegant way to separate the words of a string into elements of a vector. Note that the second argument of the strsplit() function defines the separator between the words and is interpreted as a regular expression. In this case, we use a single blank.
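
A single blank works for the example string, but it over-counts as soon as two words are separated by more than one blank. A minimal sketch, assuming a made-up string with a double blank between “a” and “beautiful”, compares a single blank with the regular expression \\s+ (one or more whitespace characters).

```r
# Made-up example: two blanks between "a" and "beautiful"
my_string <- "Today is a  beautiful day"

# Splitting on a single blank leaves an empty string where the double blank was
strsplit(my_string, " ")[[1]]
# "Today" "is" "a" "" "beautiful" "day"
length(strsplit(my_string, " ")[[1]])     # 6

# Splitting on runs of whitespace avoids the empty element
length(strsplit(my_string, "\\s+")[[1]])  # 5
```

Note that a string starting with a blank still yields an empty first element with either separator.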

See the R code below to count the number of words in a string.

my_string <- "Today is a beautiful day"
length(strsplit(my_string, " ")[[1]])

The string above is a normal string. However, the following string doesn’t contain only words. It also contains 3 random characters.

Today is a !$? day

If you run the code below, R reports that this string also has 5 words, because “!$?” is treated as a word.

my_string <- "Today is a !$? day"
length(strsplit(my_string, " ")[[1]])

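However, we want R to ignore the 3 random characters and hence return 4 as the number of words in this string. To do so, we use regular expressions.

The regular expression \\w+ matches one or more consecutive word characters, that is, letters, digits, or underscores. Its uppercase counterpart \\W+ does the opposite: it matches one or more consecutive characters that are not word characters, such as blanks and “!$?”.

In the code below, we use \\W+ as the second argument of the strsplit() function, so the string is split at every run of non-word characters. The blanks and the “!$?” characters become part of the separators, and only the words remain. Because a string that starts with a non-word character produces an empty first element, we drop empty strings before counting. As a result, R returns that the string contains just 4 words.

my_string <- "Today is a !$? day"
# Split at runs of non-word characters, then drop any empty strings
words <- strsplit(my_string, "\\W+")[[1]]
length(words[words != ""])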
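
The same idea can be wrapped in a small function that works on a whole vector of strings. A minimal sketch; the helper name count_words_split() and the example strings are hypothetical. It splits each string at runs of non-word characters (\\W+, the complement of \\w+) and counts only the non-empty pieces, so strings that start with punctuation are handled too.

```r
# Hypothetical helper: count the words in each element of a character vector
# by splitting at runs of non-word characters and ignoring empty pieces.
count_words_split <- function(x) {
  vapply(strsplit(x, "\\W+"), function(w) sum(w != ""), integer(1))
}

texts <- c("Today is a beautiful day", "Today is a !$? day", "!$? Today")
count_words_split(texts)
# [1] 5 4 1
```

vapply() is used instead of sapply() so that the result is always an integer vector with one count per input string.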

Count the Number of Words with GREGEXPR

The second method to count the number of words in a character string uses the gregexpr() function. You can use this function to find matching substrings in a larger string.

As discussed before, the regular expression \\w+ matches runs of letters, digits, and underscores, so it skips characters such as “!$?”. We’ll use this regular expression in the gregexpr() function too.

These are the steps to count the number of words in a string with the gregexpr() function:

  1. Pass the regular expression R should look for as the first argument of the gregexpr() function. In this case, “\\w+”.
  2. Pass the string whose words you want to count as the second argument of the gregexpr() function.
  3. Use the lengths() function to count the number of occurrences (i.e., words) that the gregexpr() function has found.

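Note that you should use the lengths() function instead of the length() function. The gregexpr() function returns a list with one element per input string, and each element is an integer vector with the start positions of the matches. The length() function would return the length of the list itself (here, 1), whereas the lengths() function returns the length of each element of the list, that is, the number of matches per string.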
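
A short sketch makes this list structure visible; the second string, which contains no word at all, is an assumption for illustration. It also shows a caveat: when gregexpr() finds no match in a string, it stores -1 for that string, so lengths() reports 1 instead of 0.

```r
my_strings <- c("Today is a beautiful day", "!$?")

# One list element per string; each holds the start positions of the
# matches, or -1 when the pattern was not found at all
matches <- gregexpr("\\w+", my_strings)

lengths(matches)                                     # 5 1
# Ignore the -1 placeholder so that a string without words counts as 0
vapply(matches, function(m) sum(m > 0), integer(1))  # 5 0
```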

The R code below shows how you can use the gregexpr() function to count the number of words in an R string.

my_string <- "Today is a beautiful day"
lengths(gregexpr("\\w+", my_string))

The next code demonstrates that R ignores the “!$?” characters when counting the number of words.

my_string <- "Today is a !$? day"
lengths(gregexpr("\\w+", my_string))
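
To see which substrings were actually counted, you can pass the result of gregexpr() to base R’s regmatches(). This sketch (the second string is made up for illustration) also shows a side effect of \\w+: an apostrophe or a hyphen splits a word, so contractions and hyphenated words count as more than one word.

```r
my_strings <- c("Today is a !$? day", "It's an e-mail")

# Extract the substrings matched by \\w+, one character vector per string
regmatches(my_strings, gregexpr("\\w+", my_strings))
# [[1]] "Today" "is" "a" "day"
# [[2]] "It" "s" "an" "e" "mail"
```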

Count the Number of Words with STR_COUNT

The third method to count the number of words in an R string uses the str_count() function from the stringr package. The stringr package provides many powerful functions that you can use to manipulate strings.

Like the first two methods, the str_count() function uses a regular expression to count the number of words in an R string. Here, the regular expression \\w+ matches runs of letters, digits, and underscores. Characters outside these runs, such as blanks and “!$?”, are not counted as words.

Counting the words in a string with the str_count() function is straightforward. The function has two arguments:

  1. Your character string.
  2. The regular expression you want to count. In this case, “\\w+”.

The R code below shows how to count the words of a string in R with the str_count() function.

library(stringr)
my_string <- "Today is a beautiful day"
str_count(my_string, "\\w+")

The next example shows that, with the \\w+ expression, R counts only words.

library(stringr)
my_string <- "Today is a !$? day"
str_count(my_string, "\\w+")
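
Because str_count() is vectorized, a single call counts the words in every element of a character vector, for example a column of a data frame. A minimal sketch; the data frame reviews and its texts are hypothetical.

```r
library(stringr)

# Hypothetical data frame with one text per row
reviews <- data.frame(
  text = c("Today is a beautiful day", "Today is a !$? day", "Great!"),
  stringsAsFactors = FALSE
)

# One call counts the words in every row
reviews$n_words <- str_count(reviews$text, "\\w+")
reviews$n_words
# [1] 5 4 1
```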

Instead of counting the number of words in a string, you can also use the str_count() function to count the number of digits in a string.
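
For example, the pattern \\d matches a single digit, while \\d+ matches a run of consecutive digits, that is, a whole number. The example string below is made up for illustration.

```r
library(stringr)

my_string <- "Call 555-0100 before 9 am"

# \\d matches one digit, so this counts individual digits
str_count(my_string, "\\d")   # 8
# \\d+ matches runs of digits, so this counts numbers
str_count(my_string, "\\d+")  # 3
```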

What is the Fastest Way to Count the Number of Words in an R String

In this article, we’ve discussed three ways to count the number of words in a string in R. But, what is the fastest way to count the number of words?

The table and graph below compare the three methods by the time needed to count the number of words in different strings. In this experiment, the strings had 10, 100, 1,000, and 10,000 words.

As the table and graph show, the str_count() function is the fastest of the three methods at every string length tested, roughly 2 to 10 times faster than strsplit() and gregexpr().

Method        # of Words per String    Execution Time
strsplit()    10                       2.153308e-04
strsplit()    100                      2.189809e-04
strsplit()    1,000                    2.404510e-04
strsplit()    10,000                   6.296285e-04
gregexpr()    10                       3.362013e-04
gregexpr()    100                      2.826450e-04
gregexpr()    1,000                    3.816869e-04
gregexpr()    10,000                   5.997638e-04
str_count()   10                       3.315666e-05
str_count()   100                      4.480765e-05
str_count()   1,000                    1.018986e-04
str_count()   10,000                   9.704359e-05
Figure: Comparing the performance of counting words in a string in R
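
The article does not describe how the timings above were measured. A minimal sketch of one way to run a similar comparison with base R’s system.time() follows; the test strings (repeated copies of “word”), the number of repetitions, and the helper names make_string() and time_per_call() are assumptions for illustration, and the timings will differ from the table on other machines.

```r
library(stringr)

# Hypothetical test data: n copies of "word" separated by single blanks
make_string <- function(n) paste(rep("word", n), collapse = " ")

# Hypothetical helper: average elapsed seconds per call over `reps` calls
time_per_call <- function(f, s, reps = 200) {
  system.time(for (i in seq_len(reps)) f(s))[["elapsed"]] / reps
}

# The three methods from this article, each returning a word count
methods <- list(
  `strsplit()`  = function(s) length(strsplit(s, " ")[[1]]),
  `gregexpr()`  = function(s) lengths(gregexpr("\\w+", s)),
  `str_count()` = function(s) str_count(s, "\\w+")
)

for (n in c(10, 100, 1000, 10000)) {
  s <- make_string(n)
  for (m in names(methods)) {
    cat(sprintf("%-12s %6d words: %.3e s\n", m, n, time_per_call(methods[[m]], s)))
  }
}
```

For more precise measurements, a dedicated benchmarking package that repeats and summarizes many runs is usually preferable to a hand-written loop.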
