Substring Function in R: A Comprehensive Guide

The substring() function in R is widely used to extract characters from a string or to modify a string in place. You give it the start and end positions, and it either returns the characters in that range or lets you overwrite them with new values.

Greetings and Introduction

Hello folks, hope you are doing well. Today let’s focus on the substring function in R.

The substring() Function Syntax

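Base R offers two closely related functions, substr() and substring(), for extracting and replacing parts of a string.

substr(x, start, stop)
substring(x, first, last = 1000000L)

Where:

  • x = the input character vector (non-character input is coerced to character).
  • start / first = the position of the first character of the substring, counting from 1.
  • stop / last = the position of the last character of the substring. In substring(), last defaults to 1000000L, so omitting it extracts up to the end of the string.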
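
The practical difference between the two signatures is easiest to see side by side. The following is a minimal sketch; the variable names (txt, a, b, chars, single) are illustrative only and are not used elsewhere in this guide.

```r
# Illustrative input; the name `txt` is an assumption for this sketch
txt <- "Journal_dev"

# substr() needs both start and stop
a <- substr(txt, 9, 11)

# substring() defaults `last` to 1000000L, so it reads to the end of the string
b <- substring(txt, 9)

# substring() recycles over first/last, so one call can return several pieces
chars <- substring("dev", 1:3, 1:3)

# substr() returns one result per element of x, so only the first pair is used
single <- substr("dev", 1:3, 1:3)

print(a)       # "dev"
print(b)       # "dev"
print(chars)   # "d" "e" "v"
print(single)  # "d"
```

In short, substr() is the stricter of the two, while substring() is convenient when you want everything from a position onward or several slices of one string.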

Extract characters using substring() function in R

Well, I hope the syntax is clear by now. Let’s extract some characters from a string using the substring() function in R.

        # returns characters 1 through 11
        df <- "Journal_dev_private_limited"
        substring(df, 1, 11)
        # Output: "Journal_dev"

        # returns characters 1 through 7
        df <- "Journal_dev"
        substring(df, 1, 7)
        # Output: "Journal"


Congratulations, you just extracted part of a string. As you can see, the substring() function in R takes the first and last positions as arguments and returns the characters between them, both ends included.
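
It can also help to know what happens when the positions do not fit the string. The sketch below uses the same "Journal_dev" string; the variable names are illustrative assumptions. It shows that out-of-range positions produce clipped or empty results rather than errors.

```r
txt <- "Journal_dev"

# Total number of characters in the string
len <- nchar(txt)

# A `last` value beyond the end is clipped to the string length
tail_part <- substring(txt, 9, 100)

# A start position beyond the end gives an empty string, not an error
past_end <- substring(txt, 20, 25)

# first greater than last also gives an empty string
reversed <- substring(txt, 5, 3)

print(len)        # 11
print(tail_part)  # "dev"
print(past_end)   # ""
print(reversed)   # ""
```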

Replace using substring() function in R

With the help of the substring() function, you can also replace characters in a string with values of your choice. Sounds interesting, right? Let’s see how it works.

        # replaces the _ at position 7 with a space
        df <- "We are_developers"
        substring(df, 7, 7) <- " "
        df
        # Output: "We are developers"

        # replaces the = at position 2 with a space
        df <- "R=is a language made for statistical analysis"
        substring(df, 2, 2) <- " "
        df
        # Output: "R is a language made for statistical analysis"


Great, you did it! In this way, you can replace the characters at a given position with your desired value.

In the above case, you replaced the '_' (underscore) and the '=' (equals sign) with a ' ' (space).
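
One detail worth keeping in mind is that this kind of assignment overwrites characters in place and never changes the length of the string. The sketch below uses made-up strings (x, y, z are illustrative names) to show what happens when the replacement is longer or shorter than the target range. It also shows sub() with fixed = TRUE, which is a different base R function and not part of substring(), as one option when the replacement needs a different length.

```r
# Replacement longer than the 3-character range: the extra characters are dropped
x <- "abcdef"
substring(x, 2, 4) <- "ZZZZZ"

# Replacement shorter than the range: only that many characters are overwritten
y <- "abcdef"
substring(y, 2, 4) <- "Z"

# A different technique: sub() matches text rather than positions,
# so the result may be longer or shorter than the input
z <- sub("_", " - ", "We are_developers", fixed = TRUE)

print(x)  # "aZZZef"
print(y)  # "aZcdef"
print(z)  # "We are - developers"
```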

String replacement using substring() function

So far, so good! But what if you need to apply the same replacement to every string in a vector?

Don’t worry! The replacement form of substring() is vectorized, so a single assignment is applied to all the strings.

Let’s see how it works!

        # replaces the 4th character of each string with $
        df <- c("Alok", "Joseph", "Hayato", "Kelly", "Paloma", "Moca")
        substring(df, 4, 4) <- "$"
        df
        # Output: "Alo$" "Jos$ph" "Hay$to" "Kel$y" "Pal$ma" "Moc$"


Oh, what happened? The 4th character of every string has been replaced by the '$' sign!

Well, that is substring() for you. It replaces the characters at the given positions with the value we supply, and it does so for every element of the input vector.

It’s incredible, right? I say yes. What about you?
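
Two follow-up behaviours are worth a quick look: strings that are too short to have a 4th character, and replacement values that differ per string. The sketch below uses made-up name vectors (short_names and more_names are illustrative assumptions) to show both.

```r
# "Joe" has only 3 characters, so position 4 does not exist and it is left as is
short_names <- c("Alok", "Joe", "Hayato")
substring(short_names, 4, 4) <- "$"

# The replacement value is recycled across the strings
more_names <- c("Alok", "Joseph", "Hayato", "Kelly")
substring(more_names, 4, 4) <- c("$", "#")

print(short_names)  # "Alo$" "Joe" "Hay$to"
print(more_names)   # "Alo$" "Jos#ph" "Hay$to" "Kel#y"
```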

The use of substr() and str_sub() functions on data frame columns in R

So far we have worked with single strings and character vectors. Now we will extract characters from a column of a data frame.

Let’s see how it works!

We can create a data frame of sample data with two columns, Technologies and Popularity. Let’s extract some specific characters out of this data. It will be fun.

        # creates the data frame
        df <- data.frame(
          Technologies = c("Datascience", "machinelearning", "Deeplearning", "Artificalintelligence"),
          Popularity = c("70%", "85%", "90%", "95%")
        )
        df
                   Technologies      Popularity
        1           Datascience        70%
        2       machinelearning        85%
        3          Deeplearning        90%
        4 Artificalintelligence        95%


Yes, we have now created a data frame. Let’s extract some text. Run the code below to extract characters 8 through 10 from every string in the Technologies column using the substr() function in R.

        # creates a new column with the extracted values
        df$Extracted_Technologies <- substr(df$Technologies, 8, 10)
        df
        # Output:

                   Technologies Popularity Extracted_Technologies
        1           Datascience        70%                    enc
        2       machinelearning        85%                    lea
        3          Deeplearning        90%                    rni
        4 Artificalintelligence        95%                    ali


Now you can see that we have created a new column holding the extracted characters. In the same way, you can extract any part of a column by specifying the start and stop positions.
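
The same idea can be applied to the Popularity values. As a minimal sketch (the vector name popularity and the helper variables are illustrative assumptions, and this step is not part of the original walkthrough), substr() combined with nchar() drops the trailing '%' so the values can be converted to numbers.

```r
# Same values as the Popularity column in the example data frame
popularity <- c("70%", "85%", "90%", "95%")

# Keep everything except the last character of each string
digits <- substr(popularity, 1, nchar(popularity) - 1)

# Convert the remaining text to numbers
popularity_num <- as.numeric(digits)

print(digits)          # "70" "85" "90" "95"
print(popularity_num)  # 70 85 90 95
```

Because stop is given as a vector, each string gets its own end position, which keeps the approach correct even if the values had different lengths.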

The use of str_sub() function in R

We saw the substr() function in action. Now we will look at the str_sub() function from the stringr package and the way it extracts characters.

Let’s roll!

Again, we create the same data frame holding the technologies and their popularity.

        # creates the data frame
        df <- data.frame(
          Technologies = c("Datascience", "machinelearning", "Deeplearning", "Artificalintelligence"),
          Popularity = c("70%", "85%", "90%", "95%")
        )
        df
                   Technologies      Popularity
        1           Datascience        70%
        2       machinelearning        85%
        3          Deeplearning        90%
        4 Artificalintelligence        95%


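Well, let’s make use of the str_sub() function, which returns the characters between the given positions. There are several ways to take a substring in R, and this is one of them. Note that str_sub() is not part of base R; it comes from the stringr package, which must be installed and loaded first.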

        # str_sub() comes from the stringr package
        library(stringr)

        # extracts characters 10 through 15 from each string
        df$Extracted_Technologies <- str_sub(df$Technologies, 10, 15)
        df

As you can see, the str_sub() function extracts the characters at the given positions and returns the output shown below.

                   Technologies   Popularity    Extracted_Technologies
        1           Datascience        70%                     ce
        2       machinelearning        85%                 arning
        3          Deeplearning        90%                    ing
        4 Artificalintelligence        95%                 intell
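
One feature that sets str_sub() apart from substr() is that it accepts negative positions, which count backwards from the end of the string. The sketch below assumes the stringr package is installed; the vector name tech and the result names are illustrative. A base R equivalent using nchar() is included for comparison.

```r
library(stringr)

tech <- c("Datascience", "machinelearning", "Deeplearning")

# Last three characters of each string: -1 is the last character
last_three <- str_sub(tech, -3, -1)

# Everything except the last three characters
without_tail <- str_sub(tech, 1, -4)

# Base R equivalent of last_three, computing the start position from the length
last_three_base <- substring(tech, nchar(tech) - 2)

print(last_three)       # "nce" "ing" "ing"
print(without_tail)     # "Datascie" "machinelearn" "Deeplearn"
print(last_three_base)  # "nce" "ing" "ing"
```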

Wrapping Up

Yes, taking a substring of a given string is quite an easy task, thanks to functions like substr(), substring(), and str_sub(), which make sub-stringing interesting and exciting.

That’s all for now. Don’t forget to make use of these handy functions in your own work. Happy sub-stringing!!!
