Simple String Operations

One of the major challenges of string parsing is handling extra whitespace in words. Extra whitespace is often present on the left, the right, or both sides of a word.

String Trimming

The str_trim() function from the stringr package offers an effective way to remove this whitespace.

library(stringr) # Provides str_trim(), str_pad(), str_wrap() and word().
whitespace.vector <- c('  abc', 'def   ', '     ghi       ')
str_trim(whitespace.vector, side='left') # Trimming white spaces on the left side of the string.
## [1] "abc"        "def   "     "ghi       "
str_trim(whitespace.vector, side='right') # Trimming white spaces on the right side of the string.
## [1] "  abc"    "def"      "     ghi"
str_trim(whitespace.vector, side='both') # Trimming white spaces on both sides of the string.
## [1] "abc" "def" "ghi"
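
To illustrate why trimming matters in practice, here is a minimal sketch that cleans some entered values before matching them against a list of known codes. The vectors valid.codes and raw.input are made-up names for this example only. The sketch also relies on side defaulting to 'both' when it is omitted.

```r
library(stringr)

# Hypothetical data for illustration only.
valid.codes <- c('abc', 'def', 'ghi')
raw.input <- c('  abc', 'def   ', ' xyz ')

# Without trimming, the stray whitespace prevents exact matches.
raw.input %in% valid.codes
## [1] FALSE FALSE FALSE

# side defaults to 'both', so it can be omitted.
clean.input <- str_trim(raw.input)
clean.input %in% valid.codes
## [1]  TRUE  TRUE FALSE
```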

String Padding

Conversely, we can pad a string with additional characters to a defined width using the str_pad() function. The default padding character is a space.

str_pad('abc', width=7, side="left") # Padding characters to the left side of the string.
## [1] "    abc"
str_pad('abc', width=7, side="right") # Padding characters to the right side of the string.
## [1] "abc    "
str_pad('abc', width=7, side="both", pad="#") # Padding other characters to both sides of a string.
## [1] "##abc##"
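
A common use of padding is building fixed-width identifiers. The sketch below assumes a made-up vector of record IDs called ids and left-pads them with zeros. It also shows that a string already longer than width is returned unchanged rather than truncated.

```r
library(stringr)

# Hypothetical record IDs, stored as strings.
ids <- c('7', '42', '1234')

# Left-pad with zeros to a common width of 4.
padded.ids <- str_pad(ids, width=4, side='left', pad='0')
padded.ids
## [1] "0007" "0042" "1234"

# Strings longer than width are returned as they are, not truncated.
long.string <- str_pad('abcdefgh', width=5)
long.string
## [1] "abcdefgh"
```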

String Wrapping

Sometimes text has to be reflowed into neat paragraphs of a defined width. The str_wrap() function formats text into paragraphs of a specified width.

some.text <- 'All the Worlds a stage, All men are merely players'
cat(str_wrap(some.text, width=25)) # Usage of the str_wrap function.
## All the Worlds a stage,
## All men are merely
## players
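
The call above is wrapped in cat() because str_wrap() does not return one string per line. It returns the original string with newline characters inserted. The following sketch, which reuses the same sample text, makes this visible. The name wrapped.lines is introduced here for illustration.

```r
library(stringr)

some.text <- 'All the Worlds a stage, All men are merely players'
wrapped <- str_wrap(some.text, width=25)

# The result is still a single string; line breaks are "\n" characters.
length(wrapped)
## [1] 1

# Splitting on the newline gives one element per wrapped line.
wrapped.lines <- strsplit(wrapped, '\n', fixed=TRUE)[[1]]
nchar(wrapped.lines) # Number of characters in each wrapped line.
```

Printing wrapped directly would show the escaped "\n" characters inside a single quoted string, which is why cat() is the more convenient way to display it.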

Extracting Words

Let us complete this chapter with the simple word() function, which extracts words from a sentence. We specify the positions of the words to be extracted from the sentence. The default separator is a single space.

some.text <- c('The quick brown fox', 'jumps on the brown dog')
word(some.text, start=1, end=2) # Extracting the first two words of each string in a character vector.
## [1] "The quick" "jumps on"
word(some.text, start=1, end=-2) # Extracting all but the last word from a character vector.
## [1] "The quick brown"    "jumps on the brown"
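
Two further variations may be useful: a single negative position, and a separator other than a space. The sketch below assumes a made-up vector of date strings called dates and passes the separator through the sep argument wrapped in fixed(), mirroring the form of the default.

```r
library(stringr)

# Negative positions count from the end; -1 is the last word.
last.word <- word('The quick brown fox', -1)
last.word
## [1] "fox"

# A different separator can be supplied through sep.
dates <- c('2024-01-15', '2023-12-31') # Hypothetical sample values.
years <- word(dates, start=1, sep=fixed('-'))
years
## [1] "2024" "2023"
```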