
Performs a series of cleaning steps on a text variable. Useful for preprocessing qualitative text data.

Usage

clean_text(
  df,
  text_var = message,
  tolower = TRUE,
  remove_hashtags = TRUE,
  remove_mentions = TRUE,
  remove_emojis = TRUE,
  remove_punctuation = TRUE,
  remove_digits = TRUE,
  in_parallel = TRUE
)

Arguments

df

A tibble or data frame containing the text variable to be cleaned

text_var

The text variable (column) in df holding the message for each observation that the user wishes to clean

tolower

Should all text be converted to lower case?

remove_hashtags

Should hashtags be removed?

remove_mentions

Should any user/profile mentions be removed?

remove_emojis

Should emojis be removed?

remove_punctuation

Should punctuation be removed?

remove_digits

Should digits be removed?

in_parallel

Should the function run in parallel? (TRUE is generally faster, particularly on larger data sets)

Value

The data object provided, with a cleaned text variable

Examples

if (interactive()) {
  # Perform all cleaning steps in parallel
  cleaned_data <- clean_text(
    df = ParseR::sprinklr_export,
    text_var = Message,
    in_parallel = TRUE
  )

  # Perform all cleaning steps, but keep capital letters and punctuation
  cleaned_data <- clean_text(
    df = ParseR::sprinklr_export,
    text_var = Message,
    tolower = FALSE,
    remove_punctuation = FALSE,
    in_parallel = TRUE
  )
}
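
To give a rough sense of what the cleaning flags do to a single string, the sketch below applies comparable steps with base R regular expressions. It is an illustration only and is not how clean_text() is implemented. The helper name sketch_clean is hypothetical, the patterns are assumptions, emoji removal is left out, and the final whitespace tidy-up is an extra step added here for readability.

```r
# Hypothetical helper for illustration; NOT the ParseR implementation.
# Each line mirrors one clean_text() flag, using assumed regex patterns.
sketch_clean <- function(x) {
  x <- tolower(x)                                  # tolower = TRUE
  x <- gsub("#\\w+", "", x, perl = TRUE)           # remove_hashtags = TRUE
  x <- gsub("@\\w+", "", x, perl = TRUE)           # remove_mentions = TRUE
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)     # remove_punctuation = TRUE
  x <- gsub("[[:digit:]]", "", x, perl = TRUE)     # remove_digits = TRUE
  x <- gsub("\\s+", " ", x, perl = TRUE)           # extra step: collapse whitespace
  trimws(x)                                        # extra step: trim the ends
}

sketch_clean("Loving the NEW phone!! #tech @brand 2024")
```

Hashtags and mentions are removed before punctuation in this sketch, because stripping punctuation first would delete the # and @ markers and leave the tag text behind. Whether clean_text() uses the same ordering is not stated in this documentation.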