Performs a series of cleaning steps on a text variable. Useful for pre-processing qualitative text data before analysis.
Usage
clean_text(
  df,
  text_var = message,
  tolower = TRUE,
  remove_hashtags = TRUE,
  remove_mentions = TRUE,
  remove_emojis = TRUE,
  remove_punctuation = TRUE,
  remove_digits = TRUE,
  in_parallel = TRUE
)
Arguments
- df
A tibble or data frame containing the text variable to be cleaned
- text_var
The text variable to clean, holding the message for each observation
- tolower
Should all text be converted to lower case?
- remove_hashtags
Should hashtags be removed?
- remove_mentions
Should any user/profile mentions be removed?
- remove_emojis
Should emojis be removed?
- remove_punctuation
Should punctuation be removed?
- remove_digits
Should digits be removed?
- in_parallel
Should the function run in parallel? (TRUE is usually faster)
Examples
if (interactive()) {
  # Perform all cleaning steps in parallel
  cleaned_data <- clean_text(
    df = ParseR::sprinklr_export,
    text_var = Message,
    in_parallel = TRUE
  )

  # Perform all cleaning steps but keep capital letters and punctuation
  cleaned_data <- clean_text(
    df = ParseR::sprinklr_export,
    text_var = Message,
    tolower = FALSE,
    remove_punctuation = FALSE,
    in_parallel = TRUE
  )
}
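To give a rough sense of what the cleaning toggles do to a single message, here is a minimal base R sketch. It is not how clean_text() is implemented: the helper name sketch_clean and its regular expressions are assumptions made for illustration, it works on a plain character vector rather than a data frame, and it leaves out emoji removal and parallel execution.

```r
# Hypothetical helper, NOT part of ParseR: approximates the documented
# toggles with base R regular expressions on a character vector.
sketch_clean <- function(x,
                         tolower = TRUE,
                         remove_hashtags = TRUE,
                         remove_mentions = TRUE,
                         remove_punctuation = TRUE,
                         remove_digits = TRUE) {
  # Hashtags and mentions go first: "#" and "@" are punctuation, so
  # stripping punctuation earlier would leave the tag text behind.
  if (remove_hashtags) x <- gsub("#\\w+", "", x)
  if (remove_mentions) x <- gsub("@\\w+", "", x)
  if (remove_digits) x <- gsub("[[:digit:]]+", "", x)
  if (remove_punctuation) x <- gsub("[[:punct:]]+", "", x)
  # The argument shadows the function name, so call base::tolower explicitly.
  if (tolower) x <- base::tolower(x)
  # Collapse the gaps left by removed tokens and trim the ends.
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

sketch_clean("Hello @user, check #rstats 2024!")
# returns "hello check"
```

Setting tolower = FALSE and remove_punctuation = FALSE in this sketch mirrors the second example above: capital letters and punctuation are kept while hashtags, mentions and digits are still removed.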