Group Terms Network • ParseR

Often, we may want to examine how text differs between the levels of a group variable in our data set. For example, we may have data that represents mentions of brands or audiences, or that is classified by something such as sentiment or tone. Conveniently, we can do so using the viz_group_terms_network function of ParseR. Networks are well suited to visualizing relationships between groups in data.

Here, we’ll run through an example use case for the viz_group_terms_network function, showcasing how we can visualize nuances in text between our groups. In this instance, we’ll use the sentiment classification of the message variable within our example data. By doing so, we can see which terms are most likely to be used in conjunction with each ascribed sentiment class: Negative, Neutral and Positive.

First of all, we want to load in our data and ensure that stop words are removed from the message variable. The aim of doing this is to reduce noise and help surface anything insightful in our data.

# Generate a data sample 
set.seed(1)
example <- ParseR::sprinklr_export %>%
  dplyr::slice_sample(n = 1000) %>% 
  janitor::clean_names() %>% 
  dplyr::mutate(message = tm::removeWords(message, tm::stopwords(kind = "en")))
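
To see what the stop word step does, here is a minimal sketch on two made-up messages (toy_messages is not part of ParseR or the example data). It relies on tm::removeWords matching words case-sensitively and on the "en" stop word list being lower case, so lower-casing the text first is shown as an optional extra step. This is an assumption about what may suit your data, not something the workflow above requires.

```r
# Made-up messages for illustration only
toy_messages <- c("The brand is amazing and I love it",
                  "this is not what we were promised")

# Stop words removed exactly as in the chunk above
removed_as_is <- tm::removeWords(toy_messages, tm::stopwords(kind = "en"))

# Optional: lower-case first so capitalised stop words ("The", "I") also match
removed_lower <- tm::removeWords(tolower(toy_messages), tm::stopwords(kind = "en"))

# removeWords leaves the surrounding spaces behind, so collapse and trim them
tidy_messages <- trimws(gsub("\\s+", " ", removed_lower))

removed_as_is
tidy_messages
```

Comparing the two results shows which capitalised stop words survive when the text is not lower-cased first.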

By supplying the group_var and text_var arguments, we can render a basic network. The number of terms displayed (n_terms) and the text size (text_size) can also be changed to suit the user's needs.

# Important to set seed for consistency across plots rendered
set.seed(12)
example %>% 
  ParseR::viz_group_terms_network(group_var = sentiment,
                                  text_var = message,
                                  n_terms = 20,
                                  text_size = 2.5,
                                  with_ties = TRUE,
                                  group_colour_map = NULL,
                                  terms_colour = "black",
                                  selected_terms = NULL,
                                  selected_terms_colour = NULL)

Looking at the initial plot, one of the first things we may want to do is apply an appropriate colour map to our group variable. We can do so by creating a named character vector, in which each name is a group level and each value is a colour, and supplying it to the group_colour_map argument like so.

# Define the colour list
sentiment_colours <- c("NEGATIVE" = "#8b0000",
                       "NEUTRAL" = "grey45",
                       "POSITIVE" = "#008b00")

set.seed(12)
example %>% 
  ParseR::viz_group_terms_network(group_var = sentiment,
                                  text_var = message,
                                  n_terms = 20,
                                  text_size = 2.5,
                                  with_ties = TRUE,
                                  group_colour_map = sentiment_colours,
                                  terms_colour = "black", 
                                  selected_terms = NULL, selected_terms_colour =  NULL)
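
Before plotting, it can be worth confirming that the colour vector covers every group level. The sketch below assumes that colours are matched to groups by name, as the named vector above suggests. sentiment_values is a made-up stand-in for example$sentiment.

```r
# Made-up stand-in for example$sentiment (illustration only)
sentiment_values <- c("NEGATIVE", "NEUTRAL", "POSITIVE", "NEUTRAL")

sentiment_colours <- c("NEGATIVE" = "#8b0000",
                       "NEUTRAL" = "grey45",
                       "POSITIVE" = "#008b00")

# Group levels that have no colour assigned
missing_colours <- setdiff(unique(sentiment_values), names(sentiment_colours))

# Colour names that do not correspond to any group level
unused_colours <- setdiff(names(sentiment_colours), unique(sentiment_values))

# col2rgb() errors on an invalid colour, so this doubles as a validity check
colour_matrix <- grDevices::col2rgb(sentiment_colours)

missing_colours
unused_colours
```

If both results are empty, every group has a colour and no colour is left unused. A mismatch in letter case (for example "Negative" versus "NEGATIVE") would show up in both.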

We may also want to pick out terms that are useful for storytelling and render them in a different colour to draw attention to key insights. As with the group colour map, we create a character vector of terms and supply it to the selected_terms argument, then supply the appropriate colours to terms_colour (for all other terms) and selected_terms_colour (for the selected terms). Note: if the aim is to reduce the visibility of all other terms using terms_colour, the user will likely want to include each of the group names in the selected terms as well.

# Define any terms we wish to select 
selected_terms <- c("NEGATIVE", "NEUTRAL", "POSITIVE",
                    "racism", "disgusting", "colorism",
                    "proud", "love", "amazing",
                    "latino", "latinx", "celebrate")

set.seed(12)
example %>% 
  ParseR::viz_group_terms_network(group_var = sentiment, 
                                  text_var = message,
                                  n_terms = 20, 
                                  text_size = 2.5, 
                                  with_ties = TRUE, 
                                  group_colour_map = sentiment_colours, 
                                  terms_colour = "grey70", 
                                  selected_terms = selected_terms,
                                  selected_terms_colour =  "black")
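
A selected term can only be highlighted if it actually occurs in the text. As a rough pre-check, the sketch below looks for each term in some made-up messages. The data and the helper object term_found are hypothetical, and the check uses simple substring matching, so it is more permissive than whatever tokenisation the function applies. A term that is found may still be absent from the plot if it is not among the top n_terms for any group.

```r
# Made-up messages and terms for illustration only
toy_messages <- c("so proud to celebrate latinx heritage",
                  "the racism in the comments is disgusting")
toy_selected_terms <- c("proud", "celebrate", "racism", "colorism")

# TRUE where the term occurs in at least one lower-cased message
# (substring match, so "proud" would also match "proudly")
term_found <- vapply(
  toy_selected_terms,
  function(term) any(grepl(term, tolower(toy_messages), fixed = TRUE)),
  logical(1)
)

# Terms that never occur and so cannot be highlighted
absent_terms <- names(term_found)[!term_found]
absent_terms
```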

The viz_group_terms_network function also gives the user control over the number of terms plotted, through the n_terms argument, and over the size of the text, through text_size. These two arguments are interdependent in terms of plot quality: if the value chosen for n_terms is high, text_size will likely need to be reduced accordingly, and vice versa.

set.seed(12)
example %>% 
  ParseR::viz_group_terms_network(group_var = sentiment,
                                  text_var = message,
                                  n_terms = 30,
                                  text_size = 2,
                                  with_ties = TRUE,
                                  group_colour_map = sentiment_colours,
                                  terms_colour = "black",
                                  selected_terms = NULL,
                                  selected_terms_colour = NULL)

Some arguments have not yet been covered in full. They let the user adjust the output of the visualization, and how they are used may depend on the number of groups present in the data and the amount of information the user wants to portray:

n_terms: The number of terms that the user wishes to display per group (if n_terms = 20 and there are three groups, you will visualize 60 terms in total).
text_size: The text size of the terms. It may need to be lowered when the selected value for n_terms is high or there are many groups in the data, as the plot can otherwise become too busy.
with_ties: A logical argument (TRUE or FALSE) defining whether more than n_terms terms may be shown for a group when terms are tied on frequency within that group.
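
To illustrate how n_terms and with_ties interact, the sketch below uses dplyr::slice_max on made-up term counts. It only demonstrates the general idea of keeping ties at the cut-off. It is not taken from ParseR's source, and the function may select terms differently internally.

```r
library(dplyr)

# Made-up term frequencies for two groups (illustration only)
term_counts <- data.frame(
  sentiment = c(rep("POSITIVE", 4), rep("NEGATIVE", 4)),
  term      = c("love", "proud", "amazing", "celebrate",
                "racism", "disgusting", "awful", "angry"),
  freq      = c(9, 7, 7, 2, 8, 5, 5, 5)
)

# Top 2 terms per group, dropping ties at the cut-off
top_no_ties <- term_counts %>%
  group_by(sentiment) %>%
  slice_max(freq, n = 2, with_ties = FALSE) %>%
  ungroup()

# Top 2 terms per group, keeping every term tied at the cut-off
top_with_ties <- term_counts %>%
  group_by(sentiment) %>%
  slice_max(freq, n = 2, with_ties = TRUE) %>%
  ungroup()

nrow(top_no_ties)    # 2 terms per group, 4 rows in total
nrow(top_with_ties)  # 3 POSITIVE + 4 NEGATIVE, 7 rows in total
```

With ties kept, the number of terms per group can exceed the requested value, which is worth bearing in mind when choosing text_size.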