A network visualisation in which the big nodes are the levels of group_var (e.g. topic -> topic_1, topic_2, etc.), connected by edges to the small nodes, which are the most frequently seen terms for each level of the grouping variable.

Usage

viz_group_terms_network(
  data,
  group_var,
  text_var,
  n_terms = 20,
  text_size = 4,
  with_ties = FALSE,
  group_colour_map = NULL,
  terms_colour = "black",
  selected_terms = NULL,
  selected_terms_colour = "pink"
)

Arguments

data

A data frame or tibble containing both the text_var and group_var columns

group_var

A grouping variable or factor, for example brand, audience or sentiment

text_var

The text variable assigned to each observation containing the message or post

n_terms

The number of most frequently mentioned terms per group to include in the visualization

text_size

An integer stating the desired text size

with_ties

Whether to include more than n_terms terms for a level of `group_var` when several terms are tied on frequency at the cut-off

group_colour_map

An optional custom colour mapping for the levels of the group variable, e.g. a named character vector of colours as in the examples below

terms_colour

What colour should the terms be?

selected_terms

Any terms that the user wishes to colour differently (should be supplied as a list)

selected_terms_colour

What colour should any selected terms be? This includes all those defined in the list supplied to the `selected_terms` argument
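
The interaction between n_terms and with_ties can be illustrated with a small base R sketch. This is only an illustration of the behaviour described above, not the package's implementation: the counts are made up, the helper `top_n_terms()` is hypothetical, and the tie-handling rule (keep every term whose count matches the term at the cut-off) is an assumption.

```r
# Hypothetical term counts for a single group; "c", "d" and "e" are tied
counts <- c(a = 10, b = 8, c = 5, d = 5, e = 5, f = 2)

# Hypothetical helper mimicking the described n_terms / with_ties behaviour
top_n_terms <- function(counts, n_terms, with_ties = FALSE) {
  # order() leaves tied values in their original order
  sorted <- counts[order(counts, decreasing = TRUE)]
  if (with_ties) {
    # Keep every term whose count is at least that of the n-th ranked term
    threshold <- sorted[[min(n_terms, length(sorted))]]
    names(sorted)[sorted >= threshold]
  } else {
    # Keep exactly n_terms terms (fewer if there are not enough terms)
    names(head(sorted, n_terms))
  }
}

top_n_terms(counts, n_terms = 3)                    # "a" "b" "c"
top_n_terms(counts, n_terms = 3, with_ties = TRUE)  # "a" "b" "c" "d" "e"
```

Under this assumption, with_ties = TRUE can return more than n_terms terms per group, which is why the number of small nodes may exceed the requested cut-off.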

Value

Returns a ggplot network visualization showing the relationship between terms in a text variable and any group variable in the data, by way of counting the most frequently used terms in conjunction with each class of the group variable.

Details

The main idea of this function is to help identify which groups have similar terms associated with them. Big nodes are placed close to the other big nodes they share terms with; big nodes that share no terms with each other are placed far apart.

It's important to communicate how many of the top terms were selected. For example, if the term "happy" is #18 for group 1 and #21 for group 2, and the cut-off point is 20, we may falsely assume that "happy" is not a term shared by both groups. Looking further down the list (setting n_terms to 30-40) is recommended to strengthen any inferences made.
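
The cut-off effect described above can be reproduced with a small base R sketch. The term counts are made up, and the helpers `make_counts()`, `top_terms()` and `happy_is_shared()` are hypothetical names used only for this illustration; none of them are part of the package.

```r
# Hypothetical term counts: "happy" is placed at a chosen rank within a group
make_counts <- function(happy_rank, n_total = 40) {
  terms <- paste0("term_", seq_len(n_total))
  terms[happy_rank] <- "happy"
  # Strictly decreasing counts, so a term's rank equals its position
  data.frame(term = terms, n = rev(seq_len(n_total)), stringsAsFactors = FALSE)
}

group_1 <- make_counts(happy_rank = 18)  # "happy" is #18 for group 1
group_2 <- make_counts(happy_rank = 21)  # "happy" is #21 for group 2

# Hypothetical helper: the n_terms most frequent terms of one group
top_terms <- function(counts, n_terms) {
  ordered <- counts[order(counts$n, decreasing = TRUE), ]
  head(ordered$term, n_terms)
}

# Does "happy" appear among the top terms of both groups at this cut-off?
happy_is_shared <- function(n_terms) {
  "happy" %in% top_terms(group_1, n_terms) &&
    "happy" %in% top_terms(group_2, n_terms)
}

happy_is_shared(20)  # FALSE: "happy" falls just below the cut-off for group 2
happy_is_shared(30)  # TRUE: a wider cut-off reveals the shared term
```

In the network, the first case would show "happy" connected to group 1 only, while the second would show it linked to both groups.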

Examples

set.seed(1)
viz_group_terms_network(
  data = ParseR::sprinklr_export,
  group_var = Sentiment,
  text_var = Message,
  n_terms = 20,
  text_size = 4,
  with_ties = FALSE,
  group_colour_map = NULL,
  terms_colour = "black",
  selected_terms = NULL,
  selected_terms_colour = "black"
)
#> Warning: Removed 27 rows containing missing values or values outside the scale range
#> (`geom_point()`).


# To add group colour
sentiment_colours <- c(
  "NEGATIVE" = "#8b0000",
  "NEUTRAL" = "grey45",
  "POSITIVE" = "#008b00"
)

set.seed(1)
viz_group_terms_network(
  data = ParseR::sprinklr_export,
  group_var = Sentiment,
  text_var = Message,
  n_terms = 20,
  text_size = 4,
  with_ties = FALSE,
  group_colour_map = sentiment_colours,
  terms_colour = "black",
  selected_terms = NULL,
  selected_terms_colour = "black"
)
#> Warning: Removed 27 rows containing missing values or values outside the scale range
#> (`geom_point()`).


# To supply selected terms and colour them differently
selected_terms <- c("hispanic", "heritage", "month")

set.seed(1)
viz_group_terms_network(
  data = ParseR::sprinklr_export,
  group_var = Sentiment,
  text_var = Message,
  n_terms = 20,
  text_size = 4,
  with_ties = FALSE,
  group_colour_map = sentiment_colours,
  terms_colour = "grey70",
  selected_terms = selected_terms,
  selected_terms_colour = "black"
)
#> Warning: Removed 27 rows containing missing values or values outside the scale range
#> (`geom_point()`).


# To store the plot in an object instead of printing it
plot <- viz_group_terms_network(
  data = ParseR::sprinklr_export,
  group_var = Sentiment,
  text_var = Message,
  n_terms = 20,
  text_size = 4,
  with_ties = FALSE,
  group_colour_map = sentiment_colours,
  terms_colour = "grey70",
  selected_terms = selected_terms,
  selected_terms_colour = "black"
)