We are often faced with the question of how conversations differ between groups. One way of answering this question is to look at the weighted log-odds ratio for the terms used in each group. This number indicates, on a log-odds scale, how much more or less likely a word is to be used by that group than in the dataset as a whole: positive values mark terms that are over-represented in the group, and negative values mark terms that are under-represented. ParseR uses functions from the tidylo package to calculate these.
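To make the idea concrete, here is a minimal sketch of the underlying calculation using tidylo directly. It assumes the dplyr, tibble, tidytext and tidylo packages are installed; the `posts` data and the `group` and `text` column names are made up for illustration, and this is not necessarily how `calculate_wlos` is implemented internally.

```r
library(dplyr)

# Toy data, invented for this sketch (not the ParseR example data)
posts <- tibble::tibble(
  group = c("A", "A", "A", "B", "B", "B"),
  text  = c("fiesta music fiesta", "heritage fiesta", "heritage month",
            "history lesson", "heritage history", "heritage month")
)

# One row per (group, word) pair with its count
word_counts <- posts %>%
  tidytext::unnest_tokens(output = word, input = text) %>%
  count(group, word, name = "n")

# bind_log_odds() appends a log_odds_weighted column to the counts
toy_wlos <- word_counts %>%
  tidylo::bind_log_odds(set = group, feature = word, n = n) %>%
  arrange(desc(log_odds_weighted))

toy_wlos
```

Terms used by only one group (here "fiesta" in group A) should come out with positive weighted log-odds for that group, while terms shared evenly between groups sit closer to zero.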

To demonstrate how to use the calculate_wlos function, we will use the example data included within the ParseR package.

# Example data
example <- ParseR::sprinklr_export

## Calculate Weighted Log-Odds Ratios

Let’s say we want to compare how males and females talk about Hispanic Heritage Month.

# Remove rows with no gender information
example <- example %>%
  dplyr::filter(SenderGender != 'NA')

# Calculate WLOs
wlos <- ParseR::calculate_wlos(example,
                               topic_var = SenderGender,
                               text_var = Message,
                               top_n = 30)

wlos is a list object that contains two items:

1. view: a human-readable tibble that contains the weighted log-odds for each of the top_n = 30 terms

wlos$view
## # A tibble: 3,961 × 4
##    SenderGender word                      n log_odds_weighted
##    <chr>        <chr>                 <int>             <dbl>
##  1 F            hispanic                254            -1.01
##  2 F            heritage                201            -1.15
##  3 F            twitter                 179            -0.797
##  4 M            hispanic                171             1.16
##  5 M            heritage                147             1.32
##  6 M            twitter                 119             0.916
##  7 F            hispanicheritagemonth   107            -0.157
##  8 F            month                    94            -0.583
##  9 M            month                    63             0.669
## 10 M            hispanicheritagemonth    57             0.182
## # … with 3,951 more rows

2. viz: a plot that visualises terms in the view tibble with term frequency on the x-axis, and weighted log-odds on the y-axis.

wlos$viz

What we can learn from the plot is that men appear more likely to associate Hispanic Heritage Month with the celebratory aspects, whereas women discuss the learnings they can take from it.
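A reading of the plot can be checked against the numbers by ranking terms within each group. The sketch below assumes dplyr and tibble are installed and re-types the ten rows of wlos$view printed above into a hypothetical `view_head` tibble so that it runs on its own; in practice you would pass `wlos$view` instead.

```r
library(dplyr)

# The ten rows of wlos$view shown above, re-typed for a standalone example
view_head <- tibble::tribble(
  ~SenderGender, ~word,                   ~n,   ~log_odds_weighted,
  "F",           "hispanic",              254L, -1.01,
  "F",           "heritage",              201L, -1.15,
  "F",           "twitter",               179L, -0.797,
  "M",           "hispanic",              171L,  1.16,
  "M",           "heritage",              147L,  1.32,
  "M",           "twitter",               119L,  0.916,
  "F",           "hispanicheritagemonth", 107L, -0.157,
  "F",           "month",                  94L, -0.583,
  "M",           "month",                  63L,  0.669,
  "M",           "hispanicheritagemonth",  57L,  0.182
)

# Keep the three terms with the highest weighted log-odds in each group
top_terms <- view_head %>%
  group_by(SenderGender) %>%
  slice_max(log_odds_weighted, n = 3) %>%
  ungroup()

top_terms
```

Because these ten rows are only the most frequent terms, the ranking here is illustrative; the most distinctive terms for each group are likely to appear further down the full tibble.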