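We are often faced with the question of how conversations differ between groups. One way of answering this question is to look at the weighted log-odds ratio for the terms used in each group. Broadly, this number indicates how much more or less likely a word is to be used by that group than in the dataset as a whole, expressed on a log scale and weighted so that rare words backed by little evidence are not overstated. Positive values mark terms that are over-represented in a group, and negative values mark terms that are under-represented. ParseR uses functions from the tidylo package to calculate these.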
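To make the idea concrete, here is a minimal sketch of the kind of pipeline one could run directly with tidylo. It is not ParseR's internal code, which may differ. The two-group toy corpus and the names `toy`, `group`, and `toy_wlos` are invented for illustration.

```r
# Minimal sketch: weighted log-odds on a tiny, invented two-group corpus.
# This illustrates tidylo::bind_log_odds(); it is not ParseR's implementation.
library(dplyr)
library(tidytext)
library(tidylo)

toy <- dplyr::tibble(
  group = c("A", "A", "B", "B"),
  text  = c("we celebrate the parade", "celebrate with music",
            "we learn the history", "learn about history")
)

toy_wlos <- toy %>%
  tidytext::unnest_tokens(word, text) %>%  # one row per (lower-cased) word
  dplyr::count(group, word) %>%            # word counts within each group
  tidylo::bind_log_odds(set = group, feature = word, n = n)  # adds log_odds_weighted

# Terms most characteristic of a group appear at the top
toy_wlos %>%
  dplyr::arrange(dplyr::desc(log_odds_weighted))
```

In this sketch, words that appear in only one group (such as "celebrate" or "learn") should score higher for that group than words shared by both (such as "we" or "the").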

To demonstrate how to use the calculate_wlos function, we will use the example data included within the ParseR package.

# Example data
example <- ParseR::sprinklr_export

Calculate Weighted Log-Odds Ratios

Let’s say we want to compare how males and females talk about Hispanic Heritage Month.

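# Remove rows with no gender information
# (the %>% pipe assumes dplyr or magrittr has been attached, e.g. with library(dplyr);
#  filter() drops both the literal string 'NA' and true missing values here)
example <- example %>%
  dplyr::filter(SenderGender != 'NA')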

# Calculate WLOs
wlos <- ParseR::calculate_wlos(example, 
                               topic_var = SenderGender,
                               text_var = Message,
                               top_n = 30)

wlos is a list object that contains two items:

  1. view: a human-readable tibble that contains the weighted log-odds for each of the top_n = 30 terms

    wlos$view
    ## # A tibble: 3,961 × 4
    ##    SenderGender word                      n log_odds_weighted
    ##    <chr>        <chr>                 <int>             <dbl>
    ##  1 F            hispanic                254            -1.01 
    ##  2 F            heritage                201            -1.15 
    ##  3 F            twitter                 179            -0.797
    ##  4 M            hispanic                171             1.16 
    ##  5 M            heritage                147             1.32 
    ##  6 M            twitter                 119             0.916
    ##  7 F            hispanicheritagemonth   107            -0.157
    ##  8 F            month                    94            -0.583
    ##  9 M            month                    63             0.669
    ## 10 M            hispanicheritagemonth    57             0.182
    ## # … with 3,951 more rows

  2. viz: a plot that visualises the terms in the view tibble, with term frequency on the x-axis and weighted log-odds on the y-axis.

    wlos$viz

The plot suggests that men are more likely to associate Hispanic Heritage Month with its celebratory aspects, whereas women more often discuss what they can learn from it.
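The view tibble can also be ranked directly to see which terms are most characteristic of each group. The sketch below re-enters by hand the ten rows printed above so that it runs on its own. The name `view_head` is hypothetical, and with the real object one would start from `wlos$view` instead.

```r
# Minimal sketch: rank terms by weighted log-odds within each gender.
# view_head is a hand-copied stand-in for the first ten rows of wlos$view.
library(dplyr)

view_head <- tibble::tribble(
  ~SenderGender, ~word,                   ~n,  ~log_odds_weighted,
  "F",           "hispanic",              254, -1.01,
  "F",           "heritage",              201, -1.15,
  "F",           "twitter",               179, -0.797,
  "M",           "hispanic",              171,  1.16,
  "M",           "heritage",              147,  1.32,
  "M",           "twitter",               119,  0.916,
  "F",           "hispanicheritagemonth", 107, -0.157,
  "F",           "month",                  94, -0.583,
  "M",           "month",                  63,  0.669,
  "M",           "hispanicheritagemonth",  57,  0.182
)

# Keep the two highest-scoring terms per group
top_terms <- view_head %>%
  dplyr::group_by(SenderGender) %>%
  dplyr::slice_max(log_odds_weighted, n = 2) %>%
  dplyr::ungroup()

top_terms
```

Because these ten rows are only the most frequent terms, the ranking here is illustrative. Running the same steps on the full tibble would surface the terms that drive the differences seen in the plot.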