
Calculates weighted log-odds (WLO) for the words in each topic of a dataset and visualises them alongside word frequency. Returns a list containing the visualisation and the underlying data.

Usage

calculate_wlos(
  df,
  topic_var,
  text_var = Message,
  top_n = 30,
  filter_by = c("association", "frequency"),
  top_terms_cutoff = 500,
  nrow = 4
)

Arguments

df

A dataframe.

topic_var

The variable that contains the topic label for each post.

text_var

The variable containing the text to be compared.

top_n

The maximum number of words to include for each topic in the visualisation.

filter_by

Whether the top_n terms per category are ordered by their association (high WLO value) or by frequency.

top_terms_cutoff

The maximum number of terms to select the WLOs from. Only relevant for filter_by = "association".

nrow

The number of rows the plots should be displayed with.

Value

A list with two elements: view, a table with weighted log-odds calculated for each word in each group, and viz, a visualisation.

Details

Plots terms that are more/less likely to appear in conversation across different topics.

When using filter_by = "association", pay attention to the top_terms_cutoff argument. The higher the value, the more terms are included, which can let in very low-frequency terms. A good starting point is roughly 500 to 1000. To see only the top 50 terms of the whole dataset and how they are distributed across groups, use top_terms_cutoff = 50.
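
The effect of the cutoff can be pictured with a small base R sketch. It does not reproduce the function's internals: the toy table, the score column standing in for a WLO value, and the helper name top_by_association are all assumptions made for illustration.

```r
# Toy term table: one row per (topic, word) pair.
# `score` is a made-up stand-in for a weighted log-odds value.
terms <- data.frame(
  topic = c("a", "a", "a", "b", "b", "b"),
  word  = c("the", "beto", "rare", "the", "caucus", "odd"),
  n     = c(100, 40, 1, 90, 30, 1),
  score = c(0.1, 2.5, 4.0, 0.2, 3.0, 5.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): keep the `cutoff` most
# frequent words overall, then return the `top_n` highest-scoring of
# those words within each topic.
top_by_association <- function(terms, cutoff, top_n) {
  totals <- tapply(terms$n, terms$word, sum)
  keep <- head(names(totals)[order(totals, decreasing = TRUE)], cutoff)
  kept <- terms[terms$word %in% keep, ]
  kept <- kept[order(kept$topic, -kept$score), ]
  # Position of each row within its topic after sorting by score
  rank_in_topic <- ave(seq_along(kept$topic), kept$topic, FUN = seq_along)
  kept[rank_in_topic <= top_n, ]
}

strict <- top_by_association(terms, cutoff = 3, top_n = 2)
loose  <- top_by_association(terms, cutoff = 6, top_n = 2)
```

In this toy data, the strict cutoff leaves only the three most frequent words in play, whereas the loose cutoff lets the words that occur once ("rare" and "odd") rank first in their topics. This is the low-frequency effect described above.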

Examples

{
sprinklr_export <- sprinklr_export[1:1000,]
sprinklr_export <- clean_text(sprinklr_export, Message)
calculate_wlos(sprinklr_export, SocialNetwork, Message)
}
#> Beginning parallel sessions
#> Ending parallel sessions
#> $viz

#> 
#> $view
#> # A tibble: 90 × 4
#>    SocialNetwork word           n log_odds_weighted
#>    <chr>         <chr>      <int>             <dbl>
#>  1 TWITTER       hispanic     712              2.44
#>  2 WEB           the          220              3.10
#>  3 INSTAGRAM     the          189              2.31
#>  4 TWITTER       beto         180              6.30
#>  5 WEB           and          167              2.92
#>  6 TWITTER       orourke      166              6.00
#>  7 INSTAGRAM     to           161              2.84
#>  8 INSTAGRAM     and          161              2.81
#>  9 TWITTER       caucus       160             10.7 
#> 10 TWITTER       membership   151             10.4 
#> # ℹ 80 more rows
#>
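
The returned table can be explored with ordinary data frame operations. The sketch below rebuilds the ten rows printed above as a plain data frame so that it runs without the package; in practice the table would come from the view element of the returned list. The object names view and twitter are illustrative choices.

```r
# First ten rows of the `view` table, copied from the printed output above.
view <- data.frame(
  SocialNetwork = c("TWITTER", "WEB", "INSTAGRAM", "TWITTER", "WEB",
                    "TWITTER", "INSTAGRAM", "INSTAGRAM", "TWITTER", "TWITTER"),
  word = c("hispanic", "the", "the", "beto", "and",
           "orourke", "to", "and", "caucus", "membership"),
  n = c(712L, 220L, 189L, 180L, 167L, 166L, 161L, 161L, 160L, 151L),
  log_odds_weighted = c(2.44, 3.10, 2.31, 6.30, 2.92,
                        6.00, 2.84, 2.81, 10.7, 10.4),
  stringsAsFactors = FALSE
)

# Terms for a single topic, strongest association first.
twitter <- view[view$SocialNetwork == "TWITTER", ]
twitter <- twitter[order(-twitter$log_odds_weighted), ]
twitter$word
```

Sorting by log_odds_weighted rather than n moves topic-specific terms such as "caucus" and "membership" ahead of the more frequent "hispanic".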