Identify and visualise the textual differences between groups
Source: R/calculate_wlos.R
calculate_wlos.Rd
Calculates weighted log-odds (WLO) for the words in each group of a dataset and visualises word frequency alongside them. Returns a list containing the visualisation and the underlying data.
Usage
calculate_wlos(
df,
topic_var,
text_var = Message,
top_n = 30,
filter_by = c("association", "frequency"),
top_terms_cutoff = 500,
nrow = 4
)
Arguments
- df
A dataframe.
- topic_var
The variable that contains the topic label for each post.
- text_var
The variable containing the text to be compared.
- top_n
The maximum number of words to include for each topic in the visualisation.
- filter_by
Whether the top_n terms per category are selected by their association (high weighted log-odds, WLO) or by their frequency.
- top_terms_cutoff
The maximum number of terms from which the WLOs are selected. Only relevant when filter_by = "association".
- nrow
The number of rows in which to arrange the plots.
Value
A list containing a visualisation ($viz) and a table ($view) of the weighted log-odds calculated for each word in each group.
Details
Plots terms that are more/less likely to appear in conversation across different topics.
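To make "more/less likely" concrete, here is a minimal base R sketch of one common formulation of weighted log-odds (the z-scored log-odds with a Dirichlet prior described by Monroe, Colaresi and Quinn, 2008). This page does not state which formulation calculate_wlos uses, so treat the helper weighted_log_odds(), the flat prior alpha = 1, and the made-up counts as assumptions for illustration only.

```r
# Made-up word counts per group (rows: words, columns: groups).
counts <- matrix(
  c(30,  5,
     2, 20,
    10, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(word = c("caucus", "photo", "the"), group = c("A", "B"))
)

# Hypothetical helper, not part of the package: z-scored log-odds of each
# word in each group versus all other groups combined, using a flat
# Dirichlet prior of `alpha` pseudo-counts per word.
weighted_log_odds <- function(counts, alpha = 1) {
  alpha0 <- alpha * nrow(counts)                 # total prior mass
  y_in   <- counts + alpha                       # smoothed count inside the group
  y_out  <- rowSums(counts) - counts + alpha     # smoothed count in the other groups
  totals_in  <- matrix(colSums(counts), nrow(counts), ncol(counts), byrow = TRUE)
  totals_out <- sum(counts) - totals_in
  # Difference of smoothed log-odds: inside the group minus outside it.
  delta <- log(y_in / (totals_in + alpha0 - y_in)) -
    log(y_out / (totals_out + alpha0 - y_out))
  # Divide by the approximate standard deviation to down-weight rare words.
  delta / sqrt(1 / y_in + 1 / y_out)
}

wlo <- weighted_log_odds(counts)
print(round(wlo, 2))
```

In this sketch a positive value means the word is over-represented in that group relative to the others, and a negative value means it is under-represented. A word used at a similar rate everywhere, such as "the" here, stays close to zero.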
When using filter_by = "association", pay attention to the top_terms_cutoff argument. The higher the value, the more terms are considered, which can bring in very low-frequency terms. A good starting point is roughly 500 to 1000. To see only the top 50 terms of the whole dataset and how they are distributed across groups, set top_terms_cutoff = 50.
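The interaction between filter_by, top_n and top_terms_cutoff can be sketched in base R on a toy table. This is only a reading of the description above, not the package's implementation: select_terms() is a hypothetical helper, the numbers are made up, and it assumes the cutoff keeps the most frequent terms across the whole dataset before ranking by weighted log-odds.

```r
# Toy per-group term table (made-up numbers, not package output).
terms <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  word  = c("the", "caucus", "rare", "the", "photo", "odd"),
  n     = c(50, 20, 1, 40, 15, 1),
  log_odds_weighted = c(0.1, 4.0, 5.0, -0.1, 3.5, 4.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the top_n rows per group under either mode.
select_terms <- function(terms, top_n,
                         filter_by = c("association", "frequency"),
                         top_terms_cutoff = 500) {
  filter_by <- match.arg(filter_by)
  if (filter_by == "association") {
    # Assumed behaviour: restrict to the most frequent terms overall,
    # then rank the survivors by weighted log-odds.
    totals <- sort(tapply(terms$n, terms$word, sum), decreasing = TRUE)
    keep <- names(totals)[seq_len(min(top_terms_cutoff, length(totals)))]
    terms <- terms[terms$word %in% keep, ]
    rank_by <- terms$log_odds_weighted
  } else {
    # Frequency mode: rank by raw count; the cutoff is not used.
    rank_by <- terms$n
  }
  terms <- terms[order(terms$group, -rank_by), ]
  # Keep the first top_n rows within each group.
  position <- ave(seq_along(terms$group), terms$group, FUN = seq_along)
  terms[position <= top_n, ]
}

strict   <- select_terms(terms, top_n = 1, filter_by = "association", top_terms_cutoff = 3)
loose    <- select_terms(terms, top_n = 1, filter_by = "association")
frequent <- select_terms(terms, top_n = 1, filter_by = "frequency")
print(strict$word)
print(loose$word)
print(frequent$word)
```

With a cutoff of 3, the one-off words "rare" and "odd" are dropped before ranking, so each group's top term is a word that is actually used often. With the default cutoff of 500 they pass through and win on log-odds alone, which is the low-frequency effect described above.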
Examples
{
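# Keep the first 1,000 rows so the example runs quickly
sprinklr_export <- sprinklr_export[1:1000, ]
# Clean the text column before comparing groups
sprinklr_export <- clean_text(sprinklr_export, Message)
# Compare word usage across social networks
calculate_wlos(sprinklr_export, SocialNetwork, Message)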
}
#> Beginning parallel sessions
#> Ending parallel sessions
#> $viz
#>
#> $view
#> # A tibble: 90 × 4
#>    SocialNetwork word           n log_odds_weighted
#>    <chr>         <chr>      <int>             <dbl>
#>  1 TWITTER       hispanic     712              2.44
#>  2 WEB           the          220              3.10
#>  3 INSTAGRAM     the          189              2.31
#>  4 TWITTER       beto         180              6.30
#>  5 WEB           and          167              2.92
#>  6 TWITTER       orourke      166              6.00
#>  7 INSTAGRAM     to           161              2.84
#>  8 INSTAGRAM     and          161              2.81
#>  9 TWITTER       caucus       160             10.7
#> 10 TWITTER       membership   151             10.4
#> # ℹ 80 more rows
#>