Generate pairwise correlations between each term in a vector of terms of interest and the other words in a text variable.
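As a rough illustration of what a pairwise term correlation measures, the sketch below computes the phi coefficient between the presence of two words across posts. It assumes, without confirmation from this page, that calculate_corr() correlates per-post term presence in this way (the approach taken by, for example, widyr::pairwise_cor()). The helper phi_corr() and the toy posts are hypothetical.

```r
# Hypothetical helper: phi coefficient between the presence of two words
# across posts. Assumes correlation is computed on per-post presence
# (1 = word appears in the post, 0 = it does not). This is an illustration,
# not calculate_corr()'s internal code.
phi_corr <- function(posts, word_a, word_b) {
  # Split each post into lower-case words on runs of whitespace
  tokens <- strsplit(tolower(posts), "\\s+")
  has_a <- vapply(tokens, function(t) word_a %in% t, logical(1))
  has_b <- vapply(tokens, function(t) word_b %in% t, logical(1))
  # The Pearson correlation of two 0/1 vectors equals the phi coefficient
  cor(as.numeric(has_a), as.numeric(has_b))
}

# Toy posts, one per element (invented for this example)
posts <- c(
  "hookah bar downtown",
  "best bar in town",
  "hookah night at the bar",
  "coffee shop downtown",
  "quiet coffee morning"
)

phi_corr(posts, "bar", "hookah")  # 2/3: "hookah" appears only in "bar" posts
phi_corr(posts, "bar", "coffee")  # -1: the two words never share a post
```

Working on presence rather than raw counts means a word repeated within one post still counts once, which is why the result can be read directly as a 2 × 2 co-occurrence correlation.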

calculate_corr(
 df,
 text_var,
 terms,
 min_freq = 10,
 corr_limits = c(-1, 1),
 n_corr = 75,
 hashtags = FALSE,
 mentions = FALSE
)

Arguments

df

A data frame in which each row is a separate post.

text_var

The column of df containing the text you want to explore, supplied unquoted (e.g. Message).

terms

A character vector of terms of interest. Multi-word phrases are allowed (e.g. "I'm looking for").

min_freq

The minimum number of times a term must be observed to be considered.

corr_limits

A numeric vector of length two giving the lower and upper bounds of the correlations to keep, e.g. c(-1, 1).

n_corr

The number of correlations to return, starting with the most positive correlation within the range set by corr_limits.

hashtags

Logical. Should hashtags be included as terms?

mentions

Logical. Should mentions be included as terms?
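To make the interaction of min_freq, corr_limits and n_corr concrete, here is a minimal base-R sketch of one plausible filtering order: drop partner terms seen fewer than min_freq times, keep correlations inside corr_limits, then take the n_corr most positive. The ordering, the helper filter_corr() and the toy tables are assumptions for illustration; the function's actual implementation may differ (for example in how ties are handled).

```r
# Hypothetical helper illustrating how min_freq, corr_limits and n_corr
# could narrow a table of candidate correlations. Not the package's code.
filter_corr <- function(corrs, term_freq, min_freq = 10,
                        corr_limits = c(-1, 1), n_corr = 75) {
  # 1. Keep only pairs whose partner term was observed at least min_freq times
  frequent <- names(term_freq)[term_freq >= min_freq]
  corrs <- corrs[corrs$to %in% frequent, ]
  # 2. Keep correlations inside the closed interval given by corr_limits
  in_range <- corrs$correlation >= corr_limits[1] &
    corrs$correlation <= corr_limits[2]
  corrs <- corrs[in_range, ]
  # 3. Sort from most positive to least positive and keep the first n_corr rows
  corrs <- corrs[order(corrs$correlation, decreasing = TRUE), ]
  head(corrs, n_corr)
}

# Toy candidate correlations and term frequencies (invented values)
corrs <- data.frame(
  from = "bar",
  to = c("hookah", "doors", "coffee", "rare"),
  correlation = c(0.206, 0.201, -0.150, 0.300),
  stringsAsFactors = FALSE
)
term_freq <- c(hookah = 13, doors = 25, coffee = 40, rare = 3)

# "rare" fails min_freq and "coffee" falls outside corr_limits
filter_corr(corrs, term_freq, min_freq = 10,
            corr_limits = c(0, 1), n_corr = 2)
```

Lowering min_freq lets "rare" back in, and because it has the highest correlation it would then be the first row returned.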

Value

A list with two elements: viz, a tidygraph tbl_graph object suitable for a network visualisation, and view, a tibble summarising each term pair and its correlation.
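The two elements describe the same edges in different forms: in viz the edge endpoints are integer row positions in the node table, while view labels them with words. The base-R sketch below shows that mapping on small node and edge tables whose values are copied from the first rows printed in the example output; the table names nodes, edges and labelled are hypothetical, and extracting these tables from a real tbl_graph is not shown.

```r
# Toy node and edge tables shaped like the Node Data and Edge Data printed
# for $viz in the example (values copied from the first rows shown there).
nodes <- data.frame(
  word = c("bar", "hookah", "#downtownorlando", "doors"),
  term_freq = c(16L, 13L, 23L, 25L),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c(1L, 1L, 1L),
  to = c(2L, 3L, 4L),
  correlation = c(0.206, 0.206, 0.201)
)

# Edge endpoints are row positions in the node table, so indexing
# nodes$word by them recovers word labels like those in $view.
labelled <- data.frame(
  from = nodes$word[edges$from],
  to = nodes$word[edges$to],
  correlation = edges$correlation,
  stringsAsFactors = FALSE
)
labelled
```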

Examples

calculate_corr(
  df = sprinklr_export,
  text_var = Message,
  terms = c("foo", "bar", "I'm looking for"),
  min_freq = 10,
  corr_limits = c(-1, 1),
  n_corr = 75,
  hashtags = TRUE,
  mentions = FALSE
)
#> Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
#> $viz
#> # A tbl_graph: 77 nodes and 76 edges
#> #
#> # An unrooted tree
#> #
#> # Node Data: 77 × 2 (active)
#>   word             term_freq
#>   <chr>                <int>
#> 1 bar                     16
#> 2 hookah                  13
#> 3 #downtownorlando        23
#> 4 doors                   25
#> 5 #orlandofl              21
#> 6 #orlandoflorida         21
#> # … with 71 more rows
#> #
#> # Edge Data: 76 × 3
#>    from    to correlation
#>   <int> <int>       <dbl>
#> 1     1     2       0.206
#> 2     1     3       0.206
#> 3     1     4       0.201
#> # … with 73 more rows
#>
#> $view
#> # A tibble: 76 × 3
#>    from  to               correlation
#>    <chr> <chr>                  <dbl>
#>  1 bar   hookah                 0.206
#>  2 bar   #downtownorlando       0.206
#>  3 bar   doors                  0.201
#>  4 bar   #orlandofl             0.161
#>  5 bar   #orlandoflorida        0.161
#>  6 bar   #orlandocity           0.161
#>  7 bar   #orlandonights         0.161
#>  8 bar   #collegenight          0.161
#>  9 bar   #citywalk              0.161
#> 10 bar   #ucf                   0.161
#> # … with 66 more rows