Generate pairwise correlations for a vector of terms of interest.

Usage

calculate_corr(
  df,
  text_var,
  terms,
  min_freq = 10,
  corr_limits = c(-1, 1),
  n_corr = 75,
  hashtags = FALSE,
  mentions = FALSE
)

Arguments

df
A data frame in which each row is a separate post.

text_var
The variable containing the text you want to explore.

terms
A character vector of terms of interest. Multi-word phrases are allowed.

min_freq
The minimum number of times a term must be observed to be considered.

corr_limits
A numeric vector giving the lower and upper bounds for correlations.

n_corr
The number of correlations to include, starting with the most positive within the range set by corr_limits.

hashtags
Logical; should hashtags be included?

mentions
Logical; should mentions be included?
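This page does not say how the correlations are computed. A common approach for term co-occurrence (for example, `widyr::pairwise_cor()` applied to term presence per post) is the phi coefficient, which is the Pearson correlation of two binary presence indicators. Treat the following as an assumption, not a description of this function's internals. For two terms X and Y counted across posts, let n_{11} be the number of posts containing both, n_{10} the number containing X but not Y, n_{01} the number containing Y but not X, and n_{00} the number containing neither:

```latex
\phi_{XY} = \frac{n_{11}\,n_{00} - n_{10}\,n_{01}}
                 {\sqrt{n_{1\cdot}\,n_{0\cdot}\,n_{\cdot 1}\,n_{\cdot 0}}},
\qquad
n_{1\cdot} = n_{11} + n_{10},\;
n_{0\cdot} = n_{01} + n_{00},\;
n_{\cdot 1} = n_{11} + n_{01},\;
n_{\cdot 0} = n_{10} + n_{00}
```

Under this assumption every value lies in [-1, 1], which matches the default `corr_limits = c(-1, 1)`.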

Value

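A list with two elements: viz, a tidygraph object (tbl_graph) suitable for network visualisation, and view, a summary table (tibble) of the selected correlations with columns from, to and correlation.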

Examples

calculate_corr(
  df = sprinklr_export,
  text_var = Message,
  terms = c("foo", "bar", "I'm looking for"),
  min_freq = 10,
  corr_limits = c(-1, 1),
  n_corr = 75,
  hashtags = TRUE,
  mentions = FALSE
)
#> Using to_lower = TRUE with token = 'tweets' may not preserve URLs.
#> $viz
#> # A tbl_graph: 77 nodes and 76 edges
#> #
#> # An unrooted tree
#> #
#> # Node Data: 77 × 2 (active)
#>   word             term_freq
#>   <chr>                <int>
#> 1 bar                     16
#> 2 hookah                  13
#> 3 #downtownorlando        23
#> 4 doors                   25
#> 5 #orlandofl              21
#> 6 #orlandoflorida         21
#> # … with 71 more rows
#> #
#> # Edge Data: 76 × 3
#>    from    to correlation
#>   <int> <int>       <dbl>
#> 1     1     2       0.206
#> 2     1     3       0.206
#> 3     1     4       0.201
#> # … with 73 more rows
#>
#> $view
#> # A tibble: 76 × 3
#>    from  to               correlation
#>    <chr> <chr>                  <dbl>
#>  1 bar   hookah                 0.206
#>  2 bar   #downtownorlando       0.206
#>  3 bar   doors                  0.201
#>  4 bar   #orlandofl             0.161
#>  5 bar   #orlandoflorida        0.161
#>  6 bar   #orlandocity           0.161
#>  7 bar   #orlandonights         0.161
#>  8 bar   #collegenight          0.161
#>  9 bar   #citywalk              0.161
#> 10 bar   #ucf                   0.161
#> # … with 66 more rows
#>