Generate pairwise correlations for a vector of terms of interest.

calculate_corr(
 df,
 text_var,
 terms,
 min_freq = 10,
 corr_limits = c(-1, 1),
 n_corr = 75,
 hashtags = FALSE,
 mentions = FALSE,
 clean_text = FALSE
)

Arguments

df

A data frame in which each row is a separate post.

text_var

The column containing the text you want to explore, supplied unquoted (as with Message in the example).

terms

A character vector of terms of interest. Multi-word phrases (e.g. "I'm looking for") are allowed.

min_freq

The minimum number of times a term must be observed to be considered.

corr_limits

A numeric vector of length two giving the lower and upper bounds for correlations; only correlations within this range are kept.

n_corr

The number of correlations to return. Correlations within the range set by corr_limits are ranked from most positive downwards, and the first n_corr are kept.

hashtags

Should hashtags be included?

mentions

Should mentions be included?

clean_text

Should the text variable be cleaned?
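The filtering arguments above act as a small pipeline: drop unwanted tokens, drop rare terms, correlate, keep correlations inside corr_limits, then take the top n_corr. The base-R sketch below shows one common way to do this. It assumes whitespace tokenisation, a post-by-term presence/absence matrix, min_freq counted as the number of posts containing a term, and the phi coefficient (the Pearson correlation of 0/1 indicators) as the correlation measure. It is not ParseR's implementation, and the toy posts and variable names are hypothetical.

```r
# Hypothetical sketch (base R only): one way to compute term-term
# correlations. NOT ParseR's code; whitespace tokenisation and the
# phi coefficient are assumptions made for illustration.

posts <- c(
  "bar downtown tonight #orlando",
  "hookah bar downtown @joe",
  "coffee shop tonight",
  "bar hookah lounge #orlando",
  "coffee downtown"
)
hashtags <- FALSE           # keep tokens starting with "#"?
mentions <- FALSE           # keep tokens starting with "@"?
min_freq <- 2               # assumed: minimum number of posts containing a term
corr_limits <- c(-0.5, 1)   # lower and upper bounds for correlations
n_corr <- 3                 # number of correlations to return
term <- "bar"               # term of interest

# Split each post into lower-case words and drop repeats within a post
tokens <- lapply(strsplit(tolower(posts), "\\s+"), unique)

# Optionally drop hashtags and mentions
tokens <- lapply(tokens, function(t) {
  if (!hashtags) t <- t[!startsWith(t, "#")]
  if (!mentions) t <- t[!startsWith(t, "@")]
  t
})

# Binary post-by-term matrix: 1 if the term occurs in the post, else 0
vocab <- sort(unique(unlist(tokens)))
dtm <- sapply(vocab, function(w) {
  vapply(tokens, function(t) as.numeric(w %in% t), numeric(1))
})

# Keep only terms that appear in at least min_freq posts
dtm <- dtm[, colSums(dtm) >= min_freq, drop = FALSE]

# Pearson correlation of 0/1 columns is the phi coefficient
phi <- cor(dtm)

# Correlations between the term of interest and every other kept term
others <- setdiff(colnames(phi), term)
corrs <- data.frame(
  from = term,
  to = others,
  correlation = unname(phi[term, others]),
  stringsAsFactors = FALSE
)

# Keep correlations inside corr_limits, then the n_corr most positive
in_range <- corrs$correlation >= corr_limits[1] &
  corrs$correlation <= corr_limits[2]
corrs <- corrs[in_range, ]
corrs <- corrs[order(corrs$correlation, decreasing = TRUE), ]
top <- head(corrs, n_corr)
print(top)
```

With these toy posts, coffee has a correlation of -1 with bar and is removed by corr_limits rather than by n_corr, so the three rows kept are hookah, downtown and tonight. The two #orlando tokens would pass min_freq, but they are dropped first because hashtags is FALSE.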

Value

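A list with two elements: viz, a tidygraph tbl_graph suitable for a network visualisation (a node table with word and term_freq, and an edge table with from, to and correlation); and view, a summary tibble with one row per correlation (from, to, correlation).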
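The two elements describe the same correlations in different shapes. view names both words directly, while the edge table in viz refers to nodes by their row position in the node table (the tidygraph convention). As a hedged illustration, the base-R sketch below rebuilds that mapping from the first three rows of view printed in the example. The variable names are hypothetical, tidygraph is not called, and term_freq is left out because it cannot be recovered from view.

```r
# Hypothetical illustration: rebuild an integer-indexed edge list, as stored
# in a tbl_graph, from the first three rows of `view` shown in the example.
view <- data.frame(
  from = c("bar", "bar", "bar"),
  to = c("b", "10pm", "n"),
  correlation = c(0.250, 0.228, 0.204),
  stringsAsFactors = FALSE
)

# One node per distinct word, in order of first appearance
nodes <- data.frame(
  word = unique(c(view$from, view$to)),
  stringsAsFactors = FALSE
)

# Edges refer to nodes by their row position in `nodes`
edges <- data.frame(
  from = match(view$from, nodes$word),
  to = match(view$to, nodes$word),
  correlation = view$correlation
)
print(edges)
```

The resulting node order (bar, b, 10pm, n) and edges (1 to 2, 1 to 3, 1 to 4) mirror the first rows of viz in the example output. With tidygraph installed, tidygraph::tbl_graph(nodes = nodes, edges = edges) would turn such tables into a graph object. Whether calculate_corr builds its graph this way is an assumption.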

Examples

sprinklr_export <- ParseR::sprinklr_export
calculate_corr(
  df = sprinklr_export,
  text_var = Message,
  terms = c("foo", "bar", "I'm looking for"),
  min_freq = 10,
  corr_limits = c(-1, 1),
  n_corr = 75,
  hashtags = TRUE,
  mentions = FALSE
)
#> $viz
#> # A tbl_graph: 77 nodes and 76 edges
#> #
#> # An unrooted tree
#> #
#> # A tibble: 77 × 2
#>   word            term_freq
#>   <chr>               <int>
#> 1 bar                    17
#> 2 b                      34
#> 3 10pm                   11
#> 4 n                      28
#> 5 hookah                 13
#> 6 downtownorlando        23
#> # ℹ 71 more rows
#> #
#> # A tibble: 76 × 3
#>    from    to correlation
#>   <int> <int>       <dbl>
#> 1     1     2       0.250
#> 2     1     3       0.228
#> 3     1     4       0.204
#> # ℹ 73 more rows
#> 
#> $view
#> # A tibble: 76 × 3
#>    from  to              correlation
#>    <chr> <chr>                 <dbl>
#>  1 bar   b                     0.250
#>  2 bar   10pm                  0.228
#>  3 bar   n                     0.204
#>  4 bar   hookah                0.200
#>  5 bar   downtownorlando       0.200
#>  6 bar   doors                 0.195
#>  7 bar   910                   0.186
#>  8 bar   7200                  0.186
#>  9 bar   r                     0.184
#> 10 bar   downtown              0.174
#> # ℹ 66 more rows
#>