R/calculate_corr.R
calculate_corr.Rd
Generate pairwise correlations for a vector of terms of interest.
calculate_corr(
  df,
  text_var,
  terms,
  min_freq = 10,
  corr_limits = c(-1, 1),
  n_corr = 75,
  hashtags = FALSE,
  mentions = FALSE,
  clean_text = FALSE
)
df: A dataframe where each row is a separate post.
text_var: The variable containing the text you want to explore.
terms: The terms of interest. Multi-word phrases are allowed.
min_freq: The minimum number of times a term must be observed to be considered.
corr_limits: Numerical lower and upper bounds for correlations.
n_corr: The number of correlations to include, starting with the most positive within the range specified in corr_limits.
hashtags: Should hashtags be included?
mentions: Should mentions be included?
clean_text: Should the text variable be cleaned?
A list containing a summary table (view) and a tidygraph object (viz) suitable for a network visualisation.
sprinklr_export <- ParseR::sprinklr_export
calculate_corr(
  df = sprinklr_export,
  text_var = Message,
  terms = c("foo", "bar", "I'm looking for"),
  min_freq = 10,
  corr_limits = c(-1, 1),
  n_corr = 75,
  hashtags = TRUE,
  mentions = FALSE
)
#> $viz
#> # A tbl_graph: 77 nodes and 76 edges
#> #
#> # An unrooted tree
#> #
#> # A tibble: 77 × 2
#> word term_freq
#> <chr> <int>
#> 1 bar 17
#> 2 b 34
#> 3 10pm 11
#> 4 n 28
#> 5 hookah 13
#> 6 downtownorlando 23
#> # ℹ 71 more rows
#> #
#> # A tibble: 76 × 3
#> from to correlation
#> <int> <int> <dbl>
#> 1 1 2 0.250
#> 2 1 3 0.228
#> 3 1 4 0.204
#> # ℹ 73 more rows
#>
#> $view
#> # A tibble: 76 × 3
#> from to correlation
#> <chr> <chr> <dbl>
#> 1 bar b 0.250
#> 2 bar 10pm 0.228
#> 3 bar n 0.204
#> 4 bar hookah 0.200
#> 5 bar downtownorlando 0.200
#> 6 bar doors 0.195
#> 7 bar 910 0.186
#> 8 bar 7200 0.186
#> 9 bar r 0.184
#> 10 bar downtown 0.174
#> # ℹ 66 more rows
#>
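The printed viz element is a tbl_graph with a word and term_freq column on the nodes and a correlation column on the edges. As a minimal sketch of how a graph of that shape could be drawn, assuming the tidygraph and ggraph packages are installed, the code below rebuilds a four-node stand-in by hand from the first rows printed above. The stand-in graph, the "fr" layout and the aesthetic mappings are illustrative assumptions, not ParseR's own plotting method.

```r
library(tidygraph)
library(ggraph)
library(ggplot2)

# Hand-built stand-in for the `viz` element (assumption: same column names
# as the printed output; values copied from its first rows).
nodes <- data.frame(
  word = c("bar", "b", "10pm", "n"),
  term_freq = c(17L, 34L, 11L, 28L)
)
edges <- data.frame(
  from = c(1L, 1L, 1L),
  to = c(2L, 3L, 4L),
  correlation = c(0.250, 0.228, 0.204)
)
graph <- tbl_graph(nodes = nodes, edges = edges, directed = FALSE)

# Draw the network: edge opacity follows the correlation,
# node size follows the term frequency.
p <- ggraph(graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation)) +
  geom_node_point(aes(size = term_freq)) +
  geom_node_text(aes(label = word), repel = TRUE) +
  theme_void()
p
```

With a real result, the same plotting calls would presumably take the viz element of the returned list in place of the hand-built graph.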