Calculate Group Term Coverage (GTC) — calculate

Calculates the percentage of documents within each group that contain specific terms (words or n-grams). This complements Weighted Log-odds by giving a different view of the groups.

Usage

calculate_gtc(df, group_var, text_var, ngram_n = 1, top_n = 20)

Arguments

df: A data frame containing the text data
group_var: Name of the grouping variable (quoted or unquoted)
text_var: Name of the text variable (quoted or unquoted)
ngram_n: Length of n-grams to consider (default: 1)
top_n: Number of top terms to return per group (default: 20)

Value

A data frame with group term coverage statistics

Details

GTC should help when checking the names, labels, or descriptions assigned to groups. It is primarily an internal tool and is not expected to be useful for communicating results. The function can check n-grams up to `ngram_n=5`, but 5-grams should be present in only a very low proportion of documents. You will most likely want to set this parameter to 1 (words) or 2 (bigrams). Bigrams will be more informative than words, but their proportions will also be significantly lower.
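
To make the idea of coverage concrete, the following base R sketch computes, for a toy data set, the percentage of documents in each group that contain each word or bigram. It is an illustration of the concept only, not the implementation behind `calculate_gtc()`. The data, the column names `group` and `text`, the tokenisation rule, and the helpers `doc_ngrams()` and `coverage_sketch()` are all hypothetical.

```r
# Hypothetical toy data; the column names are illustrative only.
docs <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  text  = c("red apple pie", "apple red", "green pear",
            "blue sky", "blue sky today"),
  stringsAsFactors = FALSE
)

# Return the unique n-grams found in a single document.
# Assumption: tokens are lowercase runs of letters, digits, or apostrophes.
doc_ngrams <- function(text, n = 1) {
  tokens <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  unique(vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  ))
}

# Percentage of documents in each group that contain each n-gram.
coverage_sketch <- function(df, n = 1) {
  groups <- split(df$text, df$group)
  rows <- lapply(names(groups), function(g) {
    texts <- groups[[g]]
    # Each document contributes a term at most once, so the counts
    # below are document counts rather than raw term frequencies.
    terms <- unlist(lapply(texts, doc_ngrams, n = n))
    counts <- table(terms)
    data.frame(
      group = g,
      term = names(counts),
      coverage_pct = 100 * as.vector(counts) / length(texts),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$group, -out$coverage_pct), ]
  rownames(out) <- NULL
  out
}

word_cov   <- coverage_sketch(docs, n = 1)
bigram_cov <- coverage_sketch(docs, n = 2)
```

On this toy data, the most widespread words in group "a" appear in two of its three documents, while no bigram in that group appears in more than one, which mirrors the trade-off described above: bigrams are more specific but cover fewer documents.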