Calculate the Cumulative Group Term Coverage (GTC) — calculate_cumulative_gtc

For each group in the data frame, this function ranks the group's terms. It then iteratively calculates how many of the group's documents contain at least one of the top-ranked terms, adding one term at a time up to top_n.

Usage

calculate_cumulative_gtc(df, group_var, text_var, ngram_n = 1, top_n = 10)

Arguments

df: A data frame containing the text data
group_var: Name of the grouping variable (quoted or unquoted)
text_var: Name of the text variable (quoted or unquoted)
ngram_n: Length of n-grams to consider (default: 1)
top_n: Number of top terms to return per group (default: 10)

Value

A data frame with cumulative coverage statistics

Details

If we are using terms to label or name groups, we want to know what percentage of each group's documents those terms actually cover. This function first looks at the top-ranked term within the group and tells us what percentage of the group's documents it appears in. It then takes the union of the documents the top 2 terms appear in and calculates this as a percentage of the group's documents. It then takes the union of the documents the top 3 terms appear in, and so on, up to top_n terms.
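
The union step described above can be sketched in base R for a single group. This is a minimal illustration and not the package's implementation: the helper name cumulative_coverage, the naive tokeniser, the output column names, and the choice to rank terms by document frequency are all assumptions, and it handles single words only (the ngram_n = 1 case).

```r
# Hypothetical helper (not part of the package): cumulative share of one
# group's documents that contain at least one of the top-ranked terms.
cumulative_coverage <- function(docs, top_n = 10) {
  # Naive tokeniser (assumption): lower-case, split on non-alphanumerics,
  # and keep each term at most once per document
  tokens <- lapply(
    strsplit(tolower(docs), "[^a-z0-9']+"),
    function(x) unique(x[nzchar(x)])
  )

  # Assumed ranking: number of documents each term appears in
  doc_freq <- sort(table(unlist(tokens)), decreasing = TRUE)
  terms <- names(doc_freq)[seq_len(min(top_n, length(doc_freq)))]

  covered <- rep(FALSE, length(docs))
  coverage <- numeric(length(terms))
  for (i in seq_along(terms)) {
    # Union step: a document stays covered once any earlier term matched it
    has_term <- vapply(tokens, function(x) terms[i] %in% x, logical(1))
    covered <- covered | has_term
    coverage[i] <- mean(covered)
  }

  data.frame(
    rank = seq_along(terms),
    term = terms,
    cumulative_coverage = coverage
  )
}

# Toy group of four documents (made-up data)
docs <- c(
  "instagram trends today",
  "instagram reels",
  "photo filters",
  "photo editing tips"
)
cumulative_coverage(docs, top_n = 2)
```

In this toy group the first term covers half of the documents and the first two together cover all of them, so the cumulative coverage can never decrease as terms are added.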

The values ought to be simple to interpret: we are always looking at the percentage of a group's documents that contain at least one of the top terms. If our topic model's description (whether from an LLM, from top terms, etc.) says that this topic is all about 'Instagram Trends' but 'Instagram' is only in 10% of the group's documents, the description deserves a second look. Maybe it's Social Media Trends more generally, or maybe it's just about photos.