ParseR 1.2.1

Clean Text Refactor & Group Term Coverage

clean_text
- Refactored the URL regex and compiled it, which makes a big difference: savings scale with input size, ~20x for small datasets and ~200x for large datasets (see Benchmarking).
- Streamlined how patterns are combined and compiled the regex patterns for a little extra performance.
- Fixed the emojis argument so it no longer removes Latin accented characters; removing all non-ASCII characters now falls under remove_all_non_ascii, which better represents what it does.
- Added on.exit() to close parallel sessions if the function doesn't terminate properly (e.g. the session crashes or the user interrupts the function).
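As a rough illustration of the pattern-combining idea, the sketch below joins several regexes into one alternation, built once outside the helper, so each string is scanned in a single pass rather than once per pattern. The patterns and the helper name remove_patterns() are hypothetical and are not ParseR's actual internals.

```r
# Hypothetical patterns for illustration; not ParseR's actual regexes.
patterns <- c(
  url     = "https?://\\S+",
  mention = "@\\w+",
  hashtag = "#\\w+"
)

# Wrap each pattern in a non-capturing group and join them with "|".
# The combined pattern is built once, so gsub() makes a single pass over
# the text instead of one pass per pattern.
combined_pattern <- paste0("(?:", patterns, ")", collapse = "|")

# Hypothetical helper: strip every match, then collapse leftover whitespace.
remove_patterns <- function(x) {
  x <- gsub(combined_pattern, "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

remove_patterns("Check https://example.com @user #tag now")
#> [1] "Check now"
```

Base R does not expose a reusable compiled-regex object, so in this sketch "compiling" amounts to building the combined pattern once and reusing it across calls.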

count_ngram
- Added the ability to prevent tidytext::unnest_tokens() from forcibly lowercasing all text.
- Fixed the removal of stopwords in the nodes data frame, and the error it previously caused, so that English stopwords can be kept more easily.
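For context, tidytext::unnest_tokens() lowercases its output unless to_lower = FALSE is passed. The sketch below shows that behaviour directly on bigrams. It calls tidytext rather than count_ngram(), whose own argument name for this option is not reproduced here.

```r
library(tidytext)

df <- data.frame(text = "New York is in the USA", stringsAsFactors = FALSE)

# to_lower = FALSE keeps the original casing; with the default
# (to_lower = TRUE) the first bigram would be "new york".
bigrams <- unnest_tokens(df, bigram, text, token = "ngrams", n = 2,
                         to_lower = FALSE)

bigrams$bigram
# c("New York", "York is", "is in", "in the", "the USA")
```

Keeping case matters for n-grams built around proper nouns ("New York", "USA"), which would otherwise merge with lowercase uses of the same words.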

calculate_wlos
- Now sets uninformative = TRUE rather than uninformative = FALSE.
- Added a global_word_frequency data frame to the function when filter_by = "association".
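For reference, and assuming calculate_wlos() follows the weighted log-odds of Monroe, Colaresi and Quinn (as implemented in tidylo::bind_log_odds()), the uninformative setting only changes the prior counts $\alpha_w$. Here $y_w^{i}$ is the count of word $w$ in group $i$, $n^{i}$ the total token count of group $i$, $j$ the comparison group, and $\alpha_0 = \sum_w \alpha_w$:

```latex
\hat{\delta}_w^{(i-j)}
  = \log\frac{y_w^{i} + \alpha_w}{n^{i} + \alpha_0 - y_w^{i} - \alpha_w}
  - \log\frac{y_w^{j} + \alpha_w}{n^{j} + \alpha_0 - y_w^{j} - \alpha_w},
\qquad
\sigma^2\!\left(\hat{\delta}_w^{(i-j)}\right)
  \approx \frac{1}{y_w^{i} + \alpha_w} + \frac{1}{y_w^{j} + \alpha_w},
\qquad
z_w = \frac{\hat{\delta}_w^{(i-j)}}{\sqrt{\sigma^2\!\left(\hat{\delta}_w^{(i-j)}\right)}}
```

With an informative prior, $\alpha_w$ is proportional to the word's frequency across the whole dataset; with an uninformative prior, every word receives the same $\alpha_w$, so the score is driven by the group counts alone.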

group_term_coverage
- Introduced functions for calculating and visualising group term coverage (GTC); see the Distinctness in Groups vignette.