# Changelog
## ParseR 1.2.1: Clean Text Refactor & Group Term Coverage
### clean_text

- Refactored the URL regex and pre-compiled the patterns, which makes a big difference: in benchmarking, the savings scale with input size, roughly 20x for small datasets and around 200x for large ones.
- Streamlined the way patterns are combined and compiled the regex patterns for a bit of extra performance (a rough sketch of the approach follows this list).
- Fixed the `emojis` argument so it no longer removes Latin-accented characters; the behaviour now sits under `remove_all_non_ascii` to better represent what it does.
- Added an `on.exit()` handler to close parallel sessions if the function doesn't terminate properly (session crashes, user interrupts the function).
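As a loose illustration of the single-pass pattern combination and the exit handler, the sketch below merges several removal patterns into one alternation and uses `on.exit()` to restore the parallel plan. The helper name, the use of stringi, and the use of future/furrr for parallelism are assumptions, not ParseR's actual internals.

```r
library(stringi)

# Assumed patterns, combined into a single alternation so each string is
# scanned once rather than once per pattern.
url_pattern   <- "https?://\\S+|www\\.\\S+"
email_pattern <- "\\S+@\\S+\\.\\S+"
combined      <- paste(url_pattern, email_pattern, sep = "|")

clean_text_sketch <- function(texts, workers = 2) {
  # Restore the previous parallel plan however the function exits
  # (error, interrupt, or normal return), shutting down the workers.
  old_plan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)

  furrr::future_map_chr(texts, function(x) {
    stri_replace_all_regex(x, combined, "")
  })
}

clean_text_sketch(c("Visit https://example.com today", "mail me at a@b.com"))
```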
### count_ngram

- Added the ability to prevent `tidytext::unnest_tokens()` from force-lowercasing all text (illustrated below).
- Fixed the removal of stopwords in the nodes data frame and the error it previously caused, so English stopwords can be kept more easily.
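For context, `tidytext::unnest_tokens()` lowercases text by default but exposes a `to_lower` argument; the sketch below shows the underlying call with casing preserved. The example data and column names are made up, and exactly how `count_ngram()` surfaces this option is an assumption.

```r
library(tidytext)
library(tibble)

docs <- tibble(id = 1:2,
               text = c("ParseR handles Social Data", "Social Data in R"))

# to_lower = FALSE keeps the original casing of the tokens.
bigrams <- unnest_tokens(docs, output = ngram, input = text,
                         token = "ngrams", n = 2, to_lower = FALSE)
```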
### calculate_wlos

- Now sets `uninformative = TRUE` rather than `uninformative = FALSE` (see the sketch below).
- Added a `global_word_frequency` data frame when `filter_by = "association"`.
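The `uninformative` flag matches the argument of `tidylo::bind_log_odds()`, where `uninformative = TRUE` uses an uninformative Dirichlet prior instead of one estimated from the counts. That `calculate_wlos()` wraps tidylo is an assumption here, and the data below are illustrative only.

```r
library(dplyr)
library(tidylo)

word_counts <- tibble::tribble(
  ~group, ~word,      ~n,
  "A",    "price",    10,
  "A",    "launch",    3,
  "B",    "price",     4,
  "B",    "battery",   9
)

# uninformative = TRUE draws the prior from an uninformative Dirichlet
# rather than estimating it from the observed counts.
wlos <- word_counts %>%
  bind_log_odds(set = group, feature = word, n = n, uninformative = TRUE)
```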
### group_term_coverage

- Introduced functions for calculating and visualising group term coverage (GTC); see the Distinctness in Groups vignette for details (a loose illustration follows below).
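The vignette is the authoritative definition; purely as an assumption about the kind of quantity involved, the sketch below computes, per group, the share of documents that mention a single term.

```r
library(dplyr)

docs <- tibble::tibble(
  group = c("A", "A", "A", "B", "B"),
  text  = c("price too high", "love the price", "great battery",
            "battery drains fast", "battery life is fine")
)

# Assumed reading of coverage: the proportion of each group's documents
# containing the term "battery".
coverage <- docs %>%
  group_by(group) %>%
  summarise(coverage_battery = mean(grepl("\\bbattery\\b", text)))
```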