ParseR combines functionality from the tidytext and tidygraph packages so that users can create network visualisations of the most common terms in a data set.

We’ll work through an example by building a bi-gram network from a sample of the data set that ships with ParseR.

library(magrittr)  # provides the %>% pipe

# Generate a reproducible sample of 1,000 posts
set.seed(1)
example <- ParseR::sprinklr_export %>%
  dplyr::sample_n(1000)
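The call to set.seed(1) is what makes the sample reproducible: re-running the chunk draws the same 1,000 rows. The base R sketch below shows the idea on a hypothetical toy data frame (posts and its columns are made up for illustration). In recent dplyr versions, sample_n() is superseded by slice_sample(n = 1000).

```r
# Hypothetical toy data frame standing in for ParseR::sprinklr_export
posts <- data.frame(id = 1:10, Message = paste("post", 1:10),
                    stringsAsFactors = FALSE)

# Draw 3 rows without replacement, twice, resetting the seed each time
set.seed(1)
draw_a <- posts[sample(nrow(posts), 3), ]
set.seed(1)
draw_b <- posts[sample(nrow(posts), 3), ]

identical(draw_a, draw_b)  # TRUE: same seed, same rows
```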

Count the n-grams

Each post is broken down into bi-grams (pairs of adjacent words), and at most the 25 most frequent bi-grams (set by top_n) are returned.

counts <- example %>%
  ParseR::count_ngram(text_var = Message, n = 2, top_n = 25)
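To make the counting step concrete, here is a minimal base R sketch of bi-gram counting. It is not ParseR's implementation: count_bigrams_sketch() is a hypothetical helper, and it assumes a simple tokeniser (lowercase, split on anything that is not a letter, digit or apostrophe). The real count_ngram() builds on tidytext and, judging by the output below, probably also removes stop words. Bi-grams are only formed within a post, never across two posts.

```r
# Hypothetical helper: count bi-grams in a character vector of posts
count_bigrams_sketch <- function(texts, top_n = 25) {
  # Lowercase each post and split it into word tokens
  tokens <- strsplit(tolower(texts), "[^a-z0-9']+")
  tokens <- lapply(tokens, function(x) x[nzchar(x)])
  # Pair each token with the token that follows it, within the same post
  pairs <- lapply(tokens, function(x) {
    if (length(x) < 2) return(character(0))
    paste(x[-length(x)], x[-1])
  })
  # Tabulate, sort by frequency and keep the top_n most frequent
  freq <- head(sort(table(unlist(pairs)), decreasing = TRUE), top_n)
  parts <- strsplit(names(freq), " ", fixed = TRUE)
  data.frame(
    word1 = vapply(parts, `[`, character(1), 1),
    word2 = vapply(parts, `[`, character(1), 2),
    ngram_freq = as.integer(freq),
    stringsAsFactors = FALSE
  )
}

posts <- c("Hispanic heritage month",
           "Celebrating Hispanic Heritage Month!",
           "heritage festival")
count_bigrams_sketch(posts)
```

In this toy input, "hispanic heritage" and "heritage month" each occur twice, while "month celebrating" never appears because those two words sit in different posts.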

Note that counts is a list object:

class(counts)
## [1] "list"

It contains two objects:

  1. “view”
  • A human-readable tibble with the most common n-grams.
counts %>%
  purrr::pluck("view")
## # A tibble: 20 × 3
##    word1       word2       ngram_freq
##    <chr>       <chr>            <int>
##  1 hispanic    heritage           503
##  2 heritage    month              253
##  3 heritage    celebration         39
##  4 celebrating hispanic            37
##  5 national    hispanic            33
##  6 beto        orourke             32
##  7 bobby       beto                30
##  8 hispanic    caucus              30
##  9 lacks       hispanic            30
## 10 caucus      refuses             29
## 11 orourke     membership          29
## 12 refuses     bobby               29
## 13 flashback   hispanic            27
## 14 heritage    festival            25
## 15 celebrate   hispanic            20
## 16 heritage    night               18
## 17 month       celebration         14
## 18 annual      hispanic            13
## 19 heritage    day                 13
## 20 celebrates  hispanic            10
  2. “viz”
  • A tbl_graph object that can be used to produce a network visualisation.
counts %>%
  purrr::pluck("viz")
## # A tbl_graph: 20 nodes and 20 edges
## #
## # A directed acyclic simple graph with 1 component
## #
## # Node Data: 20 × 2 (active)
##   word        word_freq
##   <chr>           <int>
## 1 hispanic          638
## 2 heritage          531
## 3 month             275
## 4 day               107
## 5 celebration        96
## 6 celebrating        81
## # … with 14 more rows
## #
## # Edge Data: 20 × 3
##    from    to ngram_freq
##   <int> <int>      <int>
## 1     1     2        503
## 2     2     3        253
## 3     2     5         39
## # … with 17 more rows
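The two elements describe the same bi-grams in different shapes: view stores the word pair as text, while the edge data in viz stores each word as the row index of its node (edge 1 → 2 above is hispanic → heritage). The base R sketch below shows that mapping with match() on small hypothetical tables; the frequencies are made up, and how ParseR computes word_freq is not shown here. Note that purrr::pluck(counts, "view") is equivalent to counts[["view"]], and on a real tbl_graph, tidygraph::activate() switches between the node and edge tables.

```r
# Hypothetical node table: one row per word (counts are illustrative only)
nodes <- data.frame(word = c("hispanic", "heritage", "month", "celebration"),
                    word_freq = c(10L, 8L, 5L, 3L),
                    stringsAsFactors = FALSE)

# Hypothetical bi-gram table in the same shape as the "view" element
view <- data.frame(word1 = c("hispanic", "heritage", "heritage"),
                   word2 = c("heritage", "month", "celebration"),
                   ngram_freq = c(6L, 4L, 2L),
                   stringsAsFactors = FALSE)

# Replace each word with the row index of its node, as in tbl_graph edge data
edges <- data.frame(from = match(view$word1, nodes$word),
                    to = match(view$word2, nodes$word),
                    ngram_freq = view$ngram_freq)

# Going the other way: recover the word labels from the integer indices
edges_labelled <- data.frame(word1 = nodes$word[edges$from],
                             word2 = nodes$word[edges$to],
                             ngram_freq = edges$ngram_freq,
                             stringsAsFactors = FALSE)
edges
```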

Visualise the network

Now we can pass the tbl_graph object generated by count_ngram() to viz_ngram() to produce a network visualisation.

counts %>%
  purrr::pluck("viz") %>%
  ParseR::viz_ngram(emphasis = TRUE)
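If you need more control over the plot than viz_ngram() offers, a tbl_graph can also be drawn directly with ggraph. The sketch below is not how viz_ngram() is implemented (its internals, including what emphasis = TRUE does, are not covered here). It builds a small hypothetical graph with tidygraph::tbl_graph() using the same column names as the output above (word, word_freq, ngram_freq) and maps them onto labels, node size and edge width. It assumes the tidygraph and ggraph packages are installed.

```r
library(tidygraph)
library(ggplot2)
library(ggraph)

# Hypothetical miniature graph with the same columns as the "viz" element
nodes <- data.frame(word = c("hispanic", "heritage", "month", "celebration"),
                    word_freq = c(10L, 8L, 5L, 3L),
                    stringsAsFactors = FALSE)
edges <- data.frame(from = c(1L, 2L, 2L), to = c(2L, 3L, 4L),
                    ngram_freq = c(6L, 4L, 2L))
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# The force-directed ("fr") layout is random, so fix the seed
set.seed(1)
p <- ggraph(graph, layout = "fr") +
  geom_edge_link(aes(edge_width = ngram_freq)) +  # thicker = more frequent pair
  geom_node_point(aes(size = word_freq)) +        # bigger = more frequent word
  geom_node_text(aes(label = word), repel = TRUE)

print(p)
```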