This workflow lets you index a text dataset once and then instantly count how many times any substring appears in it. It serializes the dataset into a flat binary format, constructs a suffix array (a ...
The libsais library provides fast (see Benchmarks below) linear-time construction of the suffix array (SA), generalized suffix array (GSA), longest common prefix (LCP) array, permuted LCP (PLCP) array, ...
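
To make the index-once, count-many idea concrete, here is a minimal Python sketch of the workflow. It is illustrative only and does not use libsais: the suffix array is built with a naive sort rather than linear-time induced sorting, so it is only suitable for small inputs. The function names (`serialize`, `build_suffix_array`, `count_occurrences`) and the `0xFF` separator byte are assumptions made for this example, not part of any library API.

```python
from typing import List

# Assumption: documents are UTF-8 text, so byte 0xFF never occurs inside
# a document and is safe to use as a separator between documents.
SEPARATOR = b"\xff"


def serialize(docs: List[str]) -> bytes:
    """Flatten a list of documents into one byte string."""
    return SEPARATOR.join(doc.encode("utf-8") for doc in docs)


def build_suffix_array(data: bytes) -> List[int]:
    """Naive suffix array: start offsets of all suffixes, sorted
    lexicographically. Stand-in for a linear-time builder."""
    return sorted(range(len(data)), key=lambda i: data[i:])


def count_occurrences(data: bytes, sa: List[int], pattern: bytes) -> int:
    """Count occurrences of pattern using two binary searches over sa."""
    m = len(pattern)
    if m == 0:
        return 0

    # Lower bound: first suffix whose m-byte prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-byte prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if data[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[first:lo] starts with pattern.
    return lo - first


# Index once, then query as often as needed.
data = serialize(["banana", "bandana"])
sa = build_suffix_array(data)
print(count_occurrences(data, sa, b"ana"))
```

Because all suffixes that start with the pattern sit next to each other in the sorted order, each query costs two binary searches over the array instead of a scan of the whole dataset.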