The libsais library provides fast (see Benchmarks below) linear-time construction of the suffix array (SA), generalized suffix array (GSA), longest common prefix (LCP) array, permuted LCP (PLCP) array, ...
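The following naive Python sketch shows what the SA, PLCP, and LCP arrays contain. It is not libsais and does not use libsais's linear-time algorithm. It sorts suffixes directly, which takes roughly O(n² log n) time, then computes PLCP with a Kasai-style O(n) scan and derives LCP through the permutation LCP[r] = PLCP[SA[r]]. The function names are hypothetical. Setting the first LCP entry to 0 is this sketch's convention; libraries differ on how they fill that slot.

```python
def suffix_array(text: str) -> list[int]:
    # Naive construction: sort start positions by the suffix beginning there.
    return sorted(range(len(text)), key=lambda i: text[i:])


def plcp_array(text: str, sa: list[int]) -> list[int]:
    # PLCP[i] = length of the longest common prefix of suffix i and the
    # suffix immediately before it in sorted (SA) order; 0 if none precedes it.
    n = len(text)
    prev = [-1] * n  # prev[i] = suffix preceding suffix i in SA order
    for r in range(1, n):
        prev[sa[r]] = sa[r - 1]
    plcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order
        j = prev[i]
        if j == -1:
            h = 0  # lexicographically smallest suffix: no predecessor
            continue
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        plcp[i] = h
        h = max(h - 1, 0)  # PLCP[i + 1] >= PLCP[i] - 1, so resume from h - 1
    return plcp


def lcp_array(sa: list[int], plcp: list[int]) -> list[int]:
    # LCP is PLCP permuted into suffix-array order: LCP[r] = PLCP[SA[r]].
    return [plcp[p] for p in sa]


text = "banana"
sa = suffix_array(text)
plcp = plcp_array(text, sa)
lcp = lcp_array(sa, plcp)
print(sa)    # [5, 3, 1, 0, 4, 2]
print(plcp)  # [0, 3, 2, 1, 0, 0]
print(lcp)   # [0, 1, 3, 0, 0, 2]
```

PLCP is indexed by text position and LCP by suffix-array rank. Because PLCP can be filled in text order with the `h - 1` carry-over, it is the natural intermediate when computing LCP.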
This pipeline performs substring-level exact deduplication on text datasets. Instead of removing entire duplicate documents, it identifies and removes repeated substrings (e.g., boilerplate headers, ...
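A common way to find repeated substrings across a whole corpus is to build a suffix array over the concatenated documents. The sketch below assumes that approach, and this pipeline may be implemented differently. It joins documents with a hypothetical separator `\x00`, which is assumed absent from the data. It then groups suffixes that share the same first `min_len` characters (such suffixes are contiguous in suffix-array order), keeps the earliest occurrence of each window, and deletes every later one. The function name `dedup_substrings` and the character-level `min_len` threshold are illustrative assumptions. A real pipeline would more likely operate on bytes or tokens and use a linear-time suffix-array builder instead of the naive sort used here.

```python
SEP = "\x00"  # hypothetical document separator, assumed absent from the data


def dedup_substrings(docs: list[str], min_len: int) -> list[str]:
    # Hypothetical helper: delete every span of >= min_len characters that
    # already occurred earlier in the corpus, keeping the first occurrence.
    assert min_len > 0 and all(SEP not in d for d in docs)
    text = SEP.join(docs)
    n = len(text)
    # Naive suffix array over the concatenated corpus (illustration only).
    sa = sorted(range(n), key=lambda i: text[i:])

    def window(p: int):
        # The min_len characters starting at p, or None if the window is
        # truncated by the end of the corpus or crosses a document boundary.
        w = text[p:p + min_len]
        return w if len(w) == min_len and SEP not in w else None

    remove = [False] * n
    r = 0
    while r < n:
        w = window(sa[r])
        s = r + 1
        if w is not None:
            # Suffixes sharing the same first min_len characters are
            # contiguous in suffix-array order.
            while s < n and window(sa[s]) == w:
                s += 1
            first = min(sa[r:s])  # earliest occurrence in the corpus is kept
            for p in sa[r:s]:
                if p != first:
                    for q in range(p, p + min_len):
                        remove[q] = True
        r = s

    # Rebuild each document without the removed characters.
    out, offset = [], 0
    for d in docs:
        mask = remove[offset:offset + len(d)]
        out.append("".join(c for c, gone in zip(d, mask) if not gone))
        offset += len(d) + 1  # skip the separator
    return out


print(dedup_substrings(["the cat sat", "a cat sat down"], min_len=7))
# ['the cat sat', 'a down']
```

Overlapping windows chain together, so a repeat longer than `min_len` (here the 8-character span " cat sat") is removed in full from the second document. Shorter shared fragments survive.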
Abstract: Suffix arrays and trees are important and fundamental string data structures which lie at the foundation of many string algorithms, with important applications in computational biology, text ...
Abstract: The suffix array and Burrows-Wheeler Transform are critical index structures in next generation sequence analysis. The construction of such index structures for mammalian-sized genomes can ...
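The two index structures named in this abstract are closely related. Once the suffix array of a text ending in a unique sentinel is known, the BWT follows in a single pass, because BWT[r] is the character immediately preceding suffix SA[r], wrapping around to the sentinel when SA[r] = 0. The minimal Python sketch below uses a naive suffix sort in place of a scalable construction. The `$` sentinel and the function name are assumptions. The sentinel must not occur in the input and, by convention, sorts below every text character.

```python
def bwt_via_suffix_array(text: str, sentinel: str = "$") -> str:
    # The sentinel is assumed unique and, by convention, smaller than every
    # character of text (true for "$" with ASCII letters).
    assert sentinel not in text
    t = text + sentinel
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive suffix array
    # BWT[r] = character before suffix SA[r]; for SA[r] == 0, Python's
    # t[-1] wraps around to the sentinel at the end.
    return "".join(t[i - 1] for i in sa)


print(bwt_via_suffix_array("banana"))  # annb$aa
```

Because the sentinel is unique, sorting suffixes gives the same order as sorting the cyclic rotations used in the textbook BWT definition. Any faster suffix-array construction therefore yields the BWT at the same cost.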