A C++ implementation of the dynamic r-index and dynamic FM-index — space-efficient data structures supporting pattern matching queries and dynamic text updates.
- Quick Start
- Overview
- Requirements
- Installation
- Usage
- API Documentation
- Dependencies
- License
- References
# 1. Install SDSL library (if not already installed)
git clone https://github.com/simongog/sdsl-lite.git
cd sdsl-lite
./install.sh ~/local
cd ..
# 2. Clone and setup
git clone https://github.com/TNishimoto/dynamic_r_index.git
cd dynamic_r_index
git submodule update --init --recursive
# 3. Build
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
-DSDSL_LIBRARY_DIR=~/local/lib \
-DSDSL_INCLUDE_DIR=~/local/include
make
# 4. Try it out
./build_r_index -i ../examples/ab.txt -o ab.dri
./query -i ab.dri -q ../examples/command.tsv -w result.log
cat result.logNote
If SDSL is already installed elsewhere, skip step 1 and adjust the paths in step 3. See Installation for alternative SDSL locations (e.g., Homebrew).
The dynamic r-index is a dynamic version of the r-index, supporting count and locate queries on an input string
For more details on the r-index, see the r-index repository.
| Operation | Time Complexity | Description |
|---|---|---|
| build_from_BWT( |
Build index from BWT |
|
| insert_string( |
avg. |
Insert string |
| delete_string( |
avg. |
Delete |
| count_query( |
Count occurrences of pattern |
|
| locate_query( |
Find all positions of pattern |
|
| backward_search( |
Return the SA-interval of |
Note
-
$σ$ : alphabet size of$T$ -
$L_{avg}$ : average LCP value in the LCP array of$T$ - Worst-case time for insert/delete:
$O((m + L_{max}) \log σ \log n)$
The dynamic FM-index, proposed by Salson et al., is a dynamic version of the FM-index.
It can be stored in
For more details, see the original implementation.
| Operation | Time Complexity | Description |
|---|---|---|
| build_from_BWT( |
Build index from BWT |
|
| insert_string( |
avg. |
Insert string |
| delete_string( |
avg. |
Delete |
| count_query( |
Count occurrences of pattern |
|
| locate_query( |
Find all positions of pattern |
|
| backward_search( |
Return the SA-interval of |
| Feature | Dynamic r-index | Dynamic FM-index |
|---|---|---|
| Space | ||
| Locate query | ||
| Best for | Highly repetitive texts | General texts |
| Trade-off | Smaller for repetitive data | Adjustable via parameter |
Recommendation:
- Use r-index for highly repetitive texts (e.g., genome collections, versioned documents)
- Use FM-index for general-purpose indexing with tunable space/time trade-off
Important
These time complexities are slightly larger than described in the original papers due to the use of B-trees for performance.
- C++ Compiler: C++17 or later (GCC 7+, Clang 5+, MSVC 2017+)
- CMake: 3.10 or later
- Make: GNU Make
- SDSL Library: sdsl-lite
git clone https://github.com/TNishimoto/dynamic_r_index.git
cd dynamic_r_index
git submodule update --init --recursiveIf you haven't installed SDSL yet:
git clone https://github.com/simongog/sdsl-lite.git
cd sdsl-lite
./install.sh ~/local
cd ..This installs the library to ~/local/lib and headers to ~/local/include.
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
-DSDSL_LIBRARY_DIR=~/local/lib \
-DSDSL_INCLUDE_DIR=~/local/include
makeTip
Alternative SDSL locations:
- Homebrew (Apple Silicon):
-DSDSL_LIBRARY_DIR=/opt/homebrew/lib -DSDSL_INCLUDE_DIR=/opt/homebrew/include - Homebrew (Intel Mac):
-DSDSL_LIBRARY_DIR=/usr/local/lib -DSDSL_INCLUDE_DIR=/usr/local/include
Generated executables:
| Executable | Description |
|---|---|
build_bwt |
Compute BWT from a text file |
build_r_index |
Build dynamic r-index from text or BWT |
build_fm_index |
Build dynamic FM-index from text or BWT |
print_index |
Display index information and extract text/BWT |
query |
Execute queries from a command file |
Computes and outputs the BWT of a given file.
Options:
-i, --input_file_path Input text file path
-o, --output_file_path Output BWT file path
-c, --null_terminated_string End-of-text character (default: \0)
-e, --detailed_message_flag Print detailed messages (0 or 1)
-?, --help Show help
Note
The end-of-text character is appended to the input and must be lexicographically smallest.
Example:
./build_bwt -i ../examples/ab.txt -o ab.bwt -c "$"=============RESULT===============
Input File: ../examples/ab.txt
Text Length: 26
Text: aaaaABAaaaaABAaaaABAaaaab$
BWT: baaaBBBAAAaaaaaaaaAA$Aaaaa
Number of BWT Runs: 9
==================================
Builds the dynamic r-index from a text file or BWT.
Options:
-i, --input_file_path Input file path (text or BWT)
-o, --output_file_path Output index file path (.dri)
-c, --null_terminated_string End-of-text character (default: \0)
-u, --is_bwt Set to 1 if input is BWT
-?, --help Show help
Example:
# From text
./build_r_index -i ../examples/ab.txt -o ab.dri
# From BWT
./build_bwt -i ../examples/ab.txt -o ab.bwt -c "$"
./build_r_index -i ab.bwt -o ab.dri -u 1Displays r-index information and optionally extracts the text/BWT.
Options:
-i, --input_file_path Input index file path (.dri)
-o, --output_text_path Output text file path (optional)
-b, --output_bwt_path Output BWT file path (optional)
-?, --help Show help
Example:
./print_index -i ab.driStatistics(DynamicRIndex):
Text length: 26
Text: aaaaABAaaaaABAaaaABAaaaab$
Alphabet size: 5
Alphabet: [$, A, B, a, b]
BWT: baaaBBBAAAaaaaaaaaAA$Aaaaa
The number of runs: 9
Executes queries on a dynamic r-index from a TSV command file.
Options:
-i, --input_index_path Input index file path (.dri)
-q, --command_file Command file path (TSV format)
-w, --log_file Output log file path
-o, --output_index_path Save updated index (optional)
-?, --help Show help
Supported Commands:
| Command | Param 1 | Param 2 | Description |
|---|---|---|---|
COUNT |
P | - | Count occurrences of pattern P |
LOCATE |
P | - | Find all positions of pattern P |
LOCATE_SUM |
P | - | Sum of all occurrence positions |
INSERT |
i | P | Insert string P at position i |
DELETE |
i | m | Delete m characters starting at position i |
PRINT |
- | - | Print current text and BWT |
Example command file (command.tsv):
PRINT
COUNT ABA
LOCATE ABA
INSERT 3 bbbb
PRINT
DELETE 3 5
PRINT
Example:
./build_r_index -i ../examples/ab.txt -o ab.dri
./query -i ab.dri -q ../examples/command.tsv -w result.log
cat result.logSample output (result.log):
0 PRINT Text: aaaaABAaaaaABAaaaABAaaaab$ BWT: baaaBBBAAAaaaaaaaaAA$Aaaaa
1 COUNT The number of occurrences of the given pattern: 3 Time (microseconds): 51
2 LOCATE The occurrences of the given pattern: [11, 4, 17] Time (microseconds): 57
3 INSERT Reorder count: 3 Time (microseconds): 25
4 PRINT Text: aaabbbbaABAaaaaABAaaaABAaaaab$ BWT: baaaBBBAAAabaaaaAAAa$aaaaabbba
...
Builds the dynamic FM-index from a text file or BWT.
Options:
-i, --input_file_path Input file path (text or BWT)
-o, --output_file_path Output index file path (.dfmi)
-c, --null_terminated_string End-of-text character (default: \0)
-u, --is_bwt Set to 1 if input is BWT
-s, --sampling_interval SA sampling interval (default: 32)
-?, --help Show help
Example:
# Default sampling interval (32)
./build_fm_index -i ../examples/ab.txt -o ab.dfmi
# Custom sampling interval
./build_fm_index -i ../examples/ab.txt -o ab.dfmi -s 8Displays FM-index information and optionally extracts the text/BWT.
./print_index -i ab.dfmiStatistics(DynamicFMIndex):
Text length: 26
Text: aaaaABAaaaaABAaaaABAaaaab$
Alphabet size: 5
Alphabet: [$, A, B, a, b]
BWT: baaaBBBAAAaaaaaaaaAA$Aaaaa
Sampling interval: 8
The number of sampled values: 5
Executes queries on a dynamic FM-index from a TSV command file.
The command format is identical to the r-index version.
Example:
./build_fm_index -i ../examples/ab.txt -o ab.dfmi -s 8
./query -i ab.dfmi -q ../examples/command.tsv -w result.log
cat result.logNote
The query command automatically detects whether the input is a dynamic r-index (.dri) or dynamic FM-index (.dfmi).
Doxygen Documentation (in preparation)
This project uses the following libraries:
- SDSL — Succinct Data Structure Library
- STool — String utilities
- B-tree_plus_alpha — B-tree implementation
This project is licensed under the MIT License. See the LICENSE file for details.
If you use the library, please cite the following paper:
@misc{nishimoto2025dynamicrindexupdatableselfindex,
title={Dynamic r-index: An Updatable Self-Index in LCP-bounded Time},
author={Takaaki Nishimoto and Yasuo Tabei},
year={2025},
eprint={2504.19482},
archivePrefix={arXiv},
primaryClass={cs.DS},
url={https://arxiv.org/abs/2504.19482},
}
- T. Nishimoto, "Dynamic r-index: An Updatable Self-Index in LCP-bounded Time," arXiv:2504.19482, 2025.
- T. Gagie, G. Navarro, N. Prezza, "Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space," ACM DL, 2020.
- M. Salson et al., "Dynamic extended suffix arrays," ScienceDirect, 2010.