Querying
-Now it is possible to query the index. The command queries treats each
-line of the standard input (or a file if -q is present) as a separate
-query. A query line contains a whitespace-delimited list of tokens.
-These tokens are either interpreted as terms (if --terms is defined,
-which will be used to resolve term IDs) or as term IDs (if --terms is
-not defined). Optionally, a query can contain query ID delimited by a
-colon:
The queries command treats each line of the standard input (or of a file if -q
+is present) as a separate query. A query line contains a whitespace-delimited
+list of tokens. These tokens are interpreted either as terms (if --terms is
+defined, in which case it is used to resolve term IDs) or as term IDs (if
+--terms is not defined). Optionally, a query can contain a query ID delimited
+by a colon:
Q1:one two three
^^ ^^^^^^^^^^^^^
query ID terms
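The query line format above can be illustrated with a minimal Python sketch. It is not the tool's own parser: the function name `parse_query_line` is hypothetical, and the sketch assumes that the first colon on a line always separates the query ID from the tokens, which the text above does not confirm for terms that themselves contain a colon.

```python
from typing import List, Optional, Tuple


def parse_query_line(line: str) -> Tuple[Optional[str], List[str]]:
    """Split one query line into (query ID or None, list of tokens).

    Assumption: the first colon, if any, delimits the query ID.
    """
    line = line.strip()
    query_id: Optional[str] = None
    if ":" in line:
        # Everything before the first colon is taken as the query ID.
        query_id, line = line.split(":", 1)
    # Tokens are whitespace-delimited; they are terms or term IDs
    # depending on whether --terms was given to the real tool.
    return query_id, line.split()


if __name__ == "__main__":
    print(parse_query_line("Q1:one two three"))
    print(parse_query_line("one two three"))
```

A line without a colon yields `None` as the query ID, matching the statement that the ID is optional.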
@@ -195,25 +191,51 @@ Querying
-w test_collection.wand \ # metadata file
-q ../test/test_data/queries # query input file
-This performs conjunctive queries (and). In place of and other
-operators can be used (see Query algorithms), and
-also multiple operators separated by colon (and:or:wand), which will
-run multiple passes, one per algorithm.
This performs conjunctive queries (and). Other operators can be used in place
+of and (see Query algorithms). To run multiple algorithms,
+provide -a multiple times (for example, -a and -a or -a wand).
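For scripts that still hold the older colon-separated form (and:or:wand), the following Python sketch converts such a string into repeated -a arguments. The helper name `algorithm_flags` is hypothetical and only the -a option named above is used; every other argument of the real command is left out.

```python
from typing import List


def algorithm_flags(spec: str) -> List[str]:
    """Turn 'and:or:wand' into ['-a', 'and', '-a', 'or', '-a', 'wand']."""
    flags: List[str] = []
    for name in spec.split(":"):
        if name:  # skip empty pieces such as a trailing colon
            flags += ["-a", name]
    return flags


if __name__ == "__main__":
    print(" ".join(algorithm_flags("and:or:wand")))
```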
The tool outputs JSON with query execution statistics, including the mean, median
+(q50), and percentiles (q90, q95, q99), for different per-query
+aggregations (none, min, mean, median, max).
If the WAND file is compressed, append the --compressed-wand flag.
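To make the statistics described above concrete, here is a rough Python sketch of how several runs per query might be reduced by an aggregation and then summarized. It is an illustration under assumptions, not the tool's implementation: the names `summarize` and `nearest_rank` are hypothetical, the nearest-rank percentile method is a guess, and "none" is assumed to mean that every run is kept as its own sample.

```python
import math
import statistics
from typing import Dict, List

# Per-query aggregations named in the text; "none" is handled separately.
AGGREGATIONS = {
    "min": min,
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
}


def nearest_rank(sorted_values: List[float], q: float) -> float:
    """Nearest-rank percentile (assumed method) of an ascending list."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(times_per_query: List[List[float]],
              aggregation: str = "none") -> Dict[str, float]:
    """Reduce the runs of each query, then compute summary statistics."""
    if aggregation == "none":
        # Assumption: without aggregation, each run is a separate sample.
        values = [t for runs in times_per_query for t in runs]
    else:
        reduce_runs = AGGREGATIONS[aggregation]
        values = [reduce_runs(runs) for runs in times_per_query]
    values.sort()
    return {
        "mean": statistics.mean(values),
        "q50": nearest_rank(values, 0.50),
        "q90": nearest_rank(values, 0.90),
        "q95": nearest_rank(values, 0.95),
        "q99": nearest_rank(values, 0.99),
    }


if __name__ == "__main__":
    # Three queries, three runs each (made-up timings for illustration).
    timings = [[10, 12, 11], [20, 22, 21], [30, 33, 36]]
    for name in ["none", "min", "mean", "median", "max"]:
        print(name, summarize(timings, name))
```

Taking the minimum over runs is a common way to reduce noise from other processes, while "none" keeps the full spread of observed timings.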
Supported algorithms
+The following algorithms are available via the -a option:
-
+
and
+or
+or_freq
+wand
+block_max_wand
+block_max_maxscore
+ranked_and
+block_max_ranked_and
+ranked_or
+maxscore
+ranked_or_taat
+ranked_or_taat_lazy
+
Additional options
+-
+
--runs <N>: Number of runs per query (default: 3)
+-o, --output <FILE>: Output file for per-run query timing data
+--safe: Rerun if not enough results with pruning (requires --thresholds)
+--quantized: Quantized scores
+
Build additional data
-To perform BM25 queries it is necessary to build an additional file
-containing the parameters needed to compute the score, such as the
-document lengths. The file can be built with the following command:
+To perform BM25 queries it is necessary to build an additional file containing
+the parameters needed to compute the score, such as the document lengths. The
+file can be built with the following command:
$ ./bin/create_wand_data \
-c ../test/test_data/test_collection \
-o test_collection.wand
-If you want to compress the file append --compress at the end of the
-command. When using variable-sized blocks (for VBMW) via the
---variable-block parameter, you can also specify lambda with the -l <float> or --lambda <float> flags. The value of lambda impacts the
-mean size of the variable blocks that are output. See the VBMW paper
-(listed below) for more details. If using fixed-sized blocks, which is
-the default, you can supply the desired block size using the -b <UINT> or --block-size <UINT> arguments.
If you want to compress the file, append --compress at the end of the command.
+When using variable-sized blocks (for VBMW) via the --variable-block
+parameter, you can also specify lambda with the -l <float> or
+--lambda <float> flags. The value of lambda impacts the mean size of the
+variable blocks that are output. See the VBMW paper (listed below) for more
+details. If using fixed-sized blocks, which is the default, you can supply the
+desired block size using the -b <UINT> or --block-size <UINT> arguments.