@@ -280,12 +280,10 @@ and also inefficient, because the allele frequencies for one population
280280gets used in computing many of those values.
281281So, statistics that take a `sample_sets` argument also take an `indexes` argument,
282282which for a statistic that operates on `k` sample sets will be a list of `k`-tuples.
283- If `indexes` is a length `n` list of `k`-tuples,
284- then the output will have `n` columns,
285- and if `indexes[j]` is a tuple `(i0, ..., ik)`,
286- then the `j`-th column will contain values of the statistic computed on
287- `(sample_sets[i0], sample_sets[i1], ..., sample_sets[ik])`.
288-
283+ If `indexes` is a length `n` list of `k`-tuples, then the output will have `n`
284+ columns, so if `indexes[j]` is the tuple `(i0, ..., ik)`
285+ the `j`-th column will contain values of the statistic computed on
286+ `(sample_sets[i0], sample_sets[i1], ..., sample_sets[ik])`.
289287
290288How multiple statistics are handled differs slightly between statistics
291289that operate on single sample sets and multiple sample sets.
@@ -338,14 +336,28 @@ The `indexes` parameter is interpreted in the following way:
338336 statistic selecting the specified sample sets and we remove the last dimension
339337 in the result array as described in the {ref}`sec_stats_output_dimensions` section.
340338
341- - If if is `None` and `sample_sets` contains exactly `k` sample sets,
339+ - If it is `None` and `sample_sets` contains exactly `k` sample sets,
342340 this is equivalent to `indexes=range(k)`. **Note
343341 that we also drop the outer dimension of the result array in this case**.
344342
345- - If is is a list of `k`-tuples (each consisting of integers
343+ - If it is a list of `k`-tuples (each consisting of integers
346344 of integers between `0` and `len(sample_sets) - 1`) of length `n` we
347345 compute `n` statistics based on these selections of sample sets.
348346
347+ For instance, a common requirement in the case of a pairwise (i.e. `k=2`) symmetrical
348+ statistic such as `divergence` is to obtain the statistic between all pairs of
349+ sample sets. This can be done by indexing all possible pairs as follows:
350+
351+ ```{code-cell} ipython3
352+ # Compute all pairwise divergences between diploid individuals. Or e.g. for pairwise
353+ # divergences between each sampled genome, use `sample_sets=[[s] for s in ts.samples()]`
354+ sample_sets=[i.nodes for i in ts.individuals()]
355+ index_all_pairs = np.triu_indices(len(sample_sets)) # ((0,1), (0,2), ...
356+ d = ts.divergence(sample_sets, indexes=tuple(zip(*index_all_pairs)))
357+ pairwise_matrix = np.zeros((len(sample_sets), len(sample_sets)))
358+ pairwise_matrix[index_all_pairs] = d
359+ print(pairwise_matrix) # NB: only upper triangular values filled
360+ ```
349361
350362(sec_stats_windows)=
351363
0 commit comments