Skip to content

Python bindings Reference

Anuradha Wickramarachchi edited this page Mar 28, 2026 · 1 revision

Python bindings reference (pykmertools)

Module summary

import pykmertools as kt
from pykmertools import utils as ktutils

Top-level exports:

  • kt.OligoComputer
  • kt.CgrComputer
  • kt.KmerGenerator
  • kt.MinimiserGenerator
  • ktutils.to_numeric
  • ktutils.to_acgt

utils

to_numeric(kmer: str) -> tuple[int, int]

Converts an ACGT k-mer string to numeric forward and reverse-complement values.

fmer, rmer = ktutils.to_numeric("ACGTT")  # (111, 27)

Notes:

  • Raises ValueError when len(kmer) > 32.
  • Inputs should be biological DNA/RNA letters (A/C/G/T/U, case-insensitive).

to_acgt(kmer: int, ksize: int) -> str

Converts a numeric k-mer to an ACGT string.

ktutils.to_acgt(111, 5)  # "ACGTT"

KmerGenerator

Constructor

KmerGenerator(seq: str, ksize: int)

Creates an iterator over (forward_kmer, reverse_kmer) tuples, encoded as integers.

kg = kt.KmerGenerator("ACGTCC", 3)
for fmer, rmer in kg:
    ...

Methods

__iter__ and __next__:

  • Iterator output type: tuple[int, int]
  • Emits only valid windows.
  • Ambiguous bases break continuity; windows spanning them are skipped.

kmer_pos_maps() -> tuple[list[int], dict[int, int], int]

Returns:

  • min_mer_pos_map: length 4^k, mapping each numeric k-mer to an index of its canonical (min-complement) representative.
  • pos_min_mer_map: reverse mapping from compact index to numeric canonical k-mer.
  • min_mer_count: number of canonical k-mers.

MinimiserGenerator

Constructor

MinimiserGenerator(seq: str, wsize: int, msize: int)

Creates an iterator over minimisers in a sliding window.

mg = kt.MinimiserGenerator(seq, wsize=31, msize=7)

Methods

__iter__ and __next__:

  • Iterator output type: tuple[int, int, int]
  • Tuple is (minimiser, start, end).
  • start is inclusive and end is exclusive (seq[start:end]).
  • Ambiguous bases split minimiser runs.

to_acgt(mmer: int) -> str:

  • Converts numeric minimiser to its ACGT string using msize.

OligoComputer

Constructor

OligoComputer(k: int)

Builds an oligonucleotide vectoriser for k-mer size k.

Methods

vectorise_one(seq: str, norm: bool = True, mins: bool = True) -> list[float]

  • Returns one oligo vector for one sequence.
  • mins=True: canonical min-complement counting.
  • mins=False: raw forward-space counting.
  • norm=True: divides by total count used by implementation.
  • Ambiguous windows are skipped.

vectorise_batch(seqs: list[str], norm: bool = True, mins: bool = True) -> list[list[float]]

  • Same as vectorise_one, batched and computed in parallel.

get_header(mins: bool = True) -> list[str]

  • Returns ACGT labels matching the vector column order.

Vector lengths:

  • mins=True
    • odd k: 4^k / 2
    • even k: (4^k + 4^(k/2)) / 2
  • mins=False
    • length is 4^k

CgrComputer

Constructor

CgrComputer(vecsize: int)

Creates a CGR computer with coordinates in a vecsize x vecsize square.

Methods

vectorise_one(seq: str) -> list[tuple[float, float]]

  • Returns one (x, y) point per nucleotide.
  • Raises ValueError("Bad nucleotide, unable to proceed") on invalid bases.

vectorise_batch(seqs: list[str]) -> list[list[tuple[float, float]]]

  • Batch version of vectorise_one, computed in parallel.

Parameter expectations and limits

Current bindings do not aggressively validate all constructor arguments. Use:

  • k > 0 for OligoComputer / KmerGenerator
  • msize > 0 and wsize >= msize for MinimiserGenerator
  • k <= 32 for numeric conversion interoperability (utils.to_numeric enforces this)

If you pass invalid ranges, the underlying Rust implementation may fail.

Clone this wiki locally