Python bindings Reference
import pykmertools as kt
from pykmertools import utils as ktutils
Top-level exports:
- kt.OligoComputer
- kt.CgrComputer
- kt.KmerGenerator
- kt.MinimiserGenerator
- ktutils.to_numeric
- ktutils.to_acgt
ktutils.to_numeric converts an ACGT k-mer string to its numeric forward and reverse-complement values.
fmer, rmer = ktutils.to_numeric("ACGTT")  # (111, 27)
Notes:
- Raises ValueError when len(kmer) > 32.
- Inputs should be biological DNA/RNA letters (A/C/G/T/U, case-insensitive).
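The (111, 27) result above is consistent with a 2-bit packing in which A=0, C=1, G=2, T=3 and the first base occupies the most significant bits. The following pure-Python sketch illustrates that packing and its inverse. It does not call pykmertools: the names encode_kmer and decode_kmer are hypothetical, and both the code table and the treatment of U as T are assumptions inferred from the example, not taken from the library source.

```python
# Assumed 2-bit codes, inferred from to_numeric("ACGTT") == (111, 27).
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}  # U treated as T (assumption)
BASES = "ACGT"


def encode_kmer(kmer: str) -> tuple[int, int]:
    """Return (forward, reverse_complement) integers for a k-mer."""
    if len(kmer) > 32:
        raise ValueError("k-mers longer than 32 bases do not fit in 64 bits")
    fwd = rev = 0
    for i, base in enumerate(kmer.upper()):
        code = CODES[base]  # KeyError on any other letter in this sketch
        fwd = (fwd << 2) | code
        # The complement of a code is 3 - code; writing it at bit position
        # 2*i (counted from the low end) reverses the order of the bases.
        rev |= (3 - code) << (2 * i)
    return fwd, rev


def decode_kmer(value: int, k: int) -> str:
    """Unpack a 2-bit encoded integer back into a k-letter ACGT string."""
    return "".join(BASES[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


print(encode_kmer("ACGTT"))
print(decode_kmer(111, 5))
```

The length argument to the decoder is needed because leading A bases encode as zero bits and cannot be recovered from the integer alone.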
ktutils.to_acgt converts a numeric k-mer to an ACGT string.
ktutils.to_acgt(111, 5)  # "ACGTT"
KmerGenerator(seq: str, ksize: int)
Creates an iterator over (forward_kmer, reverse_kmer) tuples, encoded as integers.
kg = kt.KmerGenerator("ACGTCC", 3)
for fmer, rmer in kg:
    ...
__iter__ and __next__:
- Iterator output type: tuple[int, int]
- Emits only valid windows.
- Ambiguous bases break continuity; windows spanning them are skipped.
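The iteration rules above can be sketched in pure Python as a rolling 2-bit encoder. This illustrates the described behaviour and is not the library's implementation: iter_kmers is a hypothetical name, and the A=0, C=1, G=2, T=3 code table is an assumption.

```python
from typing import Iterator

CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit codes


def iter_kmers(seq: str, k: int) -> Iterator[tuple[int, int]]:
    """Yield (forward, reverse_complement) integers for each valid window."""
    mask = (1 << (2 * k)) - 1   # keeps only the lowest 2*k bits
    shift = 2 * (k - 1)         # bit position of the leftmost base
    fwd = rev = 0
    valid = 0                   # consecutive unambiguous bases seen so far
    for base in seq.upper():
        code = CODES.get(base)
        if code is None:
            # An ambiguous base breaks continuity: restart the window.
            fwd = rev = valid = 0
            continue
        fwd = ((fwd << 2) | code) & mask
        rev = (rev >> 2) | ((3 - code) << shift)
        valid += 1
        if valid >= k:
            yield fwd, rev


print(list(iter_kmers("ACGTCC", 3)))
print(list(iter_kmers("ACNGT", 2)))  # windows touching N are skipped
```

Updating both strands incrementally costs constant work per base, instead of re-encoding every window from scratch.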
kmer_pos_maps() -> tuple[list[int], dict[int, int], int]
Returns:
- min_mer_pos_map: length 4^k, mapping each numeric k-mer to the index of its canonical (min-complement) representative.
- pos_min_mer_map: reverse mapping from compact index to numeric canonical k-mer.
- min_mer_count: number of canonical k-mers.
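To make the three return values concrete, here is a pure-Python sketch that builds maps of the same shape. It is an illustration under assumptions: build_pos_maps and revcomp_code are hypothetical names, the 2-bit codes are assumed to be A=0, C=1, G=2, T=3, and compact indices are assigned in ascending numeric order, which may differ from the order the library uses.

```python
def revcomp_code(kmer: int, k: int) -> int:
    """Reverse-complement a 2-bit encoded k-mer (complement = 3 - code)."""
    rc = 0
    for _ in range(k):
        rc = (rc << 2) | (3 - (kmer & 3))
        kmer >>= 2
    return rc


def build_pos_maps(k: int) -> tuple[list[int], dict[int, int], int]:
    """Return (min_mer_pos_map, pos_min_mer_map, min_mer_count) for size k."""
    index_of: dict[int, int] = {}      # canonical k-mer -> compact index
    min_mer_pos_map = [0] * (4 ** k)
    for kmer in range(4 ** k):
        canon = min(kmer, revcomp_code(kmer, k))
        if canon not in index_of:
            index_of[canon] = len(index_of)
        min_mer_pos_map[kmer] = index_of[canon]
    pos_min_mer_map = {idx: canon for canon, idx in index_of.items()}
    return min_mer_pos_map, pos_min_mer_map, len(index_of)


fwd_map, rev_map, count = build_pos_maps(2)
print(count)                     # number of canonical 2-mers
print(fwd_map[1], fwd_map[11])   # AC and GT share one compact index
```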
MinimiserGenerator(seq: str, wsize: int, msize: int)
Creates an iterator over minimisers in a sliding window.
mg = kt.MinimiserGenerator(seq, wsize=31, msize=7)
__iter__ and __next__:
- Iterator output type: tuple[int, int, int]
- Tuple is (minimiser, start, end).
- start is inclusive and end is exclusive (seq[start:end]).
- Ambiguous bases split minimiser runs.
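The tuple layout above can be illustrated with a small pure-Python sketch. It rests on several assumptions that the reference does not confirm: the minimiser is taken as the numerically smallest forward-strand m-mer in each window, consecutive windows that share a minimiser value are merged into one run, and bases are coded A=0, C=1, G=2, T=3. The names encode and minimiser_runs are hypothetical, and the library may order or merge minimisers differently.

```python
from typing import Iterator

CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit codes


def encode(mer: str) -> int:
    """Pack an unambiguous ACGT string into an integer, 2 bits per base."""
    value = 0
    for base in mer:
        value = (value << 2) | CODES[base]
    return value


def minimiser_runs(seq: str, wsize: int, msize: int) -> Iterator[tuple[int, int, int]]:
    """Yield (minimiser, start, end) runs; seq[start:end] is the span covered."""
    seq = seq.upper()
    run = None  # the (minimiser, start, end) run currently being extended
    for i in range(len(seq) - wsize + 1):
        window = seq[i:i + wsize]
        if any(base not in CODES for base in window):
            # A window touching an ambiguous base ends the current run.
            if run is not None:
                yield run
                run = None
            continue
        mmer = min(encode(window[j:j + msize]) for j in range(wsize - msize + 1))
        if run is not None and run[0] == mmer:
            run = (mmer, run[1], i + wsize)   # same minimiser: extend the run
        else:
            if run is not None:
                yield run
            run = (mmer, i, i + wsize)        # new minimiser: start a new run
    if run is not None:
        yield run


for mmer, start, end in minimiser_runs("ACGTNACGT", wsize=4, msize=2):
    print(mmer, start, end, "ACGTNACGT"[start:end])
```

This version recomputes the minimum for every window for clarity; a production implementation would normally use a rolling minimum instead.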
to_acgt(mmer: int) -> str:
- Converts a numeric minimiser to its ACGT string using msize.
OligoComputer(k: int)
Builds an oligonucleotide vectoriser for k-mer size k.
vectorise_one(seq: str, norm: bool = True, mins: bool = True) -> list[float]
- Returns one oligo vector for one sequence.
- mins=True: canonical min-complement counting.
- mins=False: raw forward-space counting.
- norm=True: divides by the total count used by the implementation.
- Ambiguous windows are skipped.
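The counting modes above can be sketched in pure Python. This is an illustration, not the library's code: oligo_vector, column_kmers and label are hypothetical names, the 2-bit codes A=0, C=1, G=2, T=3 are assumed, columns are assumed to be in ascending numeric order, and norm=True is assumed to divide by the number of counted (unambiguous) windows.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit codes
BASES = "ACGT"


def revcomp_code(kmer: int, k: int) -> int:
    """Reverse-complement a 2-bit encoded k-mer."""
    rc = 0
    for _ in range(k):
        rc = (rc << 2) | (3 - (kmer & 3))
        kmer >>= 2
    return rc


def column_kmers(k: int, mins: bool = True) -> list[int]:
    """Numeric k-mers that own a column: canonical ones only when mins=True."""
    if mins:
        return [m for m in range(4 ** k) if m <= revcomp_code(m, k)]
    return list(range(4 ** k))


def label(kmer: int, k: int) -> str:
    """ACGT label for a numeric k-mer."""
    return "".join(BASES[(kmer >> (2 * (k - 1 - i))) & 3] for i in range(k))


def oligo_vector(seq: str, k: int, norm: bool = True, mins: bool = True) -> list[float]:
    columns = column_kmers(k, mins)
    col_of = {m: i for i, m in enumerate(columns)}
    counts = [0.0] * len(columns)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if any(base not in CODES for base in window):
            continue  # ambiguous windows are skipped
        fwd = 0
        for base in window:
            fwd = (fwd << 2) | CODES[base]
        key = min(fwd, revcomp_code(fwd, k)) if mins else fwd
        counts[col_of[key]] += 1.0
    total = sum(counts)
    if norm and total > 0:
        counts = [c / total for c in counts]
    return counts


header = [label(m, 2) for m in column_kmers(2)]
print(dict(zip(header, oligo_vector("ACGT", 2, norm=False))))
```

In "ACGT" the windows AC and GT are reverse complements of each other, so with mins=True both are counted in the AC column.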
vectorise_batch(seqs: list[str], norm: bool = True, mins: bool = True) -> list[list[float]]
- Same as vectorise_one, batched and computed in parallel.
get_header(mins: bool = True) -> list[str]
- Returns ACGT labels matching the vector column order.
Vector lengths:
- mins=True
  - odd k: 4^k / 2
  - even k: (4^k + 4^(k/2)) / 2
- mins=False
  - length is 4^k
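The length formulas can be checked against a brute-force count of canonical k-mers. The extra 4^(k/2) term for even k comes from k-mers that equal their own reverse complement (for example ACGT), which cannot occur when k is odd. The sketch below uses hypothetical helper names and does not call pykmertools.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def oligo_vector_length(k: int, mins: bool = True) -> int:
    """Closed-form vector length from the formulas above."""
    if not mins:
        return 4 ** k
    if k % 2 == 1:
        return 4 ** k // 2
    return (4 ** k + 4 ** (k // 2)) // 2


def brute_force_canonical_count(k: int) -> int:
    """Count distinct canonical k-mers by enumerating all 4^k strings."""
    canonical = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        revcomp = kmer.translate(COMPLEMENT)[::-1]
        canonical.add(min(kmer, revcomp))
    return len(canonical)


for k in range(1, 7):
    print(k, oligo_vector_length(k), brute_force_canonical_count(k))
```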
CgrComputer(vecsize: int)
Creates a CGR computer with coordinates in a vecsize x vecsize square.
vectorise_one(seq: str) -> list[tuple[float, float]]
- Returns one (x, y) point per nucleotide.
- Raises ValueError("Bad nucleotide, unable to proceed") on invalid bases.
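For readers unfamiliar with chaos game representation (CGR), the following pure-Python sketch shows the usual construction: start at the centre of the square and, for each nucleotide, move halfway towards that nucleotide's corner. The corner assignment (A bottom-left, C top-left, G top-right, T bottom-right), the centre starting point and the name cgr_points are assumptions for illustration; the library's layout and scaling may differ.

```python
def cgr_points(seq: str, vecsize: int) -> list[tuple[float, float]]:
    """Return one (x, y) point per nucleotide inside a vecsize x vecsize square."""
    size = float(vecsize)
    corners = {  # assumed corner layout
        "A": (0.0, 0.0),
        "C": (0.0, size),
        "G": (size, size),
        "T": (size, 0.0),
    }
    x = y = size / 2  # assumed starting point: centre of the square
    points = []
    for base in seq.upper():
        if base not in corners:
            raise ValueError("Bad nucleotide, unable to proceed")
        cx, cy = corners[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # midpoint towards the corner
        points.append((x, y))
    return points


print(cgr_points("AG", 8))
```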
vectorise_batch(seqs: list[str]) -> list[list[tuple[float, float]]]
- Batch version of vectorise_one, computed in parallel.
Current bindings do not aggressively validate all constructor arguments. Use:
- k > 0 for OligoComputer / KmerGenerator
- msize > 0 and wsize >= msize for MinimiserGenerator
- k <= 32 for numeric conversion interoperability (utils.to_numeric enforces this)
If you pass invalid ranges, the underlying Rust implementation may fail.
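Because the bindings may not reject bad arguments themselves, one option is to check the ranges in Python before constructing any object. The helpers below are a hypothetical sketch (check_kmer_size and check_minimiser_sizes are not part of pykmertools) that encodes only the constraints listed above.

```python
def check_kmer_size(k: int, numeric: bool = False) -> None:
    """Validate k for OligoComputer / KmerGenerator style arguments."""
    if k <= 0:
        raise ValueError(f"k must be positive, got {k}")
    if numeric and k > 32:
        raise ValueError(f"k must be <= 32 for numeric conversion, got {k}")


def check_minimiser_sizes(wsize: int, msize: int) -> None:
    """Validate window and minimiser sizes for MinimiserGenerator."""
    if msize <= 0:
        raise ValueError(f"msize must be positive, got {msize}")
    if wsize < msize:
        raise ValueError(f"wsize ({wsize}) must be >= msize ({msize})")


# Intended use: validate first, then construct the generator.
check_minimiser_sizes(wsize=31, msize=7)
check_kmer_size(3, numeric=True)
```

Failing early with a Python ValueError gives a clearer message than an error surfacing from the Rust layer.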
kmertools - k-mer driven genomics analytics toolkit