Python package for preparing 454 data for use with AmpliconNoise (Quince et al., BMC Bioinformatics 2011; Quince et al., Nature Methods 2009):
raw.sff -> anoisetools -> Processed Data
The source for AmpliconNoise is also included.
For flowgram data, we target the original .sff files.
Ensure that your computer meets the minimum requirements: currently just Python 2.7, plus the requirements of AmpliconNoise.
Install BioPython if you don't have it. Note that numpy is not required for this project. If you're not planning to use BioPython for anything else, you can answer "No" when the BioPython installer prompts you about numpy.
Download and install:
curl -L https://github.com/fhcrc/ampliconnoise/tarball/master | tar xzf -
cd fhcrc-ampliconnoise-*
python2.7 setup.py install # may require sudo
See Installing Python Modules for more information and options.
Build the AmpliconNoise binaries, and ensure they are present on your
PATH.
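As a quick sanity check before running anything, a short script can confirm that the binaries are discoverable. This is a minimal sketch, not part of anoisetools: the helper names `find_on_path` and `missing_binaries` are hypothetical, and the binary names in `EXAMPLE_NAMES` are assumptions that should be replaced with whatever your AmpliconNoise build actually produces.

```python
from __future__ import print_function

import os


def find_on_path(name, path=None):
    """Return the full path of executable `name`, or None if it is not found.

    `path` is a PATH-style string; it defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_binaries(names, path=None):
    """Return the subset of `names` that cannot be found on the search path."""
    return [name for name in names if find_on_path(name, path) is None]


# Assumed names, for illustration only; adjust to match your build.
EXAMPLE_NAMES = ["PyroDist", "PyroNoiseM", "SeqDist", "SeqNoise"]

print("Missing from PATH:", missing_binaries(EXAMPLE_NAMES))
```

An empty list means every name was found; anything listed needs to be built or added to PATH.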
Running setup.py installs the anoisetools package, most of which is accessed
through the anoise script.
anoise is called with a subcommand:
anoise [subcommand]
Help can be accessed via anoise -h or anoise <subcommand> -h.
For our analyses, initial preprocessing is two steps:
- Split the original .sff file into one .sff per barcoded sample
- Process each sample using wrappers for PyroNoise and SeqNoise
To split an .sff, use anoise split, providing a file with comma-delimited
base_path_for_output,barcode,primer records, e.g.:
sample1/sample1,ATAG,TAAATGGCAGTCTAGCAGAARAAG
will write to ./sample1/sample1.sff all sequences that start with the barcode
ATAG, followed by the primer TAAATGGCAGTCTAGCAGAARAAG.
Degenerate primers should be specified as such.
If the barcode map is named barcodes.csv, and the full SFF is G0YK51K01.sff,
one would call [1]:
anoise split barcodes.csv G0YK51K01.sff
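To illustrate what a barcode map record means, the sketch below parses one record and tests whether a read starts with the barcode followed by the (possibly degenerate) primer. It is only an illustration under stated assumptions: `parse_record` and `build_pattern` are hypothetical helpers, the matching uses the standard IUPAC nucleotide codes with exact matching, and the logic anoise split actually uses may differ.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def parse_record(line):
    """Split a 'base_path_for_output,barcode,primer' record into its fields."""
    base_path, barcode, primer = [field.strip() for field in line.split(",")]
    return base_path, barcode, primer


def build_pattern(barcode, primer):
    """Compile a regex matching reads that start with barcode + primer."""
    body = "".join(IUPAC[base] for base in (barcode + primer).upper())
    return re.compile("^" + body)


base_path, barcode, primer = parse_record(
    "sample1/sample1,ATAG,TAAATGGCAGTCTAGCAGAARAAG")
pattern = build_pattern(barcode, primer)

# The degenerate base R stands for A or G, so the first two reads match
# and the third (with C at that position) does not.
prefix = "ATAG" + "TAAATGGCAGTCTAGCAGAA"
read_a = prefix + "A" + "AAG" + "ACGTACGT"
read_g = prefix + "G" + "AAG" + "ACGTACGT"
read_c = prefix + "C" + "AAG" + "ACGTACGT"
matches = [bool(pattern.match(read)) for read in (read_a, read_g, read_c)]
```

Writing the primer with its degenerate code (R here) rather than picking one concrete base is what lets both variants land in the same sample.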
For each sample in our analyses, we follow a process along the lines of:
#!/bin/sh
MPIARGS="-np 12"
TMP_DIR="."
# Run PyroNoise
# This cleans flowgrams prior
anoise pyronoise \
--mpi-args "$MPIARGS" \
--temp-dir "$TMP_DIR" \
sample1.sff
anoise truncate "{barcode}" 400 < sample1-pnoise_cd.fa > sample1-pnoise_trunc.fa
# Run SeqNoise
anoise seqnoise \
--mpi-args "$MPIARGS" \
--stub sample1 \
--temp-dir "$TMP_DIR" \
sample1-pnoise_trunc.fa \
sample1-pnoise.mapping
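When many samples need the same treatment, the three steps above could be driven from Python instead of a shell script per sample. The following is a rough sketch rather than anything shipped with anoisetools: `build_steps` and `run_steps` are hypothetical helpers, and the file names simply follow the sample1 naming pattern from the script above. Only subcommands and options that appear in that script are used.

```python
import subprocess


def build_steps(sample, mpi_args="-np 12", temp_dir="."):
    """Return (argv, stdin_path, stdout_path) tuples for one sample.

    stdin_path and stdout_path are None when the step needs no redirection.
    """
    pyronoise = ["anoise", "pyronoise",
                 "--mpi-args", mpi_args,
                 "--temp-dir", temp_dir,
                 sample + ".sff"]
    truncate = ["anoise", "truncate", "{barcode}", "400"]
    seqnoise = ["anoise", "seqnoise",
                "--mpi-args", mpi_args,
                "--stub", sample,
                "--temp-dir", temp_dir,
                sample + "-pnoise_trunc.fa",
                sample + "-pnoise.mapping"]
    return [
        (pyronoise, None, None),
        (truncate, sample + "-pnoise_cd.fa", sample + "-pnoise_trunc.fa"),
        (seqnoise, None, None),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first non-zero exit status."""
    for argv, stdin_path, stdout_path in steps:
        stdin = open(stdin_path, "rb") if stdin_path else None
        stdout = open(stdout_path, "wb") if stdout_path else None
        try:
            subprocess.check_call(argv, stdin=stdin, stdout=stdout)
        finally:
            for handle in (stdin, stdout):
                if handle is not None:
                    handle.close()


steps = build_steps("sample1")
# To execute the steps for real (requires anoise on PATH): run_steps(steps)
```

Keeping command construction separate from execution makes it easy to print or inspect the commands before committing cluster time to them.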
Both pyronoise and seqnoise create a temporary directory for processing.
If running MPI jobs spanning multiple nodes, be sure to set --temp-dir to a
location accessible from all nodes.
[1] Note: the split step creates a child process for each sample.