Description
Hi all, I was trying to run the latest version of Anacapa through the Singularity container and found a few lines of code that need fixing before it will run. The first is an erroneous (and duplicated) print statement on line 327 of blca_from_bowtie.py, which terminates the script and therefore causes the pipeline to fail.
The second is a problem in local mode in the run_*_blca.sh scripts, which pass -p ${DB}/muscle as the MUSCLE path to blca_from_bowtie.py, where ${DB} points to the Anacapa_db directory. The result is a failure on line 369. The scripts should instead use the MUSCLE path specified in anacapa_config.sh (possibly related to issue #40).
As I'm not sure whether this pipeline is still being maintained, I've attached all code that is required to get Anacapa running through Singularity in local mode. NOTE: I get slightly different taxonomy annotations in the 12S example, see issue #60.
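Because the fix below deletes line 327 by number, it may be worth checking that the line really is a print statement before removing it, in case the file has changed upstream. The following is a minimal sketch, not part of Anacapa: the helper name `delete_line_if_print` is hypothetical, the check only looks for the word "print" (the exact content of line 327 is not reproduced here), and the demo runs on a throwaway file rather than on blca_from_bowtie.py.

```shell
#!/bin/bash
# Hypothetical helper (not part of Anacapa): delete line LINENO of FILE
# only if that line contains the word "print". A .bak backup is kept.
delete_line_if_print() {
    local file="$1" lineno="$2"
    if sed -n "${lineno}p" "${file}" | grep -q 'print'; then
        sed -i.bak "${lineno}d" "${file}"
        echo "deleted line ${lineno} of ${file}"
    else
        echo "line ${lineno} of ${file} has no print statement; left unchanged" >&2
        return 1
    fi
}

# Demo on a throwaway file standing in for the real script
demo="$(mktemp)"
printf 'import sys\nprint "a" print "b"\nx = 1\n' > "${demo}"
delete_line_if_print "${demo}" 2
cat "${demo}"
```

For the real file, the call would presumably be `delete_line_if_print "${BLCA_PY_PATH}" 327`, using the path variable defined in the script below.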
Download and modify files
#!/bin/bash
# Path to install Anacapa
BASE_PATH="/path/to/preferred/directory"
ANACAPA_PATH="${BASE_PATH}/anacapa"
# Download singularity container
mkdir -p "${ANACAPA_PATH}"
cd "${ANACAPA_PATH}"
# The URL is quoted so that the shell does not treat '?' as a glob character
wget "https://zenodo.org/record/2602180/files/anacapa-1.5.0.img?download=1" \
-O anacapa-1.5.0.img
# Test if container can be executed and whether muscle is callable
# singularity shell ${ANACAPA_PATH}/anacapa-1.5.0.img
# muscle
# exit
# Clone the Anacapa repository
git clone https://github.com/limey-bean/Anacapa
# Replace the configuration with one for singularity usage
CONFIG_PATH=${ANACAPA_PATH}/Anacapa/Anacapa_db/scripts/anacapa_config.sh
mv ${CONFIG_PATH} ${CONFIG_PATH/config.sh/config.sh.bak}
wget https://raw.githubusercontent.com/dat-ecosystem-archive/anacapa-container/master/config/anacapa_config.sh \
-O ${CONFIG_PATH}
# Remove a print statement that causes an error in the BLCA procedure
BLCA_PY_PATH=${ANACAPA_PATH}/Anacapa/Anacapa_db/scripts/blca_from_bowtie.py
sed -i.bak '327d' ${BLCA_PY_PATH}
# Alter the path to MUSCLE in the run_blca.sh script. It assumes that
# MUSCLE is callable from Anacapa_db/muscle, whereas in the container it
# is found via the $PATH variable. Calling it from $PATH is the default
# behaviour, so the -p option can simply be removed.
BLCA_SH_PATH=${ANACAPA_PATH}/Anacapa/Anacapa_db/scripts/run_blca.sh
BOWTIE_BLCA_SH_PATH=${ANACAPA_PATH}/Anacapa/Anacapa_db/scripts/run_bowtie2_blca.sh
sed -i.bak 's;-p ${DB}/muscle;;g' ${BLCA_SH_PATH}
sed -i.bak 's;-p ${DB}/muscle;;g' ${BOWTIE_BLCA_SH_PATH}
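To confirm that the two sed substitutions above actually removed the option, the patched scripts can be searched for the literal string. The snippet below is a rough sketch under stated assumptions: `has_muscle_flag` is a hypothetical helper, and the demo line (`some_command ... --other-arg`) is a made-up stand-in, not the real call inside run_blca.sh.

```shell
#!/bin/bash
# Hypothetical check: succeed if FILE still contains the literal string
# '-p ${DB}/muscle'. Single quotes keep ${DB} from being expanded, -F
# matches a fixed string, and -e stops grep reading '-p' as an option.
has_muscle_flag() {
    grep -qF -e '-p ${DB}/muscle' "$1"
}

# Demo on a throwaway file; the command line is a stand-in, not the
# actual content of run_blca.sh
demo="$(mktemp)"
printf '%s\n' 'some_command -p ${DB}/muscle --other-arg' > "${demo}"
sed -i.bak 's;-p ${DB}/muscle;;g' "${demo}"

if has_muscle_flag "${demo}"; then
    echo "still present"
else
    echo "removed"
fi

# Show exactly what changed relative to the backup
diff "${demo}.bak" "${demo}" || true
```

On the real files, `has_muscle_flag "${BLCA_SH_PATH}"` and `has_muscle_flag "${BOWTIE_BLCA_SH_PATH}"` should both fail (exit non-zero) after patching, and the diff against the .bak copies shows which lines were touched.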
Run the 12S example
#!/bin/bash
# Define paths
BASE_PATH="/path/to/preferred/directory"
ANACAPA_PATH="${BASE_PATH}/anacapa/Anacapa"
CONTAINER_PATH="${BASE_PATH}/anacapa/anacapa-1.5.0.img"
# Unzip the 12S bowtie and tax databases
cd ${ANACAPA_PATH}
# The member pattern is quoted so that unzip, not the shell, expands it
unzip "${ANACAPA_PATH}/Example_data/12S_Oct2019.zip" '12S_Oct2019/*'
mv 12S_Oct2019 ${ANACAPA_PATH}/Anacapa_db/12S
# Test the QC and ASV parsing module
singularity exec -B ${ANACAPA_PATH} ${CONTAINER_PATH} /bin/bash -c "${ANACAPA_PATH}/Anacapa_db/anacapa_QC_dada2.sh \
-i ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data \
-o ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test \
-d ${ANACAPA_PATH}/Anacapa_db \
-f ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data/forward.txt \
-r ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data/reverse.txt \
-e ${ANACAPA_PATH}/Anacapa_db/metabarcode_loci_min_merge_length.txt \
-a nextera \
-t MiSeq \
-l"
# Compare the output with the expected output. Should not return any lines.
OUT="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test/12S"
EXP="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/Anacapa_test_data_expected_output_after_QC_dada2/12S"
git diff --no-index --stat ${EXP} ${OUT}
# Test the taxonomic classification module
singularity exec -B ${ANACAPA_PATH} ${CONTAINER_PATH} /bin/bash -c \
"${ANACAPA_PATH}/Anacapa_db/anacapa_classifier.sh \
-o ${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test \
-d ${ANACAPA_PATH}/Anacapa_db \
-l"
# Compare the output with the expected output
OUT="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_time_test/12S"
EXP="${ANACAPA_PATH}/Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/Anacapa_test_data_expected_output_after_classifier/12S"
git diff --no-index --stat ${EXP} ${OUT}
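If git is not available on the machine running the tests, a similar comparison can be made with plain diff. This is only a sketch of an alternative, not what the commands above do: `compare_dirs` is a hypothetical helper, and the demo uses temporary directories with made-up file contents instead of the Anacapa output.

```shell
#!/bin/bash
# Hypothetical helper: recursively compare two directories and report
# whether they match. diff -rq lists only the files that differ.
compare_dirs() {
    local expected="$1" actual="$2"
    if diff -rq "${expected}" "${actual}"; then
        echo "MATCH"
    else
        echo "DIFFERENT"
        return 1
    fi
}

# Demo on temporary directories with placeholder contents
exp="$(mktemp -d)"
out="$(mktemp -d)"
echo "ASV_1" > "${exp}/table.txt"
cp "${exp}/table.txt" "${out}/table.txt"
compare_dirs "${exp}" "${out}"

# Introduce a difference and compare again
echo "ASV_2" >> "${out}/table.txt"
compare_dirs "${exp}" "${out}" || true
```

The first call reports a match; the second lists the differing file before reporting a difference. With the variables from the script above, the call would be `compare_dirs "${EXP}" "${OUT}"`.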
Edit: added reference to issue #60.