38 commits
d5be260
first web-specific commit
Aug 17, 2016
58546d0
correct address for awsMPDWeb1
akotlar Aug 17, 2016
d5fc8d0
working message passing to front end, working queue server
Aug 18, 2016
965e772
Merge branch 'web' of github.com:akotlar/mpd-perl into web
Aug 18, 2016
a6a8b7e
minor edits for debugging
Aug 20, 2016
2ac7650
check we can use Excel::Writer::XLSX
Aug 20, 2016
327e905
rm unused
Aug 20, 2016
49531bd
add lib for dev
Aug 20, 2016
3a67a16
fixes from dev, updates to hg38 config
Aug 21, 2016
b8cbfd0
updated beanstalk server to support updated config
akotlar Aug 21, 2016
7da1f9e
moved more options into advanced
Aug 23, 2016
7e895f2
store bool values as JSON boolean types
akotlar Aug 23, 2016
6c01114
option description: more concise, provide forward/reverse adapter des…
akotlar Aug 23, 2016
e4a4ce2
fixed JSON::PP::Boolean not being coerced into Moose Bool type
Aug 23, 2016
9e97538
Merge branch 'web' of github.com:akotlar/mpd-perl into web
Aug 23, 2016
992599e
added parameter checking to iterative primer design
Aug 24, 2016
a463891
code tidy
Aug 24, 2016
ab9b124
working on beanstalkd
Aug 27, 2016
e4ba747
Messaging fixes to primer design
akotlar Aug 27, 2016
98c4da1
changed beanstalk queue to reflect seq-web api changes; fixed mispell…
akotlar Sep 2, 2016
5ff8946
forgot to include IO role that handles compression
akotlar Sep 2, 2016
c9220a3
moved mpd-dat folder to something mpd-specific
akotlar Mar 4, 2017
46ae56e
reduced logging
akotlar Feb 17, 2018
1b1a1f7
added line limit to production version; output inputHref for debug
akotlar Feb 17, 2018
8798b23
simple check that bed file is tab delimited
akotlar Feb 28, 2018
73c9789
allow users to submit a header in their bed file
akotlar Mar 16, 2018
dcdc83d
default, web-like config, that separates core and user attributes. ne…
akotlar Oct 23, 2018
0e39f5a
provide relative path to mpd; moosex type constraint file won't searc…
akotlar Oct 23, 2018
547f2bf
working Dockerfile, updated readme to show use
akotlar Oct 23, 2018
735fe9d
no Mouse::Role -> no Moose::Role ; add beanstalk queue-related deps
akotlar Oct 23, 2018
6002fd2
Create beanstalk_queue_server_test.pl
akotlar Apr 5, 2019
b29819b
Merge pull request #2 from akotlar/web
Jmeigs1 Apr 12, 2019
9d55887
Enforce 2000bp size limit on bed file input
Jmeigs1 Apr 15, 2019
7961e6a
Merge master into web
Jmeigs1 Apr 15, 2019
ac0ff22
Remove 2000bp check on covered object post testing.
Jmeigs1 Apr 15, 2019
3365785
Add test and note fatal error capture
Jmeigs1 Apr 22, 2019
220806e
Merge pull request #5 from Jmeigs1/web
Jmeigs1 Apr 22, 2019
869c24d
Change isPCR related moose fields
Jmeigs1 Apr 22, 2019
57 changes: 57 additions & 0 deletions Dockerfile
@@ -0,0 +1,57 @@
# Based on Jeremiah H. Savage <jeremiahsavage@gmail.com> 's kent image
FROM fedora:28

MAINTAINER Alex Kotlar <akotlar@emory.edu>

ENV PATH="/root/mpd-perl/bin:${PATH}" \
    PERL5LIB="/root/perl5/lib/perl5:/root/mpd-perl/lib:${PERL5LIB}"

WORKDIR /root

RUN dnf install -y \
    gcc \
    gcc-c++ \
    libpng-devel \
    libuuid-devel \
    make \
    mariadb-devel \
    patch \
    perl \
    rsync \
    unzip \
    wget \
    which \
    git \
    openssh-clients

RUN wget http://hgdownload.cse.ucsc.edu/admin/jksrc.v371.zip \
    && unzip -q jksrc.v371.zip \
    && rm jksrc.v371.zip

RUN mkdir -p bin/x86_64 \
    && export MACHTYPE=x86_64 \
    && cd kent/src/ && make libs \
    && cd lib/ && make \
    && cd ../jkOwnLib/ && make \
    && cd ../isPcr/ && make \
    && cd /root && rm -rf kent

WORKDIR /root

# Spell out the directory instead of relying on the bash-specific $_ parameter
RUN git clone https://bitbucket.org/wingolab/mpd-dat \
    && mkdir /root/2bit && cd /root/2bit \
    && wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit \
    && cd /root

WORKDIR /root

ADD ./ /root/mpd-perl/

RUN curl -L https://cpanmin.us | perl - App::cpanminus \
    && mkdir -p /root/perl5/lib/perl \
    && cpanm --local-lib=/root/perl5 local::lib && eval $(perl -I /root/perl5/lib/perl5 -Mlocal::lib) \
    && cd /root/mpd-perl && cpanm MPD.tar.gz && cpanm --installdeps . \
    && git clone https://github.com/wingolab-org/mpd-c /root/mpd-c \
    && cd /root/mpd-c && make

WORKDIR /root/mpd-perl/
Binary file removed MPD-0.001.tar.gz
Binary file not shown.
Binary file added MPD.tar.gz
Binary file not shown.
25 changes: 24 additions & 1 deletion README.md
@@ -6,7 +6,29 @@ This package assists in the automation of multiplex primer design. This package

Please cite our [paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1453-3) if you use MPD in your work. Thanks.

## Installation
## Install in Docker

Run mpd-perl inside a Docker container. The image is configured for hg38.

```sh
git clone https://github.com/wingolab-org/mpd-perl && cd mpd-perl
docker build -t mpd ./

# Run
docker run <docker_args> mpd design.pl <mpd_args>
```

Example running MPD from within Docker:

```sh
# Assuming you have ~/data on the host and wish to mount it as /data inside the Docker container,
# that ~/data/markers.txt.bed holds your targets, and that you wish to write to ~/data/outdir/outfile.txt.
# config/hg38.yml comes installed inside this Docker image;
# if you wish, you can pass in your own config instead.
# Paths given to design.pl are resolved inside the container, so use the /data mount point.
docker run -v ~/data:/data mpd design.pl -b /data/markers.txt.bed -c config/hg38.yml -d /data/outdir -o outfile.txt
```

## Manual Installation

- Compile the C binaries by following the instructions here: [mpd-c](http://github.com/wingolab-org/mpd-c).
- Clone the perl MPD package (e.g., `git clone https://github.com/wingolab-org/mpd-perl.git`).
@@ -18,6 +40,7 @@ Please cite our [paper](https://bmcbioinformatics.biomedcentral.com/articles/10.
- See the example scripts in the `ex` directory, or look at the tests (specifically `t/05-Mpd.t`), to see how to build and use the MPD object.

## Optional features

- The MPD package can be made to use the standalone isPcr binary by Jim Kent. If you are not familiar with isPcr, [here is a web version](https://genome.ucsc.edu/cgi-bin/hgPcr), which has details about obtaining the source code to build the standalone binary.
- If you use isPcr, you will need the 2bit genome file for the organism.

273 changes: 273 additions & 0 deletions bin/beantalk_queue_server.pl
@@ -0,0 +1,273 @@
#!/usr/bin/env perl
# Name: snpfile_annotate_mongo_redis_queue.pl
# Description:
# Date Created: Wed Dec 24
# By: Alex Kotlar
# Requires: Snpfile::AnnotatorBase

#Todo: Handle job expiration (what happens when job:id expired; make sure no other job operations happen, let Node know via sess:?)
#There may be much more performant ways of handling this without loss of reliability; look at just storing the entire message in perl, and relying on decode_json
#Todo: (Probably in Node.js): add failed jobs, and those stuck in processingJobs list for too long, back into job queue, for N attempts (stored in jobs:jobID)

use 5.10.0;
use strict;
use warnings;

use Beanstalk::Client;
use Parallel::ForkManager;
use Cpanel::JSON::XS;
use DDP output => 'stdout';
use Getopt::Long;
use File::Basename;
use Log::Any::Adapter;
use Path::Tiny;
use Try::Tiny;
use Hash::Merge::Simple qw/merge/;
use YAML::XS qw/LoadFile/;

use lib './lib';

use MPD;

# use AnyEvent;
# use AnyEvent::PocketIO::Client;
#use Sys::Info;
#use Sys::Info::Constants qw( :device_cpu )
#for choosing max connections based on available resources

# max of 1 job at a time for now

my $DEBUG = 0;
my $conf = LoadFile("./config/queue.yaml");

# Beanstalk servers will be sharded
my $beanstalkHost = $conf->{beanstalk_host_1};
my $beanstalkPort = $conf->{beanstalk_port_1};

my $configPathBaseDir = "./config/web/";

my $verbose = 1;

my $beanstalk = Beanstalk::Client->new(
  {
    server          => $conf->{beanstalkd}{host} . ':' . $conf->{beanstalkd}{port},
    default_tube    => $conf->{beanstalkd}{tubes}{annotation}{submission},
    connect_timeout => 1,
    encoder         => sub { encode_json( \@_ ) },
    decoder         => sub { @{ decode_json(shift) } },
  }
);

my $beanstalkEvents = Beanstalk::Client->new(
  {
    server          => $conf->{beanstalkd}{host} . ':' . $conf->{beanstalkd}{port},
    default_tube    => $conf->{beanstalkd}{tubes}{annotation}{events},
    connect_timeout => 1,
    encoder         => sub { encode_json( \@_ ) },
    decoder         => sub { @{ decode_json(shift) } },
  }
);
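
Review note: the `encoder`/`decoder` callbacks above serialize an argument list as a JSON array and expand it back into a list. A minimal round-trip sketch, assuming core `JSON::PP` as a stand-in for `Cpanel::JSON::XS` and a made-up payload (`design`, `assembly`), neither of which comes from this PR:

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);

# Same callback bodies as in the constructors above.
my $encoder = sub { encode_json( \@_ ) };
my $decoder = sub { @{ decode_json(shift) } };

# Hypothetical job arguments, for illustration only.
my $wire = $encoder->( 'design', { assembly => 'hg38' } );
my @args = $decoder->($wire);

print "$wire\n";
print scalar(@args), " arguments restored\n";
```

The event messages sent later in this script pass an already-encoded `data` string to `put`, so as far as I can tell they do not go through this encoder.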

# my $pm = Parallel::ForkManager->new(8);

while ( my $job = $beanstalk->reserve ) {

  # Parallel ForkManager used only to throttle number of jobs run in parallel
  # cannot use run_on_finish with blocking reserves, use try catch instead
  # Also using forks helps clean up leaked memory from LMDB_File
  # Unfortunately, parallel fork manager doesn't play nicely with try tiny
  # prevents anything within the try from executing
  my $jobDataHref = decode_json( $job->data );

  $beanstalkEvents->put(
    {
      priority => 0,
      data     => encode_json(
        {
          event   => 'started',
          queueId => $job->id,
        }
      )
    }
  );

  my ( $err, $result ) = handleJob( $jobDataHref, $job->id );

  if ($err) {
    say "job " . $job->id . " failed with $err";

    $beanstalkEvents->put(
      {
        priority => 0,
        data     => encode_json(
          {
            event   => 'failed',
            queueId => $job->id,
            reason  => $err,
          }
        )
      }
    );

    $beanstalk->bury( $job->id );
  }
  else {
    say "completed job with queue id " . $job->id;

    # Signal completion before completion actually occurs via delete
    # To be conservative; since after delete message is lost
    $beanstalkEvents->put(
      {
        priority => 0,
        data     => encode_json(
          {
            event   => 'completed',
            queueId => $job->id,
            results => $result,
          }
        )
      }
    );

    $beanstalk->delete( $job->id );
  }

  say "finished";
}

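sub handleJob {
  my $submittedJob = shift;
  my $queueId      = shift;

  my $inputHref = coerceInputs( $submittedJob, $queueId );

  say "inputHref is";
  p $inputHref;

  # try/catch blocks are anonymous subs, so `return` inside them only leaves
  # the block. Each block ends with its (error, result) pair, and the value of
  # the whole try/catch statement is returned explicitly.
  return try {
    my $dir = path( $inputHref->{OutDir} );

    if ( !$dir->is_dir ) { $dir->mkpath(); }

    my $m = MPD->new_with_config($inputHref);

    say "MPD is ";
    p $m;

    my $result = $m->RunAll();

    ( undef, $result );
  }
  catch {
    # Stringify in case the exception is an object
    my $message = "$_";

    # Report only the text before the first "MPD::", if there is any
    my $indexOfConstructor = index( $message, "MPD::" );

    my $failed =
      $indexOfConstructor > 0
      ? substr( $message, 0, $indexOfConstructor )
      : $message;

    ( $failed, undef );
  };
}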
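
Review note: `handleJob` depends on how `return` behaves inside a block handed to `try`. Such a block is an anonymous sub, so `return` exits the block rather than the enclosing sub. A minimal sketch of that behavior using only core Perl; `with_block` is a hypothetical stand-in for `try` (it simply runs the block it is given) and is not part of Try::Tiny:

```perl
use strict;
use warnings;

# Hypothetical stand-in for Try::Tiny's try: it just runs the block.
sub with_block(&) {
  my $block = shift;
  return $block->();
}

sub returns_inside_block {
  with_block { return 'from block' };    # leaves the block only
  return 'from sub';                      # still reached
}

sub returns_block_value {
  # Explicitly return whatever the block evaluated to.
  return with_block { 'from block' };
}

print returns_inside_block(), "\n";    # prints "from sub"
print returns_block_value(),  "\n";    # prints "from block"
```

When the `try`/`catch` statement is the last statement of a sub, its value becomes the sub's return value anyway; writing `return try { ... } catch { ... };` makes that dependency visible.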

#Here we may wish to read a json or yaml file containing argument mappings
sub coerceInputs {
  my $jobDetailsHref = shift;
  my $queueId        = shift;

  my $inputFilePath = $jobDetailsHref->{inputFilePath};
  my $outputDir     = $jobDetailsHref->{dirs}{out};
  my $outputExt     = $jobDetailsHref->{name};

  my $configFilePath = getConfigFilePath( $jobDetailsHref->{assembly} );

  my $config = LoadFile($configFilePath);

  my $coreHref = $config->{Core};

  ########## Gather basic and advanced options ###################
  my $basic    = $config->{User}{Basic};
  my $advanced = $config->{User}{Advanced};

  my %basicOptions    = map { $_->{name} => $_->{val} } @$basic;
  my %advancedOptions = map { $_->{name} => $_->{val} } @$advanced;

  my $userBasic    = $jobDetailsHref->{options}{Basic};
  my $userAdvanced = $jobDetailsHref->{options}{Advanced};

  my %userBasicOptions    = map { $_->{name} => $_->{val} } @$userBasic;
  my %userAdvancedOptions = map { $_->{name} => $_->{val} } @$userAdvanced;

  # right hand precedence;

  my $mergedConfig =
    merge( $coreHref, \%basicOptions, \%advancedOptions, \%userBasicOptions,
    \%userAdvancedOptions );

  # JSON::PP::Boolean will not pass moose constraint for Bool
  foreach my $val ( values %$mergedConfig ) {
    if ( ref $val eq 'JSON::PP::Boolean' ) {
      $val = !!$val;
    }
  }

  $mergedConfig->{publisher} = {
    server      => $conf->{beanstalkd}{host} . ':' . $conf->{beanstalkd}{port},
    queue       => $conf->{beanstalkd}{tubes}{annotation}{events},
    messageBase => {
      event   => 'progress',
      queueId => $queueId,
      data    => undef,
    },
  };

  $mergedConfig->{configfile}  = $configFilePath;
  $mergedConfig->{BedFile}     = $jobDetailsHref->{inputFilePath};
  $mergedConfig->{OutExt}      = $jobDetailsHref->{name};
  $mergedConfig->{OutDir}      = $jobDetailsHref->{dirs}{out};
  $mergedConfig->{ProjectName} = $jobDetailsHref->{name};

  # need to compress so web can get a single file to download
  $mergedConfig->{compress} = 1;

  return $mergedConfig;
}
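
Review note: the right-hand-precedence merge and the boolean coercion in `coerceInputs` can be tried in isolation. A minimal sketch, assuming core `JSON::PP` in place of `Cpanel::JSON::XS` and a flat list merge in place of `Hash::Merge::Simple::merge` (which also merges nested hashes recursively); the option values are made up, with names borrowed from the commented example hash in this file:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical config defaults and user-submitted options.
my %defaults = ( RunIsPcr => 0, IterMax => 2 );
my $user     = decode_json('{"RunIsPcr": true, "Randomize": false}');

# Right-hand precedence for flat hashes: later pairs overwrite earlier ones.
my %merged = ( %defaults, %$user );

# values() aliases the hash elements, so assigning to $val updates %merged.
for my $val ( values %merged ) {
  $val = !!$val if ref $val eq 'JSON::PP::Boolean';
}

for my $key ( sort keys %merged ) {
  printf "%s => '%s' (ref: '%s')\n", $key, $merged{$key}, ref $merged{$key};
}
```

After the loop the JSON booleans are plain Perl scalars (`1` for true, the empty string for false), which is what a `Bool` type constraint expects. Only top-level values are visited, as in the original loop.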

# {
# configfile => $config_file,
# BedFile => $bed_file,
# OutExt => $out_ext,
# OutDir => $dir,
# InitTmMin => 58,
# InitTmMax => 61,
# PoolMin => $poolMin,
# Debug => $verbose,
# IterMax => 2,
# RunIsPcr => 0,
# Act => $act,
# ProjectName => $out_ext,
# FwdAdapter => 'ACACTGACGACATGGTTCTACA',
# RevAdapter => 'TACGGTAGCAGAGACTTGGTCT',
# Offset => 0,
# Randomize => 1,
# a => 1
# }

sub getConfigFilePath {
  my $assembly = shift;

  my @maybePath =
    glob( path($configPathBaseDir)->child( $assembly . ".y*ml" )->stringify );

  if ( scalar @maybePath ) {
    if ( scalar @maybePath > 1 ) {
      #should log
      say "\n\nMore than 1 config path found, choosing first";
    }

    return $maybePath[0];
  }

  die "\n\nNo config path found for the assembly $assembly. Exiting\n\n";
}
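
Review note: the `.y*ml` pattern in `getConfigFilePath` matches both `.yml` and `.yaml`, which is how more than one config path can turn up for a single assembly. A minimal sketch in a throwaway directory, assuming made-up file names and using core `File::Glob::bsd_glob` instead of plain `glob` so that a temp path containing whitespace is not split into several patterns:

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway config directory with hypothetical file names.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(hg38.yml hg38.yaml hg19.yml)) {
  open my $fh, '>', File::Spec->catfile( $dir, $name )
    or die "cannot create $name: $!";
  close $fh;
}

# Same pattern shape as getConfigFilePath: <assembly>.y*ml
my @found = bsd_glob( File::Spec->catfile( $dir, 'hg38.y*ml' ) );

print scalar(@found), " match(es)\n";
print "  $_\n" for @found;
```

Since `$assembly` comes from the submitted job, restricting it to an expected set of names before building the pattern might be worth considering.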