Update run fragpipe script to take command line args #25
|
I would like to add additional scripts here (or perhaps conditionals within the current shell script) to process HOPE and CPTAC mzML files when querying these cohorts. This will likely require an additional |
|
@rjcorb It seems |
|
@chaodi51 I have removed the blast filter step so I think this should be good to test now. Can you run the following two test run scripts from root to check if the runs complete? --query input/custom.fasta \
--manifest input/PDC000180filesmanifest.fp-manifest \
--workflow input/PDC000180customworkflow.workflow \
--res_dir results \
--cohort "cptac" \
--run_subset |
|
It seems the Dockerfile is installing fragPipe-23.1, but the docker image is still fragPipe-22.0 in the registry and the shell script is using fragPipe-22.0. Can you make these consistent? @rjcorb |
|
@chaodi51 the Dockerfile has been updated and I think the issue with running the HOPE cohort should also be resolved. |
| mv *decoys-contam-custom.fasta.fas decoys-contam-custom-canonical.fasta | ||
| gunzip -c $Uniprot_canonical | $tools_dir/Philosopher/philosopher-v5.1.1 database --custom $query_fullpath --add /dev/stdin --contam | ||
|
|
||
| # Remove canonical peptides annotated to genes in custom fasta |
Why not remove canonical peptides in the custom.fasta in the first place and then build the database?
| if (index($0,g)) {keep=0; break} | ||
| } | ||
| } | ||
| keep' *decoys-contam-custom.fasta.fas > decoys-contam-custom-canonical.fasta |
This awk command does not seem to filter anything. The previous step produced 2025-12-11-decoys-contam-custom.fasta.fas, which is the input here, and the output decoys-contam-custom-canonical.fasta is identical to it.
I'm a bit confused about this. The gene_symbols.txt file contains:
GN=chr8
GN=chr9
GN=chrX
GN=chrY
GN=fusion|ADCYAP1--LINC01904(23137),ENSG00000272461(11462)|ADCYAP1|chr18
GN=fusion|ANK2--PTCH1|ANK2|chr4
GN=fusion|CMTM1--CMTM1|CMTM1|chr16
GN=fusion|CREBBP--EP400|CREBBP|chr16
However, the combined FASTA file decoys-contam-custom.fasta.fas would never contain peptide headers matching the patterns GN=chr or GN=fusion*, since the original source files (UP000005640_9606.fasta.gz and customa.fasta) do not include those annotations.
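One way to confirm this is to count how many FASTA headers contain any of the patterns in gene_symbols.txt. The following is a minimal sketch, not part of the PR; the helper name count_matching_headers is hypothetical, and it assumes the pattern file has one pattern per line with no empty lines (an empty pattern would match every header).

```shell
# Hypothetical helper: count FASTA header lines that contain any of the
# fixed-string patterns listed in a pattern file (one pattern per line).
count_matching_headers() {
  patterns_file=$1
  fasta_file=$2
  # Keep header lines only, then match the patterns literally (-F) so that
  # characters such as '|' and '(' are not treated as regex syntax.
  grep '^>' "$fasta_file" | grep -c -F -f "$patterns_file"
}

# Example call, using the file names from this thread:
# count_matching_headers gene_symbols.txt 2025-12-11-decoys-contam-custom.fasta.fas
```

A count of 0 would mean the filter has nothing to remove, which would explain why the input and output files are identical.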
@chaodi51 I just updated the gene filtering. I think this may have been applicable in a different case previously, but I agree, this was not functioning as expected using the original code. I think, with the updates, you should see several hundred genes filtered out of the decoys-contam-custom-canonical.fasta file.
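For reference, a gene-based filter of this kind could look like the sketch below. This is not the updated code from the PR; it is an illustration that assumes UniProt-style headers carrying a GN=<symbol> field and a gene list file with one symbol per line. The helper name filter_fasta_by_gene is hypothetical. It compares the whole GN value, whereas a substring test such as index($0, g) would also match longer symbols (for example, TP53 inside TP53BP1).

```shell
# Hypothetical helper: print FASTA records whose header GN= value is NOT
# in the gene list. Uses only POSIX awk features.
filter_fasta_by_gene() {
  genes_file=$1
  fasta_file=$2
  awk '
    # First file: load gene symbols into a lookup table.
    # FILENAME is compared with ARGV[1] so an empty gene list is handled.
    FILENAME == ARGV[1] { if ($0 != "") genes[$0] = 1; next }
    # Header line: keep the record unless its GN value is in the table.
    /^>/ {
      keep = 1
      if (match($0, /GN=[^ ]+/)) {
        gene = substr($0, RSTART + 3, RLENGTH - 3)
        if (gene in genes) keep = 0
      }
    }
    # Print header and sequence lines of kept records.
    keep
  ' "$genes_file" "$fasta_file"
}

# Example call (file names are illustrative):
# filter_fasta_by_gene genes.txt input.fasta.fas > decoys-contam-custom-canonical.fasta
```

Records without a GN field, such as contaminant entries, pass through unchanged under this assumption.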
@rjcorb Now I have an error:
bash run_fragpipe.sh --query input/custom_filtered.fasta --manifest input/PDC000180filesmanifest.fp-manifest --workflow input/PDC000180customworkflow.workflow --res_dir results --cohort "cptac" --run_subset
Adding decoys and contaminants to FASTA files..
time="21:49:47" level=info msg="Executing Workspace v5.1.1"
time="21:49:47" level=info msg="Creating workspace"
time="21:49:47" level=warning msg="A meta data folder was found and will not be overwritten. "
time="21:49:47" level=info msg=Done
time="21:49:47" level=info msg="Executing Database v5.1.1"
time="21:49:47" level=info msg="Generating the target-decoy database"
time="21:49:48" level=info msg="Creating file"
time="21:49:48" level=info msg=Done
awk: line 1: syntax error at or near ,
ah I think it's an issue with running awk within the docker container. I have fixed the issue.
awk: cannot open *decoys-contam-custom.fasta.fas (No such file or directory)
You may also need to change *decoys-contam-custom.fasta.fas to *decoys-contam-custom_filtered.fasta.fas since we're now using custom_filtered.fasta instead of custom.fasta from Impact-trail runs? @rjcorb
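Rather than hard-coding either name, the glob could be derived from the query file name. The sketch below assumes the Philosopher output follows the pattern seen earlier in this thread (<date>-decoys-contam-<query file name>.fas) and is written to the current directory; the helper name find_decoy_db is hypothetical.

```shell
# Hypothetical helper: locate the decoy/contaminant database that was
# generated for a given query FASTA in the current directory.
find_decoy_db() {
  query_path=$1
  query_base=$(basename "$query_path")
  # Assumed naming: <date>-decoys-contam-<query file name>.fas
  # 'set --' only replaces the positional parameters of this function.
  set -- ./*-decoys-contam-"$query_base".fas
  if [ -e "$1" ]; then
    printf '%s\n' "$1"
  else
    echo "no decoy database found for $query_base" >&2
    return 1
  fi
}

# Example call:
# db_file=$(find_decoy_db "$query_fullpath") || exit 1
```

With this approach the same script would work for custom.fasta and custom_filtered.fasta without editing the awk input path.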
| print line | ||
| }' "$query_fullpath" \ | ||
| | grep -v '^$' \ | ||
| | sort -u > splice_event_genes.txt |
I got these entries in splice_event_genes.txt that don't look like gene names:
001337425.1|c.1659G>T|p.K553N|peptMutPos=553
001351982.1|c.342G>T|p.L114F|peptMutPos=114
016878921.1|c.2543G>A|p.R848Q|peptMutPos=848
024301812.1|c.782C>A|p.S261Y|peptMutPos=261
1031insGG|p.H344Rfs*88|peptMutPos=344
36847del|p.A12280Kfs*11|peptMutPos=12280
50+1insT|p.S18Lfs*3|peptMutPos=17
611del|p.E203del|peptMutPos=203
66del|p.L22del|peptMutPos=22
phase0
phase1
phase2
site:1
site:13
site:14
site:15
site:170
site:18
site:189
site:20
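As a stopgap, entries like these could be screened out with a pattern check before the file is used. The sketch below is only a heuristic based on the entries listed above (they contain '|', ':' or '>', start with a digit, or have the form phaseN); it is not validated against a gene nomenclature, and the helper name looks_like_gene_symbol is hypothetical. Fixing the header parsing that produces them would be the more robust route.

```shell
# Hypothetical helper: read candidate names on stdin and print only those
# that look like gene symbols under a simple heuristic.
looks_like_gene_symbol() {
  # Keep names that start with a letter and contain only letters, digits,
  # '.', '_' or '-', then drop the phaseN tokens seen in the output above.
  grep -E '^[A-Za-z][A-Za-z0-9._-]*$' | grep -E -v '^phase[0-9]+$'
}

# Example call:
# looks_like_gene_symbol < splice_event_genes.txt > splice_event_genes.filtered.txt
```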
|
These issues have been resolved in the fragpipe_cwl workflow, you can close it. |
Purpose/implementation Section
What scientific question is your analysis addressing?
This PR updates run_fragpipe.sh as follows: it adds an argument --cohort that takes values "cptac" or "hope". If specified, the script will attempt to mount the associated Cavatica project and copy and unzip mzML files. The script can also be run with the argument --run_subset to run only the first experiment of each cohort dataset.
What was your approach?
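The argument handling described above might be sketched as follows. This is an illustration under stated assumptions, not the code in run_fragpipe.sh: the flag names are the ones used in this PR, while the function name parse_args and the variable names are hypothetical.

```shell
# Hypothetical sketch of parsing the flags used in this PR.
parse_args() {
  query="" manifest="" workflow="" res_dir="" cohort="" run_subset=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --query|--manifest|--workflow|--res_dir|--cohort)
        # Flags that take a value: make sure the value is present.
        [ $# -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case "$1" in
          --query)    query=$2 ;;
          --manifest) manifest=$2 ;;
          --workflow) workflow=$2 ;;
          --res_dir)  res_dir=$2 ;;
          --cohort)   cohort=$2 ;;
        esac
        shift 2 ;;
      --run_subset)
        # Boolean flag: run only the first experiment of each cohort dataset.
        run_subset=1
        shift ;;
      *)
        echo "unknown argument: $1" >&2
        return 1 ;;
    esac
  done
  # --cohort is optional, but when given it must be "cptac" or "hope".
  case "$cohort" in
    ""|cptac|hope) ;;
    *) echo "--cohort must be cptac or hope" >&2; return 1 ;;
  esac
}

# Example call:
# parse_args "$@" || exit 1
```

Validating --cohort up front lets the script fail before it attempts any mount or copy step.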
What GitHub issue does your pull request address?
#24
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Results
What types of results are included (e.g., table, figure)?
What is your summary of the results?
Reproducibility Checklist
Documentation Checklist
README and it is up to date.