
Update run fragpipe script to take command line args #25

Open
rjcorb wants to merge 11 commits into main from rjcorb/24-command-line-args

Conversation


@rjcorb rjcorb commented Oct 9, 2025

Purpose/implementation Section

What scientific question is your analysis addressing?

This PR updates run_fragpipe.sh as follows:

  • Now requires command line arguments for: 1) query fasta file, 2) manifest file, 3) workflow file, and 4) results directory path.
  • Adds an optional --cohort argument that takes the values "cptac" or "hope". If specified, the script will attempt to mount the associated CAVATICA project and copy and unzip the mzML files. This can also be run with the --run_subset argument to process only the first experiment of each cohort dataset.
  • Adds a step to run BLAST filtering against canonical UniProt peptides.
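The argument interface described above could be sketched roughly as follows. The flag names mirror the example invocations later in this thread, but the parsing and validation logic here is an illustrative assumption, not the PR's actual implementation:

```shell
#!/bin/sh
# Illustrative sketch only -- flag names are taken from the test commands in
# this thread; the parsing/validation logic is an assumption, not the PR's code.
parse_args() {
  query="" manifest="" workflow="" res_dir="" cohort="" run_subset=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --query)      query="$2";    shift 2 ;;
      --manifest)   manifest="$2"; shift 2 ;;
      --workflow)   workflow="$2"; shift 2 ;;
      --res_dir)    res_dir="$2";  shift 2 ;;
      --cohort)     cohort="$2";   shift 2 ;;  # optional: "cptac" or "hope"
      --run_subset) run_subset=1;  shift   ;;  # optional: first experiment only
      *) echo "Unknown argument: $1" >&2; return 1 ;;
    esac
  done
  # The first four arguments are required.
  for v in "$query" "$manifest" "$workflow" "$res_dir"; do
    [ -n "$v" ] || { echo "Missing a required argument" >&2; return 1; }
  done
  case "${cohort:-}" in
    ""|cptac|hope) ;;
    *) echo "--cohort must be 'cptac' or 'hope'" >&2; return 1 ;;
  esac
}

# Example mirroring the cptac test run shown later in the thread
# (paths are placeholders):
parse_args --query input/custom.fasta --manifest input/files.fp-manifest \
           --workflow input/custom.workflow --res_dir results \
           --cohort cptac --run_subset
echo "cohort=$cohort run_subset=$run_subset"   # prints: cohort=cptac run_subset=1
```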

What was your approach?

What GitHub issue does your pull request address?

#24

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • The analytical code is documented and contains comments.


rjcorb commented Oct 9, 2025

I would like to add additional scripts here (or perhaps conditionals within the current shell script) to process HOPE and CPTAC mzML files when querying these cohorts. This will likely require an additional --cohort argument.


chaodi51 commented Dec 1, 2025

@rjcorb It seems BLAST is not installed in the Docker image:

bash run_fragpipe.sh --query input/custom.fasta --manifest input/PDC000180filesmanifest.fp-manifest --workflow input/PDC000180customworkflow.workflow --res_dir ./test --cohort hope  
Filtering out 100% matches to canonical peptides...
scripts/run_blast_filter.sh: line 36: blastp: command not found
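If the blast filtering step were reinstated, the missing blastp binary could be installed in the project Dockerfile. A sketch, assuming a Debian/Ubuntu-based base image (the PR instead resolves this by removing the blast step, see below):

```dockerfile
# Assumption: Debian/Ubuntu base image; ncbi-blast+ provides blastp
RUN apt-get update \
    && apt-get install -y --no-install-recommends ncbi-blast+ \
    && rm -rf /var/lib/apt/lists/*
```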


rjcorb commented Dec 4, 2025

@chaodi51 I have removed the blast filter step, so I think this should be good to test now. Can you run the following two test commands from the repository root to check whether the runs complete?

bash run_fragpipe.sh \
--query input/custom.fasta \
--manifest input/PDC000180filesmanifest.fp-manifest \
--workflow input/PDC000180customworkflow.workflow \
--res_dir results \
--cohort "cptac" \
--run_subset
bash run_fragpipe.sh \
--query input/custom.fasta \
--manifest input/HOPE-files-manifest.fp-manifest \
--workflow input/HOPEproteome_TMT11workflow.workflow \
--res_dir results \
--cohort "hope" \
--run_subset


chaodi51 commented Dec 11, 2025

It seems the Dockerfile installs FragPipe-23.1, but the Docker image in the registry is still FragPipe-22.0, and the shell script uses FragPipe-22.0. Can you make these consistent? @rjcorb


rjcorb commented Dec 12, 2025

@chaodi51 the Dockerfile has been updated and I think the issue with running the HOPE cohort should also be resolved.

@chaodi51

> @chaodi51 the Dockerfile has been updated and I think the issue with running the HOPE cohort should also be resolved.

The script is working now. Could you also take a look at my comments in the script? I don’t quite understand the part about removing canonical peptides. Thanks! @rjcorb

mv *decoys-contam-custom.fasta.fas decoys-contam-custom-canonical.fasta
gunzip -c $Uniprot_canonical | $tools_dir/Philosopher/philosopher-v5.1.1 database --custom $query_fullpath --add /dev/stdin --contam

# Remove canonical peptides annotated to genes in custom fasta


Why not remove the canonical peptides from custom.fasta in the first place and then build the database?

if (index($0,g)) {keep=0; break}
}
}
keep' *decoys-contam-custom.fasta.fas > decoys-contam-custom-canonical.fasta


This awk command does not seem to filter anything. The 2025-12-11-decoys-contam-custom.fasta.fas file I got from the last step, used as input here, is identical to decoys-contam-custom-canonical.fasta.

I'm a bit confused about this. The gene_symbols.txt file contains:

GN=chr8 
GN=chr9 
GN=chrX 
GN=chrY 
GN=fusion|ADCYAP1--LINC01904(23137),ENSG00000272461(11462)|ADCYAP1|chr18 
GN=fusion|ANK2--PTCH1|ANK2|chr4 
GN=fusion|CMTM1--CMTM1|CMTM1|chr16 
GN=fusion|CREBBP--EP400|CREBBP|chr16 

However, the combined FASTA file decoys-contam-custom.fasta.fas would never contain sequence headers matching the patterns GN=chr* or GN=fusion*, since the original source files (UP000005640_9606.fasta.gz and custom.fasta) do not include those annotations.
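For reference, a minimal working form of the substring-based filter being discussed might look like the following. Everything here (file names, gene symbols, sequences) is illustrative, not the PR's actual data:

```shell
#!/bin/sh
# Illustrative sketch: drop every FASTA record whose header contains any
# symbol listed in gene_symbols.txt. Inputs below are made-up examples.
cat > gene_symbols.txt <<'EOF'
GN=TP53
GN=EGFR
EOF

cat > combined.fasta <<'EOF'
>sp|P04637|P53_HUMAN Cellular tumor antigen p53 GN=TP53
MEEPQSDPSV
>sp|P00533|EGFR_HUMAN Epidermal growth factor receptor GN=EGFR
MRPSGTAGAA
>sp|P38398|BRCA1_HUMAN Breast cancer type 1 protein GN=BRCA1
MDLSALRVEE
EOF

# First pass (NR==FNR) loads the symbols; on each header line the record is
# kept only if no symbol occurs as a substring of the header. Sequence lines
# inherit the keep flag of their header.
awk 'NR==FNR {genes[++n]=$1; next}
     /^>/    {keep=1
              for (i=1; i<=n; i++)
                if (index($0, genes[i])) {keep=0; break}}
     keep' gene_symbols.txt combined.fasta > filtered.fasta
# filtered.fasta now contains only the BRCA1 record
```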

@rjcorb

@chaodi51 I just updated the gene filtering. I think this may have been applicable in a different case previously, but I agree, this was not functioning as expected using the original code. I think, with the updates, you should see several hundred genes filtered out of the decoys-contam-custom-canonical.fasta file.


@rjcorb Now I have an error:

bash run_fragpipe.sh --query input/custom_filtered.fasta  --manifest input/PDC000180filesmanifest.fp-manifest --workflow input/PDC000180customworkflow.workflow --res_dir results --cohort "cptac" --run_subset
Adding decoys and contaminants to FASTA files..
time="21:49:47" level=info msg="Executing Workspace  v5.1.1"
time="21:49:47" level=info msg="Creating workspace"
time="21:49:47" level=warning msg="A meta data folder was found and will not be overwritten. "
time="21:49:47" level=info msg=Done
time="21:49:47" level=info msg="Executing Database  v5.1.1"
time="21:49:47" level=info msg="Generating the target-decoy database"
time="21:49:48" level=info msg="Creating file"
time="21:49:48" level=info msg=Done
awk: line 1: syntax error at or near ,

@rjcorb

Ah, I think it's an issue with running awk within the Docker container. I have fixed the issue.


awk: cannot open *decoys-contam-custom.fasta.fas (No such file or directory)
You may also need to change *decoys-contam-custom.fasta.fas to *decoys-contam-custom_filtered.fasta.fas since we're now using custom_filtered.fasta instead of custom.fasta from Impact-trail runs? @rjcorb

print line
}' "$query_fullpath" \
| grep -v '^$' \
| sort -u > splice_event_genes.txt


I got these entries in splice_event_genes.txt that don’t look like gene names:

001337425.1|c.1659G>T|p.K553N|peptMutPos=553
001351982.1|c.342G>T|p.L114F|peptMutPos=114
016878921.1|c.2543G>A|p.R848Q|peptMutPos=848
024301812.1|c.782C>A|p.S261Y|peptMutPos=261
1031insGG|p.H344Rfs*88|peptMutPos=344
36847del|p.A12280Kfs*11|peptMutPos=12280
50+1insT|p.S18Lfs*3|peptMutPos=17
611del|p.E203del|peptMutPos=203
66del|p.L22del|peptMutPos=22
phase0
phase1
phase2
site:1
site:13
site:14
site:15
site:170
site:18
site:189
site:20


chaodi51 commented Feb 5, 2026

These issues have been resolved in the fragpipe_cwl workflow; this PR can be closed.
