
Changes in tensorqtl_postprocessor.R #406

Merged
gaow merged 1 commit into StatFunGen:main from al4225:main
Jul 25, 2025

al4225 (Collaborator) commented Jul 25, 2025

Changes in tensorqtl_postprocessor.R

  1. Chromosome and position extraction

    If chrom and pos columns are missing in the cis pair files (e.g., in Xiaoling's data), these values are now automatically extracted from the variant_id column.

  2. Flexible file reading

    Optimized file I/O to handle different formats:

    • Supports both .gz and .parquet formats
    • Automatically detects the format and parses accordingly

  3. Refined q-value recalculation logic

    • If no column matching --qvalue-pattern is found in the cis pair files:
      • Iterate over each chromosome-level cis pair file
      • Recalculate q-values grouped by molecular_id_col
      • If --additional_pvalue_cols is provided, also compute q-values for those columns
      • Save each updated file with the suffix .qvalue_computed.tsv.gz
    • If --pvalue-threshold < 1, filter rows by that threshold
    • Call gc() to free memory after processing each chromosome
    • Finally, combine the filtered per-chromosome results into a single output file
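
For readers who want to see items 1 and 2 concretely, here is a minimal Python sketch; it is not the code in tensorqtl_postprocessor.R, which is written in R. It assumes that the format is decided by the file extension and that variant_id looks like "chr21:12345:A:G" or "21_12345_A_G"; neither detail is stated in this PR. The function names detect_format, read_pairs and add_chrom_pos are hypothetical.

```python
import csv
import gzip
import re
from pathlib import Path

# Assumed variant_id layout: "<chrom>:<pos>:..." or "<chrom>_<pos>_...".
# The chromosome label is kept exactly as written (e.g. "chr21" stays "chr21").
VARIANT_RE = re.compile(r"^([^:_]+)[:_](\d+)(?:[:_]|$)")


def detect_format(path):
    """Return 'parquet' or 'gz' based on the file extension (an assumption)."""
    name = Path(path).name.lower()
    if name.endswith(".parquet"):
        return "parquet"
    if name.endswith(".gz"):
        return "gz"
    raise ValueError(f"unsupported file format: {path}")


def read_pairs(path):
    """Read a cis pair file into a list of dict rows."""
    if detect_format(path) == "parquet":
        # Imported lazily so the .gz branch works without pyarrow installed.
        import pyarrow.parquet as pq
        return pq.read_table(path).to_pylist()
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def add_chrom_pos(rows):
    """Fill in 'chrom' and 'pos' from 'variant_id' when they are absent."""
    for row in rows:
        if "chrom" in row and "pos" in row:
            continue  # columns already present, nothing to extract
        match = VARIANT_RE.match(str(row["variant_id"]))
        if match is None:
            raise ValueError(f"cannot parse variant_id: {row['variant_id']!r}")
        row["chrom"] = match.group(1)
        row["pos"] = int(match.group(2))
    return rows
```

Calling add_chrom_pos(read_pairs(path)) leaves files that already carry chrom and pos untouched, which mirrors the "Column 'chrom' already exists, no extraction needed" message in the log.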

MWE and message:

sos run /home/al4225/xqtl_protocol_data/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb default \
    --gene-coordinates /home/al4225/project/resource/look_up_gene_id.tsv \
    --cwd /home/al4225/xqtl_data/cis_association_xiaoling/ROSMAP/eQTL/DLPFC/ \
    --sub-dir "interaction/age_mwe" \
    --maf-cutoff 0 --cis-window 0 \
    --pecotmr-path /home/al4225/xqtl_protocol_data/pecotmr \
    --molecular_id_col phenotype_id \
    --regional-pattern ".cis_qtl_top_assoc.txt.gz$" \
    --pvalue-pattern "pval_gi" \
    --additional_pvalue_cols "pval_g,pval_i" \
    --qvalue-pattern "qval_gi" \
    --qtl_pattern ".cis_qtl_pairs.*.parquet$" \
    --output_dir /home/al4225/xqtl_data/cis_association_xiaoling/output \
    --archive-dir /home/al4225/xqtl_data/cis_association_xiaoling/archive \
    --fdr-threshold 0.25 \
    -s force

INFO: Running default:
Archive setting - input: 'False', converted: FALSE
Extracting 'chrom' and 'pos' from 'variant_id' column
Successfully extracted chrom and pos for 429 variants
Found 'tests_emt' column in regional data, converting to n_variants
workdir is /mnt/vast/hpc/homes/al4225/xqtl_data/cis_association_xiaoling/ROSMAP/eQTL/DLPFC/interaction/age_mwe
Loaded gene coordinates with 60668 entries
Computing q-values for QTL files...
Q-value column not found, computing q-values...
Processing 2 files with main p-value column: pval_gi → qval_gi
Additional p-value columns: pval_g, pval_i → qval_g, qval_i
Will apply p-value filter < 0.05 during processing to save memory
Processing file 1/2: age_int_21.cis_qtl_pairs.21.parquet
Extracting 'chrom' and 'pos' from 'variant_id' column
Successfully extracted chrom and pos for 621590 variants
Computed pval_g → qval_g for age_int_21.cis_qtl_pairs.21.parquet
Computed pval_i → qval_i for age_int_21.cis_qtl_pairs.21.parquet
Saving computed q-values for file: age_int_21.cis_qtl_pairs.21.parquet

Saved complete q-value computed file: age_int_21.cis_qtl_pairs.21.qvalue_computed.tsv.gz
Applied p-value filter: 621590 → 31172 rows
Processing file 2/2: age_int_22.cis_qtl_pairs.22.parquet
Extracting 'chrom' and 'pos' from 'variant_id' column
Successfully extracted chrom and pos for 1647452 variants
Computed pval_g → qval_g for age_int_22.cis_qtl_pairs.22.parquet
Computed pval_i → qval_i for age_int_22.cis_qtl_pairs.22.parquet
Saving computed q-values for file: age_int_22.cis_qtl_pairs.22.parquet
Saved complete q-value computed file: age_int_22.cis_qtl_pairs.22.qvalue_computed.tsv.gz
Applied p-value filter: 1647452 → 81837 rows
Using pre-computed and pre-filtered q-value data
Column 'chrom' already exists, no extraction needed
Combined data from 2 files: 113009 total rows
Applying Bonferroni local adjustment (filter applied: No)...
Original data: 113009 rows (avg n_variants per event: 1513)
Applying both FDR and qvalue global adjustments...
Identifying significant SNPs using Bonferroni adjusted p-value thresholds...
No significant events identified at fdr_bonferroni_min threshold 0.25
Identifying significant SNPs using q-value per event method...
Using Bonferroni-based q_bonferroni_min for significant events in qvalue-based QTL identification
No significant events found using q_bonferroni_min threshold 0.25
There were 50 or more warnings (use warnings() to see the first 50)
INFO: default is completed.
INFO: default output: /home/al4225/xqtl_data/cis_association_xiaoling/output/DLPFC/interaction/age_mwe/DLPFC_multiple_testing_consolidated.rds
INFO: Workflow default (ID=w3ee5c2d25e3f8458) is executed successfully with 1 completed step.
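
The per-file flow visible in the log (compute q-values per file, filter by p-value, then combine) can be sketched in Python as below. This is an illustration under stated assumptions, not the R implementation: Benjamini-Hochberg adjustment is swapped in for the q-value estimator so the example needs only the standard library, the filter is assumed to act on the main p-value column, and the pval to qval column renaming is inferred from the log. The names bh_adjust, add_qvalues and process_files are hypothetical.

```python
from collections import defaultdict


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Stand-in for a q-value estimator; the R script may use a different method.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def add_qvalues(rows, id_col, pval_cols):
    """Add one adjusted column per p-value column, computed within each id_col group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_col]].append(row)
    for members in groups.values():
        for pcol in pval_cols:
            # Assumed naming rule, e.g. pval_gi -> qval_gi as in the log.
            qcol = pcol.replace("pval", "qval", 1)
            adjusted = bh_adjust([r[pcol] for r in members])
            for r, q in zip(members, adjusted):
                r[qcol] = q
    return rows


def process_files(files, id_col, main_pcol, extra_pcols=(), pvalue_threshold=1.0):
    """files maps a file name to its list of row dicts; returns the combined rows."""
    combined = []
    for name, rows in files.items():
        rows = add_qvalues(rows, id_col, [main_pcol, *extra_pcols])
        # The complete table would be saved as "<name>.qvalue_computed.tsv.gz" here.
        if pvalue_threshold < 1:
            # Assumption: the filter is applied to the main p-value column.
            rows = [r for r in rows if r[main_pcol] < pvalue_threshold]
        combined.extend(rows)
    return combined
```

Filtering each file before combining keeps only the retained rows in memory, which is consistent with the counts in the log: 31172 rows from chromosome 21 plus 81837 rows from chromosome 22 give the 113009 combined rows.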

gaow merged commit 1129fe5 into StatFunGen:main on Jul 25, 2025
4 of 7 checks passed