11 changes: 9 additions & 2 deletions README.md
@@ -24,7 +24,7 @@ nextflow run Biostatistics-Unit-HT/Flanders -r 1.0 -profile [docker|singulari

### Example: Run Only Colocalization (with existing `.h5ad`)
```bash
-nextflow run Biostatistics-Unit-HT/Flanders -r 1.0 -profile [docker|singularity|conda] --coloc_input /path/to/finemapping_output.h5ad --run_colocalization true --coloc_id my_coloc_run -w ./work -resume
+nextflow run Biostatistics-Unit-HT/Flanders -r 1.0 -profile [docker|singularity|conda] --coloc_h5ad_input /path/to/finemapping_output.h5ad --run_colocalization true --coloc_id my_coloc_run -w ./work -resume
```

### Quick run with example dataset
@@ -84,6 +84,14 @@ Flanders separates the fine-mapping and colocalization process into two distinct
</br>
</br>

#### 4.1 Getting external credible sets from Open Targets
- An AnnData object can also be created from credible sets already generated by Open Targets. To do so, follow these steps:
- Download the Open Targets [credible-set](https://platform.opentargets.org/downloads/credible_set/access) and [study](https://platform.opentargets.org/downloads/study/access) datasets.
- Create a conda environment with the following packages: anndata, polars, pyarrow and scipy (math and argparse ship with the Python standard library).
- Run the `parse_credible_sets_refined.py` script in the `bin` folder, following the instructions in its argument help.
</br>
</br>
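The steps above can be sketched as follows. This is a minimal sketch, not part of the pipeline: the environment name `ot_credsets` is arbitrary, the download paths are placeholders, and the script's actual arguments should be checked via `--help`.

```shell
# Create an environment with the required packages (math/argparse are stdlib)
conda create -y -n ot_credsets -c conda-forge python anndata polars pyarrow scipy
conda activate ot_credsets

# Inspect the script's arguments, then point it at the downloaded
# credible-set and study datasets (paths below are placeholders)
python bin/parse_credible_sets_refined.py --help
```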

### Step 2: Colocalization analysis

#### Inputs
@@ -125,7 +133,6 @@ iCOLOC approach allows to:
</br>
</br>


## 👩‍🔬 Credits
Developed by the Biostatistics and Genome Analysis Units at [Human Technopole](https://humantechnopole.it/en/)<br>
- [Arianna Landini](mailto:arianna.landini@fht.org)<br>
6 changes: 6 additions & 0 deletions assets/OT_mapping_credset_table.txt
@@ -0,0 +1,6 @@
project,finemapping,data,description
GCST,SuSiE-inf,gwas,GWAS catalog data
GTEx,SuSie-inf,eqtl,eQTLs from GTEx
UKB_PPP_EUR,SuSiE-inf,pqtl,Data from UKB-PPP considering only Europeans
FINNGEN_R12,SuSie,gwas,Data from Finngen consortium

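The added mapping table is plain CSV and can be inspected with standard tools. A standalone sketch (the here-doc duplicates the committed content so the commands run without a checkout of the repo):

```shell
# Recreate the committed mapping table so this snippet is self-contained
cat > OT_mapping_credset_table.txt <<'EOF'
project,finemapping,data,description
GCST,SuSiE-inf,gwas,GWAS catalog data
GTEx,SuSie-inf,eqtl,eQTLs from GTEx
UKB_PPP_EUR,SuSiE-inf,pqtl,Data from UKB-PPP considering only Europeans
FINNGEN_R12,SuSie,gwas,Data from Finngen consortium
EOF

# List the fine-mapping method recorded for each project
awk -F, 'NR > 1 { print $2 ": " $1 }' OT_mapping_credset_table.txt
```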
11 changes: 8 additions & 3 deletions bin/funs_locus_breaker_cojo_finemap_all_at_once.R
@@ -583,7 +583,7 @@ expand_cs <- function(fitted) {
#' cs_idx <- which(lbf_vec > 2)
#' find_threshold(lbf_vec, original_idx = cs_idx)
find_threshold <- function(vector, original_idx = NULL) {
-if (var(vector) != 0) {
+if ( var(vector) != 0 && !is.na(var(vector)) ) {
thres <- optim(
fn = function(th) thresholder(th, vector),
p = 0,
@@ -716,7 +716,7 @@ from_susie_to_anndata <- function(finemap_list=NULL, cs_indices=NULL, analysis_i
lABFs_list <- c(lABFs_list, lABFs)
min_res_labf_vec <- c(min_res_labf_vec, min_res_labf)
top_pvalue_vec <- c(top_pvalue_vec, top_pvalue)
-purity_df <- rbind(purity_df, finemap$sets$purity |> dplyr::mutate(logsum.logABF=logsum.logABF))
+purity_df <- rbind(purity_df, finemap$sets$purity |> dplyr::mutate(logsum.logABF=logsum.logABF, coverage = finemap$sets$requested_coverage))
comment_section <- c(comment_section, rep(finemap$comment_section, length(finemap$sets$cs_index)))
comment_section[is.na(comment_section)] <- "NaN"
metadata_df <- rbind(metadata_df, finemap$metadata)
@@ -731,7 +731,7 @@ from_susie_to_anndata <- function(finemap_list=NULL, cs_indices=NULL, analysis_i
metadata_df$L_index,
sep = "::"
)

# Prepare `obs_df` metadata
obs_df <- metadata_df |> dplyr::select(-L_index) |> dplyr::rename(type=TYPE)
obs_df$chr <- paste0("chr", obs_df$chr)
@@ -767,6 +767,11 @@ from_susie_to_anndata <- function(finemap_list=NULL, cs_indices=NULL, analysis_i
)
}

+# Filter out values for SNPs out of the credible set right before matrices creation
+for (cs in seq_along(lABFs_list)) {
+  lABFs_list[[cs]] <- lABFs_list[[cs]] |> dplyr::filter(is_cs)
+}

lABF_matrix_sparse <- create_sparse_matrix("lABF")
beta_matrix_sparse <- create_sparse_matrix("bC")
se_matrix_sparse <- create_sparse_matrix("bC_se")