+
+#update
-A proposal by project instigator 1, project instigator 2, etc.
## Abstract
+The Human Genome Project has laid bare the DNA sequence of the entire human genome, revealing the blueprint for tens of thousands of genes involved in a plethora of biological processes and pathways.
+In addition to this (coding) part of the human genome, DNA contains millions of non-coding elements involved in the regulation of said genes.
+
+Such regulatory elements control the expression levels of genes, in a way that is, at least in part, encoded in their primary genomic sequence.
+Many human diseases and disorders are the result of genes being misregulated.
+As such, being able to control the behavior of such elements, and thus their effect on gene expression, offers the tantalizing opportunity of correcting disease-related misregulation.
+
+Although such cellular programming should in principle be possible through changing the sequence of regulatory elements, the rules for doing so are largely unknown.
+A number of experimental efforts have been guided by preconceived notions and assumptions about what constitutes a regulatory element, essentially resulting in a "trial and error" approach.
+
+Here, we instead propose a large-scale, data-driven approach that uses the latest generative modelling techniques to learn and apply the rules underlying regulatory element sequences.
-Provide brief outline motivating the project. How would it positively impact biological research? What is the hypothesis behind it? No need to discuss datasets or models yet, we will do that later. Focus on the grand picture and \textit{why} the community should care about it.
## Introduction and Prior Work
+The goal of this project is to investigate the application and adaptation of recent diffusion models (see https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ for a nice introduction and references) to genomics data. Diffusion models are powerful generative models that have achieved outstanding results in image generation (e.g. Stable Diffusion, DALL-E 2) and music generation (e.g. recent work from the Magenta project).
+A particular formulation called "guided" diffusion makes it possible to steer the generative process in a particular direction when text or continuous/discrete labels are provided during training. This enables "AI artists" that create beautiful and complex images from a text prompt (many examples here: https://www.reddit.com/r/StableDiffusion/).
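One common way to implement guided diffusion is classifier-free guidance (Ho & Salimans, 2022): a single network is trained both with and without the label (the label is randomly dropped during training), and at sampling time the two noise predictions are mixed. The sketch below shows only that mixing step. The random arrays are placeholders for the outputs of a hypothetical trained denoiser, and the guidance weight `w` is a free parameter, not a value taken from this proposal.

```python
import numpy as np


def guided_noise_estimate(eps_cond: np.ndarray, eps_uncond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: (1 + w) * eps_cond - w * eps_uncond.

    eps_cond   -- noise predicted with the condition (e.g. a cell-type label)
    eps_uncond -- noise predicted with the condition dropped (null label)
    w          -- guidance weight; w = 0 recovers the purely conditional prediction
    """
    return (1.0 + w) * eps_cond - w * eps_uncond


# Placeholder predictions for a 4 x 8 (bases x positions) array
rng = np.random.default_rng(0)
eps_c = rng.normal(size=(4, 8))
eps_u = rng.normal(size=(4, 8))
print(guided_noise_estimate(eps_c, eps_u, w=2.0).shape)  # (4, 8)
```

Larger `w` pushes samples harder toward the condition, usually at the cost of diversity.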
+
+Several groups have reported generating synthetic DNA regulatory elements in a context-dependent manner, for example cell type-specific enhancers
+(https://elifesciences.org/articles/41279,
+ https://www.biorxiv.org/content/10.1101/2022.07.26.501466v1).
+
+
+### Step 1: generative model
+
+We propose to develop models that generate cell type-specific or context-specific DNA sequences with desired regulatory properties from an input text prompt.
+For example:
+
+ - "A sequence that will correspond to open (or closed) chromatin in cell type X"
+
+ - "A sequence that will activate a gene to its maximum expression level in cell type X"
+
+ - "A sequence active in cell type X that contains binding site(s) for the transcription factor Y"
+
+ - "A sequence that activates a gene in liver and heart, but not in brain"
+
+
+### Step 2: extensions and improvements
+
+Beyond individual regulatory elements, so-called "Locus Control Regions" are known to harbour multiple regulatory elements in specific configurations, working in concert to produce more complex regulatory rulesets. Drawing a parallel with "collaging" approaches, in which multiple Stable Diffusion outputs are combined into one final (graphical) output, we want to apply this notion to DNA sequences with the goal of designing larger regulatory loci. This is a particularly exciting and, to our knowledge, hitherto unexplored direction.
+
+Beyond generating synthetic DNA, a diffusion model can help interpret the components of regulatory sequence elements and could, for instance, be a valuable tool for studying single-nucleotide variants (https://www.biorxiv.org/content/10.1101/2022.08.22.504706v1) and evolution
+(https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1502-5).
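For variant studies, one simple setup is to enumerate every single-nucleotide variant of a sequence and compare a score for each variant with the score of the reference sequence. The sketch below assumes that setup. `gc_content` is only a placeholder so the example runs; in practice it would be replaced by a model-derived score (for example, a likelihood or denoising error from the trained diffusion model, which is not defined here).

```python
from typing import Callable, List, Tuple

BASES = "ACGT"


def single_nucleotide_variants(seq: str) -> List[Tuple[int, str, str, str]]:
    """Return (position, ref, alt, variant_sequence) for every possible SNV of seq."""
    variants = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                variants.append((i, ref, alt, seq[:i] + alt + seq[i + 1:]))
    return variants


def variant_effects(seq: str, score: Callable[[str], float]) -> List[Tuple[int, str, str, float]]:
    """Effect of each SNV, defined here as score(variant) - score(reference)."""
    ref_score = score(seq)
    return [(i, ref, alt, score(v) - ref_score)
            for i, ref, alt, v in single_nucleotide_variants(seq)]


def gc_content(s: str) -> float:
    # Placeholder score only; a trained model would provide the real score
    return (s.count("G") + s.count("C")) / len(s)


effects = variant_effects("ACGTAC", gc_content)
print(len(effects))  # 18 = 6 positions x 3 alternative bases
```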
+
+
+Taken together, we believe our work can accelerate our understanding of the intrinsic properties of DNA regulatory sequences in normal development and in disease.
+
+## Proposed framework
+
+For this work, we propose to build a Bit Diffusion model based on the "Analog Bits" formulation proposed by Chen, Zhang and Hinton (https://arxiv.org/abs/2208.04202). This is a generic approach for generating discrete data with continuous diffusion models: discrete values are encoded as binary bits, modelled as real numbers, and thresholded back to bits when sampling. An implementation of this approach already exists and is a potential code base to build upon:
+
+https://github.com/lucidrains/bit-diffusion
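The core idea of Analog Bits can be sketched for DNA as follows, assuming a 2-bit code per base (A=00, C=01, G=10, T=11) and a DDPM-style linear noise schedule; both choices are illustrative assumptions and not necessarily what the lucidrains implementation uses. Bits are shifted to {-1, +1}, noised with the standard forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, and decoded by thresholding at zero.

```python
import numpy as np

BASES = "ACGT"


def to_analog_bits(seq: str) -> np.ndarray:
    """Encode each base as 2 bits (A=00, C=01, G=10, T=11), scaled to {-1.0, +1.0}.

    Returns an array of shape (2, len(seq)): row 0 is the high bit, row 1 the low bit.
    """
    idx = np.array([BASES.index(b) for b in seq])
    bits = np.stack([(idx >> 1) & 1, idx & 1])
    return bits.astype(np.float64) * 2.0 - 1.0


def from_analog_bits(x: np.ndarray) -> str:
    """Threshold at 0 to recover bits, then map each bit pair back to a base."""
    bits = (x > 0).astype(int)
    idx = bits[0] * 2 + bits[1]
    return "".join(BASES[i] for i in idx)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng) -> np.ndarray:
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


# Linear beta schedule (DDPM-style; an assumption, not taken from the proposal)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = to_analog_bits("ACGTTGCA")
x_early = q_sample(x0, t=1, alpha_bar=alpha_bar, rng=rng)
print(from_analog_bits(x0))  # ACGTTGCA
```

At small t the added noise is tiny, so thresholding still recovers the original sequence; near t = T, alpha_bar is close to zero and x_t is almost pure Gaussian noise, which is where sampling starts.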
+
+## Tasks and potential roadmap:
+ - Collecting genomic datasets
+ - Implementing guided diffusion on top of this code base
+ - Determining the best encoding of biological information for guided diffusion (e.g. cell type: K562, very strong activating sequence for chromatin; or cell type: GM12878, very open chromatin)
+ - Planning validation based on existing datasets or new biological experiments (potential active learning strategies still need to be worked out)
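
As a starting point for the encoding question, a minimal sketch could map each (cell type, activity) pair to an integer class id, reserving id 0 as a "null" label that classifier-free guidance uses for the unconditional branch. The cell types and activity levels below are hypothetical placeholders, and `p_uncond=0.1` is an assumed label-dropout rate.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical condition vocabulary; the real cell types and activity levels are still to be decided
CELL_TYPES = ["K562", "GM12878", "HepG2"]
ACTIVITY = ["closed", "open", "very_open"]


def build_label_index(cell_types: List[str], activity: List[str]) -> Dict[Tuple[str, str], int]:
    """Map every (cell type, activity) pair to an integer class id.

    Id 0 is reserved for the 'null' (unconditional) label used by classifier-free guidance.
    """
    index: Dict[Tuple[str, str], int] = {}
    for c in cell_types:
        for a in activity:
            index[(c, a)] = len(index) + 1
    return index


def encode_condition(cond: Optional[Tuple[str, str]], index: Dict[Tuple[str, str], int],
                     p_uncond: float = 0.1, rng=random) -> int:
    """Return the class id for cond, replaced by the null label with probability p_uncond."""
    if cond is None or rng.random() < p_uncond:
        return 0
    return index[cond]


LABELS = build_label_index(CELL_TYPES, ACTIVITY)
print(len(LABELS))  # 9
```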
-Provide a short (preferably beginner friendly) introduction to the project and a brief outline of the literature most relevant to it. How does the project fit into this context?
## Deliverables
-What do we plan to provide the broader community with upon the completion of the project? Datasets? Models? APIs? Every deliverable should preferably have its own subsection with its associated potential impact, although it is not required.
+ - __Dataset:__ compile and provide a complete database of cell type-specific regulatory regions (from DNase assays) that scientists can use to train different diffusion models on regulatory sequences and generate new ones.
+
+
+ - __Models:__ Provide a model that can generate regulatory sequences given a specific cell type and genomic context.
+
+
+ - __API:__ Provide an API for interacting with the DNA regulatory models, together with a visual playground for generating synthetic context-specific sequences.
+
+
+## Datasets
+
+### DHS Index:
+Chromatin (DNA + associated proteins) that is actively used for the regulation of genes (i.e. "regulatory elements") is typically accessible to DNA-binding proteins such as transcription factors ([review](https://www.nature.com/articles/s41576-018-0089-8), [relevant paper](https://www.nature.com/articles/nature11232)).
+Through the use of a technique called [DNase-seq](https://en.wikipedia.org/wiki/DNase-Seq), we've measured which parts of the genome are accessible across 733 human biosamples encompassing 438 cell and tissue types and states, resulting in more than 3.5 million DNase Hypersensitive Sites (DHSs).
+Using Non-Negative Matrix Factorization, we've summarized these data into 16 _components_, each corresponding to a different cellular context (e.g. 'cardiac', 'neural', 'lymphoid').
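The NMF summarization can be sketched with scikit-learn's `NMF` on a non-negative DHS × biosample matrix. The random matrix, the `nndsvda` initialization, `max_iter`, and the argmax assignment of each DHS to a dominant component are illustrative assumptions, not the settings used to build the DHS Index.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy stand-in for a non-negative DHS x biosample accessibility matrix
rng = np.random.default_rng(0)
X = rng.random((500, 40))  # 500 DHSs x 40 biosamples

n_components = 16  # matches the 16 components described above
model = NMF(n_components=n_components, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)  # (500, 16): loading of each DHS on each component
H = model.components_       # (16, 40): contribution of each biosample to each component

# Assign each DHS to the component with the largest loading
dominant = W.argmax(axis=1)
print(W.shape, H.shape)  # (500, 16) (16, 40)
```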
+
+For the efforts described in this proposal, and as part of an earlier [ongoing project](https://www.meuleman.org/research/synthseqs/) in the research group of Wouter Meuleman,
+we've put together smaller subsets of these data that can be used to train models to generate synthetic sequences for each NMF component.
-### Datasets
+Please find these data, along with a data dictionary, [here](https://www.meuleman.org/research/synthseqs/#material).
-If applicable, how large is the dataset that the project aims to produce? How difficult is producing such a dataset expected to be? What kind of resources are needed? What license will the dataset be licensed under? MIT is preferred but not required.
+### Other potential datasets:
+- DNA sequences of annotated regulatory elements, such as gene promoters or distal elements such as enhancers, annotated (based on chromatin marks or accessibility) for hundreds of cell types by NIH-funded projects such as ENCODE and Roadmap Epigenomics.
-### Models
+- Data from massively parallel reporter assays (MPRAs), which test the regulatory potential of hundreds to thousands of DNA sequences in parallel (https://elifesciences.org/articles/69479.pdf, https://www.nature.com/articles/s41588-021-01009-4, ...)
-If applicable, does the project aim to release more than one model? What would be the input modality? What about the output modality? How large are the models that the project aims to release? Are there other important differences between the models to be released? If the models are very different, consider writing a short subsection for each model type.
+- MIAA assays, which test the ability of sequences to open chromatin within a given cell type.
-### APIs
+## Models
-If applicable, what kind of API does the project aim to release? Are there any existing APIs that it could be integrated into? What kind of documentation could the project provide?
+### Input modality:
+ - A) Cell type + regulatory element, e.g. brain tumor cell, weak enhancer
+ - B) Cell type + regulatory element + TF combination (presence or absence), e.g. prostate cell, enhancer, AR (present), TFAP2A (present) and ER (absent)
+ - C) Cell type + TF combination + TF positions, e.g. blood stem cell, GATA2 (present) and ER (absent) + GATA1 (positions 100-108)
+ - D) Sequence containing a genetic variant -> a low number of diffusion steps = nucleotide importance prediction
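
For option B, one possible (hypothetical) encoding concatenates a one-hot cell type with a TF vector in which +1 marks a TF as present, -1 as absent and 0 as unspecified. The vocabularies are placeholders built from the examples above; position information (option C) is not encoded here.

```python
import numpy as np

# Hypothetical vocabularies for illustration only
CELL_TYPES = ["prostate", "blood_stem_cell", "brain_tumor"]
TFS = ["AR", "TFAP2A", "ER", "GATA1", "GATA2"]


def condition_vector(cell_type: str, tf_states: dict) -> np.ndarray:
    """Concatenate a one-hot cell type with a TF vector (+1 present, -1 absent, 0 unspecified)."""
    cell = np.zeros(len(CELL_TYPES))
    cell[CELL_TYPES.index(cell_type)] = 1.0
    tf = np.zeros(len(TFS))
    for name, present in tf_states.items():
        tf[TFS.index(name)] = 1.0 if present else -1.0
    return np.concatenate([cell, tf])


# Example B: prostate cell, AR present, TFAP2A present, ER absent
v = condition_vector("prostate", {"AR": True, "TFAP2A": True, "ER": False})
print(v)
```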
-### Paper
+### Output:
+ DNA sequence
+__Model size:__
+ The number of enhancers and other biological sequences is much smaller than the number of images in the LAION dataset, and our generated DNA outputs have a dimensionality of at most 4 bases [A, C, G, T] × ~1 kb. We therefore expect the final models to be no larger than ~2 GB.
+
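Each generated sequence can be represented as a 4 × L one-hot array, one row per base. At ~1 kb this is 4 × 1,000 = 4,000 values per sequence, compared with 512 × 512 × 3 = 786,432 values for a 512 × 512 RGB image. The sketch below assumes sequences contain only A, C, G and T (ambiguous bases such as N are not handled).

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a DNA sequence into a (4, len(seq)) float array (rows: A, C, G, T)."""
    idx = np.array([BASES.index(b) for b in seq.upper()])
    out = np.zeros((len(BASES), len(seq)), dtype=np.float32)
    out[idx, np.arange(len(seq))] = 1.0
    return out


def decode(x: np.ndarray) -> str:
    """Map each column back to its most likely base (argmax over the 4 rows)."""
    return "".join(BASES[i] for i in x.argmax(axis=0))


x = one_hot("GATTACA")
print(x.shape)    # (4, 7)
print(decode(x))  # GATTACA
```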
-Can the project be turned into a paper? What does the evaluation process for such a paper look like? What conferences are we targeting? Can we release a blog post as well as the paper?
+__Models:__
+ Different models can be created based on the total sequence length.
-## Resources
+## APIs
+TBD depending on interest
-### Requirements
+## Paper
+__Can the project be turned into a paper? What does the evaluation process for such a paper look like? What conferences are we targeting? Can we release a blog post as well as the paper?__
-What kinds of resources (e.g. GPU hours, RAM, storage) are needed to complete the project?
+Yes. We intend to combine in silico generation with experimental validation to study our models' performance on classic regulatory systems (e.g. sickle cell disease and cancer).
+Our group and collaborators have a substantial track record in the academic community, with publications in high-impact journals such as Nature and Cell.
-### Timeline
-What is a (rough) timeline for this project?
+## Resource Requirements
+__What kinds of resources (e.g. GPU hours, RAM, storage) are needed to complete the project?__
+
+Our initial model can be trained on small datasets (~1k sequences) in about 3 hours (~500 epochs) on Colab Pro (24 GB RAM) with a single Tesla K80 GPU. Based on this, we expect that training this or similar models on the large dataset mentioned above (~3 million sequences, each 4 × 200) will require several high-performance GPUs for about 3 months (optimization suggestions are welcome!).
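The estimate can be checked with back-of-envelope arithmetic, assuming training time scales linearly with (number of sequences × epochs) and that the full dataset is also trained for ~500 epochs. Both are assumptions; larger datasets often need fewer epochs.

```python
# Linear-scaling estimate: cost proportional to (number of sequences x epochs)
small_n, small_epochs, small_hours = 1_000, 500, 3.0  # single K80 run described above
large_n, large_epochs = 3_000_000, 500                # full dataset; epoch count is an assumption

large_gpu_hours = small_hours * (large_n * large_epochs) / (small_n * small_epochs)

print(large_gpu_hours)       # 9000.0 K80 GPU-hours
print(large_gpu_hours / 24)  # 375.0 days on a single K80
```

Spread over 4 GPUs with ideal scaling, 375 K80-days is about 94 days, in line with the ~3-month figure; GPUs faster than the K80 would shorten this further.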
+
+## Timeline
+__What is a (rough) timeline for this project?__
+
+6 months to 1 year.
## Broader Impact
+__How is the project expected to positively impact biological research at large?__
+
+We believe this project will help us better understand genomic regulatory sequences (their composition and the regulators acting on them in different biological contexts) and has the potential to enable therapeutics based on this knowledge.
-How is the project expected to positively impact biological research at large?
## Reproducibility
+We will follow best practices to keep our code reproducible and versioned. We will release data processing scripts and conda environments/Docker images so that other researchers can easily run it.
-What steps are going to be taken to ensure the project's reproducibility? Will data processing scripts be released? What about training logs?
+We have several assays and technologies available to test the synthetic sequences generated by these models at scale, based on CRISPR genome editing or massively parallel reporter assays (MPRAs).
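The development notebook already fixes a random seed for reproducibility; a minimal helper for doing so across Python, NumPy and (if installed) PyTorch might look like this. The name `seed_everything` is our own choice for illustration, not an existing project API.

```python
import os
import random

import numpy as np


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy and (if installed) PyTorch RNGs for repeatable runs."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started afterwards
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed: Python and NumPy are still seeded
```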
-## Failure Case
-If our findings are unsatisfactory, do we have an exit plan? Do we have deliverables along the way that we can still provide the community with?
+## Failure Case
+Regardless of the performance of the final models, we believe it is important to test diffusion models on novel domains, and other groups will be able to build on top of our investigations.
## Preliminary Findings
+Using the Bit Diffusion model, we were able to generate 200 bp sequences whose motif composition was very similar to that of the training sequences. The next step is to add cell-type conditional variables to the model to examine how different regulatory regions depend on the cell-specific context.
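The motif-composition comparison can be quantified as a KL divergence between motif occurrence frequencies in the training and generated sets (the development notebook likewise computes KL over motif occurrences rather than single letters). In the sketch below, exact string matching of a few toy motifs and a pseudocount of 1 are simplifying assumptions; a real pipeline would scan position weight matrices instead.

```python
import math
from collections import Counter
from typing import Dict, List


def motif_frequencies(seqs: List[str], motifs: List[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Count overlapping exact occurrences of each motif across all sequences, then normalize."""
    counts: Counter = Counter()
    for s in seqs:
        for m in motifs:
            counts[m] += sum(1 for i in range(len(s) - len(m) + 1) if s[i:i + len(m)] == m)
    total = sum(counts[m] + pseudocount for m in motifs)
    return {m: (counts[m] + pseudocount) / total for m in motifs}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) = sum over motifs m of p(m) * log(p(m) / q(m))."""
    return sum(p[m] * math.log(p[m] / q[m]) for m in p)


motifs = ["GATA", "TGAC", "CACGTG"]  # toy motifs
train = ["AGATAAGATA", "TTGACTCACGTG"]
generated = ["GGATAACC", "TGACTGAC"]
p = motif_frequencies(train, motifs)
q = motif_frequencies(generated, motifs)
print(kl_divergence(p, q))
```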
-If applicable, mention any preliminary findings (e.g. experiments you have run on your own or heard about) that support the project's importance.
## Next Steps
+ - Extend the model's sequence length to generate complete regulatory regions (enhancer + gene promoter pairs).
+ - Test our synthetic enhancers in in vivo models and check how they regulate transcriptional dynamics in biological settings (beyond MPRAs).
+
+
+## How to contribute
+If this project sounds exciting to you, **please join us**!
+Join the OpenBioML discord: https://discord.gg/Y9CN2dUzQJ, we are discussing this project in the **dna-diffusion** channel and we will provide instructions on how to get involved.
+
+## Known contributors
-If the project is successfully completed, are there any obvious next steps?
+You can access the contributor list [here](https://docs.google.com/spreadsheets/d/1_nxDI6DIoWbyUDpIDX-tJIILejrJ0kEYrcXXdWlzPvU/edit#gid=1871728801).
-## Known contributors
-Please list community members that you know are interested in contributing. It is best if a project proposal already has an associated team capable of going ahead with the project by themselves, but it is not necessary.
diff --git a/diff_first.gif b/diff_first.gif
new file mode 100644
index 0000000..84ac96b
Binary files /dev/null and b/diff_first.gif differ
diff --git a/dna-diffusion/.DS_Store b/dna-diffusion/.DS_Store
new file mode 100644
index 0000000..a01fc9b
Binary files /dev/null and b/dna-diffusion/.DS_Store differ
diff --git a/dna-diffusion/README.md b/dna-diffusion/README.md
new file mode 100644
index 0000000..592fe9b
--- /dev/null
+++ b/dna-diffusion/README.md
@@ -0,0 +1,6 @@
+# DNA-diffusion file structure
+- data: contains the data used in the DNA diffusion project
+- losses: contains the losses used in the DNA diffusion project
+- metrics: contains the internal metrics used to evaluate the quality of the generated sequences after training
+- models: contains the models used in the DNA diffusion project (UNET, VQ-VAE, etc.)
+- utils: contains the utils used in the DNA diffusion project
diff --git a/dna-diffusion/data/README.md b/dna-diffusion/data/README.md
new file mode 100644
index 0000000..750980a
--- /dev/null
+++ b/dna-diffusion/data/README.md
@@ -0,0 +1 @@
+# Data
\ No newline at end of file
diff --git a/dna-diffusion/losses/README.md b/dna-diffusion/losses/README.md
new file mode 100644
index 0000000..0349c30
--- /dev/null
+++ b/dna-diffusion/losses/README.md
@@ -0,0 +1 @@
+# Losses
\ No newline at end of file
diff --git a/dna-diffusion/metrics/README.md b/dna-diffusion/metrics/README.md
new file mode 100644
index 0000000..1f361fe
--- /dev/null
+++ b/dna-diffusion/metrics/README.md
@@ -0,0 +1 @@
+# Internal metrics to assess the quality of generated sequences after training.
\ No newline at end of file
diff --git a/dna-diffusion/models/README.md b/dna-diffusion/models/README.md
new file mode 100644
index 0000000..485889d
--- /dev/null
+++ b/dna-diffusion/models/README.md
@@ -0,0 +1 @@
+# Models
diff --git a/dna-diffusion/utils/README.md b/dna-diffusion/utils/README.md
new file mode 100644
index 0000000..4e6d550
--- /dev/null
+++ b/dna-diffusion/utils/README.md
@@ -0,0 +1 @@
+# Utils
\ No newline at end of file
diff --git a/notebooks/README.md b/notebooks/README.md
new file mode 100644
index 0000000..fc673ef
--- /dev/null
+++ b/notebooks/README.md
@@ -0,0 +1,4 @@
+# Notebook
+- experiments: Playground for experiments
+- refactoring: Notebooks to be refactored to the codebase
+- tutorials: Tutorials (e.g. how to use the model to generate new sequences, how to find motifs, etc.)
\ No newline at end of file
diff --git a/notebooks/experiments/README.md b/notebooks/experiments/README.md
new file mode 100644
index 0000000..2c05815
--- /dev/null
+++ b/notebooks/experiments/README.md
@@ -0,0 +1,4 @@
+# Add here notebooks experimenting with the DNA-diffusion model
+This is a collection of notebooks that are used to experiment with the DNA-diffusion model.
+
+
diff --git a/notebooks/experiments/conditional_diffusion/README.MD b/notebooks/experiments/conditional_diffusion/README.MD
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/notebooks/experiments/conditional_diffusion/README.MD
@@ -0,0 +1 @@
+
diff --git a/notebooks/experiments/conditional_diffusion/dna_diff_baseline_conditional_UNET.ipynb b/notebooks/experiments/conditional_diffusion/dna_diff_baseline_conditional_UNET.ipynb
new file mode 100644
index 0000000..8bf7a1f
--- /dev/null
+++ b/notebooks/experiments/conditional_diffusion/dna_diff_baseline_conditional_UNET.ipynb
@@ -0,0 +1,7119 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The goal of this notebook is to clean up and structure the code from the great initial work: https://github.com/pinellolab/DNA-Diffusion/blob/dna-diffusion/notebooks/experiments/conditional_diffusion/easy_training_Conditional_Code_to_refactor_UNET_ANNOTATED_v4%20(2).ipynb\n",
+ "\n",
+ "This notebook is divided into the following chapters:\n",
+ "1. [Utility functions](#utility_functions)\n",
+ "2. [Data import and preparation](#Data-import-and-preperation)\n",
+ "3. [Stable Diffusion architecture](##Stable-Diffusion-architecture)\n",
+ "4. [Stable Diffusion training metric functions and evaluation](##Stable-+-Diffusion-+-training-+-metric-+-functions-+-and-+-evaluation)\n",
+ "\n",
+ "Concrete improvements over the previous notebook:\n",
+ "\n",
+ "1. Complete restructuring and refactoring of the code.\n",
+ "2. Deleted unnecessary code.\n",
+ "3. Set a seed for complete reproducibility (so that we know when improvements are real and not due to randomness).\n",
+ "4. Decoupled the logic between data loading and processing.\n",
+ "5. Wrote docstrings for most of the classes and functions.\n",
+ "6. Exposed hyperparameters at the beginning (for easier Hydra refactoring).\n",
+ "7. Added more functionality (such as injectable/interchangeable metric functions).\n",
+ "8. Documented all of the main chapters and subchapters, with brief theoretical descriptions of stable diffusion, the architecture used, classifier-free diffusion guidance and EMA.\n",
+ "\n",
+ "\n",
+ "**NOTE**: \n",
+ "\n",
+ "1. KL and other metrics remained the same or improved.\n",
+ "2. When calculating KL, we measure occurrences of particular motifs. That means we are not measuring the distribution of \"letters\". This was the assumption in the original notebook. We are also not measuring occurrences at particular positions, just whether and how often a motif occurs in the train and test sets.\n",
+ "\n",
+ "\n",
+ "The main goal of this notebook is:\n",
+ "\n",
+ "1. Serve as a major pre-refactoring step so that we can easily move this code to the code base.\n",
+ "2. Provide an easier entry point for newcomers.\n",
+ "3. Serve as the new benchmark for prototyping.\n",
+ "4. As the refactoring is ongoing, all of the code here should be self-contained rather than imported from the code base.\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "uSkxs7Ny5vnj"
+ },
+ "id": "uSkxs7Ny5vnj"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Optional libraries that need to be installed:"
+ ],
+ "metadata": {
+ "id": "PHAIjS8c7RQz"
+ },
+ "id": "PHAIjS8c7RQz"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!pip install einops\n",
+ "!pip install torchmetrics\n",
+ "!pip install gimmemotifs #can take around 10 minutes\n",
+ "!genomepy install hg38\n",
+ "!pip install livelossplot"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Xl22F7485tMf",
+ "outputId": "bd8f92af-2d94-45d8-c2af-aa6ea3f280b9"
+ },
+ "id": "Xl22F7485tMf",
+ "execution_count": 1,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Collecting einops\n",
+ " Downloading einops-0.6.0-py3-none-any.whl (41 kB)\n",
+ "\u001b[K |████████████████████████████████| 41 kB 231 kB/s \n",
+ "\u001b[?25hInstalling collected packages: einops\n",
+ "Successfully installed einops-0.6.0\n",
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Requirement already satisfied: einops in /usr/local/lib/python3.7/dist-packages (0.6.0)\n",
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Collecting torchmetrics\n",
+ " Downloading torchmetrics-0.10.3-py3-none-any.whl (529 kB)\n",
+ "\u001b[K |████████████████████████████████| 529 kB 12.4 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: torch>=1.3.1 in /usr/local/lib/python3.7/dist-packages (from torchmetrics) (1.12.1+cu113)\n",
+ "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from torchmetrics) (21.3)\n",
+ "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torchmetrics) (4.1.1)\n",
+ "Requirement already satisfied: numpy>=1.17.2 in /usr/local/lib/python3.7/dist-packages (from torchmetrics) (1.21.6)\n",
+ "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->torchmetrics) (3.0.9)\n",
+ "Installing collected packages: torchmetrics\n",
+ "Successfully installed torchmetrics-0.10.3\n",
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Collecting gimmemotifs\n",
+ " Downloading gimmemotifs-0.17.0.tar.gz (34.5 MB)\n",
+ "\u001b[K |████████████████████████████████| 34.5 MB 25 kB/s \n",
+ "\u001b[?25hCollecting biofluff\n",
+ " Downloading biofluff-3.0.4.tar.gz (28 kB)\n",
+ "Requirement already satisfied: setuptools>=0.7 in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (57.4.0)\n",
+ "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (1.21.6)\n",
+ "Requirement already satisfied: scipy>=0.9.0 in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (1.7.3)\n",
+ "Requirement already satisfied: matplotlib>=2 in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (3.2.2)\n",
+ "Collecting iteround\n",
+ " Downloading iteround-1.0.4-py3-none-any.whl (7.3 kB)\n",
+ "Requirement already satisfied: jinja2 in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (2.11.3)\n",
+ "Requirement already satisfied: pandas>=1.1 in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (1.3.5)\n",
+ "Requirement already satisfied: pyarrow>=0.16.0 in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (9.0.0)\n",
+ "Requirement already satisfied: pyyaml>=3.10 in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (6.0)\n",
+ "Collecting pybedtools\n",
+ " Downloading pybedtools-0.9.0.tar.gz (12.5 MB)\n",
+ "\u001b[K |████████████████████████████████| 12.5 MB 55.1 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: statsmodels in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (0.12.2)\n",
+ "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (1.0.2)\n",
+ "Requirement already satisfied: seaborn in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (0.11.2)\n",
+ "Collecting pysam\n",
+ " Downloading pysam-0.20.0-cp37-cp37m-manylinux_2_24_x86_64.whl (15.4 MB)\n",
+ "\u001b[K |████████████████████████████████| 15.4 MB 49.0 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: xgboost>=0.71 in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (0.90)\n",
+ "Collecting xdg\n",
+ " Downloading xdg-5.1.1-py3-none-any.whl (5.0 kB)\n",
+ "Collecting diskcache\n",
+ " Downloading diskcache-5.4.0-py3-none-any.whl (44 kB)\n",
+ "\u001b[K |████████████████████████████████| 44 kB 3.4 MB/s \n",
+ "\u001b[?25hCollecting xxhash\n",
+ " Downloading xxhash-3.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)\n",
+ "\u001b[K |████████████████████████████████| 212 kB 57.5 MB/s \n",
+ "\u001b[?25hCollecting configparser\n",
+ " Downloading configparser-5.3.0-py3-none-any.whl (19 kB)\n",
+ "Collecting genomepy>=0.8.3\n",
+ " Downloading genomepy-0.14.0-py3-none-any.whl (80 kB)\n",
+ "\u001b[K |████████████████████████████████| 80 kB 9.8 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (4.64.1)\n",
+ "Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from gimmemotifs) (7.1.2)\n",
+ "Collecting logomaker\n",
+ " Downloading logomaker-0.8-py2.py3-none-any.whl (11.8 MB)\n",
+ "\u001b[K |████████████████████████████████| 11.8 MB 33.1 MB/s \n",
+ "\u001b[?25hCollecting qnorm\n",
+ " Downloading qnorm-0.8.1-py3-none-any.whl (15 kB)\n",
+ "Collecting loguru\n",
+ " Downloading loguru-0.6.0-py3-none-any.whl (58 kB)\n",
+ "\u001b[K |████████████████████████████████| 58 kB 6.9 MB/s \n",
+ "\u001b[?25hCollecting colorama\n",
+ " Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)\n",
+ "Collecting biopython>=1.73\n",
+ " Downloading biopython-1.80-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)\n",
+ "\u001b[K |████████████████████████████████| 3.0 MB 53.4 MB/s \n",
+ "\u001b[?25hCollecting mysql-connector-python\n",
+ " Downloading mysql_connector_python-8.0.31-cp37-cp37m-manylinux1_x86_64.whl (23.5 MB)\n",
+ "\u001b[K |████████████████████████████████| 23.5 MB 68.3 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from genomepy>=0.8.3->gimmemotifs) (3.8.0)\n",
+ "Collecting pyfaidx>=0.5.7\n",
+ " Downloading pyfaidx-0.7.1.tar.gz (103 kB)\n",
+ "\u001b[K |████████████████████████████████| 103 kB 69.4 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from genomepy>=0.8.3->gimmemotifs) (7.1.2)\n",
+ "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from genomepy>=0.8.3->gimmemotifs) (2.23.0)\n",
+ "Collecting mygene\n",
+ " Downloading mygene-3.2.2-py2.py3-none-any.whl (5.4 kB)\n",
+ "Requirement already satisfied: appdirs in /usr/local/lib/python3.7/dist-packages (from genomepy>=0.8.3->gimmemotifs) (1.4.4)\n",
+ "Collecting norns>=0.1.5\n",
+ " Downloading norns-0.1.6.tar.gz (3.6 kB)\n",
+ "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2->gimmemotifs) (0.11.0)\n",
+ "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2->gimmemotifs) (3.0.9)\n",
+ "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2->gimmemotifs) (2.8.2)\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2->gimmemotifs) (1.4.4)\n",
+ "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib>=2->gimmemotifs) (4.1.1)\n",
+ "Collecting nose\n",
+ " Downloading nose-1.3.7-py3-none-any.whl (154 kB)\n",
+ "\u001b[K |████████████████████████████████| 154 kB 58.8 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.1->gimmemotifs) (2022.6)\n",
+ "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from pyfaidx>=0.5.7->genomepy>=0.8.3->gimmemotifs) (1.15.0)\n",
+ "Collecting HTSeq\n",
+ " Downloading HTSeq-2.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)\n",
+ "\u001b[K |████████████████████████████████| 1.8 MB 46.7 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: palettable in /usr/local/lib/python3.7/dist-packages (from biofluff->gimmemotifs) (3.3.0)\n",
+ "Collecting pyBigWig\n",
+ " Downloading pyBigWig-0.3.18.tar.gz (64 kB)\n",
+ "\u001b[K |████████████████████████████████| 64 kB 4.1 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from jinja2->gimmemotifs) (2.0.1)\n",
+ "Collecting biothings-client>=0.2.6\n",
+ " Downloading biothings_client-0.2.6-py2.py3-none-any.whl (37 kB)\n",
+ "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->genomepy>=0.8.3->gimmemotifs) (2.10)\n",
+ "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->genomepy>=0.8.3->gimmemotifs) (1.24.3)\n",
+ "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->genomepy>=0.8.3->gimmemotifs) (3.0.4)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->genomepy>=0.8.3->gimmemotifs) (2022.9.24)\n",
+ "Requirement already satisfied: protobuf<=3.20.1,>=3.11.0 in /usr/local/lib/python3.7/dist-packages (from mysql-connector-python->genomepy>=0.8.3->gimmemotifs) (3.19.6)\n",
+ "Requirement already satisfied: numba in /usr/local/lib/python3.7/dist-packages (from qnorm->gimmemotifs) (0.56.4)\n",
+ "Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.7/dist-packages (from numba->qnorm->gimmemotifs) (0.39.1)\n",
+ "Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from numba->qnorm->gimmemotifs) (4.13.0)\n",
+ "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->numba->qnorm->gimmemotifs) (3.10.0)\n",
+ "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->gimmemotifs) (3.1.0)\n",
+ "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->gimmemotifs) (1.2.0)\n",
+ "Requirement already satisfied: patsy>=0.5 in /usr/local/lib/python3.7/dist-packages (from statsmodels->gimmemotifs) (0.5.3)\n",
+ "Building wheels for collected packages: gimmemotifs, norns, pyfaidx, biofluff, pybedtools, pyBigWig\n",
+ " Building wheel for gimmemotifs (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for gimmemotifs: filename=gimmemotifs-0.17.0-cp37-cp37m-linux_x86_64.whl size=10668514 sha256=a5bab7e55ed9915dc2492cbf2a9fb42304404fe0fd56d3a19dd2b09a16b9cb1b\n",
+ " Stored in directory: /root/.cache/pip/wheels/62/8c/eb/95b4e046d22f6c98a7b173cb5b9f57dcb875f6c45fd8d0f6f6\n",
+ " Building wheel for norns (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for norns: filename=norns-0.1.6-py3-none-any.whl size=4013 sha256=1ca144902f9b4ce14057ed8e2b9b7d319f7ed7fb30908edd96bd17a8eb26ed25\n",
+ " Stored in directory: /root/.cache/pip/wheels/e5/0c/40/7af959b70310e5211f8af8a1e2426ff7abe6cc3e95cd6a7e72\n",
+ " Building wheel for pyfaidx (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for pyfaidx: filename=pyfaidx-0.7.1-py3-none-any.whl size=27748 sha256=9b5bbb4f33292ac8386a269af1b7f3bcd470a772d646021e71c88e4dec61bdf7\n",
+ " Stored in directory: /root/.cache/pip/wheels/1a/d6/99/7334c4d11bfb574e6d6ea706256053b268a12f2127af1cfd40\n",
+ " Building wheel for biofluff (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for biofluff: filename=biofluff-3.0.4-py3-none-any.whl size=31254 sha256=c6ba89ae696b24c542715935c5ca14f970cde7f11b9ea62ac577584a023209ae\n",
+ " Stored in directory: /root/.cache/pip/wheels/5b/04/b0/85c5d2fb010c66eb4d5e2e906c96a69824a52c58f277491c33\n",
+ " Building wheel for pybedtools (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for pybedtools: filename=pybedtools-0.9.0-cp37-cp37m-linux_x86_64.whl size=13616828 sha256=694074b25c9bf5c57023ecaad261ac8dd11d0adf351423ddc0fcdfc2bb2a1a50\n",
+ " Stored in directory: /root/.cache/pip/wheels/7a/44/0d/3a7449885adaf8ebb157da8c3c834a712f48b3b3b84ba51dda\n",
+ " Building wheel for pyBigWig (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
+ " Created wheel for pyBigWig: filename=pyBigWig-0.3.18-cp37-cp37m-linux_x86_64.whl size=197787 sha256=34b1b242fcdcb1bf0b0cdf7d8f2b74d1d8d082832a8def108e90ac10dff6e3d5\n",
+ " Stored in directory: /root/.cache/pip/wheels/28/eb/46/c761563ba38bd516bcc6accde3d4188cd84eec067f9201cbec\n",
+ "Successfully built gimmemotifs norns pyfaidx biofluff pybedtools pyBigWig\n",
+ "Installing collected packages: pysam, nose, biothings-client, pyfaidx, pyBigWig, pybedtools, norns, mysql-connector-python, mygene, loguru, HTSeq, diskcache, colorama, biopython, xxhash, xdg, qnorm, logomaker, iteround, genomepy, configparser, biofluff, gimmemotifs\n",
+ "Successfully installed HTSeq-2.0.2 biofluff-3.0.4 biopython-1.80 biothings-client-0.2.6 colorama-0.4.6 configparser-5.3.0 diskcache-5.4.0 genomepy-0.14.0 gimmemotifs-0.17.0 iteround-1.0.4 logomaker-0.8 loguru-0.6.0 mygene-3.2.2 mysql-connector-python-8.0.31 norns-0.1.6 nose-1.3.7 pyBigWig-0.3.18 pybedtools-0.9.0 pyfaidx-0.7.1 pysam-0.20.0 qnorm-0.8.1 xdg-5.1.1 xxhash-3.1.0\n",
+ "\u001b[32m10:13:28\u001b[0m \u001b[1m|\u001b[0m \u001b[34mINFO\u001b[0m \u001b[1m|\u001b[0m Downloading assembly summaries from GENCODE\n",
+ "\u001b[32m10:13:34\u001b[0m \u001b[1m|\u001b[0m \u001b[34mINFO\u001b[0m \u001b[1m|\u001b[0m Downloading assembly summaries from UCSC\n",
+ "\u001b[32m10:13:38\u001b[0m \u001b[1m|\u001b[0m \u001b[34mINFO\u001b[0m \u001b[1m|\u001b[0m Downloading assembly summaries from Ensembl\n",
+ "\u001b[32m10:13:56\u001b[0m \u001b[1m|\u001b[0m \u001b[34mINFO\u001b[0m \u001b[1m|\u001b[0m Downloading genome from UCSC. Target URL: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz...\n",
+ "Download: 100% 938M/938M [01:03<00:00, 15.6MB/s]\u001b[0m\n",
+ "\u001b[0m\u001b[32m10:15:00\u001b[0m \u001b[1m|\u001b[0m \u001b[34mINFO\u001b[0m \u001b[1m|\u001b[0m Genome download successful, starting post processing...\n",
+ "\u001b[32m10:15:21\u001b[0m \u001b[1m|\u001b[0m \u001b[34mINFO\u001b[0m \u001b[1m|\u001b[0m name: hg38\n",
+ "\u001b[32m10:15:21\u001b[0m \u001b[1m|\u001b[0m \u001b[34mINFO\u001b[0m \u001b[1m|\u001b[0m local name: hg38\n",
+ "\u001b[32m10:15:21\u001b[0m \u001b[1m|\u001b[0m \u001b[34mINFO\u001b[0m \u001b[1m|\u001b[0m fasta: /root/.local/share/genomes/hg38/hg38.fa\n",
+ "Filtering Fasta: 64.2M lines [00:33, 1.94M lines/s]\u001b[0m\n",
+ "\u001b[0m\u001b[0mLooking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Collecting livelossplot\n",
+ " Downloading livelossplot-0.5.5-py3-none-any.whl (22 kB)\n",
+ "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from livelossplot) (3.2.2)\n",
+ "Requirement already satisfied: ipython==7.* in /usr/local/lib/python3.7/dist-packages (from livelossplot) (7.9.0)\n",
+ "Requirement already satisfied: bokeh in /usr/local/lib/python3.7/dist-packages (from livelossplot) (2.3.3)\n",
+ "Requirement already satisfied: numpy<1.22 in /usr/local/lib/python3.7/dist-packages (from livelossplot) (1.21.6)\n",
+ "Requirement already satisfied: pickleshare in /usr/local/lib/python3.7/dist-packages (from ipython==7.*->livelossplot) (0.7.5)\n",
+ "Requirement already satisfied: pygments in /usr/local/lib/python3.7/dist-packages (from ipython==7.*->livelossplot) (2.6.1)\n",
+ "Requirement already satisfied: decorator in /usr/local/lib/python3.7/dist-packages (from ipython==7.*->livelossplot) (4.4.2)\n",
+ "Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.7/dist-packages (from ipython==7.*->livelossplot) (5.1.1)\n",
+ "Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.7/dist-packages (from ipython==7.*->livelossplot) (57.4.0)\n",
+ "Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from ipython==7.*->livelossplot) (2.0.10)\n",
+ "Collecting jedi>=0.10\n",
+ " Downloading jedi-0.18.2-py2.py3-none-any.whl (1.6 MB)\n",
+ "\u001b[K |████████████████████████████████| 1.6 MB 16.6 MB/s \n",
+ "\u001b[?25hRequirement already satisfied: backcall in /usr/local/lib/python3.7/dist-packages (from ipython==7.*->livelossplot) (0.2.0)\n",
+ "Requirement already satisfied: pexpect in /usr/local/lib/python3.7/dist-packages (from ipython==7.*->livelossplot) (4.8.0)\n",
+ "Requirement already satisfied: parso<0.9.0,>=0.8.0 in /usr/local/lib/python3.7/dist-packages (from jedi>=0.10->ipython==7.*->livelossplot) (0.8.3)\n",
+ "Requirement already satisfied: wcwidth in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython==7.*->livelossplot) (0.2.5)\n",
+ "Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython==7.*->livelossplot) (1.15.0)\n",
+ "Requirement already satisfied: tornado>=5.1 in /usr/local/lib/python3.7/dist-packages (from bokeh->livelossplot) (6.0.4)\n",
+ "Requirement already satisfied: packaging>=16.8 in /usr/local/lib/python3.7/dist-packages (from bokeh->livelossplot) (21.3)\n",
+ "Requirement already satisfied: pillow>=7.1.0 in /usr/local/lib/python3.7/dist-packages (from bokeh->livelossplot) (7.1.2)\n",
+ "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from bokeh->livelossplot) (2.8.2)\n",
+ "Requirement already satisfied: Jinja2>=2.9 in /usr/local/lib/python3.7/dist-packages (from bokeh->livelossplot) (2.11.3)\n",
+ "Requirement already satisfied: PyYAML>=3.10 in /usr/local/lib/python3.7/dist-packages (from bokeh->livelossplot) (6.0)\n",
+ "Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/dist-packages (from bokeh->livelossplot) (4.1.1)\n",
+ "Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from Jinja2>=2.9->bokeh->livelossplot) (2.0.1)\n",
+ "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=16.8->bokeh->livelossplot) (3.0.9)\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->livelossplot) (1.4.4)\n",
+ "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->livelossplot) (0.11.0)\n",
+ "Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.7/dist-packages (from pexpect->ipython==7.*->livelossplot) (0.7.0)\n",
+ "Installing collected packages: jedi, livelossplot\n",
+ "Successfully installed jedi-0.18.2 livelossplot-0.5.5\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Importing the dependencies:"
+ ],
+ "metadata": {
+ "id": "qKfYq_kYd6DS"
+ },
+ "id": "qKfYq_kYd6DS"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "4d974712-ea83-4ab5-96fa-0b590e7e03f7",
+ "metadata": {
+ "id": "4d974712-ea83-4ab5-96fa-0b590e7e03f7"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import gc\n",
+ "import glob\n",
+ "import math\n",
+ "import copy\n",
+ "import random\n",
+ "from functools import partial\n",
+ "from inspect import isfunction\n",
+ "from typing import List, Union\n",
+ "\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import scipy\n",
+ "from scipy.stats import zscore\n",
+ "from scipy.special import rel_entr\n",
+ "import matplotlib\n",
+ "import matplotlib.pyplot as plt\n",
+ "import matplotlib.animation as animation\n",
+ "import matplotlib.image as mpimg\n",
+ "import seaborn as sns\n",
+ "from PIL import Image\n",
+ "from IPython.display import display\n",
+ "from tqdm import tqdm_notebook\n",
+ "from tqdm.auto import tqdm\n",
+ "from einops import rearrange\n",
+ "from livelossplot import PlotLosses\n",
+ "\n",
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "import torch.nn.functional as F\n",
+ "from torch import einsum\n",
+ "from torch.optim import Adam\n",
+ "from torch.nn.modules.activation import ReLU\n",
+ "from torch.utils.data import DataLoader, Dataset\n",
+ "import torchvision.transforms as T\n",
+ "from torchvision.utils import make_grid, save_image\n",
+ "from torchmetrics.functional import kl_divergence\n",
+ "\n",
+ "%matplotlib inline\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### **Hyperparameters** that are exposed and should be abstracted through a Hydra-like interface and/or (potentially) tuned through hyperparameter optimisation:"
+ ],
+ "metadata": {
+ "id": "y7xe6dqZeVga"
+ },
+ "id": "y7xe6dqZeVga"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Global seed\n",
+ "GLOBAL_SEED=42\n",
+ "# Nucleotides used to one-hot encode the sequences\n",
+ "NUCLEOTIDES = ['A', 'C', 'T', 'G']\n",
+ "# Number of sequences sampled for each of the train and test sets. Increase for more statistical power\n",
+ "N_SAMPLES=1000\n",
+ "# Enumerated cell names (component index followed by cell type)\n",
+ "ENUMARATED_CELL_NAME = '''7 Trophoblasts\n",
+ "5 CD8_cells\n",
+ "15 CD34_cells\n",
+ "9 Fetal_heart\n",
+ "12 Fetal_muscle\n",
+ "14 HMVEC(vascular)\n",
+ "3 hESC(Embryonic)\n",
+ "8 Fetal(Neural)\n",
+ "13 Intestine\n",
+ "2 Skin(stromalA)\n",
+ "4 Fibroblast(stromalB)\n",
+ "6 Renal(Cancer)\n",
+ "16 Esophageal(Cancer)\n",
+ "11 Fetal_Lung\n",
+ "10 Fetal_kidney\n",
+ "1 Tissue_Invariant'''.split('\\n')\n",
+ "# Cell names\n",
+ "CELL_NAMES = {int(x.split(' ')[0]): x.split(' ')[1] for x in ENUMARATED_CELL_NAME}\n",
+ "# Number of epochs to train for\n",
+ "EPOCHS = 10000\n",
+ "# Save samples and compare metrics every N epochs\n",
+ "SAVE_AND_SAMPLE_EVERY = 5\n",
+ "# Show the loss every N epochs\n",
+ "EPOCHS_LOSS_SHOW = 5\n",
+ "# Sequence length in base pairs (used as the image size)\n",
+ "IMAGE_SIZE = 200\n",
+ "# Number of image channels (a single channel here)\n",
+ "CHANNELS = 1\n",
+ "# Learning rate\n",
+ "LEARNING_RATE=1e-4\n",
+ "# Number of diffusion timesteps\n",
+ "TIMESTEPS=50\n",
+ "# Number of ResNet block groups in the U-Net architecture\n",
+ "RESNET_BLOCK_GROUPS=4\n",
+ "# Batch size\n",
+ "BATCH_SIZE = 16\n",
+ "# Total number of components (cell-type classes)\n",
+ "TOTAL_CLASS_NUMBER = 17"
+ ],
+ "metadata": {
+ "id": "iFAhLytNeVvl"
+ },
+ "id": "iFAhLytNeVvl",
+ "execution_count": 22,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def seed_everything(seed=GLOBAL_SEED):\n",
+ "    \"\"\"\n",
+ "    Seed Python, NumPy and PyTorch (CPU and all GPUs) and make cuDNN deterministic.\n",
+ "    Note: setting PYTHONHASHSEED here only affects subprocesses started afterwards.\n",
+ "    \"\"\"\n",
+ "    random.seed(seed)\n",
+ "    os.environ['PYTHONHASHSEED'] = str(seed)\n",
+ "    np.random.seed(seed)\n",
+ "    torch.manual_seed(seed)\n",
+ "    torch.cuda.manual_seed(seed)\n",
+ "    torch.cuda.manual_seed_all(seed)\n",
+ "    torch.backends.cudnn.deterministic = True"
+ ],
+ "metadata": {
+ "id": "X_QMQwoH8MY5"
+ },
+ "id": "X_QMQwoH8MY5",
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "seed_everything()"
+ ],
+ "metadata": {
+ "id": "mEarm8ko8Q03"
+ },
+ "id": "mEarm8ko8Q03",
+ "execution_count": 5,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Download the data if you don't have it locally:"
+ ],
+ "metadata": {
+ "id": "PezugMTDCokd"
+ },
+ "id": "PezugMTDCokd"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Download the data\n",
+ "!wget 'https://www.dropbox.com/s/db6up7c0d4jwdp4/train_all_classifier_WM20220916.csv.gz?dl=2'\n",
+ "# Rename the file to drop the '?dl=2' query suffix\n",
+ "!mv 'train_all_classifier_WM20220916.csv.gz?dl=2' train_all_classifier_WM20220916.csv.gz\n",
+ "# Decompress into train_all_classifier_WM20220916.csv (-f overwrites an existing copy)\n",
+ "!gunzip -f train_all_classifier_WM20220916.csv.gz"
+ ],
+ "metadata": {
+ "id": "40Wix06PCo4C",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "212dcae1-83a1-493a-f2c5-baae24330239"
+ },
+ "id": "40Wix06PCo4C",
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "--2022-11-26 10:17:29-- https://www.dropbox.com/s/db6up7c0d4jwdp4/train_all_classifier_WM20220916.csv.gz?dl=2\n",
+ "Resolving www.dropbox.com (www.dropbox.com)... 162.125.65.18, 2620:100:6021:18::a27d:4112\n",
+ "Connecting to www.dropbox.com (www.dropbox.com)|162.125.65.18|:443... connected.\n",
+ "HTTP request sent, awaiting response... 302 Found\n",
+ "Location: /s/raw/db6up7c0d4jwdp4/train_all_classifier_WM20220916.csv.gz [following]\n",
+ "--2022-11-26 10:17:29-- https://www.dropbox.com/s/raw/db6up7c0d4jwdp4/train_all_classifier_WM20220916.csv.gz\n",
+ "Reusing existing connection to www.dropbox.com:443.\n",
+ "HTTP request sent, awaiting response... 302 Found\n",
+ "Location: https://ucbcd8e5eef3ef7a6d16db144774.dl.dropboxusercontent.com/cd/0/inline/Bxcfr5LH8pAc4jc9C3ZxSC-wUzvYDj8duIAQTBKgIcRxE8bOKHvepDKfj2a1jLaKZvyg1SLT0_jrx5EzwUygETkXPwOMy00T54qmjJRdiKWxytgK_vA83wnkqH1a66kTREkD3lE-ZjlRdoH3CyTrNx6Pk23LPTa7PFH5lX4jJVswiQ/file# [following]\n",
+ "--2022-11-26 10:17:30-- https://ucbcd8e5eef3ef7a6d16db144774.dl.dropboxusercontent.com/cd/0/inline/Bxcfr5LH8pAc4jc9C3ZxSC-wUzvYDj8duIAQTBKgIcRxE8bOKHvepDKfj2a1jLaKZvyg1SLT0_jrx5EzwUygETkXPwOMy00T54qmjJRdiKWxytgK_vA83wnkqH1a66kTREkD3lE-ZjlRdoH3CyTrNx6Pk23LPTa7PFH5lX4jJVswiQ/file\n",
+ "Resolving ucbcd8e5eef3ef7a6d16db144774.dl.dropboxusercontent.com (ucbcd8e5eef3ef7a6d16db144774.dl.dropboxusercontent.com)... 162.125.65.15, 2620:100:6021:15::a27d:410f\n",
+ "Connecting to ucbcd8e5eef3ef7a6d16db144774.dl.dropboxusercontent.com (ucbcd8e5eef3ef7a6d16db144774.dl.dropboxusercontent.com)|162.125.65.15|:443... connected.\n",
+ "HTTP request sent, awaiting response... 302 Found\n",
+ "Location: /cd/0/inline2/BxeyJR3okxXp5EeekKjefgP7xmxZrBNFK-OSpRueqR-2FaztB_c470dMZLPjf0ZiT82AJM7CHGE0p7wx6uxz1snkBR97B3W4EunAMSaqA1gznzqrxeko8DqIDRCMPJJkpKvLW1oIM1ODeX2SjBSEugIaRR7ApG7yR-ED9gp3O13J0jZAOK2bw2-kqRX1rwQQMmmUf2W2a7nk6zRg2ixEvKEObFlAhAUSE64AWrlomjnK-MBbxJbDRS2772hQrzNgwtISFfErj1CF_wz_cuTsqRSMmPkEltGJnMYrNDZ1GaWa70mxfJfIEJA2O9EJzxDe7sbWttbqSx22DPBAxMjBLp48QhY0UYq0jESFYGlvDv4UDTX8sq-MkgD-KIUuHLEn9UzVOrXFXKFJIZ3pnkSEDKEjBzLvbGmqgM4oOpCzUn7GEQ/file [following]\n",
+ "--2022-11-26 10:17:30-- https://ucbcd8e5eef3ef7a6d16db144774.dl.dropboxusercontent.com/cd/0/inline2/BxeyJR3okxXp5EeekKjefgP7xmxZrBNFK-OSpRueqR-2FaztB_c470dMZLPjf0ZiT82AJM7CHGE0p7wx6uxz1snkBR97B3W4EunAMSaqA1gznzqrxeko8DqIDRCMPJJkpKvLW1oIM1ODeX2SjBSEugIaRR7ApG7yR-ED9gp3O13J0jZAOK2bw2-kqRX1rwQQMmmUf2W2a7nk6zRg2ixEvKEObFlAhAUSE64AWrlomjnK-MBbxJbDRS2772hQrzNgwtISFfErj1CF_wz_cuTsqRSMmPkEltGJnMYrNDZ1GaWa70mxfJfIEJA2O9EJzxDe7sbWttbqSx22DPBAxMjBLp48QhY0UYq0jESFYGlvDv4UDTX8sq-MkgD-KIUuHLEn9UzVOrXFXKFJIZ3pnkSEDKEjBzLvbGmqgM4oOpCzUn7GEQ/file\n",
+ "Reusing existing connection to ucbcd8e5eef3ef7a6d16db144774.dl.dropboxusercontent.com:443.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 22394917 (21M) [application/octet-stream]\n",
+ "Saving to: ‘train_all_classifier_WM20220916.csv.gz?dl=2’\n",
+ "\n",
+ "train_all_classifie 100%[===================>] 21.36M 14.9MB/s in 1.4s \n",
+ "\n",
+ "2022-11-26 10:17:32 (14.9 MB/s) - ‘train_all_classifier_WM20220916.csv.gz?dl=2’ saved [22394917/22394917]\n",
+ "\n",
+ "gzip: /train_all_classifier_WM20220916.csv.gz?dl=2.gz: No such file or directory\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "afc21faf-97ed-4ce9-97d6-a95bcaa8a4b1",
+ "metadata": {
+ "id": "afc21faf-97ed-4ce9-97d6-a95bcaa8a4b1",
+ "tags": []
+ },
+ "source": [
+ "Mount Google Drive if you are loading the data from it:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# from google.colab import drive\n",
+ "# drive.mount('/content/drive')"
+ ],
+ "metadata": {
+ "id": "-l6DRzxy9v6x"
+ },
+ "id": "-l6DRzxy9v6x",
+ "execution_count": 7,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Utility functions"
+ ],
+ "metadata": {
+ "id": "Qfv63ka9VZa0"
+ },
+ "id": "Qfv63ka9VZa0"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def motif_scoring_KL_divergence(original: np.ndarray,\n",
+ "                                generated: np.ndarray) -> float:\n",
+ "    \"\"\"\n",
+ "    This function encapsulates the logic of evaluating the KL divergence\n",
+ "    between two discrete probability distributions (e.g. motif frequencies).\n",
+ "    Both inputs must be non-negative and sum to 1. A zero in `generated` where\n",
+ "    `original` is positive makes the divergence infinite.\n",
+ "\n",
+ "    Returns\n",
+ "    -------\n",
+ "    kl_divergence: float\n",
+ "        KL(original || generated) = sum(original * log(original / generated))\n",
+ "    \"\"\"\n",
+ "    kl_pq = rel_entr(original, generated)\n",
+ "    return np.sum(kl_pq)"
+ ],
+ "metadata": {
+ "id": "DE86Iht8VYWw"
+ },
+ "id": "DE86Iht8VYWw",
+ "execution_count": 8,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "def compare_motif_list(df_motifs_a, df_motifs_b, motif_scoring_metric=motif_scoring_KL_divergence, plot_motif_probs=False):\n",
+ "    \"\"\"\n",
+ "    This function encapsulates the logic of evaluating the difference between the motif frequency\n",
+ "    distributions of the generated (diffusion, df_motifs_a) and the input (training, df_motifs_b)\n",
+ "    sequences for an arbitrary scoring function (\"motif_scoring_metric\").\n",
+ "\n",
+ "    Note that some scores, like the KL divergence, are not metrics in the formal sense, because\n",
+ "    they do not satisfy all metric properties; the KL divergence, for example, is not symmetric.\n",
+ "    Hence the order of the inputs matters.\n",
+ "    \"\"\"\n",
+ "    set_all_mot = set(df_motifs_a.index.values.tolist() + df_motifs_b.index.values.tolist())\n",
+ "    create_new_matrix = []\n",
+ "    for x in set_all_mot:\n",
+ "        list_in = []\n",
+ "        list_in.append(x)  # adding the name\n",
+ "        # Motifs missing from one set get a pseudocount of 1, which avoids\n",
+ "        # zero probabilities (and hence an infinite KL divergence)\n",
+ "        if x in df_motifs_a.index:\n",
+ "            list_in.append(df_motifs_a.loc[x][0])\n",
+ "        else:\n",
+ "            list_in.append(1)\n",
+ "\n",
+ "        if x in df_motifs_b.index:\n",
+ "            list_in.append(df_motifs_b.loc[x][0])\n",
+ "        else:\n",
+ "            list_in.append(1)\n",
+ "\n",
+ "        create_new_matrix.append(list_in)\n",
+ "\n",
+ "    df_motifs = pd.DataFrame(create_new_matrix, columns=['motif', 'motif_a', 'motif_b'])\n",
+ "\n",
+ "    # Normalise the counts to probabilities\n",
+ "    df_motifs['Diffusion_seqs'] = df_motifs['motif_a'] / df_motifs['motif_a'].sum()\n",
+ "    df_motifs['Training_seqs'] = df_motifs['motif_b'] / df_motifs['motif_b'].sum()\n",
+ "    if plot_motif_probs:\n",
+ "        plt.rcParams[\"figure.figsize\"] = (3, 3)\n",
+ "        sns.regplot(x='Diffusion_seqs', y='Training_seqs', data=df_motifs)\n",
+ "        plt.xlabel('Diffusion Seqs')\n",
+ "        plt.ylabel('Training Seqs')\n",
+ "        plt.title('Motifs Probs')\n",
+ "        plt.show()\n",
+ "\n",
+ "    return motif_scoring_metric(df_motifs['Diffusion_seqs'].values, df_motifs['Training_seqs'].values)"
+ ],
+ "metadata": {
+ "id": "WQvYEJcUo2Kw"
+ },
+ "id": "WQvYEJcUo2Kw",
+ "execution_count": 9,
+ "outputs": []
+ },
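+ {
+ "cell_type": "markdown",
+ "source": [
+ "A small, hedged illustration of the KL divergence caveats described above, using **made-up toy counts** (not taken from the dataset). It shows that swapping the two inputs changes the score. It also shows that, without the pseudocount of 1, a motif absent from one set makes the divergence infinite. The cell only needs `numpy` and `scipy.special.rel_entr`, which `motif_scoring_KL_divergence` already uses."
+ ],
+ "metadata": {
+ "id": "kl-toy-example-md"
+ },
+ "id": "kl-toy-example-md"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Hedged sketch with made-up toy motif counts (not from the dataset)\n",
+ "import numpy as np\n",
+ "from scipy.special import rel_entr\n",
+ "\n",
+ "counts_a = np.array([2, 4, 0])  # e.g. generated set; the third motif was never found\n",
+ "counts_b = np.array([3, 3, 1])  # e.g. training set\n",
+ "\n",
+ "# Pseudocount of 1 for unobserved motifs, mirroring compare_motif_list\n",
+ "p_counts = np.where(counts_a == 0, 1, counts_a)\n",
+ "q_counts = np.where(counts_b == 0, 1, counts_b)\n",
+ "p = p_counts / p_counts.sum()\n",
+ "q = q_counts / q_counts.sum()\n",
+ "\n",
+ "kl_pq = np.sum(rel_entr(p, q))\n",
+ "kl_qp = np.sum(rel_entr(q, p))\n",
+ "print(f'KL(P||Q) = {kl_pq:.4f}, KL(Q||P) = {kl_qp:.4f}')  # the two values differ\n",
+ "\n",
+ "# Without the pseudocount, Q puts mass on a motif that P never contains\n",
+ "raw_p = counts_a / counts_a.sum()\n",
+ "raw_q = counts_b / counts_b.sum()\n",
+ "print(np.sum(rel_entr(raw_q, raw_p)))  # inf\n",
+ "assert np.isinf(np.sum(rel_entr(raw_q, raw_p)))"
+ ],
+ "metadata": {
+ "id": "kl-toy-example-code"
+ },
+ "id": "kl-toy-example-code",
+ "execution_count": null,
+ "outputs": []
+ },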
+ {
+ "cell_type": "code",
+ "source": [
+ "def metric_comparison_between_components(original_data, generated_data, x_label_plot, y_label_plot):\n",
+ "    \"\"\"\n",
+ "    This function takes as inputs two dictionaries whose keys are components (cell types)\n",
+ "    and whose values are the motif occurrence counts for that component. The two dictionaries\n",
+ "    represent two different datasets, e.g. the generated dataset and the input (train) dataset.\n",
+ "\n",
+ "    It computes the evaluation score (KL divergence by default) for every pair of cell types\n",
+ "    and plots the results as a heatmap (rows: original_data keys, columns: generated_data keys).\n",
+ "    \"\"\"\n",
+ "    final_comparison_all_components = []\n",
+ "    for components_1, motif_occurance_frequency in original_data.items():\n",
+ "        comparisons_single_component = []\n",
+ "        for components_2 in generated_data.keys():\n",
+ "            compared_motifs_occurances = compare_motif_list(motif_occurance_frequency, generated_data[components_2])\n",
+ "            comparisons_single_component.append(compared_motifs_occurances)\n",
+ "\n",
+ "        final_comparison_all_components.append(comparisons_single_component)\n",
+ "\n",
+ "    plt.rcParams[\"figure.figsize\"] = (10, 10)\n",
+ "    df_plot = pd.DataFrame(final_comparison_all_components)\n",
+ "    # Label rows/columns with the cell-type names of the components actually compared\n",
+ "    df_plot.columns = [CELL_NAMES[x] for x in generated_data.keys()]\n",
+ "    df_plot.index = [CELL_NAMES[x] for x in original_data.keys()]\n",
+ "    sns.heatmap(df_plot, cmap='Blues_r', annot=True, lw=0.1, vmax=1, vmin=0)\n",
+ "    plt.title(f'KL divergence \\n {x_label_plot} sequences x {y_label_plot} sequences \\n MOTIFS probabilities')\n",
+ "    plt.xlabel(f'{x_label_plot} Sequences \\n(motifs dist)')\n",
+ "    plt.ylabel(f'{y_label_plot} \\n (motifs dist)')"
+ ],
+ "metadata": {
+ "id": "gAd4xx4WoWRc"
+ },
+ "id": "gAd4xx4WoWRc",
+ "execution_count": 10,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
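+ "def one_hot_encode(seq, nucleotides, max_seq_len):\n",
+ "    \"\"\"\n",
+ "    One-hot encode a nucleotide sequence into a (max_seq_len, len(nucleotides)) array.\n",
+ "    Column j corresponds to nucleotides[j]; positions beyond len(seq) stay zero (padding).\n",
+ "    Characters not in `nucleotides` (e.g. 'N') raise a ValueError, and sequences longer\n",
+ "    than max_seq_len raise an IndexError.\n",
+ "    \"\"\"\n",
+ "    seq_len = len(seq)\n",
+ "    seq_array = np.zeros((max_seq_len, len(nucleotides)))\n",
+ "    for i in range(seq_len):\n",
+ "        seq_array[i, nucleotides.index(seq[i])] = 1\n",
+ "    return seq_array"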
+ ],
+ "metadata": {
+ "id": "hOsUifHlj9Pj"
+ },
+ "id": "hOsUifHlj9Pj",
+ "execution_count": 11,
+ "outputs": []
+ },
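+ {
+ "cell_type": "markdown",
+ "source": [
+ "A quick usage check of `one_hot_encode` on a short toy sequence. The sequence `'ACGT'` and `max_seq_len=6` are arbitrary illustration values. Columns follow the order of the nucleotide list passed in, here the same order as `NUCLEOTIDES` (A, C, T, G), and rows past the end of the sequence stay zero."
+ ],
+ "metadata": {
+ "id": "one-hot-usage-md"
+ },
+ "id": "one-hot-usage-md"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Usage sketch: relies on one_hot_encode from the cell above; the toy sequence is arbitrary\n",
+ "encoded = one_hot_encode('ACGT', ['A', 'C', 'T', 'G'], max_seq_len=6)\n",
+ "print(encoded.shape)  # (6, 4)\n",
+ "# 'G' (position 2) is the 4th entry of the nucleotide list, so its 1 sits in column 3\n",
+ "assert encoded[2, 3] == 1\n",
+ "# Positions 4 and 5 are padding and remain all zeros\n",
+ "assert encoded[4:].sum() == 0"
+ ],
+ "metadata": {
+ "id": "one-hot-usage-code"
+ },
+ "id": "one-hot-usage-code",
+ "execution_count": null,
+ "outputs": []
+ },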
+ {
+ "cell_type": "code",
+ "source": [
+ "def log(t, eps=1e-20):\n",
+ "    \"\"\"\n",
+ "    Numerically safe torch log: clamps the tensor t to at least eps before\n",
+ "    taking the log, so that log(0) returns log(eps) instead of -inf.\n",
+ "    \"\"\"\n",
+ "    return torch.log(t.clamp(min=eps))"
+ ],
+ "metadata": {
+ "id": "btRvcMIAiqyH"
+ },
+ "id": "btRvcMIAiqyH",
+ "execution_count": 12,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "**EMA (exponential moving average)**\n",
+ "\n",
+ "The idea is to keep two sets of parameters. The first is the current weights, which mostly reflect what the model saw in the most recent training steps. The second is updated as an exponential moving average over many iterations and is expected to suit the entire dataset better. Each update computes `ema = beta * ema + (1 - beta) * current`.\n",
+ "\n",
+ "Without EMA, models tend to overfit to the last few training iterations. With EMA, the weights used for inference are an exponentially weighted average of the weights from recent training iterations, which usually reduces this \"last-iterations overfitting\".\n",
+ "\n",
+ "Reference implementation: https://github.com/dome272/Diffusion-Models-pytorch/blob/main/modules.py"
+ ],
+ "metadata": {
+ "id": "eBjBedftu1oS"
+ },
+ "id": "eBjBedftu1oS"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "class EMA:\n",
+ "    \"\"\"Keeps an exponential moving average (EMA) copy of a model's parameters.\"\"\"\n",
+ "\n",
+ "    def __init__(self, beta):\n",
+ "        self.beta = beta  # decay factor; values closer to 1 average over more steps\n",
+ "        self.step = 0\n",
+ "\n",
+ "    def update_model_average(self, ma_model, current_model):\n",
+ "        # Blend each EMA parameter with the corresponding current parameter\n",
+ "        for current_params, ma_params in zip(current_model.parameters(), ma_model.parameters()):\n",
+ "            old_weight, up_weight = ma_params.data, current_params.data\n",
+ "            ma_params.data = self.update_average(old_weight, up_weight)\n",
+ "\n",
+ "    def update_average(self, old, new):\n",
+ "        if old is None:\n",
+ "            return new\n",
+ "        return old * self.beta + (1 - self.beta) * new\n",
+ "\n",
+ "    def step_ema(self, ema_model, model, step_start_ema=2000):\n",
+ "        # Warm-up: for the first step_start_ema steps, simply copy the current weights\n",
+ "        if self.step < step_start_ema:\n",
+ "            self.reset_parameters(ema_model, model)\n",
+ "            self.step += 1\n",
+ "            return\n",
+ "        self.update_model_average(ema_model, model)\n",
+ "        self.step += 1\n",
+ "\n",
+ "    def reset_parameters(self, ema_model, model):\n",
+ "        ema_model.load_state_dict(model.state_dict())"
+ ],
+ "metadata": {
+ "id": "9F1m7Rdzu0gx"
+ },
+ "id": "9F1m7Rdzu0gx",
+ "execution_count": 13,
+ "outputs": []
+ },
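+ {
+ "cell_type": "markdown",
+ "source": [
+ "A minimal, hedged sanity check of the `EMA` class above on a toy one-weight `nn.Linear` model. The toy model, `beta=0.9` and `step_start_ema=0` are illustration choices, not the values used for training. After a single update, the EMA weight should be `0.9 * 0.0 + 0.1 * 1.0`, i.e. about 0.1. The toy model is created inside `torch.random.fork_rng` so that it does not consume the globally seeded RNG state."
+ ],
+ "metadata": {
+ "id": "ema-sanity-check-md"
+ },
+ "id": "ema-sanity-check-md"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Hedged sanity check; relies on the EMA class from the cell above\n",
+ "import copy\n",
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "\n",
+ "with torch.random.fork_rng(devices=[]):  # leave the global CPU RNG state untouched\n",
+ "    model = nn.Linear(1, 1, bias=False)\n",
+ "with torch.no_grad():\n",
+ "    model.weight.fill_(0.0)\n",
+ "ema_model = copy.deepcopy(model).eval().requires_grad_(False)  # frozen EMA copy\n",
+ "ema = EMA(beta=0.9)\n",
+ "\n",
+ "with torch.no_grad():\n",
+ "    model.weight.fill_(1.0)  # pretend an optimiser step moved the weight to 1.0\n",
+ "ema.step_ema(ema_model, model, step_start_ema=0)  # skip the warm-up copy phase\n",
+ "print(ema_model.weight.item())  # about 0.1\n",
+ "assert abs(ema_model.weight.item() - 0.1) < 1e-6"
+ ],
+ "metadata": {
+ "id": "ema-sanity-check-code"
+ },
+ "id": "ema-sanity-check-code",
+ "execution_count": null,
+ "outputs": []
+ },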
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Data import and preparation"
+ ],
+ "metadata": {
+ "id": "PfGB5nJ309_u"
+ },
+ "id": "PfGB5nJ309_u"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "class DataLoading():\n",
+ "    \"\"\"\n",
+ "    The main goal of this class is to abstract away the logic of loading the input DHS dataset (https://www.meuleman.org/research/synthseqs/)\n",
+ "    and preprocessing it into the different formats we care about.\n",
+ "    The following steps are abstracted away (executed in the order presented):\n",
+ "\n",
+ "    1. Read in the raw data from the csv [the read_csv() method]\n",
+ "    2. Create a subset of the data based on components [specified through the subset_components argument and executed by create_subsetted_components_df()]\n",
+ "    3. Split the data into train, test and shuffled datasets [the create_train_groups() method]\n",
+ "    \"\"\"\n",
+ "\n",
+ "    def __init__(self, input_csv, sample_number=N_SAMPLES, subset_components=None, plot_components_distribution=True, change_component_index=True):\n",
+ "        \"\"\"\n",
+ "        input_csv: path to the tab-separated input file.\n",
+ "        sample_number: number of sequences in each of the train and test sets.\n",
+ "        subset_components: optional list of component indices to keep.\n",
+ "        plot_components_distribution: whether to plot the fraction of sequences per component.\n",
+ "        change_component_index: whether to shift the component indices by +1.\n",
+ "        \"\"\"\n",
+ "        self.csv = input_csv\n",
+ "        self.plot_components_distribution = plot_components_distribution\n",
+ "        self.sample_number = sample_number\n",
+ "        self.subset_components = subset_components\n",
+ "        self.change_comp_index = change_component_index\n",
+ "        self.data = self.read_csv()\n",
+ "        self.df_generate = self.create_subsetted_components_df()\n",
+ "        self.df_train_raw_grouped, self.df_test_raw_grouped, self.df_shuffled_raw_grouped = self.create_train_groups()\n",
+ "\n",
+ "    def read_csv(self):\n",
+ "        \"\"\"\n",
+ "        Read the raw (tab-separated) csv.\n",
+ "        \"\"\"\n",
+ "        df = pd.read_csv(self.csv, sep=\"\\t\")\n",
+ "        if self.change_comp_index:\n",
+ "            df['component'] = df['component'] + 1\n",
+ "        return df\n",
+ "\n",
+ "    def create_subsetted_components_df(self):\n",
+ "        \"\"\"\n",
+ "        Subset the raw data to the requested components.\n",
+ "        \"\"\"\n",
+ "        df_subsetted_components = self.data.copy()\n",
+ "        if self.subset_components is not None and isinstance(self.subset_components, list):\n",
+ "            df_subsetted_components = df_subsetted_components.query(' or '.join([f'component == {c}' for c in self.subset_components])).copy()\n",
+ "            print('Subsetting...')\n",
+ "\n",
+ "        if self.plot_components_distribution:\n",
+ "            (df_subsetted_components.groupby('component').count()['raw_sequence'] / df_subsetted_components.groupby('component').count()['raw_sequence'].sum()).plot.bar()\n",
+ "            plt.title('Fraction of sequences per component')\n",
+ "            plt.show()\n",
+ "\n",
+ "        return df_subsetted_components\n",
+ "\n",
+ "    def create_train_groups(self):\n",
+ "        \"\"\"\n",
+ "        Split the subsetted df into train, test and shuffled sets.\n",
+ "        The shuffled set contains the train sequences with their nucleotides randomly permuted\n",
+ "        (same composition, scrambled order), to serve as a baseline.\n",
+ "        \"\"\"\n",
+ "        df_sampled = self.df_generate.sample(self.sample_number * 2)  # getting train and test\n",
+ "        df_train = df_sampled.iloc[:self.sample_number].copy()\n",
+ "        df_test = df_sampled.iloc[self.sample_number:].copy()\n",
+ "        df_train_shuffled = df_train.copy()\n",
+ "        df_train_shuffled['raw_sequence'] = df_train_shuffled['raw_sequence'].apply(lambda x: ''.join(random.sample(list(x), len(x))))\n",
+ "        return df_train, df_test, df_train_shuffled\n",
+ "\n",
+ "\n",
+ "class DataPreprocessing():\n",
+ "    \"\"\"\n",
+ "    The main goal of this class is to abstract away the logic of preprocessing the raw data.\n",
+ "\n",
+ "    The following steps are abstracted away (executed in the order presented):\n",
+ "\n",
+ "    1. Generate fasta files\n",
+ "    2. Generate motifs\n",
+ "    3. Save motifs per component in a dictionary\n",
+ "    \"\"\"\n",
+ "\n",
+ "    def __init__(self, df_train_raw_grouped, df_test_raw_grouped, df_shuffled_raw_grouped, subset_components, sample_number):\n",
+ "        \"\"\"\n",
+ "        Takes the train, test and shuffled dataframes produced by DataLoading and immediately\n",
+ "        generates their fasta files and motifs (stored in self.train, self.test and self.train_shuffle).\n",
+ "        \"\"\"\n",
+ "        self.df_train_raw_grouped = df_train_raw_grouped\n",
+ "        self.df_test_raw_grouped = df_test_raw_grouped\n",
+ "        self.df_shuffled_raw_grouped = df_shuffled_raw_grouped\n",
+ "        self.sample_number = sample_number\n",
+ "        self.subset_components = subset_components\n",
+ "        self.train = None\n",
+ "        self.test = None\n",
+ "        self.train_shuffle = None\n",
+ "        self.get_motif()\n",
+ "\n",
+ "    def get_motif(self):\n",
+ "        \"\"\"\n",
+ "        Fetch the motifs and generate fastas for train, test and shuffled.\n",
+ "        \"\"\"\n",
+ "        self.train = self.generate_motifs_and_fastas(self.df_train_raw_grouped, 'train')\n",
+ "        self.test = self.generate_motifs_and_fastas(self.df_test_raw_grouped, 'test')\n",
+ "        self.train_shuffle = self.generate_motifs_and_fastas(self.df_shuffled_raw_grouped, 'train_shuffle')\n",
+ "\n",
+ "    def generate_motifs_and_fastas(self, df, name):\n",
+ "        \"\"\"\n",
+ "        Generate a dictionary containing:\n",
+ "        1. The name of the saved fasta file.\n",
+ "        2. Motif counts (all sequences).\n",
+ "        3. Motif counts per component.\n",
+ "        4. The dataset.\n",
+ "        \"\"\"\n",
+ "        print('Generating Fasta and Motifs:', name)\n",
+ "        print('---' * 10)\n",
+ "        fasta_saved = self.save_fasta(df, f\"{name}_{self.sample_number}_{'_'.join([str(c) for c in self.subset_components])}\")\n",
+ "        print('Generating Motifs (all seqs)')\n",
+ "        motif_all_components = self.motifs_from_fasta(fasta_saved)\n",
+ "        print('Generating Motifs per component')\n",
+ "        train_comp_motifs_dict = self.generate_motifs_components(df)\n",
+ "\n",
+ "        return {'fasta_name': fasta_saved,\n",
+ "                'motifs': motif_all_components,\n",
+ "                'motifs_per_components_dict': train_comp_motifs_dict,\n",
+ "                'dataset': df}\n",
+ "\n",
+ "    def motifs_from_fasta(self, fasta):\n",
+ "        \"\"\"\n",
+ "        Scan a fasta file with GimmeMotifs (gimme scan, JASPAR2020 vertebrate motifs, hg38 genome)\n",
+ "        and count, per motif, the number of distinct sequence names in which it is found.\n",
+ "        \"\"\"\n",
+ "        print('Computing Motifs....')\n",
+ "        !gimme scan $fasta -p JASPAR2020_vertebrates -g hg38 > train_results_motifs.bed\n",
+ "        df_results_seq_guime = pd.read_csv('train_results_motifs.bed', sep='\\t', skiprows=5, header=None)\n",
+ "        # Parse the motif name from column 8\n",
+ "        df_results_seq_guime['motifs'] = df_results_seq_guime[8].apply(lambda x: x.split('motif_name \"')[1].split('\"')[0])\n",
+ "\n",
+ "        # Strip the last '_'-separated field from the sequence name\n",
+ "        df_results_seq_guime[0] = df_results_seq_guime[0].apply(lambda x: '_'.join(x.split('_')[:-1]))\n",
+ "        # Count each (sequence, motif) pair only once\n",
+ "        df_results_seq_guime_count_out = df_results_seq_guime[[0, 'motifs']].drop_duplicates().groupby('motifs').count()\n",
+ "        return df_results_seq_guime_count_out\n",
+ "\n",
+ "    def save_fasta(self, df, name_fasta):\n",
+ "        \"\"\"\n",
+ "        Save the sequences of df to '<name_fasta>.fasta', using '>{id}_component_{component}' as headers.\n",
+ "        \"\"\"\n",
+ "        fasta_final_name = name_fasta + '.fasta'\n",
+ "        write_fasta_component = '\\n'.join(df[['Unnamed: 0', 'raw_sequence', 'component']].apply(lambda x: f\">{x['Unnamed: 0']}_component_{x['component']}\\n{x['raw_sequence']}\", axis=1).values.tolist())\n",
+ "        with open(fasta_final_name, 'w') as save_fasta_file:\n",
+ "            save_fasta_file.write(write_fasta_component)\n",
+ "        return fasta_final_name\n",
+ "\n",
+ "    def generate_motifs_components(self, df):\n",
+ "        \"\"\"\n",
+ "        Generate a dictionary mapping each component to its motif counts.\n",
+ "        \"\"\"\n",
+ "        final_comp_values = {}\n",
+ "        for comp, v_comp in df.groupby('component'):\n",
+ "            print(comp)\n",
+ "            name_c_fasta = self.save_fasta(v_comp, 'temp_component')\n",
+ "            final_comp_values[comp] = self.motifs_from_fasta(name_c_fasta)\n",
+ "        return final_comp_values\n"
+ ],
+ "metadata": {
+ "id": "LFmNkscf0ruL"
+ },
+ "id": "LFmNkscf0ruL",
+ "execution_count": 14,
+ "outputs": []
+ },
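+ {
+ "cell_type": "markdown",
+ "source": [
+ "The shuffled set built in `create_train_groups` permutes the nucleotides of each training sequence with `random.sample`. Below is a quick check on an arbitrary toy sequence that this operation keeps the nucleotide composition while scrambling the order. It uses a local `random.Random(0)` instance (an illustration choice) so that the globally seeded RNG used by the data split is not consumed."
+ ],
+ "metadata": {
+ "id": "shuffle-control-md"
+ },
+ "id": "shuffle-control-md"
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Toy check (arbitrary sequence, not from the dataset)\n",
+ "import random\n",
+ "from collections import Counter\n",
+ "\n",
+ "rng = random.Random(0)  # local RNG: leaves the global random state untouched\n",
+ "seq = 'ACGTTGCAAC'\n",
+ "shuffled = ''.join(rng.sample(list(seq), len(seq)))  # same operation as in create_train_groups\n",
+ "assert Counter(shuffled) == Counter(seq)  # identical nucleotide counts\n",
+ "assert len(shuffled) == len(seq)\n",
+ "print(seq, '->', shuffled)"
+ ],
+ "metadata": {
+ "id": "shuffle-control-code"
+ },
+ "id": "shuffle-control-code",
+ "execution_count": null,
+ "outputs": []
+ },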
+ {
+ "cell_type": "code",
+ "source": [
+ "# This .csv dataset is downloaded (and decompressed) by the wget cell above:\n",
+ "# https://www.dropbox.com/s/db6up7c0d4jwdp4/train_all_classifier_WM20220916.csv.gz?dl=2\n",
+ "# It can also be saved locally (e.g. on Google Drive) and loaded from there.\n",
+ "raw_data = DataLoading(\"train_all_classifier_WM20220916.csv\", subset_components=[3,8,12,15])\n",
+ "preprocessed_data= DataPreprocessing(raw_data.df_train_raw_grouped, raw_data.df_test_raw_grouped , raw_data.df_shuffled_raw_grouped, subset_components=[3,8,12,15], sample_number=raw_data.sample_number)"
+ ],
+ "metadata": {
+ "id": "AFKLs-Jc8trL",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "outputId": "a0accd43-6789-4a9e-ac3e-3628ecf35c20"
+ },
+ "id": "AFKLs-Jc8trL",
+ "execution_count": 15,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Subseting...\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 2 | 3 | 0.000637 | 0.001032 |
| 1 | UN0249.1_ZSCAN9 | 4 | 3 | 0.001273 | 0.001032 |
| 2 | MA0663.1_MLX | 2 | 2 | 0.000637 | 0.000688 |
| 3 | MA0160.1_NR4A2 | 5 | 7 | 0.001592 | 0.002408 |
| 4 | MA0910.2_HOXD8 | 4 | 6 | 0.001273 | 0.002064 |
| ... | ... | ... | ... | ... | ... |
| 849 | MA1491.1_GLI3 | 8 | 9 | 0.002547 | 0.003096 |
| 850 | MA0775.1_MEIS3 | 5 | 4 | 0.001592 | 0.001376 |
| 851 | MA1505.1_HOXC8 | 6 | 3 | 0.001910 | 0.001032 |
| 852 | MA1111.1_NR2F2 | 2 | 1 | 0.000637 | 0.000344 |
| 853 | MA0641.1_ELF4 | 4 | 2 | 0.001273 | 0.000688 |
854 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 2 | 1 | 0.000636 | 0.000324 |
| 1 | UN0249.1_ZSCAN9 | 4 | 1 | 0.001271 | 0.000324 |
| 2 | MA0663.1_MLX | 2 | 6 | 0.000636 | 0.001945 |
| 3 | MA0160.1_NR4A2 | 5 | 4 | 0.001589 | 0.001297 |
| 4 | MA0910.2_HOXD8 | 4 | 5 | 0.001271 | 0.001621 |
| ... | ... | ... | ... | ... | ... |
| 854 | MA1491.1_GLI3 | 8 | 1 | 0.002543 | 0.000324 |
| 855 | MA0775.1_MEIS3 | 5 | 10 | 0.001589 | 0.003241 |
| 856 | MA1505.1_HOXC8 | 6 | 5 | 0.001907 | 0.001621 |
| 857 | MA1111.1_NR2F2 | 2 | 1 | 0.000636 | 0.000324 |
| 858 | MA0641.1_ELF4 | 4 | 1 | 0.001271 | 0.000324 |
859 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 2 | 4 | 0.000636 | 0.001536 |
| 1 | UN0249.1_ZSCAN9 | 4 | 4 | 0.001272 | 0.001536 |
| 2 | MA0663.1_MLX | 2 | 2 | 0.000636 | 0.000768 |
| 3 | MA0160.1_NR4A2 | 5 | 3 | 0.001590 | 0.001152 |
| 4 | MA0910.2_HOXD8 | 4 | 2 | 0.001272 | 0.000768 |
| ... | ... | ... | ... | ... | ... |
| 853 | MA1491.1_GLI3 | 8 | 6 | 0.002544 | 0.002304 |
| 854 | MA0775.1_MEIS3 | 5 | 7 | 0.001590 | 0.002688 |
| 855 | MA1505.1_HOXC8 | 6 | 1 | 0.001908 | 0.000384 |
| 856 | MA1111.1_NR2F2 | 2 | 1 | 0.000636 | 0.000384 |
| 857 | MA0641.1_ELF4 | 4 | 3 | 0.001272 | 0.001152 |
858 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 2 | 5 | 0.000636 | 0.001777 |
| 1 | UN0249.1_ZSCAN9 | 4 | 3 | 0.001271 | 0.001066 |
| 2 | MA0663.1_MLX | 2 | 4 | 0.000636 | 0.001421 |
| 3 | MA0160.1_NR4A2 | 5 | 3 | 0.001589 | 0.001066 |
| 4 | MA0910.2_HOXD8 | 4 | 1 | 0.001271 | 0.000355 |
| ... | ... | ... | ... | ... | ... |
| 855 | MA1491.1_GLI3 | 8 | 2 | 0.002542 | 0.000711 |
| 856 | MA0775.1_MEIS3 | 5 | 3 | 0.001589 | 0.001066 |
| 857 | MA1505.1_HOXC8 | 6 | 2 | 0.001907 | 0.000711 |
| 858 | MA1111.1_NR2F2 | 2 | 3 | 0.000636 | 0.001066 |
| 859 | MA0641.1_ELF4 | 4 | 8 | 0.001271 | 0.002843 |
860 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 3 | 0.001021 | 0.001032 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000681 | 0.001032 |
| 2 | MA0663.1_MLX | 2 | 2 | 0.000681 | 0.000688 |
| 3 | MA0160.1_NR4A2 | 3 | 7 | 0.001021 | 0.002408 |
| 4 | MA0910.2_HOXD8 | 5 | 6 | 0.001702 | 0.002064 |
| ... | ... | ... | ... | ... | ... |
| 849 | MA1491.1_GLI3 | 2 | 9 | 0.000681 | 0.003096 |
| 850 | MA0775.1_MEIS3 | 7 | 4 | 0.002383 | 0.001376 |
| 851 | MA1505.1_HOXC8 | 8 | 3 | 0.002723 | 0.001032 |
| 852 | MA1111.1_NR2F2 | 1 | 1 | 0.000340 | 0.000344 |
| 853 | MA0641.1_ELF4 | 1 | 2 | 0.000340 | 0.000688 |
854 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001024 | 0.000326 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000683 | 0.000326 |
| 2 | MA0663.1_MLX | 2 | 6 | 0.000683 | 0.001953 |
| 3 | MA0160.1_NR4A2 | 3 | 4 | 0.001024 | 0.001302 |
| 4 | MA0910.2_HOXD8 | 5 | 5 | 0.001706 | 0.001628 |
| ... | ... | ... | ... | ... | ... |
| 841 | UN0238.1_ZNF823 | 1 | 1 | 0.000341 | 0.000326 |
| 842 | MA1491.1_GLI3 | 2 | 1 | 0.000683 | 0.000326 |
| 843 | MA0775.1_MEIS3 | 7 | 10 | 0.002389 | 0.003255 |
| 844 | MA1505.1_HOXC8 | 8 | 5 | 0.002730 | 0.001628 |
| 845 | MA0641.1_ELF4 | 1 | 1 | 0.000341 | 0.000326 |
846 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 4 | 0.001022 | 0.001541 |
| 1 | UN0249.1_ZSCAN9 | 2 | 4 | 0.000682 | 0.001541 |
| 2 | MA0663.1_MLX | 2 | 2 | 0.000682 | 0.000770 |
| 3 | MA0160.1_NR4A2 | 3 | 3 | 0.001022 | 0.001156 |
| 4 | MA0910.2_HOXD8 | 5 | 2 | 0.001704 | 0.000770 |
| ... | ... | ... | ... | ... | ... |
| 845 | UN0238.1_ZNF823 | 1 | 2 | 0.000341 | 0.000770 |
| 846 | MA1491.1_GLI3 | 2 | 6 | 0.000682 | 0.002311 |
| 847 | MA0775.1_MEIS3 | 7 | 7 | 0.002386 | 0.002696 |
| 848 | MA1505.1_HOXC8 | 8 | 1 | 0.002727 | 0.000385 |
| 849 | MA0641.1_ELF4 | 1 | 3 | 0.000341 | 0.001156 |
850 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 5 | 0.001019 | 0.001777 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000679 | 0.001066 |
| 2 | MA0663.1_MLX | 2 | 4 | 0.000679 | 0.001421 |
| 3 | MA0160.1_NR4A2 | 3 | 3 | 0.001019 | 0.001066 |
| 4 | MA0910.2_HOXD8 | 5 | 1 | 0.001698 | 0.000355 |
| ... | ... | ... | ... | ... | ... |
| 855 | MA1491.1_GLI3 | 2 | 2 | 0.000679 | 0.000711 |
| 856 | MA0775.1_MEIS3 | 7 | 3 | 0.002378 | 0.001066 |
| 857 | MA1505.1_HOXC8 | 8 | 2 | 0.002717 | 0.000711 |
| 858 | MA1111.1_NR2F2 | 1 | 3 | 0.000340 | 0.001066 |
| 859 | MA0641.1_ELF4 | 1 | 8 | 0.000340 | 0.002843 |
860 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 3 | 0.001174 | 0.001033 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000783 | 0.001033 |
| 2 | MA0663.1_MLX | 6 | 2 | 0.002348 | 0.000689 |
| 3 | MA0160.1_NR4A2 | 4 | 7 | 0.001566 | 0.002411 |
| 4 | MA0910.2_HOXD8 | 2 | 6 | 0.000783 | 0.002067 |
| ... | ... | ... | ... | ... | ... |
| 845 | MA1491.1_GLI3 | 8 | 9 | 0.003131 | 0.003100 |
| 846 | MA0775.1_MEIS3 | 11 | 4 | 0.004305 | 0.001378 |
| 847 | MA1505.1_HOXC8 | 1 | 3 | 0.000391 | 0.001033 |
| 848 | MA1111.1_NR2F2 | 1 | 1 | 0.000391 | 0.000344 |
| 849 | MA0641.1_ELF4 | 5 | 2 | 0.001957 | 0.000689 |
850 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001171 | 0.000324 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000781 | 0.000324 |
| 2 | MA0663.1_MLX | 6 | 6 | 0.002343 | 0.001947 |
| 3 | MA0160.1_NR4A2 | 4 | 4 | 0.001562 | 0.001298 |
| 4 | MA0910.2_HOXD8 | 2 | 5 | 0.000781 | 0.001622 |
| ... | ... | ... | ... | ... | ... |
| 851 | UN0238.1_ZNF823 | 2 | 1 | 0.000781 | 0.000324 |
| 852 | MA1491.1_GLI3 | 8 | 1 | 0.003124 | 0.000324 |
| 853 | MA0775.1_MEIS3 | 11 | 10 | 0.004295 | 0.003245 |
| 854 | MA1505.1_HOXC8 | 1 | 5 | 0.000390 | 0.001622 |
| 855 | MA0641.1_ELF4 | 5 | 1 | 0.001952 | 0.000324 |
856 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 4 | 0.001182 | 0.001551 |
| 1 | UN0249.1_ZSCAN9 | 2 | 4 | 0.000788 | 0.001551 |
| 2 | MA0663.1_MLX | 6 | 2 | 0.002364 | 0.000775 |
| 3 | MA0160.1_NR4A2 | 4 | 3 | 0.001576 | 0.001163 |
| 4 | MA0910.2_HOXD8 | 2 | 2 | 0.000788 | 0.000775 |
| ... | ... | ... | ... | ... | ... |
| 828 | UN0238.1_ZNF823 | 2 | 2 | 0.000788 | 0.000775 |
| 829 | MA1491.1_GLI3 | 8 | 6 | 0.003152 | 0.002326 |
| 830 | MA0775.1_MEIS3 | 11 | 7 | 0.004334 | 0.002714 |
| 831 | MA1505.1_HOXC8 | 1 | 1 | 0.000394 | 0.000388 |
| 832 | MA0641.1_ELF4 | 5 | 3 | 0.001970 | 0.001163 |
833 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 5 | 0.001170 | 0.001777 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000780 | 0.001066 |
| 2 | MA0663.1_MLX | 6 | 4 | 0.002339 | 0.001421 |
| 3 | MA0160.1_NR4A2 | 4 | 3 | 0.001559 | 0.001066 |
| 4 | MA0910.2_HOXD8 | 2 | 1 | 0.000780 | 0.000355 |
| ... | ... | ... | ... | ... | ... |
| 855 | MA1491.1_GLI3 | 8 | 2 | 0.003119 | 0.000711 |
| 856 | MA0775.1_MEIS3 | 11 | 3 | 0.004288 | 0.001066 |
| 857 | MA1505.1_HOXC8 | 1 | 2 | 0.000390 | 0.000711 |
| 858 | MA1111.1_NR2F2 | 1 | 3 | 0.000390 | 0.001066 |
| 859 | MA0641.1_ELF4 | 5 | 8 | 0.001949 | 0.002843 |
860 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 3 | 0.002030 | 0.001030 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000677 | 0.001030 |
| 2 | MA0663.1_MLX | 3 | 2 | 0.001015 | 0.000687 |
| 3 | MA0160.1_NR4A2 | 4 | 7 | 0.001353 | 0.002403 |
| 4 | MA0910.2_HOXD8 | 4 | 6 | 0.001353 | 0.002060 |
| ... | ... | ... | ... | ... | ... |
| 855 | MA1491.1_GLI3 | 3 | 9 | 0.001015 | 0.003090 |
| 856 | MA0775.1_MEIS3 | 6 | 4 | 0.002030 | 0.001373 |
| 857 | MA1505.1_HOXC8 | 1 | 3 | 0.000338 | 0.001030 |
| 858 | MA1111.1_NR2F2 | 2 | 1 | 0.000677 | 0.000343 |
| 859 | MA0641.1_ELF4 | 10 | 2 | 0.003383 | 0.000687 |
860 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 1 | 0.002026 | 0.000324 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000675 | 0.000324 |
| 2 | MA0663.1_MLX | 3 | 6 | 0.001013 | 0.001941 |
| 3 | MA0160.1_NR4A2 | 4 | 4 | 0.001351 | 0.001294 |
| 4 | MA0910.2_HOXD8 | 4 | 5 | 0.001351 | 0.001618 |
| ... | ... | ... | ... | ... | ... |
| 860 | MA1491.1_GLI3 | 3 | 1 | 0.001013 | 0.000324 |
| 861 | MA0775.1_MEIS3 | 6 | 10 | 0.002026 | 0.003235 |
| 862 | MA1505.1_HOXC8 | 1 | 5 | 0.000338 | 0.001618 |
| 863 | MA1111.1_NR2F2 | 2 | 1 | 0.000675 | 0.000324 |
| 864 | MA0641.1_ELF4 | 10 | 1 | 0.003377 | 0.000324 |
865 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 4 | 0.002027 | 0.001533 |
| 1 | UN0249.1_ZSCAN9 | 2 | 4 | 0.000676 | 0.001533 |
| 2 | MA0663.1_MLX | 3 | 2 | 0.001014 | 0.000766 |
| 3 | MA0160.1_NR4A2 | 4 | 3 | 0.001351 | 0.001149 |
| 4 | MA0910.2_HOXD8 | 4 | 2 | 0.001351 | 0.000766 |
| ... | ... | ... | ... | ... | ... |
| 859 | MA1491.1_GLI3 | 3 | 6 | 0.001014 | 0.002299 |
| 860 | MA0775.1_MEIS3 | 6 | 7 | 0.002027 | 0.002682 |
| 861 | MA1505.1_HOXC8 | 1 | 1 | 0.000338 | 0.000383 |
| 862 | MA1111.1_NR2F2 | 2 | 1 | 0.000676 | 0.000383 |
| 863 | MA0641.1_ELF4 | 10 | 3 | 0.003378 | 0.001149 |
864 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 5 | 0.002029 | 0.001776 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000676 | 0.001066 |
| 2 | MA0663.1_MLX | 3 | 4 | 0.001015 | 0.001421 |
| 3 | MA0160.1_NR4A2 | 4 | 3 | 0.001353 | 0.001066 |
| 4 | MA0910.2_HOXD8 | 4 | 1 | 0.001353 | 0.000355 |
| ... | ... | ... | ... | ... | ... |
| 856 | MA1491.1_GLI3 | 3 | 2 | 0.001015 | 0.000710 |
| 857 | MA0775.1_MEIS3 | 6 | 3 | 0.002029 | 0.001066 |
| 858 | MA1505.1_HOXC8 | 1 | 2 | 0.000338 | 0.000710 |
| 859 | MA1111.1_NR2F2 | 2 | 3 | 0.000676 | 0.001066 |
| 860 | MA0641.1_ELF4 | 10 | 8 | 0.003382 | 0.002842 |
861 rows × 5 columns
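In every table above, `Diffusion_seqs` and `Training_seqs` look like the `motif_a` and `motif_b` counts divided by their column totals. In the first table, the count-to-frequency ratio is roughly constant across all rows shown: about 3,140 for `motif_a` (e.g. 2 / 0.000637) and about 2,907 for `motif_b` (e.g. 3 / 0.001032).

Below is a minimal sketch of that normalization, assuming two per-motif count Series indexed by motif name. The function name `motif_frequency_table`, the outer alignment, and the zero count for motifs absent from one side are assumptions, not taken from the notebook:

```python
import pandas as pd


def motif_frequency_table(counts_a: pd.Series, counts_b: pd.Series) -> pd.DataFrame:
    """Align two per-motif count Series and add normalized frequency columns.

    Motifs missing from one side are assumed to have a count of 0.
    """
    # Outer-align on motif name; the dict keys become the column names.
    table = pd.concat({"motif_a": counts_a, "motif_b": counts_b}, axis=1).fillna(0)
    table = table.astype(int)
    # Each frequency column sums to 1 over all motifs in the table.
    table["Diffusion_seqs"] = table["motif_a"] / table["motif_a"].sum()
    table["Training_seqs"] = table["motif_b"] / table["motif_b"].sum()
    return table.rename_axis("motif").reset_index()
```

Normalizing each column to sum to 1 makes motif frequencies comparable even when the two sequence sets contain different numbers of motif hits.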
\n", + "| \n", + " | motif | \n", + "motif_a | \n", + "motif_b | \n", + "Diffusion_seqs | \n", + "Training_seqs | \n", + "
|---|---|---|---|---|---|
| 0 | \n", + "MA1656.1_ZNF449 | \n", + "2 | \n", + "3 | \n", + "0.000637 | \n", + "0.001032 | \n", + "
| 1 | \n", + "UN0249.1_ZSCAN9 | \n", + "4 | \n", + "3 | \n", + "0.001273 | \n", + "0.001032 | \n", + "
| 2 | \n", + "MA0663.1_MLX | \n", + "2 | \n", + "2 | \n", + "0.000637 | \n", + "0.000688 | \n", + "
| 3 | \n", + "MA0160.1_NR4A2 | \n", + "5 | \n", + "7 | \n", + "0.001592 | \n", + "0.002408 | \n", + "
| 4 | \n", + "MA0910.2_HOXD8 | \n", + "4 | \n", + "6 | \n", + "0.001273 | \n", + "0.002064 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 849 | \n", + "MA1491.1_GLI3 | \n", + "8 | \n", + "9 | \n", + "0.002547 | \n", + "0.003096 | \n", + "
| 850 | \n", + "MA0775.1_MEIS3 | \n", + "5 | \n", + "4 | \n", + "0.001592 | \n", + "0.001376 | \n", + "
| 851 | \n", + "MA1505.1_HOXC8 | \n", + "6 | \n", + "3 | \n", + "0.001910 | \n", + "0.001032 | \n", + "
| 852 | \n", + "MA1111.1_NR2F2 | \n", + "2 | \n", + "1 | \n", + "0.000637 | \n", + "0.000344 | \n", + "
| 853 | \n", + "MA0641.1_ELF4 | \n", + "4 | \n", + "2 | \n", + "0.001273 | \n", + "0.000688 | \n", + "
854 rows × 5 columns
\n", + "| \n", + " | motif | \n", + "motif_a | \n", + "motif_b | \n", + "Diffusion_seqs | \n", + "Training_seqs | \n", + "
|---|---|---|---|---|---|
| 0 | \n", + "MA1656.1_ZNF449 | \n", + "2 | \n", + "1 | \n", + "0.000636 | \n", + "0.000324 | \n", + "
| 1 | \n", + "UN0249.1_ZSCAN9 | \n", + "4 | \n", + "1 | \n", + "0.001271 | \n", + "0.000324 | \n", + "
| 2 | \n", + "MA0663.1_MLX | \n", + "2 | \n", + "6 | \n", + "0.000636 | \n", + "0.001945 | \n", + "
| 3 | \n", + "MA0160.1_NR4A2 | \n", + "5 | \n", + "4 | \n", + "0.001589 | \n", + "0.001297 | \n", + "
| 4 | \n", + "MA0910.2_HOXD8 | \n", + "4 | \n", + "5 | \n", + "0.001271 | \n", + "0.001621 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 854 | \n", + "MA1491.1_GLI3 | \n", + "8 | \n", + "1 | \n", + "0.002543 | \n", + "0.000324 | \n", + "
| 855 | \n", + "MA0775.1_MEIS3 | \n", + "5 | \n", + "10 | \n", + "0.001589 | \n", + "0.003241 | \n", + "
| 856 | \n", + "MA1505.1_HOXC8 | \n", + "6 | \n", + "5 | \n", + "0.001907 | \n", + "0.001621 | \n", + "
| 857 | \n", + "MA1111.1_NR2F2 | \n", + "2 | \n", + "1 | \n", + "0.000636 | \n", + "0.000324 | \n", + "
| 858 | \n", + "MA0641.1_ELF4 | \n", + "4 | \n", + "1 | \n", + "0.001271 | \n", + "0.000324 | \n", + "
859 rows × 5 columns
\n", + "| \n", + " | motif | \n", + "motif_a | \n", + "motif_b | \n", + "Diffusion_seqs | \n", + "Training_seqs | \n", + "
|---|---|---|---|---|---|
| 0 | \n", + "MA1656.1_ZNF449 | \n", + "2 | \n", + "4 | \n", + "0.000636 | \n", + "0.001536 | \n", + "
| 1 | \n", + "UN0249.1_ZSCAN9 | \n", + "4 | \n", + "4 | \n", + "0.001272 | \n", + "0.001536 | \n", + "
| 2 | \n", + "MA0663.1_MLX | \n", + "2 | \n", + "2 | \n", + "0.000636 | \n", + "0.000768 | \n", + "
| 3 | \n", + "MA0160.1_NR4A2 | \n", + "5 | \n", + "3 | \n", + "0.001590 | \n", + "0.001152 | \n", + "
| 4 | \n", + "MA0910.2_HOXD8 | \n", + "4 | \n", + "2 | \n", + "0.001272 | \n", + "0.000768 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 853 | \n", + "MA1491.1_GLI3 | \n", + "8 | \n", + "6 | \n", + "0.002544 | \n", + "0.002304 | \n", + "
| 854 | \n", + "MA0775.1_MEIS3 | \n", + "5 | \n", + "7 | \n", + "0.001590 | \n", + "0.002688 | \n", + "
| 855 | \n", + "MA1505.1_HOXC8 | \n", + "6 | \n", + "1 | \n", + "0.001908 | \n", + "0.000384 | \n", + "
| 856 | \n", + "MA1111.1_NR2F2 | \n", + "2 | \n", + "1 | \n", + "0.000636 | \n", + "0.000384 | \n", + "
| 857 | \n", + "MA0641.1_ELF4 | \n", + "4 | \n", + "3 | \n", + "0.001272 | \n", + "0.001152 | \n", + "
858 rows × 5 columns
\n", + "| \n", + " | motif | \n", + "motif_a | \n", + "motif_b | \n", + "Diffusion_seqs | \n", + "Training_seqs | \n", + "
|---|---|---|---|---|---|
| 0 | \n", + "MA1656.1_ZNF449 | \n", + "2 | \n", + "5 | \n", + "0.000636 | \n", + "0.001777 | \n", + "
| 1 | \n", + "UN0249.1_ZSCAN9 | \n", + "4 | \n", + "3 | \n", + "0.001271 | \n", + "0.001066 | \n", + "
| 2 | \n", + "MA0663.1_MLX | \n", + "2 | \n", + "4 | \n", + "0.000636 | \n", + "0.001421 | \n", + "
| 3 | \n", + "MA0160.1_NR4A2 | \n", + "5 | \n", + "3 | \n", + "0.001589 | \n", + "0.001066 | \n", + "
| 4 | \n", + "MA0910.2_HOXD8 | \n", + "4 | \n", + "1 | \n", + "0.001271 | \n", + "0.000355 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 855 | \n", + "MA1491.1_GLI3 | \n", + "8 | \n", + "2 | \n", + "0.002542 | \n", + "0.000711 | \n", + "
| 856 | \n", + "MA0775.1_MEIS3 | \n", + "5 | \n", + "3 | \n", + "0.001589 | \n", + "0.001066 | \n", + "
| 857 | \n", + "MA1505.1_HOXC8 | \n", + "6 | \n", + "2 | \n", + "0.001907 | \n", + "0.000711 | \n", + "
| 858 | \n", + "MA1111.1_NR2F2 | \n", + "2 | \n", + "3 | \n", + "0.000636 | \n", + "0.001066 | \n", + "
| 859 | \n", + "MA0641.1_ELF4 | \n", + "4 | \n", + "8 | \n", + "0.001271 | \n", + "0.002843 | \n", + "
860 rows × 5 columns
\n", + "| \n", + " | motif | \n", + "motif_a | \n", + "motif_b | \n", + "Diffusion_seqs | \n", + "Training_seqs | \n", + "
|---|---|---|---|---|---|
| 0 | \n", + "MA1656.1_ZNF449 | \n", + "3 | \n", + "3 | \n", + "0.001021 | \n", + "0.001032 | \n", + "
| 1 | \n", + "UN0249.1_ZSCAN9 | \n", + "2 | \n", + "3 | \n", + "0.000681 | \n", + "0.001032 | \n", + "
| 2 | \n", + "MA0663.1_MLX | \n", + "2 | \n", + "2 | \n", + "0.000681 | \n", + "0.000688 | \n", + "
| 3 | \n", + "MA0160.1_NR4A2 | \n", + "3 | \n", + "7 | \n", + "0.001021 | \n", + "0.002408 | \n", + "
| 4 | \n", + "MA0910.2_HOXD8 | \n", + "5 | \n", + "6 | \n", + "0.001702 | \n", + "0.002064 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 849 | \n", + "MA1491.1_GLI3 | \n", + "2 | \n", + "9 | \n", + "0.000681 | \n", + "0.003096 | \n", + "
| 850 | \n", + "MA0775.1_MEIS3 | \n", + "7 | \n", + "4 | \n", + "0.002383 | \n", + "0.001376 | \n", + "
| 851 | \n", + "MA1505.1_HOXC8 | \n", + "8 | \n", + "3 | \n", + "0.002723 | \n", + "0.001032 | \n", + "
| 852 | \n", + "MA1111.1_NR2F2 | \n", + "1 | \n", + "1 | \n", + "0.000340 | \n", + "0.000344 | \n", + "
| 853 | \n", + "MA0641.1_ELF4 | \n", + "1 | \n", + "2 | \n", + "0.000340 | \n", + "0.000688 | \n", + "
854 rows × 5 columns
\n", + "| \n", + " | motif | \n", + "motif_a | \n", + "motif_b | \n", + "Diffusion_seqs | \n", + "Training_seqs | \n", + "
|---|---|---|---|---|---|
| 0 | \n", + "MA1656.1_ZNF449 | \n", + "3 | \n", + "1 | \n", + "0.001024 | \n", + "0.000326 | \n", + "
| 1 | \n", + "UN0249.1_ZSCAN9 | \n", + "2 | \n", + "1 | \n", + "0.000683 | \n", + "0.000326 | \n", + "
| 2 | \n", + "MA0663.1_MLX | \n", + "2 | \n", + "6 | \n", + "0.000683 | \n", + "0.001953 | \n", + "
| 3 | \n", + "MA0160.1_NR4A2 | \n", + "3 | \n", + "4 | \n", + "0.001024 | \n", + "0.001302 | \n", + "
| 4 | \n", + "MA0910.2_HOXD8 | \n", + "5 | \n", + "5 | \n", + "0.001706 | \n", + "0.001628 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 841 | \n", + "UN0238.1_ZNF823 | \n", + "1 | \n", + "1 | \n", + "0.000341 | \n", + "0.000326 | \n", + "
| 842 | \n", + "MA1491.1_GLI3 | \n", + "2 | \n", + "1 | \n", + "0.000683 | \n", + "0.000326 | \n", + "
| 843 | \n", + "MA0775.1_MEIS3 | \n", + "7 | \n", + "10 | \n", + "0.002389 | \n", + "0.003255 | \n", + "
| 844 | \n", + "MA1505.1_HOXC8 | \n", + "8 | \n", + "5 | \n", + "0.002730 | \n", + "0.001628 | \n", + "
| 845 | \n", + "MA0641.1_ELF4 | \n", + "1 | \n", + "1 | \n", + "0.000341 | \n", + "0.000326 | \n", + "
846 rows × 5 columns
\n", + "| \n", + " | motif | \n", + "motif_a | \n", + "motif_b | \n", + "Diffusion_seqs | \n", + "Training_seqs | \n", + "
|---|---|---|---|---|---|
| 0 | \n", + "MA1656.1_ZNF449 | \n", + "3 | \n", + "4 | \n", + "0.001022 | \n", + "0.001541 | \n", + "
| 1 | \n", + "UN0249.1_ZSCAN9 | \n", + "2 | \n", + "4 | \n", + "0.000682 | \n", + "0.001541 | \n", + "
| 2 | \n", + "MA0663.1_MLX | \n", + "2 | \n", + "2 | \n", + "0.000682 | \n", + "0.000770 | \n", + "
| 3 | \n", + "MA0160.1_NR4A2 | \n", + "3 | \n", + "3 | \n", + "0.001022 | \n", + "0.001156 | \n", + "
| 4 | \n", + "MA0910.2_HOXD8 | \n", + "5 | \n", + "2 | \n", + "0.001704 | \n", + "0.000770 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 845 | \n", + "UN0238.1_ZNF823 | \n", + "1 | \n", + "2 | \n", + "0.000341 | \n", + "0.000770 | \n", + "
| 846 | \n", + "MA1491.1_GLI3 | \n", + "2 | \n", + "6 | \n", + "0.000682 | \n", + "0.002311 | \n", + "
| 847 | \n", + "MA0775.1_MEIS3 | \n", + "7 | \n", + "7 | \n", + "0.002386 | \n", + "0.002696 | \n", + "
| 848 | \n", + "MA1505.1_HOXC8 | \n", + "8 | \n", + "1 | \n", + "0.002727 | \n", + "0.000385 | \n", + "
| 849 | \n", + "MA0641.1_ELF4 | \n", + "1 | \n", + "3 | \n", + "0.000341 | \n", + "0.001156 | \n", + "
850 rows × 5 columns
\n", + "| \n", + " | motif | \n", + "motif_a | \n", + "motif_b | \n", + "Diffusion_seqs | \n", + "Training_seqs | \n", + "
|---|---|---|---|---|---|
| 0 | \n", + "MA1656.1_ZNF449 | \n", + "3 | \n", + "5 | \n", + "0.001019 | \n", + "0.001777 | \n", + "
| 1 | \n", + "UN0249.1_ZSCAN9 | \n", + "2 | \n", + "3 | \n", + "0.000679 | \n", + "0.001066 | \n", + "
| 2 | \n", + "MA0663.1_MLX | \n", + "2 | \n", + "4 | \n", + "0.000679 | \n", + "0.001421 | \n", + "
| 3 | \n", + "MA0160.1_NR4A2 | \n", + "3 | \n", + "3 | \n", + "0.001019 | \n", + "0.001066 | \n", + "
| 4 | \n", + "MA0910.2_HOXD8 | \n", + "5 | \n", + "1 | \n", + "0.001698 | \n", + "0.000355 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 855 | \n", + "MA1491.1_GLI3 | \n", + "2 | \n", + "2 | \n", + "0.000679 | \n", + "0.000711 | \n", + "
| 856 | \n", + "MA0775.1_MEIS3 | \n", + "7 | \n", + "3 | \n", + "0.002378 | \n", + "0.001066 | \n", + "
| 857 | \n", + "MA1505.1_HOXC8 | \n", + "8 | \n", + "2 | \n", + "0.002717 | \n", + "0.000711 | \n", + "
| 858 | \n", + "MA1111.1_NR2F2 | \n", + "1 | \n", + "3 | \n", + "0.000340 | \n", + "0.001066 | \n", + "
| 859 | \n", + "MA0641.1_ELF4 | \n", + "1 | \n", + "8 | \n", + "0.000340 | \n", + "0.002843 | \n", + "
860 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 3 | 0.001174 | 0.001033 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000783 | 0.001033 |
| 2 | MA0663.1_MLX | 6 | 2 | 0.002348 | 0.000689 |
| 3 | MA0160.1_NR4A2 | 4 | 7 | 0.001566 | 0.002411 |
| 4 | MA0910.2_HOXD8 | 2 | 6 | 0.000783 | 0.002067 |
| ... | ... | ... | ... | ... | ... |
| 845 | MA1491.1_GLI3 | 8 | 9 | 0.003131 | 0.003100 |
| 846 | MA0775.1_MEIS3 | 11 | 4 | 0.004305 | 0.001378 |
| 847 | MA1505.1_HOXC8 | 1 | 3 | 0.000391 | 0.001033 |
| 848 | MA1111.1_NR2F2 | 1 | 1 | 0.000391 | 0.000344 |
| 849 | MA0641.1_ELF4 | 5 | 2 | 0.001957 | 0.000689 |

850 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001171 | 0.000324 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000781 | 0.000324 |
| 2 | MA0663.1_MLX | 6 | 6 | 0.002343 | 0.001947 |
| 3 | MA0160.1_NR4A2 | 4 | 4 | 0.001562 | 0.001298 |
| 4 | MA0910.2_HOXD8 | 2 | 5 | 0.000781 | 0.001622 |
| ... | ... | ... | ... | ... | ... |
| 851 | UN0238.1_ZNF823 | 2 | 1 | 0.000781 | 0.000324 |
| 852 | MA1491.1_GLI3 | 8 | 1 | 0.003124 | 0.000324 |
| 853 | MA0775.1_MEIS3 | 11 | 10 | 0.004295 | 0.003245 |
| 854 | MA1505.1_HOXC8 | 1 | 5 | 0.000390 | 0.001622 |
| 855 | MA0641.1_ELF4 | 5 | 1 | 0.001952 | 0.000324 |

856 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 4 | 0.001182 | 0.001551 |
| 1 | UN0249.1_ZSCAN9 | 2 | 4 | 0.000788 | 0.001551 |
| 2 | MA0663.1_MLX | 6 | 2 | 0.002364 | 0.000775 |
| 3 | MA0160.1_NR4A2 | 4 | 3 | 0.001576 | 0.001163 |
| 4 | MA0910.2_HOXD8 | 2 | 2 | 0.000788 | 0.000775 |
| ... | ... | ... | ... | ... | ... |
| 828 | UN0238.1_ZNF823 | 2 | 2 | 0.000788 | 0.000775 |
| 829 | MA1491.1_GLI3 | 8 | 6 | 0.003152 | 0.002326 |
| 830 | MA0775.1_MEIS3 | 11 | 7 | 0.004334 | 0.002714 |
| 831 | MA1505.1_HOXC8 | 1 | 1 | 0.000394 | 0.000388 |
| 832 | MA0641.1_ELF4 | 5 | 3 | 0.001970 | 0.001163 |

833 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 5 | 0.001170 | 0.001777 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000780 | 0.001066 |
| 2 | MA0663.1_MLX | 6 | 4 | 0.002339 | 0.001421 |
| 3 | MA0160.1_NR4A2 | 4 | 3 | 0.001559 | 0.001066 |
| 4 | MA0910.2_HOXD8 | 2 | 1 | 0.000780 | 0.000355 |
| ... | ... | ... | ... | ... | ... |
| 855 | MA1491.1_GLI3 | 8 | 2 | 0.003119 | 0.000711 |
| 856 | MA0775.1_MEIS3 | 11 | 3 | 0.004288 | 0.001066 |
| 857 | MA1505.1_HOXC8 | 1 | 2 | 0.000390 | 0.000711 |
| 858 | MA1111.1_NR2F2 | 1 | 3 | 0.000390 | 0.001066 |
| 859 | MA0641.1_ELF4 | 5 | 8 | 0.001949 | 0.002843 |

860 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 3 | 0.002030 | 0.001030 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000677 | 0.001030 |
| 2 | MA0663.1_MLX | 3 | 2 | 0.001015 | 0.000687 |
| 3 | MA0160.1_NR4A2 | 4 | 7 | 0.001353 | 0.002403 |
| 4 | MA0910.2_HOXD8 | 4 | 6 | 0.001353 | 0.002060 |
| ... | ... | ... | ... | ... | ... |
| 855 | MA1491.1_GLI3 | 3 | 9 | 0.001015 | 0.003090 |
| 856 | MA0775.1_MEIS3 | 6 | 4 | 0.002030 | 0.001373 |
| 857 | MA1505.1_HOXC8 | 1 | 3 | 0.000338 | 0.001030 |
| 858 | MA1111.1_NR2F2 | 2 | 1 | 0.000677 | 0.000343 |
| 859 | MA0641.1_ELF4 | 10 | 2 | 0.003383 | 0.000687 |

860 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 1 | 0.002026 | 0.000324 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000675 | 0.000324 |
| 2 | MA0663.1_MLX | 3 | 6 | 0.001013 | 0.001941 |
| 3 | MA0160.1_NR4A2 | 4 | 4 | 0.001351 | 0.001294 |
| 4 | MA0910.2_HOXD8 | 4 | 5 | 0.001351 | 0.001618 |
| ... | ... | ... | ... | ... | ... |
| 860 | MA1491.1_GLI3 | 3 | 1 | 0.001013 | 0.000324 |
| 861 | MA0775.1_MEIS3 | 6 | 10 | 0.002026 | 0.003235 |
| 862 | MA1505.1_HOXC8 | 1 | 5 | 0.000338 | 0.001618 |
| 863 | MA1111.1_NR2F2 | 2 | 1 | 0.000675 | 0.000324 |
| 864 | MA0641.1_ELF4 | 10 | 1 | 0.003377 | 0.000324 |

865 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 4 | 0.002027 | 0.001533 |
| 1 | UN0249.1_ZSCAN9 | 2 | 4 | 0.000676 | 0.001533 |
| 2 | MA0663.1_MLX | 3 | 2 | 0.001014 | 0.000766 |
| 3 | MA0160.1_NR4A2 | 4 | 3 | 0.001351 | 0.001149 |
| 4 | MA0910.2_HOXD8 | 4 | 2 | 0.001351 | 0.000766 |
| ... | ... | ... | ... | ... | ... |
| 859 | MA1491.1_GLI3 | 3 | 6 | 0.001014 | 0.002299 |
| 860 | MA0775.1_MEIS3 | 6 | 7 | 0.002027 | 0.002682 |
| 861 | MA1505.1_HOXC8 | 1 | 1 | 0.000338 | 0.000383 |
| 862 | MA1111.1_NR2F2 | 2 | 1 | 0.000676 | 0.000383 |
| 863 | MA0641.1_ELF4 | 10 | 3 | 0.003378 | 0.001149 |

864 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 5 | 0.002029 | 0.001776 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000676 | 0.001066 |
| 2 | MA0663.1_MLX | 3 | 4 | 0.001015 | 0.001421 |
| 3 | MA0160.1_NR4A2 | 4 | 3 | 0.001353 | 0.001066 |
| 4 | MA0910.2_HOXD8 | 4 | 1 | 0.001353 | 0.000355 |
| ... | ... | ... | ... | ... | ... |
| 856 | MA1491.1_GLI3 | 3 | 2 | 0.001015 | 0.000710 |
| 857 | MA0775.1_MEIS3 | 6 | 3 | 0.002029 | 0.001066 |
| 858 | MA1505.1_HOXC8 | 1 | 2 | 0.000338 | 0.000710 |
| 859 | MA1111.1_NR2F2 | 2 | 3 | 0.000676 | 0.001066 |
| 860 | MA0641.1_ELF4 | 10 | 8 | 0.003382 | 0.002842 |

861 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 2 | 2 | 0.000637 | 0.000715 |
| 1 | UN0249.1_ZSCAN9 | 4 | 1 | 0.001274 | 0.000357 |
| 2 | MA0663.1_MLX | 2 | 8 | 0.000637 | 0.002858 |
| 3 | MA0160.1_NR4A2 | 5 | 6 | 0.001593 | 0.002144 |
| 4 | MA0910.2_HOXD8 | 4 | 6 | 0.001274 | 0.002144 |
| ... | ... | ... | ... | ... | ... |
| 847 | MA1491.1_GLI3 | 8 | 6 | 0.002549 | 0.002144 |
| 848 | MA0775.1_MEIS3 | 5 | 4 | 0.001593 | 0.001429 |
| 849 | MA1505.1_HOXC8 | 6 | 1 | 0.001911 | 0.000357 |
| 850 | MA1111.1_NR2F2 | 2 | 1 | 0.000637 | 0.000357 |
| 851 | MA0641.1_ELF4 | 4 | 7 | 0.001274 | 0.002501 |

852 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 2 | 1 | 0.000636 | 0.000394 |
| 1 | UN0249.1_ZSCAN9 | 4 | 3 | 0.001272 | 0.001181 |
| 2 | MA0663.1_MLX | 2 | 4 | 0.000636 | 0.001574 |
| 3 | MA0160.1_NR4A2 | 5 | 7 | 0.001590 | 0.002755 |
| 4 | MA0910.2_HOXD8 | 4 | 1 | 0.001272 | 0.000394 |
| ... | ... | ... | ... | ... | ... |
| 853 | MA1491.1_GLI3 | 8 | 2 | 0.002544 | 0.000787 |
| 854 | MA0775.1_MEIS3 | 5 | 7 | 0.001590 | 0.002755 |
| 855 | MA1505.1_HOXC8 | 6 | 5 | 0.001908 | 0.001968 |
| 856 | MA1111.1_NR2F2 | 2 | 6 | 0.000636 | 0.002361 |
| 857 | MA0641.1_ELF4 | 4 | 1 | 0.001272 | 0.000394 |

858 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 2 | 1 | 0.000639 | 0.000383 |
| 1 | UN0249.1_ZSCAN9 | 4 | 1 | 0.001278 | 0.000383 |
| 2 | MA0663.1_MLX | 2 | 6 | 0.000639 | 0.002299 |
| 3 | MA0160.1_NR4A2 | 5 | 1 | 0.001597 | 0.000383 |
| 4 | MA0910.2_HOXD8 | 4 | 2 | 0.001278 | 0.000766 |
| ... | ... | ... | ... | ... | ... |
| 838 | MA1491.1_GLI3 | 8 | 1 | 0.002556 | 0.000383 |
| 839 | MA0775.1_MEIS3 | 5 | 7 | 0.001597 | 0.002682 |
| 840 | MA1505.1_HOXC8 | 6 | 8 | 0.001917 | 0.003065 |
| 841 | MA1111.1_NR2F2 | 2 | 2 | 0.000639 | 0.000766 |
| 842 | MA0641.1_ELF4 | 4 | 2 | 0.001278 | 0.000766 |

843 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 2 | 1 | 0.000637 | 0.000369 |
| 1 | UN0249.1_ZSCAN9 | 4 | 1 | 0.001273 | 0.000369 |
| 2 | MA0663.1_MLX | 2 | 5 | 0.000637 | 0.001847 |
| 3 | MA0160.1_NR4A2 | 5 | 1 | 0.001591 | 0.000369 |
| 4 | MA0910.2_HOXD8 | 4 | 4 | 0.001273 | 0.001478 |
| ... | ... | ... | ... | ... | ... |
| 850 | MA1491.1_GLI3 | 8 | 4 | 0.002546 | 0.001478 |
| 851 | MA0775.1_MEIS3 | 5 | 3 | 0.001591 | 0.001108 |
| 852 | MA1505.1_HOXC8 | 6 | 3 | 0.001910 | 0.001108 |
| 853 | MA1111.1_NR2F2 | 2 | 1 | 0.000637 | 0.000369 |
| 854 | MA0641.1_ELF4 | 4 | 6 | 0.001273 | 0.002216 |

855 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 2 | 0.001024 | 0.000716 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000683 | 0.000358 |
| 2 | MA0663.1_MLX | 2 | 8 | 0.000683 | 0.002865 |
| 3 | MA0160.1_NR4A2 | 3 | 6 | 0.001024 | 0.002149 |
| 4 | MA0910.2_HOXD8 | 5 | 6 | 0.001707 | 0.002149 |
| ... | ... | ... | ... | ... | ... |
| 840 | MA1484.1_ETS2 | 1 | 9 | 0.000341 | 0.003223 |
| 841 | MA1491.1_GLI3 | 2 | 6 | 0.000683 | 0.002149 |
| 842 | MA0775.1_MEIS3 | 7 | 4 | 0.002390 | 0.001433 |
| 843 | MA1505.1_HOXC8 | 8 | 1 | 0.002731 | 0.000358 |
| 844 | MA0641.1_ELF4 | 1 | 7 | 0.000341 | 0.002507 |

845 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001028 | 0.000397 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000685 | 0.001192 |
| 2 | MA0663.1_MLX | 2 | 4 | 0.000685 | 0.001589 |
| 3 | MA0160.1_NR4A2 | 3 | 7 | 0.001028 | 0.002781 |
| 4 | MA0910.2_HOXD8 | 5 | 1 | 0.001714 | 0.000397 |
| ... | ... | ... | ... | ... | ... |
| 829 | MA1491.1_GLI3 | 2 | 2 | 0.000685 | 0.000795 |
| 830 | MA0775.1_MEIS3 | 7 | 7 | 0.002399 | 0.002781 |
| 831 | MA1505.1_HOXC8 | 8 | 5 | 0.002742 | 0.001986 |
| 832 | MA1111.1_NR2F2 | 1 | 6 | 0.000343 | 0.002384 |
| 833 | MA0641.1_ELF4 | 1 | 1 | 0.000343 | 0.000397 |

834 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001029 | 0.000385 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000686 | 0.000385 |
| 2 | MA0663.1_MLX | 2 | 6 | 0.000686 | 0.002309 |
| 3 | MA0160.1_NR4A2 | 3 | 1 | 0.001029 | 0.000385 |
| 4 | MA0910.2_HOXD8 | 5 | 2 | 0.001715 | 0.000770 |
| ... | ... | ... | ... | ... | ... |
| 827 | MA1491.1_GLI3 | 2 | 1 | 0.000686 | 0.000385 |
| 828 | MA0775.1_MEIS3 | 7 | 7 | 0.002401 | 0.002693 |
| 829 | MA1505.1_HOXC8 | 8 | 8 | 0.002743 | 0.003078 |
| 830 | MA1111.1_NR2F2 | 1 | 2 | 0.000343 | 0.000770 |
| 831 | MA0641.1_ELF4 | 1 | 2 | 0.000343 | 0.000770 |

832 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001029 | 0.000373 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000686 | 0.000373 |
| 2 | MA0663.1_MLX | 2 | 5 | 0.000686 | 0.001863 |
| 3 | MA0160.1_NR4A2 | 3 | 1 | 0.001029 | 0.000373 |
| 4 | MA0910.2_HOXD8 | 5 | 4 | 0.001715 | 0.001490 |
| ... | ... | ... | ... | ... | ... |
| 827 | MA1491.1_GLI3 | 2 | 4 | 0.000686 | 0.001490 |
| 828 | MA0775.1_MEIS3 | 7 | 3 | 0.002401 | 0.001118 |
| 829 | MA1505.1_HOXC8 | 8 | 3 | 0.002743 | 0.001118 |
| 830 | MA1111.1_NR2F2 | 1 | 1 | 0.000343 | 0.000373 |
| 831 | MA0641.1_ELF4 | 1 | 6 | 0.000343 | 0.002235 |

832 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 2 | 0.001182 | 0.000719 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000788 | 0.000360 |
| 2 | MA0663.1_MLX | 6 | 8 | 0.002363 | 0.002877 |
| 3 | MA0160.1_NR4A2 | 4 | 6 | 0.001575 | 0.002157 |
| 4 | MA0910.2_HOXD8 | 2 | 6 | 0.000788 | 0.002157 |
| ... | ... | ... | ... | ... | ... |
| 829 | UN0238.1_ZNF823 | 2 | 1 | 0.000788 | 0.000360 |
| 830 | MA1491.1_GLI3 | 8 | 6 | 0.003151 | 0.002157 |
| 831 | MA0775.1_MEIS3 | 11 | 4 | 0.004332 | 0.001438 |
| 832 | MA1505.1_HOXC8 | 1 | 1 | 0.000394 | 0.000360 |
| 833 | MA0641.1_ELF4 | 5 | 7 | 0.001969 | 0.002517 |

834 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001171 | 0.000394 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000781 | 0.001182 |
| 2 | MA0663.1_MLX | 6 | 4 | 0.002343 | 0.001575 |
| 3 | MA0160.1_NR4A2 | 4 | 7 | 0.001562 | 0.002757 |
| 4 | MA0910.2_HOXD8 | 2 | 1 | 0.000781 | 0.000394 |
| ... | ... | ... | ... | ... | ... |
| 851 | MA1491.1_GLI3 | 8 | 2 | 0.003124 | 0.000788 |
| 852 | MA0775.1_MEIS3 | 11 | 7 | 0.004295 | 0.002757 |
| 853 | MA1505.1_HOXC8 | 1 | 5 | 0.000390 | 0.001969 |
| 854 | MA1111.1_NR2F2 | 1 | 6 | 0.000390 | 0.002363 |
| 855 | MA0641.1_ELF4 | 5 | 1 | 0.001952 | 0.000394 |

856 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001175 | 0.000382 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000783 | 0.000382 |
| 2 | MA0663.1_MLX | 6 | 6 | 0.002350 | 0.002294 |
| 3 | MA0160.1_NR4A2 | 4 | 1 | 0.001567 | 0.000382 |
| 4 | MA0910.2_HOXD8 | 2 | 2 | 0.000783 | 0.000765 |
| ... | ... | ... | ... | ... | ... |
| 843 | MA1491.1_GLI3 | 8 | 1 | 0.003134 | 0.000382 |
| 844 | MA0775.1_MEIS3 | 11 | 7 | 0.004309 | 0.002677 |
| 845 | MA1505.1_HOXC8 | 1 | 8 | 0.000392 | 0.003059 |
| 846 | MA1111.1_NR2F2 | 1 | 2 | 0.000392 | 0.000765 |
| 847 | MA0641.1_ELF4 | 5 | 2 | 0.001958 | 0.000765 |

848 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 1 | 0.001173 | 0.000370 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000782 | 0.000370 |
| 2 | MA0663.1_MLX | 6 | 5 | 0.002346 | 0.001849 |
| 3 | MA0160.1_NR4A2 | 4 | 1 | 0.001564 | 0.000370 |
| 4 | MA0910.2_HOXD8 | 2 | 4 | 0.000782 | 0.001479 |
| ... | ... | ... | ... | ... | ... |
| 847 | MA1491.1_GLI3 | 8 | 4 | 0.003129 | 0.001479 |
| 848 | MA0775.1_MEIS3 | 11 | 3 | 0.004302 | 0.001109 |
| 849 | MA1505.1_HOXC8 | 1 | 3 | 0.000391 | 0.001109 |
| 850 | MA1111.1_NR2F2 | 1 | 1 | 0.000391 | 0.000370 |
| 851 | MA0641.1_ELF4 | 5 | 6 | 0.001955 | 0.002219 |

852 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 2 | 0.002029 | 0.000712 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000676 | 0.000356 |
| 2 | MA0663.1_MLX | 3 | 8 | 0.001015 | 0.002849 |
| 3 | MA0160.1_NR4A2 | 4 | 6 | 0.001353 | 0.002137 |
| 4 | MA0910.2_HOXD8 | 4 | 6 | 0.001353 | 0.002137 |
| ... | ... | ... | ... | ... | ... |
| 856 | MA1491.1_GLI3 | 3 | 6 | 0.001015 | 0.002137 |
| 857 | MA0775.1_MEIS3 | 6 | 4 | 0.002029 | 0.001425 |
| 858 | MA1505.1_HOXC8 | 1 | 1 | 0.000338 | 0.000356 |
| 859 | MA1111.1_NR2F2 | 2 | 1 | 0.000676 | 0.000356 |
| 860 | MA0641.1_ELF4 | 10 | 7 | 0.003382 | 0.002493 |

861 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 1 | 0.002027 | 0.000393 |
| 1 | UN0249.1_ZSCAN9 | 2 | 3 | 0.000676 | 0.001178 |
| 2 | MA0663.1_MLX | 3 | 4 | 0.001014 | 0.001570 |
| 3 | MA0160.1_NR4A2 | 4 | 7 | 0.001351 | 0.002748 |
| 4 | MA0910.2_HOXD8 | 4 | 1 | 0.001351 | 0.000393 |
| ... | ... | ... | ... | ... | ... |
| 859 | MA1491.1_GLI3 | 3 | 2 | 0.001014 | 0.000785 |
| 860 | MA0775.1_MEIS3 | 6 | 7 | 0.002027 | 0.002748 |
| 861 | MA1505.1_HOXC8 | 1 | 5 | 0.000338 | 0.001963 |
| 862 | MA1111.1_NR2F2 | 2 | 6 | 0.000676 | 0.002356 |
| 863 | MA0641.1_ELF4 | 10 | 1 | 0.003378 | 0.000393 |

864 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 1 | 0.002033 | 0.000381 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000678 | 0.000381 |
| 2 | MA0663.1_MLX | 3 | 6 | 0.001016 | 0.002287 |
| 3 | MA0160.1_NR4A2 | 4 | 1 | 0.001355 | 0.000381 |
| 4 | MA0910.2_HOXD8 | 4 | 2 | 0.001355 | 0.000762 |
| ... | ... | ... | ... | ... | ... |
| 851 | MA1491.1_GLI3 | 3 | 1 | 0.001016 | 0.000381 |
| 852 | MA0775.1_MEIS3 | 6 | 7 | 0.002033 | 0.002669 |
| 853 | MA1505.1_HOXC8 | 1 | 8 | 0.000339 | 0.003050 |
| 854 | MA1111.1_NR2F2 | 2 | 2 | 0.000678 | 0.000762 |
| 855 | MA0641.1_ELF4 | 10 | 2 | 0.003388 | 0.000762 |

856 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 6 | 1 | 0.002031 | 0.000369 |
| 1 | UN0249.1_ZSCAN9 | 2 | 1 | 0.000677 | 0.000369 |
| 2 | MA0663.1_MLX | 3 | 5 | 0.001016 | 0.001845 |
| 3 | MA0160.1_NR4A2 | 4 | 1 | 0.001354 | 0.000369 |
| 4 | MA0910.2_HOXD8 | 4 | 4 | 0.001354 | 0.001476 |
| ... | ... | ... | ... | ... | ... |
| 853 | MA1491.1_GLI3 | 3 | 4 | 0.001016 | 0.001476 |
| 854 | MA0775.1_MEIS3 | 6 | 3 | 0.002031 | 0.001107 |
| 855 | MA1505.1_HOXC8 | 1 | 3 | 0.000339 | 0.001107 |
| 856 | MA1111.1_NR2F2 | 2 | 1 | 0.000677 | 0.000369 |
| 857 | MA0641.1_ELF4 | 10 | 6 | 0.003385 | 0.002214 |

858 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 2 | 0.001523 | 0.000642 |
| 1 | UN0249.1_ZSCAN9 | 3 | 4 | 0.001523 | 0.001285 |
| 2 | MA0663.1_MLX | 3 | 2 | 0.001523 | 0.000642 |
| 3 | MA0160.1_NR4A2 | 3 | 5 | 0.001523 | 0.001606 |
| 4 | MA0910.2_HOXD8 | 1 | 4 | 0.000508 | 0.001285 |
| ... | ... | ... | ... | ... | ... |
| 821 | MA1491.1_GLI3 | 3 | 8 | 0.001523 | 0.002570 |
| 822 | MA0775.1_MEIS3 | 7 | 5 | 0.003553 | 0.001606 |
| 823 | MA1505.1_HOXC8 | 1 | 6 | 0.000508 | 0.001927 |
| 824 | MA1111.1_NR2F2 | 2 | 2 | 0.001015 | 0.000642 |
| 825 | MA0641.1_ELF4 | 2 | 4 | 0.001015 | 0.001285 |

826 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 3 | 0.001513 | 0.001026 |
| 1 | UN0249.1_ZSCAN9 | 3 | 2 | 0.001513 | 0.000684 |
| 2 | MA0663.1_MLX | 3 | 2 | 0.001513 | 0.000684 |
| 3 | MA0160.1_NR4A2 | 3 | 3 | 0.001513 | 0.001026 |
| 4 | MA0910.2_HOXD8 | 1 | 5 | 0.000504 | 0.001711 |
| ... | ... | ... | ... | ... | ... |
| 834 | MA1491.1_GLI3 | 3 | 2 | 0.001513 | 0.000684 |
| 835 | MA0775.1_MEIS3 | 7 | 7 | 0.003530 | 0.002395 |
| 836 | MA1505.1_HOXC8 | 1 | 8 | 0.000504 | 0.002737 |
| 837 | MA1111.1_NR2F2 | 2 | 1 | 0.001009 | 0.000342 |
| 838 | MA0641.1_ELF4 | 2 | 1 | 0.001009 | 0.000342 |

839 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 3 | 0.001527 | 0.001188 |
| 1 | UN0249.1_ZSCAN9 | 3 | 2 | 0.001527 | 0.000792 |
| 2 | MA0663.1_MLX | 3 | 6 | 0.001527 | 0.002376 |
| 3 | MA0160.1_NR4A2 | 3 | 4 | 0.001527 | 0.001584 |
| 4 | MA0910.2_HOXD8 | 1 | 2 | 0.000509 | 0.000792 |
| ... | ... | ... | ... | ... | ... |
| 815 | MA1491.1_GLI3 | 3 | 8 | 0.001527 | 0.003168 |
| 816 | MA0775.1_MEIS3 | 7 | 11 | 0.003564 | 0.004356 |
| 817 | MA1505.1_HOXC8 | 1 | 1 | 0.000509 | 0.000396 |
| 818 | MA1111.1_NR2F2 | 2 | 1 | 0.001018 | 0.000396 |
| 819 | MA0641.1_ELF4 | 2 | 5 | 0.001018 | 0.001980 |

820 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 3 | 6 | 0.001508 | 0.00204 |
| 1 | UN0249.1_ZSCAN9 | 3 | 2 | 0.001508 | 0.00068 |
| 2 | MA0663.1_MLX | 3 | 3 | 0.001508 | 0.00102 |
| 3 | MA0160.1_NR4A2 | 3 | 4 | 0.001508 | 0.00136 |
| 4 | MA0910.2_HOXD8 | 1 | 4 | 0.000503 | 0.00136 |
| ... | ... | ... | ... | ... | ... |
| 840 | MA1491.1_GLI3 | 3 | 3 | 0.001508 | 0.00102 |
| 841 | MA0775.1_MEIS3 | 7 | 6 | 0.003519 | 0.00204 |
| 842 | MA1505.1_HOXC8 | 1 | 1 | 0.000503 | 0.00034 |
| 843 | MA1111.1_NR2F2 | 2 | 2 | 0.001006 | 0.00068 |
| 844 | MA0641.1_ELF4 | 2 | 10 | 0.001006 | 0.00340 |

845 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 4 | 2 | 0.001711 | 0.000637 |
| 1 | UN0249.1_ZSCAN9 | 7 | 4 | 0.002994 | 0.001273 |
| 2 | MA0663.1_MLX | 2 | 2 | 0.000855 | 0.000637 |
| 3 | MA0160.1_NR4A2 | 1 | 5 | 0.000428 | 0.001591 |
| 4 | MA0910.2_HOXD8 | 3 | 4 | 0.001283 | 0.001273 |
| ... | ... | ... | ... | ... | ... |
| 850 | MA1491.1_GLI3 | 1 | 8 | 0.000428 | 0.002546 |
| 851 | MA0775.1_MEIS3 | 2 | 5 | 0.000855 | 0.001591 |
| 852 | MA1505.1_HOXC8 | 7 | 6 | 0.002994 | 0.001910 |
| 853 | MA1111.1_NR2F2 | 1 | 2 | 0.000428 | 0.000637 |
| 854 | MA0868.2_SOX8 | 2 | 7 | 0.000855 | 0.002228 |

855 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 4 | 3 | 0.001739 | 0.001034 |
| 1 | UN0249.1_ZSCAN9 | 7 | 2 | 0.003043 | 0.000689 |
| 2 | MA0663.1_MLX | 2 | 2 | 0.000870 | 0.000689 |
| 3 | MA0160.1_NR4A2 | 1 | 3 | 0.000435 | 0.001034 |
| 4 | MA0910.2_HOXD8 | 3 | 5 | 0.001304 | 0.001724 |
| ... | ... | ... | ... | ... | ... |
| 812 | MA1484.1_ETS2 | 1 | 1 | 0.000435 | 0.000345 |
| 813 | MA1491.1_GLI3 | 1 | 2 | 0.000435 | 0.000689 |
| 814 | MA0775.1_MEIS3 | 2 | 7 | 0.000870 | 0.002413 |
| 815 | MA1505.1_HOXC8 | 7 | 8 | 0.003043 | 0.002758 |
| 816 | MA0868.2_SOX8 | 2 | 6 | 0.000870 | 0.002068 |

817 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 4 | 3 | 0.001716 | 0.001175 |
| 1 | UN0249.1_ZSCAN9 | 7 | 2 | 0.003003 | 0.000783 |
| 2 | MA0663.1_MLX | 2 | 6 | 0.000858 | 0.002350 |
| 3 | MA0160.1_NR4A2 | 1 | 4 | 0.000429 | 0.001567 |
| 4 | MA0910.2_HOXD8 | 3 | 2 | 0.001287 | 0.000783 |
| ... | ... | ... | ... | ... | ... |
| 843 | UN0238.1_ZNF823 | 1 | 2 | 0.000429 | 0.000783 |
| 844 | MA1491.1_GLI3 | 1 | 8 | 0.000429 | 0.003134 |
| 845 | MA0775.1_MEIS3 | 2 | 11 | 0.000858 | 0.004309 |
| 846 | MA1505.1_HOXC8 | 7 | 1 | 0.003003 | 0.000392 |
| 847 | MA0868.2_SOX8 | 2 | 1 | 0.000858 | 0.000392 |

848 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 4 | 6 | 0.001709 | 0.002031 |
| 1 | UN0249.1_ZSCAN9 | 7 | 2 | 0.002990 | 0.000677 |
| 2 | MA0663.1_MLX | 2 | 3 | 0.000854 | 0.001016 |
| 3 | MA0160.1_NR4A2 | 1 | 4 | 0.000427 | 0.001354 |
| 4 | MA0910.2_HOXD8 | 3 | 4 | 0.001282 | 0.001354 |
| ... | ... | ... | ... | ... | ... |
| 853 | MA1491.1_GLI3 | 1 | 3 | 0.000427 | 0.001016 |
| 854 | MA0775.1_MEIS3 | 2 | 6 | 0.000854 | 0.002031 |
| 855 | MA1505.1_HOXC8 | 7 | 1 | 0.002990 | 0.000339 |
| 856 | MA1111.1_NR2F2 | 1 | 2 | 0.000427 | 0.000677 |
| 857 | MA0868.2_SOX8 | 2 | 2 | 0.000854 | 0.000677 |

858 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 1 | 2 | 0.000536 | 0.000636 |
| 1 | UN0249.1_ZSCAN9 | 2 | 4 | 0.001072 | 0.001271 |
| 2 | MA0663.1_MLX | 2 | 2 | 0.001072 | 0.000636 |
| 3 | MA0160.1_NR4A2 | 1 | 5 | 0.000536 | 0.001589 |
| 4 | MA0910.2_HOXD8 | 3 | 4 | 0.001609 | 0.001271 |
| ... | ... | ... | ... | ... | ... |
| 855 | MA1491.1_GLI3 | 3 | 8 | 0.001609 | 0.002542 |
| 856 | MA0775.1_MEIS3 | 5 | 5 | 0.002681 | 0.001589 |
| 857 | MA1505.1_HOXC8 | 1 | 6 | 0.000536 | 0.001907 |
| 858 | MA1111.1_NR2F2 | 1 | 2 | 0.000536 | 0.000636 |
| 859 | MA0641.1_ELF4 | 3 | 4 | 0.001609 | 0.001271 |

860 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 1 | 3 | 0.000541 | 0.001025 |
| 1 | UN0249.1_ZSCAN9 | 2 | 2 | 0.001082 | 0.000683 |
| 2 | MA0663.1_MLX | 2 | 2 | 0.001082 | 0.000683 |
| 3 | MA0160.1_NR4A2 | 1 | 3 | 0.000541 | 0.001025 |
| 4 | MA0910.2_HOXD8 | 3 | 5 | 0.001622 | 0.001708 |
| ... | ... | ... | ... | ... | ... |
| 839 | MA0840.1_Creb5 | 1 | 2 | 0.000541 | 0.000683 |
| 840 | MA1491.1_GLI3 | 3 | 2 | 0.001622 | 0.000683 |
| 841 | MA0775.1_MEIS3 | 5 | 7 | 0.002704 | 0.002391 |
| 842 | MA1505.1_HOXC8 | 1 | 8 | 0.000541 | 0.002732 |
| 843 | MA0641.1_ELF4 | 3 | 1 | 0.001622 | 0.000342 |

844 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 1 | 3 | 0.000546 | 0.001184 |
| 1 | UN0249.1_ZSCAN9 | 2 | 2 | 0.001091 | 0.000790 |
| 2 | MA0663.1_MLX | 2 | 6 | 0.001091 | 0.002369 |
| 3 | MA0160.1_NR4A2 | 1 | 4 | 0.000546 | 0.001579 |
| 4 | MA0910.2_HOXD8 | 3 | 2 | 0.001637 | 0.000790 |
| ... | ... | ... | ... | ... | ... |
| 823 | MA0840.1_Creb5 | 1 | 2 | 0.000546 | 0.000790 |
| 824 | MA1491.1_GLI3 | 3 | 8 | 0.001637 | 0.003158 |
| 825 | MA0775.1_MEIS3 | 5 | 11 | 0.002728 | 0.004343 |
| 826 | MA1505.1_HOXC8 | 1 | 1 | 0.000546 | 0.000395 |
| 827 | MA0641.1_ELF4 | 3 | 5 | 0.001637 | 0.001974 |

828 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 1 | 6 | 0.000537 | 0.002031 |
| 1 | UN0249.1_ZSCAN9 | 2 | 2 | 0.001074 | 0.000677 |
| 2 | MA0663.1_MLX | 2 | 3 | 0.001074 | 0.001016 |
| 3 | MA0160.1_NR4A2 | 1 | 4 | 0.000537 | 0.001354 |
| 4 | MA0910.2_HOXD8 | 3 | 4 | 0.001610 | 0.001354 |
| ... | ... | ... | ... | ... | ... |
| 853 | MA1491.1_GLI3 | 3 | 3 | 0.001610 | 0.001016 |
| 854 | MA0775.1_MEIS3 | 5 | 6 | 0.002684 | 0.002031 |
| 855 | MA1505.1_HOXC8 | 1 | 1 | 0.000537 | 0.000339 |
| 856 | MA1111.1_NR2F2 | 1 | 2 | 0.000537 | 0.000677 |
| 857 | MA0641.1_ELF4 | 3 | 10 | 0.001610 | 0.003385 |

858 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 5 | 2 | 0.002726 | 0.000639 |
| 1 | UN0249.1_ZSCAN9 | 3 | 4 | 0.001636 | 0.001278 |
| 2 | MA0663.1_MLX | 1 | 2 | 0.000545 | 0.000639 |
| 3 | MA0160.1_NR4A2 | 3 | 5 | 0.001636 | 0.001598 |
| 4 | MA0910.2_HOXD8 | 2 | 4 | 0.001091 | 0.001278 |
| ... | ... | ... | ... | ... | ... |
| 837 | MA1491.1_GLI3 | 6 | 8 | 0.003272 | 0.002557 |
| 838 | MA0775.1_MEIS3 | 2 | 5 | 0.001091 | 0.001598 |
| 839 | MA1505.1_HOXC8 | 1 | 6 | 0.000545 | 0.001918 |
| 840 | MA1111.1_NR2F2 | 2 | 2 | 0.001091 | 0.000639 |
| 841 | MA0641.1_ELF4 | 1 | 4 | 0.000545 | 0.001278 |

842 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 5 | 3 | 0.002719 | 0.001024 |
| 1 | UN0249.1_ZSCAN9 | 3 | 2 | 0.001631 | 0.000682 |
| 2 | MA0663.1_MLX | 1 | 2 | 0.000544 | 0.000682 |
| 3 | MA0160.1_NR4A2 | 3 | 3 | 0.001631 | 0.001024 |
| 4 | MA0910.2_HOXD8 | 2 | 5 | 0.001088 | 0.001706 |
| ... | ... | ... | ... | ... | ... |
| 842 | MA1491.1_GLI3 | 6 | 2 | 0.003263 | 0.000682 |
| 843 | MA0775.1_MEIS3 | 2 | 7 | 0.001088 | 0.002388 |
| 844 | MA1505.1_HOXC8 | 1 | 8 | 0.000544 | 0.002729 |
| 845 | MA1111.1_NR2F2 | 2 | 1 | 0.001088 | 0.000341 |
| 846 | MA0641.1_ELF4 | 1 | 1 | 0.000544 | 0.000341 |

847 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 5 | 3 | 0.002752 | 0.001186 |
| 1 | UN0249.1_ZSCAN9 | 3 | 2 | 0.001651 | 0.000791 |
| 2 | MA0663.1_MLX | 1 | 6 | 0.000550 | 0.002372 |
| 3 | MA0160.1_NR4A2 | 3 | 4 | 0.001651 | 0.001581 |
| 4 | MA0910.2_HOXD8 | 2 | 2 | 0.001101 | 0.000791 |
| ... | ... | ... | ... | ... | ... |
| 820 | MA1491.1_GLI3 | 6 | 8 | 0.003302 | 0.003162 |
| 821 | MA0775.1_MEIS3 | 2 | 11 | 0.001101 | 0.004348 |
| 822 | MA1505.1_HOXC8 | 1 | 1 | 0.000550 | 0.000395 |
| 823 | MA1111.1_NR2F2 | 2 | 1 | 0.001101 | 0.000395 |
| 824 | MA0641.1_ELF4 | 1 | 5 | 0.000550 | 0.001976 |

825 rows × 5 columns
| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1656.1_ZNF449 | 5 | 6 | 0.002706 | 0.002033 |
| 1 | UN0249.1_ZSCAN9 | 3 | 2 | 0.001623 | 0.000678 |
| 2 | MA0663.1_MLX | 1 | 3 | 0.000541 | 0.001016 |
| 3 | MA0160.1_NR4A2 | 3 | 4 | 0.001623 | 0.001355 |
| 4 | MA0910.2_HOXD8 | 2 | 4 | 0.001082 | 0.001355 |
| ... | ... | ... | ... | ... | ... |
| 851 | MA1491.1_GLI3 | 6 | 3 | 0.003247 | 0.001016 |
| 852 | MA0775.1_MEIS3 | 2 | 6 | 0.001082 | 0.002033 |
| 853 | MA1505.1_HOXC8 | 1 | 1 | 0.000541 | 0.000339 |
| 854 | MA1111.1_NR2F2 | 2 | 2 | 0.001082 | 0.000678 |
| 855 | MA0641.1_ELF4 | 1 | 10 | 0.000541 | 0.003388 |

856 rows × 5 columns
\n", + "
\n",
+ "\n",
+ "
\n",
+ "\n"
+ ]
+ },
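The tables above compare, motif by motif, the number of hits in diffusion-generated sequences (`motif_a`) and in training sequences (`motif_b`); `Diffusion_seqs` and `Training_seqs` express the same counts as fractions of each set's total (for example, counts of 3 and 1 in `motif_a` map to fractions in a 3:1 ratio). The cell that builds these tables is not part of this excerpt, so the sketch below is only one hypothetical way to produce the same layout with pandas. It assumes per-motif counts are already available as dictionaries and that only motifs found in both sets are kept; `motif_frequency_table` and the toy motif names are illustrative, not the notebook's.

```python
import pandas as pd


def motif_frequency_table(diffusion_counts, training_counts):
    """Hypothetical helper: per-motif counts and relative frequencies for two sequence sets.

    diffusion_counts / training_counts: dict mapping motif name -> number of hits.
    """
    df_a = pd.DataFrame(list(diffusion_counts.items()), columns=["motif", "motif_a"])
    df_b = pd.DataFrame(list(training_counts.items()), columns=["motif", "motif_b"])
    # Keep only motifs observed in both sets (an assumption about the original code)
    df = df_a.merge(df_b, on="motif", how="inner")
    # Normalise each count column by its own total so the two sets are comparable
    df["Diffusion_seqs"] = df["motif_a"] / df["motif_a"].sum()
    df["Training_seqs"] = df["motif_b"] / df["motif_b"].sum()
    return df


# Toy counts, not taken from the notebook
table = motif_frequency_table({"M1": 3, "M2": 1}, {"M1": 2, "M2": 2, "M3": 4})
print(table)
```

Normalising each column separately is what makes the two sets comparable even when they contain different numbers of sequences or motif hits.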
+ {
+ "cell_type": "markdown",
+ "id": "40d7bb2b-d978-45f6-b5a1-1907496eb6da",
+ "metadata": {
+ "id": "40d7bb2b-d978-45f6-b5a1-1907496eb6da"
+ },
+ "source": [
+ "# Diffusion"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "3c90f756-e04e-4b64-a11e-d2cc30f0e286",
+ "metadata": {
+ "id": "3c90f756-e04e-4b64-a11e-d2cc30f0e286"
+ },
+ "outputs": [],
+ "source": [
+ "\n",
+ "# Function changed: adds class conditioning\n",
+ "@torch.no_grad()\n",
+ "def p_sample(model, x, classes, t, t_index):\n",
+ " betas_t = extract(betas, t, x.shape)\n",
+ " sqrt_one_minus_alphas_cumprod_t = extract(\n",
+ " sqrt_one_minus_alphas_cumprod, t, x.shape\n",
+ " )\n",
+ " #print (x.shape, 'x_shape')\n",
+ " sqrt_recip_alphas_t = extract(sqrt_recip_alphas, t, x.shape)\n",
+ " \n",
+ " \n",
+ " # Equation 11 in the DDPM paper (Ho et al., 2020)\n",
+ " # Use our model (noise predictor) to predict the mean\n",
+ " model_mean = sqrt_recip_alphas_t * (\n",
+ " x - betas_t * model(x, classes=classes, time=t) / sqrt_one_minus_alphas_cumprod_t\n",
+ " )\n",
+ "\n",
+ " if t_index == 0:\n",
+ " return model_mean\n",
+ " else:\n",
+ " posterior_variance_t = extract(posterior_variance, t, x.shape)\n",
+ " noise = torch.randn_like(x)\n",
+ " # Algorithm 2 line 4:\n",
+ " return model_mean + torch.sqrt(posterior_variance_t) * noise \n",
+ "\n",
+ "# Algorithm 2 but save all images:\n",
+ "\n",
+ "# Function changed: adds class conditioning\n",
+ "\n",
+ "@torch.no_grad()\n",
+ "def p_sample_loop(model, classes, shape):\n",
+ " device = next(model.parameters()).device\n",
+ "\n",
+ " b = shape[0]\n",
+ " # start from pure noise (for each example in the batch)\n",
+ " img = torch.randn(shape, device=device)\n",
+ " imgs = []\n",
+ " \n",
+ " for i in tqdm(reversed(range(0, timesteps)), desc='sampling loop time step', total=timesteps):\n",
+ " img = p_sample(model, x=img, classes=classes, t=torch.full((b,), i, device=device, dtype=torch.long), t_index=i)\n",
+ " imgs.append(img.cpu().numpy())\n",
+ " return imgs\n",
+ "\n",
+ "\n",
+ "# Function changed: adds class conditioning\n",
+ "\n",
+ "@torch.no_grad()\n",
+ "def sample(model, classes, image_size, batch_size=16, channels=3):\n",
+ " return p_sample_loop(model, classes=classes, shape=(batch_size, channels, 4, image_size))\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ },
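`p_sample` implements one reverse step of DDPM: the model's noise prediction is turned into the posterior mean (Eq. 11 of Ho et al., 2020), and Gaussian noise scaled by the posterior standard deviation is added at every step except the last. The sketch below reproduces that step in isolation so it can be checked against a toy model. The `extract` helper and the schedule tensors are re-derived inside it as assumptions, since the notebook defines its own versions elsewhere, and the model is assumed to be callable as `model(x, t, classes)`; `reverse_step` is a hypothetical name.

```python
import torch


def extract(a, t, x_shape):
    # Assumed helper: a[t] per batch element, reshaped to (B, 1, 1, ...) for broadcasting
    out = a.gather(-1, t.cpu())
    return out.reshape(t.shape[0], *((1,) * (len(x_shape) - 1))).to(t.device)


@torch.no_grad()
def reverse_step(eps_model, x, classes, t, t_index, betas):
    """One DDPM reverse step x_t -> x_{t-1} (Eq. 11 and Algorithm 2 of Ho et al., 2020)."""
    # Schedule terms are recomputed here for clarity; the notebook precomputes them once
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    alphas_cumprod_prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])
    posterior_variance = betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod)

    betas_t = extract(betas, t, x.shape)
    sqrt_one_minus_ab_t = extract(torch.sqrt(1.0 - alphas_cumprod), t, x.shape)
    sqrt_recip_alphas_t = extract(torch.sqrt(1.0 / alphas), t, x.shape)

    # Posterior mean computed from the predicted noise
    mean = sqrt_recip_alphas_t * (x - betas_t * eps_model(x, t, classes) / sqrt_one_minus_ab_t)
    if t_index == 0:
        return mean  # final step: return the mean without adding noise
    noise = torch.randn_like(x)
    return mean + torch.sqrt(extract(posterior_variance, t, x.shape)) * noise


# Toy check with a model that always predicts zero noise
betas = torch.linspace(0.0001, 0.02, 10)
x = torch.ones(2, 1, 4, 8)
classes = torch.tensor([0, 1])
zero_model = lambda x, t, c: torch.zeros_like(x)
x_prev = reverse_step(zero_model, x, classes, torch.zeros(2, dtype=torch.long), 0, betas)
x_mid = reverse_step(zero_model, x, classes, torch.full((2,), 5, dtype=torch.long), 5, betas)
```

With a zero-noise model the final step reduces to `x / sqrt(alpha_0)`, which makes the mean formula easy to verify by hand.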
+ {
+ "cell_type": "markdown",
+ "id": "bdf81003-2e58-41c7-a12b-eee5a8bd0bf7",
+ "metadata": {
+ "id": "bdf81003-2e58-41c7-a12b-eee5a8bd0bf7"
+ },
+ "source": [
+ "### Schedule"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "1ddea4cb-f672-4ac6-b1d3-8afcb79c7081",
+ "metadata": {
+ "id": "1ddea4cb-f672-4ac6-b1d3-8afcb79c7081"
+ },
+ "outputs": [],
+ "source": [
+ "def cosine_beta_schedule(timesteps, s=0.008):\n",
+ " \"\"\"\n",
+ " cosine schedule as proposed in https://arxiv.org/abs/2102.09672\n",
+ " \"\"\"\n",
+ " steps = timesteps + 1\n",
+ " x = torch.linspace(0, timesteps, steps)\n",
+ " alphas_cumprod = torch.cos(((x / timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2\n",
+ " alphas_cumprod = alphas_cumprod / alphas_cumprod[0]\n",
+ " betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])\n",
+ " return torch.clip(betas, 0.0001, 0.9999)\n",
+ "\n",
+ "def linear_beta_schedule(timesteps, beta_end=0.005):\n",
+ " beta_start = 0.0001\n",
+ "\n",
+ " return torch.linspace(beta_start, beta_end, timesteps)\n",
+ "\n",
+ "def quadratic_beta_schedule(timesteps):\n",
+ " beta_start = 0.0001\n",
+ " beta_end = 0.02\n",
+ " return torch.linspace(beta_start**0.5, beta_end**0.5, timesteps) ** 2\n",
+ "\n",
+ "def sigmoid_beta_schedule(timesteps):\n",
+ " beta_start = 0.001\n",
+ " beta_end = 0.02\n",
+ " betas = torch.linspace(-6, 6, timesteps)\n",
+ " return torch.sigmoid(betas) * (beta_end - beta_start) + beta_start"
+ ]
+ },
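The sampling and forward-diffusion cells index precomputed tensors such as `sqrt_alphas_cumprod`, `sqrt_one_minus_alphas_cumprod`, `sqrt_recip_alphas` and `posterior_variance`, which are not defined in this excerpt. Assuming they follow the standard DDPM definitions, they can be derived from any of the beta schedules above as sketched here; `schedule_terms` is a hypothetical helper name.

```python
import torch


def schedule_terms(betas):
    """Hypothetical helper: standard DDPM constants derived from a beta schedule."""
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    # abar_{t-1}, with the value before the first step defined as 1
    alphas_cumprod_prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])
    return {
        "sqrt_recip_alphas": torch.sqrt(1.0 / alphas),
        "sqrt_alphas_cumprod": torch.sqrt(alphas_cumprod),
        "sqrt_one_minus_alphas_cumprod": torch.sqrt(1.0 - alphas_cumprod),
        # Variance of the true posterior q(x_{t-1} | x_t, x_0)
        "posterior_variance": betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod),
    }


timesteps = 50
# Same values as linear_beta_schedule(timesteps) with its default beta_end=0.005
betas = torch.linspace(0.0001, 0.005, timesteps)
terms = schedule_terms(betas)
print(terms["sqrt_alphas_cumprod"][-1].item())
```

Printing the last entry of `sqrt_alphas_cumprod` is a quick check of how much of the original sequence signal survives at the final timestep; values far from 0 mean the forward process does not reach pure noise.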
+ {
+ "cell_type": "markdown",
+ "id": "e1014e96-65d9-4d41-8a2a-46270f8213f7",
+ "metadata": {
+ "id": "e1014e96-65d9-4d41-8a2a-46270f8213f7"
+ },
+ "source": [
+ "### Forward diffusion\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "4beee4f9-5f17-4771-b075-58f0f30d420e",
+ "metadata": {
+ "id": "4beee4f9-5f17-4771-b075-58f0f30d420e"
+ },
+ "outputs": [],
+ "source": [
+ "\n",
+ "# forward diffusion\n",
+ "def q_sample(x_start, t, noise=None):\n",
+ " if noise is None:\n",
+ " noise = torch.randn_like(x_start)\n",
+ "\n",
+ " sqrt_alphas_cumprod_t = extract(sqrt_alphas_cumprod, t, x_start.shape)\n",
+ " sqrt_one_minus_alphas_cumprod_t = extract(\n",
+ " sqrt_one_minus_alphas_cumprod, t, x_start.shape\n",
+ " ) \n",
+ "\n",
+ " #print (sqrt_alphas_cumprod_t , sqrt_one_minus_alphas_cumprod_t , t)\n",
+ "\n",
+ " return sqrt_alphas_cumprod_t * x_start + sqrt_one_minus_alphas_cumprod_t * noise\n",
+ "\n"
+ ]
+ },
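`q_sample` uses the closed form of the forward process, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, so any timestep can be reached in one step instead of applying noise `t` times. The sketch below pairs it with a minimal `extract` helper (an assumption about what the notebook's version does: pick the schedule value for each example's timestep and reshape it to broadcast over the batch) and applies it to a one-hot encoded toy sequence laid out as `(batch, channel, 4 bases, length)`. The 100-step linear schedule is hypothetical.

```python
import torch


def extract(a, t, x_shape):
    # Assumed helper: a[t] per batch element, reshaped to (B, 1, 1, ...) for broadcasting
    out = a.gather(-1, t.cpu())
    return out.reshape(t.shape[0], *((1,) * (len(x_shape) - 1))).to(t.device)


# Hypothetical schedule: 100 linear steps
betas = torch.linspace(0.0001, 0.02, 100)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
sqrt_alphas_cumprod = torch.sqrt(alphas_cumprod)
sqrt_one_minus_alphas_cumprod = torch.sqrt(1.0 - alphas_cumprod)


def q_sample(x_start, t, noise=None):
    # x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise, computed in a single step
    if noise is None:
        noise = torch.randn_like(x_start)
    return (extract(sqrt_alphas_cumprod, t, x_start.shape) * x_start
            + extract(sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise)


# One-hot "ACGT" sequence: rows are bases, columns are positions
x0 = torch.eye(4).reshape(1, 1, 4, 4)
noise = torch.randn_like(x0)
x_early = q_sample(x0, torch.tensor([0]), noise)   # almost the clean sequence
x_late = q_sample(x0, torch.tensor([99]), noise)   # mostly noise
```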
+ {
+ "cell_type": "markdown",
+ "id": "78250fd9-1e51-40e8-8612-71c2f36dce80",
+ "metadata": {
+ "id": "78250fd9-1e51-40e8-8612-71c2f36dce80"
+ },
+ "source": [
+ "### Loss"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "83a85475-b650-4e7d-8d02-33d20198cb3c",
+ "metadata": {
+ "id": "83a85475-b650-4e7d-8d02-33d20198cb3c"
+ },
+ "outputs": [],
+ "source": [
+ "# Function changed: adds class conditioning\n",
+ "\n",
+ "def p_losses(denoise_model, x_start, t, classes, noise=None, loss_type=\"l1\"):\n",
+ " if noise is None:\n",
+ " noise = torch.randn_like(x_start) # Gaussian noise\n",
+ " x_noisy = q_sample(x_start=x_start, t=t, noise=noise) # noised input x_t for timestep t, built from the sampled noise\n",
+ " # print('max_q_sample', x_noisy.max(), 'mean_q_sample',x_noisy.mean() )\n",
+ " predicted_noise = denoise_model(x_noisy, t, classes) # this is the predicted noise given the model and step t\n",
+ " # print('max_predicted', predicted_noise.max(), 'mean_predicted', predicted_noise.mean())\n",
+ "\n",
+ " # #predicted is ok (clipped)\n",
+ " # print ('predicted inside loss')\n",
+ " # print (predicted_noise)\n",
+ " # print ('this is the noise generated by the p_losses')\n",
+ " # print (noise)\n",
+ " if loss_type == 'l1':\n",
+ " loss = F.l1_loss(noise, predicted_noise)\n",
+ " elif loss_type == 'l2':\n",
+ " # print (noise.shape, 'noise' )\n",
+ " # print (predicted_noise.shape, 'pred') \n",
+ " loss = F.mse_loss(noise, predicted_noise)\n",
+ " elif loss_type == \"huber\":\n",
+ " loss = F.smooth_l1_loss(noise, predicted_noise)\n",
+ " else:\n",
+ " raise NotImplementedError()\n",
+ "\n",
+ " return loss"
+ ]
+ },
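`p_losses` implements the DDPM training objective: noise a clean batch to a random timestep, ask the model to predict the added noise, and penalise the difference with an L1, L2 or Huber loss. A single optimisation step built around that objective might look like the sketch below. `TinyDenoiser` and `training_step` are hypothetical stand-ins (the real network is the notebook's model), the forward noising is inlined rather than calling `q_sample` so the block runs on its own, and drawing one uniform timestep per example follows Algorithm 1 of the DDPM paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDenoiser(nn.Module):
    """Hypothetical stand-in for denoise_model(x, t, classes)."""

    def __init__(self, n_classes=3):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 3, padding=1)
        self.class_emb = nn.Embedding(n_classes, 1)

    def forward(self, x, t, classes):
        # Add a learned per-class offset; t is ignored by this toy model
        return self.conv(x) + self.class_emb(classes).view(-1, 1, 1, 1)


def training_step(model, optimizer, x_start, classes, sqrt_ab, sqrt_1m_ab, loss_type="huber"):
    b = x_start.shape[0]
    # One random timestep per example
    t = torch.randint(0, sqrt_ab.shape[0], (b,), device=x_start.device).long()
    noise = torch.randn_like(x_start)
    shape = (b, 1, 1, 1)
    x_noisy = sqrt_ab[t].view(shape) * x_start + sqrt_1m_ab[t].view(shape) * noise
    predicted_noise = model(x_noisy, t, classes)
    losses = {"l1": F.l1_loss, "l2": F.mse_loss, "huber": F.smooth_l1_loss}
    loss = losses[loss_type](predicted_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
betas = torch.linspace(0.0001, 0.02, 100)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Random one-hot DNA batch shaped (8, 1, 4, 200)
x_start = F.one_hot(torch.randint(0, 4, (8, 200)), 4).float().permute(0, 2, 1).unsqueeze(1)
classes = torch.randint(0, 3, (8,))
loss = training_step(model, opt, x_start, classes, alphas_cumprod.sqrt(), (1 - alphas_cumprod).sqrt())
```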
+ {
+ "cell_type": "markdown",
+ "id": "XApz_K-I7DD4",
+ "metadata": {
+ "id": "XApz_K-I7DD4"
+ },
+ "source": [
+ "# Models"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ca471fe6-0427-4e7e-886a-8b7f67d812ef",
+ "metadata": {
+ "id": "ca471fe6-0427-4e7e-886a-8b7f67d812ef",
+ "tags": []
+ },
+ "source": [
+ "### Simple CNND2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "87ba190c-8f02-45b4-91f3-b44f040c017b",
+ "metadata": {
+ "id": "87ba190c-8f02-45b4-91f3-b44f040c017b"
+ },
+ "outputs": [],
+ "source": [
+ "class SinusoidalPositionEmbeddings(nn.Module):\n",
+ " def __init__(self, dim):\n",
+ " super().__init__()\n",
+ " self.dim = dim\n",
+ "\n",
+ " def forward(self, time):\n",
+ " device = time.device\n",
+ " half_dim = self.dim // 2\n",
+ " embeddings = math.log(10000) / (half_dim - 1)\n",
+ " embeddings = torch.exp(torch.arange(half_dim, device=device) * -embeddings)\n",
+ " embeddings = time[:, None] * embeddings[None, :]\n",
+ " embeddings = torch.cat((embeddings.sin(), embeddings.cos()), dim=-1)\n",
+ " return embeddings\n",
+ "\n",
+ "\n",
+ "class ResBlock(nn.Module):\n",
+ "\n",
+ " \"\"\"\n",
+ " Initialize a residual block: two convolutions, each followed by batch normalization and ReLU\n",
+ " \"\"\"\n",
+ " def __init__(self, in_size:int, hidden_size:int, out_size:int):\n",
+ " super().__init__()\n",
+ " self.conv1 = nn.Conv2d(in_size, hidden_size, 3, padding=1)\n",
+ " self.conv2 = nn.Conv2d(hidden_size, out_size, 3, padding=1)\n",
+ " self.batchnorm1 = nn.BatchNorm2d(hidden_size)\n",
+ " self.batchnorm2 = nn.BatchNorm2d(out_size)\n",
+ "\n",
+ " def convblock(self, x):\n",
+ " x = F.relu(self.batchnorm1(self.conv1(x)))\n",
+ " x = F.relu(self.batchnorm2(self.conv2(x)))\n",
+ " return x\n",
+ " \n",
+ " # Combine the conv-block output with the original input\n",
+ " def forward(self, x): return x + self.convblock(x) # skip connection\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "class ConvBlock_2d(nn.Module):\n",
+ " def __init__(self, in_channels, out_channels):\n",
+ " super().__init__()\n",
+ " \n",
+ " self.conv1 = nn.Sequential(\n",
+ " nn.Conv2d(in_channels, out_channels, 4, padding=2),\n",
+ " nn.BatchNorm2d(out_channels),\n",
+ " nn.ReLU(),\n",
+ " )\n",
+ " self.conv2 = nn.Sequential(\n",
+ " nn.Conv2d(out_channels, out_channels, 4, 1, 1),\n",
+ " nn.BatchNorm2d(out_channels),\n",
+ " nn.ReLU(),\n",
+ " )\n",
+ "\n",
+ " self._init_weights()\n",
+ " \n",
+ " def _init_weights(self):\n",
+ " for m in self.modules():\n",
+ " if isinstance(m, nn.Conv2d):\n",
+ " nn.init.kaiming_normal_(m.weight)\n",
+ " if m.bias is not None:\n",
+ " nn.init.zeros_(m.bias)\n",
+ " elif isinstance(m, nn.BatchNorm2d):\n",
+ " nn.init.constant_(m.weight, 1)\n",
+ " nn.init.zeros_(m.bias)\n",
+ " \n",
+ " def forward(self, x):\n",
+ " #print ('x', x.shape)\n",
+ " x = self.conv1(x)\n",
+ " #print ('conv1', x.shape)\n",
+ " x = self.conv2(x)\n",
+ " #print ('conv2', x.shape)\n",
+ " #x = F.avg_pool2d(x, 2)\n",
+ " \n",
+ " return x\n",
+ "\n",
+ "\n",
+ "\n",
+ "class Classifier(nn.Module):\n",
+ " def __init__(self):\n",
+ " super().__init__()\n",
+ " \n",
+ "\n",
+ "\n",
+ " self.res = nn.Sequential( ResBlock(1,2,1),\n",
+ " ResBlock(1,2,1),\n",
+ " ResBlock(1,2,1) ,\n",
+ " ResBlock(1,2,1))\n",
+ " \n",
+ "\n",
+ "\n",
+ " self.conv = nn.Sequential(\n",
+ " ConvBlock_2d(in_channels=1, out_channels=2),\n",
+ " nn.ReLU(),\n",
+ " nn.BatchNorm2d(2),\n",
+ " ConvBlock_2d(in_channels=2, out_channels=4),\n",
+ " nn.ReLU(),\n",
+ " nn.BatchNorm2d(4),\n",
+ " ConvBlock_2d(in_channels=4, out_channels=1),\n",
+ " nn.BatchNorm2d(1)\n",
+ " # ConvBlock_2d(in_channels=1, out_channels=1),\n",
+ " # ConvBlock_2d(in_channels=1, out_channels=1),\n",
+ " # ConvBlock_2d(in_channels=1, out_channels=1),\n",
+ " )\n",
+ " \n",
+ " self.fc = nn.Sequential(\n",
+ " nn.Linear(800, 800),\n",
+ " # nn.GELU(),\n",
+ " nn.BatchNorm1d(800), # Keep this BatchNorm: removing it changes the results substantially\n",
+ " # nn.Linear(400, 400),\n",
+ " # nn.BatchNorm1d(400),\n",
+ " # nn.GELU(),\n",
+ " # nn.BatchNorm1d(400),\n",
+ " \n",
+ " )\n",
+ "\n",
+ "\n",
+ " self.fc2 = nn.Sequential(\n",
+ " nn.Linear(400, 800),\n",
+ " # nn.GELU(),\n",
+ " nn.BatchNorm1d(800), # Keep this BatchNorm: removing it changes the results substantially\n",
+ " # nn.Linear(400, 400),\n",
+ " # nn.GELU(),\n",
+ " # nn.BatchNorm1d(400),\n",
+ " \n",
+ " )\n",
+ "\n",
+ " time_dim = 200 * 4\n",
+ " self.time_mlp = nn.Sequential(\n",
+ " SinusoidalPositionEmbeddings(100),\n",
+ " nn.Linear(100, time_dim),\n",
+ " nn.GELU(),\n",
+ " nn.Linear(time_dim, time_dim),\n",
+ " )\n",
+ "\n",
+ " self.time_mlp_out = nn.Sequential(\n",
+ " SinusoidalPositionEmbeddings(100),\n",
+ " nn.Linear(100, time_dim),\n",
+ " nn.GELU(),\n",
+ " nn.Linear(time_dim, time_dim),\n",
+ " )\n",
+ "\n",
+ " def forward(self, x, y):\n",
+ " #print (x.shape, 'inside model x ')\n",
+ " x_a = x.clone()\n",
+ " # y_a = y.clone()\n",
+ " x = self.res(x)\n",
+ " \n",
+ " #print ('to_full', x.shape)\n",
+ " \n",
+ " y_emb = self.time_mlp(y)\n",
+ " # y_emb_out = self.time_mlp_out(y_a)\n",
+ " \n",
+ " x = x.view(-1,800)\n",
+ "\n",
+ " \n",
+ " x_a = x.view(-1,800)\n",
+ " x_a = self.fc(x_a)\n",
+ " \n",
+ "\n",
+ " x = x + y_emb.view(-1,800) * x_a\n",
+ "\n",
+ " \n",
+ " #x = self.fc2(x)\n",
+ " #x = x + y_emb_out.view(-1,400) + x_a\n",
+ "\n",
+ " #x = torch.clip(x, min=-1, max=1) \n",
+ " x = x.view(-1,1,4,200)\n",
+ " \n",
+ " #x = x.view(-1,1,200,4)\n",
+ "\n",
+ " \n",
+ " \n",
+ " #print (x.shape)\n",
+ " #The cliping is working already checked\n",
+ " return x\n",
+ "\n"
+ ]
+ },
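In `Classifier`, the timestep is first mapped to a 100-dimensional sinusoidal embedding and then projected by `time_mlp` to `time_dim = 200 * 4 = 800`, exactly the size of a flattened 4 × 200 one-hot sequence. That match is what allows the elementwise gating `x + y_emb * x_a` in `forward`. The sketch below traces those shapes on random inputs. It re-creates the layers with fresh weights and treats the residual stack as the identity, so it illustrates only the tensor bookkeeping, not trained behaviour; `sinusoidal_embedding` is a function version of `SinusoidalPositionEmbeddings.forward`.

```python
import math

import torch
import torch.nn as nn


def sinusoidal_embedding(time, dim):
    # Same formula as SinusoidalPositionEmbeddings.forward
    half_dim = dim // 2
    scale = math.log(10000) / (half_dim - 1)
    freqs = torch.exp(torch.arange(half_dim, device=time.device) * -scale)
    args = time[:, None].float() * freqs[None, :]
    return torch.cat((args.sin(), args.cos()), dim=-1)


n_bases, seq_len = 4, 200
time_dim = n_bases * seq_len  # 800, the size used by x.view(-1, 800)
time_mlp = nn.Sequential(nn.Linear(100, time_dim), nn.GELU(), nn.Linear(time_dim, time_dim))
fc = nn.Sequential(nn.Linear(time_dim, time_dim), nn.BatchNorm1d(time_dim))

t = torch.tensor([0, 10, 999])
emb = sinusoidal_embedding(t, 100)        # (3, 100): 50 sine and 50 cosine features
y_emb = time_mlp(emb)                     # (3, 800)

x = torch.randn(3, 1, n_bases, seq_len)   # residual blocks omitted (identity)
x_flat = x.view(-1, time_dim)
out = (x_flat + y_emb * fc(x_flat)).view(-1, 1, n_bases, seq_len)
print(out.shape)  # torch.Size([3, 1, 4, 200])
```

At `t = 0` every sine feature is 0 and every cosine feature is 1, which is a convenient sanity check on the embedding.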
+ {
+ "cell_type": "markdown",
+ "id": "7031ebeb-23cb-4bee-b6d2-de08ba960628",
+ "metadata": {
+ "id": "7031ebeb-23cb-4bee-b6d2-de08ba960628"
+ },
+ "source": [
+ "### UNET and UTILS"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "sYN_xbU9A4yY",
+ "metadata": {
+ "id": "sYN_xbU9A4yY"
+ },
+ "outputs": [],
+ "source": [
+ "# New module added to accommodate class conditioning\n",
+ "\n",
+ "class EmbedFC(nn.Module):\n",
+ " def __init__(self, input_dim, emb_dim):\n",
+ " super(EmbedFC, self).__init__()\n",
+ " '''\n",
+ " Generic fully connected network (one hidden layer) for embedding inputs such as class labels\n",
+ " '''\n",
+ " self.input_dim = input_dim\n",
+ " layers = [\n",
+ " nn.Linear(input_dim, emb_dim),\n",
+ " nn.GELU(),\n",
+ " nn.Linear(emb_dim, emb_dim),\n",
+ " ]\n",
+ " self.model = nn.Sequential(*layers)\n",
+ "\n",
+ " def forward(self, x):\n",
+ " return self.model(x)"
+ ]
+ },
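`EmbedFC` is a small MLP intended for the conditioning signal. One common way to use such a module, sketched below under stated assumptions, is to one-hot encode the class labels, embed them, and add the result to a convolutional feature map by broadcasting over the spatial dimensions. The sizes (`n_classes = 5`, `emb_dim = 64`) and the additive injection are hypothetical and are not taken from the notebook's U-Net; the class is re-declared with the same layers so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbedFC(nn.Module):
    # Same layers as the EmbedFC cell: Linear -> GELU -> Linear
    def __init__(self, input_dim, emb_dim):
        super().__init__()
        self.input_dim = input_dim
        self.model = nn.Sequential(nn.Linear(input_dim, emb_dim), nn.GELU(), nn.Linear(emb_dim, emb_dim))

    def forward(self, x):
        return self.model(x)


n_classes, emb_dim = 5, 64  # hypothetical sizes
class_embed = EmbedFC(n_classes, emb_dim)

classes = torch.tensor([0, 3, 4])
one_hot = F.one_hot(classes, num_classes=n_classes).float()  # (3, 5)
c_emb = class_embed(one_hot)                                  # (3, 64)

# Broadcast the embedding over a (batch, channels, 4, length) feature map
features = torch.randn(3, emb_dim, 4, 50)
conditioned = features + c_emb.view(-1, emb_dim, 1, 1)
```

Reshaping to `(B, emb_dim, 1, 1)` gives every position in a channel the same class-dependent offset. The embedding width must match the number of channels of the feature map it is added to.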
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "52feed00-b104-4ac3-b55f-77d1db67196d",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "ccd1545c-0d18-4559-b828-fdf95377bae6",
+ "metadata": {
+ "id": "ccd1545c-0d18-4559-b828-fdf95377bae6"
+ },
+ "outputs": [],
+ "source": [
+ "\n",
+ "\n",
+ "def exists(x):\n",
+ " return x is not None\n",
+ "\n",
+ "def default(val, d):\n",
+ " if exists(val):\n",
+ " return val\n",
+ " return d() if callable(d) else d\n",
+ "\n",
+ "def cycle(dl):\n",
+ " while True:\n",
+ " for data in dl:\n",
+ " yield data\n",
+ "\n",
+ "def has_int_squareroot(num):\n",
+ " return (math.sqrt(num) ** 2) == num\n",
+ "\n",
+ "def num_to_groups(num, divisor):\n",
+ " groups = num // divisor\n",
+ " remainder = num % divisor\n",
+ " arr = [divisor] * groups\n",
+ " if remainder > 0:\n",
+ " arr.append(remainder)\n",
+ " return arr\n",
+ "\n",
+ "def convert_image_to(img_type, image):\n",
+ " if image.mode != img_type:\n",
+ " return image.convert(img_type)\n",
+ " return image\n",
+ "\n",
+ "def l2norm(t):\n",
+ " return F.normalize(t, dim = -1)\n",
+ "\n",
+ "# small helper modules\n",
+ "\n",
+ "\n",
+ "class Residual(nn.Module):\n",
+ " def __init__(self, fn):\n",
+ " super().__init__()\n",
+ " self.fn = fn\n",
+ "\n",
+ " def forward(self, x, *args, **kwargs):\n",
+ " return self.fn(x, *args, **kwargs) + x\n",
+ "\n",
+ "def Upsample(dim, dim_out = None):\n",
+ " return nn.Sequential(\n",
+ " nn.Upsample(scale_factor = 2, mode = 'nearest'),\n",
+ " nn.Conv2d(dim, default(dim_out, dim), 3, padding = 1)\n",
+ " )\n",
+ "\n",
+ "def Downsample(dim, dim_out = None):\n",
+ " return nn.Conv2d(dim, default(dim_out, dim), 4, 2, 1)\n",
+ "\n",
+ "class LayerNorm(nn.Module):\n",
+ " def __init__(self, dim):\n",
+ " super().__init__()\n",
+ " self.g = nn.Parameter(torch.ones(1, dim, 1, 1))\n",
+ "\n",
+ " def forward(self, x):\n",
+ " eps = 1e-5 if x.dtype == torch.float32 else 1e-3\n",
+ " var = torch.var(x, dim = 1, unbiased = False, keepdim = True)\n",
+ " mean = torch.mean(x, dim = 1, keepdim = True)\n",
+ " return (x - mean) * (var + eps).rsqrt() * self.g\n",
+ "\n",
+ "class PreNorm(nn.Module):\n",
+ " def __init__(self, dim, fn):\n",
+ " super().__init__()\n",
+ " self.fn = fn\n",
+ " self.norm = LayerNorm(dim)\n",
+ "\n",
+ " def forward(self, x):\n",
+ " x = self.norm(x)\n",
+ " return self.fn(x)\n",
+ "\n",
+ "# positional embeds\n",
+ "\n",
+ "class LearnedSinusoidalPosEmb(nn.Module):\n",
+ " \"\"\" following @crowsonkb 's lead with learned sinusoidal pos emb \"\"\"\n",
+ " \"\"\" https://github.com/crowsonkb/v-diffusion-jax/blob/master/diffusion/models/danbooru_128.py#L8 \"\"\"\n",
+ "\n",
+ " def __init__(self, dim):\n",
+ " super().__init__()\n",
+ " assert (dim % 2) == 0\n",
+ " half_dim = dim // 2\n",
+ " self.weights = nn.Parameter(torch.randn(half_dim))\n",
+ "\n",
+ " def forward(self, x):\n",
+ " x = rearrange(x, 'b -> b 1')\n",
+ " freqs = x * rearrange(self.weights, 'd -> 1 d') * 2 * math.pi\n",
+ " fouriered = torch.cat((freqs.sin(), freqs.cos()), dim = -1)\n",
+ " fouriered = torch.cat((x, fouriered), dim = -1)\n",
+ " return fouriered\n",
+ "\n",
+ "# building block modules\n",
+ "\n",
+ "class Block(nn.Module):\n",
+ " def __init__(self, dim, dim_out, groups = 8):\n",
+ " super().__init__()\n",
+ " self.proj = nn.Conv2d(dim, dim_out, 3, padding = 1)\n",
+ " self.norm = nn.GroupNorm(groups, dim_out)\n",
+ " self.act = nn.SiLU()\n",
+ "\n",
+ " def forward(self, x, scale_shift = None):\n",
+ " x = self.proj(x)\n",
+ " x = self.norm(x)\n",
+ "\n",
+ " if exists(scale_shift):\n",
+ " scale, shift = scale_shift\n",
+ " x = x * (scale + 1) + shift\n",
+ "\n",
+ " x = self.act(x)\n",
+ " return x\n",
+ "\n",
+ "class ResnetBlock(nn.Module):\n",
+ " def __init__(self, dim, dim_out, *, time_emb_dim = None, groups = 8):\n",
+ " super().__init__()\n",
+ " self.mlp = nn.Sequential(\n",
+ " nn.SiLU(),\n",
+ " nn.Linear(time_emb_dim, dim_out * 2)\n",
+ " ) if exists(time_emb_dim) else None\n",
+ "\n",
+ " self.block1 = Block(dim, dim_out, groups = groups)\n",
+ " self.block2 = Block(dim_out, dim_out, groups = groups)\n",
+ " self.res_conv = nn.Conv2d(dim, dim_out, 1) if dim != dim_out else nn.Identity()\n",
+ "\n",
+ " def forward(self, x, time_emb = None):\n",
+ "\n",
+ " scale_shift = None\n",
+ " if exists(self.mlp) and exists(time_emb):\n",
+ " time_emb = self.mlp(time_emb)\n",
+ " time_emb = rearrange(time_emb, 'b c -> b c 1 1')\n",
+ " scale_shift = time_emb.chunk(2, dim = 1)\n",
+ "\n",
+ " h = self.block1(x, scale_shift = scale_shift)\n",
+ "\n",
+ " h = self.block2(h)\n",
+ "\n",
+ " return h + self.res_conv(x)\n",
+ "\n",
+ "#CLASS CHANGED: ADD CONDITIONING\n",
+ "\n",
+ "class ResnetBlockClassConditioned(ResnetBlock):\n",
+ " def __init__(self, dim, dim_out, *, num_classes, class_embed_dim, time_emb_dim = None, groups = 8):\n",
+ " super().__init__(dim=dim+class_embed_dim, dim_out=dim_out, time_emb_dim=time_emb_dim, groups=groups)\n",
+ " self.class_mlp = EmbedFC(num_classes, class_embed_dim)\n",
+ " \n",
+ "\n",
+ " \n",
+ " def forward(self, x, time_emb=None, c=None, mask=None):\n",
+ " emb_c = self.class_mlp(c)\n",
+ " emb_c = emb_c.view(*emb_c.shape, 1, 1)\n",
+ " emb_c = emb_c.expand(-1, -1, x.shape[-2], x.shape[-1])\n",
+ "\n",
+ " if mask is not None:\n",
+ " # mask classes so we can jointly train conditioned and unconditioned\n",
+ " # masking with 1e-9 like we do in Transformers\n",
+ " emb_c = emb_c.masked_fill(mask, 1e-9)\n",
+ "\n",
+ " x = torch.cat([x, emb_c], dim=1)\n",
+ "\n",
+ " return super().forward(x, time_emb)\n",
+ "\n",
+ "class LinearAttention(nn.Module):\n",
+ " def __init__(self, dim, heads = 4, dim_head = 32):\n",
+ " super().__init__()\n",
+ " self.scale = dim_head ** -0.5\n",
+ " self.heads = heads\n",
+ " hidden_dim = dim_head * heads\n",
+ " self.to_qkv = nn.Conv2d(dim, hidden_dim * 3, 1, bias = False)\n",
+ "\n",
+ " self.to_out = nn.Sequential(\n",
+ " nn.Conv2d(hidden_dim, dim, 1),\n",
+ " LayerNorm(dim)\n",
+ " )\n",
+ "\n",
+ " def forward(self, x):\n",
+ " b, c, h, w = x.shape\n",
+ " qkv = self.to_qkv(x).chunk(3, dim = 1)\n",
+ " q, k, v = map(lambda t: rearrange(t, 'b (h c) x y -> b h c (x y)', h = self.heads), qkv)\n",
+ "\n",
+ " q = q.softmax(dim = -2)\n",
+ " k = k.softmax(dim = -1)\n",
+ "\n",
+ " q = q * self.scale\n",
+ " v = v / (h * w)\n",
+ "\n",
+ " context = torch.einsum('b h d n, b h e n -> b h d e', k, v)\n",
+ "\n",
+ " out = torch.einsum('b h d e, b h d n -> b h e n', context, q)\n",
+ " out = rearrange(out, 'b h c (x y) -> b (h c) x y', h = self.heads, x = h, y = w)\n",
+ " return self.to_out(out)\n",
+ "\n",
+ "class Attention(nn.Module):\n",
+ " def __init__(self, dim, heads = 4, dim_head = 32, scale = 10):\n",
+ " super().__init__()\n",
+ " self.scale = scale\n",
+ " self.heads = heads\n",
+ " hidden_dim = dim_head * heads\n",
+ " self.to_qkv = nn.Conv2d(dim, hidden_dim * 3, 1, bias = False)\n",
+ " self.to_out = nn.Conv2d(hidden_dim, dim, 1)\n",
+ "\n",
+ " def forward(self, x):\n",
+ " b, c, h, w = x.shape\n",
+ " qkv = self.to_qkv(x).chunk(3, dim = 1)\n",
+ " q, k, v = map(lambda t: rearrange(t, 'b (h c) x y -> b h c (x y)', h = self.heads), qkv)\n",
+ "\n",
+ " q, k = map(l2norm, (q, k))\n",
+ "\n",
+ " sim = einsum('b h d i, b h d j -> b h i j', q, k) * self.scale\n",
+ " attn = sim.softmax(dim = -1)\n",
+ " out = einsum('b h i j, b h d j -> b h i d', attn, v)\n",
+ " out = rearrange(out, 'b h (x y) d -> b (h d) x y', x = h, y = w)\n",
+ " return self.to_out(out)\n",
+ "\n",
+ "# model\n",
+ "\n",
+ "# bit diffusion class\n",
+ "\n",
+ "def log(t, eps = 1e-20):\n",
+ " return torch.log(t.clamp(min = eps))\n",
+ "\n",
+ "def right_pad_dims_to(x, t):\n",
+ " padding_dims = x.ndim - t.ndim\n",
+ " if padding_dims <= 0:\n",
+ " return t\n",
+ " return t.view(*t.shape, *((1,) * padding_dims))\n",
+ "\n",
+ "def beta_linear_log_snr(t):\n",
+ " return -torch.log(torch.expm1(1e-4 + 10 * (t ** 2)))\n",
+ "\n",
+ "def alpha_cosine_log_snr(t, s: float = 0.008):\n",
+ " return -log((torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** -2) - 1, eps = 1e-5) # not sure if this accounts for beta being clipped to 0.999 in discrete version\n",
+ "\n",
+ "def log_snr_to_alpha_sigma(log_snr):\n",
+ " return torch.sqrt(torch.sigmoid(log_snr)), torch.sqrt(torch.sigmoid(-log_snr))\n",
+ "\n",
+ "\n",
+ "#CLASS CHANGED: ADD CONDITIONING\n",
+ "\n",
+ "class Unet_lucas(nn.Module):\n",
+ " def __init__(\n",
+ " self,\n",
+ " dim,\n",
+ " init_dim = None,\n",
+ " dim_mults=(1, 2, 4),\n",
+ " channels = 1,\n",
+ " resnet_block_groups = 8,\n",
+ " learned_sinusoidal_dim = 16,\n",
+ " num_classes=10,\n",
+ " class_embed_dim=3,\n",
+ " ):\n",
+ " super().__init__()\n",
+ "\n",
+ " # determine dimensions\n",
+ "\n",
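+ " channels = 1 # NOTE: overrides the channels argument; the model always uses one input channel\n",
+ " self.channels = channels\n",
+ " self.num_classes = num_classes # needed by forward() when classes is None\n",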
+ "\n",
+ " input_channels = channels * 2\n",
+ " #print ('input channels',input_channels)\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ " init_dim = default(init_dim, dim)\n",
+ " print (init_dim, 'init_dim')\n",
+ " #self.init_conv = nn.Conv2d(input_channels, init_dim, 7, padding = 3) # original\n",
+ " self.init_conv = nn.Conv2d(input_channels, init_dim, (7,7), padding = 3)\n",
+ "\n",
+ " #print (self.init_conv)\n",
+ " dims = [init_dim, *map(lambda m: dim * m, dim_mults)]\n",
+ " #print (dims)\n",
+ "\n",
+ "\n",
+ " in_out = list(zip(dims[:-1], dims[1:]))\n",
+ " #print (in_out)\n",
+ " block_klass = partial(ResnetBlockClassConditioned, groups=resnet_block_groups,\n",
+ " num_classes=num_classes, class_embed_dim=class_embed_dim)\n",
+ "\n",
+ " # time embeddings\n",
+ "\n",
+ " time_dim = dim * 4\n",
+ "\n",
+ " sinu_pos_emb = LearnedSinusoidalPosEmb(learned_sinusoidal_dim)\n",
+ " fourier_dim = learned_sinusoidal_dim + 1\n",
+ "\n",
+ " self.time_mlp = nn.Sequential(\n",
+ " sinu_pos_emb,\n",
+ " nn.Linear(fourier_dim, time_dim),\n",
+ " nn.GELU(),\n",
+ " nn.Linear(time_dim, time_dim)\n",
+ " )\n",
+ "\n",
+ " # layers\n",
+ "\n",
+ " self.downs = nn.ModuleList([])\n",
+ " self.ups = nn.ModuleList([])\n",
+ " num_resolutions = len(in_out)\n",
+ "\n",
+ " for ind, (dim_in, dim_out) in enumerate(in_out):\n",
+ " is_last = ind >= (num_resolutions - 1)\n",
+ "\n",
+ " self.downs.append(nn.ModuleList([\n",
+ " block_klass(dim_in, dim_in, time_emb_dim = time_dim),\n",
+ " block_klass(dim_in, dim_in, time_emb_dim = time_dim),\n",
+ " Residual(PreNorm(dim_in, LinearAttention(dim_in))),\n",
+ " Downsample(dim_in, dim_out) if not is_last else nn.Conv2d(dim_in, dim_out, 3, padding = 1)\n",
+ " ]))\n",
+ "\n",
+ " mid_dim = dims[-1]\n",
+ " self.mid_block1 = block_klass(mid_dim, mid_dim, time_emb_dim = time_dim)\n",
+ " self.mid_attn = Residual(PreNorm(mid_dim, Attention(mid_dim)))\n",
+ " self.mid_block2 = block_klass(mid_dim, mid_dim, time_emb_dim = time_dim)\n",
+ "\n",
+ " for ind, (dim_in, dim_out) in enumerate(reversed(in_out)):\n",
+ " is_last = ind == (len(in_out) - 1)\n",
+ "\n",
+ " self.ups.append(nn.ModuleList([\n",
+ " block_klass(dim_out + dim_in, dim_out, time_emb_dim = time_dim),\n",
+ " block_klass(dim_out + dim_in, dim_out, time_emb_dim = time_dim),\n",
+ " Residual(PreNorm(dim_out, LinearAttention(dim_out))),\n",
+ " Upsample(dim_out, dim_in) if not is_last else nn.Conv2d(dim_out, dim_in, 3, padding = 1)\n",
+ " ]))\n",
+ "\n",
+ " self.final_res_block = block_klass(dim * 2, dim, time_emb_dim = time_dim)\n",
+ " #self.final_res_block = block_klass(1, dim, time_emb_dim = time_dim)\n",
+ "\n",
+ " #self.final_conv = nn.Conv2d(dim, channels, 1)\n",
+ " self.final_conv = nn.Conv2d(dim, 1, 1)\n",
+ " #print('self.final_conv' , self.final_conv)\n",
+ "\n",
+ "\n",
+ " print ('final',dim, channels, self.final_conv)\n",
+ "\n",
+ " def forward(self, x, time, classes, x_self_cond = None):\n",
+ " #print (x.shape ,'in_shape')\n",
+ " x_self_cond = default(x_self_cond, lambda: torch.zeros_like(x))\n",
+ " x = torch.cat((x_self_cond, x), dim = 1)\n",
+ "\n",
+ " x = self.init_conv(x)\n",
+ " #print ('init_conv', x.shape)\n",
+ " r = x.clone()\n",
+ "\n",
+ " t = self.time_mlp(time)\n",
+ " if classes is None:\n",
+ " classes = torch.zeros((x.shape[0], self.num_classes), device=x.device)\n",
+ " context_mask = torch.ones((x.shape[0]))\n",
+ "\n",
+ "\n",
+ " h = []\n",
+ "\n",
+ " for block1, block2, attn, downsample in self.downs:\n",
+ " x = block1(x, t, classes)\n",
+ " h.append(x)\n",
+ "\n",
+ " x = block2(x, t, classes)\n",
+ " x = attn(x)\n",
+ " h.append(x)\n",
+ "\n",
+ " x = downsample(x)\n",
+ "\n",
+ " x = self.mid_block1(x, t, classes)\n",
+ " x = self.mid_attn(x)\n",
+ " x = self.mid_block2(x, t, classes)\n",
+ "\n",
+ " for block1, block2, attn, upsample in self.ups:\n",
+ " x = torch.cat((x, h.pop()), dim = 1)\n",
+ " x = block1(x, t, classes)\n",
+ "\n",
+ " x = torch.cat((x, h.pop()), dim = 1)\n",
+ " x = block2(x, t, classes)\n",
+ " x = attn(x)\n",
+ "\n",
+ " x = upsample(x)\n",
+ "\n",
+ " \n",
+ " #print('x torch_after_upsamples',x.shape)\n",
+ "\n",
+ " x = torch.cat((x, r), dim = 1)\n",
+ " #print('x tochcat', x.shape)\n",
+ "\n",
+ " x = self.final_res_block(x, t, classes)\n",
+ " #print(self.final_res_block)\n",
+ " #print('x from res_block before final_conv',x.shape)\n",
+ " #print (self.final_conv(x).shape)\n",
+ " x = self.final_conv(x)\n",
+ " #print ('FINAL X', x.shape)\n",
+ " return x\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d477de22-b316-4b4f-bc7f-5afc956b9f2b",
+ "metadata": {
+ "id": "d477de22-b316-4b4f-bc7f-5afc956b9f2b"
+ },
+ "source": [
+ "# Loading data and generating fasta files and motifs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "07687539-319b-4b9a-a4f8-1149853e7bc7",
+ "metadata": {
+ "id": "07687539-319b-4b9a-a4f8-1149853e7bc7"
+ },
+ "outputs": [],
+ "source": [
+ "#Loading Dataset\n",
+ "df = pd.read_csv(\"train_all_classifier_WM20220916.csv\", sep=\"\\t\")\n",
+ "df.head()\n",
+ "df = df.sample(8000) # Randomly subsample 8,000 sequences (rows)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "4a34e782-3b42-4fb1-876c-8cacc9f9a4c3",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Component % on Training Sample')"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ "| \n", + " | Unnamed: 0 | \n", + "seqname | \n", + "start | \n", + "end | \n", + "DHS_width | \n", + "summit | \n", + "total_signal | \n", + "numsamples | \n", + "numpeaks | \n", + "C1 | \n", + "... | \n", + "C10 | \n", + "C11 | \n", + "C12 | \n", + "C13 | \n", + "C14 | \n", + "C15 | \n", + "C16 | \n", + "raw_sequence | \n", + "component | \n", + "proportion | \n", + "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 91915 | \n", + "2045472 | \n", + "chr22 | \n", + "33834860 | \n", + "33835120 | \n", + "260 | \n", + "33835090 | \n", + "0.506276 | \n", + "1 | \n", + "1 | \n", + "0.000000 | \n", + "... | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "AATATGCCTCAGCGACAAAGCTACAGCTTTAGAAATGATACAGTTA... | \n", + "7 | \n", + "1.000000 | \n", + "
| 10522 | \n", + "1562796 | \n", + "chr19 | \n", + "50292300 | \n", + "50292500 | \n", + "200 | \n", + "50292410 | \n", + "10.969485 | \n", + "11 | \n", + "11 | \n", + "0.000000 | \n", + "... | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "GCACCTCCCGGGAGGAGATCTTCTCCCAGAATCGGGAAAGTGAAAA... | \n", + "2 | \n", + "1.000000 | \n", + "
| 53116 | \n", + "1461349 | \n", + "chr18 | \n", + "54202940 | \n", + "54203160 | \n", + "220 | \n", + "54203030 | \n", + "1.777531 | \n", + "3 | \n", + "3 | \n", + "0.000000 | \n", + "... | \n", + "0.000046 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.020306 | \n", + "TTCCCAGAGATGGAGACAATGAAGCTGGGTCTTGGATGAGTAGGAG... | \n", + "15 | \n", + "0.997722 | \n", + "
| 151098 | \n", + "3278098 | \n", + "chr8 | \n", + "119764480 | \n", + "119764716 | \n", + "236 | \n", + "119764590 | \n", + "2.430332 | \n", + "4 | \n", + "4 | \n", + "0.000000 | \n", + "... | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "AAGCATAATGGCATGCTGTAAAGTAACATGGTAATACTAAAAACAT... | \n", + "4 | \n", + "1.000000 | \n", + "
| 103268 | \n", + "701239 | \n", + "chr12 | \n", + "27780320 | \n", + "27780740 | \n", + "420 | \n", + "27780550 | \n", + "0.760209 | \n", + "1 | \n", + "1 | \n", + "0.000561 | \n", + "... | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.004483 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "GCCCTGAGCCAGGAGGCCGGCGGCCCGGAGGTGCAGCAGCTGCGCG... | \n", + "12 | \n", + "0.888776 | \n", + "
| ... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "... | \n", + "
| 62262 | \n", + "2236153 | \n", + "chr3 | \n", + "131961740 | \n", + "131962100 | \n", + "360 | \n", + "131961950 | \n", + "3.170421 | \n", + "4 | \n", + "4 | \n", + "0.000000 | \n", + "... | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "TTATCTCTCTCCCTTCTCCCTTTTCCCTACTCAGATCCAGATCTGC... | \n", + "6 | \n", + "1.000000 | \n", + "
| 126297 | \n", + "324216 | \n", + "chr10 | \n", + "8349480 | \n", + "8349860 | \n", + "380 | \n", + "8349700 | \n", + "2.754060 | \n", + "2 | \n", + "2 | \n", + "0.000000 | \n", + "... | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "CACGTCTGGAAACCCAAGAATTTGGCATTTACCTTCTAGATGGGTG... | \n", + "4 | \n", + "1.000000 | \n", + "
| 10197 | \n", + "1451084 | \n", + "chr18 | \n", + "46445580 | \n", + "46445780 | \n", + "200 | \n", + "46445690 | \n", + "6.398176 | \n", + "11 | \n", + "11 | \n", + "0.000000 | \n", + "... | \n", + "0.026391 | \n", + "0.006446 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "ACAACTTTCAGAGACCTCAGCCTCCTGGAAATAAGATCTCTGAATC... | \n", + "5 | \n", + "0.699225 | \n", + "
| 133309 | \n", + "745202 | \n", + "chr12 | \n", + "64664420 | \n", + "64664620 | \n", + "200 | \n", + "64664510 | \n", + "0.781073 | \n", + "1 | \n", + "1 | \n", + "0.000000 | \n", + "... | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000532 | \n", + "0.000189 | \n", + "0.003501 | \n", + "TATTCTGTGCAGTTTACACCTGCATAGGCCACCCATCTCGGCCTCC... | \n", + "15 | \n", + "0.723773 | \n", + "
| 73541 | \n", + "2664501 | \n", + "chr5 | \n", + "132026760 | \n", + "132026969 | \n", + "209 | \n", + "132026880 | \n", + "46.871654 | \n", + "20 | \n", + "20 | \n", + "0.000000 | \n", + "... | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.0 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "0.000000 | \n", + "CTGGGGGCGGTGGGGCTGCTGTACACATTTCCTGGCACACGCTTCC... | \n", + "4 | \n", + "0.997665 | \n", + "
8000 rows × 28 columns
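
The sampled table above carries one `component` label per sequence, and the earlier plot is titled 'Component % on Training Sample'. A minimal sketch of how such per-component percentages could be tallied follows, using only the standard library. The helper `component_percentages` is hypothetical (the notebook's own plotting code is not shown here), and the input is just the ten `component` values visible in the displayed rows, so the numbers are illustrative rather than the actual plot data.

```python
from collections import Counter


def component_percentages(components):
    """Return {component: percentage of rows} for an iterable of component labels.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    counts = Counter(components)
    total = sum(counts.values())
    # Sort by component id so the output order is stable.
    return {comp: 100.0 * n / total for comp, n in sorted(counts.items())}


# Toy input: the `component` values of the ten rows displayed above,
# not the full 8,000-row sample.
shown = [7, 2, 15, 4, 12, 6, 4, 5, 15, 4]
pct = component_percentages(shown)
print(pct)
```

On the full DataFrame, pandas gives the same tally directly with `df["component"].value_counts(normalize=True) * 100`.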

| | Unnamed: 0 | seqname | start | end | DHS_width | summit | total_signal | numsamples | numpeaks | C1 | ... | C10 | C11 | C12 | C13 | C14 | C15 | C16 | raw_sequence | component | proportion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 91915 | 2045472 | chr22 | 33834860 | 33835120 | 260 | 33835090 | 0.506276 | 1 | 1 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | AATATGCCTCAGCGACAAAGCTACAGCTTTAGAAATGATACAGTTA... | 7 | 1.000000 |
| 10522 | 1562796 | chr19 | 50292300 | 50292500 | 200 | 50292410 | 10.969485 | 11 | 11 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | GCACCTCCCGGGAGGAGATCTTCTCCCAGAATCGGGAAAGTGAAAA... | 2 | 1.000000 |
| 53116 | 1461349 | chr18 | 54202940 | 54203160 | 220 | 54203030 | 1.777531 | 3 | 3 | 0.000000 | ... | 0.000046 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.020306 | TTCCCAGAGATGGAGACAATGAAGCTGGGTCTTGGATGAGTAGGAG... | 15 | 0.997722 |
| 151098 | 3278098 | chr8 | 119764480 | 119764716 | 236 | 119764590 | 2.430332 | 4 | 4 | 0.000000 | ... | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | AAGCATAATGGCATGCTGTAAAGTAACATGGTAATACTAAAAACAT... | 4 | 1.000000 |
| 103268 | 701239 | chr12 | 27780320 | 27780740 | 420 | 27780550 | 0.760209 | 1 | 1 | 0.000561 | ... | 0.000000 | 0.0 | 0.0 | 0.004483 | 0.0 | 0.0 | 0.000000 | GCCCTGAGCCAGGAGGCCGGCGGCCCGGAGGTGCAGCAGCTGCGCG... | 12 | 0.888776 |

5 rows × 28 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 16 | 119 | 0.001650 | 0.001296 |
| 1 | MA0523.1_TCF7L2 | 7 | 125 | 0.000722 | 0.001361 |
| 2 | MA0662.1_MIXL1 | 6 | 67 | 0.000619 | 0.000729 |
| 3 | MA1601.1_ZNF75D | 14 | 111 | 0.001444 | 0.001209 |
| 4 | MA0840.1_Creb5 | 13 | 116 | 0.001341 | 0.001263 |
| ... | ... | ... | ... | ... | ... |
| 878 | MA0780.1_PAX3 | 10 | 62 | 0.001031 | 0.000675 |
| 879 | MA0883.1_Dmbx1 | 6 | 82 | 0.000619 | 0.000893 |
| 880 | MA1539.1_NR2F6(var.3) | 11 | 128 | 0.001134 | 0.001394 |
| 881 | MA0842.2_NRL | 12 | 101 | 0.001238 | 0.001100 |
| 882 | MA0798.2_RFX3 | 6 | 117 | 0.000619 | 0.001274 |

883 rows × 5 columns
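
In the motif tables, the `Diffusion_seqs` and `Training_seqs` values are consistent with each count column (`motif_a`, `motif_b`) divided by that column's total: for example, 16 / 0.001650 and 6 / 0.000619 both imply a `motif_a` total of roughly 9,700. Assuming that reading (it is inferred from the numbers, not stated in the notebook), a minimal sketch of the normalization follows. The helper `normalize_counts` is hypothetical, and the three-motif input is a toy subset, so its frequencies will not match the 883-row table.

```python
def normalize_counts(counts):
    """Convert raw motif hit counts into relative frequencies that sum to 1.

    Hypothetical helper; assumes `counts` maps motif name -> non-negative count.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an all-zero count table")
    return {motif: n / total for motif, n in counts.items()}


# Toy subset of the motif_a / motif_b columns shown above (3 of 883 motifs).
motif_a = {"MA1548.1_PLAGL2": 16, "MA0523.1_TCF7L2": 7, "MA0662.1_MIXL1": 6}
motif_b = {"MA1548.1_PLAGL2": 119, "MA0523.1_TCF7L2": 125, "MA0662.1_MIXL1": 67}

diffusion_freq = normalize_counts(motif_a)
training_freq = normalize_counts(motif_b)

for motif in motif_a:
    print(f"{motif}\t{diffusion_freq[motif]:.4f}\t{training_freq[motif]:.4f}")
```

Normalizing each column separately makes the generated and training motif profiles comparable even when their total motif counts differ, as the two count columns here do.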

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 15 | 119 | 0.001614 | 0.001296 |
| 1 | MA0523.1_TCF7L2 | 9 | 125 | 0.000969 | 0.001361 |
| 2 | MA0662.1_MIXL1 | 4 | 67 | 0.000431 | 0.000729 |
| 3 | MA1601.1_ZNF75D | 17 | 111 | 0.001830 | 0.001209 |
| 4 | MA0840.1_Creb5 | 9 | 116 | 0.000969 | 0.001263 |
| ... | ... | ... | ... | ... | ... |
| 878 | MA0780.1_PAX3 | 1 | 62 | 0.000108 | 0.000675 |
| 879 | MA0883.1_Dmbx1 | 10 | 82 | 0.001076 | 0.000893 |
| 880 | MA1539.1_NR2F6(var.3) | 17 | 128 | 0.001830 | 0.001394 |
| 881 | MA0842.2_NRL | 14 | 101 | 0.001507 | 0.001100 |
| 882 | MA0798.2_RFX3 | 11 | 117 | 0.001184 | 0.001274 |

883 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 9 | 0.000385 | 0.001676 |
| 1 | MA0523.1_TCF7L2 | 7 | 5 | 0.001347 | 0.000931 |
| 2 | MA0662.1_MIXL1 | 6 | 7 | 0.001155 | 0.001304 |
| 3 | MA1601.1_ZNF75D | 9 | 7 | 0.001732 | 0.001304 |
| 4 | MA0840.1_Creb5 | 9 | 7 | 0.001732 | 0.001304 |
| ... | ... | ... | ... | ... | ... |
| 872 | MA0780.1_PAX3 | 4 | 7 | 0.000770 | 0.001304 |
| 873 | MA0883.1_Dmbx1 | 6 | 1 | 0.001155 | 0.000186 |
| 874 | MA1539.1_NR2F6(var.3) | 5 | 5 | 0.000962 | 0.000931 |
| 875 | MA0842.2_NRL | 14 | 7 | 0.002694 | 0.001304 |
| 876 | MA0798.2_RFX3 | 3 | 5 | 0.000577 | 0.000931 |

877 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 4 | 0.000385 | 0.000636 |
| 1 | MA0523.1_TCF7L2 | 7 | 2 | 0.001347 | 0.000318 |
| 2 | MA0662.1_MIXL1 | 6 | 3 | 0.001155 | 0.000477 |
| 3 | MA1601.1_ZNF75D | 9 | 8 | 0.001732 | 0.001272 |
| 4 | MA0840.1_Creb5 | 9 | 9 | 0.001732 | 0.001432 |
| ... | ... | ... | ... | ... | ... |
| 872 | MA0780.1_PAX3 | 4 | 1 | 0.000770 | 0.000159 |
| 873 | MA0883.1_Dmbx1 | 6 | 9 | 0.001155 | 0.001432 |
| 874 | MA1539.1_NR2F6(var.3) | 5 | 8 | 0.000962 | 0.001272 |
| 875 | MA0842.2_NRL | 14 | 9 | 0.002694 | 0.001432 |
| 876 | MA0798.2_RFX3 | 3 | 6 | 0.000577 | 0.000954 |

877 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 18 | 0.000385 | 0.002632 |
| 1 | MA0523.1_TCF7L2 | 7 | 12 | 0.001347 | 0.001755 |
| 2 | MA0662.1_MIXL1 | 6 | 4 | 0.001155 | 0.000585 |
| 3 | MA1601.1_ZNF75D | 9 | 6 | 0.001732 | 0.000877 |
| 4 | MA0840.1_Creb5 | 9 | 3 | 0.001732 | 0.000439 |
| ... | ... | ... | ... | ... | ... |
| 872 | MA0780.1_PAX3 | 4 | 2 | 0.000770 | 0.000292 |
| 873 | MA0883.1_Dmbx1 | 6 | 9 | 0.001155 | 0.001316 |
| 874 | MA1539.1_NR2F6(var.3) | 5 | 9 | 0.000962 | 0.001316 |
| 875 | MA0842.2_NRL | 14 | 8 | 0.002694 | 0.001170 |
| 876 | MA0798.2_RFX3 | 3 | 11 | 0.000577 | 0.001608 |

877 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 2 | 0.000385 | 0.000381 |
| 1 | MA0523.1_TCF7L2 | 7 | 10 | 0.001347 | 0.001907 |
| 2 | MA0662.1_MIXL1 | 6 | 2 | 0.001155 | 0.000381 |
| 3 | MA1601.1_ZNF75D | 9 | 10 | 0.001732 | 0.001907 |
| 4 | MA0840.1_Creb5 | 9 | 6 | 0.001732 | 0.001144 |
| ... | ... | ... | ... | ... | ... |
| 871 | MA0780.1_PAX3 | 4 | 1 | 0.000770 | 0.000191 |
| 872 | MA0883.1_Dmbx1 | 6 | 5 | 0.001155 | 0.000953 |
| 873 | MA1539.1_NR2F6(var.3) | 5 | 3 | 0.000962 | 0.000572 |
| 874 | MA0842.2_NRL | 14 | 8 | 0.002694 | 0.001526 |
| 875 | MA0798.2_RFX3 | 3 | 8 | 0.000577 | 0.001526 |

876 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 3 | 0.000385 | 0.000552 |
| 1 | MA0523.1_TCF7L2 | 7 | 16 | 0.001347 | 0.002944 |
| 2 | MA0662.1_MIXL1 | 6 | 1 | 0.001154 | 0.000184 |
| 3 | MA1601.1_ZNF75D | 9 | 7 | 0.001731 | 0.001288 |
| 4 | MA0840.1_Creb5 | 9 | 6 | 0.001731 | 0.001104 |
| ... | ... | ... | ... | ... | ... |
| 873 | MA0780.1_PAX3 | 4 | 2 | 0.000770 | 0.000368 |
| 874 | MA0883.1_Dmbx1 | 6 | 1 | 0.001154 | 0.000184 |
| 875 | MA1539.1_NR2F6(var.3) | 5 | 6 | 0.000962 | 0.001104 |
| 876 | MA0842.2_NRL | 14 | 3 | 0.002693 | 0.000552 |
| 877 | MA0798.2_RFX3 | 3 | 7 | 0.000577 | 0.001288 |

878 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 7 | 0.000384 | 0.001077 |
| 1 | MA0523.1_TCF7L2 | 7 | 10 | 0.001345 | 0.001539 |
| 2 | MA0662.1_MIXL1 | 6 | 4 | 0.001153 | 0.000615 |
| 3 | MA1601.1_ZNF75D | 9 | 6 | 0.001730 | 0.000923 |
| 4 | MA0840.1_Creb5 | 9 | 12 | 0.001730 | 0.001846 |
| ... | ... | ... | ... | ... | ... |
| 878 | MA0780.1_PAX3 | 4 | 6 | 0.000769 | 0.000923 |
| 879 | MA0883.1_Dmbx1 | 6 | 8 | 0.001153 | 0.001231 |
| 880 | MA1539.1_NR2F6(var.3) | 5 | 7 | 0.000961 | 0.001077 |
| 881 | MA0842.2_NRL | 14 | 7 | 0.002691 | 0.001077 |
| 882 | MA0798.2_RFX3 | 3 | 5 | 0.000577 | 0.000769 |

883 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 4 | 0.000384 | 0.000795 |
| 1 | MA0523.1_TCF7L2 | 7 | 3 | 0.001346 | 0.000596 |
| 2 | MA0662.1_MIXL1 | 6 | 3 | 0.001153 | 0.000596 |
| 3 | MA1601.1_ZNF75D | 9 | 8 | 0.001730 | 0.001590 |
| 4 | MA0840.1_Creb5 | 9 | 9 | 0.001730 | 0.001788 |
| ... | ... | ... | ... | ... | ... |
| 877 | MA0780.1_PAX3 | 4 | 1 | 0.000769 | 0.000199 |
| 878 | MA0883.1_Dmbx1 | 6 | 3 | 0.001153 | 0.000596 |
| 879 | MA1539.1_NR2F6(var.3) | 5 | 10 | 0.000961 | 0.001987 |
| 880 | MA0842.2_NRL | 14 | 4 | 0.002691 | 0.000795 |
| 881 | MA0798.2_RFX3 | 3 | 7 | 0.000577 | 0.001391 |

882 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 6 | 0.000385 | 0.001028 |
| 1 | MA0523.1_TCF7L2 | 7 | 13 | 0.001347 | 0.002228 |
| 2 | MA0662.1_MIXL1 | 6 | 22 | 0.001155 | 0.003770 |
| 3 | MA1601.1_ZNF75D | 9 | 4 | 0.001732 | 0.000685 |
| 4 | MA0840.1_Creb5 | 9 | 5 | 0.001732 | 0.000857 |
| ... | ... | ... | ... | ... | ... |
| 870 | MA0780.1_PAX3 | 4 | 12 | 0.000770 | 0.002056 |
| 871 | MA0883.1_Dmbx1 | 6 | 5 | 0.001155 | 0.000857 |
| 872 | MA1539.1_NR2F6(var.3) | 5 | 2 | 0.000962 | 0.000343 |
| 873 | MA0842.2_NRL | 14 | 13 | 0.002695 | 0.002228 |
| 874 | MA0798.2_RFX3 | 3 | 19 | 0.000577 | 0.003256 |
875 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 8 | 0.000385 | 0.001756 |
| 1 | MA0523.1_TCF7L2 | 7 | 3 | 0.001348 | 0.000658 |
| 2 | MA0662.1_MIXL1 | 6 | 2 | 0.001156 | 0.000439 |
| 3 | MA1601.1_ZNF75D | 9 | 3 | 0.001734 | 0.000658 |
| 4 | MA0840.1_Creb5 | 9 | 3 | 0.001734 | 0.000658 |
| ... | ... | ... | ... | ... | ... |
| 866 | MA0780.1_PAX3 | 4 | 3 | 0.000771 | 0.000658 |
| 867 | MA0883.1_Dmbx1 | 6 | 2 | 0.001156 | 0.000439 |
| 868 | MA1539.1_NR2F6(var.3) | 5 | 9 | 0.000963 | 0.001975 |
| 869 | MA0842.2_NRL | 14 | 6 | 0.002697 | 0.001317 |
| 870 | MA0798.2_RFX3 | 3 | 6 | 0.000578 | 0.001317 |
871 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 11 | 0.000385 | 0.001897 |
| 1 | MA0523.1_TCF7L2 | 7 | 7 | 0.001348 | 0.001207 |
| 2 | MA0662.1_MIXL1 | 6 | 9 | 0.001155 | 0.001552 |
| 3 | MA1601.1_ZNF75D | 9 | 6 | 0.001733 | 0.001034 |
| 4 | MA0840.1_Creb5 | 9 | 2 | 0.001733 | 0.000345 |
| ... | ... | ... | ... | ... | ... |
| 869 | MA0780.1_PAX3 | 4 | 3 | 0.000770 | 0.000517 |
| 870 | MA0883.1_Dmbx1 | 6 | 6 | 0.001155 | 0.001034 |
| 871 | MA1539.1_NR2F6(var.3) | 5 | 15 | 0.000963 | 0.002586 |
| 872 | MA0842.2_NRL | 14 | 4 | 0.002695 | 0.000690 |
| 873 | MA0798.2_RFX3 | 3 | 7 | 0.000578 | 0.001207 |
874 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 7 | 0.000385 | 0.001274 |
| 1 | MA0523.1_TCF7L2 | 7 | 9 | 0.001347 | 0.001638 |
| 2 | MA0662.1_MIXL1 | 6 | 4 | 0.001155 | 0.000728 |
| 3 | MA1601.1_ZNF75D | 9 | 6 | 0.001732 | 0.001092 |
| 4 | MA0840.1_Creb5 | 9 | 5 | 0.001732 | 0.000910 |
| ... | ... | ... | ... | ... | ... |
| 871 | MA0780.1_PAX3 | 4 | 6 | 0.000770 | 0.001092 |
| 872 | MA0883.1_Dmbx1 | 6 | 5 | 0.001155 | 0.000910 |
| 873 | MA1539.1_NR2F6(var.3) | 5 | 4 | 0.000962 | 0.000728 |
| 874 | MA0842.2_NRL | 14 | 13 | 0.002694 | 0.002365 |
| 875 | MA0798.2_RFX3 | 3 | 7 | 0.000577 | 0.001274 |
876 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 4 | 0.000385 | 0.000805 |
| 1 | MA0523.1_TCF7L2 | 7 | 8 | 0.001347 | 0.001610 |
| 2 | MA0662.1_MIXL1 | 6 | 1 | 0.001155 | 0.000201 |
| 3 | MA1601.1_ZNF75D | 9 | 5 | 0.001732 | 0.001006 |
| 4 | MA0840.1_Creb5 | 9 | 9 | 0.001732 | 0.001811 |
| ... | ... | ... | ... | ... | ... |
| 871 | MA0780.1_PAX3 | 4 | 4 | 0.000770 | 0.000805 |
| 872 | MA0883.1_Dmbx1 | 6 | 7 | 0.001155 | 0.001408 |
| 873 | MA1539.1_NR2F6(var.3) | 5 | 9 | 0.000962 | 0.001811 |
| 874 | MA0842.2_NRL | 14 | 7 | 0.002694 | 0.001408 |
| 875 | MA0798.2_RFX3 | 3 | 9 | 0.000577 | 0.001811 |
876 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 10 | 0.000385 | 0.001611 |
| 1 | MA0523.1_TCF7L2 | 7 | 14 | 0.001346 | 0.002255 |
| 2 | MA0662.1_MIXL1 | 6 | 3 | 0.001154 | 0.000483 |
| 3 | MA1601.1_ZNF75D | 9 | 5 | 0.001731 | 0.000805 |
| 4 | MA0840.1_Creb5 | 9 | 7 | 0.001731 | 0.001127 |
| ... | ... | ... | ... | ... | ... |
| 874 | MA0780.1_PAX3 | 4 | 5 | 0.000769 | 0.000805 |
| 875 | MA0883.1_Dmbx1 | 6 | 3 | 0.001154 | 0.000483 |
| 876 | MA1539.1_NR2F6(var.3) | 5 | 20 | 0.000962 | 0.003221 |
| 877 | MA0842.2_NRL | 14 | 4 | 0.002693 | 0.000644 |
| 878 | MA0798.2_RFX3 | 3 | 8 | 0.000577 | 0.001288 |
879 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 7 | 0.000385 | 0.001014 |
| 1 | MA0523.1_TCF7L2 | 7 | 4 | 0.001348 | 0.000579 |
| 2 | MA0662.1_MIXL1 | 6 | 4 | 0.001156 | 0.000579 |
| 3 | MA1601.1_ZNF75D | 9 | 11 | 0.001733 | 0.001593 |
| 4 | MA0840.1_Creb5 | 9 | 8 | 0.001733 | 0.001158 |
| ... | ... | ... | ... | ... | ... |
| 867 | MA0780.1_PAX3 | 4 | 2 | 0.000770 | 0.000290 |
| 868 | MA0883.1_Dmbx1 | 6 | 2 | 0.001156 | 0.000290 |
| 869 | MA1539.1_NR2F6(var.3) | 5 | 8 | 0.000963 | 0.001158 |
| 870 | MA0842.2_NRL | 14 | 6 | 0.002696 | 0.000869 |
| 871 | MA0798.2_RFX3 | 3 | 11 | 0.000578 | 0.001593 |
872 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 12 | 0.000385 | 0.002162 |
| 1 | MA0523.1_TCF7L2 | 7 | 5 | 0.001347 | 0.000901 |
| 2 | MA0662.1_MIXL1 | 6 | 1 | 0.001155 | 0.000180 |
| 3 | MA1601.1_ZNF75D | 9 | 10 | 0.001732 | 0.001802 |
| 4 | MA0840.1_Creb5 | 9 | 7 | 0.001732 | 0.001261 |
| ... | ... | ... | ... | ... | ... |
| 872 | MA0780.1_PAX3 | 4 | 2 | 0.000770 | 0.000360 |
| 873 | MA0883.1_Dmbx1 | 6 | 6 | 0.001155 | 0.001081 |
| 874 | MA1539.1_NR2F6(var.3) | 5 | 6 | 0.000962 | 0.001081 |
| 875 | MA0842.2_NRL | 14 | 4 | 0.002694 | 0.000721 |
| 876 | MA0798.2_RFX3 | 3 | 1 | 0.000577 | 0.000180 |
877 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 7 | 0.000385 | 0.001063 |
| 1 | MA0523.1_TCF7L2 | 7 | 4 | 0.001346 | 0.000608 |
| 2 | MA0662.1_MIXL1 | 6 | 2 | 0.001154 | 0.000304 |
| 3 | MA1601.1_ZNF75D | 9 | 9 | 0.001731 | 0.001367 |
| 4 | MA0840.1_Creb5 | 9 | 5 | 0.001731 | 0.000759 |
| ... | ... | ... | ... | ... | ... |
| 874 | MA0780.1_PAX3 | 4 | 1 | 0.000769 | 0.000152 |
| 875 | MA0883.1_Dmbx1 | 6 | 10 | 0.001154 | 0.001519 |
| 876 | MA1539.1_NR2F6(var.3) | 5 | 7 | 0.000962 | 0.001063 |
| 877 | MA0842.2_NRL | 14 | 3 | 0.002693 | 0.000456 |
| 878 | MA0798.2_RFX3 | 3 | 5 | 0.000577 | 0.000759 |
879 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 9 | 0.000426 | 0.001677 |
| 1 | MA0523.1_TCF7L2 | 7 | 5 | 0.001493 | 0.000932 |
| 2 | MA0662.1_MIXL1 | 1 | 7 | 0.000213 | 0.001305 |
| 3 | MA1601.1_ZNF75D | 10 | 7 | 0.002132 | 0.001305 |
| 4 | MA0840.1_Creb5 | 4 | 7 | 0.000853 | 0.001305 |
| ... | ... | ... | ... | ... | ... |
| 868 | MA0780.1_PAX3 | 4 | 7 | 0.000853 | 0.001305 |
| 869 | MA0883.1_Dmbx1 | 2 | 1 | 0.000426 | 0.000186 |
| 870 | MA1539.1_NR2F6(var.3) | 5 | 5 | 0.001066 | 0.000932 |
| 871 | MA0842.2_NRL | 6 | 7 | 0.001279 | 0.001305 |
| 872 | MA0798.2_RFX3 | 1 | 5 | 0.000213 | 0.000932 |
873 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 4 | 0.000426 | 0.000636 |
| 1 | MA0523.1_TCF7L2 | 7 | 2 | 0.001492 | 0.000318 |
| 2 | MA0662.1_MIXL1 | 1 | 3 | 0.000213 | 0.000477 |
| 3 | MA1601.1_ZNF75D | 10 | 8 | 0.002131 | 0.001273 |
| 4 | MA0840.1_Creb5 | 4 | 9 | 0.000853 | 0.001432 |
| ... | ... | ... | ... | ... | ... |
| 870 | MA0780.1_PAX3 | 4 | 1 | 0.000853 | 0.000159 |
| 871 | MA0883.1_Dmbx1 | 2 | 9 | 0.000426 | 0.001432 |
| 872 | MA1539.1_NR2F6(var.3) | 5 | 8 | 0.001066 | 0.001273 |
| 873 | MA0842.2_NRL | 6 | 9 | 0.001279 | 0.001432 |
| 874 | MA0798.2_RFX3 | 1 | 6 | 0.000213 | 0.000955 |
875 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 18 | 0.000426 | 0.002631 |
| 1 | MA0523.1_TCF7L2 | 7 | 12 | 0.001491 | 0.001754 |
| 2 | MA0662.1_MIXL1 | 1 | 4 | 0.000213 | 0.000585 |
| 3 | MA1601.1_ZNF75D | 10 | 6 | 0.002129 | 0.000877 |
| 4 | MA0840.1_Creb5 | 4 | 3 | 0.000852 | 0.000439 |
| ... | ... | ... | ... | ... | ... |
| 874 | MA0780.1_PAX3 | 4 | 2 | 0.000852 | 0.000292 |
| 875 | MA0883.1_Dmbx1 | 2 | 9 | 0.000426 | 0.001316 |
| 876 | MA1539.1_NR2F6(var.3) | 5 | 9 | 0.001065 | 0.001316 |
| 877 | MA0842.2_NRL | 6 | 8 | 0.001278 | 0.001169 |
| 878 | MA0798.2_RFX3 | 1 | 11 | 0.000213 | 0.001608 |
879 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 2 | 0.000427 | 0.000382 |
| 1 | MA0523.1_TCF7L2 | 7 | 10 | 0.001493 | 0.001908 |
| 2 | MA0662.1_MIXL1 | 1 | 2 | 0.000213 | 0.000382 |
| 3 | MA1601.1_ZNF75D | 10 | 10 | 0.002133 | 0.001908 |
| 4 | MA0840.1_Creb5 | 4 | 6 | 0.000853 | 0.001145 |
| ... | ... | ... | ... | ... | ... |
| 867 | MA0780.1_PAX3 | 4 | 1 | 0.000853 | 0.000191 |
| 868 | MA0883.1_Dmbx1 | 2 | 5 | 0.000427 | 0.000954 |
| 869 | MA1539.1_NR2F6(var.3) | 5 | 3 | 0.001066 | 0.000573 |
| 870 | MA0842.2_NRL | 6 | 8 | 0.001280 | 0.001527 |
| 871 | MA0798.2_RFX3 | 1 | 8 | 0.000213 | 0.001527 |
872 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 3 | 0.000426 | 0.000552 |
| 1 | MA0523.1_TCF7L2 | 7 | 16 | 0.001491 | 0.002944 |
| 2 | MA0662.1_MIXL1 | 1 | 1 | 0.000213 | 0.000184 |
| 3 | MA1601.1_ZNF75D | 10 | 7 | 0.002129 | 0.001288 |
| 4 | MA0840.1_Creb5 | 4 | 6 | 0.000852 | 0.001104 |
| ... | ... | ... | ... | ... | ... |
| 874 | MA0780.1_PAX3 | 4 | 2 | 0.000852 | 0.000368 |
| 875 | MA0883.1_Dmbx1 | 2 | 1 | 0.000426 | 0.000184 |
| 876 | MA1539.1_NR2F6(var.3) | 5 | 6 | 0.001065 | 0.001104 |
| 877 | MA0842.2_NRL | 6 | 3 | 0.001278 | 0.000552 |
| 878 | MA0798.2_RFX3 | 1 | 7 | 0.000213 | 0.001288 |
879 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 7 | 0.000426 | 0.001077 |
| 1 | MA0523.1_TCF7L2 | 7 | 10 | 0.001490 | 0.001539 |
| 2 | MA0662.1_MIXL1 | 1 | 4 | 0.000213 | 0.000616 |
| 3 | MA1601.1_ZNF75D | 10 | 6 | 0.002128 | 0.000923 |
| 4 | MA0840.1_Creb5 | 4 | 12 | 0.000851 | 0.001847 |
| ... | ... | ... | ... | ... | ... |
| 877 | MA0780.1_PAX3 | 4 | 6 | 0.000851 | 0.000923 |
| 878 | MA0883.1_Dmbx1 | 2 | 8 | 0.000426 | 0.001231 |
| 879 | MA1539.1_NR2F6(var.3) | 5 | 7 | 0.001064 | 0.001077 |
| 880 | MA0842.2_NRL | 6 | 7 | 0.001277 | 0.001077 |
| 881 | MA0798.2_RFX3 | 1 | 5 | 0.000213 | 0.000769 |
882 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 4 | 0.000426 | 0.000795 |
| 1 | MA0523.1_TCF7L2 | 7 | 3 | 0.001490 | 0.000596 |
| 2 | MA0662.1_MIXL1 | 1 | 3 | 0.000213 | 0.000596 |
| 3 | MA1601.1_ZNF75D | 10 | 8 | 0.002129 | 0.001590 |
| 4 | MA0840.1_Creb5 | 4 | 9 | 0.000852 | 0.001789 |
| ... | ... | ... | ... | ... | ... |
| 875 | MA0780.1_PAX3 | 4 | 1 | 0.000852 | 0.000199 |
| 876 | MA0883.1_Dmbx1 | 2 | 3 | 0.000426 | 0.000596 |
| 877 | MA1539.1_NR2F6(var.3) | 5 | 10 | 0.001065 | 0.001988 |
| 878 | MA0842.2_NRL | 6 | 4 | 0.001277 | 0.000795 |
| 879 | MA0798.2_RFX3 | 1 | 7 | 0.000213 | 0.001391 |
880 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 6 | 0.000427 | 0.001029 |
| 1 | MA0523.1_TCF7L2 | 7 | 13 | 0.001493 | 0.002229 |
| 2 | MA0662.1_MIXL1 | 1 | 22 | 0.000213 | 0.003772 |
| 3 | MA1601.1_ZNF75D | 10 | 4 | 0.002133 | 0.000686 |
| 4 | MA0840.1_Creb5 | 4 | 5 | 0.000853 | 0.000857 |
| ... | ... | ... | ... | ... | ... |
| 867 | MA0780.1_PAX3 | 4 | 12 | 0.000853 | 0.002057 |
| 868 | MA0883.1_Dmbx1 | 2 | 5 | 0.000427 | 0.000857 |
| 869 | MA1539.1_NR2F6(var.3) | 5 | 2 | 0.001066 | 0.000343 |
| 870 | MA0842.2_NRL | 6 | 13 | 0.001280 | 0.002229 |
| 871 | MA0798.2_RFX3 | 1 | 19 | 0.000213 | 0.003257 |
872 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 8 | 0.000427 | 0.001757 |
| 1 | MA0523.1_TCF7L2 | 7 | 3 | 0.001494 | 0.000659 |
| 2 | MA0662.1_MIXL1 | 1 | 2 | 0.000213 | 0.000439 |
| 3 | MA1601.1_ZNF75D | 10 | 3 | 0.002134 | 0.000659 |
| 4 | MA0840.1_Creb5 | 4 | 3 | 0.000854 | 0.000659 |
| ... | ... | ... | ... | ... | ... |
| 864 | MA0780.1_PAX3 | 4 | 3 | 0.000854 | 0.000659 |
| 865 | MA0883.1_Dmbx1 | 2 | 2 | 0.000427 | 0.000439 |
| 866 | MA1539.1_NR2F6(var.3) | 5 | 9 | 0.001067 | 0.001976 |
| 867 | MA0842.2_NRL | 6 | 6 | 0.001280 | 0.001318 |
| 868 | MA0798.2_RFX3 | 1 | 6 | 0.000213 | 0.001318 |
869 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 11 | 0.000426 | 0.001896 |
| 1 | MA0523.1_TCF7L2 | 7 | 7 | 0.001492 | 0.001207 |
| 2 | MA0662.1_MIXL1 | 1 | 9 | 0.000213 | 0.001551 |
| 3 | MA1601.1_ZNF75D | 10 | 6 | 0.002131 | 0.001034 |
| 4 | MA0840.1_Creb5 | 4 | 2 | 0.000853 | 0.000345 |
| ... | ... | ... | ... | ... | ... |
| 870 | MA0780.1_PAX3 | 4 | 3 | 0.000853 | 0.000517 |
| 871 | MA0883.1_Dmbx1 | 2 | 6 | 0.000426 | 0.001034 |
| 872 | MA1539.1_NR2F6(var.3) | 5 | 15 | 0.001066 | 0.002586 |
| 873 | MA0842.2_NRL | 6 | 4 | 0.001279 | 0.000690 |
| 874 | MA0798.2_RFX3 | 1 | 7 | 0.000213 | 0.001207 |
875 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 7 | 0.000427 | 0.001275 |
| 1 | MA0523.1_TCF7L2 | 7 | 9 | 0.001494 | 0.001640 |
| 2 | MA0662.1_MIXL1 | 1 | 4 | 0.000213 | 0.000729 |
| 3 | MA1601.1_ZNF75D | 10 | 6 | 0.002134 | 0.001093 |
| 4 | MA0840.1_Creb5 | 4 | 5 | 0.000854 | 0.000911 |
| ... | ... | ... | ... | ... | ... |
| 864 | MA0780.1_PAX3 | 4 | 6 | 0.000854 | 0.001093 |
| 865 | MA0883.1_Dmbx1 | 2 | 5 | 0.000427 | 0.000911 |
| 866 | MA1539.1_NR2F6(var.3) | 5 | 4 | 0.001067 | 0.000729 |
| 867 | MA0842.2_NRL | 6 | 13 | 0.001280 | 0.002368 |
| 868 | MA0798.2_RFX3 | 1 | 7 | 0.000213 | 0.001275 |
869 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 4 | 0.000427 | 0.000806 |
| 1 | MA0523.1_TCF7L2 | 7 | 8 | 0.001493 | 0.001612 |
| 2 | MA0662.1_MIXL1 | 1 | 1 | 0.000213 | 0.000201 |
| 3 | MA1601.1_ZNF75D | 10 | 5 | 0.002134 | 0.001007 |
| 4 | MA0840.1_Creb5 | 4 | 9 | 0.000853 | 0.001813 |
| ... | ... | ... | ... | ... | ... |
| 865 | MA0780.1_PAX3 | 4 | 4 | 0.000853 | 0.000806 |
| 866 | MA0883.1_Dmbx1 | 2 | 7 | 0.000427 | 0.001410 |
| 867 | MA1539.1_NR2F6(var.3) | 5 | 9 | 0.001067 | 0.001813 |
| 868 | MA0842.2_NRL | 6 | 7 | 0.001280 | 0.001410 |
| 869 | MA0798.2_RFX3 | 1 | 9 | 0.000213 | 0.001813 |
870 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 10 | 0.000426 | 0.001611 |
| 1 | MA0523.1_TCF7L2 | 7 | 14 | 0.001491 | 0.002255 |
| 2 | MA0662.1_MIXL1 | 1 | 3 | 0.000213 | 0.000483 |
| 3 | MA1601.1_ZNF75D | 10 | 5 | 0.002129 | 0.000805 |
| 4 | MA0840.1_Creb5 | 4 | 7 | 0.000852 | 0.001127 |
| ... | ... | ... | ... | ... | ... |
| 874 | MA0780.1_PAX3 | 4 | 5 | 0.000852 | 0.000805 |
| 875 | MA0883.1_Dmbx1 | 2 | 3 | 0.000426 | 0.000483 |
| 876 | MA1539.1_NR2F6(var.3) | 5 | 20 | 0.001065 | 0.003221 |
| 877 | MA0842.2_NRL | 6 | 4 | 0.001278 | 0.000644 |
| 878 | MA0798.2_RFX3 | 1 | 8 | 0.000213 | 0.001288 |
879 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 7 | 0.000427 | 0.001014 |
| 1 | MA0523.1_TCF7L2 | 7 | 4 | 0.001493 | 0.000579 |
| 2 | MA0662.1_MIXL1 | 1 | 4 | 0.000213 | 0.000579 |
| 3 | MA1601.1_ZNF75D | 10 | 11 | 0.002134 | 0.001593 |
| 4 | MA0840.1_Creb5 | 4 | 8 | 0.000853 | 0.001159 |
| ... | ... | ... | ... | ... | ... |
| 865 | MA0780.1_PAX3 | 4 | 2 | 0.000853 | 0.000290 |
| 866 | MA0883.1_Dmbx1 | 2 | 2 | 0.000427 | 0.000290 |
| 867 | MA1539.1_NR2F6(var.3) | 5 | 8 | 0.001067 | 0.001159 |
| 868 | MA0842.2_NRL | 6 | 6 | 0.001280 | 0.000869 |
| 869 | MA0798.2_RFX3 | 1 | 11 | 0.000213 | 0.001593 |
870 rows × 5 columns

|  | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 12 | 0.000426 | 0.002164 |
| 1 | MA0523.1_TCF7L2 | 7 | 5 | 0.001493 | 0.000902 |
| 2 | MA0662.1_MIXL1 | 1 | 1 | 0.000213 | 0.000180 |
| 3 | MA1601.1_ZNF75D | 10 | 10 | 0.002132 | 0.001803 |
| 4 | MA0840.1_Creb5 | 4 | 7 | 0.000853 | 0.001262 |
| ... | ... | ... | ... | ... | ... |
| 868 | MA0780.1_PAX3 | 4 | 2 | 0.000853 | 0.000361 |
| 869 | MA0883.1_Dmbx1 | 2 | 6 | 0.000426 | 0.001082 |
| 870 | MA1539.1_NR2F6(var.3) | 5 | 6 | 0.001066 | 0.001082 |
| 871 | MA0842.2_NRL | 6 | 4 | 0.001279 | 0.000721 |
| 872 | MA0798.2_RFX3 | 1 | 1 | 0.000213 | 0.000180 |
873 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 2 | 7 | 0.000426 | 0.001063 |
| 1 | MA0523.1_TCF7L2 | 7 | 4 | 0.001491 | 0.000608 |
| 2 | MA0662.1_MIXL1 | 1 | 2 | 0.000213 | 0.000304 |
| 3 | MA1601.1_ZNF75D | 10 | 9 | 0.002130 | 0.001367 |
| 4 | MA0840.1_Creb5 | 4 | 5 | 0.000852 | 0.000760 |
| ... | ... | ... | ... | ... | ... |
| 873 | MA0780.1_PAX3 | 4 | 1 | 0.000852 | 0.000152 |
| 874 | MA0883.1_Dmbx1 | 2 | 10 | 0.000426 | 0.001519 |
| 875 | MA1539.1_NR2F6(var.3) | 5 | 7 | 0.001065 | 0.001063 |
| 876 | MA0842.2_NRL | 6 | 3 | 0.001278 | 0.000456 |
| 877 | MA0798.2_RFX3 | 1 | 5 | 0.000213 | 0.000760 |

878 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 9 | 0.000852 | 0.001677 |
| 1 | MA0523.1_TCF7L2 | 6 | 5 | 0.001278 | 0.000932 |
| 2 | MA0662.1_MIXL1 | 1 | 7 | 0.000213 | 0.001305 |
| 3 | MA1601.1_ZNF75D | 9 | 7 | 0.001918 | 0.001305 |
| 4 | MA0840.1_Creb5 | 4 | 7 | 0.000852 | 0.001305 |
| ... | ... | ... | ... | ... | ... |
| 868 | MA0780.1_PAX3 | 2 | 7 | 0.000426 | 0.001305 |
| 869 | MA0883.1_Dmbx1 | 4 | 1 | 0.000852 | 0.000186 |
| 870 | MA1539.1_NR2F6(var.3) | 4 | 5 | 0.000852 | 0.000932 |
| 871 | MA0842.2_NRL | 2 | 7 | 0.000426 | 0.001305 |
| 872 | MA0798.2_RFX3 | 4 | 5 | 0.000852 | 0.000932 |

873 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 4 | 0.000852 | 0.000636 |
| 1 | MA0523.1_TCF7L2 | 6 | 2 | 0.001277 | 0.000318 |
| 2 | MA0662.1_MIXL1 | 1 | 3 | 0.000213 | 0.000477 |
| 3 | MA1601.1_ZNF75D | 9 | 8 | 0.001916 | 0.001272 |
| 4 | MA0840.1_Creb5 | 4 | 9 | 0.000852 | 0.001432 |
| ... | ... | ... | ... | ... | ... |
| 872 | MA0780.1_PAX3 | 2 | 1 | 0.000426 | 0.000159 |
| 873 | MA0883.1_Dmbx1 | 4 | 9 | 0.000852 | 0.001432 |
| 874 | MA1539.1_NR2F6(var.3) | 4 | 8 | 0.000852 | 0.001272 |
| 875 | MA0842.2_NRL | 2 | 9 | 0.000426 | 0.001432 |
| 876 | MA0798.2_RFX3 | 4 | 6 | 0.000852 | 0.000954 |

877 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 18 | 0.000852 | 0.002632 |
| 1 | MA0523.1_TCF7L2 | 6 | 12 | 0.001277 | 0.001755 |
| 2 | MA0662.1_MIXL1 | 1 | 4 | 0.000213 | 0.000585 |
| 3 | MA1601.1_ZNF75D | 9 | 6 | 0.001916 | 0.000877 |
| 4 | MA0840.1_Creb5 | 4 | 3 | 0.000852 | 0.000439 |
| ... | ... | ... | ... | ... | ... |
| 872 | MA0780.1_PAX3 | 2 | 2 | 0.000426 | 0.000292 |
| 873 | MA0883.1_Dmbx1 | 4 | 9 | 0.000852 | 0.001316 |
| 874 | MA1539.1_NR2F6(var.3) | 4 | 9 | 0.000852 | 0.001316 |
| 875 | MA0842.2_NRL | 2 | 8 | 0.000426 | 0.001170 |
| 876 | MA0798.2_RFX3 | 4 | 11 | 0.000852 | 0.001608 |

877 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 2 | 0.000852 | 0.000381 |
| 1 | MA0523.1_TCF7L2 | 6 | 10 | 0.001278 | 0.001907 |
| 2 | MA0662.1_MIXL1 | 1 | 2 | 0.000213 | 0.000381 |
| 3 | MA1601.1_ZNF75D | 9 | 10 | 0.001917 | 0.001907 |
| 4 | MA0840.1_Creb5 | 4 | 6 | 0.000852 | 0.001144 |
| ... | ... | ... | ... | ... | ... |
| 871 | MA0780.1_PAX3 | 2 | 1 | 0.000426 | 0.000191 |
| 872 | MA0883.1_Dmbx1 | 4 | 5 | 0.000852 | 0.000953 |
| 873 | MA1539.1_NR2F6(var.3) | 4 | 3 | 0.000852 | 0.000572 |
| 874 | MA0842.2_NRL | 2 | 8 | 0.000426 | 0.001526 |
| 875 | MA0798.2_RFX3 | 4 | 8 | 0.000852 | 0.001526 |

876 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 3 | 0.000851 | 0.000552 |
| 1 | MA0523.1_TCF7L2 | 6 | 16 | 0.001277 | 0.002944 |
| 2 | MA0662.1_MIXL1 | 1 | 1 | 0.000213 | 0.000184 |
| 3 | MA1601.1_ZNF75D | 9 | 7 | 0.001916 | 0.001288 |
| 4 | MA0840.1_Creb5 | 4 | 6 | 0.000851 | 0.001104 |
| ... | ... | ... | ... | ... | ... |
| 873 | MA0780.1_PAX3 | 2 | 2 | 0.000426 | 0.000368 |
| 874 | MA0883.1_Dmbx1 | 4 | 1 | 0.000851 | 0.000184 |
| 875 | MA1539.1_NR2F6(var.3) | 4 | 6 | 0.000851 | 0.001104 |
| 876 | MA0842.2_NRL | 2 | 3 | 0.000426 | 0.000552 |
| 877 | MA0798.2_RFX3 | 4 | 7 | 0.000851 | 0.001288 |

878 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 7 | 0.000851 | 0.001077 |
| 1 | MA0523.1_TCF7L2 | 6 | 10 | 0.001276 | 0.001539 |
| 2 | MA0662.1_MIXL1 | 1 | 4 | 0.000213 | 0.000615 |
| 3 | MA1601.1_ZNF75D | 9 | 6 | 0.001914 | 0.000923 |
| 4 | MA0840.1_Creb5 | 4 | 12 | 0.000851 | 0.001846 |
| ... | ... | ... | ... | ... | ... |
| 878 | MA0780.1_PAX3 | 2 | 6 | 0.000425 | 0.000923 |
| 879 | MA0883.1_Dmbx1 | 4 | 8 | 0.000851 | 0.001231 |
| 880 | MA1539.1_NR2F6(var.3) | 4 | 7 | 0.000851 | 0.001077 |
| 881 | MA0842.2_NRL | 2 | 7 | 0.000425 | 0.001077 |
| 882 | MA0798.2_RFX3 | 4 | 5 | 0.000851 | 0.000769 |

883 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 4 | 0.000851 | 0.000795 |
| 1 | MA0523.1_TCF7L2 | 6 | 3 | 0.001276 | 0.000596 |
| 2 | MA0662.1_MIXL1 | 1 | 3 | 0.000213 | 0.000596 |
| 3 | MA1601.1_ZNF75D | 9 | 8 | 0.001914 | 0.001589 |
| 4 | MA0840.1_Creb5 | 4 | 9 | 0.000851 | 0.001788 |
| ... | ... | ... | ... | ... | ... |
| 878 | MA0780.1_PAX3 | 2 | 1 | 0.000425 | 0.000199 |
| 879 | MA0883.1_Dmbx1 | 4 | 3 | 0.000851 | 0.000596 |
| 880 | MA1539.1_NR2F6(var.3) | 4 | 10 | 0.000851 | 0.001986 |
| 881 | MA0842.2_NRL | 2 | 4 | 0.000425 | 0.000795 |
| 882 | MA0798.2_RFX3 | 4 | 7 | 0.000851 | 0.001391 |

883 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 6 | 0.000852 | 0.001028 |
| 1 | MA0523.1_TCF7L2 | 6 | 13 | 0.001278 | 0.002228 |
| 2 | MA0662.1_MIXL1 | 1 | 22 | 0.000213 | 0.003770 |
| 3 | MA1601.1_ZNF75D | 9 | 4 | 0.001917 | 0.000686 |
| 4 | MA0840.1_Creb5 | 4 | 5 | 0.000852 | 0.000857 |
| ... | ... | ... | ... | ... | ... |
| 869 | MA0780.1_PAX3 | 2 | 12 | 0.000426 | 0.002057 |
| 870 | MA0883.1_Dmbx1 | 4 | 5 | 0.000852 | 0.000857 |
| 871 | MA1539.1_NR2F6(var.3) | 4 | 2 | 0.000852 | 0.000343 |
| 872 | MA0842.2_NRL | 2 | 13 | 0.000426 | 0.002228 |
| 873 | MA0798.2_RFX3 | 4 | 19 | 0.000852 | 0.003256 |

874 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 8 | 0.000852 | 0.001754 |
| 1 | MA0523.1_TCF7L2 | 6 | 3 | 0.001278 | 0.000658 |
| 2 | MA0662.1_MIXL1 | 1 | 2 | 0.000213 | 0.000439 |
| 3 | MA1601.1_ZNF75D | 9 | 3 | 0.001917 | 0.000658 |
| 4 | MA0840.1_Creb5 | 4 | 3 | 0.000852 | 0.000658 |
| ... | ... | ... | ... | ... | ... |
| 870 | MA0780.1_PAX3 | 2 | 3 | 0.000426 | 0.000658 |
| 871 | MA0883.1_Dmbx1 | 4 | 2 | 0.000852 | 0.000439 |
| 872 | MA1539.1_NR2F6(var.3) | 4 | 9 | 0.000852 | 0.001974 |
| 873 | MA0842.2_NRL | 2 | 6 | 0.000426 | 0.001316 |
| 874 | MA0798.2_RFX3 | 4 | 6 | 0.000852 | 0.001316 |

875 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 11 | 0.000852 | 0.001896 |
| 1 | MA0523.1_TCF7L2 | 6 | 7 | 0.001277 | 0.001206 |
| 2 | MA0662.1_MIXL1 | 1 | 9 | 0.000213 | 0.001551 |
| 3 | MA1601.1_ZNF75D | 9 | 6 | 0.001916 | 0.001034 |
| 4 | MA0840.1_Creb5 | 4 | 2 | 0.000852 | 0.000345 |
| ... | ... | ... | ... | ... | ... |
| 872 | MA0780.1_PAX3 | 2 | 3 | 0.000426 | 0.000517 |
| 873 | MA0883.1_Dmbx1 | 4 | 6 | 0.000852 | 0.001034 |
| 874 | MA1539.1_NR2F6(var.3) | 4 | 15 | 0.000852 | 0.002585 |
| 875 | MA0842.2_NRL | 2 | 4 | 0.000426 | 0.000689 |
| 876 | MA0798.2_RFX3 | 4 | 7 | 0.000852 | 0.001206 |

877 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 7 | 0.000852 | 0.001274 |
| 1 | MA0523.1_TCF7L2 | 6 | 9 | 0.001278 | 0.001638 |
| 2 | MA0662.1_MIXL1 | 1 | 4 | 0.000213 | 0.000728 |
| 3 | MA1601.1_ZNF75D | 9 | 6 | 0.001917 | 0.001092 |
| 4 | MA0840.1_Creb5 | 4 | 5 | 0.000852 | 0.000910 |
| ... | ... | ... | ... | ... | ... |
| 871 | MA0780.1_PAX3 | 2 | 6 | 0.000426 | 0.001092 |
| 872 | MA0883.1_Dmbx1 | 4 | 5 | 0.000852 | 0.000910 |
| 873 | MA1539.1_NR2F6(var.3) | 4 | 4 | 0.000852 | 0.000728 |
| 874 | MA0842.2_NRL | 2 | 13 | 0.000426 | 0.002365 |
| 875 | MA0798.2_RFX3 | 4 | 7 | 0.000852 | 0.001274 |

876 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 4 | 0.000852 | 0.000805 |
| 1 | MA0523.1_TCF7L2 | 6 | 8 | 0.001277 | 0.001609 |
| 2 | MA0662.1_MIXL1 | 1 | 1 | 0.000213 | 0.000201 |
| 3 | MA1601.1_ZNF75D | 9 | 5 | 0.001916 | 0.001006 |
| 4 | MA0840.1_Creb5 | 4 | 9 | 0.000852 | 0.001811 |
| ... | ... | ... | ... | ... | ... |
| 872 | MA0780.1_PAX3 | 2 | 4 | 0.000426 | 0.000805 |
| 873 | MA0883.1_Dmbx1 | 4 | 7 | 0.000852 | 0.001408 |
| 874 | MA1539.1_NR2F6(var.3) | 4 | 9 | 0.000852 | 0.001811 |
| 875 | MA0842.2_NRL | 2 | 7 | 0.000426 | 0.001408 |
| 876 | MA0798.2_RFX3 | 4 | 9 | 0.000852 | 0.001811 |

877 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 10 | 0.000851 | 0.001611 |
| 1 | MA0523.1_TCF7L2 | 6 | 14 | 0.001277 | 0.002255 |
| 2 | MA0662.1_MIXL1 | 1 | 3 | 0.000213 | 0.000483 |
| 3 | MA1601.1_ZNF75D | 9 | 5 | 0.001915 | 0.000805 |
| 4 | MA0840.1_Creb5 | 4 | 7 | 0.000851 | 0.001127 |
| ... | ... | ... | ... | ... | ... |
| 874 | MA0780.1_PAX3 | 2 | 5 | 0.000426 | 0.000805 |
| 875 | MA0883.1_Dmbx1 | 4 | 3 | 0.000851 | 0.000483 |
| 876 | MA1539.1_NR2F6(var.3) | 4 | 20 | 0.000851 | 0.003221 |
| 877 | MA0842.2_NRL | 2 | 4 | 0.000426 | 0.000644 |
| 878 | MA0798.2_RFX3 | 4 | 8 | 0.000851 | 0.001288 |

879 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 7 | 0.000852 | 0.001013 |
| 1 | MA0523.1_TCF7L2 | 6 | 4 | 0.001278 | 0.000579 |
| 2 | MA0662.1_MIXL1 | 1 | 4 | 0.000213 | 0.000579 |
| 3 | MA1601.1_ZNF75D | 9 | 11 | 0.001917 | 0.001592 |
| 4 | MA0840.1_Creb5 | 4 | 8 | 0.000852 | 0.001158 |
| ... | ... | ... | ... | ... | ... |
| 870 | MA0780.1_PAX3 | 2 | 2 | 0.000426 | 0.000289 |
| 871 | MA0883.1_Dmbx1 | 4 | 2 | 0.000852 | 0.000289 |
| 872 | MA1539.1_NR2F6(var.3) | 4 | 8 | 0.000852 | 0.001158 |
| 873 | MA0842.2_NRL | 2 | 6 | 0.000426 | 0.000868 |
| 874 | MA0798.2_RFX3 | 4 | 11 | 0.000852 | 0.001592 |

875 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 12 | 0.000852 | 0.002163 |
| 1 | MA0523.1_TCF7L2 | 6 | 5 | 0.001278 | 0.000901 |
| 2 | MA0662.1_MIXL1 | 1 | 1 | 0.000213 | 0.000180 |
| 3 | MA1601.1_ZNF75D | 9 | 10 | 0.001917 | 0.001802 |
| 4 | MA0840.1_Creb5 | 4 | 7 | 0.000852 | 0.001261 |
| ... | ... | ... | ... | ... | ... |
| 871 | MA0780.1_PAX3 | 2 | 2 | 0.000426 | 0.000360 |
| 872 | MA0883.1_Dmbx1 | 4 | 6 | 0.000852 | 0.001081 |
| 873 | MA1539.1_NR2F6(var.3) | 4 | 6 | 0.000852 | 0.001081 |
| 874 | MA0842.2_NRL | 2 | 4 | 0.000426 | 0.000721 |
| 875 | MA0798.2_RFX3 | 4 | 1 | 0.000852 | 0.000180 |

876 rows × 5 columns

| | motif | motif_a | motif_b | Diffusion_seqs | Training_seqs |
|---|---|---|---|---|---|
| 0 | MA1548.1_PLAGL2 | 4 | 7 | 0.000851 | 0.001063 |
| 1 | MA0523.1_TCF7L2 | 6 | 4 | 0.001277 | 0.000607 |
| 2 | MA0662.1_MIXL1 | 1 | 2 | 0.000213 | 0.000304 |
| 3 | MA1601.1_ZNF75D | 9 | 9 | 0.001915 | 0.001367 |
| 4 | MA0840.1_Creb5 | 4 | 5 | 0.000851 | 0.000759 |
| ... | ... | ... | ... | ... | ... |
| 875 | MA0780.1_PAX3 | 2 | 1 | 0.000426 | 0.000152 |
| 876 | MA0883.1_Dmbx1 | 4 | 10 | 0.000851 | 0.001519 |
| 877 | MA1539.1_NR2F6(var.3) | 4 | 7 | 0.000851 | 0.001063 |
| 878 | MA0842.2_NRL | 2 | 3 | 0.000426 | 0.000456 |
| 879 | MA0798.2_RFX3 | 4 | 5 | 0.000851 | 0.000759 |

880 rows × 5 columns
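In every table above, the `Diffusion_seqs` and `Training_seqs` columns appear to be the `motif_a` and `motif_b` hit counts divided by their respective column totals. For example, in the last table, motifs with 1, 2 and 4 hits in `motif_a` show 0.000213, 0.000426 and 0.000851, which scale linearly with the count. The sketch below reproduces that normalization in plain Python. It is an illustration under stated assumptions, not the notebook's actual code: the function names are hypothetical, `motif_a` is taken to be the diffusion-set counts and `motif_b` the training-set counts, and a motif absent from one set is assumed to count as 0 hits there.

```python
from typing import Dict, List, Tuple

# (motif, motif_a, motif_b, Diffusion_seqs, Training_seqs)
Row = Tuple[str, int, int, float, float]


def counts_to_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Divide each motif's hit count by the total hits in that set (sums to 1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one motif hit to normalize")
    return {motif: n / total for motif, n in counts.items()}


def motif_comparison_table(diffusion: Dict[str, int],
                           training: Dict[str, int]) -> List[Row]:
    """Hypothetical helper building rows shaped like the tables above.

    Assumption: a motif missing from one set is treated as 0 hits in that set.
    """
    # Keep first-seen order: diffusion motifs first, then training-only motifs.
    motifs = list(diffusion) + [m for m in training if m not in diffusion]
    a = {m: diffusion.get(m, 0) for m in motifs}
    b = {m: training.get(m, 0) for m in motifs}
    freq_a = counts_to_frequencies(a)
    freq_b = counts_to_frequencies(b)
    return [(m, a[m], b[m], freq_a[m], freq_b[m]) for m in motifs]


if __name__ == "__main__":
    rows = motif_comparison_table(
        {"MA0662.1_MIXL1": 1, "MA1601.1_ZNF75D": 9},
        {"MA0662.1_MIXL1": 3, "MA0840.1_Creb5": 1},
    )
    for row in rows:
        print(row)
```

Because each frequency column is normalized by its own total, the two columns are comparable even when the diffusion and training sets yield different numbers of motif hits, which is presumably why the tables report them alongside the raw counts.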