75 changes: 47 additions & 28 deletions README.md
@@ -1,5 +1,9 @@
![](images/logo.png)
---
[![PyPI](https://img.shields.io/pypi/v/ModDotPlot?color=blue&label=PyPI)](https://pypi.org/project/ModDotPlot/)
[![CI](https://github.com/marbl/ModDotPlot/actions/workflows/black.yml/badge.svg)](https://github.com/marbl/ModDotPlot/actions/workflows/black.yml)

- [](#)
- [Cite](#cite)
- [About](#about)
- [Installation](#installation)
@@ -19,28 +23,30 @@

## Cite

Alexander P Sweeten, Michael C Schatz, Adam M Phillippy, ModDotPlot—rapid and interactive visualization of tandem repeats, Bioinformatics, Volume 40, Issue 8, August 2024, btae493, [https://doi.org/10.1093/bioinformatics/btae493](https://pmc.ncbi.nlm.nih.gov/articles/PMC11321072/)
Alexander P Sweeten, Michael C Schatz, Adam M Phillippy, ModDotPlot—rapid and interactive visualization of tandem repeats, Bioinformatics, Volume 40, Issue 8, August 2024, btae493, [https://doi.org/10.1093/bioinformatics/btae493](https://doi.org/10.1093/bioinformatics/btae493)

If you use ModDotPlot for your research, please cite our software!

---

## About

ModDotPlot is a dot plot visualization tool designed for large sequences and whole genomes. ModDotPlot outputs an identity heatmap similar to [StainedGlass](https://mrvollger.github.io/StainedGlass/) by rapidly approximating the Average Nucleotide Identity between pairwise combinations of genomic intervals. This significantly reduces the computational time required to produce these plots, enough to view multiple layers of resolution in real time!
_ModDotPlot_ is a dot plot visualization tool designed for large sequences and whole genomes. _ModDotPlot_ outputs an identity heatmap similar to [StainedGlass](https://mrvollger.github.io/StainedGlass/) by rapidly approximating the Average Nucleotide Identity between pairwise combinations of genomic intervals. This significantly reduces the computational time required to produce these plots, enough to view multiple layers of resolution in real time!

![](images/demo.gif)

---

## Installation

_ModDotPlot_ can be installed by running `pip install moddotplot`. It requires Python 3.7+ to run. Alternatively, you can download the current release from GitHub by using:

```
git clone https://github.com/marbl/ModDotPlot.git
cd ModDotPlot
```

Although optional, it's recommended to setup a virtual environment before using ModDotPlot:
Although optional, it's recommended to set up a virtual environment before using _ModDotPlot_:

```
python -m venv venv
@@ -53,7 +59,7 @@ Once activated, you can install the required dependencies:
python -m pip install .
```

Finally, confirm that the installation was installed correctly by running `moddotplot -h`:
Finally, confirm that _ModDotPlot_ was installed correctly and that your version is up to date by running `moddotplot -h`:
```
__ __ _ _____ _ _____ _ _
| \/ | | | | __ \ | | | __ \| | | |
@@ -62,6 +68,8 @@ Finally, confirm that the installation was installed correctly by running `moddo
| | | | (_) | (_| | | |__| | (_) | |_ | | | | (_) | |_
|_| |_|\___/ \__,_| |_____/ \___/ \__| |_| |_|\___/ \__|

v0.9.4

usage: moddotplot [-h] {interactive,static} ...

ModDotPlot: Visualization of Complex Repeat Structures
@@ -78,7 +86,7 @@ options:

## Usage

ModDotPlot must be run either in `interactive` mode, or `static` mode:
_ModDotPlot_ must be run either in `interactive` mode, or `static` mode:

### Interactive Mode

@@ -94,13 +102,15 @@ This will launch a [Dash application](https://plotly.com/dash/) on your machine'
moddotplot static <ARGS>
```

This skips running Dash and quickly creates plots under the specified output directory using [plotnine](https://plotnine.readthedocs.io/en/v0.12.4/). By default, running ModDotPlot in static mode this will produce the following files:
Running _ModDotPlot_ in static mode skips running Dash and quickly creates plots under the output directory specified with `-o`. By default, static mode produces the following files:

- A paired-end bed file, containing intervals alongside their corresponding identity estimates.
- A self-identity dotplot for each sequence.
- A paired-end bed file `.bedpe`, containing intervals alongside their corresponding identity estimates.
- A self-identity dotplot for each sequence, as both an upper triangle matrix `_TRI` and full matrix `_FULL` representation.
- A histogram of identity values for each sequence.

See [static mode commands](#static-mode-commands) for further info.

All plots and histograms are output as both a vectorized image (default: `.svg`) and a rasterized `.png` image. [Plotnine](https://plotnine.readthedocs.io/en/v0.12.4/) is the Python plotting library used, with [CairoSVG](https://cairosvg.org) used for converting between image formats.

_ModDotPlot_ supports highly customizable plotting features in static mode. See [static mode commands](#static-mode-commands) for a complete list of features.

---

@@ -112,6 +122,10 @@ The following arguments are the same in both interactive and static mode:

Fasta files to input. Multifasta files are accepted. Interactive mode will only support a maximum of two sequences at a time.

`-b / --bed <.bed file>`

Input bed file used for dotplot annotation (note: this is not the same as the paired-end bed file produced by ModDotPlot). If selected, this will produce an annotated bed track image `_ANNOTATION.svg`. The name in the bed file must match the name of the fasta sequence header (or input, if `-l` is used instead) in order to produce a correct bed track.

`-k / --kmer <int>`

K-mer size to use. This should be large enough to distinguish unique k-mers with enough specificity, but not too large that sensitivity is removed. Default: 21.
@@ -184,15 +198,15 @@ Load previously saved matrices. Used instead of `-f/--fasta`

Run `moddotplot static` with a config file, rather than (sample syntax). Recommended when creating a really customized plot. Used instead of `-f/--fasta`.

`-b / --bed <.bed file>`
`-l / --load <.bedpe file>`

Create a plot from a previously computed pairwise bed file. Skips Average Nucleotide Identity computation. Used instead of `-f/--fasta`.
Create a plot from a previously computed pairwise bed file. Skips Average Nucleotide Identity computation. Used instead of `-f/--fasta`. Will only accept paired-end bed files produced by ModDotPlot.

`-w / --window <int>`

Window size. Unlike interactive mode, only one matrix will be created, so this represents the *only* window size. Default is set to `n/1000` (eg. 3000bp for a 3Mbp sequence).
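
As a quick illustration of the default above, the arithmetic can be sketched in Python. The helper name `default_window_size` is hypothetical, and the use of integer division is an assumption; the README only states `n/1000`.

```python
def default_window_size(sequence_length: int) -> int:
    """Approximate the documented default window size of n/1000.

    Assumption: the result is truncated to a whole number of base pairs.
    """
    return sequence_length // 1000


# A 3 Mbp sequence gives a 3000 bp window, matching the README's example.
print(default_window_size(3_000_000))
```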

`--no-bed <bool>`
`--no-bedpe <bool>`

Skip output of bed file.

@@ -204,9 +218,17 @@ Skip output of histogram legend.

Adjust width of self dot plots. Default is 9 inches.

`--dpi <bool>`
`--dpi <int>`

Image resolution in dots per inch (not to be confused with dotplot resolution). Default is 600.
Image resolution in dots per inch (not to be confused with dotplot resolution). Default is `300`.

`--vector <str>`

Vectorized image format to output to. Must be one of ["svg", "pdf", "ps"]. Default: `svg`

`--deraster <bool>`

By default, vectorized outputs rasterize the actual plot (not the axes). This is done to save space, as a high-resolution dotplot can be extremely space inefficient and prevent use of image manipulation software. Use this flag to disable plot rasterization.

`--palette <str>`

@@ -226,11 +248,11 @@ Add custom identity threshold breakpoints. Note that the number of breakpoints m

`-t / --axes-ticks <list of ints>`

Custom tickmarks for x and y axis. Values outside of the axes-limits will not be shown.
Custom tickmarks for x and y axis. Values outside of the `--axes-limits` will not be shown.

`-a / --axes-limits <int>`

Change axis limits for x and y axis. Useful for comparing multiple plots, allowing them to stay in scale.
Change axis limits for x and y axis. Useful when comparing multiple plots, allowing them to stay in scale.

`--bin-freq <bool>`

@@ -308,7 +330,6 @@ $ cat config/config.json

{
"identity": 90,
"sparsity": 10,
"palette": "OrRd_7",
"breakpoints": [
90,
@@ -338,13 +359,13 @@ $ moddotplot static -c config/config.json

Running ModDotPlot in static mode

Retrieving k-mers from Chr1:14M-18M....
Retrieving k-mers from Chr1:14000000-18000000....

Progress: |████████████████████████████████████████| 100.0% Completed

Chr1:14M-18M k-mers retrieved!
Chr1:14000000-18000000 k-mers retrieved!

Computing self identity matrix for chr1:14M-18M...
Computing self identity matrix for chr1:14000000-18000000...

Sequence length n: 4000000

@@ -357,13 +378,13 @@ Computing self identity matrix for chr1:14M-18M...
Progress: |████████████████████████████████████████| 100.0% Completed


Saved bed file to Chr1_cen_plots/Chr1:14M-18M.bed
Saved bed file to Chr1_cen_plots/Chr1:14000000-18000000.bed

Plots created! Saving to Chr1_cen_plots/Chr1:14M-18M...
Plots created! Saving to Chr1_cen_plots/Chr1:14000000-18000000...

Chr1_cen_plots/Chr1:14M-18M_TRI.png, Chr1_cen_plots/Chr1:14M-18M_TRI.pdf, Chr1_cen_plots/Chr1:14M-18M_FULL.png, Chr1_cen_plots/Chr1:14M-18M_FULL.png, Chr1_cen_plots/Chr1:14M-18M_HIST.png and Chr1_cen_plots/Chr1:14M-18M_HIST.pdf, saved sucessfully.
```
![](images/Chr1:14M-18M_FULL.png)
![](images/Chr1:14000000-18000000_FULL.png)

---

@@ -376,6 +397,8 @@ ModDotPlot can produce an a vs. b style dotplot for each pairwise combination of
moddotplot interactive -f sequences/chr15_segment.fa sequences/chr21_segment.fa --compare-only
```

![](images/chr14:2000000-5000000_chr21:2000000-5000000.png)

---

## Questions
@@ -386,10 +409,6 @@ For bug reports or general usage questions, please raise a GitHub issue, or emai

## Known Issues

- Plot width and xlim (limiting the x axis to a different amount) currently do not work. I plan to have those working in v0.9.0.

- Mac users might encounter the following unexpected command line output: `/bin/sh: lscpu: command not found`. This is a known issue with Plotnine, the Python plotting library used by ModDotPlot. This can be safely ignored.

- If you encounter an error with the following traceback: `rv = reductor(4) TypeError: cannot pickle 'generator' object`, this means that you have a newer version of Plotnine that is incompatible with ModDotPlot. Please uninstall plotnine and reinstall version 0.12.4 `pip install plotnine==0.12.4`.

- In interactive mode, comparing sequences of two sizes will lead to errors in zooming for the larger sequence. I plan to fix this in v0.9.0.
1 change: 0 additions & 1 deletion config/config.json
@@ -1,6 +1,5 @@
{
"identity": 90,
"sparsity": 10,
"palette": "OrRd_7",
"breakpoints": [
90,
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "ModDotPlot"
version = "0.9.3"
version = "0.9.4"
requires-python = ">= 3.7"
dependencies = [
"pysam",
@@ -14,12 +14,12 @@ dependencies = [
"plotnine==0.12.4",
"palettable",
"mmh3",
"tk",
"setproctitle",
"numpy",
"pillow",
"patchworklib==0.6.3",
"cairosvg"
"cairosvg",
"pygenometracks",
]
authors = [
{name = "Alex Sweeten", email = "alex.sweeten@nih.gov"},
2 changes: 1 addition & 1 deletion src/moddotplot/const.py
@@ -1,4 +1,4 @@
VERSION = "0.9.3"
VERSION = "0.9.4"
COLS = [
"#query_name",
"query_start",
12 changes: 6 additions & 6 deletions src/moddotplot/estimate_identity.py
@@ -128,7 +128,7 @@ def partitionOverlaps(
try:
kmer_list.append(lst[delta_start_index:delta_end_index])
except Exception as e:
print("test")
print("Error in appending list of kmers...\n")
print(e)
kmer_list.append(lst[delta_start_index:seq_len])
counter += win
@@ -167,7 +167,7 @@ def convertToModimizers(


def convertMatrixToBed(
matrix, window_size, id_threshold, x_name, y_name, self_identity
matrix, window_size, id_threshold, x_name, y_name, self_identity, x_offset, y_offset
):
bed = [
(
@@ -187,10 +187,10 @@ def convertMatrixToBed(
value = matrix[x, y]
if (not self_identity) or (self_identity and x <= y):
if value >= id_threshold / 100:
start_x = x * window_size + 1
end_x = (x + 1) * window_size
start_y = y * window_size + 1
end_y = (y + 1) * window_size
start_x = x * window_size + x_offset
end_x = start_x + window_size - 1
start_y = y * window_size + y_offset
end_y = start_y + window_size - 1

bed.append(
(
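
The coordinate change in this hunk can be sketched in isolation. The snippet below is a minimal illustration, not the project's code: the helper name `window_to_interval` and the sample offset are hypothetical, and only the two arithmetic lines mirror the new `start_x` / `end_x` expressions shown above.

```python
from typing import Tuple


def window_to_interval(index: int, window_size: int, offset: int = 1) -> Tuple[int, int]:
    """Map a matrix row or column index to an inclusive interval.

    Mirrors the arithmetic in the hunk above:
        start = index * window_size + offset
        end   = start + window_size - 1
    """
    start = index * window_size + offset
    end = start + window_size - 1
    return start, end


# With an offset of 1, the result matches the previous formula
# (start = index * window_size + 1, end = (index + 1) * window_size).
print(window_to_interval(2, 1000))

# Hypothetical offset for a region that begins partway along a chromosome.
print(window_to_interval(0, 1000, offset=14_000_001))
```

An offset of 1 reproduces the intervals computed by the old code, so the new parameters only change the output when a sequence starts somewhere other than position 1. How `x_offset` and `y_offset` are derived by the callers is not shown in this diff.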