tRNAgraph is a tool for analyzing tRNA-seq data generated from tRAX. It can be used to create an AnnData object from a tRAX coverage file or to analyze an existing database object and generate expanded visualizations. The database object can also be used to perform further analysis beyond the scope of what tRAX can do.
tRAX is a tool often used for analyzing tRNA-seq data. While it generates a comprehensive set of results, it does not provide a way to visualize specific meta-data associated with a particular experiment. tRNAgraph is a tool that can be used to create a database object from a tRAX coverage file containing various experimental conditions not captured by tRAX. The database object can then be used to generate a variety of visualizations, including heatmaps, coverage plots, PCA plots, and more that are more specific to the experimental conditions of interest.
%%{ init: { 'flowchart': { 'curve': 'basis' } } }%%
flowchart TD
subgraph sg1 [trnagraph.py build]
I1([tRAX Output]) --> B[Build AnnData]
M([metadata.tsv]) -->|optional| B
end
subgraph sg1.1 [trnagraph.py build]
I1.1([tRAX Output]) --> B.1[Build AnnData]
M.1([metadata.tsv]) -->|optional| B.1
end
B --> I2
B.1 --> I2.1
subgraph O [Outputs]
G1
G2
G3
G4
D([CSVs of AnnData])
end
subgraph sgI1 [AnnData]
I2([Primary AnnData])
I2.1([Supplement AnnData])
I2.2([Updated AnnData])
end
I2 --> sg3
subgraph sg3 [trnagraph.py cluster]
C1[Preprocessing] -->|UMAP| C2[Dimensionality Reduction] -->|HDBSCAN| C3[Clustering]
end
sg3 --> I2.2
I2.1([Supplement AnnData]) --> C
I2 --> C
subgraph sg4 [trnagraph.py merge]
C[Combine]
end
C --> I2.2
I2 --> G{Graph Selection}
subgraph sg5 [trnagraph.py tools]
T --> L([log2fc])
T --> D1([csv]) --> D
end
I2 --> T{tools}
subgraph sg2 [trnagraph.py graph]
G -->|default| G1([bar, correlation, count, coverage, logo, pca, radar])
G -->|default| G4([heatmap, volcano]) --> L([log2fc]) --> I2.2
G -->|requires config.json| G2([comparison])
G -->|requires clustering| G3([cluster])
J([config.json]) --> G
end
subgraph sg1.2 [Optional Downstream Analysis]
direction LR
R1(Python, PyTorch, R, Julia, etc.)
end
I2 --> sg1.2
I2.2 <--> sg1.2
style I1 fill:#51BD38,stroke:#333,stroke-width:2px,color:#000000
style I1.1 fill:#51BD38,stroke:#333,stroke-width:2px,color:#000000
style M fill:#25EEFF,stroke:#333,stroke-width:2px,color:#000000
style M.1 fill:#25EEFF,stroke:#333,stroke-width:2px,color:#000000
style J fill:#25EEFF,stroke:#333,stroke-width:2px,color:#000000
style D fill:#FF6AE6,stroke:#333,color:#000000
style G1 fill:#FF6AE6,stroke:#333,color:#000000
style G2 fill:#FF6AE6,stroke:#333,color:#000000
style G3 fill:#FF6AE6,stroke:#333,color:#000000
style G4 fill:#FF6AE6,stroke:#333,color:#000000
style I2 fill:#51BD38,stroke:#333,stroke-width:2px,color:#000000
style I2.1 fill:#25EEFF,stroke:#333,color:#000000
style I2.2 fill:#FF6AE6,stroke:#333,color:#000000
style sg1.1 stroke:#25EEFF,stroke-width:6px
style O stroke:#FF6AE6,stroke-width:6px
style sg1 stroke:#51BD38,stroke-width:6px
style sgI1 stroke:#51BD38,stroke-width:6px
Dependencies can be installed using conda:
conda env create -f requirements.yamlconda activate tRNAgraphtRNAgraph can be used with build, cluster, merge, graph and tools commands. The build command generates an AnnData object from a tRAX coverage file. The graph command creates visualizations from the database object. The cluster command is used to cluster the database object. The merge command is used to merge two database objects. The following sections will describe how to use each command.
tRNAgraph will work a tRAX output directory and a metadatafile.
- tRAX output must be generated with the Python3 Version >=
v1.1.0-beta, older versions will given incorrect results!- tRAX cannot use the
--nofragflag, as it will discard read associated information.
- tRAX cannot use the
You must attribute the meta-data associated with the samples if you want to generate graphs based on specific experimental conditions. To do this, you can provide a .tsv/.csv file (-m/--metadatafile) containing the sample names, sample groups, and any meta-data associated with the samples. This file must also include a header row with, at minimum, sample and group columns. The meta-data file can contain any number of additional columns corresponding to the experimental conditions you want to visualize. An example meta-data file is shown below:
sample group celltype treatment condition
sample1 sampleGroup1 celltypeX treatmentA condition1
sample2 sampleGroup1 celltypeX treatmentA condition2
sample3 sampleGroup2 celltypeY treatmentB condition1- If you wish to run the tool without providing any metadata, you can instead provide the tRAX samples file used to generate your tRAX run and add the header row to the top of the file. The samples file should contain the
sample group fastqcolumns. - If no header column for each observations is provided, then all observations will be annotated automatically as
obs_#where#is the ordered observation.
The tRNAgraph.py script can be used to build a database object from a tRAX directory with the following command:
python trnagraph.py build -i <tRAX_directory> -s <tRAX_samples_file> -o <output_file> -l <list_of_observations>-ior--traxdiris the path to the tRAX directory you want to build an AnnData database object from.-mor--metadatais the path to the metadata file as described in the metadata section.-oor--outputis the path to the output file. The output file should be a.h5adfile. By default, the output file will be namedh5ad/trnagraph.h5adif no path is provided.--logis used to output a log of the shell commands that generate the AnnData object. By default, the log will not be output.-qor--quietsuppresses the shell commands' output. By default, the output will be displayed.
The database object can be clustered using the cluster function. The following code will cluster the database object using UMAP and cluster using HDBSCAN. The default parameters used in tRNAgraph work well on ARMseq, DM-tRNAseq, and OTTRseq data; however, each dataset is different and may require fine-tuning of the parameters to yield the best results. The following code will cluster the database object and save the result to a new database object:
python trnagraph.py cluster -i <input_database> -o <output_database>Clustering is performed across the uniquecoverage, readstarts, readends, mismatchedbases, and deletions categories of the AnnData object. When performing clustering, verifying that it is reproducible and that the results reflect the data is important. This can be done by running the clustering multiple times and comparing the results. The clustering is also performed on sample and group observations. In the case of samples, every set of reads for every single tRNA is used for clustering. In the case of groups, the mean of the reads is taken for each tRNA across the read categories and then used for clustering. This is done to reduce the number of samples used for clustering and to reduce the noise in the clustering. The results will be saved in the obs attribute of the database object as sample_cluster\umap1\umap2 and group_cluster\umap1\umap2, respectively. Clusters annotated as -1 are considered noise and are not included in the clustering. Plotting of the clustering is done as well for convenience.
-ior--anndata- The input database object.-oor--output- The output database object, if clustering was already performed on the input database object, the output database object must also be run with the--overwriteflag.-wor--overwrite- Whether to overwrite the output database object if it already contains clustering information.-ror--randomstate- Specify the random state for UMAP if you want to have a static seed (default: None) (optional). UMAP clustering should give reproducible results regardless of the random state. However, you may want to use a static seed to reproduce the same results for a particular dataset. It is important to verify that the clustering is reproducible before using a static seed.-tor--readcutoff- Specify read count cutoff for clustering (default: 15) (optional). This is the minimum number of reads required for a tRNA to be included in the clustering. This is useful for removing tRNAs not expressed in the dataset, as they can skew the clustering.-c1or--ncomponentsmp- Specify the number of components to use for UMAP clustering of samples (default: 2) (optional). This is the number of components to use for the UMAP clustering of the samples. Since this is independent of the number of components used for the UMAP plotting, you can use more components for clustering to get better results. This will need to be tuned for each dataset and depends on the variability of the dataset.-c2or--ncomponentgrp- Specify the number of components to use for UMAP clustering of groups (default: 2) (optional). See above.-l1or--neighborclusmp- Specify the number of neighbors to use for UMAP clustering of samples (default: 150) (optional). This is the number of neighbors to use for the UMAP clustering of the samples. This will need to be tuned for each dataset and depends on the variability of the dataset. A higher number of neighbors will generally yield a wider dispersion of the clusters across the projection.-l2or--neighborclusgrp- Specify the number of neighbors to use for UMAP clustering of groups (default: 40) (optional). See above.-n1or--neighborstdsmp- Specify the number of neighbors to use for UMAP projection plotting of samples (default: 75) (optional). See above.-n2or--neighborstdgrp- Specify the number of neighbors to use for UMAP projection plotting of groups (default: 20) (optional). See above.-d1or--hdbscanminsampsmp- Specify minimum sample size to use for HDBSCAN clustering of samples (default: 5) (optional). This is the minimum number of samples required to be considered a cluster.-d2or--hdbscanminsampgrp- Specify minsamples size for HDBSCAN group clustering (default: 3) (optional). See above.-b1or--hdbscanminclusmp- Specify the minimum cluster size for HDBSCAN sample clustering (default: 30) (optional). This is the minimum number of samples required to be considered a cluster.-b2or--hdbscanminclugrp- Specify the minimum cluster size for HDBSCAN group clustering (default: 10) (optional). See above.--clusterobsexperimental- Whether to use observations for clustering by incorporating them into the adata.X and adata.var.- NOTE!: This is an experimental test feature and should be used with caution. It is not recommended to use this feature unless you are familiar with the AnnData object and the clustering process and are certain that this is the desired behavior.
--logis used to output a log of the shell commands that generate the clustered AnnData object. By default, the log will not be output.-qor--quietsuppresses the shell commands' output. By default, the output will be displayed.
When working with downstream analysis of the cluster groups, it is important to note that reads that are dropped via the --readcutoff flag will not be included in the clustering however, they are still present in the AnnData object. This means that your object can contain NaN values in the clustering columns. Depending on your use case, you may want to filter these out before performing any analysis.
Two database objects can be merged using the merge function. The following code will merge two database objects and save the result to a new database object:
python trnagraph.py merge -i1 <input_database1> -i2 <input_database2> -o <output_database> -i1or--anndata1- The first input database object.-i2or--anndata2- The second input database object.-oor--outputThe output database object.--dropno- Whether to drop samples not found in both nontRNA_counts files. By default, samples not found in both nontRNA_counts files are kept and filled with zeros.- Different sequencing methods and the input GTF for tRAX can yield vastly different results. To minimize this, it is recommended to use the same GTF file and sequencing method for both input files.
--droprna- Whether to drop samples not found in both type_counts files. By default, samples not found in both type_counts files are kept and filled with zeros.- Different sequencing methods and the input GTF for tRAX can yield vastly different results. To minimize this, it is recommended to use the same GTF file and sequencing method for both input files.
--logoutputs a log of the shell commands used to generate the merged AnnData object. By default, the log will not be output.-qor--quietsuppresses the shell commands' output. By default, the output will be displayed.
The tRNAgraph.py script can be used to generate a variety of visualizations from a database object with the following command:
python trnagraph.py graph -i <input_database> -o <output_directory> -g <graph_type> <graph_parameters>-ior--anndatais the path to the database object you want to generate visualizations from.-oor--outputis the path to the output directory. The output directory will be namedfiguresby default if no path is provided.-gor--graphis the type of graph to generate. The following graphs can be generated:bar- Generates bar plots of the tRNA coverage.cluster- Generates cluster plots of the tRNA coverage.compare- Generates compare plots of the tRNA coverage.correlation- Generates correlation plots of the tRNA coverage.count- Generates count-bar plots of the tRNA amino and isotype distributions.coverage- Generates coverage plots.heatmap- Generates heatmaps of the differential tRNA expression.logo- Generates seqlogos of the tRNA coverage.pca- Generates PCA plots.radar- Generates radar plots of the tRNA coverage.volcano- Generates volcano plots of differential tRNA expression.all- Generates all of the above plots. Excludecompareandclusteras they require additional parameters.
--configis an optional flag to the path to a JSON file containing additional graph parameters. [See the configuration section below for more details.](#Configuration Files)-nor--threadsis the number of threads to use for generating the graphs. By default, the number of threads will be set to the CPU_MAX. This is primarily useful for generating coverage plots and seqlogos as they can take a long time to generate. It will, however, run all non-threaded plots in parallel.--logis used to output a log of the shell commands used to generate the graphs. By default, the log will not be output.-qor--quietsuppresses the shell commands' output. By default, the output will be displayed.-vor--verboseis used to increase the verbosity of the output, including all flags used in the output. By default, the output will not be verbose.
The following parameters are optional and can be used to customize the graphs:
--barcolis the observation for grouping the bar plots into columns. If no observation is provided, the bar plots will be grouped by group.--bargrpis the observation for grouping the bar plot stacks. If no observation is provided, the bar plots will be stacked by amino group.--barsubgrpis the observation to use if you want to perform a secondary grouping of the bar plots. By default, this is disabled.--barsortis the observation for sorting the bar plots. If no observation is provided, the bar plots will be sorted by--barcolby default.--barlabelis the observation for labeling the bar plots. If no observation is provided, the bar plots will be labeled by--barcolby default.- This is useful if you want to label the bar plots with custom names such as manual annotations on top of hdbscan clusters.
--clustergrpis the observation to use for coloring the cluster plots. If no observation is provided, the cluster plots will be colored by amino group.--clusterlabelsis whether to include labels in the cluster plots. By default, this is disabled. If enabled, provide an AnnData observation for the labels, it doesn't have to be the same observation as the--clustergrpflag. This is useful if you have custom names for your HDBscan clusters.--clusteroverviewis whether to generate a 2x2 overview of the cluster plots for amino, iso, readcount, and HDBscan cluster. By default, this is disabled. If enabled, the--clustergrpflag will be ignored.--clusternumericis whether to use numeric values for the cluster plots. By default, this is disabled. Enabling this will plot a color bar with the numeric values in the legend.--clustermaskis whether to mask the cluster plots by the HDBSCAN unclustered points. By default, this is disabled. Enabling this will plot the unclustered points in gray.
--comparegrp1is the observation for grouping the compare plots. If no observation is provided, the compare plots will be grouped by sample group.--comparegrp2is the observation for grouping the compare plots. If no observation is provided, the compare plots will be grouped by sample group.
--corrmethodis the method to use for calculating the correlation. The following methods can be used:pearson- Pearson correlation coefficient.spearman- Spearman rank correlation.kendall- Kendall Tau correlation coefficient.
--covgrpis the observation for grouping the coverage plots. If no observation is provided, the coverage plots will be grouped by sample group.--covobsis the observation used as the basis for each individual coverage plot. If no observation is provided, the coverage plots will be created pertrna.--covcombineis whether to combine the coverage plots into a single plot of the mean for each observation within the group. Enabling this will disable the--coveragegrpflag, and the combined output will be given instead to thecombinedfolder.--covtypeis the type of tRNA coverage to plot, by default, it will plot unique coverage. All possible coverage types match the tRAX coverage types.--covgapis whether to include tRNA gaps in the coverage plots. By default, tRNA gaps will be skipped in the coverage plots.--combinedpdfonlyis whether to only generate the combined PDFs for the coverage plots. By default, the coverage plots for every--covobswill be generated.
--heatgrpis the observation for grouping the heatmap plots. If no observation is provided, the heatmap plots will be grouped by sample group.--diffrtsis the read type for the heatmap/volcano plots. The heatmap/volcano plots will be generated using the unique wholecounts, fiveprime, threeprime, other and total normalized reads by default.--heatcutoffis the cutoff for reads to include in the heatmap plots. The heatmap plots will discard anything with less than 80 reads by default.--heatboundis the range to bound the heatmap plots. By default, the heatmap plots will be bounded by the data's top 25 and the bottom 25 values for each comparative column.
--logogrpis the observation for grouping the logo plots. If no observation is provided, the logo plots will be grouped by amino group.--logomanualgrpInstead of using observation to group the logo plots, you can provide a list of tRNAs to group the logo plots. This is useful if you want to look at a specific set of tRNAs.--logomanualnameif you are using--logomanualgrp, you can provide a name for the group of tRNAs for the output file. If no name is provided, the name will bemanualwith a timestamp.--logopseudocountis the pseudocount for the logo plots. By default, the logo plots will use a pseudocount of 20.--logosizeis the size of the sequences (sprinzl, noloop, or full) for the logo plots. By default, the logo plots will use the non-extension loop sequences. But you can also use the full sequences or the numerical sprinzl positions without gaps.--ccatailis whether or not to include the CCA tail in the logo plots. By default, the logo plots will discard the CCA tail.--pseudogenesis whether or not to include pseudogenes in the logo plots. By default, the logo plots will discard pseudogenes (tRX).--logornamodeis wether to print the output as RNA (U) rather than DNA (T). By default, the output will be DNA.
--pcamarkersis the observation for choosing which markers to populate the PCA plot. By default, the samples will be used as markers.--pcacolorsis the observation for coloring the PCA plot. If no observation is provided, the PCA plot will be colored by samples.
--radargrpis the observation for grouping the radar plots. If no observation is provided, the radar plots will be grouped by sample group.--radarscaledis whether to scale the radar plots to 100% or allow for automatic scaling. By default, the radar plots will not be scaled.--radarmethodis the statistical method for calculating the radar plots. The following methods can be used:mean,median,max,sum, orall. By default, the radar plots will use the mean.
--volgrpis the observation for grouping the volcano plots. If no observation is provided, the volcano plots will be grouped by sample group.--diffrtssee above under Heatmap Plots.--volcutoffis the cutoff for reads to include in the volcano plots. The volcano plots will discard anything with less than 80 reads by default.
The tools function can be used to perform various operations on the database object.
log2fc- The log2 fold change of the database object can be calculated using thelog2fcfunction. This same function is used in thevolcanoandheatmapplots. This can be useful if you want to export specific log2fc values for further analysis.csv- The database object can be exported to a series of CSV files using thecsvfunction. This can be useful if you want to perform further analysis in a different program.
Using the database object for further analysis is easy and follows the same syntax as using the AnnData object. For example, the following code will filter the database object to only include the coverage type unique and drop the gap positions:
import anndata as ad
adata = ad.read_h5ad("tRNAgraph.h5ad")
adata = adata[adata.var["coverage"] == "unique"]
adata = adata[adata.var["gap"] == False]If you wanted to filter the database object further only to include samples from group A and tRNA tRNA-Ala-AGC-1 you could use the following code:
adata = adata[adata.obs["group"] == "A"]
adata = adata[adata.obs["trna"] == "tRNA-Ala-AGC-1"]The resulting table can be called using adata.X.
JSON files can be used for complicated filtering and grouping of the data as well as using custom colormaps. If a config.json is provided with the --config flag a name for filtering and output must be provided.
name- This is a name for the filtering configuration and will be saved as a subfolder in the output directory.obsandvar- Are conditions to filter on in the AnnData observation and variable categories, respectively. The values can be a single value or a list of values. If a list of values is provided, the data will be filtered to include only those in the list. The data will not be filtered on that category if no values are provided.obs_randvar_r- Can also be used to filter the data to exclude the values in the list, rather than include them. This is useful if you want to exclude tRNA pseudogenes, for example.
{
"name": "name",
"obs": {
"treatment": ["treatment 1"],
"celltype": ["HEK293"],
"pseudogene": ["tRNA"]
},
"obs_r": {
"amino": ["Und"]
},
"var": {
"variable_1": ["value1", "value2"]
}
}A custom colormap.json can also be provided with the --colormap flag. The values can be a hex color code, RGB tuple value, or matplotlib color names. If no colormap is provided, the default colormap will be used. The colormap will only be used if the observation for the colormap is selected, generating the plot. For example, if coverage plots are generated, the colormap will only be used if the --coveragegrp flag matches an existing colormap. The JSON file should be formatted as follows:
{
"group": {
"A": "lightskyblue",
"B": "deepskyblue",
"C": "royalblue"
},
"amino": {
"Ala": "#1F77B4",
"Arg": "#AEC7E8",
"Asn": "#FF7F0E",
"Asp": "#FFBB78",
"Cys": "#6fe835",
"Gln": "#d0f2cb",
"Glu": "#D62728",
"Gly": "#FF9896",
"His": "#9258f5",
"Ile": "#deccfc",
"Leu": "#a65223",
"Lys": "#ffceb3",
"iMet": "#00d5e3",
"Met": "#b8fbff",
"Phe": "#edd500",
"Pro": "#ffff99",
"Ser": "#db56bc",
"Thr": "#F7B6D2",
"Trp": "#2CA02C",
"Tyr": "#98DF8A",
"Val": "#6a3d9a",
"SeC": "#C5B0D5",
"Sup": "#808080"
},
"obs_etc": {
"value1": "#1F77B4",
"value2": "#FF7F0E"
}
}Some plots default to using group as the default category for plotting making a colormap with this name will override the default colormap in those cases.
The database object aggregates all the information from the tRAX directory into a single object, allowing easy calls to the data. In addition to using the following variables as flags for figure generation, they can be used for further analysis and data manipulation independent of tRNAgraph.
The observations are the metadata categories used to group and color the plots derived from the provided metadata. The observations are stored in the obs attribute of the database object as a pandas data frame. The following observations are automatically added to the database object:
trna- The tRNA name.iso- The tRNA isotype group.amino- The tRNA amino acid group.sample- The sample name from tRAX.group- The sample group tRAX.pseudogene- Whether the tRNA is a pseudogene (tRNA/tRX).deseq2_sizefactor- The size factor used for normalization in DESeq2 for the sample.refseq- The reference sequence aligned with Sprinzl positions. This subset will drop any gap, extension, or alternate positions. (1-76)refseq_full- The reference sequence aligned with sprinzl positions.dataset- The name of the output file (Useful for combining multiple datasets if the merge function is used)- Any metadata categories provided in the observations list/file.
- The uniquely mapped reads are the reads that map to a single tRNA via alignment and filtering in tRAX and can be broken down into the following categories:
nreads_whole_unique_raw- The raw number of uniquely mapped whole-counts in the sample.nreads_whole_unique_norm- The sample's normalized number of uniquely mapped whole-counts.nreads_fiveprime_unique_raw- The sample's raw number of uniquely mapped 5' end-counts.nreads_fiveprime_unique_norm- The sample's normalized number of uniquely mapped 5' end-counts.nreads_threeprime_unique_raw- The sample's raw number of uniquely mapped 3' end-counts.nreads_threeprime_unique_norm- The sample's normalized number of uniquely mapped 3' end-counts.nreads_other_unique_raw- The sample’s raw number of uniquely mapped other-counts.nreads_other_unique_norm- The sample’s normalized number of uniquely mapped other-counts.nreads_total_unique_raw- The sample's raw number of uniquely mapped total-counts. This is the sum of the above categories.nreads_total_unique_norm- The sample's normalized number of uniquely mapped total-counts. This is the sum of the above categories.
- The multi-mapped reads map via alignment, but ambiguous reads are randomly assigned and can be broken down into the following categories:
nreads_wholecounts_raw- The sample’s raw number of whole-counts.nreads_wholecounts_norm- The sample’s normalized number of whole-counts.nreads_fiveprime_raw- The sample’s raw number of 5' end-counts.nreads_fiveprime_norm- The sample's normalized number of 5' end-counts.nreads_threeprime_raw- The sample’s raw number of 3' end-counts.nreads_threeprime_norm- The sample's normalized number of 3' end-counts.nreads_other_raw- The sample's raw number of other-counts.nreads_other_norm- The sample's normalized number of other-counts.nreads_total_raw- The sample's raw number of total-counts. This is the sum of the above categories.nreads_total_norm- The sample’ normalized number of total-counts. This is the sum of the above categories.- The following additional categories are also found under multi-mapped reads:
nreads_wholeprecounts_raw- The raw number of whole-precounts in the sample.nreads_wholeprecounts_norm- The normalized number of whole-precounts in the sample.nreads_partialprecounts_raw- The raw number of partial-precounts in the sample.nreads_partialprecounts_norm- The normalized number of partial-precounts in the sample.nreads_trailercounts_raw- The raw number of trailer-counts in the sample.nreads_trailercounts_norm- The normalized number of trailer-counts in the sample.
fragment- The type of fragment in the sample. This includesfiveprime_half,fiveprime_fragment,threeprime_half,threeprime_fragment,whole,other_fragment, andmultiple_fragment.- The
multiple_fragmentcategory is used for reads that dip in the middle of reads and are not considered whole- or partial-reads.
- The
Caution: The other category types are comprised of 'antisense', 'wholeprecounts', 'partialprecounts', and 'trailercounts' found via tRAX alignment and are highly dependent on the sequencing method.
The variables are the metadata categories used to filter the read coverage. The variables are stored in the var attribute of the database object as a pandas data frame. The following variables are automatically added to the database object:
gap- Whether a position is a gap in canonical Sprinzl positions. These gaps are skipped in the coverage plots so they can be easier to interpret when comparing different tRNAs.positions- The canonical Sprinzl positions.adenines,cytosines,guanines,thymines, anddeletionsare raw values in tRAX, while the rest are normalized values. To simplify, these are converted to normalized values by dividing by the Desq2 size factor. You can access the raw values usingadata.layers["raw"]instead ofadata.X.
coverage- The coverage type matching coverage types found in the tRAX coverage file.half- Whether the position is a 3' or 5' half position or in the center of a read.location- The portion of the tRNA relative to the sprinzl positions. This includesfiveprime_acceptorstem,threeprime_acceptorstem,a_to_d_internal,dstem,dloop,d_to_anticodon_internal,fiveprime_anticodonstem,threeprime_anticodonstem,anticodonloop,anticodon_to_t_internal,extensionloop,tstem, andtloop.
Since all coverage types are stored in the database object, it is useful to specify which coverage type you want to use if you use it for further analysis.
By default, all values in the database object (adata.X) are normalized using the DESeq2 size factor. The layers feature of AnnData is used to store the raw data from the coverage file for convenience. To access the raw data, you can use adata.layers["raw"] instead of adata.X.
The uns attribute of the database object is used to store information that is not directly aligned with the observations and variables and other metadata. The following uns categories are automatically added to the database object:
amino_counts- The amino acid counts for the tRNAs from the tRAX output directory.anticodon_counts- The anticodon counts for the tRNAs from the tRAX output directory.nontRNA_counts- The non-tRNA counts for the tRNAs from the tRAX output directory, this is based on the GTF file used for tRAX.type_counts- The tRNA type counts for the tRNAs from the tRAX output directory.cluster_runinfo- The information from the clustering run if it was performed.group_cluster_umap- The UMAP coordinates for the group clustering if it was performed.sample_cluster_umap- The UMAP coordinates for the sample clustering if it was performed.log2FC- The log2 fold change for the differential tRNA expression saved as a dictionary of dataframes.- This is automatically calculated for
nreads_total_unique_normandnreads_total_normfor between groups with common read-cutoffs.
- This is automatically calculated for
traxruninfo- The information from the tRAX run is based on the runinfo file.trnagraphruninfo- The information from the tRNAgraph when the database object was generated.
tRNAgraph is licensed under the GNU GPLv3 license.
