Output
Freya Arthen edited this page Feb 20, 2024 · 6 revisions
- summary.txt: Lists the main assembly metrics (i.e. numbers as well as mean, SD, min and max of length, GC content and coverage) at contig and gene level
- raw_gene_table.csv: Contains all variables with their values as computed on the input data set (for a detailed description of each variable, see this section)
- imputed_gene_table.csv: The same as raw_gene_table.csv, except that the variables 'c_genecovsd', 'c_genelensd', 'g_covdev_c', 'g_gcdev_c', 'g_lendev_c' are rescaled to a range of 0 to 1. NaN (= missing) values are imputed with the mean of the respective variable
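The rescaling and imputation described for imputed_gene_table.csv can be sketched as follows. This is a minimal illustration, not taXaminer's own code: the helper names `rescale_01` and `impute_mean`, the sample values, and the order of the two steps (rescale first, then impute) are assumptions.

```python
import math


def rescale_01(values):
    """Min-max rescale to the range 0 to 1, leaving NaN entries untouched."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return list(values)  # nothing to rescale
    lo, hi = min(present), max(present)
    if hi == lo:
        # Constant column: map every present value to 0.0 (an arbitrary choice).
        return [v if math.isnan(v) else 0.0 for v in values]
    return [v if math.isnan(v) else (v - lo) / (hi - lo) for v in values]


def impute_mean(values):
    """Replace NaN entries with the mean of the non-missing entries."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return list(values)  # no mean available
    mean = sum(present) / len(present)
    return [mean if math.isnan(v) else v for v in values]


# Made-up values for one variable; the second gene has a missing value.
g_covdev_c = [2.0, float("nan"), 4.0, 6.0]
rescaled = rescale_01(g_covdev_c)
imputed = impute_mean(rescaled)
print(imputed)
```

With these sample values the missing entry is filled with the mean of the rescaled column, so every value in the result lies between 0 and 1.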
The label in the plots that represents the query species is determined automatically and is always colored dark grey
- 3D_plot.html: Interactive 3D scatterplot to examine genes and their taxonomic assignments
  - with single-clicks on labels you can hide individual groups
  - double-clicks hide every group except for the one that was clicked
  - hovering over data points shows additional information
  - the subdirectory “3D_plot_files” holds additional files for this plot and is required to display the plot (important when working with MobaXterm, for example)
- gene_table_taxon_assignment.csv: raw_gene_table with PCA coordinates for each gene and their taxonomic assignment appended
  - this is a tabular representation of all information that is displayed in the 3D plot
  - see Additional information for details on the contained information
- taxsun.tsv: Input file to explore the taxonomic assignment of the genes with taxSun
- contribution_of_variables.png|.pdf: Figure illustrating how much each variable contributes to the first two principal components
- genes_and_variables.png|.pdf: Biplot of variables (vectors) and genes (points) in the new coordinate system defined by the first two principal components. Transparency represents the amount of contribution to the principal components
- pca_loadings.csv: Table listing the loadings of the original variables (rows) on the computed principal components (columns)
- pca_summary.csv: Table listing standard deviation, proportion of explained variance and cumulative proportion of explained variance in the original data for each of the principal components
- scree_plot.png|.pdf: Scree plot visualising the amount of variance in the original data that is explained by each of the principal components (here: dimensions)
- parallel_analysis.png|.pdf: Only available if parallel analysis was performed on the principal components. Results of Horn’s parallel analysis: plots random eigenvalues for the given number of PCs as well as adjusted and unadjusted eigenvalues, and indicates which ones were retained for the subsequent PCA
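The three quantities listed in pca_summary.csv are related in the usual PCA way: the variance of a component is its standard deviation squared, and the proportion of explained variance is that variance divided by the total. The sketch below works through this with made-up standard deviations; the function name `pca_summary` and the dictionary keys are illustrative and do not claim to match the file's actual column headers.

```python
def pca_summary(std_devs):
    """Derive proportion and cumulative proportion of variance per component."""
    variances = [sd ** 2 for sd in std_devs]
    total = sum(variances)
    rows = []
    cumulative = 0.0
    for index, (sd, variance) in enumerate(zip(std_devs, variances), start=1):
        proportion = variance / total
        cumulative += proportion
        rows.append({
            "component": f"PC{index}",
            "standard_deviation": sd,
            "proportion_of_variance": proportion,
            "cumulative_proportion": cumulative,
        })
    return rows


# Made-up standard deviations for five principal components.
summary_rows = pca_summary([2.0, 1.0, 1.0, 1.0, 1.0])
for row in summary_rows:
    print(row)
```

In this example the first component has a variance of 4 out of a total of 8, so it explains half of the variance; the cumulative proportion reaches 1 at the last component.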
This file is the output of DIAMOND and holds the exact hits for each protein. The columns are the following:
- qseqid: Query Sequence ID (equal to the column fasta_header in 'gene_table_taxon_assignment.csv')
- sseqid: Subject Sequence ID, i.e. the accession number of the matched protein in the DB
- pident: Percentage of identical matches
- length: Alignment length
- mismatch: Number of mismatches
- gapopen: Number of gap openings
- qstart: Start of alignment in query
- qend: End of alignment in query
- sstart: Start of alignment in subject
- send: End of alignment in subject
- evalue: Expect value
- bitscore: Bit score
- staxids: Unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
- ssciname: Unique Subject Scientific Name(s), separated by a ';'
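A hit table with these columns could be read along the following lines. This is a sketch under assumptions: it presumes a tab-separated file without a header row whose columns appear in the order listed above, and the sample row is invented for illustration. The function name `parse_hits` is hypothetical.

```python
import csv
import io

# Column names in the order listed above (assumed to match the file layout).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "staxids", "ssciname",
]
INT_COLUMNS = ["length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"]
FLOAT_COLUMNS = ["pident", "evalue", "bitscore"]


def parse_hits(handle):
    """Yield one dict per hit, with numeric fields converted and lists split on ';'."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, fields))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        # Taxonomy IDs and scientific names are ';'-separated lists.
        hit["staxids"] = hit["staxids"].split(";") if hit["staxids"] else []
        hit["ssciname"] = hit["ssciname"].split(";") if hit["ssciname"] else []
        yield hit


# Invented example row; a real file would be opened with open(path, newline="").
SAMPLE = (
    "gene1\tP12345\t87.5\t120\t15\t0\t1\t120\t5\t124\t1e-50\t200.5"
    "\t9598;9606\tHomo sapiens;Pan troglodytes\n"
)
hits = list(parse_hits(io.StringIO(SAMPLE)))
print(hits[0]["qseqid"], hits[0]["staxids"])
```

Keeping qseqid as a plain string makes it straightforward to match hits against the fasta_header column of gene_table_taxon_assignment.csv.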