PCA Analysis on alignment fasta #40

@necrolyte2

Input: a FASTA file that the user has already aligned. It will contain multiple datasets (for now, just 2).
Output: a 3D PCA graphic showing the differences between the datasets.

We need to determine the best way to let the user specify which datasets should be compared. Right now, every sequence name contains a common name that identifies its dataset.
See https://github.com/VDBWRAIR/bio_pieces/blob/pca/tests/testinput/aln1.fasta
This file has 2 datasets that need to be compared. Essentially, the graphic needs to use 2 colors to distinguish them.
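One possible way to let the user supply the datasets is to pass the dataset labels explicitly and match them against the sequence names. The sketch below is only an illustration of that idea, not what aln_pca does: `read_fasta`, `group_by_dataset`, the labels `setA`/`setB`, and the color names are all hypothetical, and the inline FASTA text stands in for a real alignment file.

```python
from collections import defaultdict
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file-like object."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def group_by_dataset(records, labels):
    """Assign each record to the first label that occurs in its name."""
    groups = defaultdict(list)
    for name, seq in records:
        label = next((lab for lab in labels if lab in name), "unassigned")
        groups[label].append((name, seq))
    return dict(groups)


# Hypothetical aligned input with two datasets, "setA" and "setB".
example = StringIO(">setA_1\nACGT-A\n>setB_1\nACGTTA\n>setA_2\nACGTAA\n")
groups = group_by_dataset(read_fasta(example), labels=["setA", "setB"])

# One color per dataset, in the order the datasets were first seen.
colors = dict(zip(groups, ["tab:blue", "tab:orange"]))
```

Sequences whose names match none of the labels end up under `unassigned`, so a mistyped label is easy to spot before plotting.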

Currently the script aln_pca builds what I think is a PCA graphic (https://github.com/VDBWRAIR/bio_pieces/blob/pca/docs/_static/pca.png). However, I am not 100% confident it is done correctly, as the matplotlib and scikit-learn PCA graphics look different.
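For cross-checking, here is a minimal NumPy-only sketch of PCA on an alignment. It assumes a one-hot encoding of each alignment column over the alphabet `ACGT-`, which may not be the encoding aln_pca uses; `one_hot` and `pca` are hypothetical helpers. Two general facts are worth keeping in mind when comparing plots: each principal component is only defined up to sign, and implementations can differ in whether they scale the data after centering. Either one can make two correct PCA plots look different, though whether that explains the discrepancy here is not known.

```python
import numpy as np

ALPHABET = "ACGT-"  # assumed alphabet; other symbols are encoded as all zeros


def one_hot(seqs, alphabet=ALPHABET):
    """Encode equal-length aligned sequences as a flat 0/1 matrix."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    length = len(seqs[0])
    X = np.zeros((len(seqs), length * len(alphabet)))
    for row, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("sequences must be aligned to the same length")
        for col, ch in enumerate(seq.upper()):
            if ch in index:
                X[row, col * len(alphabet) + index[ch]] = 1.0
    return X


def pca(X, n_components=3):
    """Return (scores, explained variance ratio) via SVD of centered data."""
    Xc = X - X.mean(axis=0)  # center only, no scaling
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Remove the sign ambiguity: make the largest-magnitude loading
    # of every component positive.
    rows = np.arange(Vt.shape[0])
    signs = np.sign(Vt[rows, np.abs(Vt).argmax(axis=1)])
    signs[signs == 0] = 1.0
    Vt = Vt * signs[:, None]
    scores = Xc @ Vt[:n_components].T
    total = (S ** 2).sum()
    explained = S ** 2 / total if total > 0 else np.zeros_like(S)
    return scores, explained[:n_components]


# Two hypothetical datasets of two sequences each.
seqs = ["AAAAAA", "AAAAAC", "GGGGGG", "GGGGGT"]
scores, explained = pca(one_hot(seqs))
```

The three columns of `scores` are the coordinates for a 3D scatter plot. If the first component separates the datasets, points from one dataset fall on one side of zero along that axis and points from the other dataset on the opposite side.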

You can also check out www.jalview.org, as it builds a PCA graphic that we are trying to semi-replicate, but with better colors, axes, and such.

The pca branch also includes a hacked-together IPython notebook for experimenting. It contains the matplotlib and scikit-learn PCA graphics, which, as you can see, do not look like the manual one I built using a tutorial from a different website (listed in that file).

@mmelendrez
@averagehat
