The MHT Cluster Project

We cluster sequences from a multiple sequence alignment (MSA) based on the amino acids at their binding sites. We use the following procedure:

1. Identify Binding Sites:

Manually identify binding site regions from the MSA data. Create a dataframe containing only the sequence names and binding site residues.
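As a rough illustration of this step, the sketch below pulls a fixed set of alignment columns out of each aligned sequence. The toy alignment, the column indices, and the helper name `extract_binding_sites` are all hypothetical; the real columns come from manual inspection of the MSA.

```python
# Hypothetical toy alignment: sequence name -> aligned sequence ('-' is a gap).
msa = {
    "seq_A": "MKT-LLVAGY",
    "seq_B": "MRTALLVSGY",
    "seq_C": "MKT-ILVAGF",
}

# Hypothetical 0-based alignment columns picked by manual inspection.
binding_site_columns = [1, 4, 7, 9]


def extract_binding_sites(msa, columns):
    """Return one row per sequence: its name plus the residue at each column."""
    rows = []
    for name, aligned in msa.items():
        row = {"name": name}
        for col in columns:
            row[f"pos_{col}"] = aligned[col]
        rows.append(row)
    return rows


rows = extract_binding_sites(msa, binding_site_columns)
for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame(rows)` to obtain the dataframe described above.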

2. Convert the Extracted Sites into Feature Vectors:

For each sequence, create a feature vector that corresponds to the amino acids at the binding site positions. Describe the features with a mix of ordinal and one-hot encoding methods. The amino acid features are taken from the IMGT website.
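One way to combine the two encodings is sketched below: an ordered property is stored as a single integer (ordinal), while an unordered property is expanded into indicator columns (one-hot). The two lookup tables are illustrative placeholders covering only a few residues and are not copied from IMGT; the actual property classes should be taken from the IMGT website. Gap characters are not handled in this sketch.

```python
# Illustrative placeholder tables, NOT copied from IMGT; replace with the real classes.
# Ordered property (assumed here: a size rank), encoded as one ordinal integer.
VOLUME_RANK = {"A": 0, "S": 0, "K": 3, "R": 3, "L": 3, "I": 3, "Y": 4, "F": 4}

# Unordered property (assumed here: a hydropathy class), encoded as one-hot.
HYDROPATHY_CLASS = {
    "A": "hydrophobic", "L": "hydrophobic", "I": "hydrophobic", "F": "hydrophobic",
    "S": "neutral", "Y": "neutral",
    "K": "hydrophilic", "R": "hydrophilic",
}
HYDROPATHY_LEVELS = ["hydrophobic", "neutral", "hydrophilic"]


def encode_residue(residue):
    """Return [ordinal rank] + one-hot class indicators for a single residue."""
    ordinal = [VOLUME_RANK[residue]]
    one_hot = [1 if HYDROPATHY_CLASS[residue] == level else 0
               for level in HYDROPATHY_LEVELS]
    return ordinal + one_hot


def encode_site(residues):
    """Concatenate the per-residue encodings of all binding site positions."""
    vector = []
    for residue in residues:
        vector.extend(encode_residue(residue))
    return vector


print(encode_site("KLAY"))
```

Each binding site position contributes four numbers in this sketch, so a site with four positions yields a 16-dimensional feature vector.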

3. Clustering:

Group similar binding sites using k-means and hierarchical clustering. Plot the clusters in 2-D space using dimension reduction methods and compare how well separated they are using the silhouette score.
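The clustering step could look roughly like the following scikit-learn sketch. The feature matrix here is synthetic stand-in data, the choice of two clusters and Ward linkage is an assumption, and PCA is used only as one possible dimension reduction method; none of these are confirmed as the settings used in the notebooks.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the real feature vectors: two groups of 20 "sequences".
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 8)),
    rng.normal(loc=3.0, scale=0.3, size=(20, 8)),
])

# Cluster the same feature matrix with both methods (n_clusters=2 is an assumption).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

# Silhouette score ranges from -1 to 1; higher means better separated clusters.
scores = {
    "kmeans": silhouette_score(X, kmeans_labels),
    "hierarchical": silhouette_score(X, hier_labels),
}
print(scores)

# Project to 2-D for plotting; PCA is one option among several.
coords_2d = PCA(n_components=2).fit_transform(X)
print(coords_2d.shape)
```

The rows of `coords_2d` can be drawn as a scatter plot colored by either label array, and the two silhouette scores give a single-number comparison between the methods.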

4. Validation of Clusters:

Evaluate the biological relevance of the clusters by mapping the clustered sequences back to their original MSA and checking for conservation, structural relevance, or known functional motifs.
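For the conservation check, one simple option is to compute the Shannon entropy of each binding site column within each cluster, where 0 bits means the column is fully conserved. This is a minimal sketch of that idea with hypothetical toy data and helper names; it is not necessarily the measure used in this project, and it does not cover structural relevance or motif checks.

```python
import math
from collections import Counter


def column_entropy(residues):
    """Shannon entropy (bits) of one alignment column; 0.0 means fully conserved."""
    counts = Counter(residues)
    total = len(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def cluster_conservation(sites, labels):
    """Map each cluster label to the per-column entropies of its member sites.

    sites: equal-length strings of binding site residues, one per sequence.
    labels: cluster label for each entry in sites.
    """
    by_cluster = {}
    for site, label in zip(sites, labels):
        by_cluster.setdefault(label, []).append(site)
    return {
        label: [column_entropy(column) for column in zip(*members)]
        for label, members in by_cluster.items()
    }


# Hypothetical toy input: four binding sites assigned to two clusters.
sites = ["KLAY", "KLAF", "RISY", "RISY"]
labels = [0, 0, 1, 1]
conservation = cluster_conservation(sites, labels)
print(conservation)
```

Columns with low entropy inside a cluster but different residues between clusters are natural candidates to inspect against known functional motifs.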
