This repository was archived by the owner on Dec 5, 2024. It is now read-only.

Alignment Algorithm

Josh Miller edited this page Apr 17, 2019 · 1 revision

This page explains the protein structure alignment algorithm that Moltimate uses.

Basic Rundown

To compute an alignment, Moltimate uses a motif to find a matching set of residues in another protein. It runs the set of queries stored in the motif and uses the results to find the best selection of residues, if one exists, that satisfies all of the queries.

Motifs

The first step in understanding the alignment algorithm is understanding what a motif is. A motif describes the active site of a protein. The org.moltimatebackend.model.Motif class holds a list of active site residues and a map from String to ResidueQuerySet. This map associates each residue name with a set of MotifSelections, which are used to find similarly placed residues in another protein.

Motif Selections

A ResidueQuerySet contains a list of MotifSelection objects. A MotifSelection represents a query made at the atomic level for a residue. Each selection contains two residue names, two atom types, and a distance. Running a selection finds every atom of the first type, inside a residue with the first residue name, that is within the query distance of an atom of the second type inside a residue with the second residue name.

For example, consider this query:

atomType1: CA
atomType2: CB
residueName1: SER
residueName2: HIS
distance: 8

This query returns a list of all CA atoms in a protein that belong to a SER residue and are within 8 angstroms of a CB atom that belongs to a HIS residue. The distance comparison logic for these selections is in StructureUtils.RunQuery.
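To make the query semantics concrete, here is a minimal sketch of the distance check described above. It is not the code in StructureUtils.RunQuery: SketchAtom and SelectionSketch are hypothetical stand-ins invented for illustration (Moltimate itself works with biojava types), and treating "within" as less than or equal to the distance is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an atom; not a Moltimate or biojava class.
final class SketchAtom {
    final String name;         // atom type, e.g. "CA"
    final String residueName;  // parent residue name, e.g. "SER"
    final int residueNumber;   // parent residue position in the chain
    final double x, y, z;      // coordinates in angstroms

    SketchAtom(String name, String residueName, int residueNumber,
               double x, double y, double z) {
        this.name = name;
        this.residueName = residueName;
        this.residueNumber = residueNumber;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    double distanceTo(SketchAtom other) {
        double dx = x - other.x, dy = y - other.y, dz = z - other.z;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}

// Hypothetical stand-in for a MotifSelection with the five fields listed above.
final class SelectionSketch {
    final String atomType1, atomType2, residueName1, residueName2;
    final double distance;

    SelectionSketch(String atomType1, String atomType2,
                    String residueName1, String residueName2, double distance) {
        this.atomType1 = atomType1;
        this.atomType2 = atomType2;
        this.residueName1 = residueName1;
        this.residueName2 = residueName2;
        this.distance = distance;
    }

    // Returns every atomType1 atom in a residueName1 residue that lies within
    // `distance` (assumed inclusive) of some atomType2 atom in a residueName2 residue.
    List<SketchAtom> run(List<SketchAtom> atoms) {
        List<SketchAtom> hits = new ArrayList<>();
        for (SketchAtom a : atoms) {
            if (!a.name.equals(atomType1) || !a.residueName.equals(residueName1)) {
                continue;
            }
            for (SketchAtom b : atoms) {
                if (a != b && b.name.equals(atomType2)
                        && b.residueName.equals(residueName2)
                        && a.distanceTo(b) <= distance) {
                    hits.add(a);
                    break; // one partner is enough; report each atom once
                }
            }
        }
        return hits;
    }
}
```

Calling new SelectionSketch("CA", "CB", "SER", "HIS", 8.0).run(atoms) on a list of atoms corresponds to the example query above. The nested loop is quadratic in the number of atoms, which keeps the sketch short; a real implementation may well prune the search.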

We create Motifs in Moltimate by computing the distances between the atoms of all residues in a protein's active site and creating a query for each distance.
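As a rough illustration of that construction step, the sketch below builds one query per ordered pair of atoms and groups the queries by the residue of the first atom. Several details are assumptions rather than facts about Moltimate: SiteAtom, PairQuery, and MotifBuilderSketch are hypothetical names, only atoms from different residues are paired, the map key is a label such as "SER 195", and the measured distance is stored unchanged with no tolerance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an active-site atom.
final class SiteAtom {
    final String name, residueName;
    final int residueNumber;
    final double x, y, z;

    SiteAtom(String name, String residueName, int residueNumber,
             double x, double y, double z) {
        this.name = name;
        this.residueName = residueName;
        this.residueNumber = residueNumber;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Assumed key format for grouping queries, e.g. "SER 195".
    String residueId() {
        return residueName + " " + residueNumber;
    }
}

// Hypothetical query record: two atom types, two residue names, one distance.
final class PairQuery {
    final String atomType1, atomType2, residueName1, residueName2;
    final double distance;

    PairQuery(SiteAtom a, SiteAtom b) {
        atomType1 = a.name;
        atomType2 = b.name;
        residueName1 = a.residueName;
        residueName2 = b.residueName;
        double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        distance = Math.sqrt(dx * dx + dy * dy + dz * dz); // measured, no tolerance
    }
}

final class MotifBuilderSketch {
    // Builds one query per ordered atom pair, grouped by the first atom's residue.
    static Map<String, List<PairQuery>> build(List<SiteAtom> activeSiteAtoms) {
        Map<String, List<PairQuery>> queries = new LinkedHashMap<>();
        for (SiteAtom a : activeSiteAtoms) {
            for (SiteAtom b : activeSiteAtoms) {
                if (a.residueId().equals(b.residueId())) {
                    continue; // assumption: skip pairs inside the same residue
                }
                queries.computeIfAbsent(a.residueId(), k -> new ArrayList<>())
                       .add(new PairQuery(a, b));
            }
        }
        return queries;
    }
}
```

The number of queries grows with the square of the number of active-site atoms, which is consistent with a single residue carrying a few dozen queries.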

Residue Selection

Each query returns a list of atoms that match it. To decide which residues to choose, we look up the residue that each returned atom belongs to and keep a count, for every candidate residue, of how many of the motif's queries it has matched. This happens in the Motif.runQueries method. We only select a candidate if it matches all of the queries for the motif residue it is supposed to line up with. For example, if a motif has 25 queries for residue HIS 57, we only accept matches for HIS 57 that pass all 25 of its queries.

Several candidates may still match each residue in a motif, so we return a mapping from Residue to a list of biojava Group objects (essentially biojava's version of Residue), where each Residue of the active site is mapped to the list of Groups that matched it.
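The counting step can be sketched as follows. This is an illustration of the idea, not the body of Motif.runQueries: ResidueSelectionSketch is a hypothetical name, and candidate residues are represented as plain String labels (the parent residue of each returned atom) rather than biojava Group objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

final class ResidueSelectionSketch {
    // hitsPerQuery holds, for each query of ONE motif residue, the labels of the
    // residues whose atoms matched that query. A candidate is accepted only if
    // it appears in the results of every query.
    static List<String> select(List<List<String>> hitsPerQuery) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> hits : hitsPerQuery) {
            // De-duplicate so a residue is counted at most once per query.
            for (String residue : new LinkedHashSet<>(hits)) {
                counts.merge(residue, 1, Integer::sum);
            }
        }
        List<String> accepted = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == hitsPerQuery.size()) {
                accepted.add(entry.getKey());
            }
        }
        return accepted;
    }
}
```

For three queries whose results are [HIS 57, HIS 40], [HIS 57], and [HIS 57, HIS 91], only HIS 57 is accepted, because it is the only candidate that passes all three.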

Alignment Selection

After residues are selected, we fall into one of three cases:

Case 1: Not enough matches

If any residue in the motif has no matches, we know that there is no good alignment and stop here.

Case 2: Just enough matches

Fairly often, we get lucky and find exactly one matching residue for each residue in the active site of the motif. In this case, we know that this is the alignment we are looking for and return it.

Case 3: Multiple matches

This case is slightly trickier. If a residue has multiple matches, we have to work out which combination of residues aligns best. We take the Cartesian product of the lists of matching residues to enumerate every combination that could be used for alignment, then compute the RMSD (root-mean-square deviation of atomic positions) for each one. The combination with the lowest RMSD is selected and the rest are discarded.

All of this can be seen inside AlignmentService.alignActiveSites (the lower of the two methods with that name).
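The three cases can be sketched in a few lines. This is a simplified illustration and not the code in AlignmentService.alignActiveSites: AlignmentSelectionSketch and its methods are hypothetical, the scoring function is passed in by the caller, and the rmsd helper assumes the two coordinate sets are already superposed and paired in the same order (a real pipeline would superpose the structures first).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.ToDoubleFunction;

final class AlignmentSelectionSketch {
    // Every way of picking one element from each list, in order.
    static <T> List<List<T>> cartesianProduct(List<List<T>> lists) {
        List<List<T>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (List<T> options : lists) {
            List<List<T>> next = new ArrayList<>();
            for (List<T> partial : result) {
                for (T option : options) {
                    List<T> extended = new ArrayList<>(partial);
                    extended.add(option);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    // candidates.get(i) lists the matches for the i-th motif residue.
    static <T> Optional<List<T>> choose(List<List<T>> candidates,
                                        ToDoubleFunction<List<T>> rmsd) {
        // Case 1: some motif residue has no match, so there is no alignment.
        for (List<T> matches : candidates) {
            if (matches.isEmpty()) {
                return Optional.empty();
            }
        }
        List<List<T>> combos = cartesianProduct(candidates);
        // Case 2: exactly one match per residue gives exactly one combination.
        if (combos.size() == 1) {
            return Optional.of(combos.get(0));
        }
        // Case 3: keep the combination with the lowest score.
        List<T> best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (List<T> combo : combos) {
            double score = rmsd.applyAsDouble(combo);
            if (score < bestScore) {
                bestScore = score;
                best = combo;
            }
        }
        return Optional.ofNullable(best);
    }

    // RMSD of paired points; assumes a[i] and b[i] are {x, y, z} for the same
    // atom and that the two sets are non-empty and already superposed.
    static double rmsd(double[][] a, double[][] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = a[i][k] - b[i][k];
                sum += d * d;
            }
        }
        return Math.sqrt(sum / a.length);
    }
}
```

The number of combinations is the product of the list sizes, so two matches for each of three residues means eight candidate alignments to score. That stays cheap for small active sites but grows quickly if the queries are loose enough to admit many candidates per residue.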
