
Tpocket module: Clarification on Tpocket Rankings & Consensus Overlap Criterion #166

@Yorick-126

Hi!

I am using the Tpocket module to validate how well fpocket identifies ligandable pockets in a dataset of protein-ligand crystal structures. I have a few questions about Tpocket's output and how to interpret it.

1. Interpretation of stats_p.txt (POS6 Column)

Tpocket writes two output files, both of which report results for each of its six evaluation criteria:

  • stats_g.txt: General statistics across the dataset.
  • stats_p.txt: Per-protein statistics, including a "POS6" column.

My first question is:
Does the "POS6" column in stats_p.txt indicate the rank of the actual ligand-binding pocket?

If so, aggregating the per-protein statistics should reproduce the results in stats_g.txt. However, when I visually inspect the pockets ranked by POS6 or by fpocket's default ranking, the ligand-binding pocket is frequently mispredicted.
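For reference, this is roughly how the aggregation can be done. It is a minimal sketch, assuming stats_p.txt is a whitespace-separated table with a single header line (as in the excerpt further down), that a rank of 0 means no pocket satisfied the criterion, and that such complexes still count in the denominator; `parse_stats_p` and `top_k_ratio` are hypothetical helper names, not part of fpocket.

```python
from typing import Dict, Iterable, List


def parse_stats_p(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse the whitespace-separated stats_p.txt layout: one header line
    (LIG COMPLEXE APO NB_PCK ... POS1 ... POS6 ...) followed by one row per
    complex. Blank lines are skipped."""
    tokens = [line.split() for line in lines if line.strip()]
    header, body = tokens[0], tokens[1:]
    return [dict(zip(header, row)) for row in body]


def top_k_ratio(rows: List[Dict[str, str]], column: str = "POS6", k: int = 1) -> float:
    """Fraction of complexes whose matching pocket has a rank between 1 and k
    in `column`. Assumption: rank 0 means "no pocket matched", and those
    complexes are kept in the denominator."""
    ranks = [int(row[column]) for row in rows]
    hits = sum(1 for rank in ranks if 1 <= rank <= k)
    return hits / len(ranks)
```

Reading the file with `with open("stats_p.txt") as fh: rows = parse_stats_p(fh)` and calling `top_k_ratio(rows, "POS6", k)` for k = 1, 2, 3, ... should then line up with the "Rank <= k" lines of stats_g.txt, provided the denominator assumption matches what Tpocket does.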

2. Discrepancies in Pocket Ranking

I tried to investigate this by implementing the Multiple Overlap Criterion (MOC) as described in the fpocket paper. I expected this criterion to be identical to the one behind POS6 (is that correct?), yet it appears to identify the ligand-binding pocket more accurately than Tpocket's built-in rankings. Visual inspection likewise shows that Tpocket often assigns the ligand-binding pocket incorrectly.
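To make the comparison concrete, here is a minimal sketch of an MOC-style check. It assumes the ligand atom coordinates and the alpha sphere centres of each pocket have already been extracted (parsing is not shown), that the pockets are supplied in fpocket's ranking order, and it uses a 3 Å cutoff with thresholds of 50% ligand coverage and 20% pocket coverage; those thresholds are my assumption and should be checked against the fpocket paper. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def fraction_within(points: Sequence[Point], others: Sequence[Point], cutoff: float) -> float:
    """Fraction of `points` lying within `cutoff` (in Å) of at least one point in `others`."""
    if not points:
        return 0.0
    close = sum(1 for p in points if any(math.dist(p, q) <= cutoff for q in others))
    return close / len(points)


def satisfies_moc(ligand_atoms: Sequence[Point], sphere_centers: Sequence[Point],
                  cutoff: float = 3.0, min_ligand: float = 0.5, min_pocket: float = 0.2) -> bool:
    """MOC-style test (thresholds are assumptions): enough of the ligand is
    covered by the pocket AND enough of the pocket is covered by the ligand."""
    ligand_cov = fraction_within(ligand_atoms, sphere_centers, cutoff)
    pocket_cov = fraction_within(sphere_centers, ligand_atoms, cutoff)
    return ligand_cov >= min_ligand and pocket_cov >= min_pocket


def moc_rank(ligand_atoms: Sequence[Point], pockets: Sequence[Sequence[Point]],
             **thresholds: float) -> int:
    """1-based rank of the first pocket (in the given order) that passes the
    test, or 0 if none does."""
    for rank, centers in enumerate(pockets, start=1):
        if satisfies_moc(ligand_atoms, centers, **thresholds):
            return rank
    return 0
```

The two-sided test is what distinguishes this from a plain distance check: a very large pocket can cover the whole ligand and still fail because most of its alpha spheres are far from any ligand atom.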

Some numbers:

  • According to stats_g.txt, the ratio of good predictions under the consensus overlap criterion is 0.86 for the top-1 pocket.
  • When I aggregate POS6 from stats_p.txt, the true positive rate (TPR) for the top-1 pocket is indeed around 0.86.
  • However, with my own MOC implementation, the top-1 TPR drops to 0.28.
    This matches my visual inspection, in which the ligand's actual binding pocket is often assigned incorrectly.
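To narrow down where 0.86 and 0.28 diverge, it may help to list the complexes on which the two criteria disagree and inspect a few of them by hand. A minimal sketch, assuming the POS6 ranks and the MOC ranks have each been collected into a dict keyed by complex name (rank 0 meaning no match); the function names are hypothetical:

```python
from typing import Dict, List, Tuple


def rank_disagreements(pos6: Dict[str, int], moc: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """(complex, POS6 rank, MOC rank) for every complex present in both
    mappings whose ranks differ, sorted by complex name."""
    shared = sorted(set(pos6) & set(moc))
    return [(name, pos6[name], moc[name]) for name in shared if pos6[name] != moc[name]]


def top1_only_in_pos6(pos6: Dict[str, int], moc: Dict[str, int]) -> List[str]:
    """Complexes counted as top-1 hits by POS6 but not by the MOC re-implementation."""
    return [name for name, p, m in rank_disagreements(pos6, moc) if p == 1 and m != 1]
```

The complexes returned by `top1_only_in_pos6` are the ones that account for the gap between the two top-1 rates, so they are the natural candidates for visual inspection.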

In summary, my questions are:

  • Am I misinterpreting the POS6 column?
  • Are there additional factors influencing how Tpocket ranks pockets that I should consider?
  • Is there perhaps a known issue with Tpocket's ranking approach?

Any insights into the workings of Tpocket would be greatly appreciated!
Thanks a lot!

Example Data
Below is an excerpt from stats_g.txt showing the overall performance under the consensus overlap criterion:

	--
	-      _ Concensus overlap criteria (alpha sphere overlap) _       -
	--

   Ratio of good predictions (dist = 3A) 

Rank <=  1  :		  0.86
Rank <=  2  :		  0.90
Rank <=  3  :		  0.91
Rank <=  4  :		  0.91
Rank <=  5  :		  0.91
Rank <=  6  :		  0.91
Rank <=  7  :		  0.92
Rank <=  8  :		  0.92
Rank <=  9  :		  0.92
Rank <= 10  :		  0.92
Rank <= 15  :		  0.92
Rank <= 20  :		  0.92
Rank <= 50  :		  0.92
Rank <= 100  :		  0.92
Rank <= 200  :		  0.92
-
Mean relative overlap           :   74.55
Mean pocket volume (estimation) :    1061.59
Mean number of pocket atom      :   62

This suggests that for 86% of the complexes, the top-ranked pocket satisfies the consensus overlap criterion (with the 3 Å distance threshold given in the header).

Next, an excerpt from stats_p.txt, which provides the per-protein rankings:

LIG COMPLEXE APO NB_PCK CRIT1 CRIT2 CRIT3 CRIT4 CRIT5 CRIT6 POS1 POS2 POS3 POS4 POS5 POS6 REL_OVLP1 REL_OVLP2 REL_OVLP3 REL_OVLP4 REL_OVLP5 REL_OVLP6 LIGMASS LIGVOL PVOL3 NATM3 PVOL6 NATM6
UNL 4m8x_protein_ligand_combined.pdb 4m8x_protein_ligand_combined.pdb    17   62.50   85.92    3.98    0.71    0.80    1.00    1    1    1    1    1    1   800.00    90.14     0.00    71.43    79.66    79.66    679.40    579.49      1313.84   64      1313.84   64
UNL 4mr3_protein_ligand_combined.pdb 4mr3_protein_ligand_combined.pdb     6  100.00   89.19    3.10    0.88    0.59    1.00    1    1    1    1    1    1  2550.00   137.84     0.00    87.50    59.21    59.21    308.20    233.80       990.88   51       990.88   51
UNL 5mrb_protein_ligand_combined.pdb 5mrb_protein_ligand_combined.pdb    20  100.00   66.15    3.70    0.74    0.82    1.00    1    1    1    1    1    1  2600.00    80.00     0.00    74.42    81.94    81.94    540.41    523.31      1149.40   52      1149.40   52
UNL 3nyx_protein_ligand_combined.pdb 3nyx_protein_ligand_combined.pdb    24  100.00  100.00    2.75    1.00    0.61    1.00    1    1    1    1    1    1  2275.00   165.45     0.00   100.00    60.65    60.65    395.85    278.62      1460.10   91      1460.10   91
UNL 4w97_protein_ligand_combined.pdb 4w97_protein_ligand_combined.pdb    29   57.14   74.55    0.78    0.62    0.37    1.00    2    1    2    1    1    1   457.14   189.09   189.09    62.50    37.23    37.23    935.39    664.84       655.70   32      3973.62  208
UNL 2bmk_protein_ligand_combined.pdb 2bmk_protein_ligand_combined.pdb    23   57.14   87.50    2.77    0.81    0.64    1.00    1    1    1    1    1    1   571.43   125.00     0.00    80.95    63.93    63.93    303.10    199.65       790.61   40       790.61   40
UNL 6c2r_protein_ligand_combined.pdb 6c2r_protein_ligand_combined.pdb    47  100.00   95.12    2.41    1.00    0.71    1.00    1    1    1    1    1    1  1966.67   143.90     0.00   100.00    70.53    70.53    463.74    432.10      1131.84   59      1131.84   59
UNL 4ayu_protein_ligand_combined.pdb 4ayu_protein_ligand_combined.pdb    62    0.00    0.00    0.00    0.00    0.00    0.00    0    0    0    0    0    0     0.00     0.00     0.00     0.00     0.00     0.00    146.08    120.91         0.00   -1         0.00   -1
UNL 5oht_protein_ligand_combined.pdb 5oht_protein_ligand_combined.pdb    36  100.00  100.00    0.00    1.00    0.31    1.00    1    1    0    1    1    1  1620.00   324.00     0.00   100.00    30.66    30.66    198.13    121.79         0.00   -1      1216.02   81

Although POS6 reports rank 1 for almost every complex above (4ayu has POS6 = 0, presumably meaning no pocket matched), visual inspection shows that the pocket assigned to the ligand is frequently wrong.
