Merge pull request #3070 from chrishalcrow/unify-isi-violation-notation

samuelgarcia · web-flow · commit 8e9bb8ca916a · 2024-07-03T11:20:41.000+02:00
Unify compute_isi_violation docs and add UltraMegaSort2000 citation
diff --git a/doc/modules/qualitymetrics/isi_violations.rst b/doc/modules/qualitymetrics/isi_violations.rst
@@ -10,46 +10,48 @@ Calculation
 Neurons have a refractory period after a spiking event during which they cannot fire again.
 Inter-spike-interval (ISI) violations refers to the rate of refractory period violations (as described by [Hill]_).
 
+We aim to calculate the contamination rate :math:`C`, measuring the ratio of isi violations in the spike-train of a unit.
 The calculation works under the assumption that the contaminant events happen randomly or come from another neuron that is not correlated with our unit.
 A correlation will lead to an overestimation of the contamination, whereas an anti-correlation will lead to an underestimation.
 
-Different formulas have been developed over the years.
+Different formulas have been developed, but all require:
 
-Calculation from the [Hill]_ paper
-----------------------------------
+- :math:`T` the duration of the recording in seconds.
+- :math:`N` the number of spikes in the unit's spike train.
+- :math:`t_r` the duration of the unit's refractory period in seconds.
+
+Calculation from the [UMS]_ package
+-----------------------------------
+
+Originally implemented in the `rpv_contamination` calculation of the UltraMegaSort2000 package: `<https://github.com/danamics/UMS2K/blob/master/quality_measures/rpv_contamination.m>`_.
 
-The following quantities are required:
+In this method the number of spikes whose refractory period are violated, denoted :math:`n_v`, is used.
 
-- :math:`ISI_t` : biological threshold for ISI violation.
-- :math:`ISI_{min}`: minimum ISI threshold enforced by the data recording system used.
-- :math:`ISI_s` : the array of ISI violations which are observed in the unit's spike train.
-- :math:`\#`: denotes count.
+Here, the refactory period :math:`t_r` is adjusted to take account of the data recording system's minimum possible refactory
+period. E.g. if a system has a sampling rate of :math:`f \text{ Hz}`, the closest that two spikes from the same unit can possibly
+be is :math:`1/f \, \text{s}`. Hence the refactory period :math:`t_r` is the expected biological threshold minus this minimum possible
+threshold.
 
-The threshold for ISI violations is the biological ISI threshold, :math:`ISI_t`, minus the minimum ISI threshold, :math:`ISI_{min}` enforced by the data recording system used.
-The array of inter-spike-intervals observed in the unit's spike train, :math:`ISI_s`, is used to identify the count (:math:`\#`) of observed ISI's below this threshold.
-For a recording with a duration of :math:`T_r` seconds, and a unit with :math:`N_s` spikes, the rate of ISI violations is:
+The contamination rate is calculated to be
 
 .. math::
 
-    \textrm{ISI violations} = \frac{ \#( ISI_s < ISI_t) T_r  }{ 2  N_s^2  (ISI_t - ISI_{min}) }
+    C = \frac{ n_v T }{ 2 N^2 t_r }
 
 Calculation from the [Llobet]_ paper
 ------------------------------------
 
-The following quantities are required:
-
-- :math:`T` the duration of the recording.
-- :math:`N` the number of spikes in the unit's spike train.
-- :math:`t_r` the duration of the unit's refractory period.
-- :math:`n_v` the number of violations of the refractory period.
+In this method the number spikes which violate other spikes' refractory periods, denoted :math:`\tilde{n}_v`, is used.
 
-The estimated contamination :math:`C` can be calculated with 2 extreme scenarios. In the first one, the contaminant spikes are completely random (or come from an infinite number of other neurons). In the second one, the contaminant spikes come from a single other neuron:
+The estimated contamination :math:`C` is calculated in 2 extreme scenarios. In the first, the contaminant spikes
+are completely random (or come from an infinite number of other neurons). In the second, the contaminant spikes
+come from a single other neuron. In these scenarios, the contamination rate is
 
 .. math::
 
     C = \frac{FP}{TP + FP} \approx \begin{cases}
-        1 - \sqrt{1 - \frac{n_v T}{N^2 t_r}} \text{ for the case of random contamination} \\
-        \frac{1}{2} \left( 1 - \sqrt{1 - \frac{2 n_v T}{N^2 t_r}} \right) \text{ for the case of 1 contaminant neuron}
+        1 - \sqrt{1 - \frac{\tilde{n}_v T}{N^2 t_r}} \text{ for the case of random contamination} \\
+        \frac{1}{2} \left( 1 - \sqrt{1 - \frac{2 \tilde{n}_v T}{N^2 t_r}} \right) \text{ for the case of 1 contaminant neuron}
     \end{cases}
 
 Where :math:`TP` is the number of true positives (detected spikes that come from the neuron) and :math:`FP` is the number of false positives (detected spikes that don't come from the neuron).
@@ -58,7 +60,9 @@ Expectation and use
 -------------------
 
 ISI violations identifies unit contamination - a high value indicates a highly contaminated unit.
-Despite being a ratio, ISI violations can exceed 1 (or become a complex number in the [Llobet]_ formula). This is usually due to the contaminant events being correlated with our neuron, and their number is greater than a purely random spike train.
+Despite being a ratio, the contamination can exceed 1 (or become a complex number in the [Llobet]_ formula).
+This is usually due to the contaminant events being correlated with our neuron, and their number is
+greater than a purely random spike train.
 
 Example code
 ------------
@@ -86,8 +90,8 @@ With SpikeInterface:
 References
 ----------
 
-Hill implementation (:code:`isi_violation`)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+UMS implementation (:code:`isi_violation`)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. autofunction:: spikeinterface.qualitymetrics.misc_metrics.compute_isi_violations
 
@@ -160,5 +164,5 @@ Links to original implementations
 Literature
 ----------
 
-Introduced by [Hill]_ (2011).
+Introduced in UltraMegaSort2000 [UMS]_ (2011).
 Also described by [Llobet]_ (2022)
diff --git a/doc/references.rst b/doc/references.rst
@@ -50,9 +50,11 @@ If you use the :code:`qualitymetrics` module, i.e. you use the :code:`analyzer.c
 or :code:`compute_quality_metrics()` methods, please include the citations for the :code:`metric_names` that were particularly
 important for your research:
 
-- :code:`amplitude_cutoff` or  :code:`isi_violation` [Hill]_
-- :code:`amplitude_median` or :code:`sliding_rp_violation` [IBL]_
+- :code:`amplitude_cutoff` [Hill]_
+- :code:`amplitude_median` [IBL]_
+- :code:`sliding_rp_violation` [IBL]_
 - :code:`drift` [Siegle]_
+- :code:`isi_violation` [UMS]_
 - :code:`rp_violation` [Llobet]_
 - :code:`sd_ratio` [Pouzat]_
 - :code:`snr` [Lemon]_ [Jackson]_
@@ -122,6 +124,8 @@ References
 
 .. [Siegle] `Survey of Spiking in the Mouse Visual System Reveals Functional Hierarchy. 2021. <https://pubmed.ncbi.nlm.nih.gov/33473216/>`_
 
+.. [UMS] `UltraMegaSort2000 - Spike sorting and quality metrics for extracellular spike data. 2011. <https://github.com/danamics/UMS2K>`_
+
 .. [Varol] `Decentralized Motion Inference and Registration of Neuropixel Data. 2021. <https://ieeexplore.ieee.org/document/9414145>`_
 
 .. [Windolf] `Robust Online Multiband Drift Estimation in Electrophysiology Data. 2022. <https://www.biorxiv.org/content/10.1101/2022.12.04.519043v2>`_
diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py
@@ -241,7 +241,7 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0,
 
     It computes several metrics related to isi violations:
         * isi_violations_ratio: the relative firing rate of the hypothetical neurons that are
-                                generating the ISI violations. Described in [Hill]_. See Notes.
+                                generating the ISI violations. See Notes.
         * isi_violation_count: number of ISI violations
 
     Parameters
@@ -261,22 +261,29 @@ def compute_isi_violations(sorting_analyzer, isi_threshold_ms=1.5, min_isi_ms=0,
     Returns
     -------
     isi_violations_ratio : dict
-        The isi violation ratio described in [Hill]_.
+        The isi violation ratio.
     isi_violation_count : dict
         Number of violations.
 
     Notes
     -----
-    You can interpret an ISI violations ratio value of 0.5 as meaning that contaminating spikes are
-    occurring at roughly half the rate of "true" spikes for that unit.
-    In cases of highly contaminated units, the ISI violations ratio can sometimes be greater than 1.
+    The returned ISI violations ratio approximates the fraction of spikes in each
+    unit which are contaminted. The formulation assumes that the contaminating spikes
+    are statistically independent from the other spikes in that cluster. This
+    approximation can break down in reality, especially for highly contaminated units.
+    See the discussion in Section 4.1 of [Llobet]_ for more details.
+
+    This method counts the number of spikes whose isi is violated. If there are three
+    spikes within `isi_threshold_ms`, the first and second are violated. Hence there are two
+    spikes which have been violated.  This is is contrast to `compute_refrac_period_violations`,
+    which counts the number of violations.
 
     References
     ----------
-    Based on metrics described in [Hill]_
+    Based on metrics originally implemented in Ultra Mega Sort [UMS]_.
 
-    Originally written in Matlab by Nick Steinmetz (https://github.com/cortex-lab/sortingQuality)
-    and converted to Python by Daniel Denman.
+    This implementation is based on one of the original implementations written in Matlab by Nick Steinmetz
+    (https://github.com/cortex-lab/sortingQuality) and converted to Python by Daniel Denman.
     """
     res = namedtuple("isi_violation", ["isi_violations_ratio", "isi_violations_count"])
 
@@ -324,7 +331,6 @@ def compute_refrac_period_violations(
     Calculate the number of refractory period violations.
 
     This is similar (but slightly different) to the ISI violations.
-    The key difference being that the violations are not only computed on consecutive spikes.
 
     This is required for some formulas (e.g. the ones from Llobet & Wyngaard 2022).
 
@@ -351,6 +357,12 @@ def compute_refrac_period_violations(
     -----
     Requires "numba" package
 
+    This method counts the number of violations which occur during the refactory period.
+    For example, if there are three spikes within `refractory_period_ms`, the second and third spikes
+    violate the first spike and the third spike violates the second spike. Hence there
+    are three violations. This is in contrast to `compute_isi_violations`, which
+    computes the number of spikes which have been violated.
+
     References
     ----------
     Based on metrics described in [Llobet]_