Mahalanobis metrics (isolation_distance, l_ratio)

Mahalanobis metrics are quality metrics based on the Mahalanobis distance between spikes and cluster centres in the PCA space.

They include:

  • Isolation distance (isolation_distance)

  • L-ratio (l_ratio)

Calculation

Isolation distance

  • \(C\) : cluster of interest.

  • \(N_s\) : number of spikes within cluster \(C\).

  • \(N_n\) : number of spikes outside of cluster \(C\).

  • \(N_{min}\) : minimum of \(N_s\) and \(N_n\).

  • \(\mu_C\), \(\Sigma_C\) : mean vector and covariance matrix for spikes within \(C\) (where each spike within \(C\) is represented by a vector of principal components (PCs)).

  • \(D_{i,C}^2\) : for every spike \(i\) (represented by vector \(x_i\)) outside of cluster \(C\), the squared Mahalanobis distance (as below) between \(\mu_C\) and \(x_i\) is calculated. These distances are ordered from smallest to largest. The \(N_{min}\)’th entry in this list is the isolation distance.

\[D_{i,C}^2 = (x_i - \mu_C)^T \Sigma_C^{-1} (x_i - \mu_C)\]

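Geometrically, the isolation distance for cluster \(C\) corresponds to the radius of the ellipsoid centred on \(\mu_C\) (a sphere once the PC space is rescaled by \(\Sigma_C\)) that contains \(N_{min}\) spikes from outside cluster \(C\). Because \(D^2\) is a squared distance, the reported value is the square of that radius.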
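The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function names `squared_mahalanobis` and `isolation_distance_sketch` are hypothetical, the covariance is estimated with `np.cov` (sample covariance), and the covariance matrix is assumed to be invertible (roughly, more spikes in the cluster than PC dimensions).

```python
import numpy as np


def squared_mahalanobis(points, mean, cov):
    """Squared Mahalanobis distance of each row of `points` from `mean`."""
    diff = points - mean
    # Solve cov @ y = diff.T rather than forming the explicit inverse.
    solved = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)


def isolation_distance_sketch(all_pcs, all_labels, unit_id):
    """N_min'th smallest D^2 among spikes outside the cluster (hypothetical helper)."""
    in_cluster = all_labels == unit_id
    pcs_in = all_pcs[in_cluster]    # spikes within C
    pcs_out = all_pcs[~in_cluster]  # spikes outside of C

    # Assumption: return NaN when the covariance or the ranking is undefined.
    if len(pcs_in) < 2 or len(pcs_out) == 0:
        return np.nan

    n_min = min(len(pcs_in), len(pcs_out))
    mu_c = pcs_in.mean(axis=0)
    sigma_c = np.atleast_2d(np.cov(pcs_in, rowvar=False))

    d2_out = np.sort(squared_mahalanobis(pcs_out, mu_c, sigma_c))
    return d2_out[n_min - 1]  # N_min'th entry (1-indexed in the text)
```

Solving the linear system instead of inverting \(\Sigma_C\) is a numerical-stability choice made for this sketch; both give the same \(D^2\) when \(\Sigma_C\) is well conditioned.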

L-ratio

This example assumes use of a tetrode.

L-ratio uses 4 features for each tetrode channel: the first is energy (the square root of the sum of squares of each sample in the waveform), followed by the first 3 principal components (PCs) of the energy-normalised waveform. With 4 channels, each spike is therefore represented as a point in 16-dimensional space.
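As an illustration of how such a 16-dimensional representation could be assembled, the sketch below computes energy plus 3 PCs per channel. It is a sketch under assumptions, not the method used by any particular library: the function name `tetrode_features_sketch` and the `(num_spikes, num_samples, num_channels)` array layout are hypothetical, and the PCs are obtained from an SVD of the mean-centred, energy-normalised waveforms.

```python
import numpy as np


def tetrode_features_sketch(waveforms, n_pcs=3):
    """Energy + first `n_pcs` PCs per channel (hypothetical helper).

    waveforms: array of shape (num_spikes, num_samples, num_channels).
    Returns an array of shape (num_spikes, num_channels * (1 + n_pcs)).
    """
    num_channels = waveforms.shape[2]
    features = []
    for ch in range(num_channels):
        wf = waveforms[:, :, ch]
        # Energy: square root of the sum of squares of the samples.
        energy = np.sqrt(np.sum(wf ** 2, axis=1))
        # Energy-normalised waveform (all-zero waveforms are left unchanged).
        normalised = wf / np.where(energy > 0, energy, 1.0)[:, None]
        # PCA via SVD of the mean-centred data; rows of vt are the PC axes.
        centred = normalised - normalised.mean(axis=0)
        _, _, vt = np.linalg.svd(centred, full_matrices=False)
        scores = centred @ vt[:n_pcs].T
        features.append(np.column_stack([energy, scores]))
    return np.hstack(features)
```

For a tetrode (4 channels) this returns 16 columns per spike, ordered channel by channel as energy followed by the 3 PC scores.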

Define, for each cluster \(C\), \(D_{i,C}^2\), the squared Mahalanobis distance from the centre of cluster \(C\) for every spike \(i\) in the dataset (similarly to the calculation for isolation distance above). Assume that spikes in the cluster distribute normally in each dimension, so that \(D^2\) for spikes in a cluster will distribute as \(\chi^2\) with 16 degrees of freedom. This yields \(\textrm{CDF}_{\chi^2_{\mathrm{df}}}\), the cumulative distribution function of the \(\chi^2\) distribution (here with \(\mathrm{df} = 16\)). Define for each cluster \(C\), the value \(L(C)\), representing the amount of contamination of the cluster \(C\):

\[L(C) = \sum_{i \notin C} \left(1 - \mathrm{CDF}_{\chi^2_{\mathrm{df}}}(D^2_{i, C})\right)\]

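Each term \(1 - \mathrm{CDF}_{\chi^2_{\mathrm{df}}}(D^2_{i, C})\), the complement of the cumulative distribution, is the probability that a spike truly belonging to cluster \(C\) would lie at least as far from the cluster centre as spike \(i\); it serves as the probability of cluster membership for spike \(i\). \(L\) is therefore the sum, over every spike that is not a member of cluster \(C\), of the probability that the spike should be a member. \(L\) is then normalised by the number of spikes \(N_s\) in \(C\) to allow larger clusters to tolerate more contamination. This yields L-ratio, which can be expressed as: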

\[L_{\mathrm{ratio}}(C) = \frac{L(C)}{N_s}\]
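The two formulas above translate fairly directly into code. The following is a minimal sketch, assuming SciPy is available and that the number of degrees of freedom equals the number of feature columns (16 in the tetrode example); the function name `l_ratio_sketch` is hypothetical and this is not presented as the library's implementation. `scipy.stats.chi2.sf` is the survival function, i.e. \(1 - \mathrm{CDF}\).

```python
import numpy as np
from scipy.stats import chi2


def l_ratio_sketch(all_pcs, all_labels, unit_id):
    """L(C) / N_s for one cluster (hypothetical helper)."""
    in_cluster = all_labels == unit_id
    pcs_in = all_pcs[in_cluster]    # spikes within C
    pcs_out = all_pcs[~in_cluster]  # spikes outside of C
    n_s = len(pcs_in)
    if n_s < 2:
        return np.nan  # assumption: covariance is undefined for fewer than 2 spikes

    mu_c = pcs_in.mean(axis=0)
    sigma_c = np.atleast_2d(np.cov(pcs_in, rowvar=False))

    # Squared Mahalanobis distance of every outside spike from the centre of C.
    diff = pcs_out - mu_c
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma_c, diff.T).T)

    df = all_pcs.shape[1]  # degrees of freedom = number of feature dimensions
    # Sum of (1 - CDF) over spikes outside C; sf is more accurate in the far tail.
    l_value = np.sum(chi2.sf(d2, df))
    return l_value / n_s
```

Spikes far from the cluster contribute almost nothing to the sum, so a well separated cluster ends up with an L-ratio close to zero.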

Expectation and use

Isolation distance can be interpreted as a measure of distance from the cluster to the nearest other cluster. A well isolated unit should have a large isolation distance.

L-ratio quantifies unit separation, so a high value indicates a highly contaminated unit (type I error) ([Schmitzer-Torbert] et al.). [Jackson] et al. suggest that this measure is also correlated with type II errors (although more strongly with type I errors).

A well separated unit should have a low L-ratio ([Schmitzer-Torbert] et al.).

Example code

from spikeinterface.metrics.quality.pca_metrics import mahalanobis_metrics

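# all_pcs: 2d array of PCs for all spikes, organized as [num_spikes, PCs]
# all_labels: 1d array of cluster labels, one per spike
isolation_distance, l_ratio = mahalanobis_metrics(all_pcs=all_pcs, all_labels=all_labels, this_unit_id=0)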

References

spikeinterface.metrics.quality.pca_metrics.mahalanobis_metrics(all_pcs, all_labels, this_unit_id)

Calculate isolation distance and L-ratio (metrics computed from Mahalanobis distance).

Parameters:
all_pcs : 2d array

The PCs for all spikes, organized as [num_spikes, PCs].

all_labels : 1d array

The cluster labels for all spikes. Must have length of number of spikes.

this_unit_id : int

The ID for the unit to calculate these metrics for.

Returns:
isolation_distance : float

Isolation distance of this unit.

l_ratio : float

L-ratio for this unit.

References

Based on metrics described in [Schmitzer-Torbert] et al.

Literature

Isolation distance was introduced by [Harris]. L-ratio was introduced by [Schmitzer-Torbert] et al. An early discussion and comparison with isolation distance is given by [Jackson] et al.