pingouin.corr

pingouin.corr(x, y, tail='two-sided', method='pearson')[source]

(Robust) correlation between two variables.

Parameters
x, y : array_like

First and second set of observations. x and y must be independent.

tail : string

Specify whether to return ‘one-sided’ or ‘two-sided’ p-value.

method : string

Specify which method to use for the computation of the correlation coefficient. Available methods are:

'pearson' : Pearson product-moment correlation
'spearman' : Spearman rank-order correlation
'kendall' : Kendall’s tau (ordinal data)
'percbend' : percentage bend correlation (robust)
'shepherd' : Shepherd's pi correlation (robust Spearman)
'skipped' : skipped correlation (robust Spearman, requires sklearn)
Returns
stats : pandas DataFrame

Test summary

'n' : Sample size (after NaN removal)
'outliers' : Number of outliers (only for 'shepherd' or 'skipped')
'r' : Correlation coefficient
'CI95' : 95% parametric confidence intervals
'r2' : R-squared
'adj_r2' : Adjusted R-squared
'p-val' : One- or two-tailed p-value
'BF10' : Bayes Factor of the alternative hypothesis (Pearson only)
'power' : Achieved power of the test (= 1 - type II error)
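For reference, a 95% parametric confidence interval around a correlation coefficient is conventionally obtained via Fisher's z-transform. The sketch below (plain Python, not Pingouin's internal code) shows the standard formula; for the first Pearson example below (r = 0.491, n = 30) it reproduces the reported interval:

```python
import math

def fisher_ci(r, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for a correlation
    coefficient using Fisher's z-transform (assumes bivariate normality)."""
    z = math.atanh(r)               # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)     # standard error of z
    zcrit = 1.959964                # ~97.5th percentile of the standard normal
    lo_z, hi_z = z - zcrit * se, z + zcrit * se
    return math.tanh(lo_z), math.tanh(hi_z)  # back-transform to r scale

lo, hi = fisher_ci(0.491, 30)
print(round(lo, 2), round(hi, 2))   # 0.16 0.72
```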

See also

pairwise_corr

Pairwise correlation between columns of a pandas DataFrame

partial_corr

Partial correlation

Notes

The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson's correlation requires that each dataset be normally distributed. Correlations of -1 or +1 imply an exact linear relationship.
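As a quick illustration of that last point (using only NumPy, not Pingouin), an exact linear relationship yields a Pearson correlation of exactly +1 or -1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x - 2.0                       # exact positive linear relationship
r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))                      # 1.0

y_neg = -3.0 * x + 2.0                  # exact negative linear relationship
r_neg = np.corrcoef(x, y_neg)[0, 1]
print(round(r_neg, 6))                  # -1.0
```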

The Spearman correlation is a nonparametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Correlations of -1 or +1 imply an exact monotonic relationship.
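The distinction can be seen with a monotonic but non-linear relationship. Spearman's rho is simply the Pearson correlation of the rank-transformed data (when there are no ties), so it equals 1 for any strictly increasing relationship, while Pearson's r falls below 1. A NumPy-only sketch:

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman's rho as the Pearson correlation of the ranks
    (valid when the data contain no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

x = np.arange(1.0, 11.0)
y = x ** 3                              # monotonic but non-linear
rho = spearman_rho(x, y)
pear = np.corrcoef(x, y)[0, 1]
print(round(rho, 6))                    # 1.0 (exact monotonic relationship)
print(pear < 1.0)                       # True (relationship is not linear)
```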

Kendall’s tau is a measure of the correspondence between two rankings. Values close to 1 indicate strong agreement, values close to -1 indicate strong disagreement.
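Conceptually, Kendall's tau compares every pair of observations and counts whether the two rankings agree (concordant) or disagree (discordant) on their order. The sketch below implements the simplest tie-free variant (tau-a) in plain Python; Pingouin and SciPy use a tie-corrected variant, but the idea is the same:

```python
from itertools import combinations

def kendall_tau_a(a, b):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.
    Ignores ties; real implementations apply a tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1             # both rankings order i, j the same way
        elif s < 0:
            discordant += 1             # the rankings disagree on i, j
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs

tau_same = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])
tau_rev = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
print(tau_same, tau_rev)                # 1.0 -1.0
```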

The percentage bend correlation [1] is a robust method that protects against univariate outliers.

The Shepherd's pi [2] and skipped [3], [4] correlations are both robust methods that return Spearman's rho after removal of bivariate outliers. Note that the skipped correlation requires the scikit-learn package (used to compute the minimum covariance determinant).

Please note that rows with NaN are automatically removed.
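This listwise removal can be mimicked with plain NumPy: a row is dropped whenever either observation is NaN, and the correlation is computed on the remaining pairs (a sketch of the behavior, not Pingouin's internal code):

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([2.0, np.nan, 6.0, 8.0, 10.0])

keep = ~(np.isnan(x) | np.isnan(y))     # rows where both values are present
x_clean, y_clean = x[keep], y[keep]
print(len(x_clean))                     # 3 (two rows dropped listwise)

r_clean = np.corrcoef(x_clean, y_clean)[0, 1]
print(round(r_clean, 6))                # 1.0 here, since y = 2x on kept rows
```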

If method='pearson', the Bayes Factor is computed using the pingouin.bayesfactor_pearson() function.

References

[1]

Wilcox, R.R., 1994. The percentage bend correlation coefficient. Psychometrika 59, 601–616. https://doi.org/10.1007/BF02294395

[2]

Schwarzkopf, D.S., De Haas, B., Rees, G., 2012. Better ways to improve standards in brain-behavior correlation analysis. Front. Hum. Neurosci. 6, 200. https://doi.org/10.3389/fnhum.2012.00200

[3]

Rousselet, G.A., Pernet, C.R., 2012. Improving standards in brain-behavior correlation analyses. Front. Hum. Neurosci. 6, 119. https://doi.org/10.3389/fnhum.2012.00119

[4]

Pernet, C.R., Wilcox, R., Rousselet, G.A., 2012. Robust correlation analyses: false positive and power validation using a new open source matlab toolbox. Front. Psychol. 3, 606. https://doi.org/10.3389/fpsyg.2012.00606

Examples

  1. Pearson correlation

>>> import numpy as np
>>> # Generate random correlated samples
>>> np.random.seed(123)
>>> mean, cov = [4, 6], [(1, .5), (.5, 1)]
>>> x, y = np.random.multivariate_normal(mean, cov, 30).T
>>> # Compute Pearson correlation
>>> from pingouin import corr
>>> corr(x, y)
          n      r         CI95%     r2  adj_r2     p-val  BF10  power
pearson  30  0.491  [0.16, 0.72]  0.242   0.185  0.005813  8.55  0.809
  2. Pearson correlation with two outliers

>>> x[3], y[5] = 12, -8
>>> corr(x, y)
          n      r          CI95%     r2  adj_r2     p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051  0.439148  0.302  0.121
  3. Spearman correlation

>>> corr(x, y, method="spearman")
           n      r         CI95%     r2  adj_r2     p-val  power
spearman  30  0.401  [0.05, 0.67]  0.161   0.099  0.028034   0.61
  4. Percentage bend correlation (robust)

>>> corr(x, y, method='percbend')
           n      r         CI95%     r2  adj_r2     p-val  power
percbend  30  0.389  [0.03, 0.66]  0.151   0.089  0.033508  0.581
  5. Shepherd's pi correlation (robust)

>>> corr(x, y, method='shepherd')
           n  outliers      r         CI95%     r2  adj_r2     p-val  power
shepherd  30         2  0.437  [0.09, 0.69]  0.191   0.131  0.020128  0.694
  6. Skipped Spearman correlation (robust)

>>> corr(x, y, method='skipped')
          n  outliers      r         CI95%     r2  adj_r2     p-val  power
skipped  30         2  0.437  [0.09, 0.69]  0.191   0.131  0.020128  0.694
  7. One-tailed Pearson correlation

>>> corr(x, y, tail="one-sided", method='pearson')
          n      r          CI95%     r2  adj_r2     p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051  0.219574  0.467  0.194
  8. Using columns of a pandas DataFrame

>>> import pandas as pd
>>> data = pd.DataFrame({'x': x, 'y': y})
>>> corr(data['x'], data['y'])
          n      r          CI95%     r2  adj_r2     p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051  0.439148  0.302  0.121