# pingouin.corr

pingouin.corr(x, y, tail='two-sided', method='pearson')

(Robust) correlation between two variables.

Parameters
x, y : array_like

First and second set of observations. x and y must be independent.

tail : string

Specify whether to return a 'one-sided' or 'two-sided' p-value. Note that the former is simply half the latter.

method : string

Correlation type:

• 'pearson': Pearson $$r$$ product-moment correlation

• 'spearman': Spearman $$\rho$$ rank-order correlation

• 'kendall': Kendall’s $$\tau$$ correlation (for ordinal data)

• 'bicor': Biweight midcorrelation (robust)

• 'percbend': Percentage bend correlation (robust)

• 'shepherd': Shepherd’s pi correlation (robust)

• 'skipped': Skipped correlation (robust)

Returns
stats : pandas.DataFrame
• 'n': Sample size (after removal of missing values)

• 'outliers': Number of outliers (only if a robust method was used)

• 'r': Correlation coefficient

• 'CI95': 95% parametric confidence intervals around $$r$$

• 'r2': R-squared ($$= r^2$$)

• 'adj_r2': Adjusted R-squared

• 'p-val': One- or two-tailed p-value (depending on tail)

• 'BF10': Bayes Factor of the alternative hypothesis (only for Pearson correlation)

• 'power': achieved power of the test (= 1 - type II error).

See also

pairwise_corr

Pairwise correlation between columns of a pandas DataFrame

partial_corr

Partial correlation

rm_corr

Repeated measures correlation

Notes

The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each dataset be normally distributed. Correlations of -1 or +1 imply a perfect negative and positive linear relationship, respectively, with 0 indicating the absence of association.

$r_{xy} = \frac{\sum_i(x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum_i(x_i - \bar{x})^2} \sqrt{\sum_i(y_i - \bar{y})^2}} = \frac{\text{cov}(x, y)}{\sigma_x \sigma_y}$

where $$\text{cov}$$ is the sample covariance and $$\sigma$$ is the sample standard deviation.
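As a minimal NumPy sketch (not pingouin's implementation), the formula above can be computed directly and checked against NumPy's built-in correlation matrix:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r computed directly from the definition above."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()  # center both variables
    return np.sum(xm * ym) / np.sqrt(np.sum(xm**2) * np.sum(ym**2))

rng = np.random.default_rng(42)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

# Agrees with the off-diagonal entry of np.corrcoef
print(np.isclose(pearson_r(x, y), np.corrcoef(x, y)[0, 1]))
```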

If method='pearson', the Bayes Factor is calculated using the pingouin.bayesfactor_pearson() function.

The Spearman correlation coefficient is a non-parametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Correlations of -1 or +1 imply an exact negative and positive monotonic relationship, respectively. Mathematically, the Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables.
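The "Pearson of the ranks" definition can be sketched in plain NumPy (assuming no ties, so a double argsort yields valid ranks; this is an illustration, not pingouin's code path):

```python
import numpy as np

def rank_no_ties(a):
    # Double argsort gives 0-based ranks; valid only when there are no ties
    return a.argsort().argsort().astype(float)

rng = np.random.default_rng(7)
x = rng.normal(size=40)           # continuous draws: ties almost surely absent
y = x**3 + rng.normal(size=40)    # monotonic relationship plus noise

rx, ry = rank_no_ties(x), rank_no_ties(y)
# Spearman rho is the Pearson correlation of the rank variables
rho = np.corrcoef(rx, ry)[0, 1]
```

With no ties, this is identical to the classical formula rho = 1 - 6*sum(d^2)/(n*(n^2 - 1)), where d are the rank differences.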

The Kendall correlation coefficient is a measure of the correspondence between two rankings. Values also range from -1 (perfect disagreement) to 1 (perfect agreement), with 0 indicating the absence of association. Consistent with scipy.stats.kendalltau(), Pingouin returns the Tau-b coefficient, which adjusts for ties:

$\tau_B = \frac{(P - Q)}{\sqrt{(P + Q + T) (P + Q + U)}}$

where $$P$$ is the number of concordant pairs, $$Q$$ the number of discordant pairs, $$T$$ the number of ties only in x, and $$U$$ the number of ties only in y.
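The pair counting behind Tau-b can be sketched with a brute-force O(n²) loop (library implementations use faster algorithms; this illustrative version follows the formula above, with pairs tied in both variables counted in neither $$T$$ nor $$U$$):

```python
import numpy as np

def kendall_tau_b(x, y):
    """Tau-b by brute-force pair counting (O(n^2); fine for small n)."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    P = Q = T = U = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue        # tied in both: counted in neither T nor U
            elif dx == 0:
                T += 1          # tie in x only
            elif dy == 0:
                U += 1          # tie in y only
            elif dx * dy > 0:
                P += 1          # concordant pair
            else:
                Q += 1          # discordant pair
    return (P - Q) / np.sqrt((P + Q + T) * (P + Q + U))

x = [1, 2, 2, 3, 4]
y = [1, 3, 2, 2, 4]
tau = kendall_tau_b(x, y)   # P=7, Q=1, T=1, U=1 -> 6/9
```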

The biweight midcorrelation and percentage bend correlation are both robust methods that protect against univariate outliers by down-weighting observations that deviate too much from the median.

The Shepherd's pi correlation and skipped correlation are both robust methods that return the Spearman correlation coefficient after removing bivariate outliers. Briefly, the Shepherd's pi uses a bootstrapping of the Mahalanobis distance to identify outliers, while the skipped correlation is based on the minimum covariance determinant (which requires scikit-learn). Note that these two methods are significantly slower than the previous ones.
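The idea of flagging bivariate outliers by Mahalanobis distance can be sketched in NumPy. This is a plain, non-bootstrapped illustration: pingouin's Shepherd's pi bootstraps this distance, and the cutoff `crit=2.45` here is an illustrative value (roughly the square root of the chi-square(2) 95% quantile), not pingouin's criterion:

```python
import numpy as np

def mahalanobis_flags(x, y, crit=2.45):
    """Flag bivariate outliers by Mahalanobis distance from the centroid.

    Illustrative sketch: classical mean/covariance, no bootstrapping;
    `crit` is a hypothetical cutoff, not pingouin's actual criterion.
    """
    X = np.column_stack([x, y])
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    # Squared Mahalanobis distance of each row, then take the square root
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))
    return d > crit

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 0.6 * x + rng.normal(scale=0.8, size=30)
x[0], y[0] = 8.0, -8.0          # plant one obvious bivariate outlier
mask = mahalanobis_flags(x, y)  # boolean array; True marks flagged rows
```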

Important

Please note that rows with missing values (NaN) are automatically removed.
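This listwise deletion can be illustrated in NumPy (a sketch of the masking step, not pingouin's internal code):

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([2.0, np.nan, 3.0, 5.0, 6.0])

# Keep only rows where both x and y are observed
keep = ~(np.isnan(x) | np.isnan(y))
x_clean, y_clean = x[keep], y[keep]
print(len(x_clean))  # 3 complete pairs remain
```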

References

1. Wilcox, R.R., 1994. The percentage bend correlation coefficient. Psychometrika 59, 601–616. https://doi.org/10.1007/BF02294395

2. Schwarzkopf, D.S., De Haas, B., Rees, G., 2012. Better ways to improve standards in brain-behavior correlation analysis. Front. Hum. Neurosci. 6, 200. https://doi.org/10.3389/fnhum.2012.00200

3. Rousselet, G.A., Pernet, C.R., 2012. Improving standards in brain-behavior correlation analyses. Front. Hum. Neurosci. 6, 119. https://doi.org/10.3389/fnhum.2012.00119

4. Pernet, C.R., Wilcox, R., Rousselet, G.A., 2012. Robust correlation analyses: false positive and power validation using a new open source matlab toolbox. Front. Psychol. 3, 606. https://doi.org/10.3389/fpsyg.2012.00606

Examples

1. Pearson correlation

>>> import numpy as np
>>> import pingouin as pg
>>> # Generate random correlated samples
>>> np.random.seed(123)
>>> mean, cov = [4, 6], [(1, .5), (.5, 1)]
>>> x, y = np.random.multivariate_normal(mean, cov, 30).T
>>> # Compute Pearson correlation
>>> pg.corr(x, y).round(3)
          n      r         CI95%     r2  adj_r2  p-val  BF10  power
pearson  30  0.491  [0.16, 0.72]  0.242   0.185  0.006  8.55  0.809

2. Pearson correlation with two outliers

>>> x[3], y[5] = 12, -8
>>> pg.corr(x, y).round(3)
          n      r          CI95%     r2  adj_r2  p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051  0.439  0.302  0.121

3. Spearman correlation (robust to outliers)

>>> pg.corr(x, y, method="spearman").round(3)
           n      r         CI95%     r2  adj_r2  p-val  power
spearman  30  0.401  [0.05, 0.67]  0.161   0.099  0.028   0.61

4. Biweight midcorrelation (robust)

>>> pg.corr(x, y, method="bicor").round(3)
        n      r         CI95%     r2  adj_r2  p-val  power
bicor  30  0.393  [0.04, 0.66]  0.155   0.092  0.031  0.592

5. Percentage bend correlation (robust)

>>> pg.corr(x, y, method='percbend').round(3)
           n      r         CI95%     r2  adj_r2  p-val  power
percbend  30  0.389  [0.03, 0.66]  0.151   0.089  0.034  0.581

6. Shepherd’s pi correlation (robust)

>>> pg.corr(x, y, method='shepherd').round(3)
           n  outliers      r         CI95%     r2  adj_r2  p-val  power
shepherd  30         2  0.437  [0.09, 0.69]  0.191   0.131   0.02  0.694

7. Skipped Spearman correlation (robust)

>>> pg.corr(x, y, method='skipped').round(3)
          n  outliers      r         CI95%     r2  adj_r2  p-val  power
skipped  30         2  0.437  [0.09, 0.69]  0.191   0.131   0.02  0.694

8. One-tailed Pearson correlation

>>> pg.corr(x, y, tail="one-sided", method='pearson').round(3)
          n      r          CI95%     r2  adj_r2  p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051   0.22  0.467  0.194

9. Using columns of a pandas dataframe

>>> import pandas as pd
>>> data = pd.DataFrame({'x': x, 'y': y})
>>> pg.corr(data['x'], data['y']).round(3)
          n      r          CI95%     r2  adj_r2  p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051  0.439  0.302  0.121