pingouin.corr

pingouin.corr(x, y, tail='two-sided', method='pearson', **kwargs)
(Robust) correlation between two variables.
Parameters
    x, y : array_like
        First and second set of observations. x and y must be independent.
    tail : string
        Specify whether to return the 'one-sided' or 'two-sided' p-value. Note that the former is simply half the latter.
    method : string
        Correlation type:
        'pearson' : Pearson \(r\) product-moment correlation
        'spearman' : Spearman \(\rho\) rank-order correlation
        'kendall' : Kendall’s \(\tau_B\) correlation (for ordinal data)
        'bicor' : Biweight midcorrelation (robust)
        'percbend' : Percentage bend correlation (robust)
        'shepherd' : Shepherd’s pi correlation (robust)
        'skipped' : Skipped correlation (robust)
    **kwargs : optional
        Optional argument(s) passed to the lower-level functions.
Returns
    stats : pandas.DataFrame
        'n' : Sample size (after removal of missing values)
        'outliers' : Number of outliers, only if a robust method was used
        'r' : Correlation coefficient
        'CI95%' : 95% parametric confidence intervals around \(r\)
        'r2' : R-squared (\(= r^2\))
        'adj_r2' : Adjusted R-squared
        'p-val' : p-value of the test (one- or two-sided, depending on tail)
        'BF10' : Bayes Factor of the alternative hypothesis (only for Pearson correlation)
        'power' : Achieved power of the test (= 1 - type II error)
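The page does not say how the "parametric" confidence interval around \(r\) is obtained. As a rough sketch, assuming the usual Fisher z-transformation with a normal approximation, the interval can be computed as below. The helper name fisher_ci is hypothetical and is not part of Pingouin.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate CI for a correlation r from n pairs (assumed Fisher z method)."""
    z = atanh(r)                      # Fisher z-transform of r
    se = 1.0 / sqrt(n - 3)            # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # e.g. about 1.96 for 95%
    # Back-transform the interval bounds from the z scale to the r scale
    return tanh(z - crit * se), tanh(z + crit * se)


# r and n taken from the first Pearson example on this page
lo, hi = fisher_ci(0.491, 30)
print(round(lo, 2), round(hi, 2))
```

Under this assumption the result rounds to the interval reported for the first Pearson example (r = 0.491, n = 30).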
See also
    pairwise_corr : Pairwise correlation between columns of a pandas DataFrame
    partial_corr : Partial correlation
    rm_corr : Repeated measures correlation
Notes
The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each dataset be normally distributed. Correlations of -1 or +1 imply a perfect negative and positive linear relationship, respectively, with 0 indicating the absence of association.
\[r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \sqrt{\sum_i (y_i - \bar{y})^2}} = \frac{\text{cov}(x, y)}{\sigma_x \sigma_y}\]
where \(\text{cov}\) is the sample covariance and \(\sigma\) is the sample standard deviation.
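To make the Pearson formula concrete, here is a minimal NumPy sketch that evaluates the centred sums directly and compares the result with numpy.corrcoef. The function name pearson_r and the toy data are illustrative only; this is not Pingouin's implementation.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson r computed from the centred sums of products and squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                 # deviations from the mean of x
    dy = y - y.mean()                 # deviations from the mean of y
    numerator = (dx * dy).sum()
    denominator = np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())
    return float(numerator / denominator)


# Toy data, chosen so the sums are easy to check by hand
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

r_formula = pearson_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])
print(round(r_formula, 3), round(r_numpy, 3))  # 0.8 0.8
```

Here the sum of cross-products is 8 and both sums of squares are 10, so r = 8 / 10 = 0.8.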
If method='pearson', the Bayes Factor is calculated using the pingouin.bayesfactor_pearson() function.
The Spearman correlation coefficient is a non-parametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Correlations of -1 or +1 imply an exact negative and positive monotonic relationship, respectively. Mathematically, the Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables.
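The definition of Spearman's coefficient as a Pearson correlation between rank variables can be sketched with NumPy alone. The helper average_ranks is hypothetical; it assigns tied values the mean of their ranks, which is the common convention but is an assumption here.

```python
import numpy as np


def average_ranks(a):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(a.size, dtype=float)
    ranks[order] = np.arange(1, a.size + 1)        # ordinal ranks
    # Replace the ordinal ranks of tied values by their group average
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman_rho(x, y):
    """Spearman rho as the Pearson correlation between the rank variables."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# A monotonic but non-linear relationship (y = x cubed), with one tie
x = np.array([10.0, 20.0, 20.0, 40.0, 50.0])
y = x ** 3

rho = spearman_rho(x, y)
r_raw = float(np.corrcoef(x, y)[0, 1])
print(average_ranks(x))
print(round(rho, 3), r_raw < 1.0)
```

Because the relationship is monotonic, the two rank vectors are identical and rho is 1, while the Pearson coefficient on the raw values is below 1.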
The Kendall correlation coefficient is a measure of the correspondence between two rankings. Values also range from -1 (perfect disagreement) to 1 (perfect agreement), with 0 indicating the absence of association. Consistent with scipy.stats.kendalltau(), Pingouin returns the Tau-b coefficient, which adjusts for ties:
\[\tau_B = \frac{(P - Q)}{\sqrt{(P + Q + T) (P + Q + U)}}\]
where \(P\) is the number of concordant pairs, \(Q\) the number of discordant pairs, \(T\) the number of ties in x, and \(U\) the number of ties in y.
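The Tau-b formula can be illustrated with a brute-force pair count in plain Python. This is a sketch for small inputs, not the algorithm used by SciPy or Pingouin. It assumes that T counts pairs tied only in x, that U counts pairs tied only in y, and that pairs tied in both are ignored; the function name kendall_tau_b is hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Tau-b from explicit counts of concordant, discordant and tied pairs."""
    P = Q = T = U = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue        # tied in both variables: not counted (assumption)
        elif dx == 0:
            T += 1          # tied in x only
        elif dy == 0:
            U += 1          # tied in y only
        elif dx * dy > 0:
            P += 1          # concordant pair
        else:
            Q += 1          # discordant pair
    # Undefined (division by zero) if every pair is tied in x or in y
    return (P - Q) / sqrt((P + Q + T) * (P + Q + U))


tau_ties = kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3])
tau_reversed = kendall_tau_b([1, 2, 3, 4], [4, 3, 2, 1])
print(tau_ties, tau_reversed)  # 0.8 -1.0
```

In the first call there are 4 concordant pairs, no discordant pairs, and one tie in each variable, giving 4 / sqrt(5 * 5) = 0.8.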
The biweight midcorrelation and percentage bend correlation [1] are both robust methods that protect against univariate outliers by down-weighting observations that deviate too much from the median.
The Shepherd pi [2] correlation and skipped [3], [4] correlation are both robust methods that return the Spearman correlation coefficient after removing bivariate outliers. Briefly, the Shepherd pi uses a bootstrapping of the Mahalanobis distance to identify outliers, while the skipped correlation is based on the minimum covariance determinant (which requires scikit-learn). Note that these two methods are significantly slower than the previous ones.
Important
Please note that rows with missing values (NaN) are automatically removed.
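As an illustration of what removing rows with missing values means for paired data, the sketch below drops every pair in which either value is NaN. It shows the general idea only and makes no claim about how Pingouin implements it internally.

```python
import numpy as np

# Paired observations with missing values in different positions
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([2.0, np.nan, 6.0, 8.0, 10.0])

# Keep a pair only if both of its values are present
keep = ~(np.isnan(x) | np.isnan(y))
x_clean, y_clean = x[keep], y[keep]

n = int(keep.sum())  # sample size after removal
print(n, x_clean, y_clean)
```

Only three complete pairs remain here, which is the kind of count reported in the 'n' column of the output.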
References
[1] Wilcox, R.R., 1994. The percentage bend correlation coefficient. Psychometrika 59, 601–616. https://doi.org/10.1007/BF02294395
[2] Schwarzkopf, D.S., De Haas, B., Rees, G., 2012. Better ways to improve standards in brain-behavior correlation analysis. Front. Hum. Neurosci. 6, 200. https://doi.org/10.3389/fnhum.2012.00200
[3] Rousselet, G.A., Pernet, C.R., 2012. Improving standards in brain-behavior correlation analyses. Front. Hum. Neurosci. 6, 119. https://doi.org/10.3389/fnhum.2012.00119
[4] Pernet, C.R., Wilcox, R., Rousselet, G.A., 2012. Robust correlation analyses: false positive and power validation using a new open source Matlab toolbox. Front. Psychol. 3, 606. https://doi.org/10.3389/fpsyg.2012.00606
Examples
Pearson correlation
>>> import numpy as np
>>> import pingouin as pg
>>> # Generate random correlated samples
>>> np.random.seed(123)
>>> mean, cov = [4, 6], [(1, .5), (.5, 1)]
>>> x, y = np.random.multivariate_normal(mean, cov, 30).T
>>> # Compute Pearson correlation
>>> pg.corr(x, y).round(3)
          n      r         CI95%     r2  adj_r2  p-val  BF10  power
pearson  30  0.491  [0.16, 0.72]  0.242   0.185  0.006  8.55  0.809
Pearson correlation with two outliers
>>> x[3], y[5] = 12, -8
>>> pg.corr(x, y).round(3)
          n      r          CI95%     r2  adj_r2  p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051  0.439  0.302  0.121
Spearman correlation (robust to outliers)
>>> pg.corr(x, y, method="spearman").round(3)
           n      r         CI95%     r2  adj_r2  p-val  power
spearman  30  0.401  [0.05, 0.67]  0.161   0.099  0.028   0.61
Biweight midcorrelation (robust)
>>> pg.corr(x, y, method="bicor").round(3)
        n      r         CI95%     r2  adj_r2  p-val  power
bicor  30  0.393  [0.04, 0.66]  0.155   0.092  0.031  0.592
Percentage bend correlation (robust)
>>> pg.corr(x, y, method='percbend').round(3)
           n      r         CI95%     r2  adj_r2  p-val  power
percbend  30  0.389  [0.03, 0.66]  0.151   0.089  0.034  0.581
Shepherd’s pi correlation (robust)
>>> pg.corr(x, y, method='shepherd').round(3)
           n  outliers      r         CI95%     r2  adj_r2  p-val  power
shepherd  30         2  0.437  [0.09, 0.69]  0.191   0.131   0.02  0.694
Skipped Spearman correlation (robust)
>>> pg.corr(x, y, method='skipped').round(3)
          n  outliers      r         CI95%     r2  adj_r2  p-val  power
skipped  30         2  0.437  [0.09, 0.69]  0.191   0.131   0.02  0.694
One-tailed Pearson correlation
>>> pg.corr(x, y, tail="one-sided", method='pearson').round(3)
          n      r          CI95%     r2  adj_r2  p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051   0.22  0.467  0.194
Using columns of a pandas dataframe
>>> import pandas as pd
>>> data = pd.DataFrame({'x': x, 'y': y})
>>> pg.corr(data['x'], data['y']).round(3)
          n      r          CI95%     r2  adj_r2  p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.022  -0.051  0.439  0.302  0.121