pingouin.corr

pingouin.corr(x, y, alternative='two-sided', method='pearson', **kwargs)
(Robust) correlation between two variables.
 Parameters
 x, y : array_like
First and second set of observations. x and y must be independent.
 alternative : string
Defines the alternative hypothesis, or tail of the correlation. Must be one of “two-sided” (default), “greater” or “less”. Both “greater” and “less” return a one-sided p-value. “greater” tests against the alternative hypothesis that the correlation is positive (greater than zero), “less” tests against the alternative hypothesis that the correlation is negative.
 method : string
Correlation type:
'pearson': Pearson \(r\) product-moment correlation
'spearman': Spearman \(\rho\) rank-order correlation
'kendall': Kendall’s \(\tau_B\) correlation (for ordinal data)
'bicor': Biweight midcorrelation (robust)
'percbend': Percentage bend correlation (robust)
'shepherd': Shepherd’s pi correlation (robust)
'skipped': Skipped correlation (robust)
 **kwargs : optional
Optional argument(s) passed to the lower-level correlation functions.
 Returns
 stats : pandas.DataFrame
'n': Sample size (after removal of missing values)
'outliers': Number of outliers, only if a robust method was used
'r': Correlation coefficient
'CI95%': 95% parametric confidence intervals around \(r\)
'p-val': p-value
'BF10': Bayes Factor of the alternative hypothesis (only for Pearson correlation)
'power': Achieved power of the test with an alpha of 0.05
See also
pairwise_corr
Pairwise correlation between columns of a pandas DataFrame
partial_corr
Partial correlation
rm_corr
Repeated measures correlation
Notes
The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each dataset be normally distributed. Correlations of -1 or +1 imply a perfect negative and positive linear relationship, respectively, with 0 indicating the absence of association.
\[r_{xy} = \frac{\sum_i(x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\sum_i(x_i - \bar{x})^2} \sqrt{\sum_i(y_i - \bar{y})^2}} = \frac{\text{cov}(x, y)}{\sigma_x \sigma_y}\]
where \(\text{cov}\) is the sample covariance and \(\sigma\) is the sample standard deviation.
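The formula above can be evaluated term by term. The following is a minimal standard-library sketch for illustration only; `pearson_r` is a hypothetical helper name and is not how Pingouin computes the coefficient internally.

```python
import math


def pearson_r(x, y):
    """Pearson r from the sum-of-deviations formula (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations from the mean: (x_i - x_bar) and (y_i - y_bar)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    # The denominator is zero (r undefined) if either input is constant
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return numerator / denominator


print(round(pearson_r([1, 2, 3], [1, 3, 2]), 6))  # 0.5
```

For the small example, the deviations are (-1, 0, 1) and (-1, 1, 0), so the numerator is 1 and the denominator is 2.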
If method='pearson', the Bayes Factor is calculated using the pingouin.bayesfactor_pearson() function.
The Spearman correlation coefficient is a nonparametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Correlations of -1 or +1 imply an exact negative and positive monotonic relationship, respectively. Mathematically, the Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables.
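The definition of Spearman's coefficient as a Pearson correlation between rank variables can be sketched directly. This is an illustrative standard-library example, assuming ties receive the average of the ranks they span; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical and not part of Pingouin.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson r from the sum-of-deviations formula."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


print(average_ranks([5, 6, 7, 8, 7]))  # [1.0, 2.0, 3.5, 5.0, 3.5]
print(round(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 4))  # 0.8208
```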
The Kendall correlation coefficient is a measure of the correspondence between two rankings. Values also range from -1 (perfect disagreement) to 1 (perfect agreement), with 0 indicating the absence of association. Consistent with scipy.stats.kendalltau(), Pingouin returns the Tau-b coefficient, which adjusts for ties:
\[\tau_B = \frac{(P - Q)}{\sqrt{(P + Q + T) (P + Q + U)}}\]
where \(P\) is the number of concordant pairs, \(Q\) the number of discordant pairs, \(T\) the number of ties in x, and \(U\) the number of ties in y.
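The Tau-b formula can be illustrated with a brute-force pair count. This sketch assumes that a pair tied in both x and y is added to neither \(T\) nor \(U\), which is the convention documented for scipy.stats.kendalltau(); `kendall_tau_b` is a hypothetical helper, and its O(n²) loop is meant for clarity, not speed.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b from explicit counts of P, Q, T and U (illustrative)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted in neither T nor U (assumption)
        elif dx == 0:
            ties_x += 1  # T: tied in x only
        elif dy == 0:
            ties_y += 1  # U: tied in y only
        elif dx * dy > 0:
            concordant += 1  # P: both differences have the same sign
        else:
            discordant += 1  # Q: differences have opposite signs
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt((pairs + ties_x) * (pairs + ties_y))


# P = 2, Q = 6, T = 1, U = 0, so tau_b = -4 / sqrt(72)
print(round(kendall_tau_b([12, 2, 1, 12, 2], [1, 4, 7, 1, 0]), 4))  # -0.4714
```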
The biweight midcorrelation and percentage bend correlation [1] are both robust methods that protect against univariate outliers by down-weighting observations that deviate too much from the median.
The Shepherd pi [2] correlation and skipped [3], [4] correlation are both robust methods that return the Spearman correlation coefficient after removing bivariate outliers. Briefly, the Shepherd pi uses a bootstrapping of the Mahalanobis distance to identify outliers, while the skipped correlation is based on the minimum covariance determinant (which requires scikit-learn). Note that these two methods are significantly slower than the previous ones.
The confidence intervals for the correlation coefficient are estimated using the Fisher transformation.
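As a rough illustration of a Fisher-transformation interval, the sketch below applies the textbook two-sided procedure: transform \(r\) with atanh, use a standard error of \(1/\sqrt{n - 3}\), and back-transform with tanh. The helper `fisher_ci` is hypothetical, and Pingouin's own routine may differ in details (for example, for one-sided alternatives).

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Two-sided confidence interval for r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    # Back-transform the interval bounds to the correlation scale
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lower, upper = fisher_ci(0.491, 30)
print(round(lower, 2), round(upper, 2))  # 0.16 0.72
```

With r = 0.491 and n = 30, this reproduces the [0.16, 0.72] interval reported in the first example of the Examples section.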
Important
Rows with missing values (NaN) are automatically removed.
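To make the effect of this removal concrete, the following sketch drops every pair in which either value is NaN, so the reported sample size 'n' counts complete pairs only. `drop_missing_pairs` is a hypothetical helper written for illustration; it is not Pingouin's internal function.

```python
import math


def drop_missing_pairs(x, y):
    """Keep only the (x_i, y_i) pairs in which neither value is NaN."""
    kept = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    xs = [a for a, _ in kept]
    ys = [b for _, b in kept]
    return xs, ys


x = [1.0, float("nan"), 3.0, 4.0]
y = [2.0, 5.0, float("nan"), 8.0]
print(drop_missing_pairs(x, y))  # ([1.0, 4.0], [2.0, 8.0])
```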
References
[1] Wilcox, R.R., 1994. The percentage bend correlation coefficient. Psychometrika 59, 601–616. https://doi.org/10.1007/BF02294395
[2] Schwarzkopf, D.S., De Haas, B., Rees, G., 2012. Better ways to improve standards in brain-behavior correlation analysis. Front. Hum. Neurosci. 6, 200. https://doi.org/10.3389/fnhum.2012.00200
[3] Rousselet, G.A., Pernet, C.R., 2012. Improving standards in brain-behavior correlation analyses. Front. Hum. Neurosci. 6, 119. https://doi.org/10.3389/fnhum.2012.00119
[4] Pernet, C.R., Wilcox, R., Rousselet, G.A., 2012. Robust correlation analyses: false positive and power validation using a new open source matlab toolbox. Front. Psychol. 3, 606. https://doi.org/10.3389/fpsyg.2012.00606
Examples
Pearson correlation
>>> import numpy as np
>>> import pingouin as pg
>>> # Generate random correlated samples
>>> np.random.seed(123)
>>> mean, cov = [4, 6], [(1, .5), (.5, 1)]
>>> x, y = np.random.multivariate_normal(mean, cov, 30).T
>>> # Compute Pearson correlation
>>> pg.corr(x, y).round(3)
          n      r         CI95%  p-val  BF10  power
pearson  30  0.491  [0.16, 0.72]  0.006  8.55  0.809
Pearson correlation with two outliers
>>> x[3], y[5] = 12, -8
>>> pg.corr(x, y).round(3)
          n      r          CI95%  p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.439  0.302  0.121
Spearman correlation (robust to outliers)
>>> pg.corr(x, y, method="spearman").round(3)
           n      r         CI95%  p-val  power
spearman  30  0.401  [0.05, 0.67]  0.028   0.61
Biweight midcorrelation (robust)
>>> pg.corr(x, y, method="bicor").round(3)
        n      r         CI95%  p-val  power
bicor  30  0.393  [0.04, 0.66]  0.031  0.592
Percentage bend correlation (robust)
>>> pg.corr(x, y, method='percbend').round(3)
           n      r         CI95%  p-val  power
percbend  30  0.389  [0.03, 0.66]  0.034  0.581
Shepherd’s pi correlation (robust)
>>> pg.corr(x, y, method='shepherd').round(3)
           n  outliers      r        CI95%  p-val  power
shepherd  30         2  0.437  [0.08, 0.7]   0.02  0.662
Skipped Spearman correlation (robust)
>>> pg.corr(x, y, method='skipped').round(3)
          n  outliers      r        CI95%  p-val  power
skipped  30         2  0.437  [0.08, 0.7]   0.02  0.662
One-tailed Pearson correlation
>>> pg.corr(x, y, alternative="greater", method='pearson').round(3)
          n      r         CI95%  p-val   BF10  power
pearson  30  0.147  [-0.17, 1.0]   0.22  0.467  0.194
>>> pg.corr(x, y, alternative="less", method='pearson').round(3)
          n      r         CI95%  p-val   BF10  power
pearson  30  0.147  [-1.0, 0.43]   0.78  0.137  0.008
Perfect correlation
>>> pg.corr(x, x).round(3)
          n    r       CI95%  p-val  BF10  power
pearson  30  1.0  [1.0, 1.0]    0.0   inf      1
Using columns of a pandas dataframe
>>> import pandas as pd
>>> data = pd.DataFrame({'x': x, 'y': y})
>>> pg.corr(data['x'], data['y']).round(3)
          n      r          CI95%  p-val   BF10  power
pearson  30  0.147  [-0.23, 0.48]  0.439  0.302  0.121