pingouin.partial_corr

pingouin.partial_corr(data=None, x=None, y=None, covar=None, x_covar=None, y_covar=None, alternative='two-sided', method='pearson')

Partial and semi-partial correlation.

Parameters
data : pandas.DataFrame

pandas DataFrame. This function can also be called directly as a pandas.DataFrame method, in which case this argument is not needed.

x, y : string

The two variables to correlate. Must be names of columns in data.

covar : string or list

Covariate(s). Must be names of columns in data. Use a list if there are two or more covariates.

x_covar : string or list

Covariate(s) for the x variable. This is used to compute semi-partial correlation (i.e. the effect of x_covar is removed from x but not from y). Only one of covar, x_covar and y_covar can be specified.

y_covar : string or list

Covariate(s) for the y variable. This is used to compute semi-partial correlation (i.e. the effect of y_covar is removed from y but not from x). Only one of covar, x_covar and y_covar can be specified.

alternative : string

Defines the alternative hypothesis, or tail of the partial correlation. Must be one of “two-sided” (default), “greater” or “less”. Both “greater” and “less” return a one-sided p-value. “greater” tests the alternative hypothesis that the partial correlation is positive (greater than zero); “less” tests the alternative hypothesis that the partial correlation is negative (less than zero).

method : string

Correlation type:

  • 'pearson': Pearson \(r\) product-moment correlation

  • 'spearman': Spearman \(\rho\) rank-order correlation

Returns
stats : pandas.DataFrame
  • 'n': Sample size (after removal of missing values)

  • 'r': Partial correlation coefficient

  • 'CI95%': 95% parametric confidence interval around \(r\)

  • 'p-val': p-value
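
The page does not say how the parametric confidence interval is built. As a rough sketch, assuming a Fisher z-transformed interval with standard error \(1/\sqrt{n - k - 3}\), where \(k\) is the number of covariates, the interval can be computed with the standard library alone. The helper name `fisher_ci` and the standard-error formula are assumptions made for illustration, not something this page confirms.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, k, confidence=0.95):
    """Two-sided confidence interval for a (partial) correlation.

    Assumption: Fisher z-transform with standard error 1 / sqrt(n - k - 3),
    where k is the number of covariates that were partialled out.
    """
    z = math.atanh(r)                      # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - k - 3)        # assumed standard error
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# r, n and k taken from the one-covariate example in the Examples section
lower, upper = fisher_ci(r=0.568, n=30, k=1)
print(round(lower, 2), round(upper, 2))
```

Under these assumptions the interval rounds to [0.25, 0.77], which agrees with the one-covariate example in the Examples section.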

Notes

Partial correlation [1] measures the degree of association between x and y, after removing the effect of one or more controlling variables (covar, or \(Z\)). Practically, this is achieved by calculating the correlation coefficient between the residuals of two linear regressions:

\[x \sim Z, y \sim Z\]

Like the correlation coefficient, the partial correlation coefficient takes on a value in the range from -1 to 1, where 1 indicates a perfect positive association, -1 a perfect negative association, and 0 no linear association.
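
The regression-based definition above can be sketched with NumPy on synthetic data. This illustrates the definition only; it is not Pingouin's implementation. The helper names `residualize` and `partial_corr_residuals` and the simulated variables are hypothetical.

```python
import numpy as np


def residualize(v, Z):
    """Residuals of an ordinary least-squares fit of v on Z, with intercept."""
    design = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


def partial_corr_residuals(x, y, Z):
    """Pearson correlation between the residuals of x ~ Z and y ~ Z."""
    return np.corrcoef(residualize(x, Z), residualize(y, Z))[0, 1]


# Synthetic data: x and y are related only through the shared covariate z
rng = np.random.default_rng(42)
z = rng.normal(size=500)
x = z + rng.normal(size=500)
y = z + rng.normal(size=500)

r_raw = np.corrcoef(x, y)[0, 1]              # inflated by the shared covariate
r_partial = partial_corr_residuals(x, y, z)  # association left after removing z
print(f"raw r = {r_raw:.3f}, partial r = {r_partial:.3f}")
```

Because x and y are linked only through z in this simulation, the raw correlation is clearly positive while the partial correlation stays close to zero.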

The semipartial correlation is similar to the partial correlation, with the exception that the set of controlling variables is only removed for either x or y, but not both.
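
In residual terms, the difference between the two quantities is which side gets residualized. The sketch below uses synthetic data and a hypothetical `residualize` helper. It follows the regression-based definition rather than Pingouin's own code path.

```python
import numpy as np


def residualize(v, Z):
    """Residuals of an ordinary least-squares fit of v on Z, with intercept."""
    design = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


# Synthetic data with two covariates that influence both x and y
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))
x = Z @ np.array([0.8, -0.5]) + rng.normal(size=300)
y = 0.6 * x + Z @ np.array([0.4, 0.7]) + rng.normal(size=300)

x_res = residualize(x, Z)
y_res = residualize(y, Z)

r_partial = np.corrcoef(x_res, y_res)[0, 1]  # Z removed from both (covar)
r_semi_x = np.corrcoef(x_res, y)[0, 1]       # Z removed from x only (x_covar)
r_semi_y = np.corrcoef(x, y_res)[0, 1]       # Z removed from y only (y_covar)
print(f"partial = {r_partial:.3f}, semi-partial on x = {r_semi_x:.3f}, "
      f"semi-partial on y = {r_semi_y:.3f}")
```

With the same covariate set, a semi-partial coefficient is never larger in magnitude than the partial coefficient, because the variable that is left untouched keeps the variance explained by the covariates.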

Pingouin uses the method described in [2] to calculate the (semi)partial correlation coefficients and associated p-values. This method is based on the inverse covariance matrix and is significantly faster than the traditional regression-based method. Results have been tested against the ppcor R package.
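
To illustrate the idea behind the inverse-covariance approach, the sketch below uses the textbook identity \(r_{ij \cdot \text{rest}} = -P_{ij} / \sqrt{P_{ii} P_{jj}}\), where \(P\) is the inverse of the covariance matrix, and checks it against the residual-based definition on synthetic data. It is not Pingouin's actual code. The function names and the use of `np.linalg.inv` are assumptions, and the sketch covers the partial case only.

```python
import numpy as np


def pcorr_from_precision(data):
    """Partial correlation of each pair of columns, given all other columns."""
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)            # inverse covariance matrix
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


def residualize(v, Z):
    """Residuals of an ordinary least-squares fit of v on Z, with intercept."""
    design = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


# Synthetic data: columns are x, y and two covariates
rng = np.random.default_rng(1)
Z = rng.normal(size=(250, 2))
x = Z @ np.array([0.5, 0.3]) + rng.normal(size=250)
y = 0.4 * x + Z @ np.array([-0.2, 0.6]) + rng.normal(size=250)
data = np.column_stack([x, y, Z])

r_precision = pcorr_from_precision(data)[0, 1]
r_residual = np.corrcoef(residualize(x, Z), residualize(y, Z))[0, 1]
print(f"precision-based = {r_precision:.6f}, residual-based = {r_residual:.6f}")
```

The two values agree up to floating-point error. The precision-matrix route needs a single matrix inversion to obtain every pairwise coefficient, whereas the residual route fits two regressions per pair.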

Important

Rows with missing values are automatically removed from data.
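
As a small illustration of row-wise removal, the NumPy sketch below drops every row with a missing value and reports the remaining sample size. The toy array is hypothetical, and the assumption that only the columns involved in the computation (x, y and the covariates) are checked is not confirmed by this page.

```python
import numpy as np

# Hypothetical data: columns are x, y and one covariate; NaN marks a missing value
values = np.array([
    [1.0, 2.0, 0.5],
    [2.0, np.nan, 1.5],
    [3.0, 1.0, np.nan],
    [4.0, 3.0, 2.5],
    [5.0, 5.0, 3.0],
])

complete = ~np.isnan(values).any(axis=1)  # True where a row has no missing value
clean = values[complete]
n = clean.shape[0]                        # sample size after removal
print(n)
```

The value of `n` computed this way corresponds to the 'n' column of the returned DataFrame, which is the sample size after removal of missing values.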

References

[1] https://en.wikipedia.org/wiki/Partial_correlation

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681537/

Examples

  1. Partial correlation with one covariate

>>> import pingouin as pg
>>> df = pg.read_dataset('partial_corr')
>>> pg.partial_corr(data=df, x='x', y='y', covar='cv1').round(3)
          n      r         CI95%  p-val
pearson  30  0.568  [0.25, 0.77]  0.001

  2. Spearman partial correlation with several covariates

>>> # Partial correlation of x and y controlling for cv1, cv2 and cv3
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.521  [0.18, 0.75]  0.005

  3. Same, but with a one-sided test

>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 alternative="greater", method='spearman').round(3)
           n      r        CI95%  p-val
spearman  30  0.521  [0.24, 1.0]  0.003
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 alternative="less", method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.521  [-1.0, 0.72]  0.997

  4. As a pandas method

>>> df.partial_corr(x='x', y='y', covar=['cv1'], method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.578  [0.27, 0.78]  0.001

  5. Partial correlation matrix (returns only the correlation coefficients)

>>> df.pcorr().round(3)
         x      y    cv1    cv2    cv3
x    1.000  0.493 -0.095  0.130 -0.385
y    0.493  1.000 -0.007  0.104 -0.002
cv1 -0.095 -0.007  1.000 -0.241 -0.470
cv2  0.130  0.104 -0.241  1.000 -0.118
cv3 -0.385 -0.002 -0.470 -0.118  1.000

  6. Semi-partial correlation on x

>>> pg.partial_corr(data=df, x='x', y='y', x_covar=['cv1', 'cv2', 'cv3']).round(3)
          n      r        CI95%  p-val
pearson  30  0.463  [0.1, 0.72]  0.015