pingouin.partial_corr

pingouin.partial_corr(data=None, x=None, y=None, covar=None, x_covar=None, y_covar=None, tail='two-sided', method='pearson')

Partial and semi-partial correlation.

Parameters
data : pd.DataFrame

Input dataframe. This function can also be called directly as a pandas.DataFrame method, in which case this argument is not needed.

x, y : string

Names of the x and y variables. Must be names of columns in data.

covar : string or list

Covariate(s). Must be names of columns in data. Use a list if there are two or more covariates.

x_covar : string or list

Covariate(s) for the x variable. This is used to compute semi-partial correlation (i.e. the effect of x_covar is removed from x but not from y). Note that you cannot specify both covar and x_covar.

y_covar : string or list

Covariate(s) for the y variable. This is used to compute semi-partial correlation (i.e. the effect of y_covar is removed from y but not from x). Note that you cannot specify both covar and y_covar.

tail : string

Specify whether to return the 'one-sided' or 'two-sided' p-value.

method : string

Specify which method to use for the computation of the correlation coefficient. Available methods are:

'pearson' : Pearson product-moment correlation
'spearman' : Spearman rank-order correlation
'kendall' : Kendall’s tau (ordinal data)
'percbend' : percentage bend correlation (robust)
'shepherd' : Shepherd's pi correlation (robust Spearman)
'skipped' : skipped correlation (robust Spearman, requires sklearn)

Returns
stats : pd.DataFrame

Test summary, with the following columns:

'n' : Sample size (after NaN removal)
'outliers' : Number of outliers (only for 'shepherd' or 'skipped')
'r' : Correlation coefficient
'CI95%' : 95% parametric confidence intervals
'r2' : R-squared
'adj_r2' : Adjusted R-squared
'p-val' : One- or two-tailed p-value
'BF10' : Bayes Factor of the alternative hypothesis (Pearson only)
'power' : Achieved power of the test (= 1 - type II error)
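
As a rough sanity check on how 'r2' and 'adj_r2' relate to 'r', the sketch below applies the textbook adjusted R-squared formula. The helper name adjusted_r2 and the choice k=2 are assumptions: k=2 is inferred from the example outputs on this page and is not stated in the column descriptions above.

```python
def adjusted_r2(r, n, k=2):
    """Textbook adjusted R-squared: 1 - (1 - r^2) * (n - 1) / (n - k - 1).

    k=2 is an assumption inferred from the documented example outputs,
    not a documented parameter of pingouin.partial_corr.
    """
    r2 = r ** 2
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# r and n as reported (rounded) in the first example: r = 0.568, n = 30.
# Because r is rounded to three decimals, the last digit may differ slightly
# from the 'adj_r2' value printed by the library.
r, n = 0.568, 30
print("r2     =", round(r ** 2, 3))
print("adj_r2 ~", round(adjusted_r2(r, n), 3))
```

The adjusted value is always smaller than 'r2' for a given sample size, which matches the pattern in every example table on this page.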

Notes

From [4]:

“With partial correlation, we find the correlation between \(x\) and \(y\) holding \(C\) constant for both \(x\) and \(y\). Sometimes, however, we want to hold \(C\) constant for just \(x\) or just \(y\). In that case, we compute a semi-partial correlation. A partial correlation is computed between two residuals. A semi-partial correlation is computed between one residual and another raw (or unresidualized) variable.”
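
The residual-based description in the quote can be sketched in a few lines of plain Python. This is a minimal illustration for the Pearson case with a single covariate; the helper names pearson and residuals and the data are made up for the example, and the sketch is not presented as pingouin's internal implementation.

```python
import math


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def residuals(v, c):
    """Residuals of v after a least-squares fit on c (with intercept)."""
    n = len(v)
    mv, mc = sum(v) / n, sum(c) / n
    slope = (sum((ci - mc) * (vi - mv) for ci, vi in zip(c, v))
             / sum((ci - mc) ** 2 for ci in c))
    return [(vi - mv) - slope * (ci - mc) for vi, ci in zip(v, c)]


# Made-up illustrative data (not the pingouin 'partial_corr' dataset).
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0, 13.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0, 8.0, 12.0]
c = [1.0, 1.0, 2.0, 3.0, 3.0, 4.0, 6.0, 6.0]

# Partial correlation: C is removed from both x and y (two residuals).
r_partial = pearson(residuals(x, c), residuals(y, c))

# Semi-partial correlation: C is removed from x only; y stays raw.
r_semi_x = pearson(residuals(x, c), y)

print(round(r_partial, 3), round(r_semi_x, 3))
```

In this single-covariate case the semi-partial coefficient is never larger in magnitude than the partial one, because the raw variable still carries the variance shared with C.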

If you only need the partial correlation matrix and not the statistics and p-values, a faster alternative is the pingouin.pcorr() method (see example 4).
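
One standard way to obtain a full partial correlation matrix, where each pair is controlled for all remaining variables, is through the inverse of the covariance matrix. The sketch below uses NumPy on synthetic data; the function name partial_corr_matrix is hypothetical, and the approach is shown as an illustration of the concept rather than as a description of how pingouin.pcorr() is implemented.

```python
import numpy as np


def partial_corr_matrix(data):
    """Pairwise partial correlations, each controlling for all other columns.

    data: 2-D array of shape (n_samples, n_variables).
    """
    # Precision matrix = inverse of the sample covariance matrix.
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    # Off-diagonal entries: -P_ij / sqrt(P_ii * P_jj).
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Synthetic data: the first two columns share a common driver z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
data = np.column_stack([z + rng.normal(size=200),
                        z + rng.normal(size=200),
                        z])

print(partial_corr_matrix(data).round(3))
```

Each off-diagonal entry equals the correlation between the residuals of the two variables after regressing both on all other columns, which ties this matrix form back to the residual definition quoted above.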

Rows with missing values are automatically removed from data. Results have been tested against the ppcor R package.

References

[1] https://en.wikipedia.org/wiki/Partial_correlation

[2] https://cran.r-project.org/web/packages/ppcor/index.html

[3] https://gist.github.com/fabianp/9396204419c7b638d38f

[4] http://faculty.cas.usf.edu/mbrannick/regression/Partial.html

Examples

  1. Partial correlation with one covariate

>>> import pingouin as pg
>>> df = pg.read_dataset('partial_corr')
>>> pg.partial_corr(data=df, x='x', y='y', covar='cv1')
          n      r         CI95%     r2  adj_r2     p-val    BF10  power
pearson  30  0.568  [0.26, 0.77]  0.323   0.273  0.001055  28.695  0.925

  2. Spearman partial correlation with several covariates

>>> # Partial correlation of x and y controlling for cv1, cv2 and cv3
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 method='spearman')
           n      r         CI95%     r2  adj_r2     p-val  power
spearman  30  0.491  [0.16, 0.72]  0.242   0.185  0.005817  0.809

  3. As a pandas method

>>> df.partial_corr(x='x', y='y', covar=['cv1'], method='spearman')
           n      r         CI95%     r2  adj_r2     p-val  power
spearman  30  0.568  [0.26, 0.77]  0.323   0.273  0.001049  0.925

  4. Partial correlation matrix (returns only the correlation coefficients)

>>> df.pcorr().round(3)
         x      y    cv1    cv2    cv3
x    1.000  0.493 -0.095  0.130 -0.385
y    0.493  1.000 -0.007  0.104 -0.002
cv1 -0.095 -0.007  1.000 -0.241 -0.470
cv2  0.130  0.104 -0.241  1.000 -0.118
cv3 -0.385 -0.002 -0.470 -0.118  1.000

  5. Semi-partial correlation on x

>>> pg.partial_corr(data=df, x='x', y='y', x_covar=['cv1', 'cv2', 'cv3'])
          n      r         CI95%     r2  adj_r2     p-val   BF10  power
pearson  30  0.463  [0.12, 0.71]  0.215   0.156  0.009946  3.809  0.752

  6. Semi-partial correlation on both x and y, controlling for different variables

>>> pg.partial_corr(data=df, x='x', y='y', x_covar='cv1',
...                 y_covar=['cv2', 'cv3'], method='spearman')
           n      r         CI95%     r2  adj_r2     p-val  power
spearman  30  0.429  [0.08, 0.68]  0.184   0.123  0.018092  0.676