# pingouin.partial_corr

pingouin.partial_corr(data=None, x=None, y=None, covar=None, x_covar=None, y_covar=None, tail='two-sided', method='pearson')

Partial and semi-partial correlation.

Parameters
data : pd.DataFrame

Input dataframe. This function can also be called directly as a pandas.DataFrame method, in which case this argument is not needed.

x, y : string

The two variables to correlate. Must be names of columns in data.

covar : string or list

Covariate(s). Must be names of columns in data. Use a list if there are two or more covariates.

x_covar : string or list

Covariate(s) for the x variable. This is used to compute semi-partial correlation (i.e. the effect of x_covar is removed from x but not from y). Note that you cannot specify both covar and x_covar.

y_covar : string or list

Covariate(s) for the y variable. This is used to compute semi-partial correlation (i.e. the effect of y_covar is removed from y but not from x). Note that you cannot specify both covar and y_covar.

tail : string

Specify whether to return the ‘one-sided’ or ‘two-sided’ p-value.

method : string

Specify which method to use for the computation of the correlation coefficient. Available methods are:

'pearson' : Pearson product-moment correlation
'spearman' : Spearman rank-order correlation
'kendall' : Kendall’s tau (ordinal data)
'percbend' : percentage bend correlation (robust)
'shepherd' : Shepherd's pi correlation (robust Spearman)
'skipped' : skipped correlation (robust Spearman, requires sklearn)

Returns
stats : pandas DataFrame

Test summary

'n' : sample size (after NaN removal)
'outliers' : number of outliers (only for 'shepherd' or 'skipped')
'r' : correlation coefficient
'CI95%' : 95% parametric confidence intervals
'r2' : R-squared
'adj_r2' : adjusted R-squared
'p-val' : one- or two-tailed p-value
'BF10' : Bayes Factor of the alternative hypothesis (Pearson only)
'power' : achieved power of the test (= 1 - type II error)
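
To make the derived columns more concrete, the following minimal sketch recomputes 'r2', 'adj_r2' and a parametric 95% interval from a given r and n. It is not taken from the pingouin source: the helper name `derived_stats`, the `n - 3` denominator in the adjusted R-squared, and the Fisher z-transform interval are all assumptions, chosen because they agree with the values printed in the first example below up to rounding.

```python
import math
from statistics import NormalDist


def derived_stats(r, n, alpha=0.05):
    """Hypothetical helper: derive r2, adj_r2 and a Fisher-z CI from r and n."""
    r2 = r ** 2
    # Assumption: adjustment uses n - 3 in the denominator.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 3)
    # Fisher z-transform interval with standard error 1 / sqrt(n - 3).
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (math.tanh(z - crit * se), math.tanh(z + crit * se))
    return r2, adj_r2, ci


# r and n as displayed (rounded) in the first example's output.
r2, adj_r2, ci = derived_stats(r=0.568, n=30)
print(round(r2, 3), round(adj_r2, 3), [round(v, 2) for v in ci])
```

Because the input r is itself rounded to three decimals, the adjusted R-squared may differ from the printed value in the last digit.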

Notes

From [4]:

“With partial correlation, we find the correlation between $$x$$ and $$y$$ holding $$C$$ constant for both $$x$$ and $$y$$. Sometimes, however, we want to hold $$C$$ constant for just $$x$$ or just $$y$$. In that case, we compute a semi-partial correlation. A partial correlation is computed between two residuals. A semi-partial correlation is computed between one residual and another raw (or unresidualized) variable.”
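
The residual-based description quoted above can be sketched in plain Python for the single-covariate case. This is an illustration only, not pingouin's implementation: the helpers `pearson` and `residuals` are hypothetical, the data are made up, and only the Pearson case with one covariate is covered.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def residuals(v, c):
    """Residuals of v after a least-squares fit on a single covariate c."""
    n = len(v)
    mv, mc = sum(v) / n, sum(c) / n
    slope = (sum((ci - mc) * (vi - mv) for ci, vi in zip(c, v))
             / sum((ci - mc) ** 2 for ci in c))
    return [vi - (mv + slope * (ci - mc)) for vi, ci in zip(v, c)]


# Made-up illustrative data (not the pingouin 'partial_corr' dataset).
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0, 13.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0, 8.0, 12.0]
c = [1.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 5.0]

rx, ry = residuals(x, c), residuals(y, c)
partial_r = pearson(rx, ry)      # C held constant for both x and y
semi_partial_x = pearson(rx, y)  # C held constant for x only
semi_partial_y = pearson(x, ry)  # C held constant for y only
print(partial_r, semi_partial_x, semi_partial_y)
```

In terms of the function's arguments, `partial_r` corresponds to passing the covariate as covar, `semi_partial_x` to passing it as x_covar, and `semi_partial_y` to passing it as y_covar.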

If you only need the partial correlation matrix, and not the test statistics or p-values, the pingouin.pcorr() method is a faster alternative (see example 4).
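
A standard way to obtain a full partial correlation matrix, where each pair is controlled for all remaining variables, is to rescale the inverse covariance matrix. The sketch below uses NumPy on randomly generated data; `partial_corr_matrix` is a hypothetical helper, and whether pingouin.pcorr() follows exactly these steps is an assumption.

```python
import numpy as np


def partial_corr_matrix(data):
    """Partial correlations from a 2-D array (rows = observations, columns = variables)."""
    # Inverse (pseudo-inverse) of the covariance matrix, i.e. the precision matrix.
    precision = np.linalg.pinv(np.cov(data, rowvar=False))
    # Scale so that entry (i, j) becomes -p_ij / sqrt(p_ii * p_jj).
    d = 1.0 / np.sqrt(np.diag(precision))
    pcorr = -precision * np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[:, 1] += 0.5 * data[:, 0]  # make columns 0 and 1 correlated
print(partial_corr_matrix(data).round(3))
```

This route involves a single matrix inversion and no hypothesis tests, which is why a matrix-only computation can be cheaper than calling the full test for every pair.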

Rows with missing values are automatically removed from data. Results have been tested against the ppcor R package.

References

[1] https://en.wikipedia.org/wiki/Partial_correlation

[2] https://cran.r-project.org/web/packages/ppcor/index.html

[3] https://gist.github.com/fabianp/9396204419c7b638d38f

[4] http://faculty.cas.usf.edu/mbrannick/regression/Partial.html

Examples

1. Partial correlation with one covariate

>>> import pingouin as pg
>>> df = pg.read_dataset('partial_corr')
>>> pg.partial_corr(data=df, x='x', y='y', covar='cv1')
          n      r         CI95%     r2  adj_r2     p-val    BF10  power
pearson  30  0.568  [0.26, 0.77]  0.323   0.273  0.001055  37.773  0.925

2. Spearman partial correlation with several covariates

>>> # Partial correlation of x and y controlling for cv1, cv2 and cv3
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 method='spearman')
           n      r         CI95%     r2  adj_r2     p-val  power
spearman  30  0.491  [0.16, 0.72]  0.242   0.185  0.005817  0.809

3. As a pandas method

>>> df.partial_corr(x='x', y='y', covar=['cv1'], method='spearman')
           n      r         CI95%     r2  adj_r2     p-val  power
spearman  30  0.568  [0.26, 0.77]  0.323   0.273  0.001049  0.925

4. Partial correlation matrix (returns only the correlation coefficients)

>>> df.pcorr().round(3)
         x      y    cv1    cv2    cv3
x    1.000  0.493 -0.095  0.130 -0.385
y    0.493  1.000 -0.007  0.104 -0.002
cv1 -0.095 -0.007  1.000 -0.241 -0.470
cv2  0.130  0.104 -0.241  1.000 -0.118
cv3 -0.385 -0.002 -0.470 -0.118  1.000

5. Semi-partial correlation on x

>>> pg.partial_corr(data=df, x='x', y='y', x_covar=['cv1', 'cv2', 'cv3'])
          n      r         CI95%     r2  adj_r2     p-val   BF10  power
pearson  30  0.463  [0.12, 0.71]  0.215   0.156  0.009946  5.404  0.752

6. Semi-partial correlation on both x and y, controlling for different variables

>>> pg.partial_corr(data=df, x='x', y='y', x_covar='cv1',
...                 y_covar=['cv2', 'cv3'], method='spearman')
           n      r         CI95%     r2  adj_r2     p-val  power
spearman  30  0.429  [0.08, 0.68]  0.184   0.123  0.018092  0.676