# pingouin.partial_corr

pingouin.partial_corr(data=None, x=None, y=None, covar=None, x_covar=None, y_covar=None, alternative='two-sided', method='pearson')

Partial and semi-partial correlation.

Parameters
data : pandas.DataFrame

Pandas Dataframe. Note that this function can also directly be used as a pandas.DataFrame method, in which case this argument is no longer needed.

x, y : string

Names of the two variables to correlate. Must be names of columns in data.

covar : string or list

Covariate(s). Must be names of columns in data. Use a list if there are two or more covariates.

x_covar : string or list

Covariate(s) for the x variable. This is used to compute semi-partial correlation (i.e. the effect of x_covar is removed from x but not from y). Only one of covar, x_covar and y_covar can be specified.

y_covar : string or list

Covariate(s) for the y variable. This is used to compute semi-partial correlation (i.e. the effect of y_covar is removed from y but not from x). Only one of covar, x_covar and y_covar can be specified.

alternative : string

Defines the alternative hypothesis, or tail of the partial correlation. Must be one of “two-sided” (default), “greater” or “less”. Both “greater” and “less” return a one-sided p-value. “greater” tests against the alternative hypothesis that the partial correlation is positive (greater than zero); “less” tests against the alternative hypothesis that the partial correlation is negative (less than zero).

method : string

Correlation type:

• 'pearson': Pearson $$r$$ product-moment correlation

• 'spearman': Spearman $$\rho$$ rank-order correlation

Returns
stats : pandas.DataFrame
• 'n': Sample size (after removal of missing values)

• 'r': Partial correlation coefficient

• 'CI95%': 95% parametric confidence interval around $$r$$

• 'p-val': p-value

Notes

Partial correlation [1] measures the degree of association between x and y, after removing the effect of one or more controlling variables (covar, or $$Z$$). Practically, this is achieved by calculating the correlation coefficient between the residuals of two linear regressions:

$x \sim Z, y \sim Z$

Like the correlation coefficient, the partial correlation coefficient takes on a value in the range from -1 to 1, where 1 indicates a perfect positive association and -1 a perfect negative association.
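The residual-based definition above can be sketched in a few lines of NumPy. This is only an illustration of the definition on synthetic data, not pingouin's implementation; the helper name `partial_corr_residuals` and the simulated variables are assumptions made for the example.

```python
import numpy as np

def partial_corr_residuals(x, y, Z):
    """Pearson correlation between the residuals of x ~ Z and y ~ Z.

    Hypothetical helper for illustration; not part of pingouin.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Accept one covariate (1-D) or several (2-D, one column each).
    Z = np.asarray(Z, dtype=float).reshape(len(x), -1)
    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(len(x)), Z])
    # Residuals are the part of each variable that Z cannot explain linearly.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Synthetic data: x and y are related only through the shared covariate z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = z + rng.normal(size=200)
y = z + rng.normal(size=200)

r_raw = float(np.corrcoef(x, y)[0, 1])       # inflated by the shared covariate
r_partial = partial_corr_residuals(x, y, z)  # association left after removing z
```

In this simulation the raw correlation is driven by z, so the partial correlation should be noticeably closer to zero than the raw one.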

The semi-partial correlation is similar to the partial correlation, except that the effect of the controlling variables is removed from only one of x or y, not both.
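As a rough illustration of the difference, the sketch below removes the covariate from x only and then correlates the result with the untouched y. The helpers `residualize` and `semi_partial_corr` and the synthetic data are assumptions for this example and are not pingouin functions.

```python
import numpy as np

def residualize(v, Z):
    """Return the residuals of the least-squares fit v ~ Z (with intercept)."""
    v = np.asarray(v, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(v), -1)
    design = np.column_stack([np.ones(len(v)), Z])
    coef = np.linalg.lstsq(design, v, rcond=None)[0]
    return v - design @ coef

def semi_partial_corr(x, y, x_covar):
    """Correlate y with the part of x that x_covar does not explain."""
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(residualize(x, x_covar), y)[0, 1])

# Synthetic data with a single covariate z.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
x = z + rng.normal(size=200)
y = 0.5 * x + z + rng.normal(size=200)

r_semi = semi_partial_corr(x, y, z)  # z removed from x only
r_partial = float(np.corrcoef(residualize(x, z), residualize(y, z))[0, 1])
```

Because y keeps the variance it shares with z, the semi-partial coefficient is never larger in magnitude than the corresponding partial coefficient.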

Pingouin uses the method described in [2] to calculate the (semi)partial correlation coefficients and associated p-values. This method is based on the inverse covariance matrix and is significantly faster than the traditional regression-based method. Results have been tested against the ppcor R package.
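The idea behind an inverse-covariance approach can be sketched with the standard identity that the partial correlation between variables i and j, given all remaining variables, equals $-P_{ij} / \sqrt{P_{ii} P_{jj}}$, where $P$ is the inverse of the covariance matrix. The code below is a minimal sketch of that identity on synthetic data; the function name `partial_corr_matrix` is hypothetical and the code is not taken from pingouin.

```python
import numpy as np

def partial_corr_matrix(data):
    """Partial correlation of every column pair, given all other columns.

    Illustrative sketch based on the inverse covariance (precision) matrix.
    """
    data = np.asarray(data, dtype=float)
    cov = np.cov(data, rowvar=False)   # columns are variables
    precision = np.linalg.inv(cov)     # inverse covariance matrix
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)       # each variable with itself
    return pcorr

# Synthetic data: columns are x, y, z1, z2.
rng = np.random.default_rng(42)
z1 = rng.normal(size=300)
z2 = rng.normal(size=300)
x = z1 - 0.5 * z2 + rng.normal(size=300)
y = 0.8 * x + z2 + rng.normal(size=300)
data = np.column_stack([x, y, z1, z2])

pcorr = partial_corr_matrix(data)
r_xy_given_z = pcorr[0, 1]  # x and y, controlling for z1 and z2
```

A single matrix inversion yields all pairwise coefficients at once, whereas the regression route needs two least-squares fits per pair.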

Important

Rows with missing values are automatically removed from data.

References

[1] https://en.wikipedia.org/wiki/Partial_correlation

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681537/

Examples

1. Partial correlation with one covariate

>>> import pingouin as pg
>>> df = pg.read_dataset('partial_corr')
>>> pg.partial_corr(data=df, x='x', y='y', covar='cv1').round(3)
          n      r         CI95%  p-val
pearson  30  0.568  [0.25, 0.77]  0.001

2. Spearman partial correlation with several covariates

>>> # Partial correlation of x and y controlling for cv1, cv2 and cv3
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.521  [0.18, 0.75]  0.005

3. Same but one-sided test

>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 alternative="greater", method='spearman').round(3)
           n      r        CI95%  p-val
spearman  30  0.521  [0.24, 1.0]  0.003

>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 alternative="less", method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.521  [-1.0, 0.72]  0.997
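
The two one-sided p-values above relate to the two-sided p-value in a simple way whenever the test statistic has a symmetric null distribution. The sketch below assumes that symmetry holds here; the helper `one_sided_p` is hypothetical and not part of pingouin, and the input is the rounded two-sided p-value from example 2, so the results only approximate the displayed values.

```python
def one_sided_p(p_two_sided, r, alternative):
    """Convert a two-sided p-value into a one-sided one.

    Assumes a symmetric null distribution of the test statistic.
    """
    half = p_two_sided / 2
    if alternative == "greater":
        # Small p only when the observed coefficient is positive.
        return half if r > 0 else 1 - half
    if alternative == "less":
        # Small p only when the observed coefficient is negative.
        return half if r < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")

# Rounded values taken from the Spearman example (r = 0.521, p = 0.005).
p_greater = one_sided_p(0.005, 0.521, "greater")
p_less = one_sided_p(0.005, 0.521, "less")
```

Under this assumption the two one-sided p-values always sum to 1, which matches the pattern of a small p-value for “greater” and a large one for “less” when r is positive.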

4. As a pandas method

>>> df.partial_corr(x='x', y='y', covar=['cv1'], method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.578  [0.27, 0.78]  0.001

5. Partial correlation matrix (returns only the correlation coefficients)

>>> df.pcorr().round(3)
         x      y    cv1    cv2    cv3
x    1.000  0.493 -0.095  0.130 -0.385
y    0.493  1.000 -0.007  0.104 -0.002
cv1 -0.095 -0.007  1.000 -0.241 -0.470
cv2  0.130  0.104 -0.241  1.000 -0.118
cv3 -0.385 -0.002 -0.470 -0.118  1.000

6. Semi-partial correlation on x

>>> pg.partial_corr(data=df, x='x', y='y', x_covar=['cv1', 'cv2', 'cv3']).round(3)
          n      r        CI95%  p-val
pearson  30  0.463  [0.1, 0.72]  0.015