pingouin.pairwise_corr

pingouin.pairwise_corr(data, columns=None, covar=None, tail='two-sided', method='pearson', padjust='none', export_filename=None)

Pairwise (partial) correlations between columns of a pandas dataframe.

Parameters
data : pandas DataFrame

Input DataFrame. This function can also be called directly as a pandas DataFrame method, in which case this argument is not needed.

columns : list or str

Column names in data:

'["a", "b", "c"]' : combination between columns a, b, and c
'["a"]' : product between a and all the other numeric columns
'[["a"], ["b", "c"]]' : product between ["a"] and ["b", "c"]
'[["a", "d"], ["b", "c"]]' : product between ["a", "d"] and ["b", "c"]
'[["a", "d"], None]' : product between ["a", "d"] and all other columns

Note that if columns is not specified, the function returns the pairwise correlations between all combinations of the numeric columns in data. See the Examples section for more details.
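The column-selection rules above can be sketched with itertools. This is only an illustration of the documented behaviour, not pingouin's actual implementation: expand_pairs and its numeric_columns argument are hypothetical names, and multi-index (tuple) column names are not handled.

```python
from itertools import combinations, product


def expand_pairs(columns, numeric_columns):
    """Return the (X, Y) column pairs implied by a `columns` argument.

    Hypothetical helper for illustration only; not part of pingouin.
    """
    if columns is None:
        # No selection: every unordered pair of numeric columns.
        return list(combinations(numeric_columns, 2))
    if isinstance(columns, str):
        columns = [columns]
    if all(isinstance(c, str) for c in columns):
        if len(columns) == 1:
            # One-versus-all: the single column against every other column.
            others = [c for c in numeric_columns if c != columns[0]]
            return list(product(columns, others))
        # Flat list: every unordered pair within the list.
        return list(combinations(columns, 2))
    # Nested form [left, right], where right may be None.
    left, right = columns
    if right is None:
        right = [c for c in numeric_columns if c not in left]
    return list(product(left, right))


cols = ["a", "b", "c", "d"]
print(expand_pairs(["a", "b", "c"], cols))             # combinations
print(expand_pairs(["a"], cols))                       # one versus all
print(expand_pairs([["a", "d"], ["b", "c"]], cols))    # product of two lists
print(expand_pairs([["a", "d"], None], cols))          # list versus the rest
```

A flat list yields combinations (each pair once), whereas the nested form yields a Cartesian product between the two lists.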

covar : None, string or list

Covariate(s) for partial correlation. Must be one or more columns in data. Use a list if there is more than one covariate. If covar is not None, a partial correlation is computed using the pingouin.partial_corr() function.

tail : string

Indicates whether to return the 'two-sided' or 'one-sided' p-values.

method : string

Specify which method to use for the computation of the correlation coefficient. Available methods are:

'pearson' : Pearson product-moment correlation
'spearman' : Spearman rank-order correlation
'kendall' : Kendall’s tau (ordinal data)
'percbend' : percentage bend correlation (robust)
'shepherd' : Shepherd's pi correlation (robust Spearman)

padjust : string

Method used for testing and adjustment of p-values. Available methods are:

'none' : no correction
'bonferroni' : one-step Bonferroni correction
'holm' : step-down method using Bonferroni adjustments
'fdr_bh' : Benjamini/Hochberg FDR correction
'fdr_by' : Benjamini/Yekutieli FDR correction
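
For intuition, three of the corrections listed above can be written in a few lines of plain Python. These are textbook formulations with hypothetical function names; they are a sketch under that assumption and do not reproduce pingouin's own code.

```python
def bonferroni(pvals):
    """One-step Bonferroni: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def holm(pvals):
    """Step-down Holm: Bonferroni factors shrink as p-values increase."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min((m - rank) * pvals[i], 1.0)
        # Enforce monotonicity: adjusted p-values never decrease.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def fdr_bh(pvals):
    """Benjamini/Hochberg step-up FDR adjustment."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_unc = [0.01, 0.04, 0.03]  # made-up uncorrected p-values
for name, func in [("bonferroni", bonferroni), ("holm", holm), ("fdr_bh", fdr_bh)]:
    print(name, [round(p, 4) for p in func(p_unc)])
```

The adjusted values are returned in the same order as the input, which mirrors how 'p-corr' sits next to 'p-unc' row by row in the output table.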

export_filename : string

Filename (without extension) for the output file. If None, the table is not exported. By default, the file is created in the current working directory of the Python console. To write it elsewhere, give the filename with its full path.

Returns
stats : DataFrame

Stats summary:

'X' : Name(s) of first columns
'Y' : Name(s) of second columns
'method' : method used to compute the correlation
'covar' : List of specified covariate(s) (only for partial correlation)
'tail' : indicates whether the p-values are one-sided or two-sided
'n' : Sample size (after NaN removal)
'r' : Correlation coefficients
'CI95' : 95% parametric confidence intervals
'r2' : R-squared values
'adj_r2' : Adjusted R-squared values
'z' : Standardized correlation coefficients
'p-unc' : uncorrected one- or two-tailed p-values
'p-corr' : corrected one- or two-tailed p-values
'p-adjust' : Correction method
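
Several of the derived columns follow from r and n alone. The sketch below uses standard formulas (Fisher z-transform for 'z' and 'CI95', and an adjusted R-squared with n - 3 in the denominator). Treat these as assumptions about how such values are typically obtained rather than as pingouin's exact code; describe_r is a hypothetical helper.

```python
import math


def describe_r(r, n):
    """Derive r2, adj_r2, z and a 95% CI from a correlation r and sample size n.

    Hypothetical helper; the formulas are standard but assumed, not taken
    from pingouin's source.
    """
    r2 = r ** 2
    # Adjusted R-squared (assumed form, penalising the small sample size).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 3)
    # Fisher z-transform of r.
    z = math.atanh(r)
    # Parametric CI: build it on the z scale, then transform back to r.
    se = 1 / math.sqrt(n - 3)
    crit = 1.959963984540054  # 97.5th percentile of the standard normal
    ci95 = (math.tanh(z - crit * se), math.tanh(z + crit * se))
    return {"r2": r2, "adj_r2": adj_r2, "z": z, "CI95": ci95}


print(describe_r(r=0.5, n=103))
```

Because the interval is computed on the z scale, it is asymmetric around r once transformed back, and it narrows as n grows.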

Notes

Please refer to the pingouin.corr() function for a description of the different methods. NaN values are automatically removed from the data.

This function is more flexible and gives a much more detailed output than the pandas.DataFrame.corr() method (e.g. p-values, confidence intervals, Bayes factor). This comes, however, at an increased computational cost. While the difference should not be noticeable for DataFrames with fewer than 10,000 rows and/or fewer than 20 columns, this function can be slow for very large datasets. For speed, the Bayes factor is only computed when the sample size is less than 1000 (and method='pearson').

This function also works with two-dimensional multi-index columns. In this case, columns must be list(s) of tuple(s). See the Jupyter notebook for more details: https://github.com/raphaelvallat/pingouin/blob/master/notebooks/04_Correlations.ipynb

If covar is specified, this function will compute the pairwise partial correlation between the variables. If you are only interested in computing the partial correlation matrix (i.e. the raw pairwise partial correlation coefficient matrix, without the p-values, sample sizes, etc), a better alternative is to use the pingouin.pcorr() function (see example 7).
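
To make the idea of a partial correlation concrete, the sketch below computes a Pearson partial correlation with a single covariate using the classical first-order formula. It is an illustration only: pingouin.partial_corr() may use a different computation, and pearson and partial_corr_one_covar are hypothetical helpers operating on made-up data.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr_one_covar(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data in which z partly drives both x and y.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.2, 1.9, 3.4, 3.8, 5.3, 5.7]
y = [0.8, 2.4, 2.7, 4.5, 4.6, 6.3]

print("raw r     :", round(pearson(x, y), 3))
print("partial r :", round(partial_corr_one_covar(x, y, z), 3))
```

When the covariate is uncorrelated with both variables, the partial correlation reduces to the ordinary correlation.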

Examples

  1. One-tailed Spearman correlation corrected for multiple comparisons

>>> from pingouin import pairwise_corr, read_dataset
>>> data = read_dataset('pairwise_corr').iloc[:, 1:]
>>> pairwise_corr(data, method='spearman', tail='one-sided',
...               padjust='bonferroni')  # doctest: +SKIP

  2. Robust two-sided correlation with uncorrected p-values

>>> pcor = pairwise_corr(data, columns=['Openness', 'Extraversion',
...                                     'Neuroticism'], method='percbend')

  3. One-versus-all pairwise correlations

>>> pairwise_corr(data, columns=['Neuroticism'])  # doctest: +SKIP

  4. Pairwise correlations between two lists of columns (Cartesian product)

>>> columns = [['Neuroticism', 'Extraversion'], ['Openness']]
>>> pairwise_corr(data, columns)   # doctest: +SKIP

  5. As a pandas method

>>> pcor = data.pairwise_corr(covar='Neuroticism', method='spearman')

  6. Pairwise partial correlation

>>> pcor = pairwise_corr(data, covar='Neuroticism')  # One covariate
>>> pcor = pairwise_corr(data, covar=['Neuroticism', 'Openness'])  # Two covariates

  7. Pairwise partial correlation matrix (only the r-values)

>>> data[['Neuroticism', 'Openness', 'Extraversion']].pcorr()
              Neuroticism  Openness  Extraversion
Neuroticism      1.000000  0.092097     -0.360421
Openness         0.092097  1.000000      0.281312
Extraversion    -0.360421  0.281312      1.000000