pingouin.multicomp

pingouin.multicomp(pvals, alpha=0.05, method='holm')

P-values correction for multiple comparisons.
Parameters

pvals : array_like
    Uncorrected p-values.
alpha : float
    Significance level.
method : string
    Method used for testing and adjustment of p-values. Can be either the
    full name or initial letters. Available methods are:

    'bonf' : one-step Bonferroni correction
    'sidak' : one-step Sidak correction
    'holm' : step-down method using Bonferroni adjustments
    'fdr_bh' : Benjamini/Hochberg FDR correction
    'fdr_by' : Benjamini/Yekutieli FDR correction
    'none' : pass-through option (no correction applied)
Returns

reject : array, boolean
    True for hypotheses that can be rejected for the given alpha.
pvals_corrected : array
    P-values corrected for multiple testing.
Notes

This function is similar to the p.adjust R function.

The correction methods include the Bonferroni correction ('bonf'), in which
the p-values are multiplied by the number of comparisons. Less conservative
methods are also included, such as Sidak (1967) ('sidak'), Holm (1979)
('holm'), Benjamini & Hochberg (1995) ('fdr_bh'), and Benjamini & Yekutieli
(2001) ('fdr_by').

The first three methods are designed to give strong control of the
familywise error rate. Note that Holm's method is usually preferred. The
'fdr_bh' and 'fdr_by' methods control the false discovery rate, i.e. the
expected proportion of false discoveries amongst the rejected hypotheses.
The false discovery rate is a less stringent condition than the familywise
error rate, so these methods are more powerful than the others.

The Bonferroni adjusted p-values are defined as:
\[\widetilde{p}_{(i)} = n \cdot p_{(i)}\]

where \(n\) is the number of finite p-values (i.e. excluding NaN).
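As a concrete illustration, here is a minimal NumPy sketch of this one-step
adjustment (the helper name is ours, not part of pingouin):

import numpy as np

def bonferroni_adjust(pvals):
    # One-step Bonferroni: multiply each p-value by the number of
    # finite p-values (NaNs are excluded from the count), cap at 1.
    pvals = np.asarray(pvals, dtype=float)
    n = np.isfinite(pvals).sum()
    return np.clip(pvals * n, None, 1)

bonferroni_adjust([.50, .003, .32, .054, .0003])
# expected: 1.0, 0.015, 1.0, 0.27, 0.0015 (2.5 and 1.6 are capped at 1)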
The Sidak adjusted p-values are defined as:

\[\widetilde{p}_{(i)} = 1 - (1 - p_{(i)})^{n}\]

The Holm adjusted p-values are the running maximum of the sorted p-values,
each multiplied by its decreasing Bonferroni factor \(n - j + 1\):

\[\widetilde{p}_{(i)} = \max_{j \leq i} \left\{ (n - j + 1) \, p_{(j)} \right\}_{1}\]

where \(\{x\}_{1}\) denotes \(\min(x, 1)\).
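A sketch of this step-down computation, following the formula above (our own
helper, not pingouin's internal implementation; NaNs are not handled here):

import numpy as np

def holm_adjust(pvals):
    # Holm step-down: sort ascending, multiply p_(j) by (n - j + 1),
    # take the running maximum, cap at 1, then restore input order.
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    order = np.argsort(pvals)
    factors = n - np.arange(n)            # n, n-1, ..., 1
    adjusted = np.minimum(np.maximum.accumulate(pvals[order] * factors), 1)
    out = np.empty(n)
    out[order] = adjusted
    return out

holm_adjust([.50, .003, .32, .054, .0003])
# expected: 0.64, 0.012, 0.64, 0.162, 0.0015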
The Benjamini–Hochberg procedure (BH step-up procedure) controls the false
discovery rate (FDR) at level \(\alpha\). It works as follows:

1. For a given \(\alpha\), find the largest \(k\) such that \(P_{(k)} \leq \frac{k}{n}\alpha\).
2. Reject the null hypothesis for all \(H_{(i)}\) for \(i = 1, \ldots, k\).
The BH procedure is valid when the \(n\) tests are independent, and also in various scenarios of dependence, but is not universally valid.
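The step-up search can be written directly from the two steps above; this is
an illustrative sketch (no NaN handling), not pingouin's internal code:

import numpy as np

def bh_reject(pvals, alpha=0.05):
    # Benjamini-Hochberg step-up: find the largest k with
    # P_(k) <= (k / n) * alpha, then reject the k smallest p-values.
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    order = np.argsort(pvals)
    thresholds = np.arange(1, n + 1) / n * alpha
    below = np.nonzero(pvals[order] <= thresholds)[0]
    reject = np.zeros(n, dtype=bool)
    if below.size > 0:
        reject[order[:below[-1] + 1]] = True
    return reject

bh_reject([.50, .003, .32, .054, .0003])
# expected: [False, True, False, False, True], matching the Examples below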
The Benjamini–Yekutieli procedure (BY) controls the FDR under arbitrary
dependence assumptions. This refinement modifies the threshold and finds the
largest \(k\) such that:

\[P_{(k)} \leq \frac{k}{n \cdot c(n)} \alpha\]

where \(c(n) = \sum_{i=1}^{n} \frac{1}{i}\).
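Because the BY threshold is the BH threshold shrunk by \(c(n)\), the
BY-adjusted p-values are, up to capping at 1, the BH-adjusted values
inflated by \(c(n)\). A quick sanity check with pingouin itself (a sketch;
this relationship follows from the formulas, not from documented API
behavior):

import numpy as np
import pingouin as pg

pvals = [.50, .003, .32, .054, .0003]
c_n = np.sum(1.0 / np.arange(1, len(pvals) + 1))   # c(n): harmonic number

_, p_bh = pg.multicomp(pvals, method='fdr_bh')
_, p_by = pg.multicomp(pvals, method='fdr_by')

# BY-adjusted values should equal the BH-adjusted values times c(n),
# capped at 1.
print(np.allclose(p_by, np.minimum(p_bh * c_n, 1)))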
References

Bonferroni, C. E. (1935). Il calcolo delle assicurazioni su gruppi di teste. Studi in onore del professore Salvatore Ortu Carboni, 13–60.
Šidák, Z. K. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626–633.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300.
Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Examples

FDR correction of an array of p-values:

>>> import pingouin as pg
>>> pvals = [.50, .003, .32, .054, .0003]
>>> reject, pvals_corr = pg.multicomp(pvals, method='fdr_bh')
>>> print(reject, pvals_corr)
[False True False False True] [0.5 0.0075 0.4 0.09 0.0015]
Holm correction with missing values:

>>> import numpy as np
>>> pvals[2] = np.nan
>>> reject, pvals_corr = pg.multicomp(pvals, method='holm')
>>> print(reject, pvals_corr)
[False True False False True] [0.5 0.009 nan 0.108 0.0012]