pingouin.power_rm_anova

pingouin.power_rm_anova(eta=None, m=None, n=None, power=None, alpha=0.05, corr=0.5, epsilon=1)
Evaluate power, sample size, effect size or significance level of a balanced one-way repeated measures ANOVA.
Parameters
eta : float
    ANOVA effect size (eta-square = \(\eta^2\)).
m : int
    Number of repeated measurements.
n : int
    Sample size per measurement. All measurements must have the same sample size.
power : float
    Test power (= 1 - type II error).
alpha : float
    Significance level \(\alpha\) (type I error probability). The default is 0.05.
corr : float
    Average correlation coefficient among repeated measurements. The default is \(r=0.5\).
epsilon : float
    Epsilon adjustment factor for sphericity. This can be calculated using the pingouin.epsilon() function.
Notes
Exactly ONE of the parameters eta, m, n, power and alpha must be passed as None, and that parameter is determined from the others.

Notice that alpha has a default value of 0.05, so None must be explicitly passed if you want to compute it.

Statistical power is the likelihood that a study will detect an effect when there is an effect there to be detected. A high statistical power means that there is a low probability of concluding that there is no effect when there is one. Statistical power is mainly affected by the effect size and the sample size.
For one-way repeated measures ANOVA, eta-square is the same as partial eta-square. It can be evaluated from the F-value (\(F^*\)) and the degrees of freedom of the ANOVA (\(v_1, v_2\)) using the following formula:
\[\eta^2 = \frac{v_1 F^*}{v_1 F^* + v_2}\]
Note that GPower uses the \(f\) effect size instead of \(\eta^2\). The formulas to convert from one to the other are given below:
\[f = \sqrt{\frac{\eta^2}{1 - \eta^2}}\]
\[\eta^2 = \frac{f^2}{1 + f^2}\]
Using \(\eta^2\), the sample size \(N\), the number of repeated measurements \(m\), the epsilon correction factor \(\epsilon\) (see pingouin.epsilon()), and the average correlation between the repeated measures \(c\), one can then calculate the non-centrality parameter as follows:
\[\delta = \frac{f^2 * N * m * \epsilon}{1 - c}\]
Then the critical value of the non-central F-distribution is computed using the percentile point function of the F-distribution with:
\[q = 1 - \alpha\]
\[v_1 = (m - 1) * \epsilon\]
\[v_2 = (N - 1) * v_1\]
Finally, the power of the ANOVA is calculated using the survival function of the non-central F-distribution using the previously computed critical value, non-centrality parameter, and degrees of freedom.

scipy.optimize.brenth() is used to solve power equations for other variables (i.e. sample size, effect size, or significance level). If the solving fails, a nan value is returned.

Results have been validated against GPower.
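The steps described in these notes can be sketched directly with SciPy. This is a minimal illustration under stated assumptions, not pingouin's actual source: the helper names `rm_anova_power` and `required_n` are hypothetical, and the root-finding bracket `(2, 200)` is an arbitrary choice for this example.

```python
from scipy import optimize, stats


def rm_anova_power(eta, m, n, alpha=0.05, corr=0.5, epsilon=1.0):
    """Power of a one-way repeated measures ANOVA (hypothetical helper)."""
    f_sq = eta / (1 - eta)                       # f^2 obtained from eta^2
    delta = f_sq * n * m * epsilon / (1 - corr)  # non-centrality parameter
    v1 = (m - 1) * epsilon                       # numerator degrees of freedom
    v2 = (n - 1) * v1                            # denominator degrees of freedom
    f_crit = stats.f.ppf(1 - alpha, v1, v2)      # critical value at q = 1 - alpha
    return stats.ncf.sf(f_crit, v1, v2, delta)   # survival function of non-central F


def required_n(eta, m, power, alpha=0.05, corr=0.5, epsilon=1.0):
    """Solve the power equation for n with brenth (hypothetical helper)."""
    def gap(n):
        return rm_anova_power(eta, m, n, alpha, corr, epsilon) - power
    try:
        # Assumed bracket: the root must lie between n=2 and n=200.
        return optimize.brenth(gap, 2, 200)
    except ValueError:
        # brenth raises if the bracket does not contain a sign change.
        return float("nan")


achieved = rm_anova_power(eta=0.1, m=3, n=20)
needed = required_n(eta=0.1, m=3, power=0.80)
print(f"power: {achieved:.4f}, n: {needed:.4f}")
```

With the same inputs as the first and third examples in the Examples section, the two printed values should land close to the power and sample size reported there.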
References
 1
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
 2
Examples
Compute achieved power
>>> from pingouin import power_rm_anova
>>> print('power: %.4f' % power_rm_anova(eta=0.1, m=3, n=20))
power: 0.8913
Compute required number of repeated measurements
>>> print('m: %.4f' % power_rm_anova(eta=0.1, n=20, power=0.90))
m: 3.1347
Compute required sample size
>>> print('n: %.4f' % power_rm_anova(eta=0.1, m=3, power=0.80))
n: 15.9979
Compute achieved effect size
>>> print('eta: %.4f' % power_rm_anova(n=20, m=4, power=0.80, alpha=0.05))
eta: 0.0680
Compute achieved alpha (significance)
>>> print('alpha: %.4f' % power_rm_anova(eta=0.1, n=20, m=4, power=0.80,
...                                      alpha=None))
alpha: 0.0081
Let’s take a more concrete example. First, we’ll load a repeated measures dataset in wide-format. Each row is an observation (e.g. a subject), and each column a successive repeated measurement (e.g. t=0, t=1, …).
>>> import pingouin as pg
>>> data = pg.read_dataset('rm_anova_wide')
>>> data.head()
   Before  1 week  2 week  3 week
0     4.3     5.3     4.8     6.3
1     3.9     2.3     5.6     4.3
2     4.5     2.6     4.1     NaN
3     5.1     4.2     6.0     6.3
4     3.8     3.6     4.8     6.8
Note that this dataset has some missing values. We’ll simply delete any row with one or more missing values, and then compute a repeated measures ANOVA:
>>> data = data.dropna()
>>> pg.rm_anova(data)
   Source  ddof1  ddof2      F     p-unc    np2    eps
0  Within      3     24  5.201  0.006557  0.394  0.694
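As a quick cross-check, the effect size in this table can be recovered from the F-value and the two degrees of freedom using the formula given in the Notes. The sketch below uses only the standard library; the helper names `eta_sq_from_f` and `cohen_f` are hypothetical and exist only for this illustration.

```python
from math import sqrt


def eta_sq_from_f(f_value, v1, v2):
    """eta^2 = v1 * F / (v1 * F + v2), as given in the Notes."""
    return (v1 * f_value) / (v1 * f_value + v2)


def cohen_f(eta_sq):
    """Convert eta^2 to the f effect size: f = sqrt(eta^2 / (1 - eta^2))."""
    return sqrt(eta_sq / (1 - eta_sq))


# Values read from the ANOVA table: F = 5.201, ddof1 = 3, ddof2 = 24
eta_sq = eta_sq_from_f(5.201, 3, 24)
print(round(eta_sq, 3))  # 0.394, matching the np2 column
```

The same `eta_sq` value can be passed through `cohen_f` when an \(f\) effect size is needed instead of \(\eta^2\).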
The repeated measures ANOVA is significant at the 0.05 level. Now, we can easily compute the power of the ANOVA with the information in the ANOVA table:
>>> # n is the sample size and m is the number of repeated measures
>>> n, m = data.shape
>>> pg.power_rm_anova(eta=0.394, m=m, n=n, epsilon=0.694)
0.9976707714861207
Our ANOVA has a very high statistical power. However, to be even more accurate in our power calculation, we should also fill in the average correlation among repeated measurements. Since our dataframe is in wide-format (with each column being a successive measurement), this can be done by taking the mean of the superdiagonal of the correlation matrix, which is similar to manually calculating the correlation between each pair of successive measurements and then taking the mean. Since correlation coefficients are not normally distributed, we use the r-to-z transform prior to averaging (numpy.arctanh()), and then the z-to-r transform (numpy.tanh()) to convert back to a correlation coefficient. This gives a more precise estimate of the mean.
>>> import numpy as np
>>> corr = np.diag(data.corr(), k=1)
>>> avgcorr = np.tanh(np.arctanh(corr).mean())
>>> avgcorr
-0.19955358859483566
In this example, we’re using a fake dataset and the average correlation is negative. However, it will most likely be positive with real data. Let’s now compute the final power of the repeated measures ANOVA:
>>> pg.power_rm_anova(eta=0.394, m=m, n=n, epsilon=0.694, corr=avgcorr)
0.8545404196391064