pingouin.anova

pingouin.anova(data=None, dv=None, between=None, ss_type=2, detailed=False, effsize='np2')
One-way and N-way ANOVA.
Parameters
data : pandas.DataFrame
    DataFrame. Note that this function can also be used directly as a Pandas method, in which case this argument is no longer needed.
dv : string
    Name of the column in data containing the dependent variable.
between : string or list with N elements
    Name of the column(s) in data containing the between-subject factor(s). If between is a single string, a one-way ANOVA is computed. If between is a list with two or more elements, an N-way ANOVA is performed. Note that Pingouin will internally call statsmodels to calculate ANOVA with 3 or more factors, or unbalanced two-way ANOVA.
ss_type : int
    Specify how the sums of squares are calculated for unbalanced designs with 2 or more factors. Can be 1, 2 (default), or 3. This has no impact on one-way designs or N-way ANOVA with balanced data.
detailed : boolean
    If True, return a detailed ANOVA table (default True for N-way ANOVA).
effsize : str
    Effect size. Must be 'np2' (partial eta-squared) or 'n2' (eta-squared). Note that for one-way ANOVA partial eta-squared is the same as eta-squared.
Returns
aov : pandas.DataFrame
    ANOVA summary:
    'Source': Factor names
    'SS': Sums of squares
    'DF': Degrees of freedom
    'MS': Mean squares
    'F': F-values
    'p-unc': Uncorrected p-values
    'np2': Partial eta-squared effect sizes
See also
rm_anova : One-way and two-way repeated measures ANOVA
mixed_anova : Two-way mixed ANOVA
welch_anova : One-way Welch ANOVA
kruskal : Non-parametric one-way ANOVA
Notes
The classic ANOVA is very powerful when the groups are normally distributed and have equal variances. However, when the groups have unequal variances, it is best to use the Welch ANOVA (pingouin.welch_anova()), which better controls the type I error rate (Liu 2015). The homogeneity of variances can be assessed with the pingouin.homoscedasticity() function.

The main idea of ANOVA is to partition the variance (sums of squares) into several components. For example, in one-way ANOVA:
\[ \begin{align}\begin{aligned}
SS_{\text{total}} = SS_{\text{effect}} + SS_{\text{error}}\\
SS_{\text{total}} = \sum_i \sum_j (Y_{ij} - \overline{Y})^2\\
SS_{\text{effect}} = \sum_i n_i (\overline{Y}_i - \overline{Y})^2\\
SS_{\text{error}} = \sum_i \sum_j (Y_{ij} - \overline{Y}_i)^2
\end{aligned}\end{align} \]
where \(i=1,...,r; j=1,...,n_i\), \(r\) is the number of groups, and \(n_i\) is the number of observations for the \(i\)th group.
The F-statistic is then defined as:
\[F^* = \frac{MS_{\text{effect}}}{MS_{\text{error}}} = \frac{SS_{\text{effect}} / (r - 1)}{SS_{\text{error}} / (n_t - r)}\]
where \(n_t\) is the total number of observations. The p-value can be calculated using an F-distribution with \(r - 1\) and \(n_t - r\) degrees of freedom.
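The partition and the F-ratio above can be reproduced by hand. The following is a minimal sketch, not Pingouin's implementation: the helper name `oneway_anova` and the sample values are made up for illustration, and the p-value is taken from the upper tail of SciPy's F-distribution with \(r - 1\) and \(n_t - r\) degrees of freedom, matching the denominators of the F-ratio.

```python
import numpy as np
from scipy import stats


def oneway_anova(groups):
    """One-way ANOVA from a list of samples (illustrative helper, not Pingouin code)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    r = len(groups)        # number of groups
    n_t = all_values.size  # total number of observations

    # Partition the total sum of squares into effect (between) and error (within).
    ss_effect = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()

    ddof1, ddof2 = r - 1, n_t - r
    f_value = (ss_effect / ddof1) / (ss_error / ddof2)
    # Upper-tail probability of the F-distribution with (r - 1, n_t - r) df.
    p_value = stats.f.sf(f_value, ddof1, ddof2)
    return {"ss_effect": ss_effect, "ss_error": ss_error, "ss_total": ss_total,
            "ddof1": ddof1, "ddof2": ddof2, "F": f_value, "p-unc": p_value}


# Made-up data: three groups of unequal size.
samples = [[4.1, 5.0, 6.2, 5.5], [6.8, 7.4, 8.1], [5.9, 6.1, 7.0, 6.6, 7.3]]
res = oneway_anova(samples)
print(res)
```

On the same samples, the F-value and p-value should agree with `scipy.stats.f_oneway(*samples)`, and `ss_total` should equal `ss_effect + ss_error` up to floating-point error.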
When the groups are balanced and have equal variances, the optimal post-hoc test is the Tukey-HSD test (pingouin.pairwise_tukey()). If the groups have unequal variances, the Games-Howell test is more appropriate (pingouin.pairwise_gameshowell()).

The default effect size reported in Pingouin is the partial eta-squared, which, for one-way ANOVA, is the same as eta-squared and generalized eta-squared.
\[\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}\]
Note that missing values are automatically removed. Results have been tested against R, Matlab and JASP.
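As a quick check of the partial eta-squared formula above, the sketch below plugs in the sums of squares reported for 'Hair color' and 'Within' in the one-way example of the Examples section. The variable names are illustrative only. Because a one-way design has a single effect, the total sum of squares is the effect plus the error, so eta-squared and partial eta-squared coincide.

```python
# Sums of squares copied from the one-way 'Hair color' example table.
ss_effect = 1360.726316
ss_error = 1001.800000
ss_total = ss_effect + ss_error  # one-way design: total = effect + error

np2 = ss_effect / (ss_effect + ss_error)  # partial eta-squared
n2 = ss_effect / ss_total                 # eta-squared

print(round(np2, 6), round(n2, 6))  # 0.575962 0.575962
```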
Warning
Versions of Pingouin below 0.2.5 gave wrong results for unbalanced N-way ANOVA. This issue has been resolved in Pingouin>=0.2.5. In such cases, the ANOVA is calculated via an internal call to the statsmodels package.
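Since the statsmodels code path and the ss_type setting only matter for unbalanced designs, it can be useful to check the cell counts before running the analysis. The sketch below is one way to do that with plain pandas. The helper `is_balanced` is hypothetical (it is not part of Pingouin), the data are made up, and it assumes that rows with missing values have already been dropped.

```python
import pandas as pd


def is_balanced(data, between):
    """Return True if every combination of factor levels has the same number of rows.

    Hypothetical helper for illustration; it is not part of Pingouin.
    """
    between = [between] if isinstance(between, str) else list(between)
    counts = data.groupby(between, observed=True).size()
    # Number of cells expected if every combination of levels is present.
    n_cells = 1
    for factor in between:
        n_cells *= data[factor].nunique()
    return len(counts) == n_cells and counts.nunique() == 1


# Made-up 2 x 2 design with two observations per cell.
balanced = pd.DataFrame({
    "Diet": ["a", "a", "b", "b"] * 2,
    "Exercise": ["x", "y", "x", "y"] * 2,
    "Scores": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
unbalanced = balanced.iloc[:-1]  # drop one row so that one cell is smaller

print(is_balanced(balanced, ["Diet", "Exercise"]))    # True
print(is_balanced(unbalanced, ["Diet", "Exercise"]))  # False
```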
Examples
One-way ANOVA
>>> import pingouin as pg
>>> df = pg.read_dataset('anova')
>>> aov = pg.anova(dv='Pain threshold', between='Hair color', data=df,
...                detailed=True)
>>> aov
       Source           SS  DF          MS         F     p-unc       np2
0  Hair color  1360.726316   3  453.575439  6.791407  0.004114  0.575962
1      Within  1001.800000  15   66.786667       NaN       NaN       NaN
Same but using a standard eta-squared instead of a partial eta-squared effect size. Also note how here we’re using the anova function directly as a method (= built-in function) of our pandas DataFrame. In that case, we don’t have to specify data anymore.
>>> df.anova(dv='Pain threshold', between='Hair color', detailed=False,
...          effsize='n2')
       Source  ddof1  ddof2         F     p-unc        n2
0  Hair color      3     15  6.791407  0.004114  0.575962
Two-way ANOVA with balanced design
>>> data = pg.read_dataset('anova2')
>>> data.anova(dv="Yield", between=["Blend", "Crop"]).round(3)
         Source        SS  DF        MS      F  p-unc    np2
0         Blend     2.042   1     2.042  0.004  0.952  0.000
1          Crop  2736.583   2  1368.292  2.525  0.108  0.219
2  Blend * Crop  2360.083   2  1180.042  2.178  0.142  0.195
3      Residual  9753.250  18   541.847    NaN    NaN    NaN
Two-way ANOVA with unbalanced design (requires statsmodels)
>>> data = pg.read_dataset('anova2_unbalanced')
>>> data.anova(dv="Scores", between=["Diet", "Exercise"],
...            effsize="n2").round(3)
            Source       SS   DF       MS      F  p-unc     n2
0             Diet  390.625  1.0  390.625  7.423  0.034  0.433
1         Exercise  180.625  1.0  180.625  3.432  0.113  0.200
2  Diet * Exercise   15.625  1.0   15.625  0.297  0.605  0.017
3         Residual  315.750  6.0   52.625    NaN    NaN    NaN
Three-way ANOVA, type 3 sums of squares (requires statsmodels)
>>> data = pg.read_dataset('anova3')
>>> data.anova(dv='Cholesterol', between=['Sex', 'Risk', 'Drug'],
...            ss_type=3).round(3)
              Source      SS    DF      MS       F  p-unc    np2
0                Sex   2.075   1.0   2.075   2.462  0.123  0.049
1               Risk  11.332   1.0  11.332  13.449  0.001  0.219
2               Drug   0.816   2.0   0.408   0.484  0.619  0.020
3         Sex * Risk   0.117   1.0   0.117   0.139  0.711  0.003
4         Sex * Drug   2.564   2.0   1.282   1.522  0.229  0.060
5        Risk * Drug   2.438   2.0   1.219   1.446  0.245  0.057
6  Sex * Risk * Drug   1.844   2.0   0.922   1.094  0.343  0.044
7           Residual  40.445  48.0   0.843     NaN    NaN    NaN