pingouin.anova

pingouin.anova(data=None, dv=None, between=None, ss_type=2, detailed=False)
One-way and N-way ANOVA.
 Parameters
data : pandas.DataFrame
    DataFrame. Note that this function can also directly be used as a Pandas method, in which case this argument is no longer needed.
dv : string
    Name of column in data containing the dependent variable.
between : string or list with N elements
    Name of column(s) in data containing the between-subject factor(s). If between is a single string, a one-way ANOVA is computed. If between is a list with two or more elements, an N-way ANOVA is performed. Note that Pingouin will internally call statsmodels to calculate ANOVA with 3 or more factors, or unbalanced two-way ANOVA.
ss_type : int
    Specify how the sums of squares are calculated for unbalanced designs with 2 or more factors. Can be 1, 2 (default), or 3. This has no impact on one-way designs or N-way ANOVA with balanced data.
detailed : boolean
    If True, return a detailed ANOVA table (default True for N-way ANOVA).
 Returns
aov : DataFrame
    ANOVA summary:
    'Source' : Factor names
    'SS' : Sums of squares
    'DF' : Degrees of freedom
    'MS' : Mean squares
    'F' : F-values
    'p-unc' : Uncorrected p-values
    'np2' : Partial eta-square effect sizes
See also
rm_anova
One-way and two-way repeated measures ANOVA
mixed_anova
Two-way mixed ANOVA
welch_anova
One-way Welch ANOVA
kruskal
Non-parametric one-way ANOVA
Notes
The classic ANOVA is very powerful when the groups are normally distributed and have equal variances. However, when the groups have unequal variances, it is best to use the Welch ANOVA (pingouin.welch_anova()), which better controls the type I error rate (Liu 2015). The homogeneity of variances can be assessed with the pingouin.homoscedasticity() function.

The main idea of ANOVA is to partition the variance (sums of squares) into several components. For example, in one-way ANOVA:

\[SS_{total} = SS_{treatment} + SS_{error}\]
\[SS_{total} = \sum_i \sum_j (Y_{ij} - \overline{Y})^2\]
\[SS_{treatment} = \sum_i n_i (\overline{Y}_i - \overline{Y})^2\]
\[SS_{error} = \sum_i \sum_j (Y_{ij} - \overline{Y}_i)^2\]

where \(i=1,...,r; j=1,...,n_i\), \(r\) is the number of groups, and \(n_i\) the number of observations for the \(i\)-th group.

The F-statistic is then defined as:

\[F^* = \frac{MS_{treatment}}{MS_{error}} = \frac{SS_{treatment} / (r - 1)}{SS_{error} / (n_t - r)}\]

where \(n_t = \sum_i n_i\) is the total number of observations. The p-value can be calculated using an F-distribution with \(r-1, n_t-r\) degrees of freedom.
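The partition of the sums of squares and the F-statistic can be sketched in a few lines of plain Python. This is only an illustration of the one-way formulas, not Pingouin's implementation; the data are a made-up toy example, and the error degrees of freedom are taken as the total number of observations minus the number of groups, matching the denominator of the F formula.

```python
from statistics import mean

# Illustrative toy data (NOT a Pingouin dataset): three groups, three observations each.
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 3.0, 4.0],
    "c": [5.0, 6.0, 7.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)
r = len(groups)        # number of groups
n_t = len(all_values)  # total number of observations

# Sums of squares, following the one-way definitions.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)
ss_treatment = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

# Mean squares and F-statistic.
ms_treatment = ss_treatment / (r - 1)
ms_error = ss_error / (n_t - r)
f_stat = ms_treatment / ms_error

print(f"SS_total={ss_total:.3f}, SS_treatment={ss_treatment:.3f}, SS_error={ss_error:.3f}")
print(f"F({r - 1}, {n_t - r}) = {f_stat:.3f}")
```

For this toy data the partition is 32 = 26 + 6 and F equals 13 with (2, 6) degrees of freedom. Obtaining the p-value additionally requires the survival function of the F-distribution, which is not part of the standard library.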
When the groups are balanced and have equal variances, the optimal post-hoc test is the Tukey-HSD test (pingouin.pairwise_tukey()). If the groups have unequal variances, the Games-Howell test is more adequate (pingouin.pairwise_gameshowell()).

The effect size reported in Pingouin is the partial eta-square. However, one should keep in mind that for one-way ANOVA partial eta-square is the same as eta-square and generalized eta-square. For more details, see Bakeman 2005; Richardson 2011.

\[\eta_p^2 = \frac{SS_{treatment}}{SS_{treatment} + SS_{error}}\]

Note that missing values are automatically removed. Results have been tested against R, Matlab and JASP.
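As a quick sanity check of the effect-size formula, the reported statistics can be recomputed by hand from the sums of squares and degrees of freedom. The sketch below uses the values shown in the one-way example table in the Examples section (the 'Hair color' and 'Within' rows); it only illustrates the arithmetic and does not call Pingouin.

```python
# Values copied from the one-way example table (rows 'Hair color' and 'Within').
ss_treatment, df_treatment = 1360.726, 3
ss_error, df_error = 1001.800, 15

# Partial eta-square: share of (treatment + error) variability explained by the factor.
np2 = ss_treatment / (ss_treatment + ss_error)

# F-statistic as the ratio of the two mean squares.
f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)

print(round(np2, 3))     # 0.576
print(round(f_stat, 3))  # 6.791
```

Both values agree with the 'np2' and 'F' columns of that table.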
Warning
Versions of Pingouin below 0.2.5 gave wrong results for unbalanced N-way ANOVA. This issue has been resolved in Pingouin>=0.2.5. In such cases, the ANOVA is calculated via an internal call to the statsmodels package.
References
[1] Liu, Hangcheng. “Comparing Welch’s ANOVA, a Kruskal-Wallis test and traditional ANOVA in case of Heterogeneity of Variance.” (2015).
[2] Bakeman, Roger. “Recommended effect size statistics for repeated measures designs.” Behavior research methods 37.3 (2005): 379-384.
[3] Richardson, John TE. “Eta squared and partial eta squared as measures of effect size in educational research.” Educational Research Review 6.2 (2011): 135-147.
Examples
One-way ANOVA

>>> import pingouin as pg
>>> df = pg.read_dataset('anova')
>>> aov = pg.anova(dv='Pain threshold', between='Hair color', data=df,
...                detailed=True)
>>> aov
       Source        SS  DF       MS      F       p-unc    np2
0  Hair color  1360.726   3  453.575  6.791  0.00411423  0.576
1      Within  1001.800  15   66.787      -           -      -
Note that this function can also directly be used as a Pandas method:

>>> df.anova(dv='Pain threshold', between='Hair color', detailed=True)
       Source        SS  DF       MS      F       p-unc    np2
0  Hair color  1360.726   3  453.575  6.791  0.00411423  0.576
1      Within  1001.800  15   66.787      -           -      -
Two-way ANOVA with balanced design

>>> data = pg.read_dataset('anova2')
>>> data.anova(dv="Yield", between=["Blend", "Crop"]).round(3)
         Source        SS  DF        MS      F  p-unc    np2
0         Blend     2.042   1     2.042  0.004  0.952  0.000
1          Crop  2736.583   2  1368.292  2.525  0.108  0.219
2  Blend * Crop  2360.083   2  1180.042  2.178  0.142  0.195
3      Residual  9753.250  18   541.847    NaN    NaN    NaN
Two-way ANOVA with unbalanced design (requires statsmodels)

>>> data = pg.read_dataset('anova2_unbalanced')
>>> data.anova(dv="Scores", between=["Diet", "Exercise"]).round(3)
            Source       SS   DF       MS      F  p-unc    np2
0             Diet  390.625  1.0  390.625  7.423  0.034  0.553
1         Exercise  180.625  1.0  180.625  3.432  0.113  0.364
2  Diet * Exercise   15.625  1.0   15.625  0.297  0.605  0.047
3         Residual  315.750  6.0   52.625    NaN    NaN    NaN
Three-way ANOVA, type 3 sums of squares (requires statsmodels)

>>> data = pg.read_dataset('anova3')
>>> data.anova(dv='Cholesterol', between=['Sex', 'Risk', 'Drug'],
...            ss_type=3)
              Source      SS    DF      MS       F     p-unc    np2
0                Sex   2.075   1.0   2.075   2.462  0.123191  0.049
1               Risk  11.332   1.0  11.332  13.449  0.000613  0.219
2               Drug   0.816   2.0   0.408   0.484  0.619249  0.020
3         Sex * Risk   0.117   1.0   0.117   0.139  0.710541  0.003
4         Sex * Drug   2.564   2.0   1.282   1.522  0.228711  0.060
5        Risk * Drug   2.438   2.0   1.219   1.446  0.245485  0.057
6  Sex * Risk * Drug   1.844   2.0   0.922   1.094  0.343041  0.044
7           Residual  40.445  48.0   0.843     NaN       NaN    NaN