pingouin.anova

pingouin.anova(dv=None, between=None, data=None, detailed=False, export_filename=None)

One-way and two-way ANOVA.

Parameters
dv : string

Name of column in data containing the dependent variable.

between : string or list with two elements

Name of column(s) in data containing the between-subject factor(s). If between is a single string, a one-way ANOVA is computed. If between is a list with two elements (e.g. ['Factor1', 'Factor2']), a two-way ANOVA is computed.

data : pandas DataFrame

DataFrame. Note that this function can also directly be used as a Pandas method, in which case this argument is no longer needed.

detailed : boolean

If True, return a detailed ANOVA table (default True for two-way ANOVA).

export_filename : string

Filename (without extension) for the output file. If None, the table is not exported. By default, the file is created in the current Python console directory. To change that, specify the filename with its full path.

Returns
aov : DataFrame

ANOVA summary:

'Source' : Factor names
'SS' : Sums of squares
'DF' : Degrees of freedom
'MS' : Mean squares
'F' : F-values
'p-unc' : uncorrected p-values
'np2' : Partial eta-square effect sizes

See also

rm_anova

One-way and two-way repeated measures ANOVA

mixed_anova

Two-way mixed ANOVA

welch_anova

One-way Welch ANOVA

kruskal

Non-parametric one-way ANOVA

Notes

The classic ANOVA is very powerful when the groups are normally distributed and have equal variances. However, when the groups have unequal variances, it is best to use the Welch ANOVA (welch_anova), which better controls the type I error rate (Liu 2015). The homogeneity of variances can be tested with the homoscedasticity function.

The main idea of ANOVA is to partition the variance (sums of squares) into several components. For example, in one-way ANOVA:

\[SS_{total} = SS_{treatment} + SS_{error}\]
\[SS_{total} = \sum_i \sum_j (Y_{ij} - \overline{Y})^2\]
\[SS_{treatment} = \sum_i n_i (\overline{Y_i} - \overline{Y})^2\]
\[SS_{error} = \sum_i \sum_j (Y_{ij} - \overline{Y}_i)^2\]

where \(i=1,...,r; j=1,...,n_i\), \(r\) is the number of groups, and \(n_i\) is the number of observations in the \(i\)-th group.

The F-statistic is then defined as:

\[F^* = \frac{MS_{treatment}}{MS_{error}} = \frac{SS_{treatment} / (r - 1)}{SS_{error} / (n_t - r)}\]

and the p-value can be calculated using an F-distribution with \(r-1, n_t-r\) degrees of freedom, where \(n_t\) is the total number of observations.
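As a minimal sketch of the partition above, the following pure-Python snippet computes the three sums of squares and the F-statistic for a small made-up data set. The data and the helper name `one_way_anova_table` are illustrative assumptions, not part of Pingouin. The p-value would then be the upper tail of an F-distribution with \(r-1\) and \(n_t-r\) degrees of freedom (for instance via `scipy.stats.f.sf`, which is not called here).

```python
# Hypothetical toy data: three groups of unequal size.
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [7.0, 8.0, 9.0, 10.0],
    "c": [1.0, 2.0, 3.0],
}


def one_way_anova_table(groups):
    """Partition the sums of squares of a one-way design and return F."""
    values = [y for ys in groups.values() for y in ys]
    n_t = len(values)   # total number of observations
    r = len(groups)     # number of groups
    grand_mean = sum(values) / n_t

    # SS_total: squared deviations of every observation from the grand mean
    ss_total = sum((y - grand_mean) ** 2 for y in values)

    ss_treatment = 0.0
    ss_error = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        # SS_treatment: group size times squared deviation of the group mean
        ss_treatment += len(ys) * (group_mean - grand_mean) ** 2
        # SS_error: squared deviations of observations from their group mean
        ss_error += sum((y - group_mean) ** 2 for y in ys)

    df_treatment = r - 1
    df_error = n_t - r
    f_value = (ss_treatment / df_treatment) / (ss_error / df_error)
    return {
        "ss_total": ss_total,
        "ss_treatment": ss_treatment,
        "ss_error": ss_error,
        "df": (df_treatment, df_error),
        "F": f_value,
    }


table = one_way_anova_table(groups)
print(table)
```

For this toy data, `ss_treatment + ss_error` equals `ss_total`, which is the identity stated in the first equation.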

When the groups are balanced and have equal variances, the optimal post-hoc test is the Tukey-HSD test (pingouin.pairwise_tukey()). If the groups have unequal variances, the Games-Howell test is more adequate (pingouin.pairwise_gameshowell()).

The effect size reported in Pingouin is the partial eta-square. However, one should keep in mind that for one-way ANOVA partial eta-square is the same as eta-square and generalized eta-square. For more details, see Bakeman 2005; Richardson 2011.

\[\eta_p^2 = \frac{SS_{treatment}}{SS_{treatment} + SS_{error}}\]
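As a small arithmetic check of this formula, the snippet below plugs in the sums of squares reported in the one-way example table on this page (the 'Hair color' and 'Within' rows). It also illustrates why partial eta-square and eta-square coincide in the one-way case: the denominator \(SS_{treatment} + SS_{error}\) is exactly \(SS_{total}\). The variable names are illustrative only.

```python
# Sums of squares copied from the one-way example table on this page
ss_treatment = 1360.726  # 'Hair color' row
ss_error = 1001.800      # 'Within' row

# Partial eta-square, as defined above
np2 = ss_treatment / (ss_treatment + ss_error)

# Eta-square uses SS_total in the denominator; in a one-way design
# SS_total = SS_treatment + SS_error, so both measures are identical.
ss_total = ss_treatment + ss_error
eta2 = ss_treatment / ss_total

print(round(np2, 3))  # 0.576, matching the 'np2' column of the example
```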

Note that missing values are automatically removed. Results have been tested against R, Matlab and JASP.

Important

Versions of Pingouin below 0.2.5 gave wrong results for unbalanced two-way ANOVA. This issue has been resolved in Pingouin>=0.2.5: for unbalanced designs, a type II ANOVA is calculated via an internal call to the statsmodels package, which is therefore required for two-way ANOVA with unequal sample sizes.

References

[1] Liu, Hangcheng. “Comparing Welch’s ANOVA, a Kruskal-Wallis test and traditional ANOVA in case of Heterogeneity of Variance.” (2015).

[2] Bakeman, Roger. “Recommended effect size statistics for repeated measures designs.” Behavior research methods 37.3 (2005): 379-384.

[3] Richardson, John TE. “Eta squared and partial eta squared as measures of effect size in educational research.” Educational Research Review 6.2 (2011): 135-147.

Examples

One-way ANOVA

>>> import pingouin as pg
>>> df = pg.read_dataset('anova')
>>> aov = pg.anova(dv='Pain threshold', between='Hair color', data=df,
...             detailed=True)
>>> aov
       Source        SS  DF       MS      F       p-unc    np2
0  Hair color  1360.726   3  453.575  6.791  0.00411423  0.576
1      Within  1001.800  15   66.787      -           -      -

Note that this function can also be used directly as a Pandas method:

>>> df.anova(dv='Pain threshold', between='Hair color', detailed=True)
       Source        SS  DF       MS      F       p-unc    np2
0  Hair color  1360.726   3  453.575  6.791  0.00411423  0.576
1      Within  1001.800  15   66.787      -           -      -

Two-way ANOVA with balanced design

>>> data = pg.read_dataset('anova2')
>>> data.anova(dv="Yield", between=["Blend", "Crop"]).round(3)
         Source        SS  DF        MS      F  p-unc    np2
0         Blend     2.042   1     2.042  0.004  0.952  0.000
1          Crop  2736.583   2  1368.292  2.525  0.108  0.219
2  Blend * Crop  2360.083   2  1180.042  2.178  0.142  0.195
3      residual  9753.250  18   541.847    NaN    NaN    NaN

Two-way ANOVA with unbalanced design (requires statsmodels)

>>> data = pg.read_dataset('anova2_unbalanced')
>>> data.anova(dv="Scores", between=["Diet", "Exercise"]).round(3)
            Source       SS   DF       MS      F  p-unc    np2
0             Diet  390.625  1.0  390.625  7.423  0.034  0.553
1         Exercise  180.625  1.0  180.625  3.432  0.113  0.364
2  Diet * Exercise   15.625  1.0   15.625  0.297  0.605  0.047
3         residual  315.750  6.0   52.625    NaN    NaN    NaN