pingouin.cochran

pingouin.cochran(data=None, dv=None, within=None, subject=None)

Cochran Q test. A special case of the Friedman test when the dependent variable is binary.

Parameters:

data : pandas.DataFrame
    Input dataframe. Both wide-format and long-format dataframes are supported for this test.
dv : string
    Name of the column containing the dependent variable (only required if data is in long format).
within : string
    Name of the column containing the within-subject factor (only required if data is in long format). Multiple within-subject factors are not currently supported.
subject : string
    Name of the column containing the subject/rater identifier (only required if data is in long format).

Returns:

stats : pandas.DataFrame

'Q': The Cochran Q statistic
'p_unc': Uncorrected p-value
'dof': degrees of freedom

Notes

The Cochran Q test [1] is a non-parametric alternative to the repeated-measures ANOVA for cases where the dependent variable is binary.

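The Q statistic is defined as:

\[Q = \frac{(r-1)\left(r\sum_{j=1}^{r}x_j^2-N^2\right)}{rN-\sum_{i=1}^{n}x_i^2}\]

where \(N\) is the total sum of all observations, \(x_j\) is the sum of the observations in condition \(j\) (with \(j=1,...,r\), where \(r\) is the number of repeated measures), and \(x_i\) is the sum of the observations for subject \(i\) (with \(i=1,...,n\), where \(n\) is the number of observations per condition).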

The p-value is then approximated using a chi-square distribution with \(r-1\) degrees of freedom:

\[Q \sim \chi^2(r-1)\]
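As a rough illustration of the formula and the chi-square approximation, the sketch below computes Q and its p-value by hand with NumPy and SciPy. The function name `cochran_q_manual` and the toy matrix are hypothetical and are not part of pingouin. Rows are assumed to be subjects and columns to be repeated measures, so column sums play the role of \(x_j\) and row sums the role of \(x_i\).

```python
import numpy as np
from scipy.stats import chi2


def cochran_q_manual(x):
    """Hand-rolled Cochran Q for a binary (subjects x conditions) array.

    Hypothetical helper for illustration only; not pingouin's implementation.
    """
    x = np.asarray(x, dtype=float)
    n, r = x.shape                      # n subjects, r repeated measures
    col_totals = x.sum(axis=0)          # x_j: one total per condition
    row_totals = x.sum(axis=1)          # x_i: one total per subject
    N = x.sum()                         # grand total of all observations

    numerator = (r - 1) * (r * np.sum(col_totals**2) - N**2)
    denominator = r * N - np.sum(row_totals**2)
    q = numerator / denominator

    dof = r - 1
    p_unc = chi2.sf(q, dof)             # upper tail of chi-square(r - 1)
    return q, dof, p_unc


# Toy data: 6 subjects measured under 3 conditions (made-up values).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
]
q, dof, p_unc = cochran_q_manual(toy)
print(q, dof, p_unc)
```

For this toy matrix the column totals are 5, 3 and 1, which gives a Q of 4.8 with 2 degrees of freedom. The sketch assumes complete data and a non-zero denominator, that is, at least one subject whose responses differ across conditions.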

Data can be provided in wide or long format. Missing values are automatically removed using a strict listwise approach (complete-case analysis): any subject with one or more missing values is removed entirely from the dataframe before the test is run.
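To make the listwise rule concrete, the following sketch shows what complete-case removal looks like on a small long-format dataframe using plain pandas. It only illustrates the idea and is not pingouin's internal code; the column names mirror the example dataset, and the values are made up.

```python
import numpy as np
import pandas as pd

# Made-up long-format data: subject 2 has one missing response.
df_long = pd.DataFrame(
    {
        "Subject": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "Time": ["a", "b", "c"] * 3,
        "Energetic": [1, 0, 1, 0, np.nan, 1, 1, 1, 0],
    }
)

# Reshape to one row per subject and one column per condition.
wide = df_long.pivot(index="Subject", columns="Time", values="Energetic")

# Listwise deletion: drop every subject that has at least one missing value.
complete = wide.dropna(axis=0, how="any")
print(complete)
```

Subject 2 is dropped entirely, even though two of its three responses are present, so the test would run on subjects 1 and 3 only.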

References

[1] Cochran, W.G., 1950. The comparison of percentages in matched samples. Biometrika 37, 256–266. https://doi.org/10.1093/biomet/37.3-4.256

Examples

Compute the Cochran Q test for repeated measurements.

>>> from pingouin import cochran, read_dataset
>>> df = read_dataset("cochran")
>>> cochran(data=df, dv="Energetic", within="Time", subject="Subject")
        Source  dof         Q     p_unc
cochran   Time    2  6.705882  0.034981

The same test using a wide-format dataframe:

>>> df_wide = df.pivot_table(index="Subject", columns="Time", values="Energetic")
>>> cochran(df_wide)
         Source  dof         Q     p_unc
cochran  Within    2  6.705882  0.034981