New concepts of multiple tests and their use for evaluating high-dimensional EEG data

https://doi.org/10.1016/j.jneumeth.2004.08.008

Abstract

Recently, new concepts of type I error control in multiple comparisons have been proposed, in addition to FWE and FDR control. We introduce these criteria and investigate in simulations how the powers of corresponding test procedures for multiple endpoints depend on various quantities such as number and correlation of endpoints, percentage of false hypotheses, etc. We applied the different multiple tests to EEG coherence data. We compared the memory encoding of subsequently recalled and not recalled nouns. The results show that subsequently recalled nouns elicited significantly higher coherence than not recalled ones.

Introduction

Modern procedures of EEG analysis yield large sets of high-dimensional parameters, which have to be evaluated statistically. Let k denote the dimension of the observations. This means there are k components, which are also called multiple endpoints. In Hemmelmann et al. (2004) we dealt with so-called global tests or multivariate tests which provide one joint statement on all k endpoints. We now consider procedures that provide a statement for each endpoint. Many authors use an α-level test for each single component or endpoint of the observational vector, see e.g. Rappelsberger and Petsche (1988). However, this practice results in a large number of false positive statements (false discoveries, type I errors). There exist several techniques to cope with this general drawback in multiple comparisons. Corresponding multiple tests will be considered in the present paper.
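
To make the scale of the problem concrete, the following minimal sketch computes how many false positives to expect when each of k endpoints is tested separately at level α. It assumes that all k null hypotheses are true and, for the second function, that the tests are independent; both assumptions are ours for illustration, and the function names are hypothetical.

```python
# Illustration only: all k null hypotheses are assumed true.

def expected_false_positives(k: int, alpha: float) -> float:
    """Expected number of false rejections among k true null hypotheses.

    This holds for any dependence structure (linearity of expectation).
    """
    return k * alpha


def prob_at_least_one_false_positive(k: int, alpha: float) -> float:
    """P(at least one false rejection), assuming k independent tests."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k,
              expected_false_positives(k, 0.05),
              round(prob_at_least_one_false_positive(k, 0.05), 4))
```

With k = 100 independent endpoints and α = 0.05, about five false positives are expected, and at least one is nearly certain.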

Our paper has the following aims: (a) to introduce both traditional and recently proposed concepts of error control in multiple comparisons, (b) to investigate corresponding multiple test procedures regarding their dependence on the dimension k, the fraction of false hypotheses and the correlation structure of the data and compare the powers of different methods, and (c) to demonstrate the use of different multiple tests in problems of multiple comparisons of coherence values obtained from EEG data recorded during the memory encoding of subsequently recalled or not recalled abstract nouns (Weiss et al., 2000).

The techniques we discuss are not specific to EEG data; they are equally applicable to the large data sets that arise in MEG and fMRI.

Section snippets

Multiple tests and type I error control

As explained in Section 1, our observations are vectors of dimension k. Assume we have to compare paired samples or two independent samples. Let x = (x1, …, xk) and y = (y1, …, yk) denote the corresponding random vectors and (μx1, …, μxk) and (μy1, …, μyk) the respective means. Then, the individual null hypotheses to be tested are H1: μx1 = μy1, …, Hk: μxk = μyk. Tests for H1, …, Hk are called multiple tests.
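
As a rough sketch of the paired-sample case, one test statistic can be computed per endpoint, so that each hypothesis Hj receives its own statement. The code below uses the ordinary paired t statistic; this choice, the function name and the toy data are our assumptions for illustration and are not taken from the paper.

```python
import math
import statistics


def paired_t_statistics(x, y):
    """Return one paired t statistic per endpoint.

    x and y are lists of n observation vectors (one per subject), each of
    length k; row i of x is paired with row i of y. Under normality and a
    true Hj, the j-th statistic follows a t distribution with n - 1
    degrees of freedom.
    """
    n, k = len(x), len(x[0])
    t_values = []
    for j in range(k):
        diffs = [x[i][j] - y[i][j] for i in range(n)]
        mean_d = statistics.mean(diffs)
        sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
        t_values.append(mean_d / (sd_d / math.sqrt(n)))
    return t_values


if __name__ == "__main__":
    # Hypothetical toy data: n = 3 subjects, k = 2 endpoints.
    x = [[1.0, 2.0], [2.0, 2.5], [3.0, 3.5]]
    y = [[0.5, 2.1], [1.0, 2.4], [2.5, 3.6]]
    print(paired_t_statistics(x, y))
```

The sketch raises a ZeroDivisionError if all differences for an endpoint are identical; a real analysis would need to handle that case.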

It can happen that one of the k hypotheses, say Hi, is rejected though it is true. Such an event is

Estimation of the FDR and P(Q > 0.1)

As mentioned in Section 2.1.2, BH and BKY control the FDR under the condition of independence of the test statistics or at least of positive regression dependency. In multiple endpoint problems, it is difficult to determine the correlation between the test statistics. However, it may be possible to estimate the correlation between the endpoints. (Of course, the correlation of the test statistics will be related to the correlation of the endpoints.) Thus, we investigated how the FDR of BH and
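
A simulation of this kind can be sketched as follows. The code implements the Benjamini-Hochberg (BH) step-up procedure and estimates the FDR, E(Q), and the tail probability P(Q > γ) by Monte Carlo. Everything about the simulation design is our assumption and not the authors' setup: equicorrelated normal endpoints with known unit variance, two-sided z-tests, and the particular values of k, shift, rho and n_sim. The BKY procedure is not shown.

```python
import math
import random
from statistics import NormalDist


def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            n_reject = rank  # keep the largest rank that meets its bound
    return sorted(order[:n_reject])


def simulate_fdp(k=50, k_false=10, shift=3.0, rho=0.5, q=0.05,
                 n_sim=2000, gamma=0.1, seed=1):
    """Estimate FDR = E(Q) and P(Q > gamma) for BH (assumed toy design)."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    fdp = []
    for _ in range(n_sim):
        common = rng.gauss(0.0, 1.0)  # shared factor gives correlation rho
        p = []
        for j in range(k):
            z = (math.sqrt(rho) * common
                 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            if j < k_false:
                z += shift  # the first k_false hypotheses are false
            p.append(2.0 * (1.0 - phi(abs(z))))  # two-sided p-value
        rejected = benjamini_hochberg(p, q)
        false_rejections = sum(1 for j in rejected if j >= k_false)
        fdp.append(false_rejections / len(rejected) if rejected else 0.0)
    fdr = sum(fdp) / n_sim
    tail = sum(1 for v in fdp if v > gamma) / n_sim
    return fdr, tail


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, simulate_fdp(rho=rho))
```

Varying rho, k and k_false in such a sketch shows how the two error measures can behave differently: the mean of Q may stay below q while the probability that Q exceeds γ in a single experiment remains noticeable.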

Discussion

We have introduced four criteria for controlling type I errors (three of them may be new for most readers) and derived the relationships between them, see Fig. 1.

Control of the FWE requires that, with probability at least 1 − α, no type I error occurs at all, no matter how large the number of hypotheses and the number of rejected hypotheses are. With high-dimensional data, this criterion is too strict. Therefore, the other criteria were proposed.
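
The strictness of FWE control can be seen in two standard FWE-controlling procedures, Bonferroni and Holm. The sketch below is a generic textbook implementation offered for illustration; we do not claim that these are the exact procedures the authors compared, and the function names are ours.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject Hi if p_i <= alpha / m (single-step, controls the FWE)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure; returns the sorted rejected indices."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for step, idx in enumerate(order):
        # The bound relaxes from alpha/m to alpha/1 as hypotheses are rejected.
        if p_values[idx] <= alpha / (m - step):
            rejected.append(idx)
        else:
            break  # stop at the first p-value that misses its bound
    return sorted(rejected)


if __name__ == "__main__":
    p = [0.001, 0.015, 0.02, 0.04]
    print(bonferroni(p))  # only the smallest p-value passes alpha / 4
    print(holm(p))        # Holm rejects at least as many hypotheses
```

Because the per-test bound shrinks like α/m, the smallest p-value must fall below 0.00005 when m = 1000 and α = 0.05, which is why FWE control costs so much power in high dimensions.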

In our opinion, the requirement P(Q > γ) ≤ α provides the most reasonable criterion.
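
For orientation, the criteria named in this text can be written in one common notation. The symbols V and R and the label "tail control" are our assumptions, following the usual conventions in the FDR literature; only Q, γ and α appear in the text above.

```latex
% V: number of true null hypotheses that are rejected (type I errors)
% R: total number of rejected hypotheses
% Q: proportion of false rejections among all rejections
\[
Q = \begin{cases} V/R, & R > 0, \\ 0, & R = 0, \end{cases}
\]
\[
\text{FWE control: } P(V \ge 1) \le \alpha, \qquad
\text{FDR control: } E(Q) \le \alpha, \qquad
\text{tail control: } P(Q > \gamma) \le \alpha, \quad 0 \le \gamma < 1 .
\]
% Q > \gamma \ge 0 forces V \ge 1, and Q \le \mathbf{1}\{V \ge 1\}, hence
\[
P(Q > \gamma) \le P(V \ge 1), \qquad E(Q) \le P(V \ge 1) .
\]
```

The last line shows why FWE control is the strictest of the three: any procedure that controls the FWE at level α also satisfies the other two requirements at the same level, and for γ = 0 the tail criterion coincides with FWE control.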

Acknowledgement

This work was supported by Interdisziplinäres Zentrum für Klinische Forschung Jena, Projekt 1.8, the Austrian Science Foundation (“Herta Firnberg”-project T127) and the German Science Foundation (SFB 360). We gratefully acknowledge the helpful comments of the two referees.

References (33)

  • Benjamini Y, Krieger A, Yekutieli D. Two staged linear step up FDR controlling procedure. Technical Report. Tel Aviv...
  • Durka PJ, et al. On the statistical significance of event-related EEG desynchronization and synchronization in the time–frequency plane. IEEE Trans Biomed Eng (2004)
  • Einot I, et al. A study of the powers of several methods of multiple comparisons. J Am Stat Assoc (1975)
  • Fell J, et al. Human memory formation is accompanied by rhinal–hippocampal coupling and decoupling. Nat Neurosci (2001)
  • Fell J, et al. Rhinal-hippocampal theta coherence during declarative memory formation: interaction with gamma synchronization? Eur J Neurosci (2003)
  • Hemmelmann C, Horn M, Reiterer S, Schack B, Süsse T, Weiss S. Multivariate tests for the evaluation of high-dimensional...

Cited by (28)

    • Systematic differences of gluteal muscle activation during overground and treadmill walking in healthy older adults

      2019, Journal of Electromyography and Kinesiology
      Citation Excerpt :

      Post-hoc statistical calculations for the mean amplitudes were corrected using the Bonferroni algorithm. To assess statistical differences between the grand averaged curves of the two walking conditions, paired t-tests were performed including Bonferroni-Holm correction for multiple testing in order to avoid the accumulation of a type-I statistical error (mainly due to the high number of time points (Hemmelmann et al., 2005)). In detail, for the k = 201 single time points of the time-normalized SEMG curves, the calculated p values were ranked in ascending order where p1 denotes the smallest and p201 the largest p level.

    • Effects of stimulus mode and ambient temperature on cerebral responses to local thermal stimulation: An EEG study

      2017, International Journal of Psychophysiology
      Citation Excerpt :

      The statistical analysis was performed on each frequency band and each EEG channel respectively. To account for multiple comparisons, false discovery rate (FDR) correction was applied to confirm significant effects on 4 frequency bands and 19 EEG channels (Hemmelmann et al., 2005). There were 76 subjective thermal scorings (19 subjects × 4 simulated blocks) in each session.

    • Trunk muscle amplitude-force relationship is only quantitatively influenced by control strategy

      2016, Journal of Biomechanics
      Citation Excerpt :

      T-tests were applied to compare the normalized data with expected values and to compare corresponding load levels between the control strategies. In order to address the internal dependency of data together with the multiple test problem the results of the T-tests were corrected by application of the Holm–Bonferroni procedure controlling the familywise error rate representing the probability of committing at least one type I error (Hemmelmann et al., 2005). The ANOVA of the MVC normalized data proved a strong influence owing to control strategies on all abdominal muscles, meanwhile back musculature seemed to be unaffected by control strategy (Table 3).

    • Effects of heel cushioning elements in safety shoes on muscle-physiological parameters

      2015, International Journal of Industrial Ergonomics
      Citation Excerpt :

      For the entire statistical analysis of the grand averaged curves we have to consider the possible accumulation of type 1 errors (Korn et al., 2004) and results therefore need to be corrected. For this we applied a stepwise procedure (Hemmelmann et al., 2005), which in the actual case of 0.5% time-accuracy (i.e. 201 relative time points) requires a critical p value of 0.000249. With this study being carried out on 10 subjects, the least accessible significance level is 0.002, so our data does not meet the required p value.

    • Structured multiplicity and confirmatory statistical analyses in pharmacodynamic studies using the quantitative electroencephalogram

      2011, Journal of Neuroscience Methods
      Citation Excerpt :

      On the other hand, if there was an unexpected delay in the drug effect, so that it could not be demonstrated at the anticipated time, applying the predefined analysis to a later time point may lend more credibility to the exploratory findings than if nothing was pre-specified at all. In this contribution, we did not discuss some alternative approaches to cope with multiplicity, such as the false discovery rate (see e.g. Hemmelmann et al., 2005) or statistical fields (Worsley et al., 1996). We consider both as out of scope of the present work, the former, because this method goes beyond the framework of confirmatory statistics and the latter, because the spatial resolution of the EEG data is probably too low for its application.
