The Multiple Comparisons Problem
fMRI
An important issue in fMRI data analysis is the specification of an appropriate threshold for statistical maps. If there were data from only a single voxel, a conventional threshold of p < 0.05 (or p < 0.01) could be used; it quantifies the probability of obtaining the observed effect, expressed as an R, t or F statistic, solely due to noise fluctuations. Running the statistical analysis separately for each voxel, however, creates a massive multiple comparisons problem (MCP). If a single test is performed, the conventional threshold protects us with a probability of p < 0.05 from wrongly declaring a voxel significantly modulated when there is in fact no effect (alpha error). Note that an error probability of p = 0.05 means that if we repeated the same test 100 times under the null hypothesis of no effect, we would wrongly reject the null hypothesis (and thus accept the alternative hypothesis) on average in five cases (false positives, alpha errors).

If we assume that there is no real effect in any voxel time course, then running the statistical test in parallel over, say, 100,000 voxels is statistically identical to repeating the test 100,000 times for a single voxel. This would lead to about 5,000 false positives, i.e. about 5,000 voxels would be labeled "significant" even though they surpass the 0.05 threshold purely due to chance.
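The scale of the problem is easy to verify with a small simulation. The sketch below (illustrative only; the voxel count and seed are assumptions matching the numbers in the text) draws null p-values for 100,000 voxels with no true effect anywhere and counts how many pass the conventional 0.05 threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scenario matching the text: 100,000 voxels and no real
# effect in any voxel time course. Under the null hypothesis, p-values
# are uniformly distributed on [0, 1], so every value below the
# threshold is a false positive (alpha error).
n_voxels = 100_000
p_values = rng.uniform(0.0, 1.0, size=n_voxels)

alpha = 0.05
n_false_positives = int((p_values < alpha).sum())
# n_false_positives is close to 5,000 on average (0.05 * 100,000),
# even though not a single voxel carries a real effect.
```

Re-running with different seeds changes the exact count, but it always hovers around 5,000: thresholding each voxel independently at 0.05 cannot protect a whole statistical map.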

MEEG

In cognitive experiments, data is usually collected under different experimental conditions, and the experimenter wants to know whether there is a difference between the data observed in these conditions. In statistics, a result (for example, a difference between conditions) is statistically significant if it is unlikely to have occurred by chance, according to a predetermined threshold probability: the significance level.

An important feature of MEG and EEG data is that it has a spatiotemporal structure, i.e. the data is sampled at multiple time-points and sensors. The nature of the data influences which type of statistical test is most suitable for comparing conditions. If the experimenter is interested in a difference in the signal at one particular time-point and sensor, then the widely used parametric tests are sufficient. If it is not possible to predict where the differences are, then many statistical comparisons are necessary, which leads to the multiple comparisons problem (MCP). The MCP arises from the fact that the effect of interest (i.e., a difference between experimental conditions) is evaluated at an extremely large number of (channel,time)-pairs, usually on the order of several thousand. Because of this large number of statistical comparisons (one per (channel,time)-pair), it is not possible to control the so-called family-wise error rate (FWER) by means of standard statistical procedures that operate at the level of single (channel,time)-pairs. The FWER is the probability, under the hypothesis of no effect, of falsely concluding that there is a difference between the experimental conditions at one or more (channel,time)-pairs. A solution to the MCP requires a procedure that controls the FWER at some critical alpha-level (typically 0.05 or 0.01). The FWER is also called the false alarm rate.
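The FWER inflation, and what controlling it looks like, can be illustrated by simulation. The sketch below assumes a hypothetical layout of 100 channels by 30 time-points and uses the Bonferroni correction as one standard FWER-controlling procedure (chosen here purely for illustration; the text does not prescribe a particular method):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical layout: 100 channels x 30 time-points = 3,000
# (channel,time)-pairs, simulated 1,000 times under the null
# hypothesis of no effect anywhere.
n_pairs = 100 * 30
n_experiments = 1000
alpha = 0.05

# Null p-values are uniform on [0, 1] at every (channel,time)-pair.
p = rng.uniform(size=(n_experiments, n_pairs))

# FWER: the probability that at least one pair is falsely declared
# significant in a given experiment.
fwer_uncorrected = np.mean((p < alpha).any(axis=1))
# Bonferroni correction: test each pair at alpha / n_pairs instead.
fwer_bonferroni = np.mean((p < alpha / n_pairs).any(axis=1))
```

Without correction the FWER is essentially 1, i.e. virtually every experiment produces at least one false alarm; with the Bonferroni correction it stays near the nominal alpha-level, at the cost of a very conservative per-pair threshold.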