The Multiple Comparisons Problem
An important issue in fMRI data analysis is choosing an appropriate threshold for statistical maps. If there were data from only a single voxel, a conventional threshold of p < 0.05 (or p < 0.01) could be used. The p-value is the probability of obtaining an effect at least as large as the one observed, quantified by an R, t or F statistic, solely due to noise fluctuations. Running the statistical analysis separately for each voxel, however, creates a massive multiple comparisons problem (MCP). If a single test is performed, the conventional threshold limits to 0.05 the probability of wrongly declaring a voxel significantly modulated when there is no effect (alpha error). An error probability of 0.05 means that if we repeated the same test 100 times while the null hypothesis of no effect was true, we would wrongly reject the null hypothesis (and accept the alternative) in five cases on average (false positives). If we assume that there is no real effect in any voxel time course, running a statistical test spatially in parallel is statistically identical to repeating the test 100,000 times for a single voxel. With 100,000 tests at a threshold of 0.05, this leads to about 5000 false positives (100,000 × 0.05), i.e. about 5000 voxels would be labeled "significant" although they surpass the threshold purely by chance.

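A personal note: this problem puzzled me for a long time. Whenever I looked up the MCP, the explanation used genetic testing as the example, which is hard to follow for those of us who work in brain imaging. So here is why correction for multiple comparisons is needed, explained in terms of M/EEG and fMRI data.
In fMRI, statistics are computed voxel by voxel. If you test a contrast in only one voxel, you can use p < 0.05 or p < 0.01 without correction. The threshold means that if the experiment were repeated 100 times with no real effect present, a false positive would occur about five times on average. But the brain volume we analyze may contain 100,000 voxels, which amounts to running 100,000 tests at once. What happens if we keep the 0.05 threshold? About 5000 errors: 5000 voxels with no effect get labeled significant. The key point is how the voxels are viewed. For a long time I was frustrated: I am only comparing one voxel location between two conditions, so why should I have to correct? I had misunderstood. As far as the statistical procedure is concerned, it is simply making 100,000 separate comparisons, as if testing 100,000 different subjects. "If we assume that there is no real effect in any voxel time course, running a statistical test spatially in parallel is statistically identical to repeating the test 100,000 times for a single voxel." In other words, testing 100,000 voxels in parallel is equivalent to repeating the experiment 100,000 times for a single voxel, and at that point the choice of p-value threshold becomes critical.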
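
The figure of about 5000 false positives can be checked with a small simulation. This is only a sketch under simplifying assumptions that are not in the text above: voxels are treated as independent, and each null p-value is drawn directly from a uniform distribution (which is what a valid test with a continuous statistic produces when there is no effect). The function name `count_false_positives` is made up for this illustration.

```python
import numpy as np

def count_false_positives(n_voxels=100_000, alpha=0.05, seed=0):
    """Count voxels declared significant when no voxel has a real effect."""
    rng = np.random.default_rng(seed)
    # Assumption: under the null hypothesis, p-values are uniform on [0, 1].
    p_values = rng.uniform(0.0, 1.0, size=n_voxels)
    # Every voxel below the threshold is a false positive by construction.
    return int(np.sum(p_values < alpha))

n_false = count_false_positives()
print(f"{n_false} of 100000 null voxels pass p < 0.05")
```

The exact count changes with the random seed, but it stays close to 100,000 × 0.05 = 5000. Real voxels are spatially correlated, so the false positives in a real map would tend to appear in clumps rather than as isolated voxels.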


In cognitive experiments, the data is usually collected under different experimental conditions, and the experimenter wants to know whether the data observed in these conditions differ. In statistics, a result (for example, a difference among conditions) is statistically significant if the probability of it occurring by chance alone is below a predetermined threshold, the significance level.

An important feature of MEG and EEG data is its spatiotemporal structure, i.e. the data is sampled at multiple time points and sensors. The nature of the data influences which kind of statistics is most suitable for comparing conditions. If the experimenter is interested in a difference in the signal at one particular time point and sensor, then the widely used parametric tests are sufficient. If it is not possible to predict where the differences are, many statistical comparisons are necessary, which leads to the multiple comparisons problem (MCP). The MCP arises because the effect of interest (i.e., a difference between experimental conditions) is evaluated at an extremely large number of (channel,time)-pairs, usually on the order of several thousand. Because of this large number of statistical comparisons (one per (channel,time)-pair), the so-called family-wise error rate (FWER) cannot be controlled by the standard statistical procedures that operate at the level of single (channel,time)-pairs. The FWER is the probability, under the hypothesis of no effect, of falsely concluding that there is a difference between the experimental conditions at one or more (channel,time)-pairs. A solution to the MCP requires a procedure that controls the FWER at some critical alpha level (typically 0.05 or 0.01). The FWER is also called the false alarm rate.
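
To get a feel for how quickly the FWER grows, the sketch below computes it for a grid of (channel,time)-pairs. Several things here are assumptions for illustration only: the tests are treated as independent (real M/EEG samples are strongly correlated), the grid size of 100 channels × 50 time points is hypothetical, and Bonferroni correction is used simply because it is the easiest FWER-controlling procedure to write down, not because the text above prescribes it.

```python
def fwer_independent(n_tests, alpha):
    """Probability of at least one false positive among n_tests
    independent tests, each performed at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(n_tests, fwer_target):
    """Per-test threshold that keeps the FWER at or below fwer_target."""
    return fwer_target / n_tests

# Hypothetical grid: 100 channels x 50 time points.
n_channels, n_times = 100, 50
n_tests = n_channels * n_times

uncorrected = fwer_independent(n_tests, 0.05)
corrected = fwer_independent(n_tests, bonferroni_alpha(n_tests, 0.05))

print(f"tests: {n_tests}")
print(f"FWER at p < 0.05 per test:      {uncorrected:.4f}")
print(f"FWER with Bonferroni threshold: {corrected:.4f}")
```

Without correction a false alarm somewhere in the grid is practically certain, while the Bonferroni threshold of 0.05 / 5000 keeps the FWER just under 0.05. Because neighbouring channels and time points are correlated, Bonferroni is conservative for this kind of data, which is one reason other FWER-controlling procedures are often preferred.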

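The same reasoning applies here. If you know in advance where the difference is, a parametric test at that location is enough. In cognitive experiments, however, the data is complex and multidimensional, and we usually do not know where the difference lies. For a recording with more than 100 channels, testing every channel at every time point separately amounts to running channel × time tests, as if a single test had been repeated that many times. An uncorrected error rate of 0.05 per test is then no longer acceptable, so a correction for multiple comparisons is required.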