Multivariate analysis of electrophysiological data

Background 

this tutorial objective is to intruduce the new method for  multivariate analysis of electrophysiological data. the aim is to find the task related feature which allows the prediction of to which task trials belongs.

多变量分析的主要目的是对事件相关信号进行分类

it is very different with the classical statistical testing where the features are identified by pooling the mutiple subjects or where the subjects are considered to be independent

这个经典统计分析不同的,经典统计分析主要是池化多个样本来进行属性分类 或者具有独立属性的单独个体

in this tutorial you will learn how the do the classification with basic alogrithms such as support vertor machine , and you will learn the improtance of regularization .

 

Multivariate analysis method has been an alternative approach to the EEG data analysis, which could allow statements to be made based on the content information on the single trial , this method also are heavily used for the multivariate analysis for the funcitional neuroimaging. by this way this method has used a extenal machine learning toolbox to handle the classify .

 

Dataset

this dataset has been design for brian computer interface, a 275 channel MEG signals are acquired while the subject are instructed with a certrally presented cueto attend to the left or right direction.  the experiment design is that if we could predict on a single trial level to wich ondition to the signal trial belong.

 

Preprocessing

this dataset has been preprocessed the ft_preprocessing (segemented in the trials of interests), the data has been detrended and downloadsampled to 300HZ,  the trial starts at a cue offset and ends at 2.5 second later.  artifical rejection is out of this preprocessing

in following we will work our way though the time and frequecy domain analysis pipelines

first load the data:

load covvt.mat
数据内容

微信截图_20170112150907

274个channel  127个trials  每段信号是2.5秒长  也就是901个data points

数据都已经被剪切好,也就是cue开始后的2.5秒的数据

 

 

微信截图_20170112150941

 

这个是channel的label

 

微信截图_20170112150951

 

这个是对应的时间点

微信截图_20170112151002

 

 

cfg             = [];
cfg.parameter   = 'trial';
cfg.keeptrials  = 'yes'; % classifiers operate on individual trials 这个需要说明何时需要kepp,对单个trial进行操作都需要keep,比如做单个tril的分类 ,对于一些需求求平均的就可以不keep.
cfg.channel     = {'MLO' 'MRO'}; % occipital channels only  选择occipital的区域,没有选择C 后面正面确实是O效果好
tleft   = ft_timelockanalysis(cfg,left);  进行时序分析
tright  = ft_timelockanalysis(cfg,right); 进行时序分析

After time lock analysis, we could do  the timelockstatistics, in fieldtrip , there are four main kinds methods: ‘montecarlo’  ‘analytic’   ‘stats’     ‘crossvalidate’ , the first three are the estimates of the significance probabilities and/or critical values, the last one is the kind of new to do the prediction of Performance,

cfg         = [];
cfg.layout  = 'CTF275.lay';
cfg.method  = 'crossvalidate';crossvalidate five folders to evalute the results 




Herein we need specify the design matrix which could be considered as a testdata, this is simply vector with 1 label for the trials who belong to the trial left and with2 label who belong to the trial right

cfg.degisn=[ones(size(tleft.trial,1),1);2*ones(size(tright.trial,1),1)]

这里其实是设计了一个结果检测对象,把每个trial对应的方向设为1或者2 后续预测的时候就可以计算准确率

let us focus on the last segement of data 这里只取得了最后0.5秒的数据

cfg.latency     = [2.0 2.5];

Now we have to the call the ft_timelockstatistics() to do the classification procedure 
there are two steps :
first this function will standardlize the data (subtract the mean and divison with the standard deviation)
second the SVM will le called to do the classification 

stat = ft_timelockstatistics(cfg,tleft,tright);

please be attention to last code ,we have put the tleft followed by tright as we had designed the degisn mastrix. 

stat.statistic

now we see the reuslt in the stat.statistic 

微信截图_20170112153723这里注意的是有个binomial 其实就是p值, 因为p<0.05 所以 我们的到的结果意义是显著的

省了很多事情

当然 我们需要更准确的数据  我们可以
cfg.statistic = {'accuracy' 'binomial' 'contingency'};

stat = ft_timelockstatistics(cfg,tleft,tright);
stat.statistic.contingency

微信截图_20170112154317
通过上述表格 我们清楚的知道分类错误的具体情况
另外通过测试,增加MRC channel和增加时间段 [0 2.5]准确率大幅度提升
微信截图_20170116152746

Sensor level classification on the frequency domain

Now we are trying to do the classification on the frequency domain, first we need to do the frequency anslysis to fit the input data, let us focus on the alpha band at the end of the attention period 

cfg              = [];
cfg.output       = 'pow';输出是功率谱
cfg.method       = 'mtmconvol'; multiple tape 卷积
cfg.taper        = 'hanning';汉宁窗
cfg.foi          = 5:2:12;  只关注5到12hz的信息
cfg.t_ftimwin    = ones(length(cfg.foi),1).*0.5; time window 
cfg.channel      = {'MLO' 'MRO'}; %只选择Occuapital
cfg.toi          = 2.0:0.1:2.5; 时间
cfg.keeptrials   = 'yes'; % keep trials 一定要注意 因为你的操作是基于单个trial 所以一定保留,否则只有平均数了

tfright=ft_frequencyanalysis(cfg,right);
tfleft=ft_frequencyanalysis(cfg,left);


then we call frequencystatics with crossvalidation as our method 

cfg=[];
cfg.method='crossvalidate';
cfg.layout='CTF275.lay';
cfg.design=[zeros(size(right.trial,1),1),2*zeros(size(left.trial,1),1)];

stat        = ft_freqstatistics(cfg,tfrleft,tfrright);
 微信截图_20170116151740这里需要注意的是alpha band的取值范围有原来8-14 修改为 5-12  结果提升了5% 可见频率对其影响比较大