matlab⽣物,matlab-⽣物信息⼯具箱bioinformatics tool学习之
系统发⽣分析 -开始太傻了!!
1、⾸先建⽴malab结构,将要分析的各物种的信息输⼊:
>>data =
{'German_Neanderthal' 'AF011222';
'Russian_Neanderthal' 'AF254446';
'European_Human' 'X90314' ;
'Mountain_Gorilla_Rwanda' 'AF089820';
'Chimp_Troglodytes' 'AF176766';
};%建⽴结构
2、开始准备序列,以备下⾯分析:
>>for ind = 1:5
%设置循环,因为上⾯结构⼀共五个物种
seqs(ind).Header =
data{ind,1};%⼀样设置为结构
seqs(ind).Sequence = getgenbank(data{ind,2},...
'sequenceonly', true);
end
3、这⼉就要讲⼀下,如何进⾏序列⽐对了:
D =
seqpdist(Seqs)%可以是细胞数组,或者是包含序列结构向量,或者是矩阵,每⼀⾏⼀条序列;
D = seqpdist(Seqs, ...'Method',
MethodValue, ...)% p距离或者(Default is
Jukes-Cantor)或者序列⽐对得分。
D = seqpdist(Seqs, ...'Indels',
IndelsValue, ...)% 插⼊缺失突变是否算分
D = seqpdist(Seqs, ...'Optargs', OptargsValue, ...)
D = seqpdist(Seqs, ...'PairwiseAlignment', PairwiseAlignmentValue,
...)%全局序列⽐对,长度不⼀则为TRUE,否则为FALSE
D = seqpdist(Seqs, ...'JobManager', JobManagerValue,
...)%需要进⾏Parallel Computing Toolbox分析
D = seqpdist(Seqs, ...'WaitInQueue', WaitInQueueValue, ...)
D = seqpdist(Seqs, ...'SquareForm', SquareFormValue,
...)%是否把结果输出⽅阵
D = seqpdist(Seqs, ...'Alphabet', AlphabetValue, ...)%'NT'标⽰序列为核酸or 蛋⽩'AA'
D = seqpdist(Seqs, ...'ScoringMatrix', ScoringMatrixValue,
...)
% 算分矩阵
*
'PAM40'
*
'PAM250'
*
tool工具箱'DAYHOFF'
*
'GONNET'
*
'BLOSUM30' increasing by 5 up to 'BLOSUM90'
*
'BLOSUM62'
*
'BLOSUM100'
默认Default is:
*
'NUC44' (when AlphabetValue equals 'NT')
*
'BLOSUM50' (when AlphabetValue equals 'AA')
D = seqpdist(Seqs, ...'Scale', ScaleValue, ...)
D = seqpdist(Seqs, ...'GapOpen', GapOpenValue, ...)%空位罚分,默认为8
D = seqpdist(Seqs, ...'ExtendGap', ExtendGapValue, ...)
3、对序列⽐对的距离进⾏构树,
>>tree =
seqlinkage(distances,'UPGMA',seqs)
%Tree = seqlinkage(Dist, Method,
Names)
⽅法如下:
Names来描述分⽀点。
%'single'
Nearest distance (single linkage
method)
%'complete'
Furthest distance (complete
linkage method)
%'average' (default)
Unweighted Pair Group Method
Average (UPGMA, group average).
%'weighted'
Weighted Pair Group Method Average (WPGMA)
%'centroid'
Unweighted Pair Group Method
Centroid (UPGMC)
%'median'
Weighted Pair Group Method
Centroid (WPGMC)
4、画出系统发育树
>>h
= plot(tree,'orient','bottom');
>>ylabel('Evolutionary
distance')
>>inalNodeLabels,'Rotation',-45) 5、下⾯加上7条序列
>>data2 =
{'Puti_Orangutan' 'AF451972';
'Jari_Orangutan' 'AF451964';
'Western_Lowland_Gorilla' 'AY079510';
'Eastern_Lowland_Gorilla' 'AF050738';
'Chimp_Schweinfurthii' 'AF176722';
'Chimp_Vellerosus' 'AF315498';
'Chimp_Verus' 'AF176731';
};
6、获取序列
>>for ind =
1:7
seqs(ind+5).Header =
data2{ind,1};
seqs(ind+5).Sequence = getgenbank(data2{ind,2},... 'sequenceonly', true);
end
7、同上操作
>>distances =
seqpdist(seqs,'Method','Jukes-Cantor','Alpha','DNA'); >>tree =
seqlinkage(distances,'UPGMA',seqs);
8、同上画图
>>h
= plot(tree,'orient','bottom');
>>ylabel('Evolutionary
distance')
>>inalNodeLabels,'Rotation',-45)
9、罗列系统发育树的名称
>>names =
get(tree,'LeafNames')
10、选择输出
>>[h_all,h_leaves]
= select(tree,'reference',3,...
'criteria','distance',...
'threshold',0.6);
11、去除⽆⽤的枝杈
>>leaves_to_prune =
~h_leaves;
pruned_tree = prune(tree,leaves_to_prune)
h = plot(pruned_tree,'orient','bottom');
ylabel('Evolutionary distance')
inalNodeLabels,'Rotation',-30) 12、显⽰最终结果
phytreetool(pruned_tree)

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。