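Tutorial: Using RNAhybrid to Predict miRNA Binding Sites
Introduction to RNAhybrid
RNAhybrid is a miRNA target prediction program developed by Rehmsmeier M et al. It is based on the secondary structure of the miRNA/target duplex. The algorithm forbids intramolecular base pairing as well as duplexes between miRNA molecules or between target molecules, and it finds the best target sites according to the hybridization energy between the miRNA and the target. Although the computational cost grows with the length of the target sequence, RNAhybrid still has a clear speed advantage over other RNA secondary structure prediction programs such as mfold, RNAfold, RNAcofold and pairfold. In addition, RNAhybrid lets users set their own free-energy and p-value thresholds and constrain the hybridization site, for example by requiring that it include nucleotides 2-7 from the 5' end of the miRNA.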
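1. Downloading and installing RNAhybrid
wget bitec.uni-bielefeld.de/applications/rnahybrid/resources/downloads/RNAhybrid-2.1.
tar -xzvf RNAhybrid-2.1.
cd /path/to/RNAhybrid-2.1.2
./configure
sudo make # run with administrator privileges here if possible; otherwise errors are likely
sudo make install
To check that the installation succeeded, run which RNAhybrid. If a path is printed, RNAhybrid is installed. The original demonstration was run on Ubuntu under WSL on Windows 10.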
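The check above can also be scripted. The following is a minimal sketch; the helper name check_tool is made up for this example and is not part of RNAhybrid.

```shell
# check_tool is a hypothetical helper, not part of RNAhybrid.
# It reports whether a command can be found on PATH.
check_tool() {
    if tool_path=$(command -v "$1" 2>/dev/null); then
        echo "$1 found at $tool_path"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Print a hint instead of failing when RNAhybrid is missing.
check_tool RNAhybrid || echo "Re-run 'sudo make install' or check your PATH."
```

command -v is used here instead of which because it is built into POSIX shells.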
2. Preparing the input files
1. Target sequence(s)
This file contains one or more sequences against which RNAhybrid hybridizes the miRNA(s). RNAhybrid searches all of these sequences for minimum free energy hybridizations between the miRNA(s) and the target sequence(s). Sequences should be in RNA FASTA format, but RNAhybrid also accepts DNA FASTA files. A single target sequence may contain up to 50000 nucleotides.
2. miRNA sequence(s)
This file contains one or more microRNAs that RNAhybrid hybridizes with the target sequences to find the minimum free energy hybridization. A single microRNA sequence may contain up to 2000 nucleotides.
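Before running RNAhybrid it can help to confirm that no sequence exceeds the length limits stated above. The sketch below writes two small FASTA files and reports the length of each record. The file names, record names, sequences and the helpers fasta_lengths and too_long are all assumptions made for illustration.

```shell
# Hypothetical example files; names and sequences are illustrative only.
cat > target_example.fa <<'EOF'
>target1
ACGUACGUACGUACGUACGUACGU
ACGUACGU
EOF

cat > mirna_example.fa <<'EOF'
>miR-example
UGAGGUAGUAGGUUGUAUAGUU
EOF

# Print "name<TAB>length" for each record in a FASTA file.
fasta_lengths() {
    awk '/^>/ { if (name != "") print name "\t" len
                name = substr($1, 2); len = 0; next }
         { gsub(/[ \t\r]/, ""); len += length($0) }
         END { if (name != "") print name "\t" len }' "$1"
}

# Report records longer than a limit (50000 for targets, 2000 for miRNAs).
too_long() {
    fasta_lengths "$1" | awk -F'\t' -v max="$2" '$2 > max { print $1 " exceeds " max " nt" }'
}

fasta_lengths target_example.fa
fasta_lengths mirna_example.fa
too_long target_example.fa 50000
too_long mirna_example.fa 2000
```

The two too_long calls print nothing when every record is within the limit.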
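3. Using RNAhybrid
Usage: RNAhybrid [options] [target sequence] [query sequence].
options:
-b <number of hits per target> # maximum number of hits reported for one miRNA on one target sequence. If a miRNA can pair with a region of the target in several ways, -b 1 reports only the best one; 1 is usually a good choice. The number of hits finally reported also depends on <energy cut-off>.
-c compact output # with this option each hit is printed on a single line. Recommended if you only want to check whether the results agree with those from the RNAhybrid calibration.
-d <xi>,<theta> # position and shape parameters
-f helix constraint
-h help
-m <max targetlength>
-n <max query length>
-u <max internal loop size (per side)> # maximum number of unpaired bases on each side of an internal loop; -u 0 yields structures with no internal loops at all.
-v <max bulge loop size> # an internal loop has unpaired bases on both strands, whereas a bulge loop consists of extra unpaired bases on one strand only
-e <energy cut-off> # free-energy cut-off for the hybridization of the two sequences; try -e -30 first and look at the result.
-p <p-value cut-off>
-s (3utr_fly|3utr_worm|3utr_human) # used for a quick estimate of the extreme value distribution parameters; choose 3utr_fly, 3utr_worm or 3utr_human to match the species, or leave it unset. Helix constraint and approximate p-value cannot be used at the same time.
-g (ps|png|jpg|all) # format of the graphical output: ps, png, jpg or all
-t <target file> # target gene file in FASTA format
-q <query file> # miRNA file in FASTA format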
Either a target file has to be given (FASTA format)
or one target sequence directly.
Either a query file has to be given (FASTA format)
or one query sequence directly.
The helix constraint format is "from,to", eg. -f 2,7 forces
structures to have a helix from position 2 to 7 with respect to the query.
<xi> and <theta> are the position and shape parameters, respectively,
of the extreme value distribution assumed for p-value calculation.
If omitted, they are estimated from the maximal duplex energy of the query.
In that case, a data set name has to be given with the -s flag.
PS graphical output not supported.
PNG and JPG graphical output not supported.
Name: Description
Helix constraint from: Forces all structures to have a helix from position a to position b with respect to the query. The first base has position 1. "Helix constraint from" must be less than or equal to "Helix constraint to". Helix constraint and approximate p-values cannot be used at the same time.
Hits per target: Defines how many hits RNAhybrid shows. The hits are shown by increasing minimal free energy (the lower the energy, the better the result).
Compact output: When this parameter is used, RNAhybrid prints only one line per hit instead of the full output it normally generates.
Generate graphics: Generates a graphical representation of the output in jpg, png and ps format if fewer than 6 hits are chosen. If RNAhybrid stops with an unexpected error, it is often a good idea to disable graphics generation.
Max internal loop length: The maximal number of unpaired nucleotides on either side of an internal loop.
Energy threshold: Shows only hits whose minimal free energy is lower than the threshold (the lower the energy, the better). The value must be less than or equal to zero. Note that the output is limited by both the energy threshold and the maximal number of hits per target.
Max bulge loop length: The maximal number of unpaired nucleotides in a bulge loop.
No G:U in seed: Selects whether G:U pairs are forbidden in the seed. This parameter can only be chosen together with "Helix constraint from" and "Helix constraint to".
Helix constraint to: Position b of the helix constraint (see "Helix constraint from"). Both parameters must be set to use a helix constraint.
Approximate p-value: Used for a quick estimate of the extreme value distribution parameters. You can choose between nothing, 3utr_fly, 3utr_worm and 3utr_human to get estimates suited to these species. Helix constraint and approximate p-values cannot be used at the same time.
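Two of the rules above are easy to check before starting a long run: the helix constraint must satisfy 1 <= from <= to, and the energy threshold must be less than or equal to zero. The following is a minimal sketch of such a check. The function validate_params is a hypothetical helper written for this example and is not part of RNAhybrid.

```shell
# validate_params is a hypothetical helper, not an RNAhybrid feature.
# Usage: validate_params <from,to> <energy>
validate_params() {
    awk -v helix="$1" -v energy="$2" 'BEGIN {
        n = split(helix, pos, ",")
        if (n != 2 || pos[1] !~ /^[0-9]+$/ || pos[2] !~ /^[0-9]+$/) {
            print "helix constraint must look like from,to"
            exit 1
        }
        # Position 1 is the first base; from must not exceed to.
        if (pos[1] + 0 < 1 || pos[1] + 0 > pos[2] + 0) {
            print "helix constraint needs 1 <= from <= to"
            exit 1
        }
        if (energy !~ /^-?[0-9]+(\.[0-9]+)?$/ || energy + 0 > 0) {
            print "energy threshold must be a number <= 0"
            exit 1
        }
        print "ok"
    }'
}

validate_params 2,7 -20
validate_params 8,1 -20 || echo "rejected: from is greater than to"
```

The function returns a non-zero status on the first rule that fails, so it can guard an RNAhybrid call in a script.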
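4. Criteria for predicting human miRNA target sites with RNAhybrid
1. Bases 8 to 12 of the miRNA must pair perfectly with the circRNA. This is set with the -f helix constraint option, i.e. -f 8,12
2. An internal loop is a mismatch loop in which both strands carry unpaired bases. Neither strand may have more than 1 unpaired base in such a loop. This is set with -u <max internal loop size (per side)>, i.e. -u 1
3. A bulge loop is a bulge of extra bases on one strand only. At most one base may bulge out. This is set with -v <max bulge loop size>, i.e. -v 1
4. G:U pairs are allowed. This is the default; you can also set "No G:U in seed" to forbid G:U pairs
5. Unpaired overhangs at the ends must not exceed two bases
6. No run of 3 consecutive mismatched bases is allowed
7. No more than 4 mismatched bases in total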
RNAhybrid -g jpg -b 1 -e -20 -f 8,12 -u 1 -v 1 -s 3utr_human -t SFTSV_24vscontrol_DEcircBase.fa -q hsa_miRNA.fa > SFTSV_24bscontrol_circRNA_miRNA_RNAhybrid # the output is printed directly to the terminal, so redirecting it to a file with ">" is recommended
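Although -g jpg was set, this run did not produce any jpg file. The usage message above reports "PNG and JPG graphical output not supported", which suggests that this build was compiled without graphics support.
The results then need to be tidied into a data frame with the columns circRNA and miRNA, which can be used to draw a ceRNA network in Cytoscape.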
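One possible way to build that table is sketched below. It assumes the run was repeated with -c (compact output) and that each compact line is colon-separated with the target name in field 1 and the miRNA name in field 3; check this against your own output before relying on it. The sample lines, the file names and the helper to_edge_list are made up for illustration and are not real results.

```shell
# Hypothetical compact-format lines (invented for illustration, not real results).
# Assumed layout: target:target_len:miRNA:miRNA_len:mfe:p-value:position:structure...
cat > rnahybrid_compact_example.txt <<'EOF'
circ_A:1200:miR-x:22:-27.5:0.010000:345:U  A:GCUA:CGAU:A  G
circ_A:1200:miR-y:21:-24.1:0.030000:610:C  C:AGGU:UCCA:G  U
circ_B:800:miR-x:22:-22.3:0.040000:97:A  U:CCGA:GGCU:U  A
EOF

# to_edge_list is a hypothetical helper: it keeps fields 1 and 3 and
# removes duplicate circRNA/miRNA pairs.
to_edge_list() {
    printf 'circRNA\tmiRNA\n'
    awk -F':' -v OFS='\t' 'NF >= 7 { print $1, $3 }' "$1" | LC_ALL=C sort -u
}

to_edge_list rnahybrid_compact_example.txt > ceRNA_edges.tsv
cat ceRNA_edges.tsv
```

Sequence names that themselves contain a colon (for example genomic coordinates in a FASTA header) would break this field split, so such names should be simplified first.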