RNAsnp::Software
RNAsnp
Efficient detection of local RNA secondary structure changes induced by SNPs
Download
Installation
After downloading the RNAsnp package, follow the installation steps below:
tar -xzvf RNAsnp-1.2.tar.gz
cd RNAsnp-1.2
./configure
make
make install
Note that the make install step is optional. RNAsnp can also be executed directly from
Progs/RNAsnp
RNAsnp requires the environment variable RNASNPPATH to run. It must be set to
the location of the RNAsnp-1.2 directory. For example, in a bash terminal it
can be assigned as follows:
export RNASNPPATH='<PATH>/RNAsnp-1.2'
You can add this line to the .bashrc file in your home directory so that the
variable is set automatically in every new terminal.
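The .bashrc step above can also be scripted. The following is a minimal sketch, not part of the RNAsnp package: persist_rnasnppath is a hypothetical helper that appends the export line to a start-up file only if the identical line is not already there.

```shell
# Append the RNASNPPATH export to a shell start-up file, at most once.
# persist_rnasnppath is a hypothetical helper, not shipped with RNAsnp.
persist_rnasnppath() {
    dir=$1                          # path of the unpacked RNAsnp-1.2 directory
    rcfile=${2:-"$HOME/.bashrc"}    # start-up file to update
    line="export RNASNPPATH='$dir'"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if [ -f "$rcfile" ] && grep -qxF -- "$line" "$rcfile"; then
        return 0                    # already present, nothing to do
    fi
    printf '%s\n' "$line" >> "$rcfile"
}
```

Called as persist_rnasnppath '<PATH>/RNAsnp-1.2', it updates ~/.bashrc; a second argument selects a different start-up file.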
Usage
Summary
RNAsnp requires an RNA sequence and optionally a list of SNPs to be analyzed.
The effect of SNPs on local RNA secondary structure can be detected in three
possible modes:
- Mode 1: The first mode is designed to compute the effect of SNPs by
using global folding. This option should be used only for short input sequences, since
the base pair probabilities are calculated using RNAfold.
- Mode 2: The second mode is designed to compute the effect of SNPs on
large sequences. Here the local base pair probabilities are calculated using RNAplfold
(with the parameters -W 200 and -L 120).
- Mode 3: The third mode is the combination of the above two.
It is intended to determine the positions of putative structure-disruptive SNPs
using either transcript or genome sequence.
The programs RNAfold and RNAplfold are components of the Vienna RNA Package [1,2].
Strictly, Modes 1 and 2 require both an RNA sequence and SNPs as input, whereas
Mode 3 requires only the RNA sequence. In Mode 3, the program starts with Mode 2
to compute the effect of all possible substitutions at each nucleotide position.
SNPs with a p-value less than 0.4 are then re-evaluated using Mode 1, and
those with a p-value less than 0.1 are reported.
By default the program uses a window of 400 nts (+/- 200 nts around the SNP
position) to compute the base pairing probabilities in all three modes. The
default flanking length of 200 can be changed to a value between 100 and 800
in multiples of 50 for Mode 1, and between 200 and 800 in multiples of 50 for
Modes 2 and 3. This restriction is necessary to keep the size of the parameter
tables for the p-value calculations manageable. Please see below for more
details about the parameters and their usage.
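The limits on the flanking length can be checked before starting a run. The sketch below restates the rule from the paragraph above in POSIX shell; valid_winsize is a hypothetical helper, and RNAsnp applies its own checks regardless.

```shell
# Check a flanking length (-w) against the documented limits.
# valid_winsize is a hypothetical helper, not shipped with RNAsnp.
valid_winsize() {
    mode=$1
    w=$2
    case $w in
        ''|0*|*[!0-9]*) return 1 ;;   # reject empty, zero-prefixed or non-numeric input
    esac
    case $mode in
        1)   min=100 ;;               # Mode 1: 100 to 800
        2|3) min=200 ;;               # Modes 2 and 3: 200 to 800
        *)   return 1 ;;              # unknown mode
    esac
    # Within range and a multiple of 50.
    [ "$w" -ge "$min" ] && [ "$w" -le 800 ] && [ $((w % 50)) -eq 0 ]
}
```

The function only sets its exit status, so it can guard a call, for example: valid_winsize 2 300 && RNAsnp -f examples/seq1.txt -s examples/snp1.txt -m 2 -w 300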
Syntax
RNAsnp -f <seq_file> -s <snp_file> [options]
General options:
Help:
-h, --help
Print help and exit
--detailed-help
Print help, including all details and hidden options, and exit
--full-help
Print help, including hidden options, and exit
-V, --version
Print version and exit
Input Options:
-f, --seq=STRING
File containing the input sequence
The single input sequence can be provided either in FASTA format or as a
plain linear sequence without any gaps
-s, --snp=STRING
File containing the list of SNPs
The SNPs to be tested have to be provided on separate lines; see the
README file for a fuller description of the input format
-m, --mode=INT
Select the mode of operation (default=`1')
1 - perform global folding by using RNAfold and compute the difference in
base pair probabilities for all sequence intervals
2 - perform local folding by using RNAplfold and compute the difference in
base pair probabilities for all sequence intervals of fixed length
3 - screen putative structure-disruptive SNPs in an RNA sequence
Mode 1 is designed to predict the effect of SNPs on short RNA sequences
(i.e., -w parameter is less than or equal to 500), where the base pair
probabilities of the wild-type and mutant RNA sequences are calculated using
the global folding method RNAfold. The structural difference between
wild-type and mutant is computed using Euclidean distance and Pearson
correlation measures for all sequence intervals (with minimum size of 50,
-l). Finally, the interval with maximum base pair distance or minimum
correlation coefficient and the corresponding p-value is reported.
Mode 2 is designed to predict the effect of SNPs on large RNA sequences.
Here, the base pair probabilities are calculated using the local folding
method RNAplfold (with the -W 200 and -L 120 options). As a first step, the
structural difference is calculated using the Euclidean distance measure for
all sequence intervals of fixed window length (default: 20, -X), allowing
the bases within the window to pair up to a distance of 120 (i.e. the
maximal span of a base pair, -Y). In the second step, the sequence interval
[u, v] with maximum base pair distance is selected, and the difference is
re-computed for all internal local intervals that start at u. Finally, the
interval with maximum base pair distance and the corresponding p-value are
reported.
Mode 3, the combination of modes 1 and 2, is designed to screen all possible
structure-disruptive SNPs in an input sequence using a brute-force approach.
First, Mode 2 is applied to evaluate the SNP effect for all possible
substitutions at every nucleotide position. Second, the SNPs with p-value
less than 0.4 (--pvalue1) are subjected to Mode 1 to re-compute the structure
effect using a global folding approach. The SNPs that have significant local
structural effect (p-value less than 0.1, --pvalue2) are finally reported.
-w, --winsizeFold=INT
length of flanking sequence on either side of SNP considered for folding (default=`200')
By default the program uses +/- 200 nts around the SNP position to compute the
base pair probabilities in all three modes. This default value can be
changed to between 100 and 800 (inclusive) in multiples of 50 for Mode 1, and
between 200 and 800 (inclusive) in multiples of 50 for Modes 2 and 3. This
restriction is necessary to keep the size of the parameter tables for the
p-value calculations manageable. For the chosen flanking length to take full
effect, the input sequence must be at least twice that length.
If the input sequence is shorter than twice the chosen flanking length,
RNAsnp takes the nts from the SNP position up to the start and end of the
given sequence and performs the analysis on them. In this case, however, the
reported p-value is not accurate, since the input sequence length does not
match the sequence lengths available in the pre-computed parameter tables.
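The clipping behaviour described above can be illustrated with a short sketch. snp_window is a hypothetical helper, and the inclusive pos-w to pos+w convention is an assumption made for illustration; it is not taken from the RNAsnp source.

```shell
# Print the first and last 1-based positions of the folding window around
# a SNP, clipped to the sequence ends, plus a flag saying whether clipping
# occurred. Hypothetical helper; the window convention is an assumption.
snp_window() {
    pos=$1      # SNP position
    w=$2        # flanking length (-w)
    slen=$3     # length of the input sequence
    start=$((pos - w))
    end=$((pos + w))
    status=full
    if [ "$start" -lt 1 ]; then start=1; status=clipped; fi
    if [ "$end" -gt "$slen" ]; then end=$slen; status=clipped; fi
    echo "$start $end $status"
}
```

A result flagged as clipped corresponds to the case in which the reported p-value should be treated with caution.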
Additional parameters:
The following optional parameters can be provided together with the general
options above. Note, however, that the precomputed background scores, which
RNAsnp uses to estimate p-values, are based on the default values of these
parameters. Thus, if the default value of any of them (except --pvalue1 and
--pvalue2) is changed, the reported p-value is not accurate.
-c, --cutoff=FLOAT
cut-off for the base pair probabilities.
This parameter is applicable to both Mode 1 and 2 (default=`0.01')
Only base pair probabilities above this cut-off are considered when
computing the Euclidean distance or correlation coefficient between wild-type
and mutant.
Parameters associated with mode -m 1:
-l, --minLen=INT
minimum length of the sequence interval (default=`50')
The structural difference between wild-type and mutant is computed for all
sequence intervals with the selected minimum length
Parameters associated with mode -m 2:
-W, --winsize=INT
Average the pair probabilities over windows of given size (default=`200')
-L, --span=INT
Set the maximum allowed separation of a base pair to span.
i.e. no pairs (i,j) with j-i > L will be allowed. (default=`120')
-X, --regionX=INT
Length of the local structural element that we expect to have
an effect (default=`20')
-Y, --regionY=INT
Length of the interval over which the local structural changes are evaluated,
i.e., the maximal span of a base pair (default=`120')
The function of each of these parameters is explained in the description of
Mode 2 above
Parameters associated with mode -m 3:
--pvalue1=FLOAT
p-value threshold to filter SNPs that are predicted using Mode 2 (default=`0.4')
--pvalue2=FLOAT
p-value threshold to filter SNPs that are predicted using Mode 1 (default=`0.1')
-e, --winsizeExt=INT
size of the flanking region on either side of SNP that includes the local window
returned by Mode 2. This subsequence is then passed to Mode 1 for re-computation
(default=`200')
Additional options to compute edist:
-E, --edist=INT
compute the ensemble Euclidean distance between the structure distributions of
two sequences (default=`0')
-C, --boltzmannPreFactor=DOUBLE
Multiply the Boltzmann factor with a prefactor alpha (default=`1')
Input formats
The sequence file must contain one sequence (preferably in FASTA format). A sequence of at least 200 nts is required to run RNAsnp Mode 1, and at least 400 nts to run RNAsnp Modes 2 and 3.
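The minimum-length requirement can be checked before a run. This is a minimal sketch, assuming a single-sequence file in FASTA or plain format; seq_length and long_enough are hypothetical helpers, not RNAsnp commands.

```shell
# Count the nucleotides in a single-sequence file (FASTA or plain text).
# seq_length and long_enough are hypothetical helpers.
seq_length() {
    # Drop FASTA header lines, strip all whitespace, count what is left.
    grep -v '^>' "$1" | tr -d '[:space:]' | wc -c | tr -d '[:space:]'
}

# Succeed if the sequence in file $1 meets the minimum length for mode $2.
long_enough() {
    file=$1
    mode=$2
    case $mode in
        1)   min=200 ;;   # Mode 1 needs at least 200 nts
        2|3) min=400 ;;   # Modes 2 and 3 need at least 400 nts
        *)   return 1 ;;  # unknown mode
    esac
    [ "$(seq_length "$file")" -ge "$min" ]
}
```

For example, long_enough examples/seq1.txt 2 succeeds for the 3344 nt example sequence used below.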
The SNP file must contain the list of SNPs, one entry per line. Each SNP is described as the wild-type nucleotide, followed by the nucleotide position,
followed by the mutant nucleotide. Multiple SNPs that occur together are delimited
by the special character "-".
Example SNP formats:
for single SNP: A201G
where, A is the wild-type nucleotide in the given sequence, 201 is the sequence
position of wild-type nucleotide and G is the mutant (or SNP).
for multiple SNPs: A201G-U257A-C260G
Multiple SNPs (which occur together) are written next to each other, separated by the delimiter "-".
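The SNP notation can be split into its parts with a few lines of shell and awk. The sketch below is illustrative only: parse_snps is a hypothetical helper, and accepting T alongside U is an assumption, since the examples here use RNA letters.

```shell
# Split a SNP entry such as "A201G-U257A-C260G" into one
# "wild-type position mutant" triple per line.
# parse_snps is a hypothetical helper; allowing T as well as U is an assumption.
parse_snps() {
    printf '%s\n' "$1" | awk -F- '{
        # First pass: validate every SNP before printing anything.
        for (i = 1; i <= NF; i++) {
            if ($i !~ /^[ACGUT][0-9]+[ACGUT]$/) {
                print "malformed SNP: " $i > "/dev/stderr"
                exit 1
            }
        }
        # Second pass: first letter, digits in between, last letter.
        for (i = 1; i <= NF; i++) {
            n = length($i)
            print substr($i, 1, 1), substr($i, 2, n - 2), substr($i, n, 1)
        }
    }'
}
```

The function exits with a non-zero status on the first malformed entry, so it can be used to screen each line of a SNP file before a run.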
Examples
The sequence and SNP files used for the demonstration here are present in the directory 'examples/'
RNAsnp mode 1
1) Test for the effect of a single SNP with RNAsnp default mode -m 1
$ RNAsnp -f examples/seq1.txt -s examples/snp1.txt
SNP W Slen GC interval d_max p-value interval r_min p-value
U1013C 200 3344 0.5411 975-1025 0.2432 0.0724 998-1052 0.0615 0.0932
2) Test for the effect of multiple SNPs with RNAsnp default mode -m 1
$ RNAsnp -f examples/seq2.txt -s examples/snp2.txt
SNP W Slen GC interval d_max p-value interval r_min p-value
C9294A-U9296G 200 9605 0.4814 9261-9310 0.1951 0.0749 9268-9317 0.2345 0.1213
RNAsnp mode 2
1) Test for the effect of a single SNP with RNAsnp mode 2
$ RNAsnp -f examples/seq1.txt -s examples/snp1.txt -m 2
SNP w Slen GC max_k d_max p-value interval d p-value
U1013C 200 3344 0.5411 994 4.3961 0.2176 994-1019 0.1265 0.1232
2) Test for the effect of multiple SNPs with RNAsnp mode 2
$ RNAsnp -f examples/seq2.txt -s examples/snp2.txt -m 2
SNP w Slen GC max_k d_max p-value interval d p-value
C9294A-U9296G 200 9605 0.4814 9270 7.0487 0.0624 9270-9298 0.2463 0.0099
RNAsnp mode 3
1) Screen possible structure-disruptive SNPs in a sequence with
default p-value thresholds (pvalue1<0.4 and pvalue2<0.1)
$ RNAsnp -f examples/seq1.txt -m 3
SNP w Slen GC interval d_max pvalue1 ewin interval d_max pvalue2
G1A 200 3344 0.5522 1-39 0.0185 0.2024 200 1-50 0.0961 0.0467
G1C 200 3344 0.5522 1-46 0.0421 0.0755 200 1-50 0.1581 0.0183
....
....
2) Screen putative structure-disruptive SNPs in a sequence with
different p-value thresholds (pvalue1<0.1 and pvalue2<0.1)
$ RNAsnp -f examples/seq1.txt -m 3 --pvalue1 0.1 --pvalue2 0.1
SNP w Slen GC interval d_max pvalue1 ewin interval d_max pvalue2
G1C 200 3344 0.5522 1-46 0.0421 0.0755 200 1-50 0.1581 0.0183
G7A 200 3344 0.5556 1-43 0.2236 0.0207 200 1-50 0.1570 0.0996
....
....
Please refer to the README file in the RNAsnp package for more details about the output.
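Mode 3 output such as that shown above can be narrowed down further after the run. The following sketch assumes whitespace-separated columns with a single header line and pvalue2 in the last column, as in the examples; filter_pvalue2 is a hypothetical helper, not an RNAsnp option.

```shell
# Keep the header and the rows whose last column (pvalue2) is below a
# threshold. filter_pvalue2 is a hypothetical helper; the column layout
# is assumed to match the mode 3 examples.
filter_pvalue2() {
    # $1 is the threshold; rows are read from standard input.
    awk -v t="$1" '
        NR == 1 { print; next }          # pass the header line through
        NF > 1 && ($NF + 0) < (t + 0)    # compare the last field numerically
    '
}
```

Assuming RNAsnp writes its results to standard output, a stricter screen could then be run as: RNAsnp -f examples/seq1.txt -m 3 | filter_pvalue2 0.05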
Datasets
The three different SNP datasets used for the RNAsnp analysis can be downloaded from here.
The download contains the details of the SNPs, the mapped sequences and the RNAsnp output for each dataset.
Changelog
RNAsnp software releases and changes:
RNAsnp-1.2
- Feb 24, 2016 - Updated the README file and help page with a detailed description of each parameter and its usage. Also
included warnings if the default values of the additional parameters are changed, because this could affect
the accuracy of the reported p-value.
RNAsnp-1.1
- Dec 03, 2012 - first release
- Apr 30, 2013 - Fixed an installation issue in which 'make install' overwrote the library file "libRNA.a"
if a Vienna RNA package was previously installed.
- Jul 16, 2014 - Updated copyright details
References
- Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. (1994)
Fast Folding and Comparison of RNA Secondary Structures.
Monatshefte f. Chemie 125: 167-188
- Lorenz R, Bernhart SH, Honer zu Siederdissen C, Tafer H, Flamm C, Stadler PF,
Hofacker IL (2011) ViennaRNA Package 2.0. Alg. Mol. Biol. 6:26.
If you find this software useful for your research, please cite the following work:
- Sabarinathan R, Tafer H, Seemann SE, Hofacker IL, Stadler PF, Gorodkin J.
RNAsnp: Efficient detection of local RNA secondary structure changes induced
by SNPs. Human Mutation 34:546-556, 2013
Contact
For any comments or bug reports please contact the authors. Email: sabari@rth.dk, htafer@bioinf.uni-leipzig.de