RNAbound v1.1
=============

CONTENTS:
--------
	1) Introduction	
	2) Usage
	3) Input formats
	4) Examples
	5) References
	6) Contact

1) Introduction
---------------

Self-contained structured domains of RNA sequences often have distinct molecular
functions. The RNAbound program helps to predict the boundaries of self-contained
structured domains from the base pairing probability matrix (P). The input matrix P
can be computed using RNAfold for a single sequence or PETfold for a multiple
sequence alignment.
	
2) Usage 
--------
A) Run RNAbound directly using a base pairing probability matrix
================================================================
Usage:
      rnabound.pl [OPTIONS] --dotfile=<input>

Options:
    --dotfile <string>
        base pairing probability matrix (postscript file) from RNAfold

    --relibmat <string>
        base pairing reliability matrix (text file) from PETfold

    --minLen <int>
        minimum size of the subsequence [k,l] to be considered for fitness
        function (default: 10)

    --flanking <int>
        size of the flanking regions to be considered for spanning base pair
        probabilities in the fitness function (default: 10)

    --pnull <float>
        threshold for base pair probabilities (default: 0.0005)


B) Compute base pairing probabilities for a given sequence or alignment and run RNAbound
========================================================================================
Usage: 
        run_bppCal_rnabound.sh [-s <single_Seqfile> | -m <multiple sequence file>]

Options:
    -h  display this help and exit
    -s  single sequence file 
    -m  multiple sequence alignment (fasta file)


3) Input formats 
----------------
For a single sequence, the base pairing probability matrix (dot plot) can be
computed using RNAfold (with the -p parameter) from the Vienna RNA package. RNAfold
returns the base pairing probabilities in a PostScript file (e.g., dot.ps),
which can be provided as input to RNAbound with the "--dotfile" parameter.
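
As an illustration of what such a dot plot contains, the sketch below reads the
pair probabilities from an RNAfold PostScript file in Python. In these files each
"i j value ubox" line stores the square root of the probability that bases i and j
pair, while "lbox" lines describe the MFE structure. The function name
read_dot_plot and the use of pnull as a simple lower cutoff are assumptions made
for this example; rnabound.pl may parse and filter the file differently.

```python
import re

# Matches pair-probability lines in a ViennaRNA dot plot, e.g. "1 72 0.9987 ubox".
UBOX_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$")


def read_dot_plot(lines, pnull=0.0005):
    """Return {(i, j): p} for all pairs with probability p >= pnull.

    Hypothetical helper: the cutoff semantics are an assumption.
    """
    probs = {}
    for line in lines:
        m = UBOX_LINE.match(line)
        if m is None:
            continue  # PostScript header, sequence, lbox (MFE) entries, ...
        i, j = int(m.group(1)), int(m.group(2))
        p = float(m.group(3)) ** 2  # the file stores sqrt(p), so square it
        if p >= pnull:
            probs[(i, j)] = p
    return probs


# Made-up lines in the dot plot layout, not taken from a real RNAfold run.
example = [
    "%!PS-Adobe-3.0 EPSF-3.0",
    "1 72 0.9 ubox",
    "2 71 0.01 ubox",
    "1 72 0.95 lbox",
]
pairs = read_dot_plot(example)
print(sorted(pairs))
```

For a real file, the same function can be called as read_dot_plot(open("dot.ps")).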

For a multiple sequence alignment, the base pairing probability matrix can be
computed using PETfold (with the -r parameter). PETfold returns the base pairing
probabilities in a text file, which can be provided as input to RNAbound
with the "--relibmat" parameter.

Alternatively, if PETfold and RNAfold (from the Vienna RNA package) are installed
locally, then "run_bppCal_rnabound.sh" can be used to compute the base pairing
probability matrix and run RNAbound. The input must be in fasta format for both a
single sequence and a multiple sequence alignment.

4) Examples 
-----------
a) Run RNAbound using base pairing probabilities computed with RNAfold (on a single tRNA sequence)

$ scripts/rnabound.pl --dotfile examples/tRNA_rnafold_pp_w100.ps >output.txt

b) Run RNAbound using base pairing probabilities computed with PETfold (on a multiple sequence alignment of tRNA sequences)

$ scripts/rnabound.pl --relibmat examples/tRNA_petfold_pp_w100.txt >output.txt

Output details: 
Column 1: start position of the detected boundary (k)
Column 2: end position of the detected boundary (l)
Column 3: (D_kl_w) output of the fitness function that represents the self-containedness of the 
          RNA secondary structure of a segment [k, l]. 

The output file is sorted based on the D_kl_w score (higher to lower) and the top hit is considered
as the boundary of the secondary structure. 
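
The three-column output described above can be post-processed with a few lines of
Python. The sketch below assumes whitespace-separated columns and skips any line
that does not parse as (int, int, float); the helper name read_hits and the sample
values are made up for illustration and are not real RNAbound output.

```python
def read_hits(lines):
    """Parse 'k l D_kl_w' lines into a list of (k, l, score) tuples."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or incomplete line
        try:
            k, l, score = int(fields[0]), int(fields[1]), float(fields[2])
        except ValueError:
            continue  # header or comment line, if any
        hits.append((k, l, score))
    return hits


# Illustrative values only.
sample = ["1 73 12.50", "5 68 9.75", "27 43 3.10"]
hits = read_hits(sample)

# Take the maximum explicitly instead of relying on the file order.
top = max(hits, key=lambda h: h[2])
print("predicted boundary: [%d, %d], D_kl_w = %.2f" % top)
```

With a real run, read_hits(open("output.txt")) would take the place of the sample
list.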

c) The following script can be used together with rnabound output to filter out the overlapping hits and provide only
the unique hits. 

$ scripts/rnabound.pl --dotfile examples/tRNA_rnafold_pp_w100.ps  | python scripts/unique_intervals.py /dev/stdin >output.txt
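
To show what filtering overlapping hits can mean in practice, the sketch below
applies one common strategy: visit hits from the highest to the lowest score and
keep a hit only if it does not overlap any hit already kept. This is an assumed
greedy scheme for illustration; scripts/unique_intervals.py may use a different
rule. The function name drop_overlaps and the values are hypothetical.

```python
def drop_overlaps(hits):
    """Greedily keep the best-scoring, mutually non-overlapping [k, l] hits."""
    kept = []
    for k, l, score in sorted(hits, key=lambda h: h[2], reverse=True):
        # Closed intervals [k, l] and [kk, ll] are disjoint when one ends
        # before the other starts.
        if all(l < kk or k > ll for kk, ll, _ in kept):
            kept.append((k, l, score))
    return kept


# Illustrative (k, l, D_kl_w) values, not real RNAbound output.
hits = [(1, 73, 12.5), (5, 68, 9.75), (80, 120, 7.0), (100, 130, 6.5)]
unique = drop_overlaps(hits)
print(unique)
```

Here the hit (5, 68) is dropped because it lies inside (1, 73), and (100, 130) is
dropped because it overlaps the higher-scoring (80, 120).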

d) Compute base pairing probability matrix and then run RNAbound

# for single sequence
$ scripts/run_bppCal_rnabound.sh -s examples/tRNA_seq.fa

# for multiple sequence alignment
$ scripts/run_bppCal_rnabound.sh -m examples/tRNA_align.fa

5) References
-------------
[1] Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF,
    Hofacker IL (2011) ViennaRNA Package 2.0. Alg. Mol. Biol. 6:26.
[2] Seemann SE, Gorodkin J, Backofen R (2008) Unifying evolutionary and
    thermodynamic information for RNA folding of multiple alignments. 
    Nucleic Acids Res. 36, 6355–6362.

If you find this software useful for your research, please cite the following work:
    Sabarinathan R, Anthon C, Gorodkin J, Seemann SE (2018)
    Multiple sequence alignments enhance boundary definition of RNA structures
    Gene (under revision).

6) Contact
----------
For any comments or bug reports, please contact the authors.
Email: sabari@rth.dk, seemann@rth.dk
