PETcofold v3.2
==============
by Stefan E Seemann:
seemann@rth.dk


Outline:
--------
	1) Introduction
	2) Installation
	3) Usage
	4) Example
        5) References
	6) Contact


1) Introduction
---------------

Integrated framework with PETfold to fold and search for RNA-RNA
interactions between two multiple alignments of RNA sequences.

PETcofold is a computational method that predicts the joint secondary
structure of two RNA alignments including RNA-RNA interactions with
maximum expected accuracy (MEA).  PETcofold is an extension of PETfold
(see [1]) which is the first tool integrating the duality of
energy-based and evolution-based approaches for folding of multiple
aligned RNA sequences into a single optimization problem.

PETcofold, like PETfold, applies Pfold (see [2]) which identifies
basepairs that are most conserved and energetically most favorable
using a maximum expected accuracy scoring. The probabilities of single
stranded and base paired positions in the thermodynamic structure
ensemble are calculated by the Vienna RNA package (see [3]).

The PETcofold pipeline consists of two steps: (1) intra-molecular
folding by PETfold of both alignments and selection of a set of highly
reliable base pairs (partial structure) that only decreases the
probability of the ensemble of the partial structure in some
pre-defined range; (2) inter-molecular folding by adapted PETfold of
concatenated alignments using constraints from step 1. In the end,
partial structures and constrained inter-molecular structures are
combined to the RNA-RNA cofolded structure including pseudoknots.


2) Installation
---------------

PETcofold comes together with the source code of PETfold and Pfold
(thanks to Bjarne Knudsen). In addition, it needs the Vienna RNA
Package (see [3]) installed on your computer. You have to edit the
variables 'HPATH' and 'LPATH' at the top of the 'Makefile' in the
'src' subfolder by setting them to the location of the Vienna RNA
Package header files and the libRNA.a file.

$ cd src
$ make clean
$ make

If you don't start PETcofold directly from the bin folder then you
have to set the environment variable PETFOLDBIN to the path of Pfold
grammar files scfg.rate and article.grm:
$ export PETFOLDBIN='<PATH>/bin'


3) Usage
--------

PETcofold reads two RNA sequence alignments in fasta format (-f or
--fasta) and works with the intersection of common identifiers. You
can run PETcofold with only one common identifier, however, at least
three common identifiers are recommended to exploit the strength of
the evolutionary model.

The standard output shows the partial structures for the
intra-molecular folding of both alignments and their probabilities in
the thermodynamic and the evolutionary model. Then, it shows the
PETfold predicted RNA secondary structures for both alignments and the
PETcofold predicted RNA secondary structure of the concatenated
alignments. Here, curly brackets stay for constrained base pairs (as
part of the partial structures), round brackets stay for
intra-molecular base pairs predicted in the second step and squared
brackets for RNA-RNA interactions. Finally, the (normalized) score
calculated by PETcofold is shown. An alternative output is provided in
fasta format (--war). A long output including the alignment without
gaps and the evolutionary tree is provided by the parameter --verbose.

The single-stranded and basepair probabilities of the PET model can be
written in a pp-file (-r or --ppfile) that can be drawn as dotplot by
'drawdot' (part of the Pfold package). If the cofolding secondary
structure is already known then its PETcofold score and base pair
reliabilities can be calculated too (-s or --setstruc). If the
pseudo-knot free consensus secondary structure of the first and/or
second RNA alignment is already known then step 1 of the PETcofold
pipeline looks only for highly reliable base pairs in these structures
(-n or --setstruc1, and -m or --setstruc2). By default, a phylogenetic
tree is calculated from pairwise distances using the neighbour joining
(NJ) algorithm.  However, the user can specify his own tree (-t or
--settree).

PETcofold has several parameters to influence the model. The maximal
allowed intra-molecular base pair reliability to be free for RNA-RNA
interaction is changed by -d or --setdelta, and the minimal allowed
partial structure probability is set by -i or --setgamma. You can also
decide if constrained stems of the partial structures should be
extended by reliable inner and outer base pairs (--extstem). From
PETfold the following parameters are adapted: --setevocon_bp,
--setevocon_ss, --setalpha, --setbeta, --setgap. The latter decides
how many gaps are allowed in one alignment column (default 0.25),
otherwise the column is ignored in the calculation (marked as "-" in
the output).


Here the usage:

PETcofold v3.2
==============
by Stefan E Seemann (seemann@rth.dk)
Reference: Seemann, Richter et al. Algorithms Mol Biol. 5:22, 2010
           Seemann, Richter et al. Bioinformatics 27(2):211-9, 2011
Web service: http://rth.dk/resources/petcofold

Usage:
  PETcofold -f <file1> -f <file2> [ options ] [ parameter settings ]

  -f --fasta <file1>         ... 1st alignment in fasta format
  -f --fasta <file2>         ... 2nd alignment in fasta format with same organisms than 1st
Options:
  -s --setstruc <struc|file> ... calculates score for given duplex structure in dot-bracket notation
                             ... step 1 constraints intramolecular ('(',')') and step 2 intermolecular structure ('[',']')
  -n --setstruc1 <struc|file>... finds reliable base pairs in given structure of 1st alignment in dot-bracket notation
  -m --setstruc2 <struc|file>... finds reliable base pairs in given structure of 2nd alignment in dot-bracket notation
  -t --settree <tree|file>   ... calculates score for given tree in Newick tree format
  --war                      ... fasta format output
  --intermol                 ... structure output of intermolecular base pairs
  -r --ppfile <file>         ... writes PET reliabilities in file
  --verbose                  ... writes long output
  --help                     ... this output
Parameter settings:
  -p --setevocon_bp <reliab> ... reliab.threshold for conserved base pairs (default: 0.9)
  -u --setevocon_ss <reliab> ... reliab.threshold for conserved unpaired bases (default: 1)
  -a --setalpha <nr>         ... weighting factor for unpaired reliabilities (default: 0.2)
  -b --setbeta <nr>          ... weighting factor for thermodynamic overlap (default: 1)
  -g --setgap <nr>           ... max. percent of gaps in alignment column (default:0.25)
  -d --setdelta <reliab>     ... max. intramolecular base pair reliability
                                 to be free for RNA-RNA interaction (former petcon; default: 0.9)
  -i --setgamma <probab>     ... minimal partial structure probability (former partprob; default: 0.1)
  --extstem                  ... constrained stems get extended by inner and outer base pairs

You may set the environment variable PETFOLDBIN to path of files scfg.rate and article.grm:
export PETFOLDBIN=<PATH>


4) Example
----------
(using ViennaRNA-2.0.7)

$ cd <PATH>/PETfold

$ bin/PETcofold -f example/example1.fasta -f example/example2.fasta
PETfold Alignment 1:
Delta = 0.90; thermodynamic partial structure probability = 0.66029; evolutionary partial structure probability = 0.94125
Partial Structure 1:	.((((..........--..............))))..
PETfold Alignment 2:
Delta = 0.90; thermodynamic partial structure probability = 0.94273; evolutionary partial structure probability = 0.98038
Partial Structure 2:	..((((((..............---)))))).
PETfold RNA structure:		((((((....((...--.))..........)))))). ..((((((..............---)))))).
PETcofold RNA structure:	.{{{{....(((...--.))).[[[[[[[[.}}}}..&..{{{{{{.]].]]]]]]....---}}}}}}.
Score_{model,structure}{tree,alignment} = 0.577586

$ bin/PETcofold -f example/example1.fasta -f example/example2.fasta --war
PETfold Alignment 1:
Delta = 0.90; thermodynamic partial structure probability = 0.66029; evolutionary partial structure probability = 0.94125
Partial Structure 1:	.((((..........--..............))))..
PETfold Alignment 2:
Delta = 0.90; thermodynamic partial structure probability = 0.94273; evolutionary partial structure probability = 0.98038
Partial Structure 2:	..((((((..............---)))))).
>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGGGGGCUU&GAGGCCGGUCAAAUUCAGAUCAAU-CCGGCCA
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGGGGGUCU&GAGGCCCACCAAACUCGUUUAA-AGUGGGCCA
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGGAGACUU&GGCGUUGGGCAAACUCGAAAAAU-CCCAACGU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGAGGGCUU&GGGGUUGGGCAAACUCGAAAAUCUACCAACUA
>structure
.{{{{....(((...--.))).[[[[[[[[.}}}}..&..{{{{{{.]].]]]]]]....---}}}}}}.

$ bin/PETcofold -f example/example1.fasta -f example/example2.fasta --extstem
PETfold Alignment 1:
Delta = 0.90; thermodynamic partial structure probability = 0.58189; evolutionary partial structure probability = 0.44087
Partial Structure 1:	((((((.........--.............)))))).
PETfold Alignment 2:
Delta = 0.90; thermodynamic partial structure probability = 0.94273; evolutionary partial structure probability = 0.98038
Partial Structure 2:	..((((((..............---)))))).
PETfold RNA structure:		((((((....((...--.))..........)))))). ..((((((..............---)))))).
PETcofold RNA structure:	{{{{{{...(((...--.))).[[[[[[[[}}}}}}.&..{{{{{{.]].]]]]]]....---}}}}}}.
Score_{model,structure}{tree,alignment} = 0.496890

$ bin/PETcofold -f example/example1.fasta -f example/example2.fasta -t '(gca_bovine:0.9,gca_chicken:0.01,(gca_mouse:0.01,gca_rat:0.1):0.9)'
PETfold Alignment 1:
Delta = 0.90; thermodynamic partial structure probability = 0.66029; evolutionary partial structure probability = 0.98129
Partial Structure 1:	.((((..........--..............))))..
PETfold Alignment 2:
Delta = 0.90; thermodynamic partial structure probability = 0.94273; evolutionary partial structure probability = 0.98859
Partial Structure 2:	..((((((..............---)))))).
PETfold RNA structure:		((((((...(((...--.))).........)))))). ..((((((..............---)))))).
PETcofold RNA structure:	.{{{{....(((...--.))).[[[[[[[[.}}}}..&..{{{{{{.]].]]]]]]....---}}}}}}.
Score_{model,structure}{tree,alignment} = 0.603243

$ bin/PETcofold -f example/example1.fasta -f example/example2.fasta -t '(gca_bovine,gca_chicken,(gca_mouse,gca_rat))'
PETfold Alignment 1:
Delta = 0.90; thermodynamic partial structure probability = 0.66029; evolutionary partial structure probability = 0.94008
Partial Structure 1:	.((((..........--..............))))..
PETfold Alignment 2:
Delta = 0.90; thermodynamic partial structure probability = 0.94273; evolutionary partial structure probability = 0.98043
Partial Structure 2:	..((((((..............---)))))).
PETfold RNA structure:		((((((....((...--.))..........)))))). ..((((((..............---)))))).
PETcofold RNA structure:	.{{{{....(((...--.))).[[[[[[[[.}}}}..&..{{{{{{.]].]]]]]]....---}}}}}}.
Score_{model,structure}{tree,alignment} = 0.577531

$ bin/PETcofold -f example/example1.fasta -f example/example2.fasta -s "((((((...((([[[..[))).........)))))).&..((((((.....]]]].....---))))))."
PETfold Alignment 1:
Delta = 0.00; thermodynamic partial structure probability = 0.01669; evolutionary partial structure probability = 0.07884
Partial Structure 1:	((((((...(((...--.))).........)))))).
PETfold Alignment 2:
Delta = 0.00; thermodynamic partial structure probability = 0.94273; evolutionary partial structure probability = 0.98038
Partial Structure 2:	..((((((..............---)))))).
PETfold RNA structure:		((((((...(((...--.))).........)))))). ..((((((..............---)))))).
PETcofold RNA structure:	{{{{{{...{{{[[[--[}}}.........}}}}}}.&..{{{{{{.....]]]].....---}}}}}}.
Score_{model,structure}{tree,alignment} = 0.213759



5) References
-------------

[1] Seemann SE, Gorodkin J, Backofen, R. (2008)
    Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
    Nucleic Acids Research, 36(20):6355-6362
[2] Knudsen, B. and Hein, J. (2003) 
    Pfold: RNA secondary structure prediction using stochastic context-free grammars. 
    Nucleic Acids Research, 31 (13), 3423-3428
[3] I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) 
    Fast Folding and Comparison of RNA Secondary Structures. 
    Monatshefte f. Chemie 125: 167-188

If you find this software useful for your research, please cite the following work:
    Seemann SE, Richter AS, Gesell T, Backofen R, Gorodkin J.
    PETcofold: Predicting conserved interactions and structures of two multiple alignments of RNA sequences.
    Bioinformatics, 27(2):211-9, 2011

    Seemann SE, Richter AS, Gorodkin J, Backofen, R.
    Hierarchical folding of multiple sequence alignments for the prediction of structures and RNA-RNA interactions.
    Algorithms in Molecular Biology, 5:22, 2010


6) Contact
----------

seemann@rth.dk

