PETcofold 3.1
=============
by Stefan E Seemann:
seemann@genome.ku.dk


Outline:
--------
	1) Introduction
	2) Installation
	3) Usage
	4) Example
	5) References
	6) Contact


1) Introduction
---------------

PETcofold is a computational method for determining the joint secondary
structure of two RNA alignments including RNA-RNA interactions. PETcofold is an
extension of PETfold (see [1]) which is the first tool integrating the duality
of energy-based and evolution-based approaches for folding of multiple aligned
RNA sequences into a single optimization problem.

PETcofold, like PETfold, applies Pfold (see [2]) to identify the base pairs
that are most conserved and energetically most favorable using maximum expected
accuracy scoring. The free energies of single-stranded and base-paired positions
are taken from RNAfold (see [3]). In PETcofold, inter-molecular thermodynamic
folding probabilities are calculated by RNAcofold.

The PETcofold pipeline consists of two steps: (1) intra-molecular folding of
both alignments by PETfold and selection of a set of highly reliable base pairs
(the partial structure), chosen so that the probability of the partial structure
in the ensemble decreases only within a pre-defined range; (2) inter-molecular
folding of the concatenated alignments by an adapted PETfold using the
constraints from step 1. In the end, the partial structures and the constrained
inter-molecular structures are combined into the RNA-RNA cofolded structure
including pseudoknots.


2) Installation
---------------

PETcofold requires Pfold (see [2]) and the Vienna RNA Package (see [3]) to be
installed on your computer. An adapted Pfold version is shipped with PETcofold
(thanks to Bjarne Knudsen). It is located in './programs/pfold/bin/'. Make sure
the Vienna RNA Package is correctly installed.

The binary folder of the Vienna RNA Package (location of RNAfold and RNAcofold)
must be exported in an environment variable called 'RNAfold_bin', and the
binary folder of Pfold (location of fasta2col, findphyl, mltree, scfg,
scfg.rate, article.grm, drawplot) in an environment variable called
'Pfold_bin'.

In bash, the commands would be:

$ export RNAfold_bin='/opt/ViennaRNA/bin/'
$ export Pfold_bin='<PATH>./programs/pfold/bin/'

Furthermore, some Pfold scripts need GNU awk (gawk) installed in '/bin/'.

Now PETcofold is ready!
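
To verify the setup, a small check along the following lines may help. This is
only a sketch: the function name 'check_petcofold_env' and its optional gawk
path argument are hypothetical and not part of PETcofold. The file names are
taken from the list above (the drawing program is left out).

```shell
# Sketch: check the environment described in this section.
# check_petcofold_env is a hypothetical helper, not shipped with PETcofold.
check_petcofold_env() {
    local gawk_path="${1:-/bin/gawk}"   # optional argument, defaults to /bin/gawk
    local vienna_dir="${RNAfold_bin:-}"
    local pfold_dir="${Pfold_bin:-}"
    local status=0 prog file

    [ -n "$vienna_dir" ] || { echo "RNAfold_bin is not set" >&2; status=1; }
    [ -n "$pfold_dir" ]  || { echo "Pfold_bin is not set" >&2; status=1; }

    # Vienna RNA Package programs used by PETcofold.
    for prog in RNAfold RNAcofold; do
        [ -x "${vienna_dir%/}/$prog" ] || { echo "missing: $prog" >&2; status=1; }
    done

    # Core Pfold programs and data files named in this section.
    for file in fasta2col findphyl mltree scfg scfg.rate article.grm; do
        [ -e "${pfold_dir%/}/$file" ] || { echo "missing: $file" >&2; status=1; }
    done

    # Some Pfold scripts expect GNU awk in /bin/.
    [ -x "$gawk_path" ] || { echo "missing: $gawk_path" >&2; status=1; }

    return "$status"
}
```

Running 'check_petcofold_env && echo ok' after the export commands prints 'ok'
only if every file was found.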


3) Usage
--------

PETcofold reads two RNA sequence alignments in fasta format (-fasta). It
requires at least three identifiers (text before the first occurrence of a
dot) common to both alignments to build a phylogenetic tree; otherwise it stops.
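
The identifier rule can be tried out before running PETcofold. The following is
a minimal sketch, assuming the identifier is the header text after '>' up to
the first dot; the function names 'fasta_ids' and 'common_ids' are hypothetical
and the exact parsing done by PETcofold may differ.

```shell
# Print the unique identifiers of a fasta file: header text after '>'
# up to (not including) the first dot.
fasta_ids() {
    awk '/^>/ { id = substr($0, 2); sub(/[.].*/, "", id); print id }' "$1" | sort -u
}

# Print the identifiers shared by two fasta files, one per line.
common_ids() {
    local tmp1 tmp2
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    fasta_ids "$1" > "$tmp1"
    fasta_ids "$2" > "$tmp2"
    comm -12 "$tmp1" "$tmp2"    # lines present in both sorted lists
    rm -f "$tmp1" "$tmp2"
}
```

If 'common_ids <file1> <file2> | wc -l' reports fewer than 3, PETcofold can be
expected to stop.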

The standard output shows the partial structure and its probabilities for the
intra-molecular folding of both alignments. Then, it shows the PETfold RNA
secondary structures for both alignments and the PETcofold RNA secondary
structure of the concatenated alignments. Here, curly brackets stand for
constrained base pairs (as part of the partial structures), round brackets
stand for intra-molecular base pairs predicted in the second step and square
brackets for RNA-RNA interactions. In the end, the PETcofold score, the
reliability (score divided by the length of the alignments) and the delta
Reliability binding (difference between the cofolding reliability and the
arithmetic mean of the intra-molecular structure reliabilities of both
alignments) of the RNA cofolding structure are listed. An alternative output is
provided in fasta format (-war).
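
The bracket notation can be summarized with a few lines of shell. This is a
sketch only; the function name 'summarize_structure' is hypothetical. It counts
opening brackets, since every base pair contributes exactly one.

```shell
# Count base pairs per class in a PETcofold structure string.
summarize_structure() {
    printf '%s\n' "$1" | awk '{
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "{") constrained++     # partial structure (step 1)
            else if (c == "(") intra++      # intra-molecular pairs from step 2
            else if (c == "[") inter++      # RNA-RNA interactions
        }
        printf "constrained=%d intra=%d inter=%d\n", constrained, intra, inter
    }'
}
```

For the PETcofold structure of the first run in section 4,

$ summarize_structure '.{{......(((...--.))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]]....---}}}}}}.'

prints 'constrained=8 intra=3 inter=8'.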

By default, a phylogenetic tree is calculated from pairwise distances using the
neighbour joining (NJ) algorithm. However, the user can specify a custom tree
(-settree). It is also possible to write the probabilities of the PET model to
a pp-file (-ppfile) that can be drawn as a dotplot by 'drawplot'.

PETcofold has several parameters to influence the model. First of all, columns
with 25% or more gaps are ignored. This threshold can be changed with the
parameter -setgap. Further parameters set the maximal reliability an
intra-molecular base pair may have to remain free for RNA-RNA interaction
(-setpetcon) and the minimal partial structure probability (-setpartprob). You
can also decide whether lonely base pairs should be forbidden in the
thermodynamic folding (-noLP) and whether constrained stems should be extended
by reliable inner and outer base pairs (-extstem).

The following parameters are adopted from PETfold: -setevocon_bp,
-setevocon_ss, -setalpha, -setbeta. The remaining parameters have been used
to evaluate PETcofold, but they have not been shown to increase or change the
performance of PETcofold. All available parameters are listed when the program
is called without the mandatory parameters.

Here is the usage:

PETcofold v3.1
==============
by Stefan E Seemann (seemann@genome.ku.dk)

Usage: PETcofold.pl -fasta <file1> -fasta <file2> [ options ] [ parameter settings ]

Mandatory Input:
   -fasta <file1>  -fasta <file2>    ... 2 alignments with same organisms in fasta format
Options:
   -settree <tree>                   ... calculates score for given tree in Newick tree format
   -war                              ... fasta format output
   -intermol                         ... structure output of inter-molecular base pairs
   -ppfile <file>                    ... writes PET probabilities in pp-file
                                         which can be drawn (as ps) using 'programs/pfold/bin/drawdot'
Parameter settings:
   -setgap <nr>                      ... max. percent of gaps in alignment column (default: 0.25)
   -setpetcon <reliability>          ... max. intra-mol. base pair rel. to be free for RNA-RNA interaction (default: 0.9)
   -setpartprob <probability>        ... minimal partial structure probability (default: 0.1)
   -noLP                             ... RNA(co)fold option: disallows pairs that can only occur isolated
   -extstem                          ... constraint stems get extended by inner and outer base pairs
PETfold specific parameters:
   -setevocon_bp <reliability>       ... rel. threshold for conserved base pairs (default: 0.9)
   -setevocon_ss <reliability>       ... rel. threshold for conserved single stranded pos. (default: 1)
   -setalpha <nr>                    ... weighting factor for single stranded probs (default: 0.2)
   -setbeta <nr>                     ... weighting factor for thermodynamic overlap (default: 1)
Experimental parameters:
   -setbetaduplex <nr>               ... weighting factor for thermodynamic overlap in RNA duplex folding - step 2 (default: 1)
                                         if the average reliability of the extended stem is larger than 'setpetcon'
   -mrnanotconstr                    ... mRNA (second sequence by convention) is not constrained in RNA cofolding - step 2
   -partprob_and                     ... thermodynamic AND evolutionary partial structure probability have to be larger than 'setpartprob'
                                         by default OR


4) Example
----------

$ cd <PATH>/PETcofold

$ bin/PETcofold_3_1_2.pl -fasta example/example1.fasta -fasta example/example2.fasta

PETfold on alignment 1:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.998503333124646
Partial structure 1: .<<............--................>>..
PETfold on alignment 2:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.980380211147457
Partial structure 2: ..<<<<<<..............--->>>>>>.
Input Files:	example/example1.fasta example/example2.fasta
Common Identifiers:	gca_bovine, gca_chicken, gca_mouse, gca_rat, 
PETfold RNA sec.struct.:	((((((....((...--.))..........)))))). ..((((((..............---)))))).
PETcofold RNA sec.struct.:	.{{......(((...--.))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]]....---}}}}}}.
Score_{model,structure}(tree,alignment) = 36.322593231148
Reliability_{model,structure}(tree,alignment) = 0.567540519236688
delta Reliability binding = 0.0379148081205763

$ bin/PETcofold_3_1_2.pl -fasta example/example1.fasta -fasta example/example2.fasta -war

PETfold on alignment 1:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.998503333124646
Partial structure 1: .<<............--................>>..
PETfold on alignment 2:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.980380211147457
Partial structure 2: ..<<<<<<..............--->>>>>>.
>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGGGGGCUU&GAGGCCGGUCAAAUUCAGAUCAAU-CCGGCCA
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGGGGGUCU&GAGGCCCACCAAACUCGUUUAA-AGUGGGCCA
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGGAGACUU&GGCGUUGGGCAAACUCGAAAAAU-CCCAACGU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGAGGGCUU&GGGGUUGGGCAAACUCGAAAAUCUACCAACUA
>structure
.{{......(((......))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]].......}}}}}}.

$ bin/PETcofold_3_1_2.pl -fasta example/example1.fasta -fasta example/example2.fasta -setpetcon 0.8 -extstem -noLP

PETfold on alignment 1:
petcon = 0.8; thermodynamic partial structure probability = 0.983905604336076; evolutionary partial structure probability = 0.440970109828062
Partial structure 1: <<<<<<.........--.............>>>>>>.
PETfold on alignment 2:
petcon = 0.8; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.980380211147457
Partial structure 2: ..<<<<<<..............--->>>>>>.
Input Files:	example/example1.fasta example/example2.fasta
Common Identifiers:	gca_bovine, gca_chicken, gca_mouse, gca_rat, 
PETfold RNA sec.struct.:	((((((....((...--.))..........)))))). ..((((((..............---)))))).
PETcofold RNA sec.struct.:	{{{{{{...(((...--.))).[[[[[[[[}}}}}}.&..{{{{{{.].]]]]]]]....---}}}}}}.
Score_{model,structure}(tree,alignment) = 37.7899705234401
Reliability_{model,structure}(tree,alignment) = 0.590468289428752
delta Reliability binding = 0.0611194109769494

$ bin/PETcofold_3_1_2.pl -fasta example/example1.fasta -fasta example/example2.fasta -settree '(gca_bovine:0.9,gca_chicken:0.01,(gca_mouse:0.01,gca_rat:0.1):0.9)'

PETfold on alignment 1:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.998503333124646
Partial structure 1: .<<............--................>>..
PETfold on alignment 2:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.980380211147457
Partial structure 2: ..<<<<<<..............--->>>>>>.
Input Files:	example/example1.fasta example/example2.fasta
Common Identifiers:	gca_bovine, gca_chicken, gca_mouse, gca_rat, 
PETfold RNA sec.struct.:	((((((...(((...--.))).........)))))). ..((((((..............---)))))).
PETcofold RNA sec.struct.:	.{{......(((...--.))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]]....---}}}}}}.
Score_{model,structure}(tree,alignment) = 37.1869426243743
Reliability_{model,structure}(tree,alignment) = 0.581045978505848
delta Reliability binding = 0.0406535135290625
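
The relation between score and reliability can be checked on the first run
above. Note that the column count of 64 is an assumption inferred from the
numbers, not something PETcofold prints: the two alignments have 37 and 32
columns, and 5 of them are shown as '-' in the structure lines.

```shell
# Values copied from the first example run.
score=36.322593231148
columns=64   # assumption: 37 + 32 alignment columns minus the 5 shown as '-'

# Reliability = score divided by the number of columns.
reliability=$(awk -v s="$score" -v n="$columns" 'BEGIN { printf "%.6f", s / n }')
echo "$reliability"
```

The result agrees with the reported reliability of 0.567540519236688 when
rounded to six decimals.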


5) References
-------------

[1] Seemann, S.E., Gorodkin, J. and Backofen, R. (2008)
    Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
    Nucleic Acids Research, 36(20):6355-6362
[2] Knudsen, B. and Hein, J. (2003)
    Pfold: RNA secondary structure prediction using stochastic context-free grammars.
    Nucleic Acids Research, 31(13):3423-3428
[3] Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, S., Tacker, M. and Schuster, P. (1994)
    Fast Folding and Comparison of RNA Secondary Structures.
    Monatshefte f. Chemie, 125:167-188

If you find this software useful for your research, please cite the following work:
    Almob in progress
    Bioinf in progress

6) Contact
----------

seemann@genome.ku.dk

