PETfold 1.0
===========
by Stefan E Seemann:
seemann@genome.ku.dk


Outline:
--------
	1) Introduction
	2) Installation
	3) Usage
	4) Example
	5) References
	6) Contact


1) Introduction
---------------

PETfold is a computational method for determining the secondary
structure of RNA sequences. PETfold is the first tool that combines
the energy-based and the evolution-based approach in a single
optimization problem for folding multiple aligned RNA sequences.

PETfold extends Pfold (see [1]). It identifies the base pairs that are
most conserved and energetically most favorable using maximum expected
accuracy scoring. The free energies of single-stranded and base-paired
positions are taken from RNAfold (see [2]).

PETfold was tested on a set of 46 well-curated Rfam families, and its
performance was compared to that of Pfold and RNAalifold. On average,
PETfold performs best when the predicted structures are compared to
those of Rfam. The average Matthews correlation coefficients were:
PETfold: 0.85, Pfold: 0.71, RNAalifold: 0.79.


2) Installation
---------------

PETfold needs Pfold (see [1]) and the Vienna RNA Package (see [2])
installed on your computer. Pfold is delivered with PETfold (thanks to
Bjarne Knudsen) and is located in './programs/pfold/bin/'. Make sure
the Vienna RNA Package is correctly installed.

The binary folder of the Vienna RNA Package (the location of RNAfold)
has to be exported in an environment variable called 'RNAfold_bin',
and the binary folder of Pfold (the location of fasta2col, findphyl,
scfg.rate, mltree, scfg and article.grm) in an environment variable
called 'Pfold_bin'.

In bash, the commands would be:

$ export RNAfold_bin='/opt/ViennaRNA/bin/'
$ export Pfold_bin='<PATH>./programs/pfold/bin/'

Furthermore, some Pfold scripts need GNU awk (gawk) installed in '/bin/'.
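
The setup can be checked before the first run by testing that both
variables point at directories containing the expected files. The
following is a minimal sketch only: 'check_petfold_env' is a
hypothetical helper that is not shipped with PETfold, and the file
list is simply the one given in the description above.

```shell
# check_petfold_env: hypothetical helper, not part of PETfold.
# Returns 0 if RNAfold_bin and Pfold_bin look usable, 1 otherwise.
check_petfold_env() {
    local rc f
    rc=0

    # RNAfold_bin must name the directory that contains RNAfold.
    if [ -z "${RNAfold_bin:-}" ] || [ ! -e "${RNAfold_bin%/}/RNAfold" ]; then
        echo "RNAfold not found via RNAfold_bin='${RNAfold_bin:-}'" >&2
        rc=1
    fi

    # Pfold_bin must contain the Pfold files listed in this README.
    for f in fasta2col findphyl scfg.rate mltree scfg article.grm; do
        if [ -z "${Pfold_bin:-}" ] || [ ! -e "${Pfold_bin%/}/$f" ]; then
            echo "'$f' not found via Pfold_bin='${Pfold_bin:-}'" >&2
            rc=1
        fi
    done

    # Some Pfold scripts expect gawk in /bin/; only warn about it.
    if [ ! -x /bin/gawk ]; then
        echo "warning: /bin/gawk not found" >&2
    fi

    return "$rc"
}
```

After the two 'export' commands, running 'check_petfold_env' should
print nothing (apart from a possible gawk warning) and return 0.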

PETfold is now ready to use.


3) Usage
--------

PETfold reads an RNA sequence alignment in fasta format (--fasta).
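
Because the input is an alignment, every record in the fasta file
should have the same length once gaps are counted. The sketch below
checks this before PETfold is started. 'check_alignment' is a
hypothetical helper, not part of PETfold, and it assumes that header
lines start with '>' and that a sequence may span several lines.

```shell
# check_alignment FILE: hypothetical helper, not part of PETfold.
# Succeeds if all fasta records in FILE have the same aligned length.
check_alignment() {
    awk '
        /^>/ { n++; name[n] = substr($0, 2); next }       # new record
        n    { gsub(/[ \t\r]/, ""); len[n] += length($0) } # sequence line
        END  {
            if (n == 0) { print "no fasta records found"; exit 1 }
            bad = 0
            for (i = 2; i <= n; i++)
                if (len[i] != len[1]) {
                    printf "%s has length %d, expected %d\n", name[i], len[i], len[1]
                    bad = 1
                }
            exit bad
        }
    ' "$1"
}
```

For example, 'check_alignment example/example.fasta && bin/PETfold.pl
--fasta example/example.fasta' would only start PETfold if the check
passes.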

The standard output consists of three parts: the consensus structure
calculated by Pfold, the evolutionary constraints based on the
reliabilities calculated by Pfold, and the consensus structure
calculated by PETfold together with its score and reliability. An
alternative output is provided in fasta format (--war).

If the secondary structure is already known, its PETfold score and
reliability can be calculated as well (--setstruct). By default, a
phylogenetic tree is calculated from pairwise distances using the
neighbour joining (NJ) algorithm, but the user can supply a tree in
Newick format instead (--settree). It is also possible to write the
probabilities of the PET model to a pp-file (--ppfile), which can be
drawn as a dot plot by 'drawdot'.
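
A structure passed to --setstruct is written in dot-bracket notation,
as in the example section. The sketch below checks such a string
before it is handed to PETfold. 'check_structure' is a hypothetical
helper, not part of PETfold, and it assumes that only the characters
'(', ')' and '.' are used and that the structure should be as long as
the alignment.

```shell
# check_structure STRUCT [LENGTH]: hypothetical helper, not part of PETfold.
# Succeeds if STRUCT is balanced dot-bracket notation and, when LENGTH
# is given, has exactly that many characters.
check_structure() {
    printf '%s\n' "$1" | awk -v want="${2:-}" '
        {
            if (want != "" && length($0) != want) exit 1
            depth = 0
            for (i = 1; i <= length($0); i++) {
                c = substr($0, i, 1)
                if (c == "(") depth++
                else if (c == ")") { if (--depth < 0) exit 1 }
                else if (c != ".") exit 1
            }
            exit (depth == 0 ? 0 : 1)
        }'
}
```

The optional second argument is the alignment length, so a structure
for an alignment with 68 columns would be checked with
'check_structure "$struct" 68'.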

The behaviour of PETfold is tuned by several parameters. Advanced
users can change them (--setevocon_bp, --setevocon_ss, --setalpha,
--setbeta, --setgap).

The usage message is:

PETfold v1.0
============
by Stefan E Seemann (seemann@genome.ku.dk)

Usage: PETfold.pl --fasta <file> [ options ] [ parameter settings ]

   --fasta <file>               ... alignment in fasta format
Options:
   --setstruct <structure>      ... calculates score for given structure
   --settree <tree>             ... calculates score for given tree in Newick tree format
   --war                        ... fasta format output
   --ppfile <file>              ... writes PET probabilities in pp-file
                                    which can be drawn (as ps) using 'pfold/drawdot'
Parameter settings:
   --setevocon_bp <reliability> ... rel.threshold for conserved basepairs (default: 0.9)
   --setevocon_ss <reliability> ... rel.threshold for conserved single stranded pos. (default: 1)
   --setalpha <nr>              ... weighting factor for single stranded probs (default: 0.2)
   --setbeta <nr>               ... weighting factor for thermodynamic overlap (default: 1)
   --setgap <nr>                ... max. percent of gaps in alignment column (default: 0.25)
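
Before --setgap is changed, it can help to see how gap-rich the
columns of an alignment are. The sketch below lists the gap fraction
of every column that contains at least one gap. 'gap_fractions' is a
hypothetical helper, not part of PETfold. How PETfold applies the
threshold (for example, whether the comparison is strict) is not
stated here, so the sketch only reports the fractions.

```shell
# gap_fractions FILE: hypothetical helper, not part of PETfold.
# Prints "column fraction" for each alignment column with a gap ("-").
gap_fractions() {
    awk '
        /^>/ { n++; next }                                 # new record
        n    { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }   # sequence line
        END  {
            for (c = 1; c <= length(seq[1]); c++) {
                gaps = 0
                for (i = 1; i <= n; i++)
                    if (substr(seq[i], c, 1) == "-") gaps++
                if (gaps > 0) print c, gaps / n
            }
        }
    ' "$1"
}
```

Columns are numbered from 1, and the fraction is the number of gap
characters divided by the number of sequences.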


4) Example
----------

$ cd <PATH>/PETfold1.0

$ bin/PETfold.pl --fasta example/example.fasta

Pfold RNA sec.struct.:          (((((((...((...--.))..(((((((...)))))))....((((.....---.))))))))))).
Constraints:                    <<<<<<<........--......<<<<<.....>>>>>.....<<<<.....---.>>>>>>>>>>>.
PETfold RNA sec.struct.:        (((((((..(((...--.))).(((((((...)))))))....(((((....---)))))))))))).
Score_{model,structure}(tree,alignment) = 43.667773989852
Reliability_{model,structure}(tree,alignment) = 0.69313926968019

$ bin/PETfold.pl --fasta example/example.fasta --war

>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGCAAAUUCAGAGAAGCAGCUUCAAU-UCUGCCGGGGCUU
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGCAAACUCGUUGAUGUACACUAA-AGUGUGCCGGGGUCU
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAAU-CUCUACUAAGACUU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAUCU-UCUACUAAGGCUU
>structure
(((((((..(((......))).(((((((...)))))))....(((((.......)))))))))))).
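
When only the predicted structure is needed, it can be pulled out of
the --war output with a small filter. This is a sketch under the
assumption that --war writes to standard output and that the structure
record is always named 'structure', as in the output above.
'war_structure' is a hypothetical helper, not part of PETfold.

```shell
# war_structure: hypothetical helper, not part of PETfold.
# Reads fasta-style --war output on stdin and prints the lines that
# belong to the record named "structure".
war_structure() {
    awk '
        /^>/      { in_struct = ($0 == ">structure"); next }
        in_struct { print }
    '
}
```

It would be used as a pipe, for example
'bin/PETfold.pl --fasta example/example.fasta --war | war_structure'.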

$ bin/PETfold.pl --fasta example/example.fasta --setstruct '......................(((((((..............(((((.......)))))))))))).'

Pfold RNA sec.struct.:          (((((((...((...--.))..(((((((...)))))))....((((.....---.))))))))))).
Constraints:                    xxxxxxxxxxxxxxx--xxxxx<<<<<<<xxxxxxxxxxxxxx<<<<<xxxx--->>>>>>>>>>>>x
PETfold RNA sec.struct.:        ...............--.....(((((((..............(((((....---)))))))))))).
Score_{model,structure}(tree,alignment) = 16.851337033201
Reliability_{model,structure}(tree,alignment) = 0.267481540209539

$ bin/PETfold.pl --fasta example/example.fasta --settree '(gca_bovine,gca_chicken,(gca_mouse,gca_rat))'

Pfold RNA sec.struct.:          (((((((...((...--.))..(((((((...)))))))....((((.....---.))))))))))).
Constraints:                    <<<<<<<........--......<<<<<.....>>>>>.....<<<<.....---.>>>>>>>>>>>.
PETfold RNA sec.struct.:        (((((((..(((...--.))).(((((((...)))))))....(((((....---)))))))))))).
Score_{model,structure}(tree,alignment) = 43.668862589852
Reliability_{model,structure}(tree,alignment) = 0.69315654904527

$ bin/PETfold.pl --fasta example/example.fasta --settree '(gca_bovine:0.4,gca_chicken:0.3,(gca_mouse:0.2,gca_rat:0.1):0.5)'

Pfold RNA sec.struct.:          (((((((...((...--.))..(((((((...)))))))....((((.....---.))))))))))).
Constraints:                    <<<<<<<........--......<<<<<.....>>>>>.....<<<<.....---.>>>>>>>>>>>.
PETfold RNA sec.struct.:        (((((((..(((...--.))).(((((((...)))))))....(((((....---)))))))))))).
Score_{model,structure}(tree,alignment) = 44.348724789852
Reliability_{model,structure}(tree,alignment) = 0.703948012537333
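
In both --settree examples the leaf names of the tree are the sequence
names of the alignment. The sketch below compares the two sets before
a tree is passed to PETfold. 'tree_leaves' and 'fasta_names' are
hypothetical helpers, not part of PETfold. They assume an unquoted
Newick string without internal node labels, and fasta headers that
consist of the sequence name only.

```shell
# tree_leaves TREE: hypothetical helper, not part of PETfold.
# Prints the sorted leaf names of an unquoted Newick string.
tree_leaves() {
    printf '%s\n' "$1" |
        sed -e 's/:[0-9.eE+-]*//g' |   # drop branch lengths
        tr '(),;' '\n\n\n\n' |         # one token per line
        sed -e '/^$/d' |               # drop empty lines
        sort
}

# fasta_names FILE: hypothetical helper, not part of PETfold.
# Prints the sorted sequence names of a fasta file.
fasta_names() {
    sed -n 's/^>//p' "$1" | sort
}
```

The two lists can then be compared with, for example,
'[ "$(tree_leaves "$tree")" = "$(fasta_names example/example.fasta)" ]'.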


5) References
-------------

[1] Knudsen, B. and Hein, J. (2003)
    Pfold: RNA secondary structure prediction using stochastic context-free grammars.
    Nucleic Acids Research, 31 (13), 3423-3428
[2] Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, S., Tacker, M. and Schuster, P. (1994)
    Fast Folding and Comparison of RNA Secondary Structures.
    Monatshefte f. Chemie, 125, 167-188

If you find this software useful for your research, please cite the following work:
    Seemann SE, Gorodkin J, Backofen R.
    Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
    Nucleic Acids Research, 36(20):6355-6362, 2008
 

6) Contact
----------

seemann@genome.ku.dk

