PETfold v2.0pre
===============
by Stefan E Seemann:
seemann@rth.dk


Outline:
--------
	1) Introduction
	2) Installation
	3) Usage
	4) Example
        5) References
	6) Contact


1) Introduction
---------------

PETfold is a computational method for determining the conserved
secondary structure of RNA sequences with maximium expected accuracy
(MEA). PETfold is the first tool which integrates the duality of
energy-based and evolution-based approaches for folding of multiple
aligned RNA sequences into a single optimization problem.

PETfold is an extended version of Pfold (see [1]) which identifies
basepairs that are most conserved and energetically most favorable
using a maximum expected accuracy scoring. The probabilities of single
stranded and base paired positions in the thermodynamic structure
ensemble are calculated by the Vienna RNA package (see [2]).

PETfold was tested on a set of 46 well curated Rfam families and its
performance compared to that of Pfold and RNAalifold. On average,
PETfold performs best when comparing the predicted structures to that
of Rfam. We obtained the following averages of Matthews correlation
coefficient: PETfold: 0.85, Pfold: 0.71, RNAalifold: 0.79.


2) Installation
---------------

PETfold needs the Vienna RNA Package (see [2]) installed on your
computer. You have to edit the variables 'HPATH' and 'LPATH' at the
top of the 'Makefile' in the 'src' subfolder by setting them to the
location of the Vienna RNA Package header files and the libRNA.a file.
The source code of Pfold is part of the PETfold program (thanks to
Bjarne Knudsen) and is compiled together with it.

$ cd src
$ make clean
$ make

If you don't start PETfold directly from the bin folder then you have
to set the environment variable PETFOLDBIN to the path of Pfold
grammar files scfg.rate and article.grm:
$ export PETFOLDBIN='<PATH>/bin'


3) Usage
--------

PETfold reads an RNA sequence alignment in fasta format (-f or --fasta).

The standard output is the consensus structure calculated by Pfold,
evolutionary constraints based on the reliabilities calculated by
Pfold and the consensus structure with its (normalized) score
calculated by PETfold. An alternative output is provided in fasta
format (--war). 

The single-stranded and basepair probabilities of the PET model can be
written in a pp-file (-r or --ppfile) that can be drawn as dotplot by
'drawdot' (part of the Pfold package). If the secondary structure is
already known then its PETfold score and base pair reliabilities can
be calculated too (-s or --setstruc). By default, a phylogenetic tree
is calculated from pairwise distances using the neighbour joining (NJ)
algorithm. However, the user can specify his own tree (-t or
--settree). Alternative structures can be calculated by a structure
sampling algorithm (-o or --suboptimal). A large number of suboptimals
should be generated and afterwards parsed and filtered by the user.

PETfold is optimized by several parameters. The advanced user has the
possibility to change them (--setevocon_bp, --setevocon_ss,
--setalpha, --setbeta, --setgap).
 
Here the usage:

PETfold v2.0pre
===============
by Stefan E Seemann (seemann@rth.dk)
Reference: Seemann et al. Nucleic Acids Res. 36(20):6355-62, 2008
Web service: http://rth.dk/resources/petfold

Usage:
  PETfold -f <file> [ options ] [ parameter settings ]

  -f --fasta <file>          ... alignment in fasta format
Options:
  -s --setstruc <structure>  ... calculates score for given structure in dot-bracket notation
  -t --settree <tree>        ... calculates score for given tree in Newick tree format
  --war                      ... fasta format output
  -r --ppfile <file>         ... writes PET reliabilities in file
  -o --suboptimal <nr>       ... number of alternative structures found by sampling
  --verbose                  ... writes long output
  --help                     ... this output
Parameter settings:
  -p --setevocon_bp <reliab> ... reliab.threshold for conserved base pairs (default: 0.9)
  -u --setevocon_ss <reliab> ... reliab.threshold for conserved unpaired bases (default: 1)
  -a --setalpha <nr>         ... weighting factor for unpaired reliabilities (default: 0.2)
  -b --setbeta <nr>          ... weighting factor for thermodynamic overlap (default: 1)
  -g --setgap <nr>           ... max. percent of gaps in alignment column (default:0.25)

You may set the environment variable PETFOLDBIN to path of files scfg.rate and article.grm:
export PETFOLDBIN=<PATH>

Parse sampling of suboptimal structures:
./PETfold -f <file> -o 1000 | grep Subopt | sort | uniq -c | sort -k 5 | tac | less


4) Example
----------
(using ViennaRNA-1.8.4)

$ cd <PATH>/PETfold

$ bin/PETfold -fexample/example.fasta

Pfold RNA structure:	(((((((...((...--.))..(((((((...)))))))....(((((....---)))))))))))).
Constraints:		(((((((........--......(((((.....))))).....((((.....---.))))))))))).
PETfold RNA structure:	(((((((..(((...--.))).(((((((...)))))))....(((((....---)))))))))))).
Score_{model,structure}{tree,alignment} = 0.716788257

$ bin/PETfold -f example/example.fasta -s '......................(((((((..............(((((.......)))))))))))).'

Pfold RNA structure:	(((((((...((...--.))..(((((((...)))))))....(((((....---)))))))))))).
Constraints:		(((((((........--......(((((.....))))).....((((.....---.))))))))))).
PETfold RNA structure:	...............--.....(((((((..............(((((....---)))))))))))).
Score_{model,structure}{tree,alignment} = 0.279231625

$ bin/PETfold -f example/example.fasta -t '(gca_bovine,gca_chicken,(gca_mouse,gca_rat))'
Pfold RNA structure:	(((((((...((...--.))..(((((((...)))))))....(((((....---)))))))))))).
Constraints:		(((((((........--......(((((.....))))).....((((.....---.))))))))))).
PETfold RNA structure:	(((((((..(((...--.))).(((((((...)))))))....(((((....---)))))))))))).
Score_{model,structure}{tree,alignment} = 0.716805118

$ bin/PETfold -f example/example.fasta -t '(gca_bovine:0.4,gca_chicken:0.3,(gca_mouse:0.2,gca_rat:0.1):0.5)'
Pfold RNA structure:	(((((((..(((...--.))).(((((((...)))))))....(((((....---)))))))))))).
Constraints:		(((((((........--......(((((.....))))).....((((.....---.))))))))))).
PETfold RNA structure:	(((((((..(((...--.))).(((((((...)))))))....(((((....---)))))))))))).
Score_{model,structure}{tree,alignment} = 0.728351409

$ bin/PETfold -f example/example.fasta -o 1000 | grep Subopt | sort | uniq -c | sort -k 5 | tac | head -5
1 Suboptimal structure:   (((((((.(((((..--.)).)(((((((...))))))).))..........---.....))))))).	0.512480370
1 Suboptimal structure:   .(((((((.(...).--)(..((((((((...)))))))).).(((((...)---)))).))))))..	0.482524834
1 Suboptimal structure:   .((((((..(((...--.)))((....).).(((....)..))(((((....---)))))))))))..	0.462736759
1 Suboptimal structure:   .((...).)((((((--((((((((((((...))))))))))).((((....---)))).))))))).	0.379600592
1 Suboptimal structure:   (((((((....))).--))))((((((((...))))))(...)..(((....---))).))(....).	0.346904255

$ bin/PETfold -f example/example.fasta --war

>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGCAAAUUCAGAGAAGCAGCUUCAAU-UCUGCCGGGGCUU
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGCAAACUCGUUGAUGUACACUAA-AGUGUGCCGGGGUCU
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAAU-CUCUACUAAGACUU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAUCU-UCUACUAAGGCUU
>structure
(((((((..(((...--.))).(((((((...)))))))....(((((....---)))))))))))).


5) References
-------------

[1] Knudsen, B. and Hein, J. (2003) 
    Pfold: RNA secondary structure prediction using stochastic context-free grammars. 
    Nucleic Acids Research, 31 (13), 3423-3428
[2] I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) 
    Fast Folding and Comparison of RNA Secondary Structures. 
    Monatshefte f. Chemie 125: 167-188

If you find this software useful for your research, please cite the following work:
    Seemann SE, Gorodkin J, Backofen, R.
    Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
    Nucleic Acids Research, 36(20):6355-6362, 2008
 

6) Contact
----------

seemann@rth.dk

