Contact

Please contact Stefan Seemann (seemann@rth.dk) for any technical issues or general questions about the software.

Introduction

PETfold performs Probabilistic Evolutionary and Thermodynamic folding of a multiple alignment of RNA sequences.

PETfold is a computational method that predicts the conserved secondary structure of RNA sequences with maximum expected accuracy (MEA). PETfold is the first tool which integrates the duality of energy-based and evolution-based approaches for folding of multiple aligned RNA sequences into a single optimization problem.

PETfold is an extended version of Pfold that identifies the base pairs that are most conserved and energetically most favorable, using maximum expected accuracy scoring. The probabilities of single-stranded and base-paired positions in the thermodynamic structure ensemble are calculated with the Vienna RNA package.

The algorithm was tested on a set of 46 well-curated Rfam families, and its performance was compared to that of Pfold and RNAalifold. On average, PETfold performs best when the predicted structures are compared to the Rfam structures. The average Matthews correlation coefficients were: PETfold: 0.85, Pfold: 0.71, RNAalifold: 0.79.
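For readers unfamiliar with the metric, the following is a minimal sketch of the standard Matthews correlation coefficient computed from confusion-matrix counts. How true and false base pairs were counted in the evaluation is not stated in this section, so the function name matthews_cc and its count arguments are illustrative assumptions, not the evaluation code behind the numbers above.

```python
import math


def matthews_cc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Standard Matthews correlation coefficient from confusion-matrix counts.

    tp/fp/tn/fn are assumed to be counts of true/false positive and
    true/false negative predictions (for example, predicted base pairs
    compared against a reference structure).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

A value of 1 means perfect agreement with the reference, 0 means no better than chance, and -1 means complete disagreement.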

The PETfold algorithm was first described together with its performance evaluation in:
Seemann SE, Gorodkin J, Backofen R.
Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
Nucleic Acids Research, 36(20):6355-6362, 2008

Input

Multiple Sequence Alignment

PETfold reads an RNA sequence alignment in FASTA format.

Example
>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGCAAAUUCAGAGAAGCAGCUUCAAU-UCUGCCGGGGCUU
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGCAAACUCGUUGAUGUACACUAA-AGUGUGCCGGGGUCU
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAAU-CUCUACUAAGACUU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAUCU-UCUACUAAGGCUU
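
Before running PETfold it can help to confirm that the input really is an alignment, that is, that every FASTA record has the same length including gap characters. The sketch below is a hypothetical stand-alone check, not part of PETfold; the function name read_fasta_alignment is an assumption made for illustration.

```python
from typing import Dict


def read_fasta_alignment(text: str) -> Dict[str, str]:
    """Parse FASTA text and verify that all sequences have equal length."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty FASTA header")
            name = parts[0]  # sequence name is the first header token
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name] += line  # sequences may span several lines
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return records
```

Calling read_fasta_alignment on the contents of the alignment file returns a mapping from sequence name to aligned sequence, or raises ValueError if the records are inconsistent.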

Optional constraints

Phylogenetic tree

By default, a phylogenetic tree is calculated from pairwise distances using the neighbour-joining (NJ) algorithm. Alternatively, users can supply their own tree, which must contain the same species as the sequence alignment. The tree must be in Newick format, and the node names must be the same as the sequence names in the alignment file. If branch lengths are not given, they are estimated by maximum likelihood.

Example
Tree with 4 species and branch lengths:
((gca_rat:0.61783,gca_mouse:0.070947):0.29012,gca_chicken:0.372963,gca_bovine:0.159582):0.001
Tree with 4 species and without branch lengths:
((gca_rat,gca_mouse),gca_chicken,gca_bovine)
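
Because the tree and the alignment must name the same species, a quick consistency check can catch mismatches early. The following sketch extracts leaf names from a Newick string with a regular expression and compares them to a list of alignment names. It assumes simple unquoted names as in the examples above; the function names newick_leaf_names and compare_names are hypothetical and not part of PETfold.

```python
import re
from typing import Iterable, List, Set, Tuple


def newick_leaf_names(newick: str) -> List[str]:
    """Return leaf names in order of appearance.

    A leaf name directly follows '(' or ','. Labels after ')' are internal
    nodes and values after ':' are branch lengths, so neither is captured.
    Quoted Newick names are not handled.
    """
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


def compare_names(newick: str, alignment_names: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (names missing from the tree, names missing from the alignment)."""
    tree = set(newick_leaf_names(newick))
    aln = set(alignment_names)
    return aln - tree, tree - aln
```

Both returned sets are empty when the tree and the alignment agree on the species names.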

RNA secondary structure

If the pseudoknot-free secondary structure is already known, its PETfold score and reliability can be calculated. The structure must be given in dot-bracket notation.

Example
(((((((..(((......))).(((((((...)))))))....(((((.......)))))))))))).
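
In dot-bracket notation each "(" pairs with its matching ")" and each "." marks an unpaired position, so a structure string can be validated and converted to a list of base pairs with a simple stack. The sketch below is an illustrative helper under that reading; the name dot_bracket_pairs is an assumption, positions are 0-based, and only the pseudoknot-free alphabet "(", ")" and "." is accepted.

```python
from typing import List, Tuple


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Convert a dot-bracket string to sorted 0-based (open, close) pairs."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with most recent '('
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Small made-up example: two nested base pairs and three unpaired positions.
example_pairs = dot_bracket_pairs("((..)).")
```

The structure string should also have the same length as the alignment columns, which can be checked by comparing len(structure) with the aligned sequence length.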

Optional parameters

The PETfold algorithm is governed by several parameters that influence the scoring scheme and the impact of the evolutionary model on the structure prediction. Advanced users can change them (values between 0 and 1):