Please contact Stefan Seemann (seemann@rth.dk) for any technical issues or general questions about the software.
PETfold performs Probabilistic Evolutionary and Thermodynamic folding of a multiple alignment of RNA sequences.
PETfold is a computational method that predicts the conserved secondary structure of RNA sequences with maximum expected accuracy (MEA). PETfold is the first tool that integrates the energy-based and evolution-based approaches to folding multiply aligned RNA sequences into a single optimization problem.
PETfold is an extended version of Pfold which identifies basepairs that are most conserved and energetically most favorable using a maximum expected accuracy scoring. The probabilities of single stranded and base paired positions in the thermodynamic structure ensemble are calculated by the Vienna RNA package.
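The idea of maximum expected accuracy scoring can be illustrated with a small dynamic program. This is a minimal sketch and not PETfold's actual scoring function: the inputs `pair_rel` and `unpaired_rel`, the weight `alpha`, and the factor 2 for base pairs are assumptions made for illustration. It picks the nested structure that maximizes the sum of the reliabilities of its paired and unpaired positions.

```python
from functools import lru_cache


def mea_fold(pair_rel, unpaired_rel, alpha=0.5):
    """Return (score, dot-bracket) of a nested structure with maximal summed reliability.

    pair_rel:     dict mapping (i, j) with i < j to a base-pair reliability (hypothetical input)
    unpaired_rel: list with the unpaired reliability of every position (hypothetical input)
    alpha:        illustrative weight of unpaired positions relative to base pairs
    """
    n = len(unpaired_rel)

    @lru_cache(maxsize=None)
    def best(i, j):
        # An empty interval contributes nothing.
        if i > j:
            return 0.0, ""
        # Case 1: position i stays unpaired.
        score, struct = best(i + 1, j)
        top = (score + alpha * unpaired_rel[i], "." + struct)
        # Case 2: position i pairs with some position k inside the interval.
        for k in range(i + 1, j + 1):
            p = pair_rel.get((i, k), 0.0)
            if p > 0.0:
                inner_score, inner = best(i + 1, k - 1)
                outer_score, outer = best(k + 1, j)
                cand = (inner_score + outer_score + 2 * p, "(" + inner + ")" + outer)
                if cand[0] > top[0]:
                    top = cand
        return top

    return best(0, n - 1)
```

The recursion is written for clarity rather than speed and is only suitable for short alignments. In PETfold the reliabilities combine the evolutionary and the thermodynamic model, which this sketch does not attempt to reproduce.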
The algorithm was tested on a set of 46 well-curated Rfam families, and its performance was compared to that of Pfold and RNAalifold. On average, PETfold performs best when the predicted structures are compared to those in Rfam. We obtained the following average Matthews correlation coefficients: PETfold: 0.85, Pfold: 0.71, RNAalifold: 0.79.
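For readers unfamiliar with the measure, the following sketch computes a Matthews correlation coefficient between a predicted and a reference structure in dot-bracket notation. It assumes base-pair-level counting, where every pair of positions that is paired in neither structure counts as a true negative; the counting used in the published evaluation may differ, and the function names are hypothetical.

```python
from math import sqrt


def base_pairs(structure):
    """Return the set of (i, j) base pairs of a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def mcc(predicted, reference):
    """Matthews correlation coefficient over base pairs (assumed counting scheme)."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    pred, ref = base_pairs(predicted), base_pairs(reference)
    tp = len(pred & ref)   # pairs found in both structures
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs that were missed
    n = len(reference)
    tn = n * (n - 1) // 2 - tp - fp - fn  # all remaining position pairs
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A value of 1 means the two structures contain exactly the same base pairs. When the denominator is zero, for example when one structure has no pairs at all, the sketch returns 0.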
The PETfold algorithm was first described together with its performance evaluation in:
Seemann SE, Gorodkin J, Backofen R.
Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
Nucleic Acids Research, 36(20):6355-6362, 2008
PETfold reads an RNA sequence alignment in FASTA format.
>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGCAAAUUCAGAGAAGCAGCUUCAAU-UCUGCCGGGGCUU
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGCAAACUCGUUGAUGUACACUAA-AGUGUGCCGGGGUCU
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAAU-CUCUACUAAGACUU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAUCU-UCUACUAAGGCUU
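Before submitting an alignment it can help to check that it is well formed. The sketch below reads aligned FASTA text and verifies that all sequences have the same length, as an alignment requires. The function name is hypothetical and the check is an illustration, not part of PETfold.

```python
def read_aligned_fasta(text):
    """Parse FASTA text into {name: aligned sequence} and check equal lengths."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            name = header[0]  # the first word of the header is the sequence name
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            records[name] += line  # sequences may span several lines
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return records
```

Gap characters such as "-" are kept as part of the sequence, so they count toward the alignment length.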
By default, a phylogenetic tree is calculated from pairwise distances using the neighbour joining (NJ) algorithm. Alternatively, users can supply their own tree, which must contain exactly the species present in the sequence alignment. The tree must be in Newick format, and the node names must be identical to the sequence names in the alignment file. If branch lengths are not given, they are estimated by maximum likelihood.
((gca_rat:0.61783,gca_mouse:0.070947):0.29012,gca_chicken:0.372963,gca_bovine:0.159582):0.001
Tree with 4 species and without branch lengths:
((gca_rat,gca_mouse),gca_chicken,gca_bovine)
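The requirement that the tree and the alignment name the same species can be checked with a few lines of code. This is a minimal sketch that extracts leaf names from a Newick string with a regular expression; it assumes unquoted names without spaces, and both function names are hypothetical.

```python
import re


def newick_leaves(newick):
    """Return leaf names: labels that directly follow '(' or ','."""
    # Branch lengths follow ':' and internal node labels follow ')',
    # so neither is captured by this pattern.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def compare_tree_to_alignment(newick, sequence_names):
    """Return (names missing from the tree, tree leaves missing from the alignment)."""
    leaves = set(newick_leaves(newick))
    names = set(sequence_names)
    return names - leaves, leaves - names
```

Both returned sets are empty when the tree and the alignment agree.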
If a pseudoknot-free secondary structure is already known, its PETfold score and reliability can be calculated. The structure must be given in dot-bracket notation.
(((((((..(((......))).(((((((...)))))))....(((((.......)))))))))))).
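A structure in dot-bracket notation is valid only if every "(" has a matching ")". The following sketch converts such a string into a pair table and rejects unbalanced or unexpected characters; the function name is hypothetical, and the structure must also have the same length as the alignment, which is not checked here.

```python
def pair_table(structure):
    """Return a list where entry i is the pairing partner of position i, or -1 if unpaired."""
    table = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            table[i], table[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return table
```

For example, `pair_table("((..))")` returns `[5, 4, -1, -1, 1, 0]`.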
The behaviour of the PETfold algorithm is tuned by several parameters. They influence the scoring scheme and the impact of the evolutionary model on the structure prediction. Advanced users can change them (values between 0 and 1):
If no phylogenetic tree was given, the phylogeny of the sequences in the RNA alignment is calculated using the neighbour joining approach, and the branch lengths are then estimated by maximum likelihood. Branch lengths are also estimated if the tree was submitted without distances between the nodes. The output consists of a PNG image with download links to PS and PDF files and to the tree in Newick format.
The plain text output shows the command-line output of the program. The first line shows the consensus RNA secondary structure predicted by Pfold. The second line shows the evolutionary constrained structure, in which constrained base pairs appear as pairs of "(" and ")" and constrained unpaired bases as "x". The parameters "Reliability threshold for evolutionary constrained base pairs" and "Reliability threshold for evolutionary constrained unpaired bases" determine the constrained structure, which is not influenced by the thermodynamic model. Note that the default value of 1 for the "Reliability threshold for evolutionary constrained unpaired bases" allows no constrained unpaired bases. The third line shows the consensus RNA secondary structure predicted by PETfold. The fourth line lists the score of the maximum expected accuracy structure, normalized by the alignment length. In addition, the PETfold output in verbose mode can be downloaded.
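How the two reliability thresholds translate into a constrained structure can be sketched as follows. This is an illustration under stated assumptions, not PETfold's implementation: the inputs `pair_reliability` and `unpaired_reliability` are hypothetical, unconstrained positions are written as ".", and a strict greater-than comparison is assumed so that a threshold of 1 yields no constraints, in line with the description above.

```python
def constraint_string(length, pair_reliability, unpaired_reliability,
                      pair_threshold, unpaired_threshold=1.0):
    """Build a constraint line from evolutionary reliabilities (illustrative only).

    pair_reliability:     dict mapping (i, j) with i < j to a reliability in [0, 1]
    unpaired_reliability: list of per-position unpaired reliabilities in [0, 1]
    """
    if len(unpaired_reliability) != length:
        raise ValueError("one unpaired reliability per position is required")
    out = ["."] * length  # '.' marks an unconstrained position (assumption)
    for (i, j), rel in pair_reliability.items():
        if rel > pair_threshold:
            if out[i] != "." or out[j] != ".":
                raise ValueError("conflicting base-pair constraints")
            out[i], out[j] = "(", ")"
    for i, rel in enumerate(unpaired_reliability):
        if rel > unpaired_threshold and out[i] == ".":
            out[i] = "x"
    return "".join(out)
```

With the default `unpaired_threshold` of 1.0 no position can exceed the threshold, so the result contains no "x".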
The alignment is shown as a PNG figure annotated with the sequence position, the consensus structure, base pairs colored using the Vienna RNA conservation coloring scheme, the reliability of each nucleotide to be base paired (reliab_paired), and the sequence conservation. Furthermore, the consensus secondary structure is shown as a PNG figure with base pairs colored using the same Vienna RNA conservation coloring scheme. Compensatory mutations supporting the consensus structure are marked by color. The color scheme is the one employed by RNAalifold and alidot: red marks pairs with no sequence variation; ochre, green, turquoise, blue, and violet mark pairs with 2, 3, 4, 5, and 6 different types of pairs, respectively. Paired reliability and conservation are drawn as bar plots with values from 0 to 1. Both figures are created by adapted versions of the Vienna RNA utilities colorrna.pl and coloraln.pl. Alternatively, the figures can be downloaded as PS and PDF files.
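The mapping from the number of pair types to a color can be made concrete with a short sketch. It assumes that only the six canonical pair types (AU, UA, GC, CG, GU, UG) are counted, which matches the six colors listed above; the function name is hypothetical, and this is not the code of colorrna.pl or coloraln.pl.

```python
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}
COLORS = ["red", "ochre", "green", "turquoise", "blue", "violet"]


def pair_color(sequences, i, j):
    """Return the color for alignment columns i and j, or None if no canonical pair occurs."""
    types = set()
    for seq in sequences:
        pair = (seq[i] + seq[j]).upper().replace("T", "U")
        if pair in CANONICAL_PAIRS:
            types.add(pair)  # gaps and non-canonical combinations are ignored
    if not types:
        return None
    return COLORS[len(types) - 1]
```

One pair type gives red, meaning no sequence variation at the paired columns, and each additional type moves one step along the color list.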
The PETfold reliabilities of all possible base pairs are illustrated as rectangles in the upper triangle of the PNG figure. The base pairs that are part of the predicted consensus RNA secondary structure are illustrated as rectangles in the lower triangle. The single-stranded reliabilities of the PETfold model are also shown as rectangles along the axes. The sizes of the rectangles are proportional to the reliabilities. Alternatively, the figure can be downloaded as a PS or PDF file, or the reliabilities as a plain text file.