PETfold web server : Help

Contact

Please contact Stefan Seemann (seemann@rth.dk) for any technical issues or general questions about the software.

Introduction

PETfold performs Probabilistic Evolutionary and Thermodynamic folding of a multiple alignment of RNA sequences.

PETfold is a computational method that predicts the conserved secondary structure of RNA sequences with maximum expected accuracy (MEA). PETfold is the first tool that integrates energy-based and evolution-based approaches for folding multiply aligned RNA sequences into a single optimization problem.

PETfold is an extended version of Pfold. It identifies the base pairs that are most conserved and energetically most favorable using maximum expected accuracy scoring. The probabilities of single stranded and base paired positions in the thermodynamic structure ensemble are calculated by the Vienna RNA package.

The algorithm was tested on a set of 46 well-curated Rfam families, and its performance was compared to that of Pfold and RNAalifold. On average, PETfold performs best when the predicted structures are compared to those of Rfam. We obtained the following average Matthews correlation coefficients: PETfold: 0.85, Pfold: 0.71, RNAalifold: 0.79.

The PETfold algorithm was first described together with its performance evaluation in:
Seemann SE, Gorodkin J, Backofen R.
Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
Nucleic Acids Research, 36(20):6355-6362, 2008

Input

Multiple Sequence Alignment

PETfold reads an RNA sequence alignment in FASTA format.

Example

>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGCAAAUUCAGAGAAGCAGCUUCAAU-UCUGCCGGGGCUU
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGCAAACUCGUUGAUGUACACUAA-AGUGUGCCGGGGUCU
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAAU-CUCUACUAAGACUU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAUCU-UCUACUAAGGCUU
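
Before submitting, it can help to confirm that every row of the alignment has the same length and that sequence names are unique. The following Python sketch is not part of PETfold; the function names read_fasta_alignment and check_alignment are hypothetical, and it assumes that the first word of each header line is the sequence name.

```python
def read_fasta_alignment(text):
    """Parse FASTA text into an ordered list of (name, aligned_sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            # Assumption: the first word of the header is the sequence name.
            name, chunks = header[0], []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_alignment(records):
    """Return the alignment length, or raise ValueError if rows are inconsistent."""
    if not records:
        raise ValueError("no sequences found")
    names = [name for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("duplicate sequence names")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("rows differ in length: %s" % sorted(lengths))
    return lengths.pop()


example = """>seq_a
AC-GU
>seq_b
ACUGU
"""
print(check_alignment(read_fasta_alignment(example)))
```

The check only covers the shape of the input; it does not validate the nucleotide alphabet.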

Optional constraints

Phylogenetic tree

By default, a phylogenetic tree is calculated from pairwise distances using the neighbour joining (NJ) algorithm. However, users can supply their own tree, which must contain the same species as the sequence alignment. The tree must be in Newick format, which is described here. The node names must be identical to the sequence names in the alignment file. If branch lengths are not given, they are estimated by maximum likelihood.

Example

Tree with 4 species and branch lengths:

((gca_rat:0.61783,gca_mouse:0.070947):0.29012,gca_chicken:0.372963,gca_bovine:0.159582):0.001

Tree with 4 species and without branch lengths:

((gca_rat,gca_mouse),gca_chicken,gca_bovine)
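
Because the leaf names of the tree must match the sequence names of the alignment, a quick comparison can catch mismatches before submission. The Python sketch below is an illustration only, not PETfold code; newick_leaf_names and compare_tree_to_alignment are hypothetical names, and the regular expression assumes unquoted labels without whitespace.

```python
import re


def newick_leaf_names(newick):
    """Return leaf labels of a Newick string in order of appearance.

    Assumption: labels are unquoted. A leaf label directly follows "(" or ",",
    whereas internal-node labels follow ")" and are therefore skipped.
    """
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


def compare_tree_to_alignment(newick, sequence_names):
    """Return (missing_in_tree, missing_in_alignment) as sorted lists."""
    leaves = set(newick_leaf_names(newick))
    names = set(sequence_names)
    return sorted(names - leaves), sorted(leaves - names)


tree = "((gca_rat,gca_mouse),gca_chicken,gca_bovine)"
names = ["gca_bovine", "gca_chicken", "gca_mouse", "gca_rat"]
print(compare_tree_to_alignment(tree, names))
```

Two empty lists mean that the tree and the alignment name the same set of sequences.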

RNA secondary structure

If a pseudoknot-free secondary structure is already known, its PETfold score and reliability can be calculated. The structure must be in dot-bracket notation, which is described here.

Example

(((((((..(((......))).(((((((...)))))))....(((((.......)))))))))))).
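
In dot-bracket notation each "(" pairs with the matching ")" and each "." is an unpaired position. As a minimal sketch of how such a string can be validated and converted to base pairs, the Python function below uses a stack; dot_bracket_to_pairs is a hypothetical helper, not part of PETfold, and it accepts only the three characters "(", ")" and ".".

```python
def dot_bracket_to_pairs(structure):
    """Convert a pseudoknot-free dot-bracket string to 1-based (i, j) pairs."""
    stack, pairs = [], []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            # Remember the opening position until its partner is found.
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % position)
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r at position %d" % (symbol, position))
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


print(dot_bracket_to_pairs("((..))."))
```

The structure string should have the same length as the alignment, so comparing the two lengths is a useful additional check.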

Optional parameters

The PETfold algorithm is tuned by several parameters. They influence the scoring scheme and the impact of the evolutionary model on the structure prediction. Advanced users can change them (values between 0 and 1):

Reliability threshold for evolutionary constrained base pairs (default: 0.9)
Reliability threshold for evolutionary constrained unpaired bases (default: 1)
Weighting factor Alpha for single stranded probabilities (default: 0.2)
Weighting factor Beta for thermodynamic overlap (default: 1)
Maximal percentage of gaps in alignment column (default: 0.25)
Alignment columns with a higher fraction of gaps than this parameter allows are ignored during the entire calculation and are marked as '-' in the PETfold output.
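
To illustrate the gap parameter, the Python sketch below computes the gap fraction of each alignment column and lists the columns that would exceed the limit. It is an illustration under stated assumptions, not PETfold's implementation: the names gap_fractions and ignored_columns are hypothetical, only '-' is counted as a gap, and a column is treated as ignored only if its gap fraction is strictly greater than the limit.

```python
def gap_fractions(sequences):
    """Return the fraction of '-' characters in each alignment column."""
    count = len(sequences)
    # zip(*sequences) iterates over columns; rows are assumed to be equally long.
    return [sum(symbol == "-" for symbol in column) / count
            for column in zip(*sequences)]


def ignored_columns(sequences, max_gap_fraction=0.25):
    """Return 1-based indices of columns whose gap fraction exceeds the limit.

    Assumption: the comparison is strict, so a column at exactly the limit
    is kept.
    """
    return [index
            for index, fraction in enumerate(gap_fractions(sequences), start=1)
            if fraction > max_gap_fraction]


rows = ["AC-G", "A--G", "ACUG", "ACUG"]
print(gap_fractions(rows))
print(ignored_columns(rows))
```

With four sequences and the default of 0.25, a column containing a single gap is kept under this reading, while a column with two or more gaps is ignored.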

Output

The output shows the input, the phylogenetic tree, the PETfold plain text output including the predicted RNA secondary structure and its score, images of the predicted consensus RNA secondary structure, and a dotplot of the base pair and single stranded reliabilities calculated by the PETfold model. See the example section for what the output looks like.

Phylogenetic tree

If no phylogenetic tree was given, the phylogeny of the sequences in the RNA alignment is calculated using the neighbour joining approach, and the branch lengths are then estimated by a maximum likelihood approach. Branch lengths are also estimated this way if the tree was submitted without distances between the nodes. The output consists of a PNG image and download links to PS, PDF, and Newick format files.

PETfold output

The plain text output shows the command-line output of the program. The first line shows the consensus RNA secondary structure predicted by Pfold. The second line shows the evolutionary constrained structure, in which constrained base pairs appear as pairs of "(" and ")" and constrained unpaired bases as "x". The parameters "Reliability threshold for evolutionary constrained base pairs" and "Reliability threshold for evolutionary constrained unpaired bases" determine this constrained structure, which is not influenced by the thermodynamic model. Note that the default value of 1 for the unpaired threshold allows no constrained unpaired bases. The third line shows the consensus RNA secondary structure predicted by PETfold. The fourth line lists the score of the maximum expected accuracy structure, normalized by the alignment length. In addition, the PETfold output in verbose mode can be downloaded.

Predicted consensus RNA secondary structure

The alignment is shown as a PNG figure annotated with the sequence position, the consensus structure, base pairs colored according to the Vienna RNA conservation coloring scheme, the reliability of each nucleotide to be base paired (reliab_paired), and the sequence conservation. The consensus secondary structure is also shown as a PNG figure, with base pairs colored by the same scheme. Compensatory mutations supporting the consensus structure are marked by color. The color scheme is the one used by RNAalifold and alidot: red marks pairs with no sequence variation; ochre, green, turquoise, blue, and violet mark pairs with 2, 3, 4, 5, and 6 different types of pairs, respectively. Paired reliability and conservation are drawn as bar plots with values from 0 to 1. Both figures are created by adapted versions of the Vienna RNA utilities colorrna.pl and coloraln.pl. Alternatively, the figures can be downloaded as PS and PDF files.

Dotplot of PETfold reliabilities of base pairs and single stranded positions

The PETfold reliabilities of all possible base pairs are shown as rectangles in the upper triangle of the PNG figure. The base pairs that are part of the predicted consensus RNA secondary structure are shown as rectangles in the lower triangle. The single stranded reliabilities of the PETfold model are shown as rectangles along the axes. The size of each rectangle is proportional to its reliability. Alternatively, the figure can be downloaded as a PS or PDF file, and the reliabilities as a plain text file.
