Contact

Please contact Stefan Seemann (seemann@rth.dk) for any technical issues or general questions about the software.

Introduction

PETfold performs Probabilistic Evolutionary and Thermodynamic folding of a multiple alignment of RNA sequences.

PETfold is a computational method that predicts the conserved secondary structure of RNA sequences with maximum expected accuracy (MEA). It is the first tool to integrate energy-based and evolution-based approaches to folding multiply aligned RNA sequences into a single optimization problem.

PETfold is an extended version of Pfold. It identifies the base pairs that are most conserved and energetically most favorable, using maximum expected accuracy scoring. The probabilities of single-stranded and base-paired positions in the thermodynamic structure ensemble are calculated with the Vienna RNA package.

The algorithm was tested on a set of 46 well-curated Rfam families, and its performance was compared with that of Pfold and RNAalifold. On average, PETfold performs best when the predicted structures are compared with those of Rfam. The average Matthews correlation coefficients were: PETfold: 0.85, Pfold: 0.71, RNAalifold: 0.79.
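For readers unfamiliar with the Matthews correlation coefficient, the following is a minimal sketch of the standard formula computed from confusion-matrix counts. How predicted and reference base pairs were counted as true or false positives and negatives in the evaluation is not stated here, so the function name and the counts passed to it are illustrative assumptions.

```python
import math


def matthews_cc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives (hypothetical inputs; the
    counting scheme of the original evaluation is not specified here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The coefficient ranges from -1 (complete disagreement) to 1 (perfect agreement), with 0 indicating a prediction no better than chance.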

The PETfold algorithm was first described together with its performance evaluation in:
Seemann SE, Gorodkin J, Backofen R.
Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
Nucleic Acids Research, 36(20):6355-6362, 2008

Input

Multiple Sequence Alignment

PETfold reads an RNA sequence alignment in FASTA format.

Example
>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGCAAAUUCAGAGAAGCAGCUUCAAU-UCUGCCGGGGCUU
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGCAAACUCGUUGAUGUACACUAA-AGUGUGCCGGGGUCU
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAAU-CUCUACUAAGACUU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGCAAAUUCGAAGGUGUAGAGAAUCU-UCUACUAAGGCUU
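Before submitting an alignment it can help to confirm that it is well formed. The following is a minimal sketch, not part of PETfold itself, that parses aligned FASTA text and checks that all rows have the same length, as any alignment must. The function name read_aligned_fasta is hypothetical.

```python
from typing import Dict


def read_aligned_fasta(text: str) -> Dict[str, str]:
    """Parse aligned FASTA text into {sequence name: aligned sequence}.

    Raises ValueError if the text is malformed or if the rows differ
    in length (every row of an alignment must have the same length).
    """
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty FASTA header")
            name = parts[0]  # name is the first word of the header
            if name in records:
                raise ValueError("duplicate sequence name: " + name)
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            records[name] += line  # sequences may span several lines
    if len({len(seq) for seq in records.values()}) > 1:
        raise ValueError("aligned sequences differ in length")
    return records
```

Called on the example above, the function would return a dictionary with the four names gca_bovine, gca_chicken, gca_mouse, and gca_rat as keys.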

The user can type an alignment into the text area, upload a FASTA file, select a MAF block, or choose an Rfam family from a drop-down list.

Human (Hg18) 17way MULTIZ alignment

The 17way MULTIZ multiple alignments of the human genome (hg18, Mar. 2006) are accessible by chromosome, start position, end position, and strand. After the Update button is pressed, one of two things happens. If the query covers several alignment regions (MAF blocks) in human, a drop-down list of these blocks appears below; the user has to select one MAF block from the list and confirm it by pressing the Update button again, after which the text area with the MAF block alignment appears. If the query covers only one MAF block, the text area shows its alignment directly. The settings are reset by choosing "---" from the chromosome drop-down list, or by typing or selecting another multiple alignment.

Rfam 10.0 seed alignment

The seed alignment of one Rfam 10.0 family can be chosen from a drop-down list. By default, the entire seed alignment is taken (the radio button "seed" is set). Alternatively, the user can select the radio button "no paralogs", which returns the seed alignment without paralogs by choosing only the sequence whose distance is closest to the mean distance in the phylogenetic tree. After the Update button is pressed, the following text area shows the alignment. The settings are reset by choosing "---" from the Rfam family drop-down list, or by typing or selecting another multiple alignment.

Optional constraints

Phylogenetic tree

By default, a phylogenetic tree is calculated from pairwise distances using the neighbour-joining (NJ) algorithm. However, the user can specify their own tree, which must contain the same species as the sequence alignment. The tree must be in Newick format, which is described here. The node names must be the same as the sequence names in the alignment file. If the branch lengths are not given, they are estimated by maximum likelihood.

Example
Tree with 4 species and branch lengths:
((gca_rat:0.61783,gca_mouse:0.070947):0.29012,gca_chicken:0.372963,gca_bovine:0.159582):0.001
Tree with 4 species and without branch lengths:
((gca_rat,gca_mouse),gca_chicken,gca_bovine)
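Because the node names of the tree must match the sequence names of the alignment, a quick consistency check can save a failed submission. The following is a rough sketch under simplifying assumptions: it handles only unquoted leaf labels such as those in the examples above, it is not a full Newick parser, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple


def newick_leaf_names(tree: str) -> List[str]:
    """Return the leaf labels of a Newick string, in order of appearance.

    A leaf label directly follows "(" or ","; branch lengths follow ":"
    and internal node labels follow ")", so neither is picked up.
    Quoted labels are not supported in this sketch.
    """
    return re.findall(r"(?<=[(,])\s*([^(),:;\s]+)", tree)


def compare_names(tree: str, alignment_names: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (names missing from the tree, names missing from the alignment)."""
    leaves = set(newick_leaf_names(tree))
    names = set(alignment_names)
    return names - leaves, leaves - names
```

Both returned sets are empty when the tree and the alignment name exactly the same sequences.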

RNA secondary structure

If a pseudoknot-free secondary structure is already known, its PETfold score and reliability can be calculated. The structure must be in dot-bracket notation, which is described here.

Example
(((((((..(((......))).(((((((...)))))))....(((((.......)))))))))))).
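In dot-bracket notation, each "(" pairs with a matching ")" and each "." marks an unpaired position; the string is expected to have one character per alignment column. The following is a minimal sketch, with a hypothetical function name, that checks a structure string for balance and lists its base pairs. Positions are 0-based, which is a choice of this sketch rather than a PETfold convention.

```python
from typing import List, Tuple


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return the base pairs (i, j) of a pseudoknot-free dot-bracket string.

    Positions are 0-based. Raises ValueError on unbalanced brackets or
    on any character other than "(", ")" and ".".
    """
    stack: List[int] = []  # positions of currently unmatched "("
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r at position %d" % (ch, i))
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)
```

For instance, base_pairs("((..)).") yields the pairs (0, 5) and (1, 4). Comparing len(structure) with the alignment length is a useful additional check before submission.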

Optional parameters

The PETfold algorithm is governed by several parameters, which influence the scoring scheme and the impact of the evolutionary model on the structure prediction. Advanced users can change them (values between 0 and 1):