PETcofold web server : Help

Contact

Please contact Stefan Seemann (seemann@rth.dk) for any technical issues or general questions about the software.

Introduction

PETcofold is a framework built on PETfold that folds two multiple alignments of RNA sequences and searches for RNA-RNA interactions between them.

PETcofold predicts the joint secondary structure of two RNA alignments, including RNA-RNA interactions, with maximum expected accuracy (MEA). PETcofold is an extension of PETfold, which is the first tool to integrate the duality of energy-based and evolution-based approaches into a single optimization problem for folding aligned RNA sequences.

PETcofold, like PETfold, applies Pfold, which identifies the base pairs that are most conserved and energetically most favorable using maximum expected accuracy scoring. The free energies of single-stranded and base-paired positions are taken from RNAfold. Inter-molecular thermodynamic folding probabilities are calculated by RNAcofold.

The PETcofold pipeline consists of two steps: (1) intra-molecular folding of both alignments by PETfold and selection of a set of highly reliable base pairs (the partial structure); (2) inter-molecular folding of the concatenated alignments by an adapted version of PETfold, using the constraints from step 1. Finally, the partial structures from step 1 and the constrained inter-molecular structures from step 2 are combined into the joint RNA-RNA structure, including pseudoknots.

The PETcofold algorithm is described in:
Seemann SE, Richter AS, Gorodkin J, Backofen R "Hierarchical folding of multiple sequence alignments for the prediction of structures and RNA-RNA interactions", Algorithms Mol Biol., 5:22, 2010.

and the application on known RNA-RNA interactions is presented in:
Seemann SE, Richter AS, Gesell T, Backofen R, Gorodkin J "PETcofold: Predicting conserved interactions and structures of two multiple alignments of RNA sequences", Bioinformatics, 27(2):211-9, 2011.

Input

Multiple Sequence Alignments

PETcofold reads two RNA sequence alignments in FASTA format. Only sequences whose identifiers occur in both alignments are considered, and at least three unique shared identifiers are required.

Example

>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGGGGGCUU
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGGGGGUCU
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGGAGACUU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGAGGGCUU
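
Before submitting, it can be useful to check locally which identifiers the two alignments share. The following Python sketch illustrates the rule stated above; it is not part of PETcofold, and the function names read_fasta and shared_identifiers are hypothetical. It assumes that the identifier is the first whitespace-delimited word after ">".

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping identifier -> aligned sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the identifier is the first word after '>'.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def shared_identifiers(fasta1, fasta2, minimum=3):
    """Return identifiers found in both alignments, in the order of the first.

    Raises ValueError if fewer than `minimum` identifiers are shared.
    """
    aln1 = read_fasta(fasta1)
    aln2 = read_fasta(fasta2)
    shared = [name for name in aln1 if name in aln2]
    if len(shared) < minimum:
        raise ValueError(
            f"only {len(shared)} shared identifier(s); at least {minimum} are needed"
        )
    return shared
```

Calling shared_identifiers with the text of both alignments returns the identifiers that would be kept, or raises an error if there are fewer than three.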

The user can type the alignments into the text areas, upload FASTA files, select a MAF block, and/or select an Rfam family from a drop-down list.

Human (Hg18) 17way MULTIZ alignment

The 17way MULTIZ multiple alignments of the human genome (hg18, Mar. 2006) are accessible by chromosome, start position, end position, and strand. After the Update button is pressed, one of two things happens: if the query covers several alignment regions (MAF-blocks) in human, a drop-down list with all of them appears below; if it covers only one MAF-block, the following text area shows its alignment. In the latter case, all identifiers of the MAF-block are offered as input for the 2nd alignment in another drop-down list. If the query covers several MAF-blocks, the user has to select one from the drop-down list and confirm it by pressing the Update button again. The text area with the MAF-block alignment and the drop-down list with all its identifiers should then appear. The user can choose multiple identifiers, which will appear in the text area of the 2nd alignment. The sequences for the 2nd alignment can then be added by hand. The settings are reset by choosing "---" from the chromosome drop-down list.

Rfam 10.0 seed alignment

The seed alignment of one Rfam 10.0 family can be chosen from a drop-down list as the 2nd multiple sequence alignment. If the seed contains paralogs, we choose the sequence whose distance is closest to the mean distance in the phylogenetic tree. After the Update button is pressed, the following text area shows the alignment, and all its identifiers are offered as input for the 1st alignment in another drop-down list below. The user can choose multiple identifiers, which will appear in the text area of the 1st alignment. The sequences for the 1st alignment can then be added by hand.
If the user has already selected a MULTIZ alignment as the 1st alignment, the drop-down list offers only Rfam seed alignments that contain at least one species from the selected MAF-block. The settings are reset by choosing "---" from the Rfam family drop-down list.

Optional constraints

Phylogenetic tree

By default, a phylogenetic tree is calculated from pairwise distances using the neighbour joining (NJ) algorithm: in step 1 for both alignments, and in step 2 for the concatenated alignment. However, the user can specify their own tree, which is then used in both steps. The tree must be in Newick format, which is described here. The node names must be the same as the sequence names in the alignment files. If branch lengths are not given, they are estimated by maximum likelihood.

Example

Tree with 4 species and branch lengths:

((gca_rat:0.61783,gca_mouse:0.070947):0.29012,gca_chicken:0.372963,gca_bovine:0.159582):0.001

Tree with 4 species and without branch lengths:

((gca_rat,gca_mouse),gca_chicken,gca_bovine)
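
Because the node names must match the sequence names, a quick local check can catch mismatches before submission. The Python sketch below is only an illustration with hypothetical function names; it is not part of PETcofold. It assumes simple unquoted leaf names as in the examples above and does not handle quoted Newick labels.

```python
import re


def newick_leaf_names(newick):
    """Extract leaf names from a Newick string with unquoted labels.

    A leaf name directly follows '(' or ','. Labels and branch lengths
    of internal nodes follow ')' or ':' and are therefore skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def compare_tree_to_alignment(newick, identifiers):
    """Return (identifiers missing from the tree, leaves missing from the alignment)."""
    leaves = set(newick_leaf_names(newick))
    ids = set(identifiers)
    return sorted(ids - leaves), sorted(leaves - ids)
```

Both returned lists are empty when the tree and the alignment use exactly the same names.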

RNA secondary structure

If the pseudoknot-free consensus secondary structure of the first and/or second RNA alignment is already known, step 1 of the PETcofold algorithm looks for highly reliable base pairs only within these structures. The structure must be in dot-bracket notation, which is described here.

Example

((((((....((......))..........)))))).

If the secondary structure of the RNA duplex including the RNA-RNA interaction is already known, its PETcofold score and base pair reliabilities can be calculated as well. Step 1 constrains the intra-molecular structures, marked with '(' and ')', and step 2 constrains the inter-molecular structure, marked with '[' and ']'. The two structures must be concatenated with '&'.

Example

.((((.................[[[[[[[[.))))..&..((((((.]].]]]]]].......)))))).
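
To make the notation concrete, the following Python sketch reads such a string and lists the intra-molecular and inter-molecular base pairs. It is an illustration only, not PETcofold code, and the function name parse_joint_structure is hypothetical. Positions are 0-based indices into the two structures joined without the '&'.

```python
def parse_joint_structure(structure):
    """Split a duplex structure at '&' and collect its base pairs.

    Returns (length of first structure, intra-molecular pairs from '()',
    inter-molecular pairs from '[]'). Each bracket type is matched with
    its own stack, so '[]' pairs may cross '()' pairs.
    """
    left, right = structure.split("&")
    concatenated = left + right
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = {"(": [], "[": []}
    for pos, char in enumerate(concatenated):
        if char in stacks:
            stacks[char].append(pos)
        elif char in closers:
            opener = closers[char]
            if not stacks[opener]:
                raise ValueError(f"unmatched '{char}' at position {pos}")
            pairs[opener].append((stacks[opener].pop(), pos))
    if stacks["("] or stacks["["]:
        raise ValueError("unmatched opening bracket")
    return len(left), sorted(pairs["("]), sorted(pairs["["])
```

For the example above, an inter-molecular pair (i, j) should satisfy i < n <= j, where n is the returned length of the first structure.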

Optional parameters

The behavior of the PETcofold algorithm is controlled by several parameters. Advanced users can change them (all parameters take values between 0 and 1, except extstem, which is boolean):

Maximal percentage of gaps in alignment column (default: 0.25)
Alignment columns with a higher fraction of gaps than allowed by this parameter are ignored during the entire calculation and are marked as '-' in the PETcofold output.
Intra-molecular base paired reliability threshold Delta for accessibility (default: 0.9)
This threshold decides which base pairs are not accessible (constrained) for interaction sites in step 2. These base pairs form the partial structure.
Minimal partial structure probability Gamma (default: 0.1)
Gamma determines the value of parameter Delta. The ensemble of structures compatible with the partial structure must have a probability greater than Gamma. The parameter Delta is increased until this is the case.
Constraint stems get extended by inner and outer base pairs (default: off)
Sometimes highly reliable intra-molecular stems are only partly constrained because inner and/or outer base pairs have a reliability that is slightly lower than the threshold Delta. During step 2 these base pairs are not predicted because the rest of the stem is constrained. Such incomplete stems can be avoided by setting the parameter extstem.
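
The effect of the gap parameter can be illustrated with a few lines of Python. This is a sketch under assumptions, not the PETcofold implementation: it assumes that gaps are written as '-' and that a column is ignored only when its gap fraction is strictly greater than the threshold. The function name gap_columns is hypothetical.

```python
def gap_columns(sequences, max_gap_fraction=0.25):
    """Return 0-based indices of columns whose gap fraction exceeds the threshold."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    flagged = []
    for col in range(length):
        gaps = sum(1 for seq in sequences if seq[col] == "-")
        # Assumption: strict comparison, i.e. 'higher than allowed'.
        if gaps / len(sequences) > max_gap_fraction:
            flagged.append(col)
    return flagged
```

With four sequences and the default of 0.25, a column with one gap is kept and a column with two or more gaps is flagged.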

The following parameters are PETfold specific. They influence the scoring scheme and the impact of the evolutionary model on the structure prediction.

Reliability threshold for evolutionary constrained base pairs (default: 0.9)
Reliability threshold for evolutionary constrained unpaired bases (default: 1)
Weighting factor Alpha for single stranded probabilities (default: 0.2)
Weighting factor Beta for thermodynamic overlap (default: 1)

Output

The output shows the input, the phylogenetic tree, the PETcofold plain text output including the predicted joint RNA secondary structure and its score, an image of the covariance and paired reliabilities in the concatenated RNA multiple alignment, and a dotplot of the base pair and single-stranded reliabilities calculated by PETcofold. The example section shows what the output looks like.

Joint phylogenetic tree

If no phylogenetic tree was given, the phylogeny of the sequences in the RNA alignment is calculated using the neighbour joining approach, and the branch lengths are then estimated by a maximum likelihood approach. Branch lengths are also estimated this way if the tree was submitted without distances between the nodes. The output presents the joint phylogenetic tree of the concatenated RNA multiple alignments. It consists of a PNG image, with alternative download links to a PS file, a PDF file, and the tree in Newick format.

PETcofold output

The plain text output shows the command-line output of step 1 and step 2 of the program. The output of step 1 shows, for both RNA alignments and for each tested value of Delta, the partial structure with its probabilities in the thermodynamic and the evolutionary model. Remember that step 1 is repeated with increasing values of Delta until the partial structure probability is greater than Gamma in the thermodynamic model, the evolutionary model, or both. In the output of step 2, the first line shows the intra-molecular consensus RNA secondary structures of both RNA alignments predicted by PETfold. The second line shows the joint RNA secondary structure of the concatenated alignments predicted by PETcofold. The constrained base pairs from step 1 (partial structures) are indicated by pairs of “{” and “}”, intra-molecular base pairs predicted in the second step by pairs of “(” and “)”, and interaction sites (inter-molecular base pairs) by pairs of “[” and “]”. The third line shows the interaction sites again. The last line shows the score of the joint RNA secondary structure with maximal expected accuracy (the score is normalized by the alignment length).
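
The three bracket types in the joint structure line can be separated with a short script. The Python sketch below is an illustration only and the function name is hypothetical. It assumes that the joint structure is available as a single string and that a '&' separator, if present, should be kept; the exact layout of the real output may differ.

```python
def summarize_joint_structure(joint):
    """Count base pairs per category and mask everything except interaction sites."""
    counts = {
        "constrained": joint.count("{"),   # partial structure from step 1
        "intra_step2": joint.count("("),   # intra-molecular pairs from step 2
        "interaction": joint.count("["),   # inter-molecular pairs
    }
    # Keep only '[' and ']' (and a '&' separator, if any); dot out the rest.
    interaction_only = "".join(c if c in "[]&" else "." for c in joint)
    return counts, interaction_only
```

The masked string corresponds in spirit to the line that shows only the interaction sites.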

Predicted consensus joint RNA secondary structure

The concatenated alignments are shown as a PNG figure annotated with the sequence position, the consensus structure, base pairs colored according to the Vienna RNA conservation coloring scheme, the reliability of each nucleotide to be base paired (reliab_paired), and the sequence conservation. Compensatory mutations supporting the consensus structure are marked by color. The color scheme is the same as that employed by RNAalifold and alidot: red marks pairs with no sequence variation; ochre, green, turquoise, blue, and violet mark pairs with 2, 3, 4, 5, and 6 different types of pairs, respectively. Paired reliability and conservation are drawn as bar plots with values from 0 to 1. The figure is created by an adapted version of the Vienna RNA utility coloraln.pl. In addition, the consensus secondary structure is shown as a PNG figure in which red lines are inter-molecular bindings and blue arcs are intra-molecular bindings. The sequence data of both alignments are presented as a sequence logo in which the size of each nucleotide is proportional to its relative occurrence in the alignment column. This figure is generated with RILogo. Alternatively, the figures can be downloaded as SVG, PS, or PDF files.

Dotplot of PETcofold reliabilities of base pairs and single stranded positions

The PETcofold reliabilities of all possible base pairs are illustrated as rectangles in the upper triangle of the PNG figure. The base pairs that are part of the predicted joint consensus RNA secondary structure are illustrated as rectangles in the lower triangle. The two bold lines separate intra-molecular from inter-molecular base pairs. The latter are shown in the upper right and the lower left rectangle. The single-stranded reliabilities calculated by the PETcofold scoring scheme are also shown as rectangles, along the axes. The size of each rectangle is proportional to its reliability. Alternatively, the figure can be downloaded as a PS or PDF file, or the reliabilities as a plain text file.