PETcofold: Probabilistic Evolutionary and Thermodynamic cofolding algorithm


Introduction

PETcofold is a computational method for predicting the joint secondary structure of two RNA alignments, including RNA-RNA interactions. It is an extension of PETfold (see [1]), the first tool to integrate energy-based and evolution-based approaches to the folding of multiple aligned RNA sequences into a single optimization problem.

PETcofold, like PETfold, applies Pfold (see [2]), which identifies the base pairs that are most conserved and energetically most favorable using maximum expected accuracy scoring. The free energies of single-stranded and base-paired positions are taken from RNAfold (see [3]). In PETcofold, inter-molecular thermodynamic folding probabilities are calculated by RNAcofold.

The PETcofold pipeline consists of two steps: (1) intra-molecular folding of both alignments by PETfold and selection of a set of highly reliable base pairs (the partial structure), chosen so that constraining them decreases the probability of the structure ensemble only within a pre-defined range; (2) inter-molecular folding of the concatenated alignments by an adapted PETfold, using the partial structures from step 1 as constraints. Finally, the partial structures and the constrained inter-molecular structure are combined into the RNA-RNA cofolded structure, which may include pseudoknots.

Download

Installation

PETcofold requires Pfold (see [2]) and the Vienna RNA Package (see [3]) to be installed on your computer. An adapted Pfold version is delivered with PETcofold (thanks to Bjarne Knudsen); it is located in './programs/pfold/bin/'. Make sure the Vienna RNA Package is correctly installed.

The binary folder of the Vienna RNA Package (the location of RNAfold and RNAcofold) has to be exported in an environment variable called 'RNAfold_bin', and the binary folder of Pfold (the location of fasta2col, findphyl, mltree, scfg, scfg.rate, article.grm, drawdot) in an environment variable called 'Pfold_bin'.

In bash, the commands would be:

$ export RNAfold_bin='/opt/ViennaRNA/bin/'
$ export Pfold_bin='./programs/pfold/bin/'

Furthermore, some Pfold scripts need GNU awk (gawk) installed in '/bin/'.
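
Before the first run it can help to confirm that both environment variables point at folders containing the expected files. The following Python sketch is not part of PETcofold: the function name `check_petcofold_env` is hypothetical, and the file lists are copied from the installation notes above (the drawing program is left out). It only tests that the files exist, not that they execute.

```python
import os

# File names copied from the installation notes above.
REQUIRED = {
    "RNAfold_bin": ["RNAfold", "RNAcofold"],
    "Pfold_bin": ["fasta2col", "findphyl", "mltree", "scfg",
                  "scfg.rate", "article.grm"],
}


def check_petcofold_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for var, names in REQUIRED.items():
        folder = env.get(var)
        if not folder:
            problems.append(f"{var} is not set")
            continue
        for name in names:
            if not os.path.isfile(os.path.join(folder, name)):
                problems.append(f"{name} not found in ${var} ({folder})")
    return problems


if __name__ == "__main__":
    for line in check_petcofold_env() or ["environment looks complete"]:
        print(line)
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without changing the shell configuration.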

Now PETcofold is ready!

Usage

PETcofold reads two RNA sequence alignments in fasta format (-fasta). To build a phylogenetic tree, it requires at least three identifiers that occur in both alignments (an identifier is the header text before the first dot); otherwise it stops.
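
The identifier rule can be checked on the input files before running the program. The sketch below is an illustration under stated assumptions, not PETcofold's own code: `common_identifiers` and `enough_for_tree` are hypothetical helper names, and the sample headers with a `.1` or `.2` suffix are invented to show the "text before the first dot" rule.

```python
def common_identifiers(fasta_text_1, fasta_text_2):
    """Identifiers (header text before the first dot) found in both alignments."""

    def identifiers(fasta_text):
        ids = []
        for line in fasta_text.splitlines():
            if line.startswith(">"):
                ids.append(line[1:].strip().split(".", 1)[0])
        return ids

    second = set(identifiers(fasta_text_2))
    common = []
    # Keep the order of the first alignment and drop duplicates.
    for name in identifiers(fasta_text_1):
        if name in second and name not in common:
            common.append(name)
    return common


def enough_for_tree(common, minimum=3):
    """The text above states that at least three common identifiers are needed."""
    return len(common) >= minimum


# Invented sample input: headers differ after the dot but share the identifier.
aln1 = ">gca_bovine.1\nAGC\n>gca_mouse.1\nGGU\n>gca_rat.1\nAGC\n>only_in_1.1\nAAA\n"
aln2 = ">gca_rat.2\nGGG\n>gca_bovine.2\nGAG\n>gca_mouse.2\nGGC\n"
shared = common_identifiers(aln1, aln2)
print(shared, enough_for_tree(shared))
```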

The standard output first shows the partial structure and its probabilities for the intra-molecular folding of each alignment. It then shows the PETfold RNA secondary structures of both alignments and the PETcofold RNA secondary structure of the concatenated alignments. Here, curly brackets stand for constrained base pairs (part of the partial structures), round brackets stand for intra-molecular base pairs predicted in the second step, and square brackets stand for RNA-RNA interactions. Finally, the PETcofold score, the reliability (score divided by the alignment length) and the delta Reliability binding (the difference between the cofolding reliability and the arithmetic mean of the intra-molecular structure reliabilities of both alignments) of the RNA cofolding structure are listed. An alternative output in fasta format is available (-war).
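
To make the bracket notation and the delta Reliability binding concrete, here is a small Python sketch. It is a reading aid based only on the description above, not code from PETcofold: the function names are hypothetical, the structure string is copied from the first run in the Example section, and the three reliabilities passed to `delta_reliability_binding` are made-up numbers.

```python
def count_pairs(cofold_structure):
    """Count base pairs per bracket class in a PETcofold structure string."""
    pairs = {}
    for label, (opening, closing) in {
        "constrained": "{}",      # partial structure from step 1
        "intra_step2": "()",      # intra-molecular pairs predicted in step 2
        "intermolecular": "[]",   # RNA-RNA interaction
    }.items():
        n_open = cofold_structure.count(opening)
        if n_open != cofold_structure.count(closing):
            raise ValueError(f"unbalanced {opening}{closing} brackets")
        pairs[label] = n_open
    return pairs


def delta_reliability_binding(cofold_rel, intra_rel_1, intra_rel_2):
    """Cofolding reliability minus the mean of the two intra-molecular reliabilities."""
    return cofold_rel - (intra_rel_1 + intra_rel_2) / 2


# Structure line of the first example run; '&' separates the two alignments.
structure = ".{{......(((...--.))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]]....---}}}}}}."
print(count_pairs(structure))
# Made-up reliabilities, only to show the arithmetic.
print(delta_reliability_binding(0.60, 0.50, 0.70))
```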

By default, a phylogenetic tree is calculated from pairwise distances using the neighbour joining (NJ) algorithm. However, users can specify their own tree in Newick format (-settree). It is also possible to write the probabilities of the PET model to a pp-file (-ppfile), which can be drawn as a dot plot by 'drawdot'.

PETcofold has several parameters that influence the model. First of all, alignment columns with 25% or more gaps are ignored; this threshold can be changed with -setgap. Further parameters set the maximal intra-molecular base pair reliability for a position to remain free for RNA-RNA interaction (-setpetcon) and the minimal partial structure probability (-setpartprob). You can also decide whether lonely base pairs should be forbidden in the thermodynamic folding (-noLP) and whether constrained stems should be extended by reliable inner and outer base pairs (-extstem).
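
The gap threshold can be illustrated with a few lines of Python. This is a sketch of the rule as described above (gap fraction greater than or equal to the threshold), not PETcofold's implementation; `ignored_columns` is a hypothetical name. The four rows are the first-alignment halves (before the '&') of the -war output shown in the Example section.

```python
def ignored_columns(rows, max_gap_fraction=0.25):
    """0-based indices of columns whose gap fraction is >= max_gap_fraction."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    ignored = []
    for col in range(len(rows[0])):
        gaps = sum(1 for r in rows if r[col] == "-")
        if gaps / len(rows) >= max_gap_fraction:
            ignored.append(col)
    return ignored


rows = [
    "AGCCCUGUGGUGAAUUUACACGUUGAAUUGGGGGCUU",  # gca_bovine
    "GACUCUGUAGUGAAGU-UCAUAAUGAGUUGGGGGUCU",  # gca_chicken
    "GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGGAGACUU",  # gca_mouse
    "AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGAGGGCUU",  # gca_rat
]
print(ignored_columns(rows))        # default threshold 0.25
print(ignored_columns(rows, 0.5))   # stricter: only half-gapped columns
```

With four sequences a single gap already reaches 25%, so both gapped columns are dropped at the default threshold; they correspond to the two '-' characters in the first half of the structure lines in the example output.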

The following parameters are taken over from PETfold: -setevocon_bp, -setevocon_ss, -setalpha, -setbeta. The remaining parameters have been used to evaluate PETcofold, but they have not been shown to increase or change its performance. All available parameters are listed when the program is called without the mandatory parameters.

The usage message is:

PETcofold v3.1
==============
by Stefan E Seemann (seemann@rth.dk)

Usage: PETcofold.pl -fasta -fasta [ options ] [ parameter settings ]

Mandatory Input:
-fasta <file1> -fasta <file2>... 2 alignments with same organisms in fasta format
Options:
-settree <tree>... calculates score for given tree in Newick tree format
-war... fasta format output
-intermol... structure output of intermolecular base pairs
-ppfile <file>... writes PET probabilities in pp-file
... which can be drawn (as ps) using 'programs/pfold/bin/drawdot'
Parameter settings:
-setgap <nr>... max. percent of gaps in alignment column (default:0.25)
-setpetcon <reliability>... max. intra-mol. base pair rel. to be free for RNA-RNA interaction (default: 0.9)
-setpartprob <probability>... minimal partial structure probability (default: 0.1)
-noLP... RNA(co)fold option: disallows pairs that can only occur isolated
-extstem... constraint stems get extended by inner and outer base pairs
PETfold specific parameters:
-setevocon_bp <reliability>... rel.threshold for conserved base pairs (default: 0.9)
-setevocon_ss <reliability>... rel.threshold for conserved single stranded pos. (default: 1)
-setalpha <nr>... weighting factor for single stranded probs (default: 0.2)
-setbeta <nr>... weighting factor for thermodynamic overlap (default: 1)
Experimental parameters:
-setbetaduplex <nr>... weighting factor for thermodynamic overlap in RNA duplex folding - step 2 (default: 1)
... if the average reliability of the extended stem is larger than 'setpetcon'
-mrnanotconstr... mRNA (second sequence by convention) is not constrained in RNA duplex folding - step 2
-partprob_and... thermodynamic AND evolutionary partial structure probability have to be larger than 'setpartprob'
... by default OR
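
When PETcofold is called from a script, the command line can be assembled from the option names in the usage message above. The following Python sketch is an assumption-laden convenience, not an official wrapper: `build_command` is a hypothetical helper, the two name sets are transcribed from the usage message, and the script path is the one used in the Example section. It only builds the argument list and does not run the program.

```python
import shlex

# Names transcribed from the usage message above (without the leading dash).
KNOWN_FLAGS = {"war", "intermol", "noLP", "extstem",
               "mrnanotconstr", "partprob_and"}
KNOWN_SETTINGS = {"settree", "ppfile", "setgap", "setpetcon", "setpartprob",
                  "setevocon_bp", "setevocon_ss", "setalpha", "setbeta",
                  "setbetaduplex"}


def build_command(script, fasta1, fasta2, flags=(), settings=None):
    """Return an argument list suitable for subprocess.run(); nothing is executed."""
    args = [script, "-fasta", fasta1, "-fasta", fasta2]
    for flag in flags:
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown flag: {flag}")
        args.append(f"-{flag}")
    for name, value in (settings or {}).items():
        if name not in KNOWN_SETTINGS:
            raise ValueError(f"unknown setting: {name}")
        args += [f"-{name}", str(value)]
    return args


cmd = build_command("bin/PETcofold_3_1_2.pl",
                    "example/example1.fasta", "example/example2.fasta",
                    flags=["extstem", "noLP"], settings={"setpetcon": 0.8})
print(shlex.join(cmd))
```

Rejecting names that are not in the usage message catches typos early, before the program itself has to be started.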

Example

$ cd /PETcofold

$ bin/PETcofold_3_1_2.pl -fasta example/example1.fasta -fasta example/example2.fasta

PETfold on alignment 1:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.998503333124646
Partial structure 1: .<<............--................>>..
PETfold on alignment 2:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.980380211147457
Partial structure 2: ..<<<<<<..............--->>>>>>.
Input Files: example/example1.fasta example/example2.fasta
Common Identifiers: gca_bovine, gca_chicken, gca_mouse, gca_rat,
PETfold RNA sec.struct.:((((((....((...--.))..........)))))). ..((((((..............---)))))).
PETcofold RNA sec.struct.:.{{......(((...--.))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]]....---}}}}}}.
Score_{model,structure}(tree,alignment) = 36.322593231148
Reliability_{model,structure}(tree,alignment) = 0.567540519236688
delta Reliability binding = 0.0379148081205763

$ bin/PETcofold_3_1_2.pl -fasta example/example1.fasta -fasta example/example2.fasta -war

>gca_bovine
AGCCCUGUGGUGAAUUUACACGUUGAAUUGGGGGCUU&GAGGCCGGUCAAAUUCAGAUCAAU-CCGGCCA
>gca_chicken
GACUCUGUAGUGAAGU-UCAUAAUGAGUUGGGGGUCU&GAGGCCCACCAAACUCGUUUAA-AGUGGGCCA
>gca_mouse
GGUCUUAAGGUGAUA-UUCAUGUCGAAUUGGAGACUU&GGCGUUGGGCAAACUCGAAAAAU-CCCAACGU
>gca_rat
AGCCUUAAGGUGAUU-AUCAUGUCGAAUUGAGGGCUU&GGGGUUGGGCAAACUCGAAAAUCUACCAACUA
>structure
.{{......(((......))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]].......}}}}}}.

$ bin/PETcofold_3_1_2.pl -fasta example/example1.fasta -fasta example/example2.fasta -setpetcon 0.8 -extstem -noLP

PETfold on alignment 1:
petcon = 0.8; thermodynamic partial structure probability = 0.983905604336076; evolutionary partial structure probability = 0.440970109828062
Partial structure 1: <<<<<<.........--.............>>>>>>.
PETfold on alignment 2:
petcon = 0.8; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.980380211147457
Partial structure 2: ..<<<<<<..............--->>>>>>.
Input Files: example/example1.fasta example/example2.fasta
Common Identifiers: gca_bovine, gca_chicken, gca_mouse, gca_rat,
PETfold RNA sec.struct.:((((((....((...--.))..........)))))). ..((((((..............---)))))).
PETcofold RNA sec.struct.:{{{{{{...(((...--.))).[[[[[[[[}}}}}}.&..{{{{{{.].]]]]]]]....---}}}}}}.
Score_{model,structure}(tree,alignment) = 37.7899705234401
Reliability_{model,structure}(tree,alignment) = 0.590468289428752
delta Reliability binding = 0.0611194109769494

$ bin/PETcofold_3_1_2.pl -fasta example/example1.fasta -fasta example/example2.fasta -settree '(gca_bovine:0.9,gca_chicken:0.01,(gca_mouse:0.01,gca_rat:0.1):0.9)'

PETfold on alignment 1:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.998503333124646
Partial structure 1: .<<............--................>>..
PETfold on alignment 2:
petcon = 0.9; thermodynamic partial structure probability = 1; evolutionary partial structure probability = 0.980380211147457
Partial structure 2: ..<<<<<<..............--->>>>>>.
Input Files: example/example1.fasta example/example2.fasta
Common Identifiers: gca_bovine, gca_chicken, gca_mouse, gca_rat,
PETfold RNA sec.struct.: ((((((...(((...--.))).........)))))). ..((((((..............---)))))).
PETcofold RNA sec.struct.: .{{......(((...--.))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]]....---}}}}}}.
Score_{model,structure}(tree,alignment) = 37.1869426243743
Reliability_{model,structure}(tree,alignment) = 0.581045978505848
delta Reliability binding = 0.0406535135290625
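
The relation between score and reliability can be checked against the three runs above. The Python sketch below divides each printed score by the printed reliability; the numbers are copied from the example output. The interpretation in the last comment is an inference from these numbers, not a statement from the PETcofold documentation.

```python
# (score, reliability) pairs copied from the three example runs above.
runs = [
    (36.322593231148, 0.567540519236688),
    (37.7899705234401, 0.590468289428752),
    (37.1869426243743, 0.581045978505848),
]


def implied_length(score, reliability):
    """Length implied by 'reliability = score / alignment length'."""
    return score / reliability


lengths = [round(implied_length(s, r), 6) for s, r in runs]
print(lengths)

# Structure line of the first run; '&' separates the two alignments.
structure = ".{{......(((...--.))).[[[[[[[[...}}..&..{{{{{{.].]]]]]]]....---}}}}}}."
total_columns = len(structure.replace("&", ""))
gap_columns = structure.count("-")
# Inference (not confirmed by the text): the implied length matches the
# number of columns that are not marked '-' in the structure line.
print(total_columns, gap_columns, total_columns - gap_columns)
```

In all three runs the quotient is the same value, which equals the total number of alignment columns minus the columns shown as '-' in the structure line. This suggests that the ignored gap columns do not count towards the alignment length used for the reliability.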

References

[1] Seemann SE, Richter AS, Gorodkin J, Backofen R.
Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments.
Nucleic Acids Research, 36(20):6355-6362, 2008
[2] Knudsen B, Hein J.
Pfold: RNA secondary structure prediction using stochastic context-free grammars.
Nucleic Acids Research, 31(13):3423-3428, 2003
[3] Hofacker IL, Fontana W, Stadler PF, Bonhoeffer S, Tacker M, Schuster P.
Fast Folding and Comparison of RNA Secondary Structures.
Monatshefte f. Chemie, 125:167-188, 1994

If you find this software useful for your research, please cite the following work:
[i] Seemann SE, Richter AS, Gorodkin J, Backofen R.
Hierarchical folding of multiple sequence alignments for the prediction of structures and RNA-RNA interactions.
Algorithms Mol Biol., 5:22, 2010
[ii] Bioinf in progress

Contact

seemann@rth.dk