RNAcop

About

RNA Context Optimization by Probability (RNAcop) is a tool for optimizing the lengths of the flanking regions upstream and downstream of a constrained structure with respect to the probability of folding into that structure. The structure motif is defined by constraints in the form of a dot-bracket string. Starting from the first constrained position at the 5′ end and the last constrained position at the 3′ end, RNAcop progressively adds nucleotides to both flanking regions and determines the subsequence with the highest probability of forming the constrained structure. Flanking regions are extended along the input sequence, and the constrained structure is assumed to lie within this input sequence. Minimum and maximum lengths can be specified separately for the 5′ and the 3′ flanking region.
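The enumeration of flanking region combinations described above can be sketched as follows. This is a minimal illustration, not RNAcop's actual code: the function names (core_bounds, flank_windows) are hypothetical, and it assumes that every character other than '.' in the dot-bracket string counts as a constraint.

```python
from typing import Iterator, Optional, Tuple


def core_bounds(constraint: str) -> Tuple[int, int]:
    """Return 0-based indices of the first and last constrained position.

    Assumption: any character other than '.' is treated as a constraint.
    """
    constrained = [i for i, c in enumerate(constraint) if c != "."]
    if not constrained:
        raise ValueError("constraint string contains no constrained position")
    return constrained[0], constrained[-1]


def flank_windows(
    seq: str,
    constraint: str,
    min5: int = 0,
    max5: Optional[int] = None,
    min3: int = 0,
    max3: Optional[int] = None,
) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (flank5, flank3, subsequence, subconstraint) for every
    pairwise combination of allowed 5' and 3' flanking region lengths."""
    if len(seq) != len(constraint):
        raise ValueError("sequence and constraint must have equal length")
    first, last = core_bounds(constraint)
    # Flanks cannot extend beyond the ends of the input sequence.
    avail5 = first
    avail3 = len(seq) - 1 - last
    max5 = avail5 if max5 is None else min(max5, avail5)
    max3 = avail3 if max3 is None else min(max3, avail3)
    for flank5 in range(min5, max5 + 1):
        for flank3 in range(min3, max3 + 1):
            start = first - flank5
            end = last + 1 + flank3
            yield flank5, flank3, seq[start:end], constraint[start:end]
```

Each yielded subsequence would then be scored by the probability of forming the constrained structure; the scoring itself is not part of this sketch.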

The probability of observing the structure motif is calculated by constrained folding as implemented in the ViennaRNA package. More precisely, the partition function over all secondary structures satisfying the constraints is compared to the partition function over all possible secondary structures without constraints. The optimal probability is obtained using dynamic programming, i.e. two function calls are evaluated, one for constrained folding and one for folding without constraints. RNAcop prints the optimal sequence, suggested alternatives, and the probability of observing the structure for all pairwise combinations of flanking region extensions. The probability of observing a structure S given a subsequence x is determined using the Boltzmann distribution of two ensembles of structures:

P(S|x) = e^{-ΔΔG / (RT)} = Z_{constrained} / Z_{no constraint}

where ΔΔG = ΔG_{constrained} - ΔG_{no constraint} is the difference between the free energies associated with the two partition functions Z_{constrained} and Z_{no constraint}, respectively, ΔG = -RT ln Z, R is the gas constant, and T is the absolute temperature.
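As a worked illustration of the formula, the following sketch converts two ensemble free energies into a probability. The function names are hypothetical, and the sketch assumes free energies in kcal/mol and a default temperature of 37 °C; it does not call ViennaRNA, which would supply the two free energies in practice.

```python
import math

GAS_CONSTANT = 1.98717e-3  # gas constant R in kcal/(mol*K)


def motif_probability(
    dg_constrained: float,
    dg_unconstrained: float,
    temperature_c: float = 37.0,
) -> float:
    """P(S|x) = exp(-(dG_constrained - dG_unconstrained) / (R*T)).

    Both arguments are ensemble free energies (dG = -RT ln Z) in kcal/mol.
    Because the constrained ensemble is a subset of the unconstrained one,
    dg_constrained >= dg_unconstrained is expected, so the result is <= 1.
    """
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    return math.exp(-(dg_constrained - dg_unconstrained) / rt)


def log10_motif_probability(
    dg_constrained: float,
    dg_unconstrained: float,
    temperature_c: float = 37.0,
) -> float:
    """Same quantity on a log10 scale, computed without exponentiating
    so that very small probabilities do not underflow."""
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    return -(dg_constrained - dg_unconstrained) / (rt * math.log(10.0))
```

For example, a constrained ensemble that is 2 kcal/mol less favorable than the unconstrained one at 37 °C corresponds to a probability of roughly 4%.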

The RNAcop webserver displays all pairwise combinations of flanking region lengths as log₁₀ probability landscape plots. In addition, flanking regions are suggested based on the given minimum and maximum flanking region sizes and a heuristic that compares candidate lengths to the maximum probability inside the user-specified window of allowed flanking region sizes. The heuristic selects flanking regions whose probability differs from this maximum by less than a specified log₁₀ fold-change. Among these, it prefers choices that leave some tolerance in both the 5′ and the 3′ direction, i.e. choices whose flanking region lengths can vary while the log₁₀ fold-change stays below the specified threshold. For this purpose, large square areas of the landscape that satisfy the log₁₀ fold-change criterion are identified.
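One way to find such a square area is the classic maximal-square dynamic program, sketched below. How RNAcop identifies and ranks squares is not described here, so this is only an assumed stand-in: the function name is hypothetical, the landscape is taken to be a rectangular list of log₁₀ probabilities restricted to the user-specified window (rows for 5′ flank lengths, columns for 3′ flank lengths), and only a single largest square is returned.

```python
from typing import List, Tuple


def largest_tolerant_square(
    log10_p: List[List[float]],
    max_log10_drop: float,
) -> Tuple[int, Tuple[int, int]]:
    """Find the largest square block of cells whose log10 probability is
    within max_log10_drop of the landscape maximum.

    Returns (side_length, (top_row, left_col)); indices are relative to
    the window, i.e. offsets from the minimum flank lengths.
    """
    best = max(max(row) for row in log10_p)
    n_rows, n_cols = len(log10_p), len(log10_p[0])
    # side[i][j]: side of the largest qualifying square whose
    # bottom-right corner is the cell (i, j).
    side = [[0] * n_cols for _ in range(n_rows)]
    best_side, best_corner = 0, (0, 0)
    for i in range(n_rows):
        for j in range(n_cols):
            if best - log10_p[i][j] < max_log10_drop:
                if i == 0 or j == 0:
                    side[i][j] = 1
                else:
                    side[i][j] = 1 + min(
                        side[i - 1][j], side[i][j - 1], side[i - 1][j - 1]
                    )
                if side[i][j] > best_side:
                    best_side = side[i][j]
                    best_corner = (i - best_side + 1, j - best_side + 1)
    return best_side, best_corner
```

Any flanking region pair inside the returned square can be shifted by a few nucleotides in either direction without leaving the accepted fold-change range, which is the tolerance the heuristic aims for.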