Comparative Sequence Analysis

The Alignment

Comparative sequence analysis for the purpose of RNA structure determination usually begins with an alignment of related sequences. In our alignment procedure, the closest relatives are aligned first on the basis of primary structure similarity; each group of aligned sequences is then treated as a unit and aligned against other groups. Sets of conserved nucleotides are then identified and used as anchors for aligning the more variable regions. Finally, where little or no primary structure similarity exists, common secondary structural elements are used as additional markers.
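
The "closest relatives first, then groups against groups" ordering can be illustrated with a minimal Python sketch. It is not the procedure used for the SRP-RNA alignment: it assumes toy sequences that already have equal length, scores similarity as simple percent identity, and only reports the order in which groups would be merged. The function names and the sequences are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_similarity(seqs, g, h):
    """Average pairwise identity between the members of two groups."""
    scores = [identity(seqs[a], seqs[b]) for a in g for b in h]
    return sum(scores) / len(scores)


def merge_order(seqs):
    """List the group merges in the order they would be made, closest first."""
    groups = [(name,) for name in seqs]  # every sequence starts as its own group
    order = []
    while len(groups) > 1:
        # Pick the two most similar groups and treat them as one unit from now on.
        g, h = max(combinations(groups, 2),
                   key=lambda pair: group_similarity(seqs, *pair))
        groups.remove(g)
        groups.remove(h)
        groups.append(g + h)
        order.append((g, h))
    return order


# Hypothetical toy sequences (assumption: equal length, no gaps).
toy = {
    "A": "GGCAUCC",
    "B": "GGCAUCU",
    "C": "AUCGGAU",
    "D": "AUCGCAC",
}

for g, h in merge_order(toy):
    print(g, "+", h)
```

In this toy case the two close pairs are joined first and the resulting groups are joined last, which mirrors the ordering described above. A real alignment would also have to insert gaps at each step, which this sketch leaves out.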

In our derivation of secondary structure, we distinguish clearly between base pairs that are supported by covariances and those that are merely not contradicted. A covariance is the observation that a base pair in one organism differs in both bases from the equivalent base pair in another organism. If both of the differing pairs are of Watson-Crick type (G-C, A-U), we call the observation a compensating base change (CBC). Covariances and CBCs support the existence of a base pair because, during evolution, a random single mutation that introduced an unstable pairing would not generally have been compensated for by a second mutation that restored stability unless the pairing was required. Such an observation is therefore positive evidence, and the more CBCs, the stronger the evidence. Negative evidence is a mismatch, which we define here as a pairing that is neither a Watson-Crick pair nor a G-U pair. Note that sequence conservation provides neither positive nor negative evidence.
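
The definitions in this paragraph can be written down as a small Python sketch. The names `classify` and `is_mismatch` are hypothetical, and representing a base pair as a tuple of two one-letter strings is an assumption made for illustration only.

```python
# Pair types as defined in the text; each pair is (5' base, 3' base).
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def is_mismatch(pair):
    """A mismatch is neither a Watson-Crick pair nor a G-U pair."""
    return pair not in WATSON_CRICK and pair not in WOBBLE


def classify(pair_a, pair_b):
    """Compare the equivalent base pair in two organisms."""
    both_differ = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
    if not both_differ:
        # Identical pairs or a single-base change: not a covariance.
        return "no covariance"
    if pair_a in WATSON_CRICK and pair_b in WATSON_CRICK:
        # Both bases changed and both pairs are Watson-Crick.
        return "CBC"
    return "covariance"


print(classify(("G", "C"), ("A", "U")))  # both bases differ, both Watson-Crick
print(classify(("G", "C"), ("G", "U")))  # only one base differs
print(classify(("G", "U"), ("U", "G")))  # both bases differ, not Watson-Crick
print(is_mismatch(("A", "C")))
```

Note that two identical pairs fall under "no covariance", in line with the remark that sequence conservation counts as neither positive nor negative evidence.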

For each base pair we estimate positive and negative evidence by counting the number of CBCs and mismatches. The set of sequences that align unambiguously at a given alignment position is first identified; the most conserved base pair in this set is then found, and the remaining pairs are added as CBCs where they covary. Our guideline is to consider a base pair supported if there is at least twice as much positive evidence as negative. As a general rule, when there is less, we prefer not to include the base pair. However, when a base pair is supported in one primary kingdom (also termed 'domain') and disproven in others, we include it as specific for that group.
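
The counting rule can be sketched in Python as follows. Several details are assumptions, because the text does not fix them: each distinct pair type is counted once (rather than once per sequence), the most conserved pair is taken to be the most frequent one, and a base pair with no positive evidence at all is treated as unsupported. The function names and the example column are hypothetical.

```python
from collections import Counter

WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_evidence(pairs):
    """Return (cbcs, mismatches) for the pairs observed at one position.

    `pairs` holds one (5' base, 3' base) tuple per unambiguously
    aligned sequence.
    """
    counts = Counter(pairs)
    reference, _ = counts.most_common(1)[0]  # the most conserved pair
    cbcs = 0
    mismatches = 0
    for pair in counts:  # assumption: each distinct pair type counts once
        if pair not in WATSON_CRICK and pair not in WOBBLE:
            mismatches += 1
        elif (pair in WATSON_CRICK and reference in WATSON_CRICK
              and pair[0] != reference[0] and pair[1] != reference[1]):
            cbcs += 1
    return cbcs, mismatches


def is_supported(cbcs, mismatches):
    """Guideline: at least twice as much positive evidence as negative."""
    # Assumption: zero CBCs means "not contradicted", not "supported".
    return cbcs > 0 and cbcs >= 2 * mismatches


# Hypothetical column: nine sequences observed at one base-pair position.
column = ([("G", "C")] * 5 + [("A", "U")] * 2
          + [("C", "G")] + [("A", "C")])
cbcs, mismatches = count_evidence(column)
print(cbcs, mismatches, is_supported(cbcs, mismatches))
```

In this example G-C is the reference pair, A-U and C-G each count as a CBC, and A-C counts as a mismatch, so the pair just meets the two-to-one guideline.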

Secondary Structures

The secondary structure models are derived directly from the alignment. (Note, however, that alignments are also extremely useful for proving or disproving tertiary foldings, or even interactions between more than one RNA molecule.) Supported base pairs are juxtaposed and connected with a line; bases of unsupported pairings at helical ends are placed adjacent to each other with no line between them. Finally, when there is more negative than positive evidence, the base symbols are spaced apart with no line between them. In addition to the secondary structure diagrams, Watson-Crick and G-U pairs of supported helices in all SRP-RNAs are shown in reverse print in the alignment.

This text is modified from: Larsen N. and Zwieb C., SRP-RNA Sequence Alignment and Secondary Structure, Nucleic Acids Research, Vol. 19, No. 2, 209-215 (1990).