Protein Sequence Logos using Relative Entropy
The sequence part of the RNA structure logos is here applied to protein
sequence logos. The standard sequence logo of Schneider and Stephens has been
extended to cope with any prior amino acid distribution and to allow for gaps
in the (multiple) alignments of protein
sequences. The total height of the sequence information part is computed as
the relative entropy between the observed fractions of a given symbol and the
respective a priori probabilities, with the constraint that the a priori
``probability'' of the gap is always one, while the a priori probabilities
for the amino acids sum to one. Because the gap term, f log(f/1) = f log f,
can never be positive, this may lead to negative ``information'' at a
position where sufficiently many gaps are present. The height of each
symbol can be displayed in two ways: as a ``type 1 logo'', where the height
is proportional to the symbol's observed frequency, or as a ``type 2 logo'',
where the height is proportional to the ratio of the observed frequency to
the expected (a priori) frequency. In both cases, a symbol that appears less
often than expected is displayed upside down. You can get the
script here, or you can ``click in'' your alignment below. If you use these
logos, please cite:
T. D. Schneider and R. M. Stephens. Sequence logos: a new way to
display consensus sequences. Nucleic Acids Research, Vol. 18, No. 20,
pp. 6097-6100, 1990. (Also check out Tom Schneider's page.)
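The height computation described above can be sketched in Python. This is a minimal illustration, not the script distributed here: the gap character `-`, the use of base-2 logarithms (bits), the names `relative_entropy` and `symbol_heights`, the dictionary form of the background, and the rule for splitting the total height among the drawn symbols are all assumptions of the sketch.

```python
import math
from collections import Counter

GAP = "-"  # assumption: gaps are written as '-' in the alignment


def relative_entropy(column, background):
    """Total stack height (bits) for one alignment column.

    Sums f_s * log2(f_s / p_s) over every symbol s observed in the column.
    The prior of the gap is fixed to 1; the amino acid priors p_s come from
    `background` (a dict that should sum to 1 over the amino acids).
    """
    n = len(column)
    total = 0.0
    for sym, count in Counter(column).items():
        f = count / n
        p = 1.0 if sym == GAP else background.get(sym, 0.0)
        if p == 0.0:
            # f > 0 with p = 0 would make the relative entropy infinite
            raise ValueError(f"symbol {sym!r} observed but has zero prior")
        total += f * math.log2(f / p)
    return total


def symbol_heights(column, background, logo_type=1):
    """Split the total height among the observed amino acids (gaps not drawn).

    Returns {symbol: (height, upside_down)}. Type 1 weights each symbol by its
    frequency f_s; type 2 by the ratio f_s / p_s. Scaling the weights so they
    sum to the total height is an assumption of this sketch.
    """
    n = len(column)
    total = relative_entropy(column, background)
    freqs = {s: c / n for s, c in Counter(column).items() if s != GAP}
    if logo_type == 1:
        weights = dict(freqs)
    elif logo_type == 2:
        weights = {s: f / background[s] for s, f in freqs.items()}
    else:
        raise ValueError("logo_type must be 1 or 2")
    norm = sum(weights.values())
    # A symbol is flagged upside down when it appears less often than expected.
    return {
        s: (total * w / norm, freqs[s] < background[s])
        for s, w in weights.items()
    }
```

With a toy two-letter background `{"A": 0.5, "C": 0.5}`, the column `AA--` gets a total height of -0.5 bits: the A term is zero and the gap term is 0.5 log2 0.5, which is the negative ``information'' mentioned above.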
You can also ``click in'' your multiple protein alignment below; the final
logo can then be downloaded in PostScript. An example of the data format is
shown here. Comments and bug reports are welcome; please send them to
webmaster@rth.dk.
The a priori probabilities for the amino acids must be greater than or equal
to zero. A single line of probabilities causes the same background
distribution to be used throughout the alignment. Alternatively, enter as many
lines as there are positions in the alignment, giving a position-wise
background distribution of amino acids.
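Reading the background distribution, either a single line reused everywhere or one line per alignment position, might look like the sketch below. It assumes whitespace-separated values in a hypothetical column order `AMINO_ACIDS`; the function name `background_per_position` and the sum-to-one check with tolerance `tol` are illustrative assumptions, not the tool's actual input handling.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: order of values on each line


def background_per_position(lines, n_positions, tol=1e-6):
    """Return one {amino_acid: prior} dict per alignment position.

    One line         -> the same distribution is used at every position.
    n_positions lines -> a position-wise background distribution.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        values = [float(x) for x in line.split()]
        if len(values) != len(AMINO_ACIDS):
            raise ValueError(f"expected {len(AMINO_ACIDS)} values, got {len(values)}")
        if any(v < 0 for v in values):
            raise ValueError("a priori probabilities must be >= 0")
        # assumption: reject rather than renormalise priors that do not sum to one
        if abs(sum(values) - 1.0) > tol:
            raise ValueError("amino acid priors should sum to one")
        rows.append(dict(zip(AMINO_ACIDS, values)))
    if len(rows) == 1:
        # the same (read-only) dict is shared by every position
        return rows * n_positions
    if len(rows) == n_positions:
        return rows
    raise ValueError("give either one line or one line per alignment position")
```

Sharing a single dict across positions keeps the one-line case cheap; a caller that mutates the priors would need to copy them first.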