Align2Symvec version 1.1.

by
 Jan Gorodkin
 Center for Biological Sequence Analysis
 The Technical University of Denmark
 B206, DK-2800 Lyngby
 Denmark
 gorodkin@cbs.dtu.dk



INTRODUCTION:
-------------
This nawk script generates a "symvec" file, which can be used to generate
a sequence-structure logo with Tom Schneider's program
(see http://www-lmmb.ncifcrf.gov/~toms/).
Verify the nawk path by typing "which nawk" on the command line.


For details see the page http://www.cbs.dtu.dk/gorodkin/appl/slogo
and paper (1) listed below.



ARGUMENTS:
----------
An example of how to execute the align2symvec.awk script:

./align2symvec.awk Itype=2 startpos=1 sl=1 ntdistfile=ntdist1 pAU=1 pCG=1 pGU=0.8 TorUs=N alignfile > symvecfile

where the arguments are:

Itype (=1 or =2):       Use a type 1 or a type 2 logo when computing the height
                        of the symbols.

sl (=0 or =1):          Make a sequence logo (=0) or a structure logo (=1).

startpos (integer):     The number used to label the first position of the
                        alignment (e.g. -10).

ntdistfile:             The file containing the a priori distribution of
                        nucleotides. It should consist of either one line or
                        as many lines as there are positions in the alignment.
                        Each line should consist of four numbers (representing
                        prob(A) prob(C) prob(G) prob(U)) that sum to one.
                        With one line, the same background distribution is
                        used for the entire alignment; with one line per
                        position, each line gives the a priori distribution
                        of nucleotides for that position.
                        (See the files ntdist1 and ntdist2 as examples.)

pAU:                    The weight (or strength) given to AU base pairs.
pCG:                    The weight (or strength) given to CG base pairs.
pGU:                    The weight (or strength) given to GU base pairs.

TorUs:                  If TorUs=Y display T's rather than U's. If TorUs=N
                        display U's. Default is N.

alignfile:              The file containing the data (the alignment). Note that
                        the format requires the first line to assign secondary
                        structure. (If you are only interested in a sequence
                        logo, let this line consist of dots only.)
                        (Data is from Tuerk et al. 1992.)
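
The ntdistfile constraints described above (four numbers per line, summing to
one, with either one line or one line per alignment position) can be checked
before running the script. The following is a minimal Python sketch, not part
of this distribution. The function name parse_ntdist, the tolerance, the
skipping of blank lines, and the assumption that the numbers are separated by
whitespace are all illustrative choices and are not taken from
align2symvec.awk.

```python
import math


def parse_ntdist(text, n_positions=None, tol=1e-6):
    """Parse ntdistfile contents into a list of (pA, pC, pG, pU) tuples.

    Hypothetical helper: it only checks the rules stated in this README.
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split()  # assumption: whitespace-separated numbers
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 numbers, got {len(fields)}")
        probs = tuple(float(f) for f in fields)
        if any(p < 0 for p in probs):
            raise ValueError(f"line {lineno}: negative probability")
        if not math.isclose(sum(probs), 1.0, abs_tol=tol):
            raise ValueError(f"line {lineno}: probabilities sum to {sum(probs)}")
        rows.append(probs)
    if not rows:
        raise ValueError("no distribution lines found")
    # Either one background line, or one line per alignment position.
    if n_positions is not None and len(rows) not in (1, n_positions):
        raise ValueError(f"expected 1 or {n_positions} lines, got {len(rows)}")
    return rows
```

For example, parse_ntdist(open("ntdist1").read()) would return the list of
distributions or raise ValueError naming the first offending line.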



This script also generates a file "align.info", which contains the information
content of each position and the total information of the alignment.
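
To give a feel for what a per-position information content can look like, the
following Python sketch computes the relative entropy (in bits) of one
alignment column against an a priori distribution given in the order
prob(A) prob(C) prob(G) prob(U). This is an illustration under stated
assumptions, not the computation performed by align2symvec.awk: the function
name column_information is hypothetical, gaps and other symbols are simply
ignored, T is counted as U, and the base pair weights and the Itype-specific
symbol heights are not modelled. See paper (1) for the actual definitions.

```python
import math
from collections import Counter


def column_information(column, background):
    """Relative entropy (bits) of one column versus a background distribution.

    column:     string with one symbol per sequence, e.g. "AAGA-"
    background: (pA, pC, pG, pU), assumed to sum to one
    """
    # Assumption: T counts as U; anything that is not A, C, G or U is skipped.
    counts = Counter(ch for ch in column.upper().replace("T", "U") if ch in "ACGU")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an all-gap column carries no information
    info = 0.0
    for nt, p in zip("ACGU", background):
        q = counts[nt] / total  # observed frequency of this nucleotide
        if q > 0:
            if p <= 0:
                raise ValueError(f"observed {nt} but its a priori probability is {p}")
            info += q * math.log2(q / p)
    return info
```

With a uniform background, a fully conserved column gives 2 bits and a column
with all four nucleotides in equal proportion gives 0 bits.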


Once you have generated a logo.ps file using Tom Schneider's program, you
may insert the structure field assignment using the script logo2structlogo.awk.
Example:
logo2structlogo.awk alignfile logo.ps > alignfile.logo.ps





CITATION:
---------
If you use this script, please cite the papers:

(1)  J. Gorodkin, L. J. Heyer, S. Brunak and G. D. Stormo.
     Displaying the information contents of structural RNA alignments:
     the structure logos.
     Comput. Appl. Biosci., Vol. 13, No. 6, pp. 583-586, 1997.

(2)  T. D. Schneider and R. M. Stephens.
     Sequence logos: a new way to display consensus sequences.
     Nucleic Acids Research, Vol. 18, No. 20, pp. 6097-6100, 1990.








