Foldalign: RNA Structure and Sequence Alignment

Contents

Web-server specific

  1. Data format Information about maximum sequence lengths, and nucleotide types.
  2. Parameters Description of the web server parameters
  3. Web-server output examples
    1. Scan
    2. Local
    3. Global
    4. Long scan. The output from the scan of two long sequences containing a long motif. Both the motifs and the sequences are constructed. Please see Sundfeld et al. Bioinformatics, 2015.

    Web-server and Command-line

  4. P-value Description of the P-value and its parameters.
  5. NS score Description of the No single strand Substitution score.
  6. FOLDALIGN output file format Description of the FOLDALIGN output file format.
    1. Header section
    2. Sequence section
    3. Local alignment scores
  7. Score matrix file format Description of the FOLDALIGN score matrix file format.
    1. Score matrix Examples


    1. Data format
      Data should be in FASTA format. If more than two sequences are submitted, only the first two sequences will be processed.

      The only nucleotides allowed are A, C, G, T, U, N, -.
      The N and - are treated as gaps. In the output T is printed as U.
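
The input rules above can be checked before submitting. The following is a minimal sketch, not the server's own code: the function names are hypothetical, and upper-casing the input and showing N as "-" are assumptions made for illustration.

```python
ALLOWED = set("ACGTUN-")  # the only nucleotides allowed by the server


def read_first_two(fasta_text):
    """Return (name, sequence) pairs for at most the first two FASTA records."""
    records = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if len(records) == 2:
                break  # only the first two sequences are processed
            records.append([line[1:], ""])
        elif records:
            # Assumption: lower-case input is treated like upper-case.
            records[-1][1] += line.upper()
    return [(name, seq) for name, seq in records]


def normalize(seq):
    """Reject unknown characters, write T as U, and show N as the gap symbol."""
    bad = set(seq) - ALLOWED
    if bad:
        raise ValueError(f"nucleotides not allowed: {sorted(bad)}")
    return seq.replace("T", "U").replace("N", "-")
```

For example, normalize("ACGTNN") returns "ACGU--", and a third record in the input is silently ignored by read_first_two.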

      FASTA format for scanning

      Search for multiple local hits. The maximum sequence length is 10,000 and the maximum motif length (lambda) is 1,000.
      >AE000516/4118797-4118870
      GGCUCGAUCCGGCCGGGCCGGCUGCGGCGCUGGUGGAAUUCGAGCGCUCCUGUCGAUGGA
      CGUGACGGUCGGACCUGCGGUUUGGCUAGUCAACGGUCCGGUGCGAUAGGCUGUCGUGGC
      UUCAAGCGGGGUGUGGCGCAGCUUGGUAGCGCGCUUCGUUCGGGACGAAGAGGCCGUGGG
      UUCAAAUCCCGCCACCCCGACCGAGAGAUCGCUGACGACAGCCUUACCCGGCGCAGCGUG
      GUAGCUUGCUGCAGUCUGCUCGGGCGGCAGCGCCACCCUGACGGUGCUGGUUGACCAUGC
      CGGACAGCACGUCAACGCACAGGCAUUUCCAACGGAAGUUGUAGGUUACCGGCCGCCCUA
      AAACACGGUGCACUUUUCGUUAAAGGUUGUGGGUGUGGAUCCAACGAAAUUCGUUGCCCC
      GGCGUGGGCAGCGCCGUGUCCACAGGGGGACCCGCCGCGCAUUACGCCUAUGGGCCCACC
      CCCGUACCGCGGGAGUUGGC
      >AE000051/8942-9018
      UAAAUUUAAGUAUUAAGUAUUAAUUAAACGGGAAAGGAAUAGGACGAUUUACCAGGGGUU
      AAACCUAAUCGGAACCGACCGGUACUAAAGUAGUCUUUUUGUGAAUUCCUUCUUUUAUAG
      CUAUAACUGCUGCCAUUGCCGUUGGCAUUAUUAGCGUUAGCAGCUUCGCUGAUGUCUAAC
      GCGGUAUUAAUUAACGUACUAAAGGCGUUUACAAUCAUAAUUACUCCCGCACUGAAGCUU
      CAAAAUGAUAAGCCUCCAACUACUUGUUUAGCUUCUGAUUUUUUUAAUCUCUGCAUUAUU
      GUGUCUCCAUUUAAAUGAAUUACAUAACUAAAGUAAAAUGCCAAAGUUUUAUAAUUAACU
      UAACUGUCAUCAUAGCUCAAUAGGACAGAGUAUCAGCUUGCGGAGCUGAGGGUUACAGGU
      UCGAUUCCUGUUGGUGACGCCAUAAUACUUUCUAACCUACCAGUGUUACCCUGGUAGGUU
      UUUUAUUUGCUCCGUUGGCU
      

      FASTA format for local and global alignment

      Local or global alignment of the sequences. The maximum sequence length is 1,000. The motif can be as long as the sequences. For global alignment, the delta parameter has to be larger than or equal to the length difference between the sequences.
      >AE000516/4118797-4118870
      CGGGGUGUGGCGCAGCUUGGUAGCGCGCUUCGUUCGGGACGAAGAGGCCGUGGGUUCAAA
      UCCCGCCACCCCGA
      >AE000051/8942-9018
      GUCAUCAUAGCUCAAUAGGACAGAGUAUCAGCUUGCGGAGCUGAGGGUUACAGGUUCGAU
      UCCUGUUGGUGACGCCA
      


    2. Parameters
      Type of comparison. Choices: Scan, Local, or Global
      These are the three different server types.
      Scan makes a local foldalignment between the two input sequences and reports a ranked list of local foldalignment hits. The input sequences can be long, up to 10,000 nucleotides, but the length of the foldalignments is limited to Maximum motif length (lambda) nucleotides. A scan can take from a few minutes to a few days depending on the length of the sequences, the GC-content of the sequences, and the parameter settings. An example of the output can be seen here.
      Local makes a local foldalignment. It can be as long as the input sequences. Only one foldalignment is reported. The input sequences are limited to 1,000 nucleotides. For short sequences a local foldalignment takes seconds or minutes. For long sequences it can take several hours. An example of the output can be seen here.
      Global makes a global foldalignment. The sequences are foldaligned from end to end. The length of the input sequences is limited to 1,000 nucleotides. The maximum length difference delta parameter must be larger than or equal to the length difference between the sequences. For short sequences a global foldalignment takes seconds or minutes. For long sequences it can take several hours. An example of the output can be seen here.

      Email
      If an email address is specified, an email will be sent to the address when the alignment is done. The email will contain a link to the results page. The email address will not be used for anything else. Note: If your sequences are not in FASTA format, you will not receive an email about the error. This is due to spammers.

      Comment/ID
      This field can be used for marking and/or tracking different submissions.

      Maximum length difference (delta). Ranges: Scan 1-15. Local and global 1-25.
      An unconstrained version of FOLDALIGN takes a very long time to run. It also demands a lot of memory. To lower the time and memory requirements of the algorithm, delta limits the length difference between the subsequences being compared. For example, this means that the alignment of a ten nucleotide subsequence and a 60 nucleotide subsequence will not be considered. For structures that are similar this is not a problem, but if there are long inserts in one of the sequences, then the correct structure might not be found.
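
The effect of delta can be sketched in a few lines. This is an illustration of the constraint as described above, with hypothetical function names; it is not taken from the FOLDALIGN source.

```python
def within_delta(len_a, len_b, delta):
    """True if two subsequences of these lengths may be compared under delta."""
    return abs(len_a - len_b) <= delta


def smallest_delta_for_global(seq_a, seq_b):
    """Smallest delta that still allows an end-to-end (global) foldalignment."""
    return abs(len(seq_a) - len(seq_b))
```

With the largest web-server value, within_delta(10, 60, 25) is False, which is the ten versus 60 nucleotide case mentioned above. For the two short example sequences in the Data format section (74 and 77 nucleotides) a delta of at least 3 is needed for a global foldalignment.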

      Gap opening cost. Range: -1000 -> 0
      This is the cost of opening a new gap. We have found that a gap opening cost of around -50 gives good results when scanning. For local and global foldalignments the gap opening cost is dependent on the RNA type. If it is not known what kind of RNA the sequences contain, then try several values. The range we have found useful is -10 -> -100.

      Gap elongation cost. Range: -1000 -> 0
      This is the cost of elongating an already started gap. We usually fix this value at half the gap opening cost.
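
As a rough illustration of how the two gap costs combine, the sketch below scores a single gap of a given length. It assumes the usual affine scheme in which the first gap position is charged the opening cost and every further position the elongation cost; the text above does not state exactly how FOLDALIGN charges the first position, so treat this as an assumption. The function name is hypothetical.

```python
def gap_score(length, gap_open=-50, gap_elongation=None):
    """Score contribution of one gap of `length` positions (affine sketch)."""
    if gap_elongation is None:
        # Rule of thumb from the text: half the gap opening cost.
        gap_elongation = gap_open / 2
    if length <= 0:
        return 0
    # Assumption: opening cost once, elongation cost for each extra position.
    return gap_open + (length - 1) * gap_elongation
```

Under these assumptions a gap of one position costs -50 and a gap of three positions costs -100 with the scanning defaults mentioned above.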

      Maximum motif length (lambda). Range: 1 - 1,000. Only affects scan type comparisons.
      This is the maximum length of a foldalignment not counting gaps. In theory, this parameter makes it possible to scan sequences of any biologically relevant length on an ordinary desktop computer. To limit the resources needed by the server, however, the sequence length has been limited. The time needed to run FOLDALIGN is greatly affected by the value of lambda. Unfortunately, lambda and Lambda are two completely different things: lambda is the maximum motif length, while Lambda is a parameter of the extreme value distribution used for the P-value. They should not be mistaken for each other.

      Maximum number of structures. Range: 1 - 10. Only affects scan type comparisons.
      This is the maximum number of structures reported by the server. For each structure, FOLDALIGN has to realign and backtrack the foldalignment. The positions of these alignments are drawn as bars on the Z-score plot.


    3. Web-server output examples
      Examples of the output from the three different types of comparisons. The sequences used are those from the Data format section. The parameters used are the default parameters. The archive file available for download on the result page contains, among other files, a file named index.html, which is a copy of the result page.
      1. Scan
      2. Local
      3. Global

    4. P-value
      The P-value is an estimate of the probability that a foldalignment with a given score would be found by chance.
      The P-value depends on Lambda and K, which are the parameters of the extreme value distribution. Please note that this Lambda is not the same as the FOLDALIGN lambda. Lambda and K are reported on the web page. Do not trust the P-value too much.
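
To give a feeling for how Lambda and K shape the P-value, the sketch below evaluates the upper tail of a generic Gumbel-type extreme value distribution. The exact parameterisation used by FOLDALIGN (for example whether the sequence lengths enter the formula) is not given here, so both the formula and the numbers in the usage note are assumptions for illustration only.

```python
import math


def evd_p_value(score, lam, k):
    """P(S >= score) for a Gumbel-type tail, 1 - exp(-K * exp(-Lambda * score)).

    Assumed generic form; not necessarily the expression FOLDALIGN evaluates.
    """
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-k * math.exp(-lam * score))
```

With hypothetical values such as lam=0.01 and k=0.5, a higher score always gives a smaller P-value, and the result stays between 0 and 1.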


    5. NS score
      The No single strand Substitution score is the standard FOLDALIGN score minus the sequence similarity score for the single stranded regions of the structure.

    6. FOLDALIGN output file format
      Contents

      1. Header section
      2. Sequence section
      3. Local alignment scores

      FOLDALIGN produces output in col format. The format aims at keeping information about data, parameters, and results in one file. At the same time the data has to be in a format which is easy to work with.

      The default FOLDALIGN output has three sections. The first part, the header, holds some general information and the structure of the best foldalignment found. The second section holds general information and information about the first sequence. The third section is similar to the second, but contains information about the second sequence.

      1. The header section
        A typical standard header looks something like this:
        ; FOLDALIGN           2.5.0
        ; REFERENCE           JH. Havgaard, R. Lyngsø, GD. Stormo, J. Gorodkin
        ; REFERENCE           Pairwise local structural alignment of RNA sequences
        ; REFERENCE           with sequence similarity less than 40%
        ; REFERENCE           In press Bioinformatics 2005
        ; ALIGNMENT_ID         Structure 1
        ; ALIGNING            V00158/694-623 against AC069454/82192-82263
        ; ALIGN               V00158/694-62
        ; ALIGN               AC069454/8219
        ; ALIGN               Score: 561
        ; ALIGN               Identity: 39 % ( 28 / 72 )
        ; ALIGN               Begin
        ; ALIGN
        ; ALIGN               V00158/694-62 GCAGAUGUAG CUCAGUGG-U AGAGCGCAAC CUUGCCAAGG
        ; ALIGN               Structure     (((((((..( (((....... .)))).(((( (.......))
        ; ALIGN               AC069454/8219 GGUCCCAUGG UGUAAUGGUU AGCACUCUGG ACUUUGAAUC
        ; ALIGN
        ; ALIGN               V00158/694-62 UUGAUGCCAU GGGUUCGAGU CCCAUUAUCU GC
        ; ALIGN               Structure     ))).....(( (((....... )))))))))) ))
        ; ALIGN               AC069454/8219 CAG-CGAUCC GAGUUCAAAU CUCGGUGGGA CC
        ; ALIGN
        ; ALIGN               End
        ; ==============================================================================
        

        ; FOLDALIGN           2.5.0
        This field indicates which version of FOLDALIGN was used to produce the file.
        ; REFERENCE
        These fields contain citation information.
        ; ALIGNMENT_ID
        Holds a comment either set by the user or the web server. The default value is n.a.
        ; ALIGNING            V00158/694-623 against AC069454/82192-82263
        The name of sequence one against name of sequence two
        ; ALIGN               V00158/694-62
        Name and comment for the first sequence. The comment field is not available in the web server.
        ; ALIGN               AC069454/8219
        Name and comment for the second sequence. The comment field is not available in the web server.
        ; ALIGN               Score: 561
        The score of the best foldalignment.
        ; ALIGN               Identity: 39 % ( 28 / 72 )
        The sequence identity of the alignment. There are 28 identical nucleotides out of 72.
        ; ALIGN               Begin
        ; ALIGN
        ; ALIGN               V00158/694-62 GCAGAUGUAG CUCAGUGG-U AGAGCGCAAC CUUGCCAAGG
        ; ALIGN               Structure     (((((((..( (((....... .)))).(((( (.......))
        ; ALIGN               AC069454/8219 GGUCCCAUGG UGUAAUGGUU AGCACUCUGG ACUUUGAAUC
        ; ALIGN
        ; ALIGN               V00158/694-62 UUGAUGCCAU GGGUUCGAGU CCCAUUAUCU GC
        ; ALIGN               Structure     ))).....(( (((....... )))))))))) ))
        ; ALIGN               AC069454/8219 CAG-CGAUCC GAGUUCAAAU CUCGGUGGGA CC
        ; ALIGN
        ; ALIGN               End
        
        This shows the alignment and structure of the best local alignment between the two sequences.
        ; ==============================================================================
        A separation line.

      2. A sequence section
        The sequence section has two parts: the information part and the sequence part. A typical sequence section:

        ; TYPE                RNA
        ; COL 1               label
        ; COL 2               residue
        ; COL 3               seqpos
        ; COL 4               alignpos
        ; COL 5               align_bp
        ; COL 6               seqpos_bp
        ; ENTRY               V00158/694-623
        ; ALIGNMENT_ID         Structure 1
        ; ALIGNMENT_LIST      V00158/694-623 AC069454/82192-82263
        ; FOLDALIGN_SCORE     561
        ; GROUP               1
        ; FILENAME            data.fasta
        ; START_POSITION      1
        ; END_POSITION        71
        ; ALIGNMENT_SIZE      2
        ; ALIGNMENT_LENGTH    72
        ; SEQUENCE_LENGTH     72
        ; PARAMETER           max_length=71
        ; PARAMETER           max_diff=15
        ; PARAMETER           min_loop=3
        ; PARAMETER           score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
        ; PARAMETER           nobranching=<false>
        ; PARAMETER           global=<false>
        ; ------------------------------------------------------------------------------
        N         G          1         1        72        71
        N         C          2         2        71        70
        N         A          3         3        70        69
        .
        .
        G         -          .        19         .         .
        .
        .
        N         U         69        70         3         3
        N         G         70        71         2         2
        N         C         71        72         1         1
        ; ******************************************************************************
        

        ; TYPE                RNA
        Different types of data can be stored in col format. The type field indicates what type this is.
        ; COL 1               label
        ; COL 2               residue
        ; COL 3               seqpos
        ; COL 4               alignpos
        ; COL 5               align_bp
        ; COL 6               seqpos_bp
        
        These fields are the headers of the columns in the sequence part of this section. label has two values: N for nucleotide or G for gap. residue is a nucleotide or a gap. seqpos is the position in the original sequence. alignpos is the position in the foldalignment. align_bp indicates which position in the foldalignment this position is base-paired with. "." indicates no base-pairing. seqpos_bp is the base-pair position in the original sequence coordinates.
        ; ENTRY               V00158/694-623
        The name of the sequence.
        ; ALIGNMENT_ID        Structure 1
        A comment field. Set either by the user or the web server.
        ; ALIGNMENT_LIST      V00158/694-623 AC069454/82192-82263
        The sequences in this alignment.
        ; FOLDALIGN_SCORE     561
        The score of this alignment.
        ; GROUP               1
        Currently not used.
        ; FILENAME            data.fasta
        Name of the sequence input file.
        ; START_POSITION      1
        The start position of the foldalignment.
        ; END_POSITION        71
        The end position of the foldalignment.
        ; ALIGNMENT_SIZE      2
        Currently always two.
        ; ALIGNMENT_LENGTH    72
        The length of the foldalignment.
        ; SEQUENCE_LENGTH     72
        The length of the input sequence.
        ; PARAMETER           max_length=71
        The lambda value.
        ; PARAMETER           max_diff=15
        The delta value.
        ; PARAMETER           min_loop=3
        The minimum number of nucleotides between two nucleotides base-paired to each other.
        ; PARAMETER           score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
        The score matrix used.
        ; PARAMETER           nobranching=<false>
        If false, branching structures are allowed. If true, only stem loops are allowed. The default value is false. Always false for web server runs.
        ; PARAMETER           global=<false>
        The foldalignment is global if this parameter is true. The default value is false.
        ; ------------------------------------------------------------------------------
        This separation line separates the information and the sequence parts of the sequence section.
        N         G          1         1        72        71
        N         C          2         2        71        70
        N         A          3         3        70        69
        .
        .
        G         -          .        19         .         .
        .
        .
        N         U         69        70         3         3
        N         G         70        71         2         2
        N         C         71        72         1         1
        
        Each row is a position in the alignment. The columns are explained at the ; COL lines.
        ; ******************************************************************************
        This indicates the end of the sequence section. This line can occur multiple times in the normal output. A FOLDALIGN output file should end with one of these lines.
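
A sequence section as described above can be read with a small parser. The following is a minimal sketch with hypothetical function names; it assumes whitespace-separated columns and relies only on the "; KEY value" lines, the two separator lines, and the column names listed at the ; COL lines.

```python
def parse_sequence_section(lines):
    """Return (fields, parameters, rows) for one sequence section."""
    fields, params, columns, rows = {}, {}, {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("; ***"):
            break  # end of the sequence section
        if line.startswith("; ---") or line.startswith("; ==="):
            continue  # separation lines
        if line.startswith(";"):
            parts = line[1:].split(None, 1)
            if not parts:
                continue
            key = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            if key == "COL":
                number, name = value.split()
                columns[int(number)] = name
            elif key == "PARAMETER":
                name, _, setting = value.partition("=")
                params[name] = setting
            else:
                fields[key] = value
            continue
        tokens = line.split()
        if len(tokens) != len(columns):
            continue  # e.g. a lone "." used to shorten a printed example
        row = {}
        for number, token in enumerate(tokens, start=1):
            name = columns[number]
            if name in ("label", "residue"):
                row[name] = token
            else:
                row[name] = None if token == "." else int(token)
        rows.append(row)
    return fields, params, rows


def dot_bracket(rows):
    """Rebuild the structure line from the alignpos and align_bp columns."""
    symbols = []
    for row in rows:
        partner = row["align_bp"]
        if partner is None:
            symbols.append(".")
        else:
            symbols.append("(" if partner > row["alignpos"] else ")")
    return "".join(symbols)
```

Applied to the rows of a complete section, dot_bracket gives the same kind of bracket string as the Structure line in the header.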

      3. Local alignment scores. The -plot_score option
        FOLDALIGN can be used to scan for more than the best scoring RNA structures. When the -plot_score option is used, FOLDALIGN prints the coordinates and the score of the best local alignment starting at any pair of positions along the two sequences. The extra output produced has two parts: a header and a series of local score (LS) lines.

        Many of the local alignment header fields have been explained above. A typical local alignment score header looks like this:


        ; FOLDALIGN           2.0.1
        ; REFERENCE           JH. Havgaard, R. Lyngsø, GD. Stormo, J. Gorodkin
        ; REFERENCE           Pairwise local structural alignment of RNA sequences
        ; REFERENCE           with sequence similarity less than 40%
        ; REFERENCE           In press Bioinformatics 2005
        ; ALIGNMENT_ID        Structure 1
        ; ALIGNING            V00158/694-623 against AC069454/82192-82263
        ; SEQUENCE_1_COMMENT
        ; SEQUENCE_2_COMMENT
        ; LENGTH_SEQUENCE_1   72
        ; LENGTH_SEQUENCE_2   72
        ; FILENAME            data.fasta
        ; PARAMETER           max_length=72
        ; PARAMETER           max_diff=15
        ; PARAMETER           min_loop=3
        ; PARAMETER           score_matrix=sm.gap_open_-50.gap_elongation_-25.fmat
        ; PARAMETER           nobranching=<false>
        ; PARAMETER           global=<false>
        ; TYPE                Foldalign_local_scores
        ; COL 1               label
        ; COL 2               Alignment_start_position_sequence_1
        ; COL 3               Alignment_end_position_sequence_1
        ; COL 4               Alignment_start_position_sequence_2
        ; COL 5               Alignment_end_position_sequence_2
        ; COL 6               Alignment_score
        ; ------------------------------------------------------------------------------
        .
        .
        .
        LS 1 71 3 69 362
        LS 1 70 2 70 408
        LS 1 71 1 71 561
        

        ; SEQUENCE_1_COMMENT
        ; SEQUENCE_2_COMMENT
        These are the comments of the two sequences. These fields are empty when the web server is used.
        LS 1 71 3 69 362
        These are the local score lines. There is one for each pair of positions along the two sequences. In the Z-score plot this line would be at position (1, 3). The score 362 is converted to a Z-score.
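
The LS lines are easy to post-process. The sketch below, with hypothetical function names, reads them using the column order given at the ; COL lines of the local score header and picks the highest scoring entry.

```python
def parse_ls_lines(lines):
    """Return (start1, end1, start2, end2, score) tuples from LS lines."""
    hits = []
    for raw in lines:
        tokens = raw.split()
        if len(tokens) == 6 and tokens[0] == "LS":
            start1, end1, start2, end2, score = map(int, tokens[1:])
            hits.append((start1, end1, start2, end2, score))
    return hits


def best_hit(hits):
    """The entry with the highest alignment score."""
    return max(hits, key=lambda hit: hit[4])
```

For the three LS lines shown above, best_hit returns (1, 71, 1, 71, 561), which matches the score of the best foldalignment in the header example.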

    7. Score matrix file format
      A FOLDALIGN score matrix contains several elements. The elements are separated by empty lines. With one exception, the elements can be placed in any order. The exception is the Alfabet: element, which must be the first element in a score matrix file if it is present. A score matrix does not have to have all elements present. Anything missing will be given a default value. See below for examples. A score matrix holding the default values is distributed with the FOLDALIGN package. The current default energy parameters are taken from mfold. The default substitution matrices are a variation of the Ribosum matrices, but produced in a fashion more similar to the BLOSUM matrices.

      Comment and empty lines can be placed between the elements. Comment lines start with a #.

      The score matrix elements are:

      Alfabet: This is the alphabet of the sequences. The first character is also the gap/unknown character. The last field on the line is the size of the alphabet. There is no difference between upper and lower case letters. T's are read as U's.

      Alfabet:
      - A C G U 5
      

      Stacking: This is the cost for stacking one base-pair on to another in a stem. Stackings that promote stem structures are positive.

      Stacking:
      A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U    
      A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U    
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A A
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A C
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A G
      0    0    0    9    0    0   21    0    0   24    0   13   13    0   10    0 A U
      .
      .
      

      Hairpin Close: This is the cost for stacking the last unpaired pair of nucleotides in a hairpin loop on to the first base-pair of the closing stem.

      Hairpin Close:
      A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U    
      A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U    
      0    0    0    3    0    0   15    0    0   11    0   -2    5    0    5    0 A A
      0    0    0    5    0    0   15    0    0   15    0    5    3    0    3    0 A C
      0    0    0    3    0    0   14    0    0   13    0    3    6    0    6    0 A G
      0    0    0    3    0    0   18    0    0   21    0    3    5    0    5    0 A U
      .
      .
      

      Internal loop: This is the cost of stacking the first / last unpaired pair of nucleotides in an internal loop on to the last / first base-pair of the surrounding stems.

      Internal loop:
      A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U    
      A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U    
      0    0    0   -7    0    0    0    0    0    0    0   -7   -7    0   -7    0 A A
      0    0    0   -7    0    0    0    0    0    0    0   -7   -7    0   -7    0 A C
      0    0    0    4    0    0   11    0    0   11    0    4    4    0    4    0 A G
      0    0    0   -7    0    0    0    0    0    0    0   -7   -7    0   -7    0 A U
      .
      .
      

      5' Dangle: The cost of adding a 5' dangle nucleotide to a stem. Only used in multibranched loops.

      5' Dangle:
      A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U 
      A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U 
      0    0    0    3    0    0    2    0    0    5    0    3    3    0    3    0 A
      0    0    0    1    0    0    3    0    0    3    0    1    3    0    3    0 C
      0    0    0    2    0    0    0    0    0    2    0    2    4    0    4    0 G
      0    0    0    2    0    0    0    0    0    1    0    2    2    0    2    0 U
      

      3' Dangle: The cost of adding a 3' dangle nucleotide to a stem. Used in multibranched loops.

      3' Dangle:
      A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U 
      A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U 
      0    0    0    8    0    0   17    0    0   11    0    8    7    0    7    0 A
      0    0    0    5    0    0    8    0    0    4    0    5    1    0    1    0 C
      0    0    0    8    0    0   17    0    0   13    0    8    7    0    7    0 G
      0    0    0    6    0    0   12    0    0    6    0    6    1    0    1    0 U
      

      Loop length costs: This table holds the length-dependent cost for hairpin, bulge, and internal loops. The head line must be followed by a line telling how many lines should be read.

      Loop length costs:
      30 lines
      # Size Hairpin Bulge Internal
        1     -57     -38    -17
        2     -57     -28    -17
        3     -57     -32    -17
        4     -56     -36    -17
      .
      .
      

      Miscellaneous: This element holds a list of parameters.

      • Gap_open. This is the gap opening cost.
      • Elongation_bonus. This is the cost of continuing a gap.
      • Multibranchloop. This is the cost of closing a multibranched loop with a base-pair.
      • Multibranchloop_helix. The cost of adding an extra stem to a multibranched loop.
      • Multibranchloop_nucleotide. The cost of adding an extra unpaired nucleotide in a multibranched loop.
      • Multibranchloop_non_GC_stem_end. An extra cost added when a stem ends with a base-pair which is not GC. The values in the Hairpin close and Internal loop matrices are assumed to have already been corrected. This value is therefore only used in multibranched loops and bulges with a length longer than one nucleotide.
      • Asymmetric_cost. The cost of asymmetric internal loops.
      • Asymmetric_cost_limit. The maximum asymmetric cost.
      • Long_hairpin_loop_factor. Used to estimate hairpin Loop length costs not included in the table. Only used for lengths above 30.
      • Long_bulge_loop_factor. Used to estimate bulge Loop length costs not included in the table. Only used for lengths above 30.
      • Long_Internal_loop_factor. Used to estimate internal loop Loop length costs not included in the table. Only used for lengths above 30.

      Base-pair: This matrix indicates which nucleotides base-pair.

      - A C G U 
      0 0 0 0 0 -
      0 0 0 0 1 A
      0 0 0 1 0 C
      0 0 1 0 1 G
      0 1 0 1 0 U
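
The base-pair matrix above can be turned into a lookup of allowed pairs. This is a minimal sketch with hypothetical function names; it assumes whitespace-separated values with the row symbol at the end of each line, as printed above.

```python
def parse_base_pair_matrix(text):
    """Return the set of (row, column) symbol pairs marked with 1."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = lines[0]
    allowed = set()
    for tokens in lines[1:]:
        *values, row_symbol = tokens
        for col_symbol, value in zip(header, values):
            if value == "1":
                allowed.add((row_symbol, col_symbol))
    return allowed


def can_pair(a, b, allowed):
    """Case-insensitive lookup; T is read as U, as in the Alfabet: element."""
    a = a.upper().replace("T", "U")
    b = b.upper().replace("T", "U")
    return (a, b) in allowed
```

For the matrix shown above this gives six ordered pairs: A-U, U-A, C-G, G-C, G-U, and U-G.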
      

      Base-pair substitution: The cost of substituting a base-pair in one sequence with a base-pair in the other sequence.

      Base-pair substitution:
      A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U
      A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A A
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A C
      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A G
      0    0    0   11    0    0    3    0    0    5    0    0    2    0   -3    0 A U
      .
      .
      

      Single strand substitution: The cost of substituting unpaired nucleotides.

      Single strand substitution:
          A     C     G     U 
         19   -22   -18   -19 A
        -22    11   -25   -15 C
        -18   -25     9   -20 G
        -19   -15   -20    13 U
      

      1. Score matrix examples
        A score matrix which only changes the gap penalties would look like this:
        # Changing the gap penalties.
        
        Miscellaneous:
        Gap_open:                        -80
        Elongation_bonus:                -40
        

        A score matrix that has no single strand substitution cost would look like this:

        # No single strand substitution cost.
        
        Single strand substitution:
            A     C     G     U 
            0     0     0     0 A
            0     0     0     0 C
            0     0     0     0 G
            0     0     0     0 U
        

        A score matrix combining several elements would look like this (no sequence similarity cost):

        # Scorematrix format
        # X-axis: i & j. Y-axis k & l
        Base-pair substitution:
        A    A    A    A    C    C    C    C    G    G    G    G    U    U    U    U
        A    C    G    U    A    C    G    U    A    C    G    U    A    C    G    U
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A A
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A C
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A G
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 A U
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 C A
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 C C
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 C G
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 C U
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 G A
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 G C
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 G G
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 G U
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 U A
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 U C
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 U G
        0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 U U
        
        # Initmatrix format
        # X-axis i. Y-axis j
        Single strand substitution:
            A     C     G     U
            0     0     0     0 A
            0     0     0     0 C
            0     0     0     0 G
            0     0     0     0 U
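
Small score matrix files like the examples above can also be generated by a script. The sketch below uses hypothetical function names and writes a Miscellaneous: element with the two gap parameters followed by a Single strand substitution: element, separated by an empty line. The column widths are a guess based on the examples; whether FOLDALIGN depends on exact widths is not stated here.

```python
def miscellaneous_element(gap_open, gap_elongation):
    """Miscellaneous: element that only sets the two gap parameters."""
    return "\n".join([
        "Miscellaneous:",
        f"Gap_open:                        {gap_open}",
        f"Elongation_bonus:                {gap_elongation}",
    ])


def single_strand_element(scores, alphabet="ACGU"):
    """Single strand substitution: element; the row symbol ends each line."""
    lines = ["Single strand substitution:"]
    lines.append("".join(f"{symbol:>6}" for symbol in alphabet))
    for row_symbol, row in zip(alphabet, scores):
        lines.append("".join(f"{value:>6}" for value in row) + f" {row_symbol}")
    return "\n".join(lines)


def score_matrix_text(gap_open, gap_elongation, scores):
    """Join the elements with empty lines, as the file format requires."""
    comment = "# Generated score matrix (illustrative sketch)"
    elements = [
        miscellaneous_element(gap_open, gap_elongation),
        single_strand_element(scores),
    ]
    return comment + "\n\n" + "\n\n".join(elements) + "\n"
```

For example, score_matrix_text(-80, -40, [[0] * 4 for _ in range(4)]) combines the first two examples of this section into one file.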