This is the README file for FoldalignM version 1.0

CONTENTS

1.  LEGAL NOTICE
2.  CITATION
3.  WEBSITE
4.  REQUIREMENTS
5.  INSTALLATION
6.  FOLDALIGNM USAGE SUMMARY
7.  DIRECTORIES
8.  EXAMPLES
9.  TROUBLESHOOTING


-------------------------------------------------------------------------------
1. LEGAL NOTICE
===============

/******************************************************************************
*                                                                             *
*   Copyright 2006 Elfar Torarinsson, elfar7@gmail.com	                      *
*                                                                             *
*   This file is part of FoldalignM                                           *
*                                                                             *
*   FoldalignM is free software; you can redistribute it and/or modify        *
*   it under the terms of the GNU General Public License as published by      *
*   the Free Software Foundation; either version 2 of the License, or         *
*   (at your option) any later version.                                       *
*                                                                             *
*   FoldalignM is distributed in the hope that it will be useful,             *
*   but WITHOUT ANY WARRANTY; without even the implied warranty of            *
*   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the             *
*   GNU General Public License for more details.                              *
*                                                                             *
*   You should have received a copy of the GNU General Public License         *
*   along with FoldalignM; if not, write to the Free Software                 *
*   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA *
*                                                                             *
******************************************************************************/



-------------------------------------------------------------------------------
2. CITATION:
============

E. Torarinsson, J.H. Havgaard and J. Gorodkin
Multiple structural alignment and clustering of RNA sequences
In preparation



-------------------------------------------------------------------------------
3. WEBSITE:
===========

http://foldalign.kvl.dk



-------------------------------------------------------------------------------
4. REQUIREMENTS:
================

 - Perl
 - Java runtime environment
 - Foldalign and RNAfold. Precompiled binaries for both are in the src
   directory. If they do not run on your architecture, you need the following:
     - Foldalign installed and its bin directory in your path
       (http://foldalign.kvl.dk)
     - The Vienna package installed and in your path (only RNAfold is needed)
       (http://www.tbi.univie.ac.at/~ivo/RNA/)



-------------------------------------------------------------------------------
5. INSTALLATION:
================

Unpack the tar file: 
  - tar -zxvf FoldalignM.tar.gz

Add the following to your CLASSPATH and your PATH:
  - PATH_TO_FoldalignM
  - PATH_TO_FoldalignM/src

  In bash you simply add the following to your .bashrc file:
    - export CLASSPATH=${CLASSPATH}:/PATH_TO_FoldalignM:/PATH_TO_FoldalignM/src:
    - export PATH=${PATH}:/PATH_TO_FoldalignM:/PATH_TO_FoldalignM/src:

  In csh you simply add the following to your .cshrc file:
    - setenv CLASSPATH ${CLASSPATH}:/PATH_TO_FoldalignM:/PATH_TO_FoldalignM/src:
    - setenv PATH ${PATH}:/PATH_TO_FoldalignM:/PATH_TO_FoldalignM/src:

Foldalign and RNAfold must be in your path. The remaining settings are only
necessary if you want to be able to run the programs from places other than the
FoldalignM folder.
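
As a quick check of the requirements, the following sketch lists any required
command that cannot be found on your PATH. The function name missing_tools is
hypothetical, and "foldalign" is an assumed name for the Foldalign executable;
adjust it to match your installation.

```shell
#!/bin/sh
# Print the name of every listed command that cannot be found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command resolves.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# "foldalign" is an assumed executable name, not confirmed by this README.
missing_tools perl java foldalign RNAfold
```

If nothing is printed, all four commands were found.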



-------------------------------------------------------------------------------
6. FOLDALIGNM USAGE SUMMARY:
============================


There are three different programs you can run:

  1. FoldalignM_McCaskill does multiple alignment using McCaskill's BP-matrices
    
    - Usage: java FoldalignM_McCaskill [-fast] [-delta value] [-gap value]
      [-seqw value] [-no_pruning] [-nolog] [-col] <input> <output>

        -input is the input file in FASTA format (required)
        -output is the output name (required)
        -fast: Use the fast global alignment, which uses more memory but is
         often faster. Off by default.
        -delta: The maximum allowed length difference between any given pair,
         default is max(10,length difference)
        -gap: The gap cost, default is -300
        -seqw: The score for a sequence match, default is 5
        -no_pruning: Turn off pruning of low scoring cells
        -nolog: Do not use log-odds scores. They are used by default.
        -col: Also output the alignment in col format

        -The output files are written to the folder .fold_out, which is
         created in the directory where the program is called. Three files are
         written to .fold_out/: output.original.out and output.refined.out,
         which contain the initial and the refined alignments, and output.prob,
         which contains the base-pair probability matrix that was used to make
         the refined alignments. Other .fold_ folders are also created and
         contain various intermediate results.
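
To see which result files a run produced, a sketch like the following can help.
It assumes all three files share the output name given on the command line
(including the .prob file); the function name expected_outputs is hypothetical.

```shell
#!/bin/sh
# Print the three result paths for a given output name, one per line.
# Assumption: the .prob file uses the same output name as the alignments.
expected_outputs() {
    name=$1
    dir=${2:-.fold_out}
    for suffix in original.out refined.out prob; do
        printf '%s/%s.%s\n' "$dir" "$name" "$suffix"
    done
}

# Report which of the expected files exist after a run named "tRNA".
expected_outputs tRNA | while IFS= read -r path; do
    if [ -f "$path" ]; then
        echo "found:   $path"
    else
        echo "missing: $path"
    fi
done
```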


  2. FoldalignM_Foldalign.pl does multiple alignment using Foldalign BP-matrices

    - Usage: FoldalignM_Foldalign.pl <-f fasta file> <-o output name>

      Optional arguments:

      [-c <on|off>] perform clustering (default off)
      [-i <existing foldalign output>]
      [-s <score cutoff for clustering>] (default 100)
      [-n] turn off pruning of low scoring cells (default pruning on)
      [-x <java heap size in MB>] (default 500)
      [-l <on|off>] also output column format (default off)

        -f the input file in FASTA format (required)
        -o the output name (required)
        -c whether the program should try to cluster the sequences
        -i if you have already run Foldalign all-against-all, you can give its
           output as input and avoid the time-consuming process of running it
           again
        -s score cutoff to use with the clustering
        -x Java heap size in MB. If you receive an out-of-memory error, try
           increasing this number to be close to the amount of RAM you have.
           The default (500) suits a computer with 512 MB of RAM; if you have
           more RAM you can increase it to, for example, 1000

        -As for FoldalignM_McCaskill, the results are written to .fold_out/. If
         clustering is on, ".CLUSTER NUMBER" is added to the output name for
         each cluster made. In addition, a file called output.cluster.info,
         which contains information about the clusters, is written to
         .fold_out/


  3. AlignToStructure aligns a set of sequences to a given sequence and structure

    - Usage: java AlignToStructure <fasta file> <outname> <consensus>

        -fasta file is the file with the sequences you want to align
        -outname is the name of the output
        -consensus is a file with a given sequence and structure, to which the
         set of sequences from the fasta file will be aligned. The format is
         one line with the sequence, using A,G,C,U,R,Y,N, then a line break and
         one line with the structure, using () to indicate base pairs and . for
         unpaired bases, e.g. a hairpin is (((....)))

        -As before, the results are written to .fold_out/. An example of a
         consensus file can be found in examples/tRNA.consensus
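
To illustrate the consensus file format, the following sketch writes a small
two-line file and checks it before use. The function name check_consensus and
the 10-nt hairpin sequence are made up for illustration. The checks assume that
the sequence is upper case, that both lines have the same length, and that the
parentheses are balanced; AlignToStructure itself may accept more than this.

```shell
#!/bin/sh
# Return 0 and print "ok" if the file holds a sequence line and a structure
# line of equal length, with valid characters and balanced parentheses.
# Otherwise print the problem and return 1.
check_consensus() {
    awk '
        NR == 1 { seq = $0 }
        NR == 2 { str = $0 }
        END {
            if (seq !~ /^[AGCURYN]+$/)      { print "bad sequence characters"; exit 1 }
            if (str !~ /^[().]+$/)          { print "bad structure characters"; exit 1 }
            if (length(seq) != length(str)) { print "length mismatch"; exit 1 }
            depth = 0
            for (i = 1; i <= length(str); i++) {
                c = substr(str, i, 1)
                if (c == "(") depth++
                else if (c == ")") depth--
                # A closing bracket with no open partner is an error.
                if (depth < 0) { print "unbalanced structure"; exit 1 }
            }
            if (depth != 0) { print "unbalanced structure"; exit 1 }
            print "ok"
        }
    ' "$1"
}

# Example: a made-up 10-nt hairpin written to a temporary file.
tmp=$(mktemp)
printf 'GGGAAAACCC\n(((....)))\n' > "$tmp"
check_consensus "$tmp"    # prints "ok"
rm -f "$tmp"
```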



-------------------------------------------------------------------------------
7. DIRECTORIES:
===============

  src              contains most of the source code for FoldalignM.
  options          contains source code for the options
  examples         contains example input files
  .fold_out        contains the output alignments
  .fold_matrix     contains the base pairing matrices
  .fold_rnafold    contains the rnafold results
  .fold_cons       contains the consensus base pair matrices
  .fold_foldalign  contains the foldalign results



-------------------------------------------------------------------------------
8. EXAMPLES:
============

FoldalignM_McCaskill:

  - java FoldalignM_McCaskill examples/tRNA.fa tRNA


FoldalignM_Foldalign:

  - FoldalignM_Foldalign.pl -f examples/tRNA.fa -o tRNA

With clustering:

  - FoldalignM_Foldalign.pl -f examples/clusterTest.fa -o cluster -c on

  NB! This takes ~15-30 min (depending on the computer). You can avoid the
      time-consuming Foldalign all-against-all step by providing the Foldalign
      output with -i:

  - FoldalignM_Foldalign.pl -f examples/clusterTest.fa -o cluster -c on -i examples/clusterTest.gz


AlignToStructure:

  - java AlignToStructure examples/tRNA.fa tRNA examples/tRNA.consensus



-------------------------------------------------------------------------------
9. TROUBLESHOOTING:
===================

NB! FoldalignM can use a lot of memory, especially when there is a large
difference in length between some of the sequences. For the Java programs, if
you receive an "Out of memory" error, try increasing the heap size with the
JVM option -Xmx, set to approximately the amount of RAM you have. The option
must come before the class name, for example:

  - java -Xmx500m FoldalignM_McCaskill examples/tRNA.fa tRNA   #for 500MB or
  - java -Xmx1000m FoldalignM_McCaskill examples/tRNA.fa tRNA  #for 1000MB

The Perl script also calls Java, and you can pass the same parameter via the
Perl script using -x


If the program exits with the message:

  - "Found no good global alignment!"

then the alignment has been pruned away. If you want an alignment anyway, use
the -no_pruning option with the Java program or -n with the Perl script.
