/***********************************************************
  RIsearch v 1.0   --   RNA-RNA interaction search

  Copyright 2012 Anne Wenzel <wenzel@rth.dk>

  This file is part of RIsearch.

  RIsearch is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  RIsearch is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with RIsearch, see file COPYING.
  If not, see <http://www.gnu.org/licenses/>.

***********************************************************/



=== HOWTO: compile & run ===

$ make RISEARCH
$ ./RIsearch -q query.fa -t target.fa

The query and target files may each contain several sequences; RIsearch will
scan all vs. all.
Alternatively, single sequences can be given directly on the command line with
-Q acgu -T acgu
It is also possible to pass the target sequence(s) via standard input by giving
'-' as the target file, e.g.:
$ gunzip -c huge.fasta.gz | ./RIsearch -q query.fa -t -

Without further options, RIsearch reports only the highest-scoring interaction
(per query-target pair).
To include suboptimals, set a threshold score with -s, e.g.:
$ ./RIsearch -q query.fa -t target.fa -s 2400
NB: the threshold applies to the score, not the energy (subject to change).
The first hit is the 'best' one from before, followed by the suboptimals
(which include the top hit again).
Here, we first get subsets of the top hit, followed by some alternative
duplexes (and their subsets).

With the additional flag -n 20, these spurious hits are avoided
(only the highest-scoring hit from a neighborhood of 20 is reported).
$ ./RIsearch -q query.fa -t target.fa -s 2400 -n 20
This will report the 4 interactions that are also given in TarBase for the
interaction of human let-7e with the 3' UTR of SMC1L1.
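
The idea behind -n can be illustrated with a small Python sketch. It is not
RIsearch's implementation: the definition of "neighborhood" used here (two hits
are neighbors if both their query and their target start positions differ by
less than n) and the greedy strategy are assumptions made for illustration,
and the hit tuples are made-up values.

```python
def filter_neighborhood(hits, n):
    """Keep only the highest-scoring hit per neighborhood.

    hits: list of (query_start, target_start, score) tuples (hypothetical layout).
    n:    neighborhood size; ASSUMPTION: two hits are neighbors if both start
          positions differ by less than n.
    """
    kept = []
    # Visit hits from best to worst score, so the first hit seen in a
    # neighborhood is its highest-scoring one.
    for hit in sorted(hits, key=lambda h: h[2], reverse=True):
        q, t, _ = hit
        if all(abs(q - kq) >= n or abs(t - kt) >= n for kq, kt, _ in kept):
            kept.append(hit)
    return kept


if __name__ == "__main__":
    # Made-up hits: three overlapping ones near target position 100, one at 300.
    hits = [(1, 100, 2500), (2, 101, 2450), (3, 103, 2410), (1, 300, 2420)]
    for hit in filter_neighborhood(hits, 20):
        print(hit)
```

In this sketch the two lower-scoring hits near target position 100 are dropped
as subsets of the top hit, while the distant hit at position 300 is kept.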

-d sets the per-nucleotide extension penalty in dacal/mol (30 seems to be a good
value), which favors short stable interactions. This is especially important if
larger query sequences are used and one does not expect the whole sequence to be
part of the interaction.
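
To get a feeling for the size of this penalty, the following Python sketch
converts it to kcal/mol. It reads dacal as decacalories (1 dacal = 10 cal, so
100 dacal = 1 kcal) and ASSUMES that the penalty simply accumulates once per
nucleotide of the interaction; how RIsearch applies it internally may differ.
The function name is hypothetical.

```python
def extension_penalty_kcal(d_dacal, length):
    """Total extension penalty in kcal/mol for an interaction of `length` nt.

    d_dacal: per-nucleotide penalty in dacal/mol (the value given to -d).
    ASSUMPTION: the penalty grows linearly with the interaction length.
    """
    return d_dacal * length / 100.0  # 100 dacal = 1 kcal


if __name__ == "__main__":
    for length in (10, 20, 100):
        print(length, "nt:", extension_penalty_kcal(30, length), "kcal/mol")
```

Under these assumptions, -d 30 costs 0.3 kcal/mol per nucleotide, so long
interactions must gain correspondingly more stacking energy to be reported,
which is why short stable duplexes are favored.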

-p1 gives a shorter output (one line per interaction); the 'top hit' is always
first and appears a second time in the list.

-p2 does not produce a header per query/target pair, but instead repeats the
sequence names on each line, giving an easy-to-parse TSV file with the columns
in the following order:
'Qname Qbeg Qend Tname Tbeg Tend score energy' (Q for query, T for target)
The 'top hit' is only printed once and is not necessarily first.
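
A minimal Python sketch for reading such -p2 output is given below. It assumes
tab-separated lines with exactly the eight columns listed above and no header;
the names Hit and parse_p2 as well as the sample values are made up for
illustration and are not part of RIsearch.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    qname: str
    qbeg: int
    qend: int
    tname: str
    tbeg: int
    tend: int
    score: float
    energy: float


def parse_p2(handle) -> Iterator[Hit]:
    """Yield one Hit per line of -p2 output (ASSUMED: 8 tab-separated columns)."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 8:
            continue  # skip blank or incomplete lines
        q, qb, qe, t, tb, te, score, energy = row[:8]
        yield Hit(q, int(qb), int(qe), t, int(tb), int(te),
                  float(score), float(energy))


if __name__ == "__main__":
    # Made-up example lines, not real RIsearch output.
    sample = ("q1\t1\t21\tt1\t100\t121\t2500\t-20.5\n"
              "q1\t3\t10\tt1\t300\t308\t2410\t-12.0\n")
    # The top hit is not necessarily first, so select it by score.
    best = max(parse_p2(io.StringIO(sample)), key=lambda h: h.score)
    print(best.tname, best.tbeg, best.tend, best.score)
```

To read a real result file, pass an open file object instead of the
io.StringIO sample.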

-e sets an energy threshold. It is checked after the backtrack; only
interactions with energies lower than or equal to this value are printed.

With '-m t99' one can select the scoring matrix based on Turner 1999
parameters (the default is Turner 2004).



=== Citation ===

If you find this software useful for your research, please cite the following work:

Anne Wenzel, Erdinç Akbaşlı, and Jan Gorodkin.
RIsearch: fast RNA–RNA interaction search using a simplified nearest-neighbor energy model.
Bioinformatics (2012) 28(21): 2738–2746. 



=== Contact ===

wenzel@rth.dk


