GETSEQS
Section: User Commands (1) Updated: September 2000
NAME
getseqs - runs BLAST searches and GenBank retrievals over the internet,
and makes the data available for further refinement
SYNOPSIS
getseqs [options] [file]
DESCRIPTION
For a set of sequences, getseqs performs BLAST searches and GenBank
retrieval over the internet to obtain a raw core of sequences. This core
can then be refined automatically by discarding already known hits and by
applying programs such as align0 and qrna.
getseqs requires the following to be installed:
- BLAST, either a local version of blast or the netblast
program blastcl3.
- lynx (works with version 2.8.3dev.9).
- blast2col, which is part of this package.
- extendlist, which is specifically designed as part of this program.
- align0 from the Pearson FASTA package. (This is only needed
if you wish to use align0 as part of the refinement.)
- qrna (by Rivas and Eddy) to search for RNA structure. As
this program is not yet publicly available, this option is
suppressed for the time being. (This is only needed if you wish to use
qrna as part of the refinement.)
OPTIONS
getseqs accepts the following options.
- -nseq <number>
Runs the BLAST search using nseq sequences at a time. Default is 25.
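As a rough illustration of what searching nseq sequences at a time amounts to, the Python sketch below splits FASTA records into batches. It is not taken from getseqs itself: the function names read_fasta and batches are hypothetical, and only the default batch size of 25 comes from the description above.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs (hypothetical helper)."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records: List[Record], nseq: int = 25) -> Iterator[List[Record]]:
    """Yield consecutive groups of at most nseq records."""
    if nseq < 1:
        raise ValueError("nseq must be at least 1")
    for start in range(0, len(records), nseq):
        yield records[start:start + nseq]
```

With 60 input sequences and the default setting, this sketch would produce three batches, the last one smaller than the others.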
- -colformat
- -col
Read sequences in column (col) format instead of FASTA format.
Default is FASTA format.
- -runname <string>
The name of the temporary data directory. By default, the name combines
the date, time and process ID to create a unique identifier. If "runname"
already exists, a time extension is added before the new "runname"
directory is created. Retrieved GenBank entries are stored in the file
entries.gb in the runname directory. All FASTA files used are stored in
the subdirectory fasta.
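A default name built from date, time and process ID could look like the following Python sketch. The exact layout getseqs uses is not documented here, so the dot-separated format and the function name default_runname are assumptions made purely for illustration.

```python
import os
from datetime import datetime
from typing import Optional


def default_runname(now: Optional[datetime] = None,
                    pid: Optional[int] = None) -> str:
    """Combine date, time and process ID into one identifier.

    The 'YYYYMMDD.HHMMSS.pid' layout is an assumed format,
    not the one getseqs necessarily produces.
    """
    if now is None:
        now = datetime.now()
    if pid is None:
        pid = os.getpid()
    return f"{now:%Y%m%d}.{now:%H%M%S}.{pid}"
```

Two runs started in the same second still get different names under this scheme, because their process IDs differ.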
- -blast <'string'>
The BLAST command line to execute. Default is 'blastcl3 -p blastn -d nr'.
Data is piped to this command. Note that the netblast executable, blastcl3,
can be replaced with your local version of blast, or with a complete path to
that executable. This program must be installed locally in order to be used
by getseqs. The results of the BLAST search are stored in the runname
directory as search.blast.
- -align0 <'string'>
The align0 command line to execute. Default is 'align0'. To turn align0 usage
off, use -align0 ''. The command is executed on the query and subject data
files. This program must be installed locally in order to be used by getseqs.
The output of align0 is stored in the runname directory as align0.out.
- -qrna <'string'>
The qrna command line to execute. Default is -qrna '', that is,
the program is not used by default (see man page for details).
To turn qrna usage off, use -qrna ''. This program must be
installed locally in order to be used by getseqs.
The output of qrna is stored in the runname directory as qrna.out.
- -alength <number>
Filter the BLAST search by minimum allowable alignment length.
Default is zero.
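The effect of a minimum alignment length can be pictured with the Python sketch below. The Hit record and its fields are hypothetical stand-ins for whatever getseqs keeps per BLAST hit; only the default of zero, which keeps every hit, is taken from the description above.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit."""
    accession: str
    length: int  # alignment length reported for the hit


def filter_by_alength(hits: List[Hit], alength: int = 0) -> List[Hit]:
    """Keep only hits whose alignment length is at least alength."""
    return [hit for hit in hits if hit.length >= alength]
```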
- -discgb <file>
File containing the list of (GenBank) entries to be discarded
from the BLAST search. The file search.blast.col contains only
the filtered hits, in column format.
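Discarding already known entries is essentially a set lookup, as the Python sketch below suggests. The layout of the discard file is not specified here, so the sketch assumes one entry identifier at the start of each non-empty line; both function names are hypothetical.

```python
from typing import Iterable, List, Set


def read_discard_list(text: str) -> Set[str]:
    """Collect entry identifiers, assuming one per non-empty line.

    Only the first whitespace-separated field of each line is used;
    this file layout is an assumption, not the documented format.
    """
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def discard_known(accessions: Iterable[str], known: Set[str]) -> List[str]:
    """Return the hit identifiers that are not in the discard set."""
    return [acc for acc in accessions if acc not in known]
```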
- -crange <number>
The sequence context range with which to extend a GenBank hit. The
extension is in both directions. Default is 100, but it is
recommended that the range be about the size of the search sequence.
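Extending a hit in both directions is simple coordinate arithmetic, sketched below in Python. The 1-based coordinates, the clamping at the ends of the entry, and the name extend_range are assumptions for illustration; the description above only states that the extension applies in both directions with a default of 100.

```python
from typing import Optional, Tuple


def extend_range(start: int, end: int, crange: int = 100,
                 seq_length: Optional[int] = None) -> Tuple[int, int]:
    """Widen a 1-based hit interval by crange positions on each side.

    The result is clamped to position 1 and, if seq_length is given,
    to the end of the entry (assumed behaviour).
    """
    low, high = min(start, end), max(start, end)
    new_start = max(1, low - crange)
    new_end = high + crange
    if seq_length is not None:
        new_end = min(seq_length, new_end)
    return new_start, new_end
```

For a hit covering positions 150 to 300, the default setting would give the interval 50 to 400 in this sketch.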
- -help
- -
Prints this list.
EXAMPLES
To BLAST the sequences in file foo.fasta against GenBank, discard the
already known hits listed in file foo.discard, realign query and hit
using align0, and store all data in the directory foodatadir:
getseqs.awk -runname foodatadir -discgb foo.discard foo.fasta
To extend the region of the GenBank hit when realigning with align0, here
by 200 in both directions of the sequence:
getseqs.awk -runname foodatadir -discgb foo.discard -crange 200 foo.fasta
To skip the align0 realignment:
getseqs.awk -runname foodatadir -discgb foo.discard -align0 '' foo.fasta
('' is two single quote characters.)
BUGS
Report bugs to col-bugs@bioinf.au.dk.
AUTHOR
Bjarne Knudsen (bk@daimi.au.dk)
Comments, questions, etc., email
gorodkin@rth.dk.