SSEARCH(1) DragonFly General Commands Manual SSEARCH(1)
NAME
ssearch - scan a protein or DNA sequence library for similar sequences
SYNOPSIS
ssearch [-a -b # -d # -E # -f # -g # -h -i -l FASTLIBS -L -r STATFILE
-m # -O filename -Q -s SMATRIX -w # -z] query-sequence-file
library-file
ssearch [-QabdEfghilmOrswz] query-file @library-name-file
ssearch [-QabdEfghilmOrswz] query-file "%PRMVI"
ssearch [-aEfghilmrsw] - interactive mode
DESCRIPTION
ssearch compares a protein or DNA sequence to all of the entries in a
sequence library using the rigorous Smith-Waterman algorithm (Smith and
Waterman, J. Mol. Biol. (1981) 147:195-197). For example, ssearch can
compare a protein sequence to all of the sequences in the NBRF PIR
protein sequence database. ssearch will automatically decide whether
the query sequence is DNA or protein by reading the query sequence as
protein and determining whether the `amino-acid composition' is more
than 85% A+C+G+T. The program can be invoked either with command line
arguments or in interactive mode. ssearch compares a query sequence to
a sequence library, which consists of sequence data interspersed with
comments (see below). The fasta programs, including ssearch, use a
standard text format sequence file. Lines beginning with '>' or ';'
are treated as comments and ignored; sequences can be upper or lower
case, and blanks, tabs and unrecognizable characters are ignored.
ssearch expects sequences to use the single letter amino acid codes;
see protcodes(5). Library files for ssearch should have the form shown
in the EXAMPLES section.
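The DNA-or-protein decision described above can be sketched in Python.
This is an illustration only, not the program's own code: the function
name guess_sequence_type is hypothetical, and the way case and
non-letter characters are handled when counting is an assumption.

```python
def guess_sequence_type(sequence, threshold=0.85):
    """Return "DNA" if more than `threshold` of the letters are A, C, G or T."""
    # Assumption: only letters are counted, and case is ignored.
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    acgt = sum(1 for ch in letters if ch in "ACGT")
    return "DNA" if acgt / len(letters) > threshold else "protein"


print(guess_sequence_type("ACGTACGTTTGACCA"))  # only A, C, G and T
print(guess_sequence_type("WILLLSQ"))          # no A, C, G or T at all
```

A protein that happens to be very rich in A, C, G and T would be
classified as DNA by a rule of this kind.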
OPTIONS
ssearch can be directed to change the scoring matrix, search
parameters, output format, and default search directories by entering
options on the command line (preceded by a `-'). All of the options
should precede the file name and ktup arguments. Alternatively, these
options can be changed by setting environment variables. The options
and environment variables are:
-a (SHOWALL) Modifies the display of the two sequences in
alignments. Normally, both sequences are shown only where they
overlap (SHOWALL=0). If -a is given or the environment variable
SHOWALL=1 is set, both sequences are shown in their entirety.
-b # The number of similarity scores to be shown when the -Q option
is used. This value is usually calculated based on the actual
scores.
-d # The number of alignments to be shown. Normally, ssearch shows
the same number of alignments as similarity scores. By using
ssearch -Q -b 200 -d 50, one would see the top scoring 200
sequences and alignments for the 50 best scores.
-E # The expectation value threshold for displaying similarity scores
and sequence alignments. ssearch -Q -E 2.0 would show all library
sequences with scores expected to occur no more than 2 times by
chance in a search of the library.
-f # Penalty for the first residue in a gap (-12 by default).
-g # Penalty for additional residues in a gap (-2 by default).
-h Do not display histogram of similarity scores.
-l file
(FASTLIBS) The name of the library menu file. Normally this
will be determined by the environment variable FASTLIBS.
However, a library menu file can also be specified with -l.
-L Display more information about the library sequence in the
alignment.
-m # (MARKX) =0,1,2,3. Alternate display of matches and mismatches in
alignments. MARKX=0 uses ":", ".", and " " for identities,
conservative replacements, and non-conservative replacements,
respectively. MARKX=1 uses " ", "x", and "X". MARKX=2 does not
show the second sequence, but uses the second alignment line to
display matches with a "." for identity, or with the mismatched
residue for mismatches. MARKX=2 is useful for aligning large
numbers of similar sequences. MARKX=3 writes out a file of
library sequences in FASTA format. MARKX=3 should always be
used with the "SHOWALL" (-a) option, but this does not
completely ensure that all of the sequences output will be
aligned.
-O filename
Sends a copy of the results to "filename".
-Q Quiet option. This allows ssearch to search a database and report
the results without asking any questions. ssearch -Q file
library > output can be put in the background or run at a later
time with the unix 'at' command. The number of similarity
scores and alignments displayed with the -Q option can be
modified with the -b (scores) and -d (alignments) options.
-r STATFILE Causes ssearch to write out the sequence identifier,
superfamily number (if available), and similarity scores to
STATFILE for every sequence in the library. These results are
not sorted.
-s str (SMATRIX) The filename of an alternative scoring matrix file.
For protein sequences, BLOSUM50 is used by default; PAM250 can
be used with the command line option -s 250.
-w # (LINLEN) Output line length for sequence alignments (normally
60; can be set up to 200).
-z Do not do statistical significance calculation.
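To show how the -f and -g penalties enter a Smith-Waterman score, the
following Python sketch computes the best local alignment score with
the default gap penalties, so that a gap of k residues costs
-12 + (k - 1) * -2. It is a minimal sketch and not the ssearch
implementation: the function name is hypothetical, and the flat
match/mismatch scores (5 and -4) are an assumption standing in for a
real scoring matrix such as BLOSUM50.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4,
                         gap_first=-12, gap_next=-2):
    """Best local alignment score of a and b with affine gap penalties."""
    neg = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols      # best scores for alignments ending in row i-1
    f_prev = [neg] * cols    # row i-1 scores ending in a gap in b
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg] * cols
        e = neg              # score ending in a gap in a, carried along the row
        for j in range(1, cols):
            # Open a new gap (first residue) or extend an existing one.
            e = max(h_cur[j - 1] + gap_first, e + gap_next)
            f_cur[j] = max(h_prev[j] + gap_first, f_prev[j] + gap_next)
            # Assumed flat scores; a real search would look up a matrix.
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            h_cur[j] = max(0, h_prev[j - 1] + pair, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


print(smith_waterman_score("AAAAAAAACCCCCCCC", "AAAAAAAAGGGCCCCCCCC"))
```

The sketch returns only the score. Recovering the alignment itself
would need a traceback, which is omitted here to keep the example
short.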
EXAMPLES
(1) ssearch musplfm.aa $AABANK
Compare the amino acid sequence in the file musplfm.aa with the
complete PIR protein sequence library. This is extremely slow and
should almost never be done. ssearch is designed to search very small
libraries of sequences.
>LCBO bovine preprolactin
WILLLSQ ...
>LCHU human ...
...
(2) ssearch -a -w 80 musplfm.aa lcbo.aa
Compare the amino acid sequence in the file musplfm.aa with the
sequences in the file lcbo.aa using ktup = 1. Show both sequences in
their entirety, with 80 residues on each output line.
(3) ssearch
Run the ssearch program in interactive mode. The program will prompt
for the file name for the query sequence, list alternative libraries to
be searched (if FASTLIBS is set), and prompt for the ktup.
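For unattended runs, a command line like those in the examples can be
assembled from a script. The Python sketch below only builds the
argument list from options documented on this page (-Q, -a, -b, -d,
-E, -w); the helper name ssearch_command and its keyword names are
hypothetical, and nothing is executed.

```python
import shlex


def ssearch_command(query, library, show_all=False, scores=None,
                    alignments=None, evalue=None, line_length=None):
    """Build an argv list for a non-interactive (-Q) ssearch run."""
    argv = ["ssearch", "-Q"]
    if show_all:
        argv.append("-a")
    # Each of these options takes a numeric value.
    for flag, value in (("-b", scores), ("-d", alignments),
                        ("-E", evalue), ("-w", line_length)):
        if value is not None:
            argv += [flag, str(value)]
    # Options come first, then the query file and the library file.
    return argv + [query, library]


argv = ssearch_command("musplfm.aa", "lcbo.aa", show_all=True, line_length=80)
print(shlex.join(argv))
```

On a system where ssearch is installed, the resulting list could be
handed to subprocess.run() with its output redirected to a file.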
You can use your own sequence files for ssearch; just be certain to put
a '>' and a comment as the first line before the sequence.
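A file in this form can be checked before it is used as a library. The
Python sketch below reads entries that each start with a '>' comment
line. It is an illustration, not the parser used by ssearch: the
function name read_library is hypothetical, and keeping only letters
from the sequence lines is an assumption.

```python
def read_library(lines):
    """Yield (comment, sequence) pairs from FASTA-style text lines."""
    comment, residues = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new '>' line closes the previous entry, if there was one.
            if comment is not None:
                yield comment, "".join(residues)
            comment, residues = line[1:].strip(), []
        elif comment is not None:
            # Assumption: keep letters only; blanks, tabs and other
            # characters are dropped.
            residues.extend(ch for ch in line if ch.isalpha())
    if comment is not None:
        yield comment, "".join(residues)


sample = [
    ">SEQ1 first test entry",
    "WILL LSQ",
    ">SEQ2 second test entry",
    "acgt",
    "ACGT",
]
for comment, sequence in read_library(sample):
    print(comment, len(sequence))
```

Sequence lines that appear before the first '>' line are skipped by
this sketch, which is one way to notice a file that is missing its
comment line.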
SEE ALSO
rss(1), align(1), fasta(1), rdf2(1), protcodes(5), dnacodes(5)
AUTHOR
Bill Pearson
wrp@virginia.EDU
local SSEARCH(1)