SSEARCH(1)             DragonFly General Commands Manual            SSEARCH(1)

NAME
       ssearch - scan a protein or DNA sequence library for similar sequences

SYNOPSIS
       ssearch [-a -b # -d # -E # -f # -g # -h -i -l FASTLIBS  -L -r STATFILE
       -m # -O filename -Q -s SMATRIX -w # -z ] query-sequence-file library-
       file

       ssearch [-QabdEfghilmOrswz] query-file @library-name-file

       ssearch [-QabdEfghilmOrswz] query-file "%PRMVI"

       ssearch [-aEfghilmrsw] - interactive mode

DESCRIPTION
       ssearch compares a protein or DNA sequence to all of the entries in a
       sequence library using the rigorous Smith-Waterman algorithm (Smith and
       Waterman, J. Mol. Biol. (1981) 147:195-197).  For example, ssearch can
       compare a protein sequence to all of the sequences in the NBRF PIR
       protein sequence database.

       ssearch automatically decides whether the query sequence is DNA or
       protein by reading the query sequence as protein and determining
       whether the `amino-acid composition' is more than 85% A+C+G+T.  The
       program can be invoked either with command line arguments or in
       interactive mode.

       ssearch compares a query sequence to a sequence library, which consists
       of sequence data interspersed with comments.  The fasta programs,
       including ssearch, use a standard text format sequence file.  Lines
       beginning with '>' or ';' are treated as comments and ignored;
       sequences can be upper or lower case; blanks, tabs and unrecognizable
       characters are ignored.  ssearch expects sequences to use the single
       letter amino acid codes, see protcodes(5).  Library files for ssearch
       should have the form shown below.
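
       The alphabet test described above can be sketched in Python as
       follows.  This only illustrates the 85% rule and is not the program's
       own code; the function name looks_like_dna and the decision to count
       only alphabetic characters are assumptions made for the example.

```python
def looks_like_dna(sequence: str, threshold: float = 0.85) -> bool:
    """Guess the alphabet: True when more than `threshold` of the residues
    are A, C, G or T (hypothetical helper, not part of ssearch)."""
    # Assumption: only alphabetic characters count as residues.
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    if not residues:
        return False
    acgt = sum(ch in "ACGT" for ch in residues)
    return acgt / len(residues) > threshold


if __name__ == "__main__":
    print(looks_like_dna("ACGTACGTTTGACCA"))  # nucleotide-only input
    print(looks_like_dna("WILLLSQ"))          # protein-like input
```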

OPTIONS
       ssearch can be directed to change the scoring matrix, search
       parameters, output format, and default search directories by entering
       options on the command line (preceded by a `-').  All of the options
       should precede the file name arguments.  Alternatively, these options
       can be changed by setting environment variables.  The options and
       environment variables are:

       -a     (SHOWALL) Modifies the display of the two sequences in
              alignments.  Normally, both sequences are shown only where they
              overlap (SHOWALL=0); if -a is given or the environment variable
              SHOWALL=1, both sequences are shown in their entirety.

       -b #   The number of similarity scores to be shown when the -Q option
              is used.  This value is usually calculated based on the actual
              scores.

       -d #   The number of alignments to be shown.  Normally, ssearch shows
              the same number of alignments as similarity scores.  By using
              ssearch -Q -b 200 -d 50, one would see the top scoring 200
              sequences and alignments for the 50 best scores.

       -E #   The expectation value threshold for displaying similarity scores
              and sequence alignments.  ssearch -Q -E 2.0 would show all
              library sequences with scores expected to occur no more than 2
              times by chance in a search of the library.

       -f #   Penalty for the first residue in a gap (-12 by default).

       -g #   Penalty for additional residues in a gap (-2 by default).

       -h     Do not display histogram of similarity scores.

       -l file
              (FASTLIBS) The name of the library menu file.  Normally this
              will be determined by the environment variable FASTLIBS.
              However, a library menu file can also be specified with -l.

       -L     Display more information about the library sequence in the
              alignment.

       -m #   (MARKX) =0,1,2,3.  Alternate display of matches and mismatches
              in alignments.  MARKX=0 uses ":", ".", and " " for identities,
              conservative replacements, and non-conservative replacements,
              respectively.  MARKX=1 uses " ", "x", and "X".  MARKX=2 does not
              show the second sequence, but uses the second alignment line to
              display matches with a "."  for identity, or with the mismatched
              residue for mismatches.  MARKX=2 is useful for aligning large
              numbers of similar sequences.  MARKX=3 writes out a file of
              library sequences in FASTA format.  MARKX=3 should always be
              used with the "SHOWALL" (-a) option, but this does not
              completely ensure that all of the sequences output will be
              aligned.

       -O filename
              Sends copy of results to "filename".

       -Q     Quiet option.  This allows ssearch to search a database and
              report the results without asking any questions.  ssearch -Q
              file library > output can be put in the background or run at a
              later time with the unix 'at' command.  The number of similarity
              scores and alignments displayed with the -Q option can be
              modified with the -b (scores) and -d (alignments) options.

       -r STATFILE
              Causes ssearch to write out the sequence identifier,
              superfamily number (if available), and similarity scores to
              STATFILE for every sequence in the library.  These results are
              not sorted.

       -s str (SMATRIX) The filename of an alternative scoring matrix file.
              For protein sequences, BLOSUM50 is used by default; PAM250 can
              be used with the command line option -s 250.

       -w #   (LINLEN) Output line length for sequence alignments (normally
              60, can be set up to 200).

       -z     Do not do statistical significance calculation.
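
       To illustrate how the -f and -g penalties enter a score, the following
       Python sketch computes a Smith-Waterman local alignment score using
       Gotoh-style affine gap recurrences.  It assumes that a gap of k
       residues costs f + (k-1)*g, and it uses a flat match/mismatch score
       (5 and -4, hypothetical values) in place of a real matrix such as
       BLOSUM50.  It is a teaching sketch, not ssearch's implementation.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4,
                         gap_first=-12, gap_extend=-2):
    """Best local alignment score of strings a and b with affine gaps.

    Assumption: a gap of k residues costs gap_first + (k - 1) * gap_extend,
    mirroring the -f and -g defaults.  match/mismatch are placeholder
    scores, not a real substitution matrix.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols        # best score ending at (i-1, j)
    f_prev = [neg_inf] * cols  # best score ending at (i-1, j) in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # best score ending at (i, j) in a horizontal gap
        for j in range(1, cols):
            # Open a new gap from the cell on the left, or extend one.
            e = max(h_cur[j - 1] + gap_first, e + gap_extend)
            # Open a new gap from the cell above, or extend one.
            f_cur[j] = max(h_prev[j] + gap_first, f_prev[j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    # Nine matches and one single-residue gap: 9 * 5 - 12 = 33
    print(smith_waterman_score("ACGTACGTAC", "ACGTAGTAC"))
```

       Only the score is computed here; recovering the alignment itself would
       additionally require keeping the full matrices and tracing back from
       the best-scoring cell.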

EXAMPLES
       (1)    ssearch musplfm.aa $AABANK

       Compare the amino acid sequence in the file musplfm.aa with the
       complete PIR protein sequence library.  This is extremely slow and
       should almost never be done.  ssearch is designed to search very small
       libraries of sequences.

       A library file has the following form:

            >LCBO bovine preprolactin
            WILLLSQ ...
            >LCHU human ...
            ...

       (2)    ssearch -a -w 80 musplfm.aa lcbo.aa

       Compare the amino acid sequence in the file musplfm.aa with the
       sequences in the file lcbo.aa.  Show both sequences in their entirety,
       with 80 residues on each output line.

       (3)    ssearch

       Run the ssearch program in interactive mode.  The program will prompt
       for the file name for the query sequence and list alternative libraries
       to be searched (if FASTLIBS is set).

       You can use your own sequence files for ssearch; just be certain to put
       a '>' and comment as the first line before the sequence.
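
       A quick way to check that a file of your own follows this layout is to
       read it back.  The Python sketch below splits library text into
       (comment, sequence) pairs; the function name parse_library and the
       sample entries are hypothetical, and keeping only alphabetic
       characters is a simplification of how the program treats
       unrecognizable characters.

```python
def parse_library(text):
    """Yield (comment, sequence) pairs from '>'-delimited library text.

    Hypothetical helper for checking a file's layout; not part of ssearch.
    """
    comment, residues = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new '>' line closes the previous entry, if any.
            if comment is not None:
                yield comment, "".join(residues)
            comment, residues = line[1:].strip(), []
        elif comment is not None:
            # Assumption: keep letters only, folded to upper case.
            residues.append("".join(ch for ch in line.upper() if ch.isalpha()))
    if comment is not None:
        yield comment, "".join(residues)


if __name__ == "__main__":
    sample = ">SEQ1 made-up entry\nACDE FGH\nikl\n>SEQ2 second made-up entry\nMNPQ\n"
    for comment, sequence in parse_library(sample):
        print(comment, len(sequence))
```

       Text that appears before the first '>' line is skipped by this sketch,
       which is a convenient way to notice a missing comment line: the
       affected entry simply does not show up in the output.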

SEE ALSO
       rss(1), align(1), fasta(1), rdf2(1), protcodes(5), dnacodes(5)

AUTHOR
       Bill Pearson
       wrp@virginia.EDU

                                     local                          SSEARCH(1)