OMINDEX(1) User Commands OMINDEX(1)
NAME
omindex - Index static website data via the filesystem
SYNOPSIS
omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY
DESCRIPTION
DIRECTORY is the directory to start indexing from.
BASEDIR is the directory corresponding to the base URL given with --url
(default: DIRECTORY).
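As a minimal sketch of how DIRECTORY, --db and --url fit together, the following shell fragment assembles an omindex command line for a site served from the web root. The paths /var/lib/omega/data/default and /var/www/html are assumed placeholders, not defaults of omindex. The fragment only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Assumed example locations; substitute your own.
DB=/var/lib/omega/data/default    # database path passed to --db
DOCROOT=/var/www/html             # DIRECTORY: where indexing starts

# With only DIRECTORY given, BASEDIR defaults to DIRECTORY, so files
# under $DOCROOT map to URLs under the --url base (here "/").
CMD="omindex --db $DB --url / $DOCROOT"

# Print the command for review instead of running it.
echo "$CMD"
```

Leaving BASEDIR off keeps the example to the simplest form of the synopsis. Indexing only a subtree under a different URL prefix would need --url and BASEDIR adjusted together.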
OPTIONS
-d, --duplicates=ARG
set duplicate handling: ARG can be 'ignore' or 'replace'
(default: replace)
-p, --no-delete
skip the deletion of documents corresponding to deleted files
(--preserve-nonduplicates is a deprecated alias for --no-delete)
-e, --empty-docs=ARG
how to handle documents we extract no text from: ARG can be
index, warn (issue a diagnostic and index), or skip. (default:
warn)
-D, --db=DATABASE
path to database to use
-U, --url=URL
base URL that BASEDIR corresponds to (default: /)
-M, --mime-type=EXT:TYPE
assume any file with extension EXT has MIME Content-Type TYPE,
instead of using libmagic (empty TYPE removes any existing
mapping for EXT; other special TYPE values: 'ignore' and 'skip')
-G, --mime-type-match=GLOB:TYPE
assume any file with leaf name matching shell wildcard pattern
GLOB has MIME Content-Type TYPE (special TYPE values: 'ignore'
and 'skip')
-F, --filter=M[,[T][,C]]:CMD
process files with MIME Content-Type M using command CMD, which
produces output (on stdout or in a temporary file) with format T
(Content-Type or file extension; currently txt (default), html
or svg) in character encoding C (default: UTF-8). E.g.
-Fapplication/octet-stream:'strings -n8' or
-Ftext/x-foo,,utf-16:'foo2utf16 %f %t'
--read-filters=FILE
bulk-load --filter arguments from FILE, which should contain one
such argument per line (e.g. text/x-bar:bar2txt --utf8). Lines
starting with # are treated as comments and ignored.
-l, --depth-limit=LIMIT
set recursion limit (0 = unlimited)
-f, --follow
follow symbolic links
-i, --ignore-exclusions
ignore meta robots tags and similar exclusions
-S, --spelling
index data for spelling correction
-m, --max-size=N[SUFFIX]
maximum size of file to index (in bytes or with a suffix of
'K'/'k', 'M'/'m', 'G'/'g') (default: unlimited)
--sample=SOURCE
what to use for the stored sample of text for HTML documents -
SOURCE can be 'body' or 'description' (default: 'body')
-E, --sample-size=SIZE
maximum size for the document text sample (supports the same
formats as --max-size). (default: 512)
-T, --title-size=SIZE
maximum size for the document title (supports the same formats
as --max-size). (default: 128)
-R, --retry-failed
retry files which omindex failed to extract text from on a
previous run
--opendir-sleep=SECS
sleep for SECS seconds before opening each directory - sleeping
for 2 seconds seems to reliably work around problems with
indexing files on Microsoft DFS shares.
-C, --track-ctime
track each file's ctime so we can detect changes to ownership or
permissions.
--date-terms
ignored for forward compatibility with Omega 1.5.x.
--no-date-terms
don't index D, M and Y prefixed terms to support date range
filtering using terms (we now recommend using a value slot for
this instead).
-v, --verbose
show more information about what is happening
--overwrite
create the database anew (the default is to update if the
database already exists)
-s, --stemmer=LANG
set the stemming language (default: english). Possible values:
arabic armenian basque catalan danish dutch earlyenglish english
finnish french german german2 hungarian indonesian irish italian
kraaij_pohlmann lithuanian lovins nepali norwegian porter
portuguese romanian russian spanish swedish tamil turkish (pass
'none' to disable stemming)
-h, --help
display this help and exit
-V, --version
output version information and exit
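The --filter and --read-filters options described above can be combined through a small filter file. The sketch below writes such a file, counts its non-comment entries, and prints (without running) an omindex command that would load it. The two filter lines reuse the examples given under --filter and --read-filters. bar2txt is the illustrative converter named there, not a real tool, and the database and document paths are assumed placeholders.

```shell
#!/bin/sh
# Temporary file for this sketch; a real setup would use a fixed path.
FILTERS=$(mktemp)

# One --filter argument per line; lines starting with # are comments.
cat > "$FILTERS" <<'EOF'
# MIME type:command
text/x-bar:bar2txt --utf8
application/octet-stream:strings -n8
EOF

# Count the entries that are not comment lines.
N=$(grep -c -v '^#' "$FILTERS")
echo "$N filter(s) in $FILTERS"

# Command that would bulk-load them (printed, not run; paths assumed).
echo "omindex --db /var/lib/omega/data/default --url / --read-filters=$FILTERS /var/www/html"
```

Keeping filters in a file avoids the shell quoting that -F needs on the command line, which is why the strings command appears unquoted here.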
Please report bugs at: https://xapian.org/bugs
xapian-omega 1.4.22 February 2023 OMINDEX(1)