RECOLL.CONF(5)           DragonFly File Formats Manual          RECOLL.CONF(5)

NAME
       recoll.conf - main personal configuration file for Recoll

DESCRIPTION
       This file defines the index configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common
       file may be overridden by setting it in the personal configuration
       file, by default: $HOME/.recoll/recoll.conf

       Please note that while we try to keep this manual page reasonably up
       to date, it will frequently lag behind the current state of the
       software. The best source of information about the configuration is
       the comments in the system-wide configuration file.

       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs =  ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8

       There are three kinds of lines:

              o      Comment or empty

              o      Parameter assignment

              o      Section definition

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.
       Some of the parameters used for indexing are looked up hierarchically,
       from the most specific section to the least specific one. Not all
       parameters can be meaningfully redefined; whether a parameter can be
       is specified for each one in the next section.

       The tilde character (~) is expanded in file names to the name of the
       user's home directory.

       Where values are lists, white space is used for separation, and
       elements with embedded spaces can be quoted with double-quotes.
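
       As a minimal sketch of the quoting rule, the following extract
       indexes two directories, one of which has a space in its name. Both
       paths are hypothetical and only serve as an illustration:

              # Hypothetical paths; the second one needs double quotes.
              topdirs = ~/docs "/home/me/My Documents"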

OPTIONS
       topdirs = directories
              Specifies the list of directories to index (recursively).

       skippedNames = patterns
              A space-separated list of patterns for names of files or
              directories that should be completely ignored. The list defined
              in the default file is:

              *~ #* bin CVS  Cache caughtspam  tmp

              The list can be redefined for subdirectories, but it is only
              actually changed for the top-level ones in topdirs.
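
              As an illustration, the default list could be extended to also
              skip object files. The *.o pattern is an assumption made for
              this example and is not part of the default file:

              # Default list plus a hypothetical extra pattern (*.o).
              skippedNames = *~ #* bin CVS Cache caughtspam tmp *.o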

       skippedPaths = patterns
              A space-separated list of patterns for paths the indexer should
              not descend into. Together with topdirs, this allows pruning the
              indexed tree to one's heart's content. daemSkippedPaths can be
              used to define a specific value for the real time indexing
              monitor.

       skippedPathsFnmPathname = 0/1
              The values in the *skippedPaths variables are matched by default
              with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR
              flags. This means that '/' characters must be matched
              explicitly. You can set skippedPathsFnmPathname to 0 to disable
              the use of FNM_PATHNAME (meaning that /*/dir3 will match
              /dir1/dir2/dir3).
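
              A sketch of the difference, reusing the pattern from the
              paragraph above (the directory names are placeholders):

              # With the default matching (FNM_PATHNAME in use), /*/dir3
              # matches /dir1/dir3 but not /dir1/dir2/dir3, because '*'
              # cannot match a '/'. With the setting below, it matches both.
              skippedPaths = /*/dir3
              skippedPathsFnmPathname = 0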

       followLinks = boolean
              Specifies if the indexer should follow symbolic links while
              walking the file tree. The default is to ignore symbolic links
              to avoid multiple indexing of linked files. No effort is made to
              avoid duplication when this option is set to true. This option
              can be set individually for each of the topdirs members by using
              sections. It cannot be changed below the topdirs level.
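
              For example, links could be followed under one topdirs member
              only. The paths are hypothetical, and writing the boolean as 1
              is an assumption based on the 0/1 notation used elsewhere in
              this page:

              topdirs = ~/docs ~/projects

              # Follow symbolic links below ~/projects only.
              [~/projects]
              followLinks = 1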

       indexedmimetypes = list
              Recoll normally indexes any file which it knows how to read.
              This list lets you restrict the indexed mime types to what you
              specify. If the variable is unspecified or the list empty (the
              default), all supported types are processed.
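
              A possible restriction, with an arbitrary selection of types
              chosen only for illustration:

              # Index HTML, PDF and plain text documents, nothing else.
              indexedmimetypes = text/html application/pdf text/plain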

       compressedfilemaxkbs = value
              Size limit for compressed (.gz or .bz2) files. These need to be
              decompressed in a temporary directory for identification, which
              can be very wasteful if 'uninteresting' big compressed files are
              present.  Negative means no limit, 0 means no processing of any
              compressed file. Defaults to -1.
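
              For instance, to leave big compressed files alone (the
              threshold is an illustrative value, not a recommendation):

              # Do not process compressed files bigger than 20000 KB.
              compressedfilemaxkbs = 20000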

       textfilemaxmbs = value
              Maximum size for text files. Very big text files are often
              uninteresting logs. Set to -1 to disable the limit (default:
              20 MB).

       textfilepagekbs = value
              If this is set to other than -1, text files will be indexed as
              multiple documents of the given page size. This may be useful if
              you do want to index very big text files as it will both reduce
              memory usage at index time and help with loading data to the
              preview window. A size of a few megabytes would seem reasonable
              (default: 1000, i.e. 1 MB).
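
              The two text file parameters can be combined when very big
              text files should be indexed. The values below are only
              illustrative:

              # Accept text files up to 100 MB and index them as pages of
              # 2000 KB each.
              textfilemaxmbs = 100
              textfilepagekbs = 2000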

       membermaxkbs = value in kilobytes
              This defines the maximum size for an archive member (zip, tar or
              rar at the moment). Bigger entries will be skipped. Current
              default: 50000 (50 MB).

       indexallfilenames = boolean
              Recoll indexes file names into a special section of the database
              to allow file name searches using wildcards. This parameter
              decides whether file name indexing is performed only for files
              with mime types that would qualify them for full text indexing,
              or for all files inside the selected subtrees, independent of
              mime type.

       usesystemfilecommand = boolean
              Decide if we use the file -i system command as a final step for
              determining the mime type for a file (the main procedure uses
              suffix associations as defined in the mimemap file). This can be
              useful for files with suffixless names, but it will also cause
              the indexing of many bogus "text" files.

       processbeaglequeue = 0/1
              If this is set, process the directory where Beagle Web browser
              plugins copy visited pages for indexing. Of course, Beagle MUST
              NOT be running, else things will behave strangely.

       beaglequeuedir = directorypath
              The path to the Beagle indexing queue. This is hard-coded in the
              Beagle plugin as ~/.beagle/ToIndex so there should be no need to
              change it.

       indexStripChars = 0/1
              Decide if we strip characters of diacritics and convert them to
              lower-case before terms are indexed. If we don't, searches
              sensitive to case and diacritics can be performed, but the index
              will be bigger, and some marginal weirdness may sometimes occur.
              The default is a stripped index (indexStripChars = 1) for now.
              When using multiple indexes for a search, this parameter must be
              defined identically for all. Changing the value implies an index
              reset.

       maxTermExpand = value
              Maximum expansion count for a single term (e.g.: when using
              wildcards). The default of 10000 is reasonable and will avoid
              queries that appear frozen while the engine is walking the term
              list.

       maxXapianClauses = value
              Maximum number of elementary clauses we can add to a single
              Xapian query. In some cases, the result of term expansion can be
              multiplicative, and we want to avoid using excessive memory. The
              default of 100 000 should be both high enough in most cases and
              compatible with current typical hardware configurations.

       nonumbers = 0/1
              If this is set to true, no terms will be generated for numbers.
              For example, "123", "1.5e6" and "192.168.1.4" would not be
              indexed ("value123" would still be). Numbers are often quite
              interesting to search for, and this should probably not be set
              except in special situations, e.g. scientific documents with
              huge amounts of numbers in them. This can only be set for a
              whole index, not for a subtree.

       nocjk = boolean
              If this is set to true, the specific East Asian (Chinese,
              Korean, Japanese) character/word splitting is turned off. This
              will save a small amount of CPU time if you have no CJK
              documents. If your document base does include such text but you
              are not interested in searching it, setting nocjk may be a
              significant time and space saver.

       cjkngramlen = value
              This lets you adjust the size of n-grams used for indexing CJK
              text. The default value of 2 is probably appropriate in most
              cases. A value of 3 would allow more precision and efficiency on
              longer words, but the index will be approximately twice as
              large.

       indexstemminglanguages = languages
              A list of languages for which the stem expansion databases will
              be built. See recollindex(1) for possible values.

       defaultcharset = charset
              The name of the character set used for files that do not contain
              a character set definition (e.g. plain text files). This can be
              redefined for any subdirectory.

       unac_except_trans = list of utf-8 groups
              This is a list of characters, encoded in UTF-8, which should be
              handled specially when converting text to unaccented lowercase.
              For example, in Swedish, the letter "a with diaeresis" has full
              alphabet citizenship and should not be turned into an a.
              Each element in the space-separated list has the special
              character as first element and the translation following. The
              handling of both the lowercase and upper-case versions of a
              character should be specified, as membership in the list will
              turn off both standard accent and case processing.
              Note that the translation is not limited to a single character.
              This parameter cannot be redefined for subdirectories: it is
              global, because there is no way to do otherwise when querying.
              If you have document sets which would need different values, you
              will have to index and query them separately.
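
              A minimal sketch for the Swedish case mentioned above, assuming
              that only these three letters need special handling. Each
              element maps its first character to the rest of the element,
              and the upper-case forms are folded to lower case explicitly:

              # å, ä and ö stay distinct letters; Å, Ä and Ö map to them.
              unac_except_trans = åå Åå ää Ää öö Öö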

       maildefcharset = charactersetname
              This can be used to define the default character set
              specifically for email messages which don't specify it. This is
              mainly useful for readpst (libpst) dumps, which are utf-8 but do
              not say so.

       localfields = fieldname = value:...
              This allows setting fields for all documents under a given
              directory. Typical usage would be to set an "rclaptg" field, to
              be used in mimeview to select a specific viewer. If several
              fields are to be set, they should be separated with a colon
              (':') character (which there is currently no way to escape).
              For example, set localfields= rclaptg=gnus:other = val, then
              select the specific viewer with mimetype|tag=... in mimeview.
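
              Because the fields apply to a directory, the parameter would
              typically be set inside a section. A sketch, in which the
              directory name is hypothetical:

              # Tag every document under this (hypothetical) directory.
              [~/mail/gnus]
              localfields = rclaptg=gnus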

       dbdir = directory
              The name of the Xapian database directory. It will be created if
              needed when the database is initialized. If this is not an
              absolute pathname, it will be taken relative to the
              configuration directory.

       idxstatusfile = file path
              The name of the scratch file where the indexer process updates
              its status. Default: idxstatus.txt inside the configuration
              directory.

       maxfsoccuppc = percentnumber
              Maximum file system occupation before we stop indexing. The
              value is a percentage, corresponding to what the "Capacity" df
              output column shows.  The default value is 0, meaning no
              checking.
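
              For example, with an illustrative threshold:

              # Stop indexing once the file system is 90% full.
              maxfsoccuppc = 90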

       mboxcachedir = directory path
              The directory where mbox message offsets cache files are held.
              This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
              to share a directory between different configurations.

       mboxcacheminmbs = value in megabytes
              The minimum mbox file size over which we cache the offsets.
              There is really no sense in caching offsets for small files. The
              default is 5 MB.

       webcachedir = directory path
              This is only used by the Beagle web browser plugin indexing
              code, and defines where the cache for visited pages will live.
              Default: $RECOLL_CONFDIR/webcache

       webcachemaxmbs = value in megabytes
              This is only used by the Beagle web browser plugin indexing
              code, and defines the maximum size for the web page cache.
              Default: 40 MB.

       idxflushmb = megabytes
              Threshold (megabytes of new text data) where we flush from
              memory to disk index. Setting this can help control memory
              usage. A value of 0 means no explicit flushing, letting Xapian
              use its own default, which is flushing every 10000 documents (or
              XAPIAN_FLUSH_THRESHOLD), meaning that memory usage depends on
              average document size. The default value is 10.

       autodiacsens = 0/1
              If the index is not stripped, decide if we automatically trigger
              diacritics sensitivity if the search term has accented
              characters (not in unac_except_trans). Else you need to use the
              query language and the D modifier to specify diacritics
              sensitivity. Default is no.

       autocasesens = 0/1
              If the index is not stripped, decide if we automatically trigger
              character case sensitivity if the search term has upper-case
              characters in any but the first position. Else you need to use
              the query language and the C modifier to specify character-case
              sensitivity. Default is yes.
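
              The two automatic sensitivity parameters only matter for an
              unstripped index, so a configuration using them might look as
              follows (a sketch, not a recommended setup):

              # Unstripped index; changing this value implies an index reset.
              indexStripChars = 0
              # Accented search terms trigger diacritics sensitivity.
              autodiacsens = 1
              # Upper-case characters past the first position trigger case
              # sensitivity.
              autocasesens = 1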

       loglevel = value
              Verbosity level for recoll and recollindex. A value of 4 lists
              quite a lot of debug/information messages. 3 lists only errors.
              daemloglevel can be used to specify a different value for the
              real-time indexing daemon.

       logfilename = file
              Where the messages should go. 'stderr' can be used as a special
              value.  daemlogfilename can be used to specify a different value
              for the real-time indexing daemon.
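
              A sketch combining the logging parameters. The levels follow
              the description of loglevel above, and the log file path is
              hypothetical:

              # Errors only for recoll and recollindex, sent to stderr.
              loglevel = 3
              logfilename = stderr
              # More verbose output for the real-time indexing daemon,
              # written to a hypothetical file.
              daemloglevel = 4
              daemlogfilename = /tmp/recoll-monitor.log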

       mondelaypatterns = list of patterns
              This allows specifying wildcard path patterns (processed with
              fnmatch(3) with flags set to 0), to match files which change too
              often and for which a delay should be observed before
              re-indexing.
              This is a space-separated list, each entry being a pattern and a
              time in seconds, separated by a colon. You can use double quotes
              if a path entry contains white space. Example:

              mondelaypatterns = *.log:20 "this one has spaces*:10"

       monixinterval = value in seconds
              Minimum interval (seconds) for processing the indexing queue.
              The real time monitor does not process each event when it comes
              in, but will wait this time for the queue to accumulate to
              diminish overhead and in order to aggregate multiple events to
              the same file. Default: 30 seconds.

       monauxinterval = value in seconds
              Period (in seconds) at which the real time monitor will
              regenerate the auxiliary databases (spelling, stemming) if
              needed. The default is one hour.
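
              Both intervals are given in seconds. The values below are only
              illustrative:

              # Process the indexing queue at most once a minute, and
              # regenerate the auxiliary databases every two hours.
              monixinterval = 60
              monauxinterval = 7200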

       monioniceclass, monioniceclassdata
              These allow defining the ionice class and data used by the
              indexer (default class 3, no data).

       filtermaxseconds = value in seconds
              Maximum filter execution time, after which it is aborted. Some
              postscript programs just loop...

       filtersdir = directory
              A directory to search for the external filter scripts used to
              index some types of files. The value should not be changed,
              except if you want to modify one of the default scripts. The
              value can be redefined for any subdirectory.

       iconsdir = directory
              The name of the directory where recoll result list icons are
              stored. You can change this if you want different images.

       idxabsmlen = value
              Recoll stores an abstract for each indexed file inside the
              database. The text can come from an actual 'abstract' section in
              the document or will just be the beginning of the document. It
              is stored in the index so that it can be displayed inside the
              result lists without decoding the original file. The idxabsmlen
              parameter defines the size of the stored abstract. The default
              value is 250 bytes.  The search interface gives you the choice
              to display this stored text or a synthetic abstract built by
              extracting text around the search terms. If you always prefer
              the synthetic abstract, you can reduce this value and save a
              little space.

       aspellLanguage = lang
              Language definitions to use when creating the aspell dictionary.
              The value must match a set of aspell language definition files.
              You can type "aspell config" to see where these are installed
              (look for data-dir). The default if the variable is not set is
              to use your desktop national language environment to guess the
              value.

       noaspell = boolean
              If this is set, the aspell dictionary generation is turned off.
              Useful for cases where you don't need the functionality or when
              it is unusable because aspell crashes during dictionary
              generation.

       mhmboxquirks = flags
              This allows defining location-related quirks for the mailbox
              handler. Currently only the tbird flag is defined, and it should
              be set for directories which hold Thunderbird data, as their
              folder format is weird.
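
              Since the quirk is location-related, it would be set in a
              section for the directory that holds the mail folders. The
              directory below is an assumption about where Thunderbird keeps
              its data:

              # Assumed location of the Thunderbird data.
              [~/.thunderbird]
              mhmboxquirks = tbird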

SEE ALSO
       recollindex(1), recoll(1)

                               14 November 2012                 RECOLL.CONF(5)